Open Framework, Information Management Strategy & Collaborative Governance | Data & Social Methodology - MIKE2.0 Methodology
Archive for the ‘Information Value’ Category

by: Philsimon
23  Aug  2010

Resource Mistakes, Part I

As I continue to familiarize myself with the MIKE2.0 Framework, one thing has become entirely apparent to me: it’s based in large part on having the right resources at the right time. In this very important sense, the MIKE2.0 Framework is the same as any other methodology for implementing new systems. In a new series of posts, I’ll discuss some of the biggest mistakes that organizations make during information management (IM) projects. In this post, I’ll cover timing as it relates to allocating resources.

Hurry Up and Wait

When I’m not writing, speaking, or chasing down tennis or golf balls, I’m typically on a consulting project. Like many people, I’m a hired gun available on a first-come, first-served basis. While there are certainly exceptions, most large organizations tend to struggle to lock down people like me.

Consider the following example. Back in early June of this year, a firm for which I regularly subcontract (call it BU2B here) submitted me for a one-year project on a large new system implementation. I didn’t hear anything for two months and assumed that either the project never started or that I wasn’t chosen. C’est la vie, right?

Wrong.

Fast forward to August 17th. I get a call from a recruiter at BU2B: its client needs to talk to me today. Forget the fact that I am on site, billing my current client. BU2B tells me that this call has to happen today. I explain that that’s just not possible but that I’ll be free for pretty much the entire day on the 18th. Long story short: it has to be the 17th, even at night. Unable to make a firm commitment on such a “burning plank” deadline, I have to pass.

Of course, this raises some questions:

  • Why wait two months to find key resources for such an important project?
  • What was going to be decided at 8:30 pm on Tuesday that couldn’t be decided at 8:30 am on Wednesday?
  • Why would an organization wait two months and then give a candidate two hours? Does this seem reasonable?
  • Does an organization really think that it’s getting the right or best resource with such a tight timeline?
  • If this is the way that this company operates, would I really want to get on a plane every week and go there?

Trust me. This isn’t sour grapes talking. I’m very comfortable with rejection, especially since I went to a 70 percent male college. But does this story sound familiar?

Simon Says

Don’t wait until the last minute to find consultants and contractors, particularly as your project approaches key dates. Follow these guidelines and you can maximize the chance of a smooth transition and minimize the chance of scurrying at the last moment:

  • Let everyone know well ahead of time when projects are supposed to begin
  • Lock down resources well before those key dates
  • Identify backups just in case stuff happens
  • If an extension is necessary for an existing resource, attempt to arrange this as early as possible. Don’t wait until Friday morning to see if a key person is available on Monday.
  • Whatever you do, don’t complain when that resource has found another gig

Category: Information Development, Information Strategy, Information Value

by: Philsimon
02  Aug  2010

Charlie Rose, Customer Service, and the Master Twitter Record

My post last week detailed the customer service-oriented frustrations that many of us face while dealing with large companies. Writing about it was cathartic and probably saved me at least one trip to a shrink.

In a related and inspired post, Maria Ogneva writes about a future world in which companies and customers are able to efficiently interact via different media:

  • phone
  • email
  • Twitter handle
  • physical address (required, to be sure, although it just seems so dated)

It’s an interesting post and one well worth checking out.

All of this makes me wonder a few things:

  • Do many large organizations even have a master customer record anywhere?
  • Which people and departments can access this record? What about updating it?
  • What specific data elements are on that record?
  • Since the Twitter handle is probably not one of those elements, how hard would it be, from a technical standpoint, to add it? Adding it is surely preferable to creating a separate master Twitter record (MTR) that would invariably get out of sync with the “main” one.
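As a rough illustration of why adding the handle need not be hard, here is a minimal Python sketch in which the Twitter handle is simply one more optional attribute on the master customer record rather than a separate MTR. The record layout and field names are hypothetical, not taken from any particular MDM product.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MasterCustomerRecord:
    """Hypothetical master customer record with one slot per contact channel."""
    customer_id: str
    name: str
    phone: Optional[str] = None
    email: Optional[str] = None
    postal_address: Optional[str] = None
    # New optional channel attribute: existing records stay valid without it.
    twitter_handle: Optional[str] = None


record = MasterCustomerRecord(customer_id="C-1001", name="Jane Doe", email="jane@example.com")

# Capture the handle on the same master record instead of a separate "MTR",
# so there is only one record that can drift out of date.
updated = replace(record, twitter_handle="@janedoe")
print(updated.twitter_handle)  # @janedoe
print(record.twitter_handle)   # None
```

Because the new field defaults to None, every existing record remains valid, which is the main reason this kind of change tends to be cheaper than maintaining a parallel record.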

Now, no one here is claiming that Twitter ought to be the primary means for a company to deal with its customers. First, not everyone is on Twitter. Second, even setting that aside, a direct message of no more than 140 characters is probably too restrictive to resolve even a moderately complex customer issue. Finally, it’s so easy to retweet that companies would (probably justifiably) fear making certain responses public, at least so easily.

On the other hand, why not have that club in the bag? Use it when needed. This would be like my rarely used three iron on the golf course. I don’t use it often, but when I need it, I’m sure glad that I have it.

Big Company Customer Service

I don’t buy into the notion that a large organization cannot provide state-of-the-art customer service. It’s a matter of priorities and will. There’s no business or technology limitation to being able to take care of the people who take care of you.

Case in point: Amazon.com.

Recently on the Charlie Rose show, Amazon CEO Jeff Bezos talked about the company’s relentless focus on the customer from day one. Early on, there were those who claimed that Amazon was a “cute little company” but that, once heavyweights like Wal-Mart embraced e-commerce, they would crush the Amazons of the world. Bezos now laughs about the “Amazon.toast” references from the mid-1990s.

Reports of Amazon’s demise were premature. Today, to say that the company does a good job managing customer information to create a seamless buying experience is an understatement. Also, consider that Amazon isn’t really “just Amazon.” Many small companies maintain stores on the Amazon site. To buy a book or a pair of shoes, one need not reenter his or her information multiple times.

With that in mind, is it really too much to ask a big telecom company to get its act together?

Simon Says

Look, when a company of any size makes a customer service error, the aggrieved customer is going to tell people about it. Lots of people. The company cannot control anything that the customer says or does after that point. This is a far cry, though, from claiming that the company is helpless against a constant stream of negative feedback via tweets, emails, blog posts, discussion boards, and the like. Use a mistake as an opportunity to rectify a bad situation. Big mouths and active fingers work both ways: those same customers are likely to point out that at least the company did the right thing in the end. And doing the right thing requires good data on those customers.

Feedback

What say you?

Category: Information Value, Master Data Management

by: Robert.hillard
14  Feb  2010

Agile on the rebound

In August of last year I was asked to write an article for the end-of-year issue of Information Age about what I thought would be important in 2010. As we all know, the last year has been tumultuous, and predicting what would be important in the next month has been hard enough, let alone the next year.

Nonetheless, always up for a challenge, I wrote the article and then promptly forgot about it. When the magazine came out, I was curious to re-read what I’d written. The theme of the article was a focus on agility in the face of uncertainty. This is easy to say, but writing for the CIO audience I emphasised that in tough times business tries to reduce cost (which means focusing on the efficiency of existing processes) and defend against market events (which means emergency system changes). Neither of these trends generally leads to flexible business systems.

In the article I suggest that the demand that will come during the recovery will be unpredictable, making life for the reactive CIO almost impossible. I used books that many non-technology executives will have read (The Black Swan and Predictably Irrational) to help CIOs make the case for focusing on the data asset.

The article is available online: Information Age (Agile on the rebound).

Category: Information Strategy, Information Value

by: Larry.dubov
14  Aug  2009

Quantifying Data Quality with Information Theory

Information Theory Approach to Data Quality for MDM

Introduction

Over the past decade, data quality has been a major focus for data management professionals, data governance organizations, and other data quality stakeholders across the enterprise. Still, the quality of data remains low in many organizations. To a considerable extent this is caused by a lack of scientifically, or at least consistently, defined data quality metrics. Data professionals still lack a common methodology that would enable them to measure data quality objectively in terms of scientifically defined metrics and to compare data sets in terms of their quality across systems, departments and corporations.

Even though many data profiling metrics exist, their usage is not scientifically justified. Consequently, enterprises and their departments apply their own standards or no standards at all.

As a result, regulatory agencies, executive management and data governance organizations lack a standard, objective and scientifically defined way to articulate data quality requirements and measure progress in data quality improvement. Because data quality is so elusive, the job performance of the enterprise roles responsible for data quality lacks consistently defined criteria, which ultimately limits progress in data quality improvement.

A quantitative approach to data quality, if developed and adopted by the data management community, would enable data professionals to better prioritize data quality issues and take corrective actions proactively and efficiently.

In this article we discuss a scientific approach to data quality for master data management (MDM) based on Information Theory. This approach seems to be a good candidate to address the problem described above.

       

Approaches to Data Quality

At a high level there are two well-known and broadly used approaches to data quality. Typically, every enterprise uses both of them to some degree.

The first approach is mostly application driven and is often referred to as a “fit-for-purpose” approach. Often, business users determine that certain application queries or reports do not return the right data. For instance, if a query that is supposed to fetch the top 10 customers for Q2 does not return some of the customers the business expects to see, in-depth data analysis follows. The analysis may determine that some customer records are duplicated and some transaction records have incorrect or missing transaction dates. Findings like these can trigger activities aimed at understanding the data issues and taking corrective action.

An advantage of this approach is that it is aligned with the tactical needs of business functions, groups and departments. A disadvantage is that it addresses data quality issues reactively, in response to business requests or even complaints. Some data quality issues may not be easy to discover, and business users may be unable to decide which report is right and which one is wrong. The organization may eventually conclude that its data is bad without being able to indicate exactly what needs to be fixed, which limits IT’s ability to fix the issues. When multiple lines of business (LOBs) and functions across the enterprise struggle with their specific data quality issues separately, it is difficult to quantify the overall state of data quality and to define the priorities with which the enterprise should address data quality problems.

The second approach is based on data profiling. Data profiling tools are intended to make the data quality improvement process more proactive and measurable. A number of data profiling metrics are typically introduced to screen for missing and invalid attributes, duplicate records, duplicate values of attributes that are supposed to be unique, frequency of attributes, cardinality of attributes and their allowed values, standardization and validation of certain data formats for simple and complex attribute types, violations of referential integrity, etc. A limitation of data profiling techniques is that additional analysis is required to understand which of the metrics are most important for the business and why. It may not be easy to come up with a definitive answer and translate it into a data quality improvement action plan. The variety of data profiling metrics is not based on science but rather driven by the variety of ways in which relational database technology can report on data quality issues.
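To show what a few of these profiling metrics look like in practice, here is a minimal Python sketch over a list of dictionary records. It computes missing values, cardinality and most frequent values per attribute, plus duplicate values of an attribute that is supposed to be unique. The function name and the sample customer rows are hypothetical; real profiling tools cover many more checks (formats, referential integrity, allowed values).

```python
from collections import Counter


def profile(records, key_attr):
    """Compute a few basic profiling metrics for a list of dict records.

    Per attribute: number of missing values (None or empty string),
    cardinality (distinct non-missing values) and the most frequent values.
    Also returns values of key_attr that occur more than once even though
    they are supposed to be unique.
    """
    attributes = sorted({a for r in records for a in r})
    report = {}
    for a in attributes:
        values = [r.get(a) for r in records]
        present = [v for v in values if v not in (None, "")]
        report[a] = {
            "missing": len(values) - len(present),
            "cardinality": len(set(present)),
            "top_values": Counter(present).most_common(3),
        }
    key_counts = Counter(r.get(key_attr) for r in records)
    duplicate_keys = {k: c for k, c in key_counts.items() if c > 1}
    return report, duplicate_keys


# Hypothetical customer extract in which customer_id should be unique.
rows = [
    {"customer_id": "C1", "name": "Acme", "state": "NJ"},
    {"customer_id": "C2", "name": "Globex", "state": ""},
    {"customer_id": "C1", "name": "ACME Corp", "state": "NJ"},
]

report, dups = profile(rows, "customer_id")
print(report["state"]["missing"])     # 1
print(report["name"]["cardinality"])  # 3
print(dups)                           # {'C1': 2}
```

Note that nothing in this output says which of the findings matters most to the business, which is exactly the limitation described above.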

       

Each of the two approaches above has its niche and significance. When the quality of master data is in question, data governance organizations can consider an alternative, more strategic approach. This approach avoids detailed analysis of business applications while providing a solid scientific foundation for its metrics.

Information Theory Approach to Data Quality for MDM

Master data are those data that are foundational to business processes and are usually widely distributed; when well managed, they contribute directly to the success of an organization, and when not well managed, they pose the most risk. Customer, Patient, Citizen, Member, Client, Broker, Product, Financial Instrument and Drug are entities often referred to as master data entities, while the company-specific selection of master entities is driven by the enterprise’s business and focus.

Master Data Service (MDS) defines its primary function as the creation of the “golden view” of the master entities. We assume that MDS has successfully created, and maintains, the “golden view” of entity F in the data hub. This “golden record” can be dynamic or persistent. A number of data sources across the enterprise hold data corresponding to domain F. These include the source systems that feed the data hub and other data sources that may not be integrated with the data hub. We define an external data set f whose data quality is to be quantified with respect to F. For the purpose of this discussion, f can represent any data set, such as a single data source or multiple sources.

Our goal is to compare the source data set f with the entity data set F. The data quality of the data set f will be characterized by how well it represents the benchmark entity F, defined as the “golden view” for the data in domain F. We assume here that the “golden view” was created algorithmically and then validated by the data stewards.

       

In Information Theory, the information quantity associated with the entity F is expressed in terms of the entropy:

$$H(F) = -\sum_k P_k \log_2 P_k \tag{1}$$

where P_k are the probabilities of the attribute (token) values in the “golden” data set F, the index k runs over the distinct values of every attribute across all records in F, and the logarithm is base 2. In the Appendix examples, the probabilities are computed separately within each attribute, so H(F) is the sum of the per-attribute entropies.

H(F) represents the quantity of information in the “golden” representation of entity F.

Similarly, for the comparison data set f:

$$H(f) = -\sum_i p_i \log_2 p_i \tag{2}$$

We use lowercase p for the probabilities associated with f, and uppercase P for the probabilities characterizing the “golden” entity record.
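To make equation (1) concrete, here is a minimal Python sketch. It assumes, as the Appendix examples do, that each record is a dictionary of attribute values and that probabilities are computed within each attribute, so the entropy of a data set is the sum of its per-attribute entropies. The helper names are illustrative, not part of any MIKE2.0 tooling. Applied to the Appendix’s “golden” data set F, it reproduces H(F) = 3.5.

```python
from collections import Counter
from math import log2


def attribute_entropy(values):
    """Base-2 Shannon entropy of one attribute column: -sum(P * log2(P))."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def dataset_entropy(records, attributes):
    """Entropy of a data set, taken here as the sum of per-attribute
    entropies (the convention used in the Appendix examples)."""
    return sum(attribute_entropy([r[a] for r in records]) for a in attributes)


# The "golden" data set F from the Appendix.
golden = [
    {"eid": 1, "name": "Larry", "state": "NJ"},
    {"eid": 2, "name": "Jim", "state": "GA"},
    {"eid": 3, "name": "Scott", "state": "CA"},
    {"eid": 4, "name": "Marty", "state": "CA"},
]

H_F = dataset_entropy(golden, ["name", "state"])
print(H_F)  # 3.5 (2.0 bits for Name plus 1.5 bits for State)
```

The same function gives H(f) in equation (2) when it is applied to the comparison data set f instead.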

       

Mutual entropy (also known as mutual information) J(f,F) characterizes how well f represents F:

$$J(f,F) = H(f) + H(F) - H(f,F) \tag{3}$$

In (3), H(f,F) is the joint entropy of f and F. It is expressed in terms of the probabilities of combined events, e.g. the probability that name = “Smith” in the “golden record” F and name = “Schmidt” in the source record linked to the same entity. The behavior of J makes this function a good candidate for quantifying the data quality of f with respect to F. When the data quality is low, the correlation between f and F is low. In the extreme case of very low data quality, f does not correlate with F and the two variables are independent. Then

$$H(f,F) = H(f) + H(F) \tag{4}$$

and

$$J(f,F) = 0 \tag{5}$$

If f represents F extremely well, e.g. f = F, then H(f) = H(F) = H(f,F) and

$$J(f,F) = H(F) \tag{6}$$

We define the data quality of f with respect to F by the following equation:

$$\mathrm{DQ}(f,F) = \frac{J(f,F)}{H(F)} \tag{7}$$

With this definition, DQ ranges from 0 to 1. DQ = 0 indicates that the data quality of f is minimal: f does not represent F. DQ = 1 indicates that the data quality of f with respect to F is 100%: f represents F perfectly.

       

The approach can also be used to determine data quality at the level of individual attributes or tokens. This provides additional insight into what causes the most significant data quality issues.
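The article does not spell out the attribute-level calculation. One natural reading, sketched below in Python as an assumption, is to apply equations (3) and (7) to one attribute at a time, linking each source record to the golden record with the same EID. Using the Scenario 2 data from the Appendix, this separates a perfect Name attribute from a weaker State attribute. The helper names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Base-2 Shannon entropy of a list of hashable values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def attribute_dq(source, golden, attr):
    """Hypothetical attribute-level DQ: J/H(F) computed for one attribute.

    Each source record is linked to the golden record with the same EID, and
    the joint entropy is taken over (golden value, source value) pairs.
    Assumes the golden attribute is not constant, i.e. H(F) > 0.
    """
    golden_by_eid = {g["eid"]: g for g in golden}
    h_F = entropy([g[attr] for g in golden])
    h_f = entropy([r[attr] for r in source])
    h_fF = entropy([(golden_by_eid[r["eid"]][attr], r[attr]) for r in source])
    return (h_f + h_F - h_fF) / h_F


golden = [
    {"eid": 1, "name": "Larry", "state": "NJ"},
    {"eid": 2, "name": "Jim", "state": "GA"},
    {"eid": 3, "name": "Scott", "state": "CA"},
    {"eid": 4, "name": "Marty", "state": "CA"},
]
# Scenario 2: f contains an extra record for EID 1 with the wrong State.
source = golden + [{"eid": 1, "name": "Larry", "state": "CA"}]

for attr in ("name", "state"):
    print(attr, round(attribute_dq(source, golden, attr), 3))
# name 1.0
# state 0.633
```

Because every entropy in this convention is a sum over attributes, the per-attribute J values (2.0 bits for Name and about 0.949 bits for State) add up to the overall J(f,F) of about 2.949 obtained in Scenario 2, which points directly at State as the source of the quality loss.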

       

Data quality improvement should be done iteratively. Changes in the source data may affect the “golden record”; equations (1) and (7) are then applied again to recalculate the data quantity and data quality characteristics.

Conclusion

This article offers an Information Theory based method for quantifying information assets and the data quality of those assets through equations (1) and (7). The proposed method leverages the notion of a “golden record” created and maintained in the data hub. The “golden record” is used as the benchmark against which the data quality of other sources is measured.

Organizations can leverage this approach to augment their data governance offerings for MDM and make their data governance approach truly unique. A quantitative approach to data quality will ultimately help data governance organizations develop policies based on scientifically defined data quality and quantity metrics.

By applying this approach consistently across a number of engagements, we will, over time, accumulate valuable insights into how metrics (1) and (7) apply to real-world data characteristics and scenarios. We will develop good practices that define acceptable data quality thresholds. For example, a future industry policy for the property and casualty (P&C) insurance business might be to keep the quality of Customer data above the 92% mark: a clearly articulated data governance policy based on a scientifically sound approach to data quality metrics.

The approach can be incorporated into future products to enable data governance and provide data governance organizations with new tooling. Data governance organizations will be able to select information sources and assets to be measured, quantify them according to (1) and (7), set target metrics for data stewards, measure progress on an ongoing basis and report on data quality improvement.

Even though we focus mainly on data quality, the quantity of data in equation (1) characterizes the overall significance of a corporate data set from the Information Theory perspective. For mergers and acquisitions (M&A), the method can be used to measure the additional information that the joint enterprise will have compared with the information owned by the companies separately. The approach developed above measures both the information gained from differences in the customer bases and the increase in information quantity due to better, more precise and more useful information about existing customers.

       

Appendix: Simple Illustrative Examples

In this Appendix we will apply the theory developed above to two simple illustrative cases. We will define the “golden” data set F as follows:

EID | Name  | State
----+-------+------
1   | Larry | NJ
2   | Jim   | GA
3   | Scott | CA
4   | Marty | CA

       

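The probabilities of attribute values in F are:

Attribute | Value | P    | log2 P | P log2 P
----------+-------+------+--------+---------
Name      | Larry | 0.25 | -2     | -0.5
Name      | Jim   | 0.25 | -2     | -0.5
Name      | Scott | 0.25 | -2     | -0.5
Name      | Marty | 0.25 | -2     | -0.5
State     | NJ    | 0.25 | -2     | -0.5
State     | GA    | 0.25 | -2     | -0.5
State     | CA    | 0.5  | -1     | -0.5

The last column sums to -3.5, so by equation (1) H(F) = 3.5.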

Scenario 1

Dataset f is the same as the “golden” data set. Then

$$f = F, \quad H(f) = H(F) = 3.5$$

The probability matrix for combined values:

Value pair    | P    | log2 P | P log2 P
--------------+------+--------+---------
Larry, Larry  | 0.25 | -2     | -0.5
Jim, Jim      | 0.25 | -2     | -0.5
Scott, Scott  | 0.25 | -2     | -0.5
Marty, Marty  | 0.25 | -2     | -0.5
NJ, NJ        | 0.25 | -2     | -0.5
GA, GA        | 0.25 | -2     | -0.5
CA, CA        | 0.5  | -1     | -0.5

$$H(f,F) = -\sum_k P_k \log_2 P_k = 3.5$$

and

$$H(F) = H(f) = H(f,F) = 3.5$$

$$J(f,F) = H(F) + H(f) - H(f,F) = H(F) = 3.5$$

Equation (7) yields

$$\mathrm{DQ} = \frac{J(f,F)}{H(F)} = 1$$

       

As expected, when f = F the data quality of f is 1, or 100%.

       

Scenario 2

The data set for the “golden record” F remains the same as in Scenario 1:

$$H(F) = 3.5$$

We change data set f by adding a new record: “Larry, CA”. We assume that the new record for “Larry” represents the same individual as “Larry, NJ”. Therefore records “Larry, NJ” and “Larry, CA” have the same EID = 1. Data stewards determined that “NJ” is the right value for the attribute State. Data set f is as follows:

       

                                                     

EID | Name  | State
----+-------+------
1   | Larry | NJ
2   | Jim   | GA
3   | Scott | CA
4   | Marty | CA
1   | Larry | CA

       

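The probability matrix for f is:

Attribute | Value | p   | log2 p   | p log2 p
----------+-------+-----+----------+-------------
Name      | Larry | 0.4 | -1.32193 | -0.528771238
Name      | Jim   | 0.2 | -2.32193 | -0.464385619
Name      | Scott | 0.2 | -2.32193 | -0.464385619
Name      | Marty | 0.2 | -2.32193 | -0.464385619
State     | NJ    | 0.2 | -2.32193 | -0.464385619
State     | GA    | 0.2 | -2.32193 | -0.464385619
State     | CA    | 0.6 | -0.73697 | -0.442179356

$$H(f) = 3.292878689$$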

       

       

       

The probability matrix for combined values (golden value, source value):

Value pair    | p   | log2 p   | p log2 p
--------------+-----+----------+-------------
Larry, Larry  | 0.4 | -1.32193 | -0.528771238
Jim, Jim      | 0.2 | -2.32193 | -0.464385619
Scott, Scott  | 0.2 | -2.32193 | -0.464385619
Marty, Marty  | 0.2 | -2.32193 | -0.464385619
NJ, NJ        | 0.2 | -2.32193 | -0.464385619
GA, GA        | 0.2 | -2.32193 | -0.464385619
CA, CA        | 0.4 | -1.32193 | -0.528771238
NJ, CA        | 0.2 | -2.32193 | -0.464385619

$$H(f,F) = 3.84385619$$

       

Substituting the values for H(F), H(f) and H(f,F) into (3) and (7), we obtain:

$$J(f,F) = 3.5 + 3.292878689 - 3.84385619 = 2.949022499$$

$$\mathrm{DQ} = \frac{J(f,F)}{H(F)} = \frac{2.949022499}{3.5} = 0.842577857 \approx 84\%$$
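Both scenarios can be checked with a short Python sketch. It adopts the Appendix’s conventions as assumptions: probabilities are computed within each attribute, each record of f is linked to the golden record with the same EID to form the combined (joint) values, and H(F) is computed over the golden records themselves. The function name `data_quality` is illustrative.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ("name", "state")


def entropy(values):
    """Base-2 Shannon entropy of a list of hashable values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def data_quality(source, golden):
    """Return H(F), H(f), H(f,F), J(f,F) and DQ(f,F) per equations (1)-(3) and (7)."""
    golden_by_eid = {g["eid"]: g for g in golden}
    h_F = sum(entropy([g[a] for g in golden]) for a in ATTRIBUTES)
    h_f = sum(entropy([r[a] for r in source]) for a in ATTRIBUTES)
    # Joint entropy over (golden value, source value) pairs linked by EID.
    h_fF = sum(
        entropy([(golden_by_eid[r["eid"]][a], r[a]) for r in source])
        for a in ATTRIBUTES
    )
    j = h_f + h_F - h_fF
    return h_F, h_f, h_fF, j, j / h_F


golden = [
    {"eid": 1, "name": "Larry", "state": "NJ"},
    {"eid": 2, "name": "Jim", "state": "GA"},
    {"eid": 3, "name": "Scott", "state": "CA"},
    {"eid": 4, "name": "Marty", "state": "CA"},
]

scenario_1 = data_quality(golden, golden)
scenario_2 = data_quality(golden + [{"eid": 1, "name": "Larry", "state": "CA"}], golden)

print(scenario_1[-1])                     # 1.0
print([round(x, 4) for x in scenario_2])  # [3.5, 3.2929, 3.8439, 2.949, 0.8426]
```

The printed Scenario 2 values match H(F), H(f), H(f,F), J(f,F) and DQ in the tables above.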

Category: Data Quality, Enterprise Data Management, Information Development, Information Governance, Information Management, Information Value, Master Data Management

by: Sean.mcclowry
27  May  2009

Information Development concepts becoming more mainstream

For an interesting point of view on how quantitative social science is becoming more mainstream, check out Steve Miller’s great article. Hopefully we’ll see more of these methods developed in an open and collaborative fashion through frameworks like MIKE2.0. Something we haven’t done well enough is engage the academic community: how do we get academics to become part of the collaborative community?

Seen any great published work in this space? Help add it to the MIKE2.0 bookmarks.

Category: Information Development, Information Value

by: Robert.hillard
22  Sep  2007

Information is finite

In this post, I want to remind readers that information is not abstract; it is something real and follows the laws of physics. Information theory talks about encoding information using predictable patterns, which have to be represented by some type of device. Such a device would typically use electrical energy in some form to represent each bit, so it is no surprise that conservation of energy laws apply: that is, creating one piece of information must destroy another.

Why does this matter? Information is finite, and discovering one thing inevitably means that something else is either lost or in some way reduced in value. Anyone with a physics background might think of this as an extension of the Heisenberg uncertainty principle.

From a business management perspective, it means that the enterprise customer list cannot be used in an infinite number of ways without degrading the value of its content. While this is intuitively true, I argue that it is also mathematically true, because applying information such as customer details also derives information about its application. Deriving information about its application must reduce information (significant or otherwise) elsewhere. Usually, this reduction is significant: the process of finding out that a customer needs a new service usually reduces the confidence in earlier analysis and limits the ability to target the same customer in other ways.

Category: Information Management, Information Strategy, Information Value

by: Robert.hillard
15  Aug  2007

How should an executive judge the quality of data models?

Why would an executive care? There are two main reasons why every business and technology executive should consider the quality of data modelling to be core to their success.

The first is that information is a valuable economic asset (as argued in the MIKE2.0 article on the Economic Value of Information). Customer data, performance data and analytical information all combine to form an asset that is often worth several billion dollars. If a company had billions of dollars’ worth of gold, I’d expect business executives to want to review and understand how such a valuable asset was housed! Given that the data model is usually the main home for the information asset, the same should also be true. The data model cannot be delegated to junior technical staff!

Increasingly, there is another reason for elevating the data model. Legacy information is becoming an obstacle to business transformation. As the price of storage dropped during the 1990s, new systems also began storing ancillary data about the parties involved in each transaction and substantially more context for the event. Context could, for example, include the whole sales relationship tracking leading up to a transaction, or the staff contract changes that led to a salary change. With the context as part of the legal record, there are operational, regulatory and strategic reasons requiring that any new or transforming business function do nothing to corrupt the existing detail. The data model is the only tool we have to map new business requirements to old data.

Given the complexity of data modelling, it’s not surprising that executives have shied away from speaking to technologists about the detail of individual models. A discussion on normalization principles would be enough to put most decision makers off!

In the Small Worlds Data Transformation Measure article, MIKE2.0 introduces a set of simple metrics to indicate whether, on average, an organization’s data models are doing a good job of managing the information asset. Using the principle that information makes most sense in the context of the enterprise, it measures the level of connectivity and the average degree of separation across a subset of, or all of, the data models housing the information assets.
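The Small Worlds article defines its own metrics; the Python sketch below illustrates only the general idea and is not the MIKE2.0 formula. It assumes entities in a data model are nodes and relationships (for example, foreign keys) are undirected edges, then reports what share of entity pairs are connected at all and how many hops separate connected pairs on average. The entity names and helper functions are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_graph(edges, isolated=()):
    """Undirected adjacency sets from (entity, entity) relationship pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    for n in isolated:
        graph.setdefault(n, set())
    return graph


def hop_counts(graph, start):
    """Breadth-first search: relationship hops from start to each reachable entity."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def small_world_metrics(graph):
    """Hypothetical metrics: share of entity pairs that are connected, and the
    average degree of separation (hops) over the connected pairs."""
    nodes = sorted(graph)
    pairs = list(combinations(nodes, 2))
    distances = {n: hop_counts(graph, n) for n in nodes}
    seps = [distances[a][b] for a, b in pairs if b in distances[a]]
    connectivity = len(seps) / len(pairs)
    avg_separation = sum(seps) / len(seps) if seps else float("inf")
    return connectivity, avg_separation


# Hypothetical data model: four related entities plus one isolated entity.
model = build_graph(
    [("customer", "account"), ("account", "transaction"), ("customer", "address")],
    isolated=["product"],
)

connectivity, separation = small_world_metrics(model)
print(connectivity, round(separation, 2))  # 0.6 1.67
```

Here the isolated product entity lowers connectivity to 0.6, and the chain from address to transaction (three hops) pushes up the average separation; in this reading, better-integrated models score higher connectivity and lower separation.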

Category: Information Management, Information Strategy, Information Value, MIKE2.0