Open Framework, Information Management Strategy & Collaborative Governance | Data & Social Methodology - MIKE2.0 Methodology
Members
Collapse Expand Close

To join, please contact us.

Improve MIKE 2.0
Collapse Expand Close
Need somewhere to start? How about the most wanted pages; or the pages we know need more work; or even the stub that somebody else has started, but hasn't been able to finish. Or create a ticket for any issues you have found.
by: Alandduncan
24  May  2014

Data Quality Profiling: Do you trust in the Dark Arts?

Why estimating Data Quality profiling doesn’t have to be guess-work

Data Management lore would have us believe that estimating the amount of work involved in Data Quality analysis is a bit of a “Dark Art,” and to get a close enough approximation for quoting purposes requires much scrying, haruspicy and wet-finger-waving, as well as plenty of general wailing and gnashing of teeth. (Those of you with a background in Project Management could probably argue that any type of work estimation is just as problematic, and that in any event work will expand to more than fill the time available…).

However, you may no longer need to call on the services of Severus Snape or Mystic Meg to get a workable estimate for data quality profiling. My colleague from QFire Software, Neil Currie, recently put me onto a post by David Loshin on SearchDataManagement.com, which proposes a more structured and rational approach to estimating data quality work effort.

At first glance, the overall methodology that David proposes is reasonable in terms of estimating effort for a pure profiling exercise – at least in principle. (It’s analogous to similar “bottom/up” calculations that I’ve used in the past to estimate ETL development on a job-by-job basis, or creation of standards Business Intelligence reports on a report-by-report basis).

I would observe that David’s approach is predicated on the (big and probably optimistic) assumption that we’re only doing the profiling step. The follow-on stages of analysis, remediation and prevention are excluded – and in my experience, that’s where the real work most often lies! There is also the assumption that a pre-existing checklist of assessment criteria exists – and developing the library of quality check criteria can be a significant exercise in its own right.

However, even accepting the “profiling only” principle, I’d also offer a couple of additional enhancements to the overall approach.

Firstly, even with profiling tools, the inspection and analysis process for any “wrong” elements can go a lot further than just a 10-minute-per-item-compare-with-the-checklist, particularly in data sets with a large number of records. Also, there’s the question of root-cause diagnosis (And good DQ methods WILL go into inspecting the actual member records themselves). So for contra-indicated attributes, I’d suggest a slightly extended estimation model:

* 10mins: for each “Simple” item (standard format, no applied business rules, fewer that 100 member records)
* 30 mins: for each “Medium” complexity item (unusual formats, some embedded business logic, data sets up to 1000 member records)
* 60 mins: for any “Hard” high-complexity items (significant, complex business logic, data sets over 1000 member records)

Secondly, and more importantly – David doesn’t really allow for the human factor. It’s always people that are bloody hard work! While it’s all very well to do a profiling exercise in-and-of-itself, the result need to be shared with human beings – presented, scrutinised, questioned, validated, evaluated, verified, justified. (Then acted upon, hopefully!) And even allowing for the set-aside of the “Analysis” stages onwards, then there will need to be some form of socialisation within the “Profiling” phase.

That’s not a technical exercise – it’s about communication, collaboration and co-operation. Which means it may take an awful lot longer than just doing the tool-based profiling process!

How much socialisation? That depends on the number of stakeholders, and their nature. As a rule-of-thumb, I’d suggest the following:

* Two hours of preparation per workshop ((If the stakeholder group is “tame”. Double it if there are participants who are negatively inclined).
* One hour face-time per workshop (Double it for “negatives”)
* One hour post-workshop write-up time per workshop
* One workshop per 10 stakeholders.
* Two days to prepare any final papers and recommendations, and present to the Steering Group/Project Board.

That’s in addition to David’s formula for estimating the pure data profiling tasks.

Detailed root-cause analysis (Validate), remediation (Protect) and ongoing evaluation (Monitor) stages are a whole other ball-game.

Alternatively, just stick with the crystal balls and goats – you might not even need to kill the goat anymore…

Category: Business Intelligence, Data Quality, Enterprise Data Management, Information Development, Information Governance, Information Management, Information Strategy, Information Value, Master Data Management, Metadata
No Comments »

by: Bsomich
24  May  2014

Community Update.

Missed what’s been happening in the information management community? Check out our community update:

 logo.jpg

Big Data Solution Offering 

Have you seen the latest offering in our Composite Solutions suite?

The Big Data Solution Offering provides an approach for storing, managing and accessing data of very high volumes, variety or complexity.

Storing large volumes of data from a large variety of data sources in traditional relational data stores is cost-prohibitive, and regular data modeling approaches and statistical tools cannot handle data structures with such high complexity. This solution offering discusses new types of data management systems based onNoSQL database management systems and MapReduce as the typical programming model and access method.

Read our Executive Summary for an overview of this solution.

We hope you find this new offering of benefit and welcome any suggestions you may have to improve it.

Sincerely,

MIKE2.0 Community

Popular Content

Did you know that the followingwiki articles are most popular on Google? Check them out, and feel free to edit or expand them!

What is MIKE2.0?
Deliverable Templates
The 5 Phases of MIKE2.0
Overall Task List
Business Assessment Blueprint
SAFE Architecture
Information Governance Solution

Contribute to MIKE:

Start a new article, help witharticles under construction or look for other ways to contribute.

Update your personal profile to advertise yourself to the community and interact with other members.

Useful Links:
Home Page
Login
Content Model
FAQs
MIKE2.0 Governance

Join Us on
42.gif

Follow Us on
43 copy.jpg

Join Us on
images.jpg

 

This Week’s Blogs for Thought:

Key Questions to Get Started with Data Governance

“Five Whiskies in a Hotel” is the clue – i.e. five questioning words begin with “W” (Who, What, When, Why, Where), with one beginning with “H” (How). These simple question words give us a great entry point when we are trying to capture the initial set of issues and concerns around data governance – what questions are important/need to be asked.

Read more.

Things Interrupting the Internet of Things

I have written about several aspects of the Internet of Things (IoT) in previous posts on this blog (The Internet of HumansThe Quality of Things, and Is it finally the Year of IoT?). Two recent articles have examined a few of the things that are interrupting the progress of IoT. In his InformationWeek article Internet of Things: What’s Holding Us Back, Chris Murphy reported that while “companies in a variety of industries — transportation, energy, heavy equipment, consumer goods, healthcare, hospitality, insurance — are getting measurable results by analyzing data collected from all manner of machines, equipment, devices, appliances, and other networked things” (of which his article notes many examples), there are things slowing the progress of IoT.

Read more.
Is Your Data Quality Boring? 

Let’s be honest here. Data Quality is good and worthy, but it can be a pretty dull affair at times. Information Management is something that “just happens”, and folks would rather not know the ins-and-outs of how the monthly Management Pack gets created. Yet I’ll bet that they’ll be right on your case when the numbers are “wrong.” Right?

So here’s an idea. The next time you want to engage someone in a discussion about data quality, don’t start by discussing data quality. Don’t mention the processes of profiling, validating or cleansing data. Don’t talk about integration, storage or reporting. And don’t even think about metadata, lineage or auditability.

Read more.

Forward this message to a friendQuestions?

If you have any questions, please email us at mike2@openmethodology.org.

 

Category: Information Development
No Comments »

by: Ocdqblog
22  May  2014

Things Interrupting the Internet of Things

I have written about several aspects of the Internet of Things (IoT) in previous posts on this blog (The Internet of HumansThe Quality of Things, and Is it finally the Year of IoT?). Two recent articles have examined a few of the things that are interrupting the progress of IoT.

In his InformationWeek article Internet of Things: What’s Holding Us Back, Chris Murphy reported that while “companies in a variety of industries — transportation, energy, heavy equipment, consumer goods, healthcare, hospitality, insurance — are getting measurable results by analyzing data collected from all manner of machines, equipment, devices, appliances, and other networked things” (of which his article notes many examples), there are things slowing the progress of IoT.

One of its myths, Murphy explained, “is that companies have all the data they need, but their real challenge is making sense of it. In reality, the cost of collecting some kinds of data remains too high, the quality of the data isn’t always good enough, and it remains difficult to integrate multiple data sources.”

The fact is a lot of the things we want to connect to IoT weren’t built for Internet connectivity, so it’s not as simple as just sticking a sensor on everything. Poor data quality, especially timeliness, but also completeness and accuracy, is impacting the usefulness of the data transmitted by things currently connected to sensors. Other issues Murphy explores in his article include the lack of wireless network ubiquity, data integration complexities, and securing IoT data centers and devices against Internet threats.

“The clearest IoT successes today,” Murphy concluded, “are from industrial projects that save companies money, rather than from projects that drive new revenue. But even with these industrial projects, companies shouldn’t underestimate the cultural change they need to manage as machines start telling veteran machine operators, train drivers, nurses, and mechanics what’s wrong, and what they should do, in their environment.”

In his Wired article Why Tech’s Best Minds Are Very Worried About the Internet of Things, Klint Finley reported on the findings of a survey about IoT from the Pew Research Center. While potential benefits of IoT were noted, such as medical devices monitoring health and environmental sensors detecting pollution, potential risks were highlighted, security being one of the most immediate concerns. “Beyond security concerns,” Finley explained, “there’s the threat of building a world that may be too complex for our own good. If you think error messages and applications crashes are a problem now, just wait until the web is embedded in everything from your car to your sneakers. Like the VCR that forever blinks 12:00, many of the technologies built into the devices of the future may never be used properly.”

The research also noted concerns about IoT expanding the digital divide, ostracizing those who can not, or choose not to, participate. Data privacy concerns were also raised, including the possibility of dehumanizing the workplace by monitoring employees through wearable devices (e.g., your employee badge being used to track your movements).

Some survey respondents hinted at a division similar to what we hear about the cloud, differentiating a public IoT from a private IoT. In the latter case, instead of sacrificing our privacy for the advantages of connected devices, there’s no reason our devices can’t connect to a private network instead of the public internet. Other survey respondents believe that IoT has been overhyped as the next big thing and while it will be useful in some areas (e.g., the military, hospitals, and prisons) it will not pervade mainstream culture and therefore will not invade our privacy as many fear.

What Say You about IoT?

Please share your viewpoint about IoT by leaving a comment below.

 

Category: Data Quality, Information Development
No Comments »

by: Alandduncan
16  May  2014

Five Whiskies in a Hotel: Key Questions to Get Started with Data Governance

A “foreign” colleague of mine once told me a trick his English language teacher taught him to help him remember the “questioning words” in English. (To the British, anyone who is a non-native speaker of English is “foreign.” I should also add that as a Scotsman, English is effectively my second language…).

“Five Whiskies in a Hotel” is the clue – i.e. five questioning words begin with “W” (Who, What, When, Why, Where), with one beginning with “H” (How).

These simple question words give us a great entry point when we are trying to capture the initial set of issues and concerns around data governance – what questions are important/need to be asked.

* What data/information do you want? (What inputs? What outputs? What tests/measures/criteria will be applied to confirm whether the data is fit for purpose or not?)
* Why do you want it? (What outcomes do you hope to achieve? Does the data being requested actually support those questions & outcome? Consider Efficiency/Effectiveness/Risk Mitigation drivers for benefit.)
* When is the information required? (When is it first required? How frequently? Particular events?)
* Who is involved? (Who is the information for? Who has rights to see the data? Who is it being provided by? Who is ultimately accountable for the data – both contents and definitions? Consider multiple stakeholder groups in both recipients and providers)
* Where is the data to reside? (Where is it originating form? Where is it going to?)
* How will it be shared? (How will the mechanisms/methods work to collect/collate/integrate/store/disseminate/access/archive the data? How should it be structured & formatted? Consider Systems, Processes and Human methods.)

Clearly, each question can generate multiple answers!

Aside: in the Doric dialect of North-East of Scotland where I originally hail from, all the “question” words begin with “F”:
Fit…? (What?) e.g. “Fit dis yon feel loon wint?” (What does that silly chap want?)
Fit wye…? (Why?) e.g. “Fit wye div ye wint a’thin’?” (Why do you want everything?)
Fan…? (When?) e.g. “Fan div ye wint it?” (When you you want it?)
Fa…? (Who?) e.g. “Fa div I gie ‘is tae?” (Who do I give this to?)
Far…? (Where?) e.g. “Far aboots dis yon thingumyjig ging?” (Where exactly does that item go?)
Foo…? (How?) e.g. “Foo div ye expect me tae dae it by ‘e morn?” (How do you expect me to do it by tomorrow?)

Whatever your native language, these key questions should get the conversation started…

Remember too, the homily by Rudyard Kipling:

https://www.youtube.com/watch?v=e6ySN72EasU

Category: Business Intelligence, Data Quality, Enterprise Data Management, Information Development, Information Governance, Information Management, Information Strategy, Information Value, Master Data Management, Metadata
No Comments »

by: Alandduncan
13  May  2014

Is Your Data Quality Boring?

https://www.youtube.com/watch?v=vEJy-xtHfHY

Is this the kind of response you get when you mention to people that you work in Data Quality?!

Let’s be honest here. Data Quality is good and worthy, but it can be a pretty dull affair at times. Information Management is something that “just happens”, and folks would rather not know the ins-and-outs of how the monthly Management Pack gets created.

Yet I’ll bet that they’ll be right on your case when the numbers are “wrong”.

Right?

So here’s an idea. The next time you want to engage someone in a discussion about data quality, don’t start by discussing data quality. Don’t mention the processes of profiling, validating or cleansing data. Don’t talk about integration, storage or reporting. And don’t even think about metadata, lineage or auditability. Yaaaaaaaaawn!!!!

Instead of concentrating on telling people about the practitioner processes (which of course are vital, and fascinating no doubt if you happen to be a practitioner), think about engaging in a manner that is relevant to the business community, using language and examples that are business-oriented. Make it fun!

Once you’ve got the discussion flowing in terms of the impacts, challenges and inhibitors that get in the way of successful business operations, then you can start to drill into the underlying data issues and their root causes. More often than not, a data quality issue is symptomatic of a business process failure rather than being an end in itself. By fixing the process problem, the business user gains a benefit, and the data in enhanced as a by-product. Everyone wins (and you didn’t even have to mention the dreaded DQ phrase!)

Data Quality is a human thing – that’s why its hard. As practitioners, we need to be communicators. Lead the thinking, identify the impact and deliver the value.

Now, that’s interesting!

https://www.youtube.com/watch?v=5sGjYgDXEWo

Category: Business Intelligence, Data Quality, Enterprise Data Management, Information Governance, Information Management, Information Strategy, Information Value, Master Data Management, Metadata
No Comments »

by: Alandduncan
12  May  2014

The Information Management Tube Map

Just recently, Gary Allemann posted a guest article on Nicola Askham’s Blog, which made an analogy between Data Governance and the London Tube map. (Nicola also on Twitter. See also Gary Allemann’s blog, Data Quality Matters.)

Up until now, I’ve always struggled to think of a way to represent all of the different aspects of Information Management/Data Governance; the environment is multi-faceted, with the interconnections between the component capabilities being complex and not hierarchical. I’ve sometimes alluded to there being a network of relationship between elements, but this has been a fairly abstract concept that I’ve never been able to adequately illustrate.

And in a moment of perspiration, I came up with this…

http://informationaction.blogspot.com.au/2014/03/the-information-management-tube-map.html

I’ll be developing this further as I go but in the meantime, please let me know what you think.

(NOTE: following on from Seth Godin’ plea for more sharing of ideas, I am publishing the Information Management Tube Map under Creative Commons License Attribution Share-Alike V4.0 International. Please credit me where you use the concept, and I would appreciate it if you could reference back to me with any changes, suggestions or feedback. Thanks in advance.)

Category: Business Intelligence, Data Quality, Enterprise Data Management, Information Development, Information Governance, Information Management, Information Strategy, Information Value, Master Data Management, Metadata
No Comments »

by: Bsomich
10  May  2014

MIKE2.0 Community Update

Missed what’s been happening in the data management community? Check out our bi-weekly update:

 
 logo.jpg

Did You Know? 

MIKE’s Integrated Content Repository brings together the open assets from the MIKE2.0 Methodology, shared assets available on the internet and internally held assets. The Integrated Content Repository is a virtual hub of assets that can be used by an Information Management community, some of which are publicly available and some of which are held internally.

Any organisation can follow the same approach and integrate their internally held assets to the open standard provided by MIKE2.0 in order to:

  • Build community
  • Create a common standard for Information Development
  • Share leading intellectual property
  • Promote a comprehensive and compelling set of offerings
  • Collaborate with the business units to integrate messaging and coordinate sales activities
  • Reduce costs through reuse and improve quality through known assets

The Integrated Content Repository is a true Enterprise 2.0 solution: it makes use of the collaborative, user-driven content built using Web 2.0 techniques and technologies on the MIKE2.0 site and incorporates it internally into the enterprise. The approach followed to build this repository is referred to as a mashup.

Feel free to try it out when you have a moment- we’re always open to new content ideas.

Sincerely,

MIKE2.0 Community  

Contribute to MIKE:

Start a new article, help witharticles under construction or look for other ways to contribute.

Update your personal profile to advertise yourself to the community and interact with other members.

Useful Links:
Home Page
Login
Content Model
FAQs
MIKE2.0 Governance

Join Us on
42.gif

Follow Us on
43 copy.jpg

Join Us on
images.jpg

 

Did You Know?
All content on MIKE2.0 and any contributions you make are published under the Creative Commons license. This allows you free re-use of our content as long as you add a brief reference back to us.

 

This Week’s Blogs for Thought:

An Open Source Solution for Better Performance Management

Today, many organizations are facing increased scrutiny and a higher level of overall performance expectation from internal and external stakeholders. Both business and public sector leaders must provide greater external and internal transparancy to their activities, ensure accounting data faces up to compliance challenges, and extract the return and competitive advantage out of their customer, operational and performance information.

Read more.

Finding Ugly Data with Pretty Pictures

While data visualization often produces pretty pictures, as Phil Simon explained in his new book The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions, “data visualization should not be confused with art. Clarity, utility, and user-friendliness are paramount to any design aesthetic.” Bad data visualizations are even worse than bad art since, as Simon says, “they confuse people more than they convey information.”

Read more.
Is the cloud really safe? 

With the rise of Apple’s iCloud and file-sharing solutions such as Dropbox and Sharepoint, it’s no surprise that SaaS and cloud computing solutions are increasing in popularity. From reduced costs in hosting and infrastructure to increased mobility, backups and accessibility, the benefits are profound. Need proof? A recent study by IDC estimates that spending on public IT cloud services will expand at a compound annual growth rate of 27.6%, from $21.5 billion in 2010 to $72.9 billion in 2015.

Read more.

 

Forward this message to a friend

 

Category: Information Development
No Comments »

by: Bsomich
01  May  2014

An Open Source Solution for Better Performance Management

Today, many organizations are facing increased scrutiny and a higher level of overall performance expectation from internal and external stakeholders. Both business and public sector leaders must provide greater external and internal transparancy to their activities, ensure accounting data faces up to compliance challenges, and extract the return and competitive advantage out of their customer, operational and performance information: Managers, investors and regulators have a new perspective on performance and compliance, which may include:

  • No tolerance of any gap in financial controls
  • Increased personal accountability for the accuracy of results
  • Focus on fair representation and performance over accounting standards
  • Increased need to understand the drivers of both financial and operational results

Senior executives across industry agree that achieving growth and performance objectives requires improved analytics, defining and implementing a process to track, monitor and measure innovation and performance, and making available more timely, accurate information for decision-making. MIKE2.0′s open source Enterprise Performance Management offering can help organizations achieve these critical objectives.

Learn more about this solution.

800px-EPM_Framework_1107.2

Category: Information Development
No Comments »

by: Ocdqblog
30  Apr  2014

Finding Ugly Data with Pretty Pictures

While data visualization often produces pretty pictures, as Phil Simon explained in his new book The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions, “data visualization should not be confused with art. Clarity, utility, and user-friendliness are paramount to any design aesthetic.” Bad data visualizations are even worse than bad art since, as Simon says, “they confuse people more than they convey information.”

Simon explained how data scientist Melinda Thielbar recommends using data visualization to help an analyst communicate with a nontechnical audience, as well as help the data communicate with the analyst.

“Visualization is a great way to let the data tell a story,” Thielbar explained. “It’s also a great way for analysts to fool themselves into believing the story they want to believe.” This is why she recommends developing the visualizations at the beginning of the analysis to allow the visualizations that really illustrate the story behind the data to stand out, a process she calls “building windows into the data.” When you look through a window, you may not like what you see.

“Data visualizations may include bad, suspect, duplicate, or incomplete data,” Simon explained. This can be a good thing, however, since data visualizations “can help users identify fishy information and purify data faster than manual hunting and pecking. Data quality is a continuum, not a binary. Use data visualization to improve data quality.” Even when you are looking at what appears to be the pretty end of the continuum, Simon cautioned that “just because data is visualized doesn’t necessarily mean that it is accurate, complete, or indicative of the right course of action.”

Especially when dealing with volume aspect of big data, data visualization can help find outliers faster. While detailed analysis is needed to determine whether the outlier is a business insight or a data quality issue, data visualization can help you shake those needles out of the haystack and into a clear field of vision.

Among its many other uses, which Simon illustrates well in his book, finding ugly data with pretty pictures is one way data visualization can be used for improving data quality.

Category: Data Quality
No Comments »

by: Bsomich
26  Apr  2014

Community Update.

 

 logo.jpg

New to MIKE2.0? Here’s a Structural Overview…

If you’re not already familiar, here is an intro to the structure of the MIKE2.0 methodology and associated content:

  • A New Model for the Enterprise provides an intro rationale for MIKE2.0
  • What is MIKE2.0? is a good basic intro to the methodology with some of the major diagrams and schematics
  • Introduction to MIKE2.0 is a category of other introductory articles
  • Mike 2.0 How To – provides a listing of basic articles of how to work with and understand the MIKE2.0 system and methodology.
  • Alternative Release Methodologies describes current thinking about how the basic structure of MIKE2.0 can itself be modified and evolve. The site presently follows a hierarchical model with governance for major changes, though branching and other models could be contemplated.

We hope you find this of benefit and welcome any suggestions you may have to improve it.

Sincerely,

MIKE2.0 Community

New! Popular Content

Did you know that the followingwiki articles are most popular on Google? Check them out, and feel free to edit or expand them!

What is MIKE2.0?
Deliverable Templates
The 5 Phases of MIKE2.0
Overall Task List
Business Assessment Blueprint
SAFE Architecture
Information Governance Solution

Contribute to MIKE:

Start a new article, help witharticles under construction or look for other ways to contribute.

Update your personal profile to advertise yourself to the community and interact with other members.

Useful Links:
Home Page
Login
Content Model
FAQs
MIKE2.0 Governance

Join Us on
42.gif

Follow Us on
43 copy.jpg

Join Us on
 images.jpg

 

 

This Week’s Blogs for Thought:

Outsourcing our Memory to the Cloud

On the recent Stuff to Blow Your mind podcast episode Outsourcing Memory, hosts Julie Douglas and Robert Lamb discussed how, from remembering phone numbers to relying on spellcheckers, we’re allocating our cognitive processes to the cloud. Have you ever tried to recall an actual phone number stored in your cellphone, say of a close friend or relative, and been unable to do so?” Douglas asked. She remarked how that question would have been ridiculous ten years ago, but nowadays most of us would have to admit that the answer is yes.

Read more.

How CIOs Can Discuss the Contribution of IT

Just how productive are Chief Information Officers or the technology that they manage?  With technology portfolios becoming increasingly complex it is harder than ever to measure productivity.  Yet boards and investors want to know that the capital they have tied-up in the information technology of the enterprise is achieving the best possible return.
For CIOs, talking about value improves the conversation with executive colleagues.  Taking them aside to talk about the success of a project is, even for the most strategic initiatives, usually seen as a tactical discussion.  Changing the topic to increasing customer value or staff productivity through a return on technology capital is a much more strategic contribution.

Read more.
Let the Computers Calculate and the Humans Cogitate

Many organizations are wrapping their enterprise brain around the challenges of business intelligence, looking for the best ways to analyze, present, and deliver information to business users.  More organizations are choosing to do so by pushing business decisions down in order to build a bottom-up foundation.

However, one question coming up more frequently in the era of big data is what should be the division of labor between computers and humans?
Read more.

 

 

Forward to a Friend!

Know someone who might be interested in joining the Mike2.0 Community? Forward this message to a friendQuestions?

If you have any questions, please email us at mike2@openmethodology.org.

 

 

 

Category: Information Development
No Comments »

Calendar
Collapse Expand Close
TODAY: Tue, October 21, 2014
October2014
SMTWTFS
2829301234
567891011
12131415161718
19202122232425
2627282930311
Recent Comments
Collapse Expand Close