Open Framework, Information Management Strategy & Collaborative Governance | Data & Social Methodology - MIKE2.0 Methodology

Archive for the ‘Information Development’ Category

by: Bsomich
31  Mar  2014

Community Update

Missed what happened in the MIKE2.0 community this past week? Read on:

 


The Five Phases of MIKE2.0

In order to realize results more quickly, the MIKE2.0 Methodology has abandoned the traditional linear or waterfall approach to systems development. In its stead, MIKE2.0 has adopted an iterative, agile approach called continuous implementation. This approach divides the development and rollout of an entire system into a series of implementation cycles. These cycles identify and prioritize the portions of the system that can be constructed and rolled out before the entire system is complete. Each cycle also includes:

  • A feedback step to evaluate and prioritize the implementation results
  • Strategy changes
  • Improvement requests on the future implementation cycles.

Following this approach, there are five phases to the MIKE2.0 Methodology:

Feel free to check them out when you have a moment.

Sincerely,

MIKE2.0 Community

Contribute to MIKE:

Start a new article, help with articles under construction or look for other ways to contribute.

Update your personal profile to advertise yourself to the community and interact with other members.

Useful Links:
Home Page
Login
Content Model
FAQs
MIKE2.0 Governance


Did You Know?
All content on MIKE2.0 and any contributions you make are published under the Creative Commons license. This allows you free re-use of our content as long as you add a brief reference back to us.

This Week’s Food for Thought:

Now that’s magic!

When I was a kid growing up in the UK, Paul Daniels was THE television magician. With a combination of slick high drama illusions, close-up trickery and cheeky end-of-the-pier humour, (plus a touch of glamour courtesy of The Lovely Debbie McGee TM), Paul had millions of viewers captivated on a weekly basis and his cheeky catch-phrases are still recognised to this day.
Read more.

Login with Social Media

With a little work, social networks have the potential to be as valuable in confirming an identity as a passport.  It is the power of the crowd that can prove the integrity of the account holder, perhaps best described as crowdsourcing identity.

There are usually two goals of identity.  The first is to confirm you are who you say you are and the second is to work out your relationship to other people.
Read more.

Does your company need data visualization apps?

Few learned folks dispute the fact that the era of Big Data has arrived. Debate terms if you like, but most of us are bombarded with information these days. The question is turning to, How do we attempt to understand all of this data?

Read more.

 

Category: Information Development

by: Ocdqblog
30  Mar  2014

On Sharing Data

While security and privacy issues prevent sensitive data from being shared (e.g., customer data containing personal financial information or patient data containing personal health information), do you have access to data that would be more valuable if you shared it with the rest of your organization—or perhaps the rest of the world?

We are all familiar with the opposite of data sharing within an organization: data silos. Somewhat ironically, many data silos start with data that was designed to be shared with the entire organization (e.g., from an enterprise data warehouse), but was then replicated and customized in order to satisfy the particular needs of a tactical project or strategic initiative. This customized data often becomes obsolete after the conclusion (or abandonment) of its project or initiative.

Data silos are usually denounced as evil, but the real question is whether the data hoarded within a silo is sharable: is it something usable by the rest of the organization, which may be redundantly storing and maintaining its own private copies of the same data, or are the contents of the data silo something only one business unit uses (or is allowed to access, in the case of sensitive data)?

Most people decry data silos as the bane of successful enterprise data management—until you expand the scope of data beyond the walls of the organization, where the enterprise’s single version of the truth becomes a cherished data asset (i.e., an organizational super silo) intentionally siloed from the versions of the truth maintained within other organizations, especially competitors.

We need to stop needlessly replicating and customizing data—and start reusing and sharing data.

Historically, replication and customization had two primary causes:

  • Limitations in technology (storage, access speed, processing speed, and a truly sharable infrastructure like the Internet) meant that the only option was to create and maintain an internal copy of all data.
  • Proprietary formats and customized (and also proprietary) versions of common data were viewed as a competitive differentiator, even before the recent dawn of the realization that data is a corporate asset.

Hoarding data in a proprietary format and viewing “our private knowledge is our power” must be replaced with shared data in an open format and viewing “our shared knowledge empowers us all.”

This is an easier mantra to recite than it is to realize, not only within an organization or industry, but even more so across organizations and industries. However, one of the major paradigm shifts of 21st century data management is making more data publicly available, following open standards (such as MIKE2.0) and using unambiguous definitions so data can be easily understood and reused.

Of course, data privacy still requires sensitive data not be shared without consent, and competitive differentiation still requires intellectual property not be shared outside the organization. But this still leaves a vast amount of data, which if shared, could benefit our increasingly hyper-connected world where most of the boundaries that used to separate us are becoming more virtual every day. Some examples of this were made in the recent blog post shared by Henrik Liliendahl Sørensen about Winning by Sharing Data.

Category: Information Development

by: Alandduncan
29  Mar  2014

Now that’s magic!

When I was a kid growing up in the UK, Paul Daniels was THE television magician. With a combination of slick high drama illusions, close-up trickery and cheeky end-of-the-pier humour, (plus a touch of glamour courtesy of The Lovely Debbie McGee TM), Paul had millions of viewers captivated on a weekly basis and his cheeky catch-phrases are still recognised to this day.

Of course, part of the fascination of watching a magician perform is to wonder how the trick works. “How the bloody hell did he do that?” my dad would splutter as Paul Daniels performed yet another goofy gag or hair-raising stunt (no mean feat, when you’re as bald as a coot…) But most people don’t REALLY want to know the inner secrets, and even fewer of us are inspired to spray a riffle-shuffled pack of cards all over granny’s lunch, stick a coin up our nose or grab the family goldfish from its bowl and hide it in the folds of our nether-garments. (Um, yeah. Let’s not go there…)

Penn and Teller are great of course, because they expose the basic techniques of really old, hackneyed tricks and force more innovation within the magician community. They’re at their most engaging when they actually do something that you don’t get to see the workings of. Illusion maintained, audience entertained.

As data practitioners, I think we can learn a few of these tricks. I often see us getting too hot-and-bothered about differentiating data, master data, reference data, metadata, classification scheme, taxonomy, dimensional vs relational vs data vault modelling etc. These concepts are certainly relevant to our practitioner world, but I don’t necessarily believe they need to be exposed at the business-user level.

For example, I often hear business users talking about “creating the metadata” for an event or transaction, when they’re talking about compiling the picklist of valid descriptive values and mapping these to the contextualising descriptive information for that event (which by my reckoning, really means compiling the reference data!). But I’ve found that business people really aren’t all that bothered about the underlying structure or rigour of the modelling process.

That’s our job.

There will always be exceptions. My good friend and colleague Ben Bor is something of a special case and has the talent to combine data management and magic.

But for the rest of us mere mortals, I suggest that we keep the deep discussion of data techniques for the Data Magic Circle, and just let the paying customers enjoy the show….

Category: Business Intelligence, Data Quality, Enterprise Data Management, Information Development, Information Governance, Information Management, Information Strategy, Information Value, Master Data Management, Metadata

by: Bsomich
18  Mar  2014

Weekly IM Update.


What is an Open Methodology Framework?

An Open Methodology Framework is a collaborative environment for building methods to solve complex issues impacting business, technology, and society.  The best methodologies provide repeatable approaches on how to do things well based on established techniques. MIKE2.0’s Open Methodology Framework goes beyond the standards, techniques and best practices common to most methodologies with three objectives:

  • To Encourage Collaborative User Engagement
  • To Provide a Framework for Innovation
  • To Balance Release Stability with Continuous Improvement

We believe that this approach provides a successful framework for accomplishing things in a better and more collaborative fashion. What’s more, this approach allows for concurrent focus on both method and detailed technology artifacts. The emphasis is on emerging areas in which current methods and technologies lack maturity.

The Open Methodology Framework will be extended over time to include other projects. Another example of an open methodology is open-sustainability, which applies many of these concepts to the area of sustainable development. Suggestions for other Open Methodology projects can be initiated on this article’s talk page.

We hope you find this of benefit and welcome any suggestions you may have to improve it.

Sincerely,

MIKE2.0 Community

Popular Content

Did you know that the following wiki articles are most popular on Google? Check them out, and feel free to edit or expand them!

What is MIKE2.0?
Deliverable Templates
The 5 Phases of MIKE2.0
Overall Task List
Business Assessment Blueprint
SAFE Architecture
Information Governance Solution


 

This Week’s Blogs for Thought:

Is it finally the year of IoT?
In previous posts on this blog, The Internet of Humans and The Quality of Things, I have pondered aspects of the Internet of Things (IoT), which is something many analysts have promised for several years would soon be a pervasive phenomenon. There is reason to believe, however, that it is finally the year of IoT.
In a sense IoT is already with us, Christopher Mims explained in his Quartz three-part series about IoT.

Read more.

Grover: A Business Syntax for Semantic English

Grover is a semantic annotation markup syntax based on the grammar of the English language. Grover is related to the Object Management Group’s Semantics of Business Vocabulary and Rules (SBVR), explained later. Grover syntax assigns roles to common parts of speech in the English language so that simple and structured English phrases are used to name and relate information on the semantic web. By having as clear a syntax as possible, the semantic web is more valuable and useful.
An important open-source tool for semantic databases is SemanticMediaWiki that permits everyone to create a personal “wikipedia” in which private topics are maintained for personal use. The Grover syntax is based on this semantic tool and the friendly wiki environment it delivers, though the approach below might also be amenable to other toolsets and environments.
Read more.

The Data Doctor is in

Being a data management practitioner can be tough.
You’re expected to work your data quality magic, solve other people’s data problems, and help people get better business outcomes. It’s a valuable, worthy and satisfying profession. But people can be infuriating and frustrating, especially when the business user isn’t taking responsibility for the quality of their own data.

It’s a bit like being a Medical Doctor in general practice.
Read more.

 


Category: Information Development

by: Ocdqblog
12  Mar  2014

Is it finally the Year of IoT?

In previous posts on this blog, The Internet of Humans and The Quality of Things, I have pondered aspects of the Internet of Things (IoT), which is something many analysts have promised for several years would soon be a pervasive phenomenon. There is reason to believe, however, that it is finally the year of IoT.

In a sense IoT is already with us, Christopher Mims explained in his Quartz three-part series about IoT. “For one thing, anyone with a smartphone has already joined the club. The average smartphone is brimming with sensors — an accelerometer, a compass, GPS, light, sound, altimeter. It’s the prototypical internet-connected listening station, equally adept at monitoring our health, the velocity of our car, the magnitude of earthquakes, and countless other things that its creators never envisioned. Smartphones are also becoming wireless hubs for other gadgets and sensors.”

The Invisible Buttons of our increasingly Cybered Spaces

While the challenge previously thwarting it was that connecting devices to IoT was expensive and difficult, Mims predicts that in 2014 we will see the convergence of ever-smarter smartphones and other connected things that are cheaper and easier to use. These connected things are often referred to as invisible buttons.

“An invisible button,” Mims explained, “is simply an area in space that is ‘clicked’ when a person or object, such as a smartphone, moves into that physical space. If invisible buttons were just rigidly defined on-off switches, they wouldn’t be terribly useful. But because the actions they trigger can be modified by an infinitude of other variables, such as the time of day, our previous actions, the actions of others, or what Google knows about our calendar, they quickly become a means to program our physical world.”
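
The idea of an invisible button can be made concrete with a small sketch. This is not taken from Mims's article: the class name, the circular region, the coordinates and the hallway-light action are all hypothetical, chosen only to show a region that "clicks" when something enters it and whose response depends on context (here, the time of day).

```python
import math
from dataclasses import dataclass
from datetime import time
from typing import Callable, Optional


@dataclass
class InvisibleButton:
    """Hypothetical model: a circular area that is 'clicked' on entry."""
    x: float
    y: float
    radius: float
    action: Callable[[time], str]  # the response can depend on context

    def contains(self, px: float, py: float) -> bool:
        # Inside the region if the distance to the centre is within the radius.
        return math.hypot(px - self.x, py - self.y) <= self.radius

    def click(self, px: float, py: float, now: time) -> Optional[str]:
        # Trigger the action only when the position is inside the region.
        if self.contains(px, py):
            return self.action(now)
        return None


def hallway_light(now: time) -> str:
    # Assumed rule: the same button behaves differently late at night.
    is_night = now >= time(22, 0) or now < time(6, 0)
    return "lights on dim" if is_night else "lights on full"


button = InvisibleButton(x=0.0, y=0.0, radius=2.0, action=hallway_light)
print(button.click(1.0, 1.0, time(23, 0)))
print(button.click(1.0, 1.0, time(12, 0)))
print(button.click(5.0, 5.0, time(12, 0)))
```

A rigid on-off switch would ignore the `now` argument; passing context into the action is what lets one region trigger different behaviour.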

An example was written about by Matt McFarland in his Washington Post article How iBeacons could change the world forever. “With iOS 7,” McFarland reported, “Apple unveiled iBeacon, a feature that uses Bluetooth 4.0, a location-based technology. This makes it possible for sensors to detect — within inches — how close a phone is.” McFarland’s article lists nine interesting examples of how IoT technology like iBeacon could enhance the average person’s life.

The End of the Interface as We Know It

“That we currently need a cell phone,” Mims explained, “to act as a proximity sensor is just an artifact of where the technology is at present. The same can be accomplished with any number of other internet-connected sensors.” In fact, the rise of what is known as anticipatory computing might be signaling the end of the interface as we know it. As Mims explained, IoT doesn’t just give us another way to explicitly tell computers what we want. “Instead, by sensing our actions, the internet-connected devices around us will react automatically, and their representations in the cloud will be updated accordingly. In some ways, interacting with computers in the future could be more about telling them what not to do — at least until they’re smart enough to realize that we are modifying our daily routine.”

Every Internet needs its HTML

A critical piece of the IoT puzzle, however, remains to be solved, Mims concluded. “What engineers lack is a universal glue to bind all of the ‘things’ in the internet of things to each other and to the cloud. A lack of standards means most devices on the internet of things are going to use existing methods of connection (Wi-Fi, Bluetooth and the like) to connect to one another. But eventually, something like HTML, the language of the web, will be required to make the internet of things realize its potential.”

As Robert Hillard blogged, the race to IoT is a marathon. While 2014 does appear to be the Year of IoT, it is still early in the marathon and IoT has many more miles to run. It is going to be a fun run, no doubt about it, but IoT will need open standards, and the interoperability they provide, in order to get us across the finish line.

Category: Information Development

by: John McClure
06  Mar  2014

Grover: A Business Syntax for Semantic English

Grover is a semantic annotation markup syntax based on the grammar of the English language. Grover is related to the Object Management Group’s Semantics of Business Vocabulary and Rules (SBVR), explained later. Grover syntax assigns roles to common parts of speech in the English language so that simple and structured English phrases are used to name and relate information on the semantic web. By having as clear a syntax as possible, the semantic web is more valuable and useful.

An important open-source tool for semantic databases is SemanticMediaWiki that permits everyone to create a personal “wikipedia” in which private topics are maintained for personal use. The Grover syntax is based on this semantic tool and the friendly wiki environment it delivers, though the approach below might also be amenable to other toolsets and environments.

Basic Approach. Within a Grover wiki, syntax roles are established for classes of English parts of speech.

  • Subject:noun(s) -- verb:article/verb:preposition -- Object:noun(s)

refines the standard Semantic Web pattern:

  • SubjectURL -- PredicateURL -- ObjectURL

while in a SemanticMediaWiki environment, with its relative URLs, this is the pattern:

  • (Subject) Namespace:pagename -- (Predicate) Property:pagename -- (Object) Namespace:pagename.

 

nouns
In a Grover wiki, topic types are nouns (more precisely, nounal expressions) and are concepts. Every concept is defined by a specific semantic database query, these queries being the foundation of a controlled enterprise vocabulary. In Grover every pagename is the name of a topic and every pagename includes a topic-type prefix. Example: Person:Barack Obama and Title:USA President of the United States of America, two topics related together through one or more predicate relations, for instance “has:this”. Wikis are organized into ‘namespaces’: each page’s name is prefixed with a namespace-name, and these namespace-names function equally as topic-type names. Additionally, an ‘interwiki prefix’ can indicate the URL of the wiki where a page is located, in a manner compatible with the Turtle RDF language.

Nouns (nounal expressions) are the names of topic-types and/or of topics; in ontology-speak, nouns are class resources or individual resources, but nouns are rarely defined as property resources (and thereby used as a ‘predicate’ in the standard Semantic Web pattern mentioned above). This noun requirement is a systemic departure from today’s free-for-all that allows nouns to be part of the name of predicates, leading to the construction of ontologies that are problematic from the perspective of common users.

verbs
In a Grover wiki, “property names” are an additional ontology component forming the bedrock of a controlled semantic vocabulary. Being pages in the “Property” namespace means these are prefixed with the namespace name, “Property”. However, the XML namespace is directly implied; for instance, has:this implies a “has” XML Namespace. The full pagename of this property is “Property:has:this”. The tenses of a verb (infinitive, past, present and future) are each an XML namespace, meaning there are separate have, has, had and will-have XML Namespaces. The modalities of a verb, may and must, are also separate XML Namespaces. Lastly, the negation forms of verbs (involving not) are additional XML Namespaces.

The “verb” XML Namespace name is only one part of a property name. The other part of a property name is either a preposition or a grammatical article. Together, these comprise an enterprise’s controlled semantic vocabulary.

prepositions
As in English grammar, prepositions are used to relate an indirect object, or the object of a preposition, to a subject in a sentence. Example: “John is at the Safeway” uses a property named “is:at” to yield the triple Person:John -- is:at -- Store:Safeway. There are approximately one hundred English prepositions possible for any particular verbal XML Namespace. Examples: had:from, has:until and is:in.

articles
As in English grammar, articles such as “a” and “the” are used to relate direct objects or predicate nominatives to a subject in a sentence. As for prepositions above, articles are associated with a verb XML Namespace. Examples: has:a, has:this, has:these, had:some, has:some and will-have:some.

adjectives
In a Grover wiki, definitions in the “category” namespace include adjectives, such as “Public” and “Secure”. These categories are also found in a controlled modifier vocabulary. The category namespace also includes definitions for past participles, such as “Secured” and “Privatized”. Every adjective and past participle is a category in which any topic can be placed. A third subclass of modifiers includes ‘adverbs’, categories in which predicate instances are placed.

That’s about all that’s needed to understand Grover, the Business Syntax for Semantic English! Let’s use the Grover syntax to implement a snippet from the Object Management Group’s Semantics of Business Vocabulary and Rules (SBVR) which has statements such as this for “Adopted definition”:

adopted definition
Definition: definition that a speech community adopts from an external source by providing a reference to the definition.
Necessities: (1) The concept ‘adopted definition’ is included in Definition Origin. (2) Each adopted definition must be for a concept in the body of shared meanings of the semantic community of the speech community.

 

Now we can use Grover’s syntax to ‘adopt’ the OMG’s definition for “Adopted definition”.
Concept:Term:Adopted definition -- is:within -- Concept:Definition
Concept:Term:Adopted definition -- is:in -- Category:Adopted
Term:Adopted definition -- is:a -- Concept:Term:Adopted definition
Term:Adopted definition -- is:also -- Concept:Term:Adopted definition
Term:Adopted definition -- is:of -- Association:Object Management Group
Term:Adopted definition -- has:this -- Reference:http://www.omg.org/spec/SBVR/1.2/PDF/
Term:Adopted definition -- must-be:of -- Concept:Semantic Speech Community
Term:Adopted definition -- must-have:some -- Concept:Reference

This simplified but structured English permits the widest possible segment of the populace to participate in constructing and perfecting an enterprise knowledge base built upon the Resource Description Framework.

More complex information can be specified on wikipages using standard wiki templates. For instance to show multiple references on the “Term:Adopted definition” page, the “has:this” wiki template can be used:
{{has:this|Reference:http://www.omg.org/spec/SBVR/1.1/PDF/;Reference:http://www.omg.org/spec/SBVR/1.2/PDF/}}
Multi-lingual text values and resource references would be as follows, using the wiki templates (a) {{has:this}} and (b) {{skos:prefLabel}}
{{has:this |@=en|@en=Reference:http://www.omg.org/spec/SBVR/1.2/PDF/}}
{{skos:prefLabel|@=en;de|@en=Adopted definition|@de=Angenommen definition}}
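
As a rough sketch, the multi-reference form of the template call could be generated from a list of values as follows. The helper name render_template is invented for this illustration and is not a SemanticMediaWiki function; it only reproduces the "name|value;value" shape shown above and does not attempt the multi-lingual parameters.

```python
from typing import Sequence


def render_template(name: str, values: Sequence[str]) -> str:
    """Build '{{name|v1;v2;...}}' in the shape of the example above."""
    if not values:
        raise ValueError("at least one value is required")
    # Values are joined with ';' as in the multi-reference example.
    return "{{" + name + "|" + ";".join(values) + "}}"


references = [
    "Reference:http://www.omg.org/spec/SBVR/1.1/PDF/",
    "Reference:http://www.omg.org/spec/SBVR/1.2/PDF/",
]
print(render_template("has:this", references))
```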

One important feature of the Grover approach is its modification of our general understanding about how ontologies are built. Today, ontologies specify classes, properties and individuals; a data model emerges from listings of range/domain axioms associated with a property’s definition. Under Grover, by contrast, an ontology’s data models are explicitly stated with deontic verbs that pair subjects with objects; this is an intuitively stronger and more governable approach for such a critical enterprise resource as the ontology.

Category: Business Intelligence, Enterprise Content Management, Enterprise Data Management, Enterprise2.0, Information Development, Semantic Web

by: Alandduncan
04  Mar  2014

The (Data) Doctor Is In: ADD looks for a data diagnosis…

Being a data management practitioner can be tough.

You’re expected to work your data quality magic, solve other people’s data problems, and help people get better business outcomes. It’s a valuable, worthy and satisfying profession. But people can be infuriating and frustrating, especially when the business user isn’t taking responsibility for the quality of their own data.

It’s a bit like being a Medical Doctor in general practice.

The patient presents with some early indicative symptoms. The MD then performs a full diagnosis and recommends a course of treatment. It’s then up to the patient whether or not they take their MD’s advice…

AlanDDuncan: “Doctor, Doctor. I get very short of breath when I go upstairs.”
MD: “Yes, well. Your Body Mass Index is over 30, you’ve got consistently high blood pressure, your heartbeat is arrhythmic, and your cholesterol levels are off the scale.”
ADD: “So what does that mean, doctor?”
MD: “It means you’re fat, you drink like a fish, you smoke like a chimney, your diet consists of fried food and cakes and you don’t do any exercise.”
ADD: “I’m Scottish.”
MD: “You need to change your lifestyle completely, or you’re going to die.”
ADD: “Oh. So, can you give me some pills?….”

If you’re going to get healthy with your data, you’re going to have to put the pies down, step away from the Martinis and get off the couch, folks.

Category: Business Intelligence, Data Quality, Information Development, Information Governance, Information Management, Information Strategy, Information Value, Master Data Management, Metadata

by: Bsomich
01  Mar  2014

Community Update

Missed what happened in the MIKE2.0 community? Check out our bi-weekly update:

 

 

Business Drivers for Better Metadata Management

There are a number of Business Drivers for Better Metadata Management that have caused metadata management to grow in importance over the past few years at most major organisations. These organisations are focused on more than just a data dictionary across their information; they are building comprehensive solutions for managing business and technical metadata.

Our wiki article on the subject explores many factors contributing to the growth of metadata and offers guidance on managing it better.

Feel free to check it out when you have a moment.

Sincerely,

MIKE2.0 Community


 

This Week’s Blogs for Thought:

Big Data Strategy: Tag, Cleanse, Analyze

Variety is the characteristic of big data that holds the most potential for exploitation, Edd Dumbill explained in his Forbes article Big Data Variety means that Metadata Matters. “The notion of variety in data encompasses the idea of using multiple sources of data to help understand a problem. Even the smallest business has multiple data sources they can benefit from combining. Straightforward access to a broad variety of data is a key part of a platform for driving innovation and efficiency.”

Read more.

The Race to the IoT is a Marathon

The PC era is arguably over and the age of ubiquitous computing might finally be here.  Its first incarnation has been mobility through smartphones and tablets.  Many pundits, though, are looking to wearable devices and the so-called “internet of things” as the underlying trends of the coming decade. It is tempting to talk about the internet of things as simply another wave of computing like the mainframe, mid-range and personal computer.  However, there are as many differences as there are similarities.
Read more.

The Data of Damocles

While the era of Big Data invokes concerns about privacy and surveillance, we still tender our privacy as currency for Internet/mobile-based services as the geo-location tags, date-time stamps, and other information associated with our phone calls, text messages, emails, and social networking status updates become the bits and bytes of digital bread crumbs we scatter along our daily paths as our self-surveillance avails companies and governments with the data needed to track us, target us with personalized advertising, and terrorize us with the thought of always being watched.

Read more.

 

Category: Information Development

by: Ocdqblog
25  Feb  2014

Big Data Strategy: Tag, Cleanse, Analyze

Variety is the characteristic of big data that holds the most potential for exploitation, Edd Dumbill explained in his Forbes article Big Data Variety means that Metadata Matters. “The notion of variety in data encompasses the idea of using multiple sources of data to help understand a problem. Even the smallest business has multiple data sources they can benefit from combining. Straightforward access to a broad variety of data is a key part of a platform for driving innovation and efficiency.”

But the ability to take advantage of variety, Dumbill explained, is hampered by the fact that most “data systems are geared up to expect clean, tabular data of the sort that flows into relational database systems and data warehouses. Handling diverse and messy data requires a lot of cleanup and preparation. Four years into the era of data scientists, most practitioners report that their primary occupation is still obtaining and cleaning data sets. This forms 80% of the work required before the much-publicized investigational skill of the data scientist can be put to use.”

Which begs the question Mary Shacklett asked with her TechRepublic article Data quality: The ugly duckling of big data? “While it seems straightforward to just pull data from source systems,” Shacklett explained, “when all of this multifarious data is amalgamated into vast numbers of records needed for analytics, this is where the dirt really shows.” But somewhat paradoxically, “cleaning data can be hard to justify for ROI, because you have yet to see what clean data is going to deliver to your analytics and what the analytics will deliver to your business.”

However, Dumbill explained, “to focus on the problems of cleaning data is to ignore the primary problem. A chief obstacle for many business and research endeavors is simply locating, identifying, and understanding data sources in the first place, either internal or external to an organization.”

This is where metadata comes into play, providing much-needed context for interpreting data and helping avoid semantic inconsistencies that can stymie our understanding of data. While good metadata has always been a necessity, big data needs even better metadata. “The documentation and description of datasets with metadata,” Dumbill explained, “enhances the discoverability and usability of data both for current and future applications, as well as forming a platform for the vital function of tracking data provenance.”

“The practices and tools of big data and data science do not stand alone in the data ecosystem,” Dumbill concluded. “The output of one step of data processing necessarily becomes the input of the next.” When approaching big data, the focus on analytics, as well as concerns about data quality, not only causes confusion about the order of those steps, but also overlooks the important role that metadata plays in the data ecosystem.

By enhancing the discoverability of data, metadata essentially replaces hide-and-seek with tag. As we prepare for a particular analysis, metadata enables us to locate and tag the data most likely to prove useful. After we tag which data we need, we can then cleanse that data to remove any intolerable defects before we begin our analysis. These three steps—tag, cleanse, analyze—form the basic framework of a big data strategy.
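As a rough illustration of the tag, cleanse, analyze sequence, here is a minimal Python sketch. Everything in it is hypothetical and invented for the example: the `Dataset` class, the catalog entries, the tag names, and the `amount` field do not come from Dumbill's article or from any particular tool. It only shows the order of the steps: metadata selects the data first, cleansing is applied only to what was selected, and analysis runs last.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Dataset:
    """A hypothetical data source described by metadata tags."""
    name: str
    tags: set
    records: list = field(default_factory=list)


def tag(catalog, wanted):
    """Step 1: use metadata to locate the datasets relevant to the analysis."""
    return [ds for ds in catalog if wanted & ds.tags]


def cleanse(records, required):
    """Step 2: drop records with an intolerable defect (here, a missing field)."""
    return [r for r in records if r.get(required) is not None]


def analyze(records, value_field):
    """Step 3: run the analysis (here, a simple average)."""
    return mean(r[value_field] for r in records)


# Illustrative catalog; names, tags, and values are made up.
catalog = [
    Dataset("web_orders", {"sales", "online"}, [{"amount": 20.0}, {"amount": None}]),
    Dataset("store_orders", {"sales", "retail"}, [{"amount": 40.0}]),
    Dataset("hr_roster", {"employees"}, [{"amount": 999.0}]),
]

selected = tag(catalog, {"sales"})
clean = [r for ds in selected for r in cleanse(ds.records, "amount")]
average_sale = analyze(clean, "amount")
print(average_sale)
```

Because tagging happens first, the untagged `hr_roster` data is never cleansed or analyzed, so the cleanup effort goes only to the data the analysis actually needs.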

It all begins with metadata management. As Dumbill said, “it’s not glamorous, but it’s powerful.”

Category: Data Quality, Information Development
No Comments »

by: Bsomich
15  Feb  2014

MIKE2.0 Community Update.

Missed what happened in the MIKE2.0 community this week? Read our community update below:

 

Data Governance: How competent is your organization?

One of the key concepts of the MIKE2.0 Methodology is that of an Organisational Model for Information Development. This is an organisation that provides a dedicated competency for improving how information is accessed, shared, stored and integrated across the environment.

Organisational models need to be adapted as the organisation moves up the 5 Maturity Levels for organisations in relation to the Information Development competencies below:

Level 1 Data Governance Organisation – Aware
  • An Aware Data Governance Organisation knows that the organisation has issues around Data Governance but is doing little to respond to these issues. Awareness has typically come as the result of major Data Governance-related issues that have occurred. An organisation may also be at the Aware state if it is in the process of moving to a state where it can effectively address issues, but is only in the early stages of the programme.
Level 2 Data Governance Organisation – Reactive
  • A Reactive Data Governance Organisation is able to address some of its issues, but not until some time after they have occurred. The organisation is not able to address root causes or predict when issues are likely to occur. “Heroes” are often needed to address complex data quality issues, and the impact of fixes made on a system-by-system level is often poorly understood.
Level 3 Data Governance Organisation – Proactive
  • A Proactive Data Governance Organisation can stop issues before they occur because it is empowered to address root cause problems. At this level, the organisation also conducts ongoing monitoring of data quality so that issues that do occur can be resolved quickly.
Level 4 Data Governance Organisation – Managed
Level 5 Data Governance Organisation – Optimal

The MIKE2.0 Solution for the Centre of Excellence provides an overall approach to improving Data Governance through a Centre of Excellence delivery model for Infrastructure Development and Information Development. We recommend this approach as the most efficient and effective model for building this common set of capabilities across the enterprise environment.

Feel free to check it out when you have a moment and offer any suggestions you may have to improve it.

Sincerely,

MIKE2.0 Community


This Week’s Blogs for Thought:

Share the Love… of Data Quality!

A recent news article on Information-Management.com suggested a link between inaccurate data and “lack of a centralized approach.” But I’m not sure that “lack of centralization” is the underlying issue here; I’d suggest the challenge is generally more down to “lack of a structured approach”, and as I covered in my blog post “To Centralise or not to Centralise, that is the Question”, there are organizational cultures that don’t respond well (or won’t work at all) to a centralized approach to data governance.

Read more.

Avoid Daft Definitions for Sound Semantics

A few weeks ago, while reading about the winners at the 56th Annual Grammy Awards, I saw that Daft Punk won both Record of the Year and Album of the Year, which made me wonder what the difference is between a record and an album. Then I read that Record of the Year is awarded to the performer and the production team of a single song. While Daft Punk won Record of the Year for their song “Get Lucky”, the song was not lucky enough to win Song of the Year (that award went to Lorde for her song “Royals”). My confusion about the semantics of the Grammy Awards prompted a quick trip to Wikipedia, where I learned that Record of the Year is awarded for either a single or individual track from an album.

Read more.

Social Data: Asset and Liability

In late December of 2013, Google Chairman Eric Schmidt admitted that ignoring social networking had been a big mistake. “I guess, in our defense, we were busy working on many other things, but we should have been in that area and I take responsibility for that,” he said.

Brass tacks: Google’s misstep and Facebook’s opportunistic land grab of social media have resulted in a striking data chasm between the two behemoths. As a result, Facebook can do something that Google just can’t.

Read more.


Questions? Please email us at mike2@openmethodology.org.

Category: Information Development
No Comments »
