
Completion of the Data Quality Assessment Report Deliverable Template

This article is a stub. It is currently undergoing major changes as it is in the very early stages of development and is only a placeholder. Please help improve MIKE2.0 by adding to this article.
This deliverable template is used to describe a sample of the MIKE2.0 Methodology (typically at a task level). More templates are now being added to MIKE2.0 as this has been a frequently requested aspect of the methodology. Contributors are strongly encouraged to assist in this effort.
Deliverable templates are illustrative as opposed to fully representative. Please help add examples to this template that are representative of the proposed output.

The Completion of the Data Quality Assessment Report step completes and issues for signoff the Data Quality Assessment Report. Sections of the Data Quality Assessment Report and the metadata repository should be populated throughout the End-to-End profiling exercise. This step completes the remaining sections and makes a final recommendation on whether the data should be loaded into the target system. On completion, there should be a formal walkthrough, review and final signoff.


Examples

Listed below is an example of a Data Quality Assessment Completion Report:

Example of a Data Quality Assessment Completion Report


Introduction

Overview


< Include a short paragraph about the Information Management Programme, Client Objectives and any other general information.


>


< Also include a history of the project. Who is the client, why is the project being done, and what is the overall deliverable? What stage is the project at when this document is being written? What will be done with it after this document is complete?


>

Purpose of Document


< What is the purpose of this document? Can you use an action word: defines, documents, outlines? Or is this a response to a request, in that it merely reports? Will this document be updated as the project timeframe continues, or will appendices be added? Some of this information will influence the scope.


>


< This is not the purpose of the project itself - that is in section two. Rather, what is this specific document's purpose?


>

Inputs to this Document

Inputs to this deliverable include the Business and Technical Blueprint, HL Information Requirements, Roadmap Overview and the functional scope for this increment.

Output of this Document

< The output of this document should provide the business with a summary of key data quality issues and, in some cases, identified business rules.

>

MIKE2.0 Methodology


< This template relates to Activity 3.11 of the MIKE2.0 Methodology.

>

In Scope


< Define the scope of this deliverable


>

Out of Scope

< Outline the items that are out of scope for this deliverable

>

Approach

< This section should narrow in focus. Start with an overall, high-level summary of the project, and finish with the specific work of producing the data quality summary. That way the Assumptions section can address the data quality work directly.

>

The objectives of the data quality investigation were to:

< Identify the state of mandatory data fields.

>

< Identify the state of additional data fields.

>

< Reference the requirements of the project, if they exist.

>
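For contributors who want a concrete starting point, the sketch below shows one way the state of mandatory fields could be measured in Python with pandas. It is illustrative only: the field names are hypothetical and should be replaced with the mandatory fields defined for the increment.

  import pandas as pd

  # Hypothetical mandatory fields -- substitute those defined for the increment.
  MANDATORY_FIELDS = ["client_number", "policy_id", "activity_date"]

  def mandatory_field_state(df: pd.DataFrame) -> pd.DataFrame:
      """Summarise how completely each mandatory field is populated."""
      rows = []
      for field in MANDATORY_FIELDS:
          populated = df[field].notna() & (df[field].astype(str).str.strip() != "")
          rows.append({
              "field": field,
              "records": len(df),
              "populated_pct": round(100.0 * populated.mean(), 2),
              "null_or_blank_pct": round(100.0 * (1.0 - populated.mean()), 2),
          })
      return pd.DataFrame(rows)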

Investigations run against the data included:

< Character Discrete – individual field profiling.

>

< Character Concatenate – group / cross field profiling.

>

< Word Investigation – data rule profiling.

>

< Were any business rules tested?

>
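By way of illustration, the following sketch shows the character discrete style of investigation in pandas: each value is reduced to a character-class pattern and pattern frequencies are counted. The 9/A pattern alphabet mirrors common profiling-tool conventions and is an assumption, not something this template prescribes.

  import re
  import pandas as pd

  def char_pattern(value) -> str:
      """Reduce a value to a character-class pattern, e.g. 'AB-1234' -> 'AA-9999'."""
      pattern = re.sub(r"[0-9]", "9", str(value))
      return re.sub(r"[A-Za-z]", "A", pattern)

  def profile_field(series: pd.Series) -> pd.Series:
      """Character discrete analysis: pattern frequencies for a single field."""
      return series.dropna().map(char_pattern).value_counts()

  # Character concatenate (cross-field) profiling can reuse the same idea on a
  # concatenation of fields, e.g. profile_field(df["postcode"] + "|" + df["state"]).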

Assumptions

< Manage expectations. Define what access you have been given to the systems. Is there any natural evolution of the systems that could change the data while, or after, the analysis is conducted?

>

< Which systems are being investigated?

>

< Are there any common systems that are not being investigated?

>

Audience


< Describe the intended audience for this document. Was this document produced for management, analysts or technical leads?


>

Data Quality Findings

This section presents the results of the investigations performed.

< Old rule of writing: tell the reader what you are going to tell them; tell them; and then tell them again. So, before going into the detail of the individual tests you did, use the first paragraph(s) to define what the overall result was. If there are a lot of data sources or sections to cover, give each one a brief section. The example below shows how this can be accomplished:

>

Example

The frequency of data quality exceptions in Source A was low, but weighted towards highly significant issues. Some critical data quality issues were found, including "dummy" client numbers being used for quoting purposes and incomplete ABN details.

Source B contained several critical data quality issues. Examples are incomplete contact details for child policy owners and phone numbers in incorrect formats. The significance of these issues makes Source B a strong candidate for initial data cleansing actions.

Investigation of Business Rules showed that the business would be unable to apply the majority of these until data quality issues in source tables are resolved. There is a heavy reliance on Activity Date within Business Rules. The previous data quality analysis showed that this information is a critical or high priority issue in data sources.

< This lets people who are skimming the document get the critical information.

>

Key investigation findings have been plotted on a Significance Vs. Percentage of Exceptions graph. The graph is based on the level of importance of the test towards achieving the required data quality standard, and the percentage of exceptions raised for the test.

< This sort of graph is excellent for communicating visually with the reader, especially audience members who do not benefit from detailed text. The key to communicating via this graph is to clearly define what makes a data quality issue significant. The frequency of an error is self-explanatory, but significance is what moves an issue from left to right in the graph below. A table defining significance is included in the Appendices, but consider promoting it to this section. Why wouldn't you? Only if there is already too much content in this summary.

>

The quadrants of the graph are defined as:

  • Critical DQ Issues – data quality issues that are highly significant and caused a high number of exceptions during testing. These exceptions need to be resolved as the highest priority.
  • Prevalent but not Significant – data quality issues that occur frequently but are a low priority for achieving rapid data quality improvement.
  • Low Priority – issues that would improve the overall data quality, but are low in both impact and frequency.
  • High Priority – highly significant issues with a low number of exceptions; resolving them can quickly improve the overall data quality.
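As an aid to contributors, a minimal matplotlib sketch of such a graph follows. The quadrant midpoints used here (a significance of 3 on the 0-6 scale defined in the Appendices, and 50% exceptions) are assumptions to be tuned to the project's own standard.

  import matplotlib.pyplot as plt

  def plot_dq_quadrants(findings):
      """findings: iterable of (label, significance 0-6, pct_exceptions 0-100)."""
      fig, ax = plt.subplots()
      for label, significance, pct in findings:
          ax.scatter(significance, pct, color="tab:blue")
          ax.annotate(label, (significance, pct))
      # Assumed quadrant boundaries -- adjust to the project's standard.
      ax.axvline(3, color="grey", linestyle="--")
      ax.axhline(50, color="grey", linestyle="--")
      ax.set_xlim(0, 6)
      ax.set_ylim(0, 100)
      ax.set_xlabel("Significance")
      ax.set_ylabel("Percentage of Exceptions")
      for x, y, name in [(0.2, 95, "Prevalent but not Significant"),
                         (3.2, 95, "Critical DQ Issues"),
                         (0.2, 5, "Low Priority"),
                         (3.2, 5, "High Priority")]:
          ax.text(x, y, name, fontsize=8, color="grey")
      plt.show()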

< Now the different sections of investigation are covered. This template assumes that there are multiple disparate data sources, such as separate databases that contain tables and/or views. Accordingly, the reporting of data quality issues, especially character discrete analysis, is covered in a separate section for each table.

>

< However, when data from multiple sources or tables is combined, it is commonly done so via business rules. A section at the end exists for this; depending on the project specifics, it may be prudent to bring this information to the fore of the data analysis findings.

>

Data Source A

Investigation of Data Source A was carried out on the following groups:


< Table, column details.


>


< Subset of data, selected from greater population?


>


< Any specific considerations applied to the data. Was it filtered to only include certain records? Was it at a point in time, and as such may have changed?


>

The following graph plots Significance Vs. Percentage of Exceptions for Data Source A:

[Graph: Significance Vs. Percentage of Exceptions for Data Source A]

Data quality issues that fall within the Critical and High Priority quadrants should be resolved first, including:

The following tables detail relevant statistics for each group of Data Source A.

Investigation A.1

Data Source:
No. Records:

Investigation Field | Valid Data (%) | Incorrect (%) | Null (%) | Comments


< Examples.


>

 


< Examples.


>

   


< Incorrect and Null data may be plotted separately on the graph.


>

Additional comments for Investigation A.1:
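As background for producing the Valid, Incorrect and Null percentages recorded above, here is one possible sketch (not mandated by the methodology): apply a validity predicate per field and tally the three outcomes. The phone number rule in the usage comment is hypothetical.

  import pandas as pd

  def field_statistics(series: pd.Series, is_valid) -> dict:
      """Split a field into valid / incorrect / null percentages for the report table."""
      total = len(series)
      null_mask = series.isna() | (series.astype(str).str.strip() == "")
      non_null = series[~null_mask]
      valid_count = int(non_null.map(is_valid).sum())
      return {
          "valid_pct": 100.0 * valid_count / total,
          "incorrect_pct": 100.0 * (len(non_null) - valid_count) / total,
          "null_pct": 100.0 * null_mask.sum() / total,
      }

  # Hypothetical usage: a ten-digit numeric phone number rule.
  # field_statistics(df["phone_number"], lambda v: str(v).isdigit() and len(str(v)) == 10)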

Data Source B

Investigation of Data Source B was carried out on the following groups:


< Table, column details.


>


< Subset of data, selected from greater population?


>


< Any specific considerations applied to the data. Was it filtered to only include certain records? Was it at a point in time, and as such may have changed?


>

The following graph plots Significance Vs. Percentage of Exceptions for Data Source B:

[Graph: Significance Vs. Percentage of Exceptions for Data Source B]

Data quality issues that fall within the Critical and High Priority quadrants should be resolved first, including:

The following tables detail relevant statistics for each group of Data Source B.


< Consider presenting these tables in landscape view.


>

Investigation B.1

Data Source:
No. Records:

Investigation Field | Valid Data (%) | Incorrect (%) | Null (%) | Comments


< Examples.


>

 


< Examples.


>

   


< Incorrect and Null data may be plotted separately on the graph.


>

Additional comments for Investigation B.1:

Business Rule Analysis

Business Rule.1

< First define the business rule in plain English. Not pseudo-code or SQL; plain English. What needs to be brought together from the business side to deliver what result? What useful new information does it deliver to the business? Moreover, how is it used?

>

< Then, if appropriate, present the literal definition of the business rule. This is important if a mathematical equation or statistical result is being presented.

>

Investigations were performed to test the adherence of records to these business rules. The following observations are noted:

< Patterns in the data.

>

< Common problems in completing the business rule.

>

< How many records pass or fail the business rule.

>

< Or, what percentages does the business rule divide records into?

>

< Tables may be useful to communicate some of the above information.

>

< In conclusion, make a recommendation as to the success of the business rule. Can it be implemented now, or does more work need to be completed? If any reference is made to data quality issues in sources, double-check that you have mentioned these in the appropriate section. Use hyperlinks to help readers navigate the electronic copy of this document.

>
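To make the above concrete, here is a hedged sketch of business-rule adherence testing in pandas. The rule itself (a policy owner must be at least 18 at the policy start date) is invented for illustration; real rules come from the plain-English definitions above, and untestable records are exactly the upstream data quality issues the findings sections should already describe.

  import pandas as pd

  def test_owner_age_rule(df: pd.DataFrame) -> dict:
      """Illustrative business rule: policy owner must be 18+ at policy start.

      Assumes both columns are already parsed as datetime64.
      """
      age_years = (df["policy_start_date"] - df["owner_date_of_birth"]).dt.days / 365.25
      testable = age_years.notna()          # both dates populated and parseable
      passes = testable & (age_years >= 18)
      return {
          "records": len(df),
          "untestable_pct": 100.0 * (~testable).mean(),  # blocked by source DQ issues
          "pass_pct": 100.0 * passes.mean(),
          "fail_pct": 100.0 * (testable & ~passes).mean(),
      }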

Appendices

Investigation Results Spreadsheets

< Insert link to file(s) of results.

>

Definitions of Data Quality Importance/Significance

Significance | Rating | Definition | Expected Actions
Notation | 0 | Data quality errors that affect source information that is inactive or scheduled for discontinuation. | None. Analysis of such a DQ issue is performed to explain a recurring issue that could appear significant on first impressions.
Low | 1 | Mistakes, incorrect formatting, etc. in additional data fields. | Suitable opportunity for further education and training of users.
Low/Medium | 2 | Highly visible data quality issues that affect the look and feel of data and its presentation. | Often high-volume errors that necessitate data cleansing activities.
Medium | 3 | Data quality issues that affect core pieces of information but are not used in critical activities. | Consider the likely prevalence of such an error when assigning significance, as this may warrant a higher ranking.
Medium/High | 4 | Data quality lapses that affect data used to support the main decision-making processes or business activities. | Assess for mitigating strategies such as automated transformations and cleansing.
High | 5 | Inaccuracies that weaken conclusions being drawn from data or prevent communication. | Depending on occurrence, may be suitable for action before critical errors.
Very High | 6 | Data failings that prevent critical decisions from being made. | Prioritised based on prevalence and targeted for immediate correction.
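Since the quadrant graph combines this rating scale with exception frequency, a small lookup like the sketch below can classify each finding consistently. The 3 and 50% midpoints are the same assumptions used in the graph sketch earlier.

  # Ratings taken from the table above.
  SIGNIFICANCE = {
      "Notation": 0, "Low": 1, "Low/Medium": 2, "Medium": 3,
      "Medium/High": 4, "High": 5, "Very High": 6,
  }

  def quadrant(rating: str, pct_exceptions: float) -> str:
      """Map a finding to a quadrant using assumed midpoints (3 and 50%)."""
      significant = SIGNIFICANCE[rating] >= 3
      frequent = pct_exceptions >= 50
      if significant:
          return "Critical DQ Issues" if frequent else "High Priority"
      return "Prevalent but not Significant" if frequent else "Low Priority"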

Data Matrix per Data Source Investigation

< This is the data used to construct the graphs featured in the report.

>

Data Source A

Investigation | Field | Issue | Importance/Significance | Occurrence

Data Source B

Investigation | Field | Issue | Importance/Significance | Occurrence