Archive for the ‘Information Strategy’ Category
Information overload is as much an overwhelming feeling as it is a measurable reality. We often feel an impossible obligation to be across everything, which leaves us wanting to give up and absorb nothing that hits our various screens. Despite all this, the good news is that the majority of the information we need seems to appear just in time.
Where does that leave those of us who are control freaks? I am not comfortable knowing that the right information will find me the majority of the time. I want to know that the information I need is guaranteed to find me every time!
The trouble is, guarantees are expensive. This is related to the debate between search-based big data solutions and enterprise data warehouses. Google provides a “near enough” search solution that, given the massive amount of data it trawls through, usually seems to find what we need. Knowledge and business intelligence solutions provide the predictable information flows but come at a huge cost.
Of course, the real sense of serendipity comes when information arrives unsought just when we need it. It can come through the right article being highlighted in a social media feed, a corporate policy being forwarded or the right coffee conversation with a colleague. Of course, serendipity isn’t random coincidence and there is much we can do to improve the odds of it happening when we need it most.
Before doing so, it is important to know what things have to be predictable and reliable. A list is likely to include financial reports, approvals and other controls. What’s more, a scan of any email inbox is likely to show a significant number of messages that need to be read and often actioned. Despite its tyranny on our working lives, email works too well!
Serendipity depends on the quality of our networks, both in terms of who we know and the amount of activity that passes between the nodes. A good way to understand the power of relationships in an information or social network is through the theory of “small worlds” (see chapter 5 of my book Information-Driven Business).
Ironically, in an era when people talk about electronic isolation, social networks, that is who we know, are more important than ever. Serendipity relies on people who we know, at least vaguely, promoting content in a way that we are likely to see.
Just as control freaks worry about relying on serendipity, those that are more relaxed run the risk of relying too much on information finding its way mysteriously to them at the right time. Those that don’t understand why it works, won’t understand when it won’t work.
Far from making experts and consultants redundant, this increasing trend towards having the right information available when it’s needed is making them more necessary than ever before. The skill experts bring is more than information synthesis, something that artificial intelligence is increasingly good at doing and will become even better at in the near future. The job of experts is to find connections that don’t exist on paper, the cognitive leaps that artificial intelligence can’t achieve (see Your insight might just save your job).
The first thing is to be active in posting updates. Networks operate through quid pro quo: in the long term we get back as much as we give. In the office, we call this gossip. Too much gossip and it just becomes noise but the right amount and you have an effective social network. Those people who only ever silently absorb information from their colleagues quickly become irrelevant to their social circle and gradually get excluded.
The second is to be constantly curious, like a bowerbird searching and collecting shiny pieces of information, without necessarily knowing how they will all fit together. The great thing about our modern systems is that massive amounts of tagged content are easy to search in weeks, months and years to come.
Finally, have some sort of framework or process for handling information exchange and picking a channel based on: criticality (in which case email is still likely to be the best medium), urgency (which favours various forms of messaging for brief exchanges), targeted broadcast (which favours posts explicitly highlighted/copied to individuals) or general information exchange (which favours general posts with curated social networks). Today, this is very much up to each individual to develop for themselves, but we can expect it to be part of the curriculum of future generations of children.
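As an illustration only, the channel framework above can be written down as a handful of rules. This is a minimal sketch: the Message fields, the channel names and the order in which the rules are checked (criticality first, then urgency, then targeting) are assumptions made for the example, not something the framework prescribes.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical description of something we want to share."""
    critical: bool = False   # must be read and possibly actioned
    urgent: bool = False     # needs a quick, brief exchange
    recipients: tuple = ()   # named individuals, if any


def pick_channel(msg: Message) -> str:
    """Return a channel name; the precedence of the rules is an assumption."""
    if msg.critical:
        return "email"            # criticality: email is still the likely best medium
    if msg.urgent:
        return "messaging"        # urgency: brief exchanges
    if msg.recipients:
        return "targeted post"    # post explicitly highlighted/copied to individuals
    return "general post"         # general exchange via curated social networks


print(pick_channel(Message(critical=True, urgent=True)))
print(pick_channel(Message(recipients=("a colleague",))))
```

Writing the rules down, even this crudely, forces a decision about which consideration wins when a message is both critical and urgent.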
No matter how often it seems to happen, almost by magic, information serendipity is no accident and shouldn’t be left to chance.
Stop reading now if your organisation is easier to navigate today than it was 3, 5 or 10 years ago. The reality that most of us face is that the general ledger that might have cost $100,000 to implement twenty or so years ago will now cost $1 million or even $10 million. Just as importantly, it is getting harder to implement new products, services or systems.
The cause of this unsustainable business malaise is the complexity of the technology we have chosen to implement.
For the general ledger it is the myriad of interfaces. For financial services products it is the number of systems that need to keep a record of every aspect of business activity. For telecommunications it is the bringing together of the OSS and BSS layers of the enterprise. Every function and industry has its own good reasons for the added complexity.
However good the reasons, the result is that it is generally easier to innovate in a small nimble enterprise, even a start-up, than in the big corporates that are the powerhouse of our economies.
While so much of the technology platform creates efficiencies, often enormous and essential to the productivity of the enterprise, it generally doesn’t support or even permit rapid change. It is really hard to design the capacity to change into the systems that support the organisation. The more complex an environment becomes the harder it is to implement change.
Most organisations recognise the impact of complexity and try to reduce it by implementing an enterprise architecture in one form or another. Supporting the architecture is a set of principles which, if implemented in full, will support consistency and dramatically reduce the cost of change. Despite the best will in the world, few businesses or governments succeed in realising their lofty architectural principles.
The reason is that, while architecture is seen as the solution, it is too hard to implement. Most IT organisations run their business through a book of projects. Each project signs up to an architecture but quickly implements compromises as challenges arise.
It’s no wonder that architects are perhaps the most frustrated of IT professionals. At the start of each project they get wide commitment to the principles they espouse. As deadlines loom, and the scope evolves, project teams make compromises. While each compromise may appear justified, together they have the cumulative effect of making the organisation more rather than less complex.
Complexity has a cost. If this cost is fully appreciated, the smart organisation can see the value in investing in simplification.
While architects have a clear vision of what “simple” looks like, they often have a hard time putting a measure against it. It is this lack of a measure that makes the economics of technology complexity hard to manage.
Increasingly though, technologists are realising that it is in the fragmentation of data across the enterprise that real complexity lies. Even when there are many interacting components, if there is a simple relationship between core information concepts then the architecture is generally simple to manage.
Simplicity can be achieved through decommissioning (see Value of decommissioning legacy systems) or by reducing the duplication of data. This can be measured using the Small Worlds measure as described in MIKE2.0 or chapter 5 of my book Information-Driven Business. The idea is further extended as “Hillard’s Graph Complexity” in Michael Blaha’s book, UML Database Modelling Workbook.
In summary, the measure looks at how many steps are required to bring together key concepts such as customer, product and staff. The more fragmented information is, the more difficult any business change or product implementation becomes.
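To make the idea of “steps between key concepts” concrete, here is a minimal sketch that treats data entities as nodes in a graph and counts the fewest joins needed to get from one key concept to another. It is a simplified stand-in for illustration, not the formal Small Worlds measure as defined in MIKE2.0; the entity names and both example models are invented.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (entity, entity) relationships."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def steps_between(graph, start, goal):
    """Breadth-first search for the fewest relationships linking two entities."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    raise ValueError(f"{start} and {goal} are not connected")


def average_separation(graph, key_concepts):
    """Average number of steps over every pair of key concepts."""
    pairs = list(combinations(key_concepts, 2))
    return sum(steps_between(graph, a, b) for a, b in pairs) / len(pairs)


keys = ["customer", "product", "staff"]

# Hypothetical simple model: everything meets at the order.
simple = build_graph([
    ("customer", "order"), ("order", "product"), ("order", "staff"),
])

# Hypothetical fragmented model: the same concepts, many more hops.
fragmented = build_graph([
    ("customer", "account"), ("account", "contract"), ("contract", "order"),
    ("order", "line_item"), ("line_item", "product"),
    ("order", "branch"), ("branch", "staff"),
])

print(average_separation(simple, keys))
print(average_separation(fragmented, keys))
```

The fragmented model scores higher because each pair of key concepts sits further apart, which is the property the measure is meant to expose.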
Consider the general ledger discussed earlier. In its first implementation in the twentieth century, each key concept associated with the chart of accounts would have been managed in a master list whereas by the time we implement the same functionality today there would be literally hundreds if not thousands of points where various parts of the chart of accounts are required to index interfaces to subsidiary systems across the enterprise.
One approach to realising these benefits is to have dedicated simplification projects. Unfortunately these are the first projects that get cut if short-term savings are needed.
Alternatively, imagine if every project that adds complexity (a little like adding pollution) needed to offset that complexity with equal and opposite “simplicity credits”. Having quantified complexity, architects are well placed to define whether each new project simplifies the enterprise or adds complexity.
Some projects simply have no choice but to add complexity. For example, a new marketing campaign system might have to add customer attributes. However, if they increase the complexity they should buy simplicity “offsets” a little like carbon credits.
The implementation of a new general ledger might provide a great opportunity to reduce complexity by bringing various interfaces together or it could add to it by increasing the sophistication of the chart of the accounts.
In some cases, a project may start off simplifying the enterprise by using enterprise workflow or leveraging a third-party cloud solution, but in the heat of implementation be forced to make compromises that make it a net complexity “polluter”.
The CIO has a role to act as the steward of the enterprise and measure this complexity. Project managers should not be allowed to forget their responsibility to leave the organisation cleaner and leaner at the conclusion of their project. They should include the cost of this in their project budget and purchase offsetting credits from others if they cannot deliver within the original scope due to complicating factors.
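The bookkeeping behind such a scheme could be quite small. The sketch below assumes each project reports a complexity score for the enterprise before and after it delivers; the scores, project names and the one-credit-per-point rule are all hypothetical.

```python
def portfolio_position(projects):
    """projects maps a project name to (complexity_before, complexity_after).

    Returns the change per project, the credits that polluting projects
    must buy, and the credits that simplifying projects have to sell.
    """
    deltas = {name: after - before for name, (before, after) in projects.items()}
    required = sum(d for d in deltas.values() if d > 0)
    available = sum(-d for d in deltas.values() if d < 0)
    return deltas, required, available


# Hypothetical portfolio, scored in arbitrary complexity points.
portfolio = {
    "marketing campaign system": (120, 126),   # adds customer attributes
    "general ledger replacement": (126, 118),  # consolidates interfaces
    "legacy decommissioning": (118, 115),
}

deltas, required, available = portfolio_position(portfolio)
print(deltas)
print("credits required:", required, "credits available:", available)
```

In this made-up portfolio the campaign system needs six credits and the other two projects generate eleven, so the enterprise ends the period simpler than it started.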
Those that are most impacted by complexity can pick their priority areas for funding. Early wins will likely reduce support costs and errors in customer service. Far from languishing in the backblocks of the portfolio, project managers will be queueing up to rid the organisation of many of these long-term annoyances to get the cheapest simplicity credits that they can find!
Anthropologist Robin Dunbar has used his research in primates over recent decades to argue that there is a cognitive limit to the number of social relationships that an individual can maintain and hence a natural limit to the breadth of their social group. In humans, he has proposed that this number is 150, the so-called “Dunbar’s number”.
In the modern organisation, relationships are maintained using data. It doesn’t matter whether it is the relationship between staff and their customers, tracking vendor contracts, the allocation of products to sales teams or any other of the literally thousands of relationships that exist, they are all recorded centrally and tracked through the data that they throw off.
Social structures have evolved over thousands of years using data to deal with the inability of groups of more than 150 to effectively align. One of the best examples of this is the 11th century Domesday Book ordered by William the Conqueror. Fast forward to the 21st century and technology has allowed the alignment of businesses and even whole societies in ways that were unimaginable 50 years ago.
Just as a leadership team needs to have a group of people that they relate to that falls within the 150 of Dunbar’s number, they also need to rely on information which allows the management system to extend that span of control. For the average executive, and ultimately for the average executive leadership team, this means that they can really only keep a handle on 150 “aspects” of their business, reflected in 150 “key data elements”. These elements anchor data sets that define the organisation.
Key Data Elements
To overcome the constraints of Dunbar’s number, mid-twentieth century conglomerates relied on a hierarchy with delegated management decisions whereas most companies today have heavily centralised decision making which (mostly) delivers a substantial gain in productivity and more efficient allocation of capital. They can only do this because of the ability to share information efficiently through the introduction of information technology across all layers of the enterprise.
This sharing, though, is dependent on the ability of an executive to remember what data is important. The same constraint of the human brain to know more than 150 people also applies to the use of that information. It is reasonable to argue that the information flows have the same constraint as social relationships.
Across hundreds of organisations observed over many years, the variety of key data elements is wide but their number is consistently in the range of one hundred to a few hundred. The count perhaps tops out at 500, but the majority of well-run organisations have nearer to 150 elements dimensioning their most important data sets.
While decisions are made through metrics, it is the most important key data elements that make up the measures and allow them to be dimensioned.
Although organisations have literally hundreds of thousands of different data elements they record, only a very small number are central to the running of the enterprise. Arguably, the centre can only keep track of about 150 and use them as a core of managing the business.
Another way of looking at this is that the leadership team (or even the CEO) can really only have 150 close relationships. If each relationship has one assigned data set or key data element they are responsible for then the overall organisation will have 150.
Choosing the right 150
While most organisations have around 150 key data elements that anchor their most important information, few actually know what they are. That’s a pity because the choice of 150 tells you a lot about the organisation. If the 150 don’t encompass the breadth of the enterprise then you can gain insight into what’s really important to the management team. If there is little to differentiate the key data elements from those that a competitor might choose then the company may lack a clear point of difference and be overly dependent on operational excellence or cost to gain an advantage.
Any information management initiative should start by identifying the 150 most important elements. If they can’t narrow the set down below a few hundred, they should be suspicious they haven’t gotten to the core of what’s really important to their sponsors. They should then look to ask the question of whether these key data elements span the enterprise or pick organisational favourites; whether they offer differentiation or are “me too” and whether they are easy or hard for a competitor to emulate.
The identification of the 150 key data elements provides a powerful foundation for any information and business strategy, enabling a discussion on how the organisation is led and managed. While processes evolve quickly, the information flows persist. Understanding the 150 allows a strategist to determine whether the business is living up to its strategy or if its strategy needs to be adjusted to reflect the business’s strengths.
Technology can make us lazy. In the 1970s and 80s we worried that the calculator would rob kids of insight into the mathematics they were learning. There has long been evidence that writing long-hand and reading from paper are far superior vehicles for absorbing knowledge than typing and reading from a screen. Now we need to wonder whether that ultimate pinnacle of humanity’s knowledge, the internet, is actually a negative for businesses and government.
The internet has made a world of experience available to anyone who is willing to spend a few minutes seeking out the connections. Increasingly we are using big data analytics to pull this knowledge together in an automated way. Either way, the summed mass of human knowledge often appears to speak as one voice rather than the cacophony that you might expect of a crowd.
Is the crowd killing brilliance?
The crowd quickly sorts out the right answer from the wrong when there is a clear point of reference. The crowd is really good at responding to even complex questions. The more black or white the answer is, the better the crowd is at coming to a conclusion. Even creative services, such as website design, are still problems with a right or wrong answer (even if there is more than one) and are well suited to crowd sourcing.
As the interpretation of the question or weighting of the answer becomes more subjective, it becomes harder to discern the direction that the crowd is pointing with certainty. The lone voice with a dissenting, but insightful, opinion can be shouted down by the mob.
The power of the internet to answer questions is being used to test new business ideas just as quickly as to find out the population of Nicaragua. Everything from credit cards to consumer devices are being iteratively crowd sourced and crowd tested to great effect. Rather than losing months to focus groups, product design and marketing, smart companies are asking their customers what they want, getting them involved in building it and then getting early adopters to provide almost instant feedback.
However, the positive can quickly turn negative. The crowd comments early and often. The consensus usually reinforces the dominant view. Like a bad reality show, great ideas are voted off before they have a chance to prove themselves. If the idea is too left-field and doesn’t fit a known need, the crowd often doesn’t understand the opportunity.
Automating the crowd
In the 1960s and 1970s, many scientists argued that an artificial brain would display true intelligence within the bounds of the twentieth century. Research efforts largely ground to a halt as approach after approach turned out to be a dead-end.
Many now argue that twenty-first century analytics is bridging the gap. By understanding what the crowd has said and finding the response to millions, hundreds of millions and even billions of similar scenarios the machine is able to provide a sensible response. This approach even shows promise of meeting the famous Turing test.
While many argue that big data analytics is the foundation of artificial intelligence, it isn’t providing the basis of brilliant or creative insight. IBM’s Watson might be able to perform amazing feats in games of Jeopardy but the machine is still only regurgitating the wisdom of the crowd in the form of millions of answers that have been accumulated on the internet.
No amount of the crowd or analytics can yet make a major creative leap. This is arguably the boundary of analytics in the search for artificial intelligence.
Digital Disruption could take out white collar jobs
For the first time digital disruption, using big data analytics, is putting white collar jobs at the same risk of automation that blue collar workers have had to navigate over the last fifty years. Previously we assumed process automation would solve everything, but our organisations have become far too complex.
Business process management or automation has reached a natural limit in taking out clerical workers. As processes have become more complex, and their number of interactions has grown exponentially, it has become normal for the majority of instances to display some sort of exception. Employees have gone from running processes to handling exceptions. The change in job function has largely masked the loss of traditional clerical work since the start of mass rollout of business IT.
Most of this exception handling, though, requires insight but no intuitive leap. When asked, employees will tell you that their skill is to know how to connect the dots in a standard way to every unique circumstance.
Within organisations, email and, increasingly, social platforms have been the tools of choice for collaboration and crowdsourcing solutions to individual process exceptions. Just as big data analytics is automating the hunt for answers on the internet, it is now starting to offer the promise of the same automation within the enterprise.
In the near future, applications driven by big data analytics will allow computers to move from automating processes to also handling any exceptions in a way that will feel almost human to customers of everything from bank mortgages to electric utilities.
Where to next for the jobs?
Just as many white collar jobs have moved from running processes in the 70s and 80s to handling their exceptions in the 90s and new millennium, these same jobs need to move now to find something new.
At the same time, the businesses they work for are being disrupted by the same digital forces and are looking for new sources of revenue.
These two drivers may come together to offer an opportunity for those who spent their time handling exceptions either for customer or internal processes. Future opportunities are in spotting opportunities in business through intuitive insights and creative leaps and turning them into product or service inventions rather than seeking permission from the crowd who will force a return to the conservative norm.
Perhaps this is why design thinking and similar creative approaches to business have suddenly joined the mainstream.
Data profiling is an excellent diagnostic method for gaining additional understanding of the data. Profiling the source data helps inform both business requirements definition and detailed solution designs for data-related projects, as well as enabling data issues to be managed ahead of project implementation.
Profiling of a data set will be measured with reference to an agreed set of Data Quality Dimensions (e.g. per those proposed in the recent DAMA white paper).
Profiling may be required at several levels:
• Simple profiling with a single table (e.g. Primary Key constraint violations)
• Medium complexity profiling across two or more interdependent tables (e.g. Foreign Key violations)
• Complex profiling across two or more data sets, with applied business logic (e.g. reconciliation checks)
Note that field-by-field analysis is required to truly understand the data gaps.
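The three levels above can be sketched with nothing more than plain Python structures. The tables, column names and ledger figures below are invented for the example, and a real profiling exercise would normally run equivalent checks in a profiling tool or in SQL against the source systems.

```python
from collections import Counter

# Hypothetical sample data.
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Birch"},
    {"customer_id": 2, "name": "Birch Ltd"},            # duplicate primary key
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 100},
    {"order_id": 11, "customer_id": 3, "amount": 50},   # no such customer
    {"order_id": 12, "customer_id": 2, "amount": 25},
]
ledger_totals = {1: 100, 2: 30}   # finance system's total per customer


def duplicate_keys(rows, key):
    """Simple: primary key values that occur more than once in one table."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def orphan_rows(child_rows, fk, parent_rows, pk):
    """Medium: child rows whose foreign key has no matching parent row."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]


def reconciliation_breaks(order_rows, ledger):
    """Complex: customers whose order total disagrees with the ledger."""
    totals = Counter()
    for row in order_rows:
        totals[row["customer_id"]] += row["amount"]
    breaks = {}
    for key in sorted(set(totals) | set(ledger)):
        ours, theirs = totals.get(key, 0), ledger.get(key, 0)
        if ours != theirs:
            breaks[key] = (ours, theirs)
    return breaks


print(duplicate_keys(customers, "customer_id"))
print(orphan_rows(orders, "customer_id", customers, "customer_id"))
print(reconciliation_breaks(orders, ledger_totals))
```

Each function returns the offending records rather than just a count, because it is the individual exceptions that feed the root cause analysis.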
Any data profiling analysis must not only identify the issues and underlying root causes, but must also identify the business impact of the data quality problem (measured by effectiveness, efficiency, risk inhibitors). This will help identify any value in remediating the data – great for your data quality Business Case. Root cause analysis also helps identify any process outliers and drives out requirements for remedial action on managing any identified exceptions.
Be sure to profile your data and take baseline measures before applying any remedial actions – this will enable you to measure the impact of any changes.
I strongly recommend Data Quality Profiling and root-cause analysis to be undertaken as an initiation activity as part of all data warehouse, master data and application migration project phases.
Over the years, I’ve tended to find that asking any individual or group the question “What data/information do you want?” gets one of two responses:
“I don’t know.” Or;
“I don’t know what you mean by that.”
End of discussion, meeting over, pack up go home, nobody is any the wiser. Result? IT makes up the requirements based on what they think the business should want, the business gets all huffy because IT doesn’t understand what they need, and general disappointment and resentment ensues.
Clearly for Information Management & Business Intelligence solutions, this is not a good thing.
So I’ve stopped asking the question. Instead, when doing requirements gathering for an information project, I go through a workshop process with the following outline agenda:
Context setting: Why information management / Business Intelligence / Analytics / Data Governance* is generally perceived to be a “good thing”. This is essentially a very quick précis of the BI project mandate, and should aim at putting people at ease by answering the question “What exactly are we all doing here?”
(*Delete as appropriate).
Business Function & Process discovery: What do people do in their jobs – functions & tasks? If you can get them to explain why they do those things – i.e. to what end purpose or outcome – so much the better (though this can be a stretch for many.)
Challenges: what problems or issues do they currently face in their endeavours? What prevents them from succeeding in their jobs? What would they do differently if they had the opportunity to do so?
Opportunities: What is currently good? What existing capabilities (systems, processes, resources) are in place that could be developed further or re-used/re-purposed to help achieve the desired outcomes?
Desired Actions: What should happen next?
As a consultant, I see it as part of my role to inject ideas into the workshop dialogue too, using a couple of question forms specifically designed to provoke a response:
“What would happen if…X”
“Have you thought about…Y”
“Why do you do/want…Z”.
Notice that as the workshop discussion proceeds, the participants will naturally start to explore aspects that relate to later parts of the agenda – this is entirely ok. The agenda is there to provide a framework for the discussion, not a constraint. We want people to open up and spill their guts, not clam up. (Although beware of the “rambler” who just won’t shut up but never gets to the point…)
Notice also that not once have we actively explored the “D” or “I” words. That’s because as you explore the agenda, any information requirements will either naturally fall out of the discussion as it proceeds, or else you can infer the information requirements arising based on the other aspects of the discussion.
As the workshop attendees explore the different aspects of the session, you will find that the discussion will touch upon a number of different themes, which you can categorise and capture on-the-fly (I tend to do this on sheets of butchers paper tacked to the walls, so that the findings are shared and visible to all participants). Comments will typically fall into the following broad categories:
* Functions: Things that people do as part of doing business.
* Stakeholders: people who are involved (including helpful people elsewhere in the organisation – follow up with them!)
* Inhibitors: Things that currently prevent progress (these either become immediate scope-change items if they are show-stoppers for the current initiative, or else they form additional future project opportunities to raise with management)
* Enablers: Resources to make use of (e.g. data sets that another team hold, which aren’t currently shared)
* Constraints: “non-negotiable” aspects that must be taken into account. (Note: I tend to find that all constraints are actually negotiable and can be overcome if there is enough desire, money and political will.)
* Considerations: Things to be aware of that may have an influence somewhere along the line.
* Source systems: places where data comes from
* Information requirements: Outputs that people want
Here’s a (semi) fictitious example:
e.g. ADD: “What does your team do?”
Workshop Victim Participant #1: “Well, we’re trying to reconcile the customer account balances with the individual transactions.”
ADD: “And why do you want to do that?”
Workshop Victim Participant #2: “We think there’s a discrepancy in the warehouse stock balances, compared with what’s been shipped to customers. The sales guys keep their own database of customer contracts and orders and Jim’s already given us a dump of the data, while finance run the accounts receivables process. But Sally the Accounts Clerk doesn’t let the numbers out under any circumstances, so basically we’re screwed.”
Functions: Sales Processing, Contract Management, Order Fulfilment, Stock Management, Accounts Receivable.
Stakeholders: Warehouse team, Sales team (Jim), Finance team.
Inhibitors: Finance don’t collaborate.
Enablers: Jim is helpful.
Source Systems: Stock System, Customer Database, Order Management, Finance System.
Information Requirements: Orders (Quantity & Price by Customer, by Salesman, by Stock Item), Dispatches (Quantity & Price by Customer, by Salesman, by Warehouse Clerk, by Stock Item), Financial Transactions (Value by Customer, by Order Ref)
You will also probably end up with the attendees identifying a number of immediate self-assigned actions arising from the discussion – good ideas that either haven’t occurred to them before or have sat on the “To-Do” list. That’s your workshop “value add” right there….
Workshop Victim Participant #1: “I could go and speak to the Financial Controller about getting access to the finance data. He’s more amenable to working together than Sally, who just does what she’s told.”
Happy information requirements gathering!
For years now the physics community has been taking the leap into computer science through the pursuit of the quantum computer. As weird as the concepts underpinning the idea of such a device are, even weirder is the threat that this machine of the future could pose to business and government today.
There are many excellent primers on quantum computing but in summary physicists hope to be able to use the concept of superposition to allow one quantum computer bit (called a “qubit”) to carry the value of both zero and one at the same time and also to interact with other qubits which also have two simultaneous values.
The hope is that a quantum computer would come up with answers to useful questions in far fewer processing steps than a conventional computer, as many different combinations would be evaluated at the same time. Algorithms that use this approach are generally in the category of solution finding (best paths, factors and other similar complex problems).
As exciting as the concept of a quantum computer sounds, one of the applications of this approach would be a direct threat to many aspects of modern society. Shor’s algorithm provides an approach to integer factorisation using a quantum computer which is like a passkey to the encryption used across our digital world.
The cryptography techniques that dominate the internet are based on the principle that it is computationally infeasible to find the factors of a large number. However, Shor’s algorithm provides an approach that would crack the code if a quantum computer could actually be built.
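A toy example shows why factoring matters so much. The sketch below uses RSA, one widely used scheme of this kind (the choice of RSA and the tiny textbook-sized primes are assumptions for illustration), and it does not implement Shor’s algorithm: the “attack” is ordinary trial division, which only works because the numbers are absurdly small. It needs Python 3.8 or later for the modular inverse via pow.

```python
def smallest_factor(n):
    """Trial division: fine for toy numbers, hopeless at real key sizes."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n


# Key generation with deliberately tiny primes (illustrative only).
p, q = 61, 53
n = p * q                            # public modulus
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent, kept secret

# Anyone can encrypt with the public key (n, e).
message = 65
ciphertext = pow(message, e, n)

# An eavesdropper who can factor n rebuilds the private key from public data.
p_found = smallest_factor(n)
q_found = n // p_found
d_found = pow(e, -1, (p_found - 1) * (q_found - 1))
recovered = pow(ciphertext, d_found, n)

print(recovered == message)
```

With a modulus of realistic size the trial division loop would never finish in practice, and that gap is exactly what a quantum computer running Shor’s algorithm would close.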
Does it matter today?
We’re familiar with businesses of today being disrupted by new technology tomorrow. But just as weird as the concept of quantum superposition is the possibility that the computing of tomorrow could disrupt the business of today!
We are passing vast quantities of data across the internet. Much of it is confidential and encrypted: messages that we are confident will remain between the sender and receiver. These include payments, conversations and, through the use of virtual private networks, much of the internal content of both companies and government.
It is possible that parties hoping to crack this content in the future are taking the opportunity to store it today. Due to the architecture of the internet, there is little to stop anyone from intercepting much of this data and storing it without anyone having any hint of its capture.
In the event that a quantum computer capable of running Shor’s algorithm is built, the first thought will need to be to ask what content could have been intercepted and what secrets might be open to being exposed. The extent of the exposure could be so much greater than might appear at first glance.
How likely is a quantum computer to be built?
There is one commercially available device marketed as a quantum computer, called the D-Wave (from D-Wave Systems). Sceptics, however, have published doubts that it is really operating based on the principles of Quantum Computing. Even more importantly, there is no suggestion that it is capable of running Shor’s algorithm or that it is a universal quantum computer.
There is a great deal of evidence that the principles of quantum computing are consistent with the laws of physics as they have been uncovered over the past century. At the same time as physics is branching into computing, the information theory branch of computing is expanding into physics. Many recent developments in physics are borrowing directly from the information discipline.
It is possible, though, that information theory as applied to information management problems could provide confidence that a universal quantum computer is not going to be built.
Information entropy was initially constructed by Claude Shannon to provide a tool for quantifying information. While the principles were deliberately analogous to thermal entropy, it has subsequently become clear that the information associated with particles is as important as the particles themselves. Chapter 6 of my book, Information-Driven Business, explains these principles in detail.
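As a concrete illustration of how Shannon’s measure quantifies information, the short Python sketch below computes the entropy, in bits, of the values observed in a list. The function name and the sample data are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the observed distribution of values."""
    values = list(values)
    total = len(values)
    counts = Counter(values)
    # Sum of p * log2(1/p) over each distinct value, where p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(shannon_entropy(["Y", "Y", "Y", "Y"]))  # 0.0 (no uncertainty, no information)
print(shannon_entropy(["Y", "N", "Y", "N"]))  # 1.0 (one bit per value)
print(shannon_entropy(["A", "B", "C", "D"]))  # 2.0 (two bits per value)
```

A field that always holds the same value carries no information, while one that is evenly split across four values carries two bits for every record.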
It turns out that systems can be modelled on information or thermal entropy interchangeably. As a result, a quantum computer that needs to obey the rules of information theory also needs to obey the laws of thermal entropy.
The first law of thermodynamics was first written by Rudolf Clausius in 1850 as: “In all cases in which work is produced by the agency of heat, a quantity of heat is consumed which is proportional to the work done; and conversely, by the expenditure of an equal quantity of work an equal quantity of heat is produced”.
Rewording over time has added sophistication but fundamentally, the law is a restatement of the conservation of energy. Any given system cannot increase the quantity of energy or, as a consequence of the connection between thermal and information entropy, the information that it contains.
Any computing device, regardless of whether it is classical or quantum in nature, consumes energy based on the amount of information that is being derived, as determined by the information entropy of the device. While it is entirely possible that massive quantities of information could be processed in parallel, there is no escaping this requirement: a quantum computer truly delivering this level of computing would require the same order of energy as the thousands or even millions of classical computers needed to deliver the same result.
I anticipate that developers of quantum computers will either find that the quantity of energy required to process is prohibitive or that their qubits will constantly frustrate their every effort to maintain coherence for long enough to complete useful algorithms.
Could I be wrong?
Definitely! In a future post I propose to create a scorecard tracking the predictions I’ve made over the years.
However, anyone who claims to really understand quantum mechanics is lying. Faced with the unbelievably complex wave functions required for quantum mechanics which seem to defy any real world understanding, physicist David Mermin famously advised his colleagues to just “Shut up and calculate!”.
Because of the impact of a future quantum computer on today’s business, the question is far from academic and deserves almost as much investment as the exploration of these quantum phenomena does in its own right.
At the same time, the investments in quantum computing are far from wasted. Even if no universal quantum computer is possible, the specialised devices that are likely to follow the D-Wave machine are going to prove extremely useful in their own right.
Ultimately, the convergence of physics and computer science can only benefit both fields as well as the business and government organisations that depend on both.
Why estimating Data Quality profiling doesn’t have to be guess-work
Data Management lore would have us believe that estimating the amount of work involved in Data Quality analysis is a bit of a “Dark Art,” and to get a close enough approximation for quoting purposes requires much scrying, haruspicy and wet-finger-waving, as well as plenty of general wailing and gnashing of teeth. (Those of you with a background in Project Management could probably argue that any type of work estimation is just as problematic, and that in any event work will expand to more than fill the time available…).
However, you may no longer need to call on the services of Severus Snape or Mystic Meg to get a workable estimate for data quality profiling. My colleague from QFire Software, Neil Currie, recently put me onto a post by David Loshin on SearchDataManagement.com, which proposes a more structured and rational approach to estimating data quality work effort.
At first glance, the overall methodology that David proposes is reasonable in terms of estimating effort for a pure profiling exercise – at least in principle. (It’s analogous to similar “bottom-up” calculations that I’ve used in the past to estimate ETL development on a job-by-job basis, or creation of standard Business Intelligence reports on a report-by-report basis).
I would observe that David’s approach is predicated on the (big and probably optimistic) assumption that we’re only doing the profiling step. The follow-on stages of analysis, remediation and prevention are excluded – and in my experience, that’s where the real work most often lies! There is also the assumption that a pre-existing checklist of assessment criteria exists – and developing the library of quality check criteria can be a significant exercise in its own right.
However, even accepting the “profiling only” principle, I’d also offer a couple of additional enhancements to the overall approach.
Firstly, even with profiling tools, the inspection and analysis process for any “wrong” elements can go a lot further than just a 10-minute-per-item-compare-with-the-checklist, particularly in data sets with a large number of records. Also, there’s the question of root-cause diagnosis (And good DQ methods WILL go into inspecting the actual member records themselves). So for contra-indicated attributes, I’d suggest a slightly extended estimation model:
* 10 mins: for each “Simple” item (standard format, no applied business rules, fewer than 100 member records)
* 30 mins: for each “Medium” complexity item (unusual formats, some embedded business logic, data sets up to 1000 member records)
* 60 mins: for any “Hard” high-complexity items (significant, complex business logic, data sets over 1000 member records)
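As a rough sketch, the three-tier model above turns into a few lines of Python. The item counts in the example are hypothetical; only the 10/30/60 minute rates come from the list.

```python
# Minutes of inspection per contra-indicated attribute, by complexity tier.
MINUTES_PER_ITEM = {"simple": 10, "medium": 30, "hard": 60}


def profiling_minutes(item_counts):
    """Total inspection effort in minutes for a dict of {tier: number_of_items}."""
    return sum(MINUTES_PER_ITEM[tier] * count for tier, count in item_counts.items())


# Hypothetical data set: 20 simple, 8 medium and 3 hard attributes.
estimate = profiling_minutes({"simple": 20, "medium": 8, "hard": 3})
print(estimate)                  # 620 minutes
print(round(estimate / 60, 1))   # 10.3 hours
```

Note how the handful of medium and hard items account for more than two thirds of the effort, which is why a flat 10 minutes per item tends to under-quote.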
Secondly, and more importantly – David doesn’t really allow for the human factor. It’s always people that are bloody hard work! While it’s all very well to do a profiling exercise in-and-of-itself, the results need to be shared with human beings – presented, scrutinised, questioned, validated, evaluated, verified, justified. (Then acted upon, hopefully!) And even allowing for the set-aside of the “Analysis” stages onwards, there will still need to be some form of socialisation within the “Profiling” phase.
That’s not a technical exercise – it’s about communication, collaboration and co-operation. Which means it may take an awful lot longer than just doing the tool-based profiling process!
How much socialisation? That depends on the number of stakeholders, and their nature. As a rule-of-thumb, I’d suggest the following:
* Two hours of preparation per workshop (if the stakeholder group is “tame”; double it if there are participants who are negatively inclined)
* One hour face-time per workshop (Double it for “negatives”)
* One hour post-workshop write-up time per workshop
* One workshop per 10 stakeholders.
* Two days to prepare any final papers and recommendations, and present to the Steering Group/Project Board.
That’s in addition to David’s formula for estimating the pure data profiling tasks.
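The socialisation rule-of-thumb can be sketched the same way. This is a minimal Python version under some loudly stated assumptions of my own: the number of workshops is rounded up, a “day” is taken as eight hours, and a negatively inclined audience is treated as affecting every workshop. The function and parameter names are hypothetical.

```python
import math


def socialisation_hours(stakeholders, negative=False, hours_per_day=8):
    """Estimated socialisation effort in hours for a profiling exercise.

    Assumptions (not from the rule-of-thumb itself): workshops are rounded
    up, a day is hours_per_day long, and negative=True doubles preparation
    and face-time for every workshop.
    """
    workshops = math.ceil(stakeholders / 10)       # one workshop per 10 stakeholders
    doubling = 2 if negative else 1
    preparation = 2 * doubling                      # two hours, doubled for "negatives"
    face_time = 1 * doubling                        # one hour, doubled for "negatives"
    write_up = 1                                    # one hour, never doubled
    final_papers = 2 * hours_per_day                # two days for papers and presentation
    return workshops * (preparation + face_time + write_up) + final_papers


print(socialisation_hours(25))                 # 28 hours for a "tame" group
print(socialisation_hours(25, negative=True))  # 37 hours with "negatives" in the room
```

For 25 stakeholders that is three workshops, so even a friendly audience adds three to four working days on top of the tool-based profiling estimate.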
Detailed root-cause analysis (Validate), remediation (Protect) and ongoing evaluation (Monitor) stages are a whole other ball-game.
Alternatively, just stick with the crystal balls and goats – you might not even need to kill the goat anymore…
A “foreign” colleague of mine once told me a trick his English language teacher taught him to help him remember the “questioning words” in English. (To the British, anyone who is a non-native speaker of English is “foreign.” I should also add that as a Scotsman, English is effectively my second language…).
“Five Whiskies in a Hotel” is the clue – i.e. five questioning words begin with “W” (Who, What, When, Why, Where), with one beginning with “H” (How).
These simple question words give us a great entry point when we are trying to capture the initial set of issues and concerns around data governance – what questions are important/need to be asked.
* What data/information do you want? (What inputs? What outputs? What tests/measures/criteria will be applied to confirm whether the data is fit for purpose or not?)
* Why do you want it? (What outcomes do you hope to achieve? Does the data being requested actually support those questions & outcome? Consider Efficiency/Effectiveness/Risk Mitigation drivers for benefit.)
* When is the information required? (When is it first required? How frequently? Particular events?)
* Who is involved? (Who is the information for? Who has rights to see the data? Who is it being provided by? Who is ultimately accountable for the data – both contents and definitions? Consider multiple stakeholder groups in both recipients and providers)
* Where is the data to reside? (Where is it originating from? Where is it going to?)
* How will it be shared? (How will the mechanisms/methods work to collect/collate/integrate/store/disseminate/access/archive the data? How should it be structured & formatted? Consider Systems, Processes and Human methods.)
Clearly, each question can generate multiple answers!
Aside: in the Doric dialect of North-East of Scotland where I originally hail from, all the “question” words begin with “F”:
Fit…? (What?) e.g. “Fit dis yon feel loon wint?” (What does that silly chap want?)
Fit wye…? (Why?) e.g. “Fit wye div ye wint a’thin’?” (Why do you want everything?)
Fan…? (When?) e.g. “Fan div ye wint it?” (When do you want it?)
Fa…? (Who?) e.g. “Fa div I gie ‘is tae?” (Who do I give this to?)
Far…? (Where?) e.g. “Far aboots dis yon thingumyjig ging?” (Where exactly does that item go?)
Foo…? (How?) e.g. “Foo div ye expect me tae dae it by ‘e morn?” (How do you expect me to do it by tomorrow?)
Whatever your native language, these key questions should get the conversation started…
Remember too, the homily by Rudyard Kipling:
Is this the kind of response you get when you mention to people that you work in Data Quality?!
Let’s be honest here. Data Quality is good and worthy, but it can be a pretty dull affair at times. Information Management is something that “just happens”, and folks would rather not know the ins-and-outs of how the monthly Management Pack gets created.
Yet I’ll bet that they’ll be right on your case when the numbers are “wrong”.
So here’s an idea. The next time you want to engage someone in a discussion about data quality, don’t start by discussing data quality. Don’t mention the processes of profiling, validating or cleansing data. Don’t talk about integration, storage or reporting. And don’t even think about metadata, lineage or auditability. Yaaaaaaaaawn!!!!
Instead of concentrating on telling people about the practitioner processes (which of course are vital, and fascinating no doubt if you happen to be a practitioner), think about engaging in a manner that is relevant to the business community, using language and examples that are business-oriented. Make it fun!
Once you’ve got the discussion flowing in terms of the impacts, challenges and inhibitors that get in the way of successful business operations, then you can start to drill into the underlying data issues and their root causes. More often than not, a data quality issue is symptomatic of a business process failure rather than being an end in itself. By fixing the process problem, the business user gains a benefit, and the data is enhanced as a by-product. Everyone wins (and you didn’t even have to mention the dreaded DQ phrase!)
Data Quality is a human thing – that’s why it’s hard. As practitioners, we need to be communicators. Lead the thinking, identify the impact and deliver the value.
Now, that’s interesting!