Archive for June, 2013
The traditional notion of data warehousing is the increasing accumulation of structured data, which distributes information across the organization, and provides the knowledge base necessary for business intelligence.
In a previous post, I pondered whether a contemporary data warehouse is analogous to an Enterprise Brain with both structured and unstructured data, and the interconnections between them, forming a digital neural network with orderly structured data firing in tandem, while the chaotic unstructured data assimilates new information. I noted that perhaps this makes business intelligence a little more disorganized than we have traditionally imagined, but that this disorganization might actually make an organization smarter.
Business intelligence is typically viewed as part of a top-down decision management system driven by senior executives, but a potentially more intelligent business intelligence came to mind while reading Steven Johnson’s book Emergence: The Connected Lives of Ants, Brains, Cities, and Software.
Last month I wrote about the collection of personal information by business and government and compared the loss of privacy to George Orwell’s predictions for 1984 (see “Living as far from 1984 as Orwell”). Barely had I written the post when news of the PRISM leak broke.
Some months ago I predicted in the media that there would be a major breach of trust in 2013 (for an example of the coverage see “Cyber security, cloud top disruptive tech trends”). When I made those comments I wasn’t willing to predict what that event might be, but certainly the controversy around PRISM is causing many people to ask whether their personal activities are being tracked.
I am not buying into the debate on whether the US PRISM program is legitimate or desirable. I do, however, argue that any activity that involves personal information and which comes as a surprise to the owners will naturally risk a backlash. This is as true of individual businesses as it is of governments.
Combined with online fraud and hacking, there are signs that the general public is starting to lose confidence in the technology that they have embraced so enthusiastically.
Although people lose confidence in technology (and the ICT industry as a whole), they still want the convenience of the products that they have learnt to use. Whether it is location services, digital media or social media, people value these additions to their lives. When I originally spoke to the media about the potential loss of trust in 2013 I also predicted that any short-term concerns would be alleviated as the general public turned to brands they trusted.
The role of these brands is not just to stand as a beacon of trust; rather, they have an opportunity to clearly establish the terms of service and provide an extra layer of security and managed privacy. Ultimately these trusted brands can negotiate agreements across the ICT industry and government to use common personally controlled records (as I’ve written about previously in articles such as “You should own your own data”), putting the control back into the hands of the individual.
Our digital world can add so much to society and the economy. It is up to all of us, as pioneers in this information revolution, to find the solutions that will replicate the protections that were built into the processes of the analogue and paper world that had evolved over more than two hundred years.
What is an Open Methodology Framework?
An Open Methodology Framework is a collaborative environment for building methods to solve complex issues impacting business, technology, and society. The best methodologies provide repeatable approaches on how to do things well based on established techniques. MIKE2.0’s Open Methodology Framework goes beyond the standards, techniques and best practices common to most methodologies with three objectives:
- To Encourage Collaborative User Engagement
- To Provide a Framework for Innovation
- To Balance Release Stability with Continuous Improvement
We believe that this approach provides a successful framework for accomplishing things in a better and more collaborative fashion. What’s more, this approach allows for concurrent focus on both method and detailed technology artifacts. The emphasis is on emerging areas in which current methods and technologies lack maturity.
The Open Methodology Framework will be extended over time to include other projects. Another example of an open methodology is open-sustainability, which applies many of these concepts to the area of sustainable development. Suggestions for other Open Methodology projects can be initiated on this article’s talk page.
We hope you find this of benefit and welcome any suggestions you may have to improve it.
This Week’s Blogs for Thought:
Through a PRISM, Darkly
By now, most people have heard about the NSA surveillance program known as PRISM that, according to a recent AP story, is about even bigger data seizure. Without question, big data has both positive and negative aspects, but these recent revelations are casting more light on the darker side of big data, especially its considerable data privacy implications. A few weeks ago, Robert Hillard blogged about how we are living as far from 1984 as George Orwell, whose dystopian novel 1984, which was published in 1948, has recently spiked in sales since apparently Big Brother is what many people see when they look through a PRISM, darkly.
Talking Data Democratization
We both remember the days when the end user had no choice but to wait weeks, months or forever for IT to deliver every small report and every change to every small report. The end user was simply not empowered whatsoever. By the time the report came, it was far less valuable than when it was asked for. Even then, there was much further processing to be done on the data as the next step was to pull the data into Excel, where the real work would begin, including data cleansing. IT would go willfully blind on this and consequently users were inconsistently, and redundantly, recreating the work of their fellow users. Information was not a weapon. It was an afterthought.
The High Costs of Big Data: Will Monetizing Lead to Regulations?
It’s never been cheaper to obtain the technologies needed to process Big Data, but alas, that doesn’t change the fact that it’s still very expensive.
“Even with Amazon Redshift’s aggressive pricing, NASA would have to pay more than a $1 million for 45 days in data storage costs alone,” Bruno Aziza wrote in a December Venture Beat guest blog. “This number is consistent with New Vantage Partners survey, which evaluates the average big data project to cost between $1 million and $10 million.”
By now, most people have heard about the NSA surveillance program known as PRISM that, according to a recent AP story, is about even bigger data seizure. Without question, big data has both positive and negative aspects, but these recent revelations are casting more light on the darker side of big data, especially its considerable data privacy implications.
A few weeks ago, Robert Hillard blogged about how we are living as far from 1984 as George Orwell, whose dystopian novel 1984, which was published in 1948, has recently spiked in sales since apparently Big Brother is what many people see when they look through a PRISM, darkly.
Although the Big Data and Big Brother comparison was being made before the PRISM story broke, it’s now more frequently discussed and debated, especially in regard to government, and rightfully so.
However, I still can’t help but wonder if we’re overreacting to this issue. (more…)
In his recent post, my friend Jim Harris contrasts data democracies with data dictatorships. Harris writes that most organizations are data democracies, “which means that other data sources, both internal and external, will be used.” Truer words have never been written.
Of course, this hasn’t always been the case. Two decades ago, data management wasn’t nearly as democratic and chaotic as it is today. Many organizations could control a good portion of the data under their umbrellas–or at least try.
Enter the Internet.
Today, plenty of valuable data still comes from employees and customers. But they’re hardly alone. Data also emanates from partners, users, machines, websites, social networks, feeds, and many other sources as well. Enterprises today need to “manage” and aggregate data from myriad places. What’s more, no longer do organizations have to concern themselves with only structured data streaming at them once in a while. Today, they have to deal with multiple data types (read: structured and semi-structured), and at an increasing velocity to boot.
Yes, the tools today are much more powerful and dynamic than even ten years ago. Hadoop represents the de facto Big Data standard, but plenty of NoSQL, NewSQL, and other Big Data solutions exist.
The point is that organizations would do well not to think of data management as either-or/two poles: democracy vs. dictatorship. Rather, more than ever, these two extremes are part of the same continuum. (See below.)
I cannot think of an enterprise with completely democratic or dictatorial data management. Today, the most intelligent organizations incorporate both democratic and dictatorial elements into their information management.
Dictatorship good: Letting employees set their own salaries isn’t terribly wise.
Democracy good: Many companies let employees handle their own benefits via open enrollment. This is very smart. In the social sphere, only paying attention to company-initiated tweets or blog posts is ill advised. Companies that ignore consumer complaints on social networks do so at their own peril. Just ask United Airlines.
I haven’t been to too many meetings in which senior folks have openly asked, “How democratic or dictatorial should we be?” Not asking the question every so often, however, almost guarantees organizations ignore potentially critical information. Democratic data like user-generated photos, blog posts, comments, videos, and podcasts is exploding.
Just because an organization cannot control that data doesn’t mean that it should disregard it. Embrace a hybrid strategy. Democracy and dictatorships each have their place.
What say you?
Information Governance, otherwise known as the collection of policies, structures and practices used to ensure the confidentiality, security and ethical use of information, is an on-going initiative for most organizations. However, like other information management disciplines, companies struggle to meet these challenges for one fundamental reason: they fail to focus on the enterprise-wide nature of data management problems. They incorrectly see information as a technology or IT issue, rather than as a fundamental and core business activity.
At MIKE2.0, we believe Information Governance should be implemented across the entire organization: its people, processes and technology. Listed below are the most important factors for a successful Information Governance (IG) program:
- Accountability. Because of the ways in which information is captured, and how it flows across the enterprise, everyone has a role to play in how it is governed. Many of the most important roles are played by individuals fairly junior in the organization. They typically play a key role in the data capture stage and often cause (or see) errors on a first-hand basis. Certain individuals need to be dedicated to IG. These roles are filled by senior executives such as the CIO, Information Architects, and Data and Content Stewards.
- Efficient Operating Models. The IG approach should define an organizational structure that most effectively handles the complexities of both integration and IM across the whole of the organization. Of course, there will typically be some degree of centralization as information flows across the business. However, this organizational model need not be a single, hierarchical team. The common standards, methods, architecture, and collaborative techniques so central to IG allow this model to be implemented in a wide variety of models: physically central, virtual, or offshore. Organizations should provide assessment tools and techniques to progressively refine these new models over time.
- A Common Methodology. An IG program should include a common set of activities, tasks, and deliverables. Doing so builds specific IM-based competencies. This enables greater reuse of artifacts and resources, not to mention higher productivity out of individuals. It also manifests the commonalities of different IM initiatives across the organization.
- Standard Models. A common definition of terms, domain values, and their relationships is one of the fundamental building blocks of IG. This should go beyond a traditional data dictionary. It should include a lexicon of unstructured content. Defining common messaging interfaces allows for easy inclusion of “data in motion.” Business and technical definitions should be represented and, just as important, the lineage between them easy to navigate.
- Architecture. An IM architecture should be defined for the current-state, transition points, and target vision. The inherent complexity of this initiative will require the representation of this architecture through multiple views. This is done in Kruchten’s Model. Use of architectural design patterns and common component models is a key aspect of good governance. This architecture must accommodate dynamic and heterogeneous technology environments that, invariably, will quickly adapt to new requirements.
- Comprehensive Scope. An IG approach should be comprehensive in its scope, covering structured data and unstructured content. It should also include the entire lifecycle of information. This begins with its initial creation, including integration across systems, archiving, and eventual destruction. This comprehensive scope can only be achieved with an architecture-driven approach and well-defined roles and responsibilities.
- Information Value Assessment (IVA). Organizations (should) place a very high value on their information assets. As such, they will view their organization as significantly de-valued when these assets are unknown–or poorly defined. An IVA assigns an economic value to the information assets held by an organization. The IVA also shows how IG influences this value. It must also measure whether the return outweighs the cost, as well as the time required to attain this return. In this vein, current methods are particularly immature, although some rudimentary models do exist. In this case, industry models must greatly improve, much like what has occurred in the past ten years in the infrastructure space.
- Senior Leadership. Senior leaders need to manage their information and deal with related issues. CIOs, for example, must face a host of business users who increasingly demand relevant, contextual information. At the same time, leadership teams often blame failures on “bad data.” In the post Sarbanes-Oxley environment, CFOs are asked to sign off on financial statements. To this end, the quality of data and the systems that produce that data are being scrutinized now more than ever before. CMOs are being asked to grow revenues with fewer human resources. New regulations around the management of information have prevented many organizations from being effective. Senior leaders must work towards a common goal of improving information while concurrently appreciating that IM is still immature as a discipline. The bottom line is that there will be some major challenges ahead.
- Historical Quantification. In the majority of cases, the most difficult aspect of IM can be stated very simply: most organizations are trying to fix decades of “bad behavior.” The current-state is often unknown, even at an architectural or model level. The larger the organization, the more complex this problem typically becomes. Historical quantification through common architectural models and quantitative assessments of data and content are key aspects of establishing a known baseline. Only then can organizations move forward. For such a significant task, this assessment must be conducted progressively–not all at once.
- Strategic Approach. An IG program will need to address complex issues across the organization. Improvements will typically be measured over months and years, not days. As a result, a strategic approach is required. A comprehensive program can be implemented over long periods of time through multiple release cycles. The strategic approach will allow for flexibility to change. However, the level of detail will still be meaningful enough to effectively deal with complex issues.
- Continuous Improvement. It is not always cost-effective to fix all issues in a certain area. Sometimes, it is best instead to follow the “80/20 rule.” An IG program should explicitly plan to revisit past activities. It should build on a working baseline through audits, monitoring, technology re-factoring, and personnel training. Organizations should look for opportunities to “release early, release often.” At the same time, though, they should remember what this means from planning and budgeting perspectives.
- Flexibility for Change. While an IG program involves putting standards in place, it must utilize its inherent pragmatism and flexibility for change. A strong governance process does not mean that exceptions can’t be granted. Rather, key individuals and groups need to know exceptions are occurring–and why. The Continuous Improvement approach grants initial workarounds. These then have to be re-factored at a later point in order to balance short-term business priorities.
- Governance Tools. Measuring the effectiveness of an IG program requires tools to capture assets and performance. Just as application development and service delivery tools exist, organizations need a way to measure information assets, actions, and their behaviors.
In many ways, Information is the new accounting. Solutions required to address complex infrastructure and information issues can’t be tackled on a department-by-department basis. MIKE2.0 offers an open source solution to improve Information Governance across the enterprise. We hope you’ll check it out when you have a moment.
In a previous post, I explained why you need data quality standards. In his comment on my post, Richard Ordowich explained “data quality tends to be subjective just like the data is subjective. I think this is what fitness for use really implies. It’s all relative. The difficulty is determining what the relative terms are. They are seldom revealed.” I definitely agree since understanding subjectivity and relativity is essential for understanding how to approach data quality.
In his book Thinking, Fast and Slow, Daniel Kahneman explained “the same sound will be experienced as very loud or quite faint, depending on whether it was preceded by a whisper or a roar. To predict the subjective experience of loudness, it is not enough to know its absolute energy; you also need to know the reference sound to which it is automatically compared.”
Therefore, assessing data quality, just like any other evaluation, is always relative to a reference point.
Most often, the reference point will be the point of view of the user that determines whether the data is fit for the purpose of their use. Add another user and you add another reference point. Depending upon which reference point you begin with, you will get a different assessment of data quality, one that is subjective to the user and relative to their use. The same data, which may have the same objective data quality, such as real-world alignment, will be experienced differently by each user.
“You can easily set up a compelling demonstration of this principle,” Kahneman explained. “Place three bowls of water in front of you. Put ice water into the left-hand bowl and warm water into the right-hand bowl. The water in the middle bowl should be at room temperature. Immerse your hands in the cold and warm water for about a minute, then dip both in the middle bowl. You will experience the same temperature as heat in one hand and cold in the other.”
This is why it’s so important that your data requirements be well-communicated, and that you establish a baseline assessment of data quality against those data requirements, before you decide to embark on a data quality improvement initiative. Without mapping out those reference points ahead of time, you will end up wandering lost in the data wilderness, wasting time, money, and effort improving the quality of whatever data you find without understanding how subjectivity and relativity impact data quality.
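To make the idea of reference points concrete, here is a minimal sketch of a baseline assessment. Everything in it is an assumption made up for illustration: the customer records, the three user groups, and the `fitness_for_use` helper, which simply counts a record as fit when every field a user needs is populated. Real data requirements would be richer than a completeness check.

```python
# Hypothetical customer records; the same data is assessed for every user.
records = [
    {"name": "Ann Lee", "email": "ann@example.com", "postcode": "90210"},
    {"name": "Bob Roy", "email": "", "postcode": "10001"},
    {"name": "Cy Wu", "email": "cy@example.com", "postcode": "60601"},
    {"name": "Di Fox", "email": "", "postcode": ""},
]

# Each user's data requirements act as a reference point:
# here, simply the fields that user needs to be populated.
requirements = {
    "email_marketing": ["name", "email"],
    "direct_mail": ["name", "postcode"],
    "analytics": ["name"],
}


def fitness_for_use(rows, required_fields):
    """Return the fraction of rows in which every required field is populated."""
    if not rows:
        return 0.0
    fit = sum(1 for row in rows if all(row.get(field) for field in required_fields))
    return fit / len(rows)


# Baseline assessment: one score per reference point, all from the same data.
baseline = {
    user: fitness_for_use(records, fields)
    for user, fields in requirements.items()
}

for user, score in baseline.items():
    print(f"{user}: {score:.0%} of records fit for use")
```

The data never changes, yet each user experiences a different level of quality. Writing those reference points down before an improvement initiative starts is what makes the baseline meaningful.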
For the last few years, it’s become very difficult for IT to police who brings personal devices into the enterprise (never mind what people do with them). If you’re reading this site, you’ve probably heard of BYOD, a trend that will surely continue. Google Glass and its ilk are coming soon. These devices pose additional risks for IT departments determined to prevent data theft, security breaches, and industrial espionage.
But what about bringing your own software? Is this a nascent trend about which IT has to worry?
Yammer: A Case Study
True enterprise collaboration software has existed for quite some time. More than a decade ago, we began to hear of corporate intranets and knowledge bases. One of the first proper collaboration applications: Microsoft SharePoint. Such promise!
Lamentably, employees did not consistently use these tools throughout the enterprise. For a host of reasons, many organizations continued to rely upon email as the killer collaboration app. Generally speaking, this was a mistake. Email doesn’t lend itself to true collaboration, and some CEOs even banned email.
Faced with the need to share information in a more efficient manner, many people began collaborating in the worst possible place: Facebook. From a recent study:
Facebook is a collaboration platform twice as popular as SharePoint — 74% to 39%. It’s also four times more popular than IBM Open Connections (17%) and six times more popular than Salesforce’s Chatter (12%).
The study, of 4,000 users and 1,000 business and IT decision-makers in 22 countries, also said 77% of managers and 68% of users say they now use some form of enterprise social networking technology. IT decision makers said such social technologies make their jobs more enjoyable (66%), more productive (62%) and “help them get work done faster” (57%). All in all, said Avanade, of those businesses currently using social collaboration tools, 82% want to use more of them in the future.
The governance, security, and privacy issues posed by using the world’s largest social network as an enterprise collaboration tool are hard to overstate. Yet, many experienced IT professionals became fed up with clunky top-down collaboration tools like SharePoint.
Consider the recent success of Yammer, a Freemium-based and organic alternative to top-down tools like SharePoint. Yammer became so popular precisely because it was organic. That is, employees could download the application and kick its tires. IT did not need to deploy or bless it. Organizations could date before they got married. If they wanted to expand its use–or unlock key functionality, they could pay for Yammer. And that’s exactly what many organizations did. To its credit, about a year ago, Microsoft purchased Yammer for more than $1 billion in cash.
CXOs should think very carefully about whether their current applications enable their organizations and employees to be productive. These days, it doesn’t take a computer scientist to circumvent restrictive corporate IT policies. Yammer is not unique. In other words, if it can go viral within an organization, other applications can too.
What say you?
A few years ago, I did a little consulting for an organization involved in a legal dispute with its software vendor and integrator. For obvious reasons, I can’t divulge specifics here. To make a long story short, the company (let’s call it ABC) contracted the vendor (let’s call it XYZ Software) to customize and deploy a new supply chain and financial system. A six-month project quickly ballooned into 18 months, with still no promise that the system would work as promised.
And then the lawyers got involved.
As I listened to executives from ABC tell me what happened, a few things became clear to me. For starters, ABC recognized that it was partially at fault for the current state of affairs. While not accepting all of the blame, ABC realized that it made a critical mistake at the onset of the project: the company believed that it could successfully migrate its data from its legacy system to its new application. After all, how hard could ETL be, right?
ABC would soon find out.
It’s Never That Easy
ABC’s legacy system did not store data in a transactional and relational way. That is, its legacy system updated accounts, inventory, and the like to reflect current amounts (see second set of data below). Unfortunately for ABC, its new application needed transactional data to function (see first set of data below).
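As a rough illustration of the difference, consider the sketch below. The item names, quantities, and the `balances_from_transactions` helper are all hypothetical and are not ABC’s actual data; the point is only the shape of the two data sets.

```python
from collections import defaultdict

# Hypothetical transactional data: one row per inventory movement.
transactions = [
    {"item": "widget", "date": "2013-01-05", "qty": 100},
    {"item": "widget", "date": "2013-01-12", "qty": -30},
    {"item": "gadget", "date": "2013-01-15", "qty": 40},
    {"item": "widget", "date": "2013-02-01", "qty": -20},
]

# Hypothetical balance-style data, as a legacy system might hold it:
# one value per item, overwritten in place to reflect the current amount.
legacy_balances = {"widget": 50, "gadget": 40}


def balances_from_transactions(rows):
    """Collapse transaction rows into the current amount per item."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["item"]] += row["qty"]
    return dict(totals)


derived = balances_from_transactions(transactions)
print(derived)
print(derived == legacy_balances)
```

Going from transactions to balances is a simple sum. Going the other way is not possible from the balances alone, because the history of individual movements was never kept, and that history is exactly what a transactional application expects to load.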
Because ABC employees had never worked with transactional data before, they struggled with the ETL process, no doubt contributing to the project’s delays. Sure, other issues were involved with XYZ’s (lack of) project management and personnel, but in reality this project never had a shot at hitting its six-month goal because of myriad data issues.
Interesting ending to the story: ABC kicked out XYZ and implemented the software of a new vendor relatively easily. The reason is obvious: its data had already been converted into a more ERP-friendly format.
Data storage today is a commodity, but that didn’t use to be the case. (See Kryder’s Law.) Systems built in the 1980s tried to minimize then-expensive data storage. Remember, transactional tables can get very large.
When moving from an old system to a new one, don’t underestimate the amount of time and effort needed to convert data. I’ve seen far too many examples like ABC over the years.
What say you?
Data Governance: How competent is your organization?
One of the key concepts of the MIKE2.0 Methodology is that of an Organisational Model for Information Development. This is an organisation that provides a dedicated competency for improving how information is accessed, shared, stored and integrated across the environment.
Organisational models need to be adapted as the organisation moves up the 5 Maturity Levels for organisations in relation to the Information Development competencies below:
Level 1 Data Governance Organisation – Aware
- An Aware Data Governance Organisation knows that the organisation has issues around Data Governance but is doing little to respond to these issues. Awareness has typically come as the result of some major issues that have occurred that have been Data Governance-related. An organisation may also be at the Aware state if they are going through the process of moving to a state where they can effectively address issues, but are only in the early stages of the programme.
Level 2 Data Governance Organisation – Reactive
- A Reactive Data Governance Organisation is able to address some of its issues, but not until some time after they have occurred. The organisation is not able to address root causes or predict when they are likely to occur. “Heroes” are often needed to address complex data quality issues and the impact of fixes done on a system-by-system level is often poorly understood.
Level 3 Data Governance Organisation – Proactive
- A Proactive Data Governance Organisation can stop issues before they occur as they are empowered to address root cause problems. At this level, the organisation also conducts ongoing monitoring of data quality so that issues that do occur can be resolved quickly.
Level 4 Data Governance Organisation – Managed
Level 5 Data Governance Organisation – Optimal
The MIKE2.0 Solution for the Centre of Excellence provides an overall approach to improving Data Governance through a Centre of Excellence delivery model for Infrastructure Development and Information Development. We recommend this approach as the most efficient and effective model for building this common set of capabilities across the enterprise environment.
Feel free to check it out when you have a moment and offer any suggestions you may have to improve it.
This Week’s Blogs for Thought:
Headaches, Data Analysis and Negativity Bias
I have suffered from bad headaches most of my life, but over the last few years they seemed to be getting worse. Discussing this with my doctor, he asked lots of questions: How often do you get headaches? Do they occur at the same time of day? How long do they last? Are they always severe or sometimes mild? How many doses of over-the-counter medication do you take per headache?
Since I have been a data management professional for over twenty years, I felt kind of stupid when I realized that what my doctor was asking for to aid his medical diagnosis was . . . data.
Living as far from 1984 as Orwell
Over the last month I’ve been talking a lot about personally controlled records and the ownership of your own information. For more background, see last month’s post and a discussion I took part in on ABC radio.
The strength of the response reinforces to me that this is an area that deserves greater focus. On the one hand, we want business and government to provide us with better services and to effectively protect us from danger. On the other, we don’t want our personal freedoms to be threatened. The question for us to ask is whether we are at risk of giving up our personal freedom and privacy by giving away our personal information.