Archive for August, 2013
The MIKE2.0 wiki defines the Chief Data Officer (CDO) as the executive who plays a key leadership role in driving data strategy, architecture, and governance as the leader of the organization's data management activities.
“Making the most of a company’s data requires oversight and evangelism at the highest levels of management,” Anthony Goldbloom and Merav Bloch explained in their Harvard Business Review blog post Your C-Suite Needs a Chief Data Officer.
Goldbloom and Bloch describe the CDO as being responsible for identifying how data can be used to support the company’s most important priorities, making sure the company is collecting the right data, and ensuring the company is wired to make data-driven decisions.
“I firmly believe the definition of a CDO role is a good idea,” Forrester analyst Gene Leganza blogged, but “there’s plenty to be worked out to make this effective. What would be the charter of this new role (and the organizational unit that would report to it), where would it report, and what roles would report into it? There are no easy answers as far as I can see.”
What about the CIO?
And if you are wondering whether your organization needs a CDO when you probably already have a Chief Information Officer (CIO), then “look at what we’ve asked CIOs to do,” Peter Aiken and Michael Gorman explained in their intentionally short book The Case for the Chief Data Officer. “They are responsible for infrastructure, application software packages, Ethernet connections, and everything in between. It’s an incredible range of jobs. If you look at a chief financial officer, they have a singular focus on finance, because finance and financial assets are a specific area the business cares about. Taking data as a strategic asset gives it unique capabilities, and when you take the characteristics of data and you see the breadth and scope of CIO functions, they don’t work together. It hasn’t worked, it’s not going to work, especially when you consider the other data plans coming down the pipeline.”
And there aren’t just other data plans coming down the pipeline. Our world is becoming, not just more data-driven, but increasingly data-constructed. “Global drivers have been shifting from valuing the making of things to the flow of intellectual capital,” Robert Hillard blogged. “This is the shift to an information economy which has most recently been dubbed digital disruption. There is no point, for instance, in complaining about there being less tax on music streaming than the manufacture, distribution, and sale of CDs. The value is just in a different place and most of it isn’t where it was.”
The Rise of a Second CDO?
“All businesses are now digital businesses,” Gil Press blogged. “The digitization of the entire business is spreading to all industries and all business functions and is threatening to make the central IT organization less relevant. Enter the newly-minted Chief Digital Officer expected to provide a unifying vision and develop a digital strategy, transforming existing processes and products and finding new digital-based profit and revenue opportunities. The role of the Chief Digital Officer is all about digital governance, the other CDO role—that of the Chief Data Officer—is all about data governance. With more and more digital data flowing throughout the organization, and going in and out through its increasingly porous borders, managing the quality, validity, and access to this asset is more important than ever.”
“The main similarity between the two roles,” Press explained, “is the general consensus that the new chiefs, whether of the digital or the data kind, should not report to the CIO. Theirs is a business function, while the CIO is perceived to be dealing with technology.”
“The CDO reports to the business,” Aiken and Gorman explained. “Business data architecture is a business function, not an IT function. In fact, the only data management areas that stay behind with the CIO are the actual development of databases and the tuning, backup, and recovery of the data delivery systems, with security shared between IT and the business.”
Hail to the Chiefs
“The central IT organization and CIOs may become irrelevant in the digital economy,” Press concluded. “Or, CIOs could use this opportunity to demonstrate leadership that is based on deep experience with and understanding of what data, big or small, is all about — its management, its analysis, and its use in the service of innovation, the driving force of any enterprise.”
The constantly evolving data-driven information economy is forcing enterprises to open their hailing frequencies to chiefs, both new and existing, sending a hail to the chiefs to figure out how data and information, and especially its governance, relate to their roles and responsibilities, and how they can collectively provide the corporate leadership needed in the digital age.
If any modern economy wants to keep, or even add, value as the digital economy grows, it has to search for productivity in new ways. That means bringing together innovations from IT that are outside today's core and combining them with solutions developed both locally and globally.
Global economies are experiencing a period of rapid change that has arguably not been seen before by anyone less than 80 years of age. Global drivers have been shifting from valuing the making of things to the flow of intellectual capital. This is the shift to an information economy which has most recently been dubbed “digital disruption”.
In just a few short years the digital economy has grown from insignificance to being something that politicians around the world need to pay attention to.
Unfortunately, most governments see the digital economy in terms of the loss of tax revenue from activities performed over the internet rather than understanding the extent of the need to recalibrate their economies to the new reality.
While tax loopholes are always worth pursuing, the real focus should be stepping up to the challenge facing every country around the world: how to keep adding value locally to protect the local economy and jobs. There is absolutely no point, for instance, in complaining about there being less tax on music streaming than on the manufacture, distribution, and sale of CDs. The value is just in a different place and most of it isn’t where it was.
Although the loss of bookstores and music retailers has been one of the most visible changes, the shift in spending has come at an incredible pace. Just ask any postal service or the newspaper industry. As citizens we are keen to take advantage of the advances of smartphones, the integration of supply chains with manufacturing in China (giving us very cheap goods) and the power of social media to share information with our friends. We are less keen when our kids lose their jobs as retail outlets close, manufacturing shuts down and the newsagent’s paper round gets shorter.
We all benefited dramatically as IT improved the efficiency and breadth of services that government and business was able to offer. Arguably the mass rollout of business IT was as important to productivity in the 1990s as the economic reforms of the 1980s. As a direct result there are now millions of people employed in IT around the world.
While this huge workforce has been responsible for so much, today it is not being applied enough to protect economies from leaking value offshore. Many companies regard technology as something they do just enough of to get on with their “real” businesses, even as their markets diminish. Even “old economy” businesses need to be encouraging their IT departments to innovate and apply for patents.
To protect their tax base, and future prosperity, each country has to search for productivity in new ways. That means combining innovations from IT that are outside today’s core and combining them with solutions developed by other organisations locally and internationally. It means looking beyond the core business and being prepared to spin-off activities at the edge that have real value in their own right. And it also means governments and large enterprises need to change the way that they procure so that they are seeding whole new economies.
Today when politicians and executives hear about IT projects they don’t think about the productivity gain, they just fear the inevitable delays and potential for bad press. That’s because large organisations have traditionally approached IT as a 1990s procurement problem rather than as an opportunity to seed a new market. A market that is desperately needed by small and medium enterprises who, while innovative, find it very hard to compete for big IT projects leaving many solutions being limited to less innovative and less efficient approaches. Every time this happens the local and global economy takes a small productivity hit which ultimately hurts us all.
Imagine a world of IT where government and large business don’t believe they have to own the systems that provide their citizens and customers with services. This is the economy that cloud computing is making possible, with panels of providers able to deliver everything from fishing licences to payroll for a transactional fee.
Payment for service reduces government and business involvement in the risky business of delivering large scale IT projects while at the same time providing a leg-up for local businesses to become world leaders using local jobs.
Government can have a major impact through policy settings in areas such as employee share schemes, R&D tax credits, and access to skilled labour. However, the biggest single impact governments can have in growing their digital economies and putting the huge IT workforce to productive work is through the choices they make in what they buy.
Business can have a major impact on productivity by managing cost in the short term through better integration with local and global providers, but repeating the benefits of the 1990s productivity improvements will require a willingness to invent new solutions using the most important tools of our generation: digital and information technology.
In 1668, the French philosopher and mathematician Edme Mariotte discovered what has come to be known as the “blind spot” in each one of our eyes: the region of the retina where the optic nerve connects it to the visual cortex. This region has no rods or cones, so the corresponding areas of our visual field are incapable of registering light.
While this blind spot is surprisingly large (imagine the diameter of the moon in the night sky — 17 moons could fit into your blind spots), its effects are minimal because the blind spots in each of our eyes do not overlap, and so the information from one eye fills in the information lacking in the other.
As the philosopher Daniel Dennett describes our blind spot, there are no centers of the visual cortex “responsible for receiving reports from this area, so when no reports arrive, there is no one to complain. An absence of information is not the same as information about an absence.”
Daragh O Brien, in his recent article The Value of Null: The Paradox of Metrics in Data Governance, wrote about the classic information governance challenge of misunderstanding the meaning of a null value in a report. In this particular case, it was a report of issues being tracked by the data governance metrics defined by one of his clients.
The root of the problem was that only one business unit was actually reporting issues, causing executive management to misinterpret the absence of data governance metrics reported by other business units as the absence of data governance issues in those business units. This was making the business unit that was actually doing a good job with data governance look bad simply because they were the only ones actually measuring and reporting their data governance progress.
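The distinction at the root of this problem can be sketched in a few lines of hypothetical reporting logic: a business unit that reports zero issues and one that reports nothing at all must be rendered differently, or executives will read silence as success. The unit names and counts below are invented for illustration.

```python
# Hypothetical data governance metrics: count of open data quality issues
# per business unit. None means "no metrics reported", which is not the
# same as 0, which means "measured, and no issues found".
reported_issues = {
    "Finance": 12,      # measuring and reporting: issues are visible
    "Marketing": 0,     # measured, and genuinely found no issues
    "Operations": None, # not measuring at all: a blind spot
}

def describe(unit, count):
    """Render a unit's metric, distinguishing absence of information
    from information about an absence."""
    if count is None:
        return f"{unit}: no metrics reported (governance blind spot)"
    return f"{unit}: {count} open issue(s)"

for unit, count in reported_issues.items():
    print(describe(unit, count))
```

A report built this way cannot quietly conflate the null set of metrics with a clean bill of health.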
“Until you define that there is a thing you will measure as an indicator of your governance performance,” O Brien explained, “then there is nothing being measured. So the fact that my client’s peers were not publishing any metrics came down to how you interpreted the null set of metrics being produced. Ultimately, the paradox of metrics in data quality and data governance is that the simple act of measuring sets you up for attack because people have historically not had visibility of these issues and the data makes organizations ask hard questions of themselves.”
In other words, insightful metrics reveal the blind spots in an organization’s field of vision.
Effective data quality and data governance metrics must provide insight into data that is aligned with how the business uses data to support a business process, accomplish a business objective, or make a business decision. This will prevent the organizational blindness caused when data quality and data governance are not properly measured within a business context and continually monitored.
So, when all is null on the metrics front, don’t assume that all is well behind the business lines.
Everybody has a plan until they get punched in the face.
Who should do what on a Big Data project?
It seems like a logical and even necessary question, right? After all, Big Data is a big deal, and requires assistance from each line of business, the top brass, and IT, right?
Matt Ariker, Tim McGuire, and Jesko Perry recently wrote an HBR post attempting to answer this question. In Five Roles You Need on Your Big Data Team, the three advocate five “important roles to staff your advanced analytics bureau”:
- Data Hygienists
- Data Explorers
- Business Solution Architects
- Data Scientists
- Campaign Experts
To be sure, everyone can’t and shouldn’t do everything in an era of Big Data. I can’t tell you for certain that bifurcating roles like the authors recommend won’t work. Still, I just don’t buy the argument that Big Data lends itself to everything fitting neatly into traditional roles.
Take data quality, for instance. As Jim Harris writes:
The quality of the data in the warehouse determines whether it’s considered a trusted source, but it faces a paradox similar to “which came first, the chicken or the egg?” Except for the data warehouse it’s “which comes first, delivery or quality?” However, since users can’t complain about the quality of data that hasn’t been delivered yet, delivery always comes first in data warehousing.
Agreed. Traditional data warehousing projects could be thought of in a more linear fashion. In most cases, organizations were attempting to aggregate, and report on, their own data (read: data internal to the enterprise). Once a new source was added, maintenance was fairly routine, at least compared to today’s datasets. These projects tended to be more predictable.
But what happens when much if not most relevant data stems from outside of the enterprise? What do we do when new data sources start popping up faster than ever? Mike Tyson’s quote at the top of this post has never been more apropos.
Simon Says: Big Data Is Not Predictable
My point is that IT projects have start and end dates. Amazon, Apple, Facebook, Twitter, Google, and other successful companies don’t view Big Data as “IT projects.” This is a potentially lethal mistake. For its part, Netflix views both Big Data and data visualization as ongoing processes; they are never finished. I make the same point in my last book.
When you start thinking of Big Data as an initiative or project with traditionally defined roles, you’re on the road to failure. Don’t make “data hygienics” or “data exploring” the sole purview of a group, department, or individual. Encourage others to step out of their comfort zones, notice things, test hypotheses, and act upon them.
What say you?
Many organizations are wrapping their enterprise brain around the challenges of business intelligence, looking for the best ways to analyze, present, and deliver information to business users. More organizations are choosing to do so by pushing business decisions down in order to build a bottom-up foundation.
However, one question coming up more frequently in the era of big data is what should be the division of labor between computers and humans?
In his book Emergence: The Connected Lives of Ants, Brains, Cities, and Software, Steven Johnson discussed how the neurons in our human brains are only capable of two hundred calculations per second, whereas the processors in computers can perform millions of calculations per second.
This is why we should let the computers do the heavy lifting for anything that requires math skills, especially the statistical heavy lifting required by big data analytics. “But unlike most computers,” Johnson explained, “the brain is a massively parallel system, with 100 billion neurons all working away at the same time. That parallelism allows the brain to perform amazing feats of pattern recognition, feats that continue to confound computers—such as remembering faces or creating metaphors.”
As the futurist Ray Kurzweil has written, “humans are far more skilled at recognizing patterns than in thinking through logical combinations, so we rely on this aptitude for almost all of our mental processes. Indeed, pattern recognition comprises the bulk of our neural circuitry. These faculties make up for the extremely slow speed of human neurons.”
“Genuinely cognizant machines,” Johnson explained, “are still on the distant technological horizon, and there’s plenty of reason to suspect they may never arrive. But the problem with the debate over machine learning and intelligence is that it has too readily been divided between the mindless software of today and the sentient code of the near future.”
But even if increasingly more intelligent machines “never become self-aware in any way that resembles human self-awareness, that doesn’t mean they aren’t capable of learning. An adaptive information network capable of complex pattern recognition could prove to be one of the most important inventions in all of human history. Who cares if it never actually learns how to think for itself?”
Business intelligence in the era of big data and beyond will best be served if we let both the computers and the humans play to their strengths. Let’s let the computers calculate and the humans cogitate.
On October 28, 2012, the Oklahoma City Thunder traded star sixth-man James Harden to the Houston Rockets. The move was not entirely unexpected, as the team had been unable to work out a long-term extension with Harden. Fans were disappointed, as this trade broke up the young core of the Western Conference champions. (Harden was looking for a max contract and the Thunder had two max players signed long-term already.*)
While the move itself wasn’t entirely unexpected, the data behind the move was even more surprising.
Rockets’ GM Daryl Morey comes from the Moneyball school of sports management. That is, all else equal, it’s better to make decisions based upon data than gut instinct. To this end, Morey had long coveted Harden, an incredibly efficient player.
As the following chart from HotShotCharts demonstrates, Harden naturally navigates to places on the floor that lend themselves to high expected values.
You can noodle for days on the HSC site, looking at visual data from different teams, players, and arenas. For his part, Harden generally takes shorter three-pointers and layups. (See the red dots above.) He avoids long two-pointers because they have lower expected values. Note the low shot counts inside the arc but outside of the paint.
What’s more, field goal percentage (FG%) is a better gauge of player effectiveness than raw point totals. Players like Kobe Bryant, Allen Iverson, and Carmelo Anthony score a bunch of points, but they typically take far too many shots. (Even I would score ten points per game if you gave me enough shots, and I’m not very good at hoops.)
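The expected-value logic behind Harden's shot selection is simple arithmetic: points per attempt equal the make probability times the shot's point value. The percentages below are illustrative, roughly league-typical figures, not actual data from the chart.

```python
# Expected points per shot attempt = make probability x point value.
# The percentages here are illustrative, not actual shot-chart data.

def expected_points(fg_pct, point_value):
    """Expected points scored per attempt for a given shot type."""
    return fg_pct * point_value

shot_types = {
    "layup":            (0.60, 2),
    "long two-pointer": (0.40, 2),
    "corner three":     (0.39, 3),
}

for name, (fg_pct, points) in shot_types.items():
    print(f"{name}: {expected_points(fg_pct, points):.2f} expected points per attempt")
```

Even at a lower make rate, the corner three beats the long two-pointer, which is exactly why efficient players avoid the space inside the arc but outside the paint.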
Data is permeating every facet of business and, I’d argue, life. While not a complete substitute for common sense, we are seeing dataviz tools crystallize differences among companies, products, and even NBA players.
Relying exclusively on old standbys like Microsoft Excel leaves money on the table. Why not look at different ways to view your data? You may well be surprised at what you find.
What say you?
* The Thunder offered Harden $55.5 million over four years–$4.5 million less than the max deal Harden coveted and will get from the Rockets, sources told ESPN The Magazine.
“Who, what, and where, by what helpe, and by whose:
Why, how, and when, doe many things disclose.”
- Thomas Wilson, The Arte of Rhetorique, 1560
Our human proclivity for story-telling as the primary method we have to communicate our culture — its values, tales, foibles and limitless possibilities — seems to have been ignored, if not forgotten, by the standards we have for digital communications. Rather than build systems that promote human understanding about and appreciation for the complexity of human endeavor, we have built instead a somewhat technocratic structure to transmit ‘facts’ about ‘things’. A consequence of this mono-maniacal focus on ‘facts’ is that we risk the loss of humanity in the thicket of numbers and pseudo-semantic string values we pass around among ourselves.
Let’s instead look at a way to structure our communications that relies on time-tested concepts of ‘best practice’ story-telling. It may seem odd to talk of ‘best practices’ and story-telling in the same breath, though indeed we’re all familiar with these ideas.
The “5 Ws” are a good place to start. These are five (or six, or sometimes more) basic questions whose answers are essential to effective story-telling, often taught to students of journalism, scientific research, and police forensics. Stories which neglect to relay the entirety of the “Who-What-When-Where-Why” (the five Ws) of a topic may too easily leave listeners ‘semi-ignorant’ about the topic. In a court of law, for instance, trials in which method (how), motive (why), and opportunity (when) are not fully explicated and validated rightly end with verdicts of ‘not guilty’.
The same approach — providing the ‘full story’ about a topic — should undergird our methods of digital communications as well. Perhaps by doing so, much of the current debate about “context” for/of any particular semantic (RDF) dataset might be more tractable and resolvable.
The practical significance of a “5 Ws” approach is that it can directly provide a useful metric about RDF datasets. A low score suggests the dataset makes small effort to give the ‘full story’ about its set of topics, while a high score would indicate good conformance to the precepts of best practice communications. In the real world, for instance, a threshold for this metric could be specified in contracts which envision information being exchanged between the parties.
While a high score of course wouldn’t attest to the reliability or consistency of each answer to the “5 Ws” for a given topical datastream, a low score indicates that the ‘speaker’ is merely spouting facts (that is, the RDF approach, which is to “say anything about anything”), best used to accentuate one’s own story but not useful as a complete recounting in its own right.
A “best practice communications” metric might be formulated by examining the nature of the property values associated with a resource. If entities are each a subclass of a 5W class, then it can be a matter of “provisionally checking the box” to the extent that some answer exists: a 100% score might indicate the information about a given topic is nominally complete while a 0% score indicates that merely a reference to the existence of the resource has been provided.
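One hypothetical way to formulate such a metric is to bucket a resource's property values under the five W classes and score the fraction of buckets with at least one answer. The predicate-to-W mapping and the sample triples below are invented for illustration; a real model would derive the mapping from the ontology.

```python
# A sketch of a "5 Ws" completeness score for a resource described by
# (predicate, object) pairs. The mapping of predicates to W-classes is
# hypothetical; a real implementation would derive it from the ontology.
FIVE_WS = {"who", "what", "when", "where", "why"}

PREDICATE_TO_W = {
    "author": "who",
    "title": "what",
    "published": "when",
    "location": "where",
    "purpose": "why",
}

def five_w_score(properties):
    """Fraction of the five Ws answered by at least one property value."""
    answered = {PREDICATE_TO_W[p] for p, _ in properties if p in PREDICATE_TO_W}
    return len(answered & FIVE_WS) / len(FIVE_WS)

story = [("author", "Thomas Wilson"),
         ("title", "The Arte of Rhetorique"),
         ("published", "1560")]
print(five_w_score(story))  # 3 of 5 Ws answered -> 0.6
```

A score of 1.0 would mark the information about a topic as nominally complete, while a score near 0.0 marks a bare reference to the resource's existence, matching the "provisionally checking the box" idea above.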
Viewing semantic datastreams each as a highly formalized story (or set of stories), then applying quality criteria developed by professional communicators as long ago as Cicero, can provide valuable insights when building high quality data models and transmitting high quality datastreams.
Like many, I’m one who’s been around since the cinder block days, once entranced by shiny Tektronix tubes stationed near a dusty card sorter. After years using languages as varied as Assembler through Scheme, I’ve come to believe the shift these represented, from procedural to declarative, has greatly improved the flexibility of the software organizations produce.
Interest has now moved towards an equally flexible representation of data. In the ‘old’ days, when an organization wanted to collect a new data-item about, say, a Person, a new column would first be added by a friendly database administrator to a Person Table in one’s relational database. Very inflexible.
The alternative — now widely adopted — reduces databases to a simple formulation, one that eliminates Person and other entity-specific tables altogether. These “triple-stores” basically have just three columns — Subject, Predicate and Object — in which all data is stored. Triple-stores are often called ‘self-referential’ because first, the type of a Subject of any row in a triple-store is found in a different row (not column) in the triple-store and second, definitions of types are found in different rows of the triple-store. The benefits? Not only is the underlying structure of a triple-store unchanging, but also stand-alone metadata tables (tables describing tables) are unnecessary.
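The flexibility claim can be illustrated with a toy triple-store: collecting a new data-item about a Person is just another row, with no schema change. This is a minimal sketch, not any particular triple-store product; the subject identifiers and predicate names are invented.

```python
# A toy triple-store: every fact is one (subject, predicate, object) row.
# Note that person:1's type lives in a row, not in a table definition.
triples = [
    ("person:1", "rdf:type", "Person"),
    ("person:1", "name", "Ada"),
]

# Collecting a new data-item about a Person needs no ALTER TABLE;
# it is simply one more row appended to the same three-column store.
triples.append(("person:1", "preferredContactChannel", "email"))

def properties_of(subject):
    """Return all (predicate, object) pairs describing a subject."""
    return [(p, o) for s, p, o in triples if s == subject]

print(properties_of("person:1"))
```

Contrast this with the relational approach, where the new data-item would have required a database administrator to alter the Person Table before any value could be stored.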
Why the change? Static relational database tables do work well enough to handle transactional records whose data items are usually well-known in advance; the rate of change in those business processes is fairly low, so the cost of database architectures based on SQL tables is equally low. What, then, is driving the adoption of triple-stores?
The scope of business functions organizations seek to automate has enlarged considerably: the source of new information within an organization is less frequently “forms” completed by users and more frequently raw text from documents, tweets, blogs, emails, newsfeeds, and other ‘social’ web and internal sources, which have been produced, received, and/or retrieved by organizations.
Semantic technologies are essential components of Natural Language Processing (NLP) applications which extract and convert, for instance, all proper nouns within a text into harvestable networks of “information nodes” found in a triple-store. In fact during such harvesting, context becomes a crucial variable that can change with each sentence analyzed from the text.
Bringing us to my primary distinction between really semantic and non-semantic applications: really semantic applications mimic a human conversation, where the knowledge of an individual in a conversation is the result of a continuous accrual of context-specific facts, context-specific definitions, even context-specific contexts. As a direct analogy, Wittgenstein, a modern giant of philosophy, calls this phenomenon Language Games, to connote that one’s techniques and strategies for analyzing a game’s state, and one’s actions, are not derivable in advance — they come only during the play of the game, i.e., during processing of the text corpora.
Non-semantic applications on the other hand, are more similar to rites, where all operative dialogs are pre-written, memorized, and repeated endlessly.
This analogy to human conversations (to ‘dynamic semantics’) is hardly trivial; it is a dominant modelling technique among ontologists as evidenced by development of, for instance, Discourse Representation Theory (among others, e.g., legal communities have a similar theory, simply called Argumentation) whose rules are used to build Discourse Representation Structures from a stream of sentences that accommodate a variety of linguistic issues including plurals, tense, aspect, generalized quantifiers, anaphora and others.
“Semantic models” are an important path towards a more complete understanding of how humans, when armed with language, are able to reason and draw conclusions about the world. Relational tables, however, in themselves haven’t provided similar insight or re-purposing in different contexts. This fact alone is strong evidence that semantic methods and tools must be prominent in any organization’s technology plans.
I recently hosted a well-attended webinar on Big Data in the public sector. It went reasonably well enough and, at the end, I answered some questions from inquisitive listeners.
Now, most were fairly standard queries. I sensed that there was a little skepticism about the power of Big Data among attendees. To be sure, there’s no shortage of hype these days. I also received the completely expected question, “How can you determine the value of Big Data? How do I calculate its ROI?”
I’ve ranted on this topic before, and I just don’t agree with those who won’t move before they have precisely quantified the ROI of Big Data. At best, these are SWAGs. At worst, they are biased calculations driven by vested consulting firms and vendors trying to hawk their wares.
Replicating the Old in the New
With any new technology or application, employees and enterprises often seem to fall into the same traps. We are creatures of habit, after all. Over my career, I have repeatedly seen organizations deploy new technologies and make similar mistakes. Among the worst: employees simply replicated what they were doing before in the old system or reporting application. It became old hat: I would often just help people recreate their previous standard reports in the new system.
Many people just didn’t care about the new system’s enhanced functionality. Put simply, these people just didn’t want to learn.
Again, I understand this mind-set, but I don’t agree with it. When you’re doing what you just did before, you squander massive and unprecedented opportunities. You fail to explore and discover new insights. In these types of organizations, I’d wager that pre-implementation ROI calculations exceed real-world results.
On occasion, however, I have worked with people curious about the enhanced functionality of the new system or reporting tool. They didn’t just want to set up old standard reports. They wanted to learn and explore new things. What could they do now that they couldn’t do before? In cases like these, I would “take the over” on any ROI estimate.
First, don’t think of Big Data as “another application.” It’s not.
Second, realize that ROI calculations are often imprecise at best. What’s more, as books like The Halo Effect: … and the Eight Other Business Delusions That Deceive Managers manifest, the world doesn’t stand still. How can any ROI model account for what may happen when many of the unknowns are unknown?
Either way, whether you get Big Data or you don’t, you can twist ROI calculations to prove your point.
What say you?