Archive for the ‘Information Development’ Category
A distributed approach to Data Quality may result in better data outcomes
A recent news article on Information-Management.com suggested a link between inaccurate data and “lack of a centralized approach.”
But I’m not sure that “lack of centralization” is the underlying issue here; I’d suggest the challenge generally comes down to “lack of a structured approach”. And as I covered in my blog post “To Centralise or not to Centralise, that is the Question”, there are organisational cultures that respond poorly (or not at all) to a centralised approach to data governance.
When you then extend this to the more operational delivery processes of Data Quality Management, I’d go so far as to suggest that a distributed and end-user oriented approach to managing data quality is actually desirable, for several reasons:
* Many organisations just haven’t given data quality due consideration, and the impacts of poor data quality can be significant, but often hidden.
* Empowering users to think about and act upon data challenges can become a catalyst for a more structured, enterprise-wide approach.
* By managing data quality issues locally, knowledge and expertise are maintained as close to the point of use as possible.
* In environments where funding is hard to come by, or where there isn’t the appetite to establish critical mass for data quality activity, progress can still be made and value can still be delivered.
I also observe two trends in business, consistent across the twenty-plus years that I’ve been working, which are making centralised delivery of data outcomes ever more difficult:
1) Human activity has become more and more complex. We’re living in a mobile, connected, graphical, multi-tasking, object-oriented, cloud-serviced world, and the rate at which we’re collecting data shows no sign of abating. It may well be that our data just isn’t “controllable” in the classic sense any more, and that what’s really needed is mindfulness. (I examined this in my post “Opening Pandora’s Box”)
2) Left to their own devices, business systems and processes will tend to decay towards a chaotic state over time, and it is management’s role to keep injecting focus and energy into the organisation. If this effort can be spread broadly across the organisation, then there is an overall cultural change towards better data. (I covered aspects of this in my post “Business Entropy – Bringing Order to the Chaos“)
Add to this the long-standing preoccupation that management consultants have with mapping “Business Process” rather than mapping “Business Data”, and you end up in a situation where data does not get nearly enough attention. (And once attention IS paid, the skills and capabilities to do something about it are often lacking.)
Change the culture, change the result – that doesn’t require centralisation to make it happen.
A few weeks ago, while reading about the winners at the 56th Annual Grammy Awards, I saw that Daft Punk won both Record of the Year and Album of the Year, which made me wonder what the difference is between a record and an album.
Then I read that Record of the Year is awarded to the performer and the production team of a single song. While Daft Punk won Record of the Year for their song “Get Lucky”, the song was not lucky enough to win Song of the Year (that award went to Lorde for her song “Royals”).
My confusion about the semantics of the Grammy Awards prompted a quick trip to Wikipedia, where I learned that Record of the Year is awarded for either a single or individual track from an album. This award goes to the performer and the production team for that one song. In this context, record means a particular recorded song, not its composition or an album of songs.
Although Song of the Year is also awarded for a single or individual track from an album, the recipient of this award is the songwriter who wrote the lyrics to the song. In this context, song means the song as composed, not its recording.
The Least Ambiguous Award goes to Album of the Year, which is indeed awarded for a whole album. This award goes to the performer and the production team for that album. In this context, album means a recorded collection of songs, not the individual songs or their compositions.
These distinctions, and the confusion they caused me, seemed eerily reminiscent of the challenges that arise within organizations when data is ambiguously defined. For example, terms like customer and revenue are frequently used without definition or context. When data definitions are ambiguous, it can easily lead to incorrect uses of data, as well as confusing references to data during business discussions.
Not only is it difficult to reach consensus on data definitions, but definitions also change over time. For example, Record of the Year used to be awarded to only the performer, not the production team. And the definition of who exactly counts as a member of the production team has changed four times over the years, most recently in 2013.
Avoiding semantic inconsistencies, such as the difference between a baker and a Baker, is an important aspect of metadata management. Be diligent with your data definitions and avoid daft definitions for sound semantics.
In late December of 2013, Google Chairman Eric Schmidt admitted that ignoring social networking had been a big mistake. “I guess, in our defense, we were busy working on many other things, but we should have been in that area and I take responsibility for that,” he said.
Brass tacks: Google’s misstep and Facebook’s opportunistic land grab of social media have resulted in a striking data chasm between the two behemoths. As a result, Facebook can do something that Google just can’t.
To his credit, Mark Zuckerberg has not been complacent. In an age of ephemera, he is building upon his company’s lead in social data. Case in point: the launch of Graph Search.
The rationale here is pretty straightforward: Why let Google catch up? With Graph Search, Facebook users can determine which of their friends have gone to a Mexican restaurant in the last six months in San Francisco. What about which friends like the Rolling Stones or The Beatles? (Need to resell a ticket? Why use StubHub here? Maybe Facebook gets a cut of the transaction?) These are questions and problems that Google can’t address but Facebook can.
All good for Zuck et al., right? Not really. It turns out that delivering relevant social data in a timely manner is proving remarkably elusive, even for the smart cookies at Facebook.
The New News Feeds
As Wired reported in May of 2013, Facebook “redesigned its News Feed with bolder images and special sections for friends, photos, and music, saying the activity stream will become more like a ‘personalized newspaper’ that fits better with people’s mobile lifestyles.” Of course, many users didn’t like the move, but that’s par for the course these days. You’re never going to make 1.2 billion users happy.
But Facebook quickly realized that it didn’t get the relaunch of News Feed right. Not even close. Just a few weeks before Schmidt’s revealing quote, Business Insider reported that Facebook was at it again, making major tweaks to its feed and then halting its new launch. This problem has no simple solution.
Simon Says: Big Data Is a Full-Time Job
Big Data is no picnic. “Managing” it isn’t easy, even for billion-dollar companies such as Facebook. The days of “set it and forget it” have long passed. Organizations need to be constantly monitoring the effectiveness of their data-driven products and services, to say nothing of testing for security issues. (Can someone say Target?)
What say you?
Big Data Solution Offering
Have you seen the latest offering in our Composite Solutions suite?
The Big Data Solution Offering provides an approach for storing, managing and accessing data of very high volumes, variety or complexity.
Storing large volumes of data from a wide variety of sources in traditional relational data stores can be cost-prohibitive, and conventional data modeling approaches and statistical tools struggle with data structures of such high complexity. This solution offering discusses new types of data management systems built on NoSQL database management systems, with MapReduce as the typical programming model and access method.
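For readers new to the model, here is a minimal, illustrative sketch of the MapReduce pattern in Python (a toy word count over invented data, not code from the offering itself): the map phase emits key-value pairs from each input record, and the reduce phase aggregates the values that share a key.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit one (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Reduce: sum the values for each distinct key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

records = ["big data big volumes", "big variety high complexity"]
print(reduce_phase(map_phase(records)))
# {'big': 3, 'data': 1, 'volumes': 1, 'variety': 1, 'high': 1, 'complexity': 1}
```

Real MapReduce frameworks such as Hadoop run these same two phases in parallel across many machines, which is what lets the model cope with very high volumes.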
Read our Executive Summary for an overview of this solution.
We hope you find this new offering of benefit and welcome any suggestions you may have to improve it.
This Week’s Blogs for Thought:
Your data is in the cloud when…
It’s fashionable to be able to claim that you’ve moved everything from your email to your enterprise applications “into the cloud”. But what about your data? Just because information is stored over the Internet, it shouldn’t necessarily qualify as being “in the cloud.”
The Top of the Data Quality Bell Curve
“Information is the value associated with data,” William McKnight explains in his book Information Management: Strategies for Gaining a Competitive Advantage with Data. “Information is data under management that can be utilized by the company to achieve goals.” Does that data have to be perfect in order to realize its value and enable the company to achieve its goals? McKnight says no.
Data quality, according to McKnight, “is the absence of intolerable defects.”
Business Entropy: Bringing order to the chaos
In his recent post, Jim Harris drew an analogy between the interactions of atomic particles, sub-atomic forces and the working of successful collaborative teams, and coined the term ego-repulsive force. Jim’s post put me in mind of another aspect of physics that I think has parallels with our business world – the Second Law of Thermodynamics, and the concept of entropy.
In thermodynamics, “entropy” describes a measure of the number of ways a closed system can be arranged; such systems spontaneously evolve towards a state of equilibrium, which is the state of maximum entropy (and therefore, maximum disorder). I observe that this mechanism also holds true for the workings of organisations – and there is a law of Business Entropy at work.
I love Google, and in a pretty unhealthy way. In my third book, The New Small, there are oodles of references to Google products and services. I use Google on a daily basis for all sorts of different things, including e-mail, document sharing, phone calls, calendars, and Hangouts.
And one more little thing: search. I can’t imagine ever “Binging” something; at least in the U.S., most people don’t either.
Yet, there are limitations to Google and, in this post, I am going to discuss one of the main ones.
A few years ago, I worked on a project doing some data migration. I supported one line of business (LOB) for my client while another consultant (let’s call him Mark) supported a separate LOB. Mark and I worked primarily with Microsoft Access. The organization ultimately wanted to move toward an enterprise-grade database, in all likelihood SQL Server.
Relying Too Much on Google
Mark was a nice guy. At the risk of being immodest, though, his Access and data chops weren’t quite on my level. He’d sometimes ask me how to do relatively basic things, such as removing duplicates. (Answer: SELECT DISTINCT.) When he had more difficult questions, I would look at his queries and see things that just didn’t make a whole lot of sense. For example, he’d try to write one massive query that did everything, rather than breaking it up into individual parts.
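To make those two points concrete, here is a small sketch using Python’s built-in sqlite3 module (the table and column names are invented for illustration): SELECT DISTINCT removes the duplicates, and a common table expression splits one “massive query” into named, individually testable steps. (Access itself lacks CTEs; the equivalent there is building saved queries on top of each other.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (name TEXT, city TEXT);
    INSERT INTO contacts VALUES
        ('Ann', 'Leeds'), ('Ann', 'Leeds'),  -- duplicate row
        ('Bob', 'York'), ('Cat', 'Leeds');
""")

# Removing duplicates: the question Mark asked
print(conn.execute("SELECT DISTINCT name, city FROM contacts").fetchall())
# three unique rows

# One query built from named parts instead of a single tangle
query = """
    WITH deduped AS (
        SELECT DISTINCT name, city FROM contacts
    ),
    by_city AS (
        SELECT city, COUNT(*) AS n FROM deduped GROUP BY city
    )
    SELECT city, n FROM by_city ORDER BY n DESC;
"""
print(conn.execute(query).fetchall())  # [('Leeds', 2), ('York', 1)]
```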
Now, I am very aware that development methodologies vary and there’s no “right” one. Potato/pot-ah-to, right? Also, I didn’t mind helping Mark – not at all. I’ll happily share knowledge, especially when I’m not pressed with something urgent.
Mark did worry me, though, when I asked him if he knew SQL Server better than MS Access. “No,” he replied. “I’ll just Google whatever I need.”
For doing research and looking up individual facts, Google rocks. Finding examples of formulas or SQL statements isn’t terribly difficult either. But one does not learn to use a robust tool like SQL Server or even Access by merely using a search engine. You don’t design an enterprise system via Google search results. You don’t build a data model, one search at a time. These things require a much more profound understanding of the process.
In other words, there’s just no replacement for reading books, playing with applications, taking courses, understanding higher-level concepts rather than just workarounds, and gaining overall experience.
You don’t figure out how to play golf while on the course. You go to the practice range. I’d hate to go to a foreign country without being able to speak the language–or accompanied by someone who can. Yes, I could order dinner with a dictionary, but what if a doctor asked me in Italian where the pain was coming from?
What say you?
In his recent post, Jim Harris drew an analogy between the interactions of atomic particles, sub-atomic forces and the working of successful collaborative teams, and coined the term ego-repulsive force.
Jim’s post put me in mind of another aspect of physics that I think has parallels with our business world – the Second Law of Thermodynamics, and the concept of entropy.
In thermodynamics, “entropy” describes a measure of the number of ways a closed system can be arranged; such systems spontaneously evolve towards a state of equilibrium, which is the state of maximum entropy (and therefore, maximum disorder).
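For the physics-minded, Boltzmann’s formulation captures this in one line (a standard textbook aside, included here only for context):

```latex
% Boltzmann entropy: the more arrangements W, the higher the entropy S
S = k_B \ln W
```

Here W is the number of ways the system can be arranged and k_B is Boltzmann’s constant; equilibrium maximises W, and with it the entropy S.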
I observe that this mechanism also holds true for the workings of organisations – and there is a law of Business Entropy at work.
It’s my contention that, just like any closed system a physicist could describe, business organisations will tend to decay towards a chaotic state over time. Left to their own devices, people will do their own thing, suit themselves, and interact only intermittently and randomly, at a level just sufficient to meet their own needs. Business entropy continues to act until a state of disordered equilibrium is reached, at which point random events may occur but effectively have no impact on the overall business (almost the business equivalent of Brownian Motion).
The role of management, then, is to introduce additional energy into the closed system of the Brownian organisation, whether through raw enthusiasm, new ideas, encouragement, or changes of processes. And management only serves a purpose when it is acting to add energy to the system – when that stops, then the law of Business Entropy will kick in and the organisation will begin to decay over time.
This management energy applies forces that disrupt the chaotic state and create dynamic new motion (innovation), build momentum (business change), or maintain the bonds between particles (organisational structure and process). If the right forces are applied at the right time and in the right way, then the business can be moved to a new desired state.
(As an example, when Lou Gerstner joined IBM in the early 1990s, he turned around the fortunes of a rapidly failing business by introducing an approach of continuous renewal, innovation and re-invention which is still prevalent today.)
Of course, if too much energy is added, the system may become unstable and can either collapse or explode. Too little, and the natural inertia within the organisation cannot be overcome. (Consider Ron Johnson’s failed leadership and change of direction at J.C. Penney; meanwhile, time will tell whether Kodak can survive its endemic complacency and too-cautious response to digital imaging.)
What is the state of Business Entropy in your organisation? Are you moving with momentum, with management applying the right forces and energy to achieve the desired state? Or are you in a Brownian Business?
And more to the point, what are you going to do about it?!
It’s fashionable to be able to claim that you’ve moved everything from your email to your enterprise applications “into the cloud”. But what about your data? Just because information is stored over the Internet, it shouldn’t necessarily qualify as being “in the cloud”.
New cloud solutions are appearing at an incredible rate. From productivity tools to consumer applications, the innovations are staggering. However, there is a world of difference between the ways the underlying data is being managed.
The best services treat the application separately from the data that supports it, making the content easily available from outside the application. Unfortunately, there is still a long way to go before the aspirations of information-driven businesses can be met by the majority of cloud services, which continue to lock away the content and keep the underlying models close to their chest.
An illustrative example of best practice is a simple drawing solution, Draw.io. Draw.io is a serious threat to products that support the development of diagrams of many kinds. It avoids any ownership of the diagrams by reading and saving XML, leaving it to the user to decide where to put the content, while making it particularly easy to integrate with your Google Drive or Dropbox account – keeping the content both in the cloud and under your control. This separation is becoming much more common, with cloud providers likely to bifurcate between the application and data layers.
You can see Draw.io in action as part of a new solution for entity-relationship diagrams in the tools section of www.infodrivenbusiness.com.
Offering increasing sophistication in data storage are the fully integrated solutions such as Workday, Salesforce.com, and the cloud offerings of traditional enterprise software companies such as Oracle and SAP. These vendors are realising that they need to work seamlessly with other enterprise solutions, either directly or through third-party integration tools.
Also important to watch are the offerings from Microsoft, Apple and Google which provide application logic as well as facilitating third-party access to cloud storage, but lead you strongly (and sometimes exclusively) towards their own products.
There are five rules I propose for putting data in the cloud:
1. Everyone should be able to collaborate on the content at the same time
To be in the cloud, it isn’t enough to back up the data on your hard disk drive to an Internet server. While obvious, this is a challenge to solutions that claim to offer cloud but have simply moved existing file and database storage to a remote location. Many cloud providers are now offering APIs that make it easy for application developers to offer solutions with collaboration built in.
2. Data and logic are separated
Just like the rise of client/server architectures in the 1990s, cloud solutions are increasingly separating the tiers of their architecture. This is where published models and the ability to store content in any location are a real advantage. Ideally, the data can be moved as needed, providing an added degree of flexibility and the ability to comply with different jurisdictional requirements.
3. The data is available to other applications regardless of vendor
Applications shouldn’t be black boxes. The trend towards separating the data from the business logic leads inexorably towards open access to the data by different cloud services. Market forces are also leading towards open APIs and even published models.
4. The data is secure
The content not only needs to be secure, but it also needs to be seen to be secure. Ideally it is only visible to the owner of the content and not the cloud application or storage vendor. This is where those vendors offering solutions that separate application logic and storage have an advantage given that much of the security is in the control of the buyer of the service.
5. The data remains yours
I’ve written about data ownership before (see You should own your own data). This is just as important regardless of whether the cloud solution is supporting a consumer, a business or government.
Over the course of my career, I have written more reports than I can count. I’ve created myriad dashboards, databases, SQL queries, ETL tools, neat Microsoft Excel VBA, scripts, routines, and other ways to pull and massage data.
In a way, I am Big Data.
This doesn’t make me special. It just makes me a seasoned data-management professional. If you’re reading this post, odds are that the list above resonates with you.
Three Problems with Creating Excessive Reports
As an experienced report writer, I don’t find it terribly hard to pull data from database tables, external sources, and the web. There’s no shortage of forums, bulletin boards, wikis, websites, and communities devoted to the most esoteric of data- and report-related concerns. Google is a wonderful thing.
I’ve made a great deal of money in my career by doing as I was told. That is, a client would need me to create ten reports and I would dutifully create them. Sometimes, though, I would sense that ten weren’t really needed. I would then ask if any reports could be combined. What if I could build only six or eight reports to give that client the same information? What if I could write a single report with multiple output options?
There are three main problems with creating an excessive number of discrete reports. First, it encourages a rigid mode of thinking, as in: “I’ll only see it if it’s on the XYZ report.” For instance, Betty in Accounts Receivable runs a past-due report to find vendors who are more than 60 days late with their payments. While this report may be helpful, it will fail to include any data that does not meet its predefined criteria. Perhaps her employer is also concerned about invoices from particularly shady vendors that are only 30 days past due (see the sketch after the third problem below).
Second, there’s usually a great deal of overlap. Organizations with hundreds of standard reports typically use multiple versions of the same report. If you ran a “metareport”, I’d bet that some duplicates would appear. In and of itself, this isn’t a huge problem. But often a single database change means effectively modifying the same report multiple times.
Third, and most important these days, the reliance upon standard reports inhibits data discovery.
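On the first problem, one remedy is to replace the rigid, hard-coded report with a parameterized one. Here is a minimal sketch in Python with sqlite3; the invoices table, its columns, and the 30-day watch-list rule are all invented for illustration:

```python
import sqlite3

def past_due_report(conn, min_days_late=60, watch_list=()):
    """One flexible report instead of a new hard-coded report per criterion."""
    sql = "SELECT vendor, invoice_id, days_late FROM invoices WHERE days_late >= ?"
    params = [min_days_late]
    if watch_list:  # tighter 30-day cutoff for vendors being watched
        marks = ",".join("?" * len(watch_list))
        sql += f" OR (vendor IN ({marks}) AND days_late >= 30)"
        params += list(watch_list)
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (vendor TEXT, invoice_id INT, days_late INT);
    INSERT INTO invoices VALUES
        ('Acme', 1, 75), ('Shady Co', 2, 35), ('Acme', 3, 10);
""")
print(past_due_report(conn))                            # Betty's usual 60-day view
print(past_due_report(conn, watch_list=('Shady Co',)))  # also surfaces the 35-day invoice
```

Betty still gets her 60-day report by default, but the shady-vendor case no longer needs a separate report of its own.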
Look, standard reports aren’t going anywhere. Simple lists and financial statements are invaluable for millions of organizations.
At the same time, though, one massive report for everything is less than ideal. Ditto for a “master” set of reports. These days, true data discovery tools like Tableau increase the odds of finding needles in haystacks.
Why not add interactivity to basic reports to allow non-technical personnel to do more with the same tools?
What say you?
Big Data requires million-dollar investments.
Nonsense. That notion is just plain wrong. Long gone are the days in which organizations needed to purchase expensive hardware and software, hire consultants, and then start to use it all three years later. Sure, you can still go on-premise, but for many companies cloud computing, open-source tools like Hadoop, and SaaS have changed the game.
But let’s drill down a bit. How can an organization get going with Big Data quickly and inexpensively? The short answer is, of course, that it depends. But here are three trends and technologies driving the diverse state of Big Data adoption.
Crowdsourcing and Gamification
Consider Kaggle. Founded in April 2010 by Anthony Goldbloom and Jeremy Howard, the company seeks to make data science a sport, and an affordable one at that. Kaggle is equal parts crowdsourcing company, social network, wiki, gamification site, and job board (like Monster or Dice).
Kaggle is a mesmerizing amalgam of a company, one that in many ways defies business convention. Anyone can post a data project by selecting an industry, type (public or private), type of participation (team or individual), reward amount, and timetable. Kaggle lets you easily put data scientists to work for you, and renting data scientists is much less expensive than buying them.
Open Source Applications
But that’s just one way to do Big Data in a relatively inexpensive manner – at least compared to building everything from scratch and hiring a slew of data scientists. As I wrote in Too Big to Ignore, digital advertising company Quantcast attacked Big Data in a very different way, forking the Hadoop file system. This required a much larger financial commitment than just running contests on Kaggle.
The common thread: Quantcast’s valuation is nowhere near that of Facebook, Twitter, et al. The company employs dozens of people, not thousands.
Finally, even large organizations with billion-dollar budgets can save a great deal of money on the Big Data front. Consider NASA, nowhere close to anyone’s definition of small. NASA embraces open innovation, running contests on Innocentive to find low-cost solutions to thorny data issues. NASA often offers prizes in the thousands of dollars, receiving suggestions and solutions from all over the globe.
I’ve said this many times. There’s no one “right” way to do Big Data. Budgets, current employee skills, timeframes, privacy and regulatory concerns, and other factors should drive an organization’s direction and choice of technologies.
What say you?
Few computing and technological achievements rival IBM’s Watson. Its impressive accomplishments to this point include a high-profile victory on Jeopardy! (the chess honours belong to Watson’s predecessor, Deep Blue).
Turns out that we ain’t seen nothin’ yet. Its next incarnation will be much more developer-friendly. From a recent GigaOM piece:
Developers who want to incorporate Watson’s ability to understand natural language and provide answers need only have their applications make a REST API call to IBM’s new Watson Developers Cloud. “It doesn’t require that you understand anything about machine learning other than the need to provide training data,” Rob High, IBM’s CTO for Watson, said in a recent interview about the new platform.
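As a rough sketch of what such a call might look like from Python: note that the endpoint URL, payload shape, and authentication scheme below are placeholders I have invented to show the pattern, not IBM’s published API.

```python
import requests  # third-party HTTP library: pip install requests

# Placeholder endpoint -- not IBM's actual Watson Developers Cloud URL.
WATSON_URL = "https://example.ibm.com/watson/v1/question"

def ask_watson(question, api_key):
    """POST a natural-language question and return the parsed JSON response."""
    response = requests.post(
        WATSON_URL,
        headers={"Authorization": f"Bearer {api_key}"},  # hypothetical auth scheme
        json={"question": {"questionText": question}},   # hypothetical payload shape
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

# Example (would require real credentials and the real endpoint):
# answer = ask_watson("Which treatment fits these symptoms?", api_key="YOUR_KEY")
```

The point of the quote stands either way: the developer supplies training data and questions over plain REST, with no machine-learning internals exposed.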
The rationale to embrace platform thinking is as follows: As impressive as Watson is, even an organization as large as IBM (with over 400,000 employees) does not hold a monopoly on smart people. Platforms and ecosystems can take Watson in myriad directions, many of which you and I can’t even anticipate. Innovation is externalized to some extent. (If you’re a developer curious to get started, knock yourself out.)
Continue reading the article and you’ll see that Watson 2.0 “ships” not only with an API, but also with an SDK, an app store, and a data marketplace. That is, the more data Watson has, the more it can learn. Can someone say network effect?
Think about it for a minute. A data marketplace? Really? Doesn’t information really want to be free?
Well, yes and no. There’s no dearth of open data on the Internet, a trend that shows no signs of abating. But let’s not overdo it. The success of Kaggle has shown that thousands of organizations are willing to pay handsomely for data that solves important business problems, especially if that data is timely, accurate, and aggregated well. As a result, data marketplaces are becoming increasingly important and profitable.
Simon Says: Embrace Data and Platform Thinking
The market for data is nothing short of vibrant. Big Data has arrived, but not all data is open, public, free, and usable.
Combine the explosion of data with platform thinking. It’s not just about the smart cookies who work for you. There’s no shortage of ways to embrace platforms and ecosystems, even if you’re a mature company. Don’t just look inside your organization’s walls for answers to vexing questions. Look outside. You just might be amazed at what you’ll find.
What say you?