Archive for January, 2014
I love Google, and in a pretty unhealthy way. In my third book, The New Small, there are oodles of references to Google products and services. I use Google on a daily basis for all sorts of different things, including e-mail, document sharing, phone calls, calendars, and Hangouts.
And one more little thing: search. I can’t imagine ever “Binging” something and, at least in the U.S., most people don’t either.
Yet, there are limitations to Google and, in this post, I am going to discuss one of the main ones.
A few years ago, I worked on a project doing some data migration. I supported one line of business (LOB) for my client while another consultant (let’s call him Mark) supported a separate LOB. Mark and I worked primarily with Microsoft Access. The organization ultimately wanted to move toward an enterprise-grade database, in all likelihood SQL Server.
Relying Too Much on Google
Mark was a nice guy. At the risk of being immodest, though, his Access and data chops weren’t quite on my level. He’d sometimes ask me questions about how to do some relatively basic things, such as removing duplicates. (Answer: SELECT DISTINCT.) When he had more difficult questions, I would look at his queries and see things that just didn’t make a whole lot of sense. For example, he’d try to write one massive query that did everything, rather than breaking it up into individual parts.
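To make those two points concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Access. The table, names, and data are invented for illustration. It removes duplicates with SELECT DISTINCT in one small step, then builds a second small query on top of it instead of writing one massive query.

```python
import sqlite3

# Hypothetical table and data, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ann", "Reno"), ("Ann", "Reno"), ("Bob", "Reno"), ("Cy", "Tulsa")],
)

# Step 1: remove duplicate rows with SELECT DISTINCT.
conn.execute(
    "CREATE TEMP VIEW unique_customers AS "
    "SELECT DISTINCT name, city FROM customers"
)

# Step 2: a separate, small query built on top of step 1.
rows = conn.execute(
    "SELECT city, COUNT(*) FROM unique_customers GROUP BY city ORDER BY city"
).fetchall()
print(rows)
```

In Access, the rough equivalent would be a saved query that a second query selects from. Each piece can then be checked on its own.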
Now, I am very aware that development methodologies vary and there’s no “right” one. Potato/pot-ah-to, right? Also, I didn’t mind helping Mark–not at all. I’ll happily share knowledge, especially when I’m not pressed with something urgent.
Mark did worry me, though, when I asked him if he knew SQL Server better than MS Access. “No,” he replied. “I’ll just Google whatever I need.”
For doing research and looking up individual facts, Google rocks. Finding examples of formulas or SQL statements isn’t terribly difficult either. But one does not learn to use a robust tool like SQL Server or even Access by merely using a search engine. You don’t design an enterprise system via Google search results. You don’t build a data model, one search at a time. These things require a much more profound understanding of the process.
In other words, there’s just no replacement for reading books, playing with applications, taking courses, understanding higher-level concepts rather than just workarounds, and gaining overall experience.
You don’t figure out how to play golf while on the course. You go to the practice range. I’d hate to go to a foreign country without being able to speak the language–or accompanied by someone who can. Yes, I could order dinner with a dictionary, but what if a doctor asked me in Italian where the pain was coming from?
What say you?
In his recent post, Jim Harris drew an analogy between the interactions of atomic particles, sub-atomic forces and the working of successful collaborative teams, and coined the term ego-repulsive force.
Jim’s post put me in mind of another aspect of physics that I think has parallels with our business world – the Second Law of Thermodynamics, and the concept of entropy.
In thermodynamics, “entropy” describes a measure of the number of ways a closed system can be arranged; such systems spontaneously evolve towards a state of equilibrium, which is the state of maximum entropy (and therefore, maximum disorder).
I observe that this mechanism also holds true for the workings of organisations – and there is a law of Business Entropy at work.
It’s my contention that, just like in any closed system that a physicist could describe, business organisations will tend to decay towards a chaotic state over time. Left to their own devices, people will do their own thing, suit themselves, and interact only intermittently and randomly to a level sufficient to meet their own needs. Business entropy continues to act until a state of disordered equilibrium is reached, at which point random events may occur but effectively have no impact on the overall business (almost the business equivalent of Brownian Motion).
The role of management, then, is to introduce additional energy into the closed system of the Brownian organisation, whether through raw enthusiasm, new ideas, encouragement, or changes of processes. And management only serves a purpose when it is acting to add energy to the system – when that stops, then the law of Business Entropy will kick in and the organisation will begin to decay over time.
This management energy applies forces that disrupt the chaotic state and create dynamic new motion (innovation), gain momentum (business change) or maintain the bonds between particles (organisational structure and process). If the right forces are applied at the right time and in the right way, then the business can be moved to a new desired state.
(As an example, when Lou Gerstner joined IBM in the early 1990s, he turned around the fortunes of a rapidly failing business by introducing an approach of continuous renewal, innovation and re-invention which is still prevalent today.)
Of course, if too much energy is added, the system may become unstable and either collapse or explode. Too little, and the natural inertia within the organisation cannot be overcome. (Consider Ron Johnson’s failed leadership and change of direction for J.C. Penney; meanwhile, time will tell whether Kodak can survive its endemic complacency and too-cautious response to digital imaging.)
What is the state of Business Entropy in your organisation? Are you moving with momentum, with management applying the right forces and energy to achieve the desired state? Or are you in a Brownian Business?
And more to the point, what are you going to do about it?!
“Information is the value associated with data,” William McKnight explains in his book Information Management: Strategies for Gaining a Competitive Advantage with Data. “Information is data under management that can be utilized by the company to achieve goals.” Does that data have to be perfect in order to realize its value and enable the company to achieve its goals? McKnight says no.
Data quality, according to McKnight, “is the absence of intolerable defects.”
“It is not the absence of defects. Every enterprise will have those. It is the absence of defects that see us falling short of a standard in a way that would have real, measurable negative business impact. Those negative effects could see us mistreating customers, stocking shelves erroneously, creating foolish marketing campaigns, or missing chances for expansion. Proper data quality management is also a value proposition that will ultimately fall short of perfection, yet will provide more value than it costs.”
“The proper investment in data quality is based on a bell curve on which the enterprise seeks to achieve the optimal ROI at the top of the curve.”
Mark Twain once said, “few things are harder to put up with than the annoyance of a good example.”
McKnight’s book provides many good examples, one based on an e-commerce/direct mail catalog/brick-and-mortar enterprise that regularly interacts with its customers.
“For e-commerce sales, address information is updated with every order. Brick-and-mortar sales may or may not capture the latest address, and direct mail catalog orders will capture the latest address. However, if I place an order and move two weeks later, my data is out-of-date: short of perfection.”
This is why I don’t like the anti-data-cleansing mantra of getting data right, the first time, every time—because even when you get data right the first time, it’s not the last time data has to be managed.
“Perfection is achievable,” McKnight continued, “but not economically achievable. For instance, an enterprise could hire agents in the field to knock on their customers’ doors and monitor the license plates of cars coming and going to ensure that they know to the day when a customer moves. This would come closer to perfect data on the current address of consumers, but at tremendous cost (not to mention that it would irritate the customer).”
Not only is data perfection the asymptote of data quality that’s not economically achievable, data perfection is not the goal of information management. The goal of information management is to help the enterprise achieve its goals by providing data-driven solutions for business problems, which, by their very nature, are dynamic challenges that rarely have (or require) a perfect solution.
It’s fashionable to be able to claim that you’ve moved everything from your email to your enterprise applications “into the cloud”. But what about your data? Just because information is stored over the Internet, it shouldn’t necessarily qualify as being “in the cloud”.
New cloud solutions are appearing at an incredible rate. From productivity to consumer applications, the innovations are staggering. However, there is a world of difference between the ways that the data is being managed.
The best services are treating the application separately to the data that supports it and making the content easily available from outside the application. Unfortunately there is still a long way to go before the aspirations of information-driven businesses can be met by the majority of cloud services as they continue to lock away the content and keep the underlying models close to their chest.
An illustrative example of the best is a simple drawing solution, Draw.io. Draw.io is a serious threat to products that support the development of diagrams of many kinds. Draw.io avoids any ownership of the diagrams by reading and saving XML and leaving it to the user to decide where to put the content. It also makes it particularly easy to integrate with your Google Drive or Dropbox account, keeping the content both in the cloud and under your control. This separation is becoming much more common, with cloud providers likely to bifurcate between the application and data layers.
You can see Draw.io in action as part of a new solution for entity-relationship diagrams in the tools section of www.infodrivenbusiness.com .
Offering increasing sophistication in data storage are the fully integrated solutions such as Workday, Salesforce.com and the cloud offerings of the traditional enterprise software companies such as Oracle and SAP. These vendors are realising that they need to work seamlessly with other enterprise solutions either directly or through third-party integration tools.
Also important to watch are the offerings from Microsoft, Apple and Google which provide application logic as well as facilitating third-party access to cloud storage, but lead you strongly (and sometimes exclusively) towards their own products.
There are five rules I propose for putting data in the cloud:
1. Everyone should be able to collaborate on the content at the same time
To be in the cloud, it isn’t enough to back up the data on your hard disk drive to an Internet server. While obvious, this is a challenge to solutions that claim to offer cloud but have simply moved existing file and database storage to a remote location. Many cloud providers are now offering APIs to make it easy for application developers to offer solutions with collaboration built-in.
2. Data and logic are separated
Just like the rise of client/server architectures in the 1990s, cloud solutions are increasingly separating the tiers of their architecture. This is where published models and the ability to store content in any location are a real advantage. Ideally the data can be moved as needed, providing an added degree of flexibility and the ability to comply with different jurisdictional requirements.
3. The data is available to other applications regardless of vendor
Applications shouldn’t be a black box. The trend towards separating the data from the business logic leads inexorably towards open access to the data by different cloud services. Market forces are also leading towards open APIs and even published models.
4. The data is secure
The content not only needs to be secure, but it also needs to be seen to be secure. Ideally it is only visible to the owner of the content and not the cloud application or storage vendor. This is where those vendors offering solutions that separate application logic and storage have an advantage given that much of the security is in the control of the buyer of the service.
5. The data remains yours
I’ve written about data ownership before (see You should own your own data). This is just as important regardless of whether the cloud solution is supporting a consumer, a business or government.
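As a rough illustration of rules 2 and 3, the sketch below keeps the application logic behind a small storage interface, so the content could live wherever its owner chooses. Everything in it is hypothetical: the Storage interface, the InMemoryStorage stand-in, and the toy XML layout are invented for this example and are not Draw.io's actual format or any vendor's API.

```python
from typing import Protocol
import xml.etree.ElementTree as ET


class Storage(Protocol):
    """Anything that can save and load text by name (hypothetical interface)."""
    def save(self, name: str, text: str) -> None: ...
    def load(self, name: str) -> str: ...


class InMemoryStorage:
    """Stand-in for a storage location the user chooses and controls."""
    def __init__(self) -> None:
        self._files = {}

    def save(self, name: str, text: str) -> None:
        self._files[name] = text

    def load(self, name: str) -> str:
        return self._files[name]


def add_box(storage: Storage, diagram: str, label: str) -> None:
    """Application logic: read open XML, change it, write it back."""
    root = ET.fromstring(storage.load(diagram))
    ET.SubElement(root, "box", {"label": label})
    storage.save(diagram, ET.tostring(root, encoding="unicode"))


store = InMemoryStorage()
store.save("erd.xml", "<diagram/>")
add_box(store, "erd.xml", "Customer")

# Any other tool can read the same content without going through add_box.
labels = [box.get("label") for box in ET.fromstring(store.load("erd.xml"))]
print(labels)
```

Swapping InMemoryStorage for a class that talks to a cloud drive would leave add_box untouched, which is the point of separating data from logic.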
Over the course of my career, I have written more reports than I can count. I’ve created myriad dashboards, databases, SQL queries, ETL tools, neat Microsoft Excel VBA scripts, routines, and other ways to pull and massage data.
In a way, I am Big Data.
This doesn’t make me special. It just makes me a seasoned data-management professional. If you’re reading this post, odds are that the list above resonates with you.
Three Problems with Creating Excessive Reports
As an experienced report writer, I don’t find it terribly hard to pull data from database tables, external sources, and the web. There’s no shortage of forums, bulletin boards, wikis, websites, and communities devoted to the most esoteric of data- and report-related concerns. Google is a wonderful thing.
I’ve made a great deal of money in my career by doing as I was told. That is, a client would need me to create ten reports and I would dutifully create them. Sometimes, though, I would sense that ten weren’t really needed. I would then ask if any reports could be combined. What if I could build only six or eight reports to give that client the same information? What if I could write a single report with multiple output options?
There are three main problems with creating an excessive number of discrete reports. First, it encourages a rigid mode of thinking, as in: “I’ll only see it if it’s on the XYZ report.” For instance, Betty in Accounts Receivable runs a past-due report to find vendors who are more than 60 days late with their payments. While this report may be helpful, it will fail to include any data that does not meet its predefined criteria. Perhaps her employer is especially concerned about invoices from particularly shady vendors that are only 30 days past due.
Second, there’s usually a great deal of overlap. Organizations with hundreds of standard reports typically use multiple versions of the same report. If you ran a “metareport”, I’d bet that some duplicates would appear. In and of itself, this isn’t a huge problem. But database changes often mean effectively modifying the same report multiple times.
Third, and most important these days, the reliance upon standard reports inhibits data discovery.
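The past-due example from the first problem can be sketched as a single report with options instead of several fixed ones. This is only an illustration: the Invoice fields, the past_due function, and the sample data are all invented here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Invoice:
    vendor: str
    due: date
    amount: float


def past_due(invoices, as_of, min_days=60, vendors=None):
    """Return invoices more than `min_days` late, optionally for given vendors."""
    return [
        inv for inv in invoices
        if (as_of - inv.due).days > min_days
        and (vendors is None or inv.vendor in vendors)
    ]


# Hypothetical data, invented for illustration.
invoices = [
    Invoice("Acme", date(2014, 1, 1), 500.0),
    Invoice("Shady LLC", date(2013, 12, 20), 900.0),
    Invoice("Globex", date(2013, 11, 1), 250.0),
]
today = date(2014, 1, 31)

standard = past_due(invoices, today)  # the usual 60-day report
watch = past_due(invoices, today, min_days=30, vendors={"Shady LLC"})
print([inv.vendor for inv in standard], [inv.vendor for inv in watch])
```

The same function answers both questions, so a change to the underlying data only has to be handled in one place.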
Look, standard reports aren’t going anywhere. Simple lists and financial statements are invaluable for millions of organizations.
At the same time, though, one massive report for everything is less than ideal. Ditto for a “master” set of reports. These days, true data discovery tools like Tableau increase the odds of finding needles in haystacks.
Why not add interactivity to basic reports to allow non-technical personnel to do more with the same tools?
What say you?
The recent North American cold wave that wound its way across Canada and the United States brought heavy snowfall and broke low temperature records, leading to business, school, and road closures, as well as flight cancellations. This polar vortex also spun the phrase wind chill factor into almost every conversation, prompting me to investigate how wind chill is calculated, and leading me to yet another cold contemplation of data quality.
Before we get to data quality, let’s begin with some chilling facts about wind chill factor.
Wind makes us feel cold because as it blows across the exposed surface of our skin, it draws heat away from our bodies. When the wind picks up speed, it draws more heat away from exposed skin, cooling us more quickly. Wind chill, therefore, calculates how rapidly body heat is lost at different wind speeds.
Though not originally meant to express a temperature equivalent, weather forecasters started translating wind chills into the “feels like” factor we hear in weather reports today. For example, I live in Iowa and at one point last week the air temperature was -3 degrees Fahrenheit, while the wind chill factor made it feel like -36 degrees Fahrenheit.
It’s also important to note that lower wind chills mean inanimate objects cool to the air temperature more quickly, but even high winds can’t force the object’s temperature below the air temperature. For example, if the air temperature is 40 degrees Fahrenheit, water will not freeze even if the wind chill makes it feel to us like it’s below 32 degrees Fahrenheit (i.e., the freezing point of water).
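For the curious, here is a small sketch of the calculation, assuming the wind chill formula the U.S. National Weather Service adopted in 2001 (air temperature in degrees Fahrenheit, wind speed in miles per hour). The function name is my own, and the formula is only intended for temperatures at or below 50 degrees Fahrenheit and winds above 3 mph.

```python
def wind_chill_f(temp_f: float, wind_mph: float) -> float:
    """NWS (2001) wind chill index; meant for temp_f <= 50 and wind_mph > 3."""
    v = wind_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v


# Air at 40 degrees Fahrenheit with a 20 mph wind (illustrative inputs).
feels_like = wind_chill_f(40, 20)
print(round(feels_like))
```

With those inputs the result lands just under the freezing point, even though the air, and therefore any water sitting in it, stays at 40 degrees.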
What does Data Quality feel like?
All of this made me wonder if data quality has a chill factor. Data quality metrics are analogous to air temperature, meaning they’re often an objective measurement of the quality of data. A postal address, for example, can be validated independent of business context—it’s either valid or invalid.
However, what an invalid postal address feels like is dependent on a subjective measurement of business context. An email marketing program, for example, would not care about the validity of postal addresses since its data usage has no exposed skin in the postal address game, so to speak. Whereas a non-electronic billing system would feel the data quality chill factor of an invalid postal address.
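One way to picture this is a toy calculation in which a single objective metric is scaled by how exposed each use of the data is. The records, the exposure weights, and the simple multiplication are all assumptions made for illustration, not an established data quality formula.

```python
# Objective metric: share of records with an invalid postal address.
# Hypothetical records, invented for illustration.
records = [
    {"id": 1, "address_valid": True},
    {"id": 2, "address_valid": False},
    {"id": 3, "address_valid": False},
    {"id": 4, "address_valid": True},
]
invalid_rate = sum(not r["address_valid"] for r in records) / len(records)

# Subjective weights: how exposed each use is to postal addresses (0 to 1).
exposure = {"email_marketing": 0.0, "paper_billing": 1.0}

# The "chill factor" each use feels from the same objective measurement.
chill = {use: invalid_rate * weight for use, weight in exposure.items()}
print(chill)
```

The objective measurement is identical for both uses; only the weight, which depends on business context, changes what it feels like.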
Data quality standards are often established without acknowledging the different reference points from which they will be viewed, which could also influence how consistently standards are enforced.
If you want your organization’s data quality to be warm and cozy for all of your users, make sure you consider what data quality feels like from their business perspective, perhaps supplementing objective data quality metrics with a subjective data quality chill factor that’s customized for each user.
Big Data requires million-dollar investments.
Nonsense. That notion is just plain wrong. Long gone are the days in which organizations need to purchase expensive hardware and software, hire consultants, and then three years later start to use it. Sure, you can still go on-premise, but for many companies cloud computing, open source tools like Hadoop, and SaaS have changed the game.
But let’s drill down a bit. How can an organization get going with Big Data quickly and inexpensively? The short answer is, of course, that it depends. But here are three trends and technologies driving the diverse state of Big Data adoption.
Crowdsourcing and Gamification
Consider Kaggle. Founded in April 2010 by Anthony Goldbloom and Jeremy Howard, the company seeks to make data science a sport, and an affordable one at that. Kaggle is equal parts crowdsourcing company, social network, wiki, gamification site, and job board (like Monster or Dice).
Kaggle is a mesmerizing amalgam of a company, one that in many ways defies business convention. Anyone can post a data project by selecting an industry, type (public or private), type of participation (team or individual), reward amount, and timetable. Kaggle lets you easily put data scientists to work for you, and renting them is much less expensive than buying them.
Open Source Applications
But that’s just one way to do Big Data in a relatively inexpensive manner–at least compared to building everything from scratch and hiring a slew of data scientists. As I wrote in Too Big to Ignore, digital advertising company Quantcast attacked Big Data in a very different way, forking the Hadoop file system. This required a much larger financial commitment than just running a contest on Kaggle.
The common thread: Quantcast’s valuation is nowhere near that of Facebook, Twitter, et al. The company employs dozens of people–not thousands.
Finally, even large organizations with billion-dollar budgets can save a great deal of money on the Big Data front. Consider NASA, nowhere close to anyone’s definition of small. NASA embraces open innovation, running contests on InnoCentive to find low-cost solutions to thorny data issues. NASA often offers prizes in the thousands of dollars, receiving suggestions and solutions from all over the globe.
I’ve said this many times. There’s no one “right” way to do Big Data. Budgets, current employee skills, timeframes, privacy and regulatory concerns, and other factors should drive an organization’s direction and choice of technologies.
What say you?
Few computing and technological achievements rival IBM’s Watson, best known to this point for its high-profile victory on Jeopardy! (IBM’s famous win in chess belongs to an earlier machine, Deep Blue.)
Turns out that we ain’t seen nothin’ yet. Its next incarnation will be much more developer-friendly. From a recent GigaOM piece:
Developers who want to incorporate Watson’s ability to understand natural language and provide answers need only have their applications make a REST API call to IBM’s new Watson Developers Cloud. “It doesn’t require that you understand anything about machine learning other than the need to provide training data,” Rob High, IBM’s CTO for Watson, said in a recent interview about the new platform.
The rationale to embrace platform thinking is as follows: As impressive as Watson is, even an organization as large as IBM (with over 400,000 employees) does not hold a monopoly on smart people. Platforms and ecosystems can take Watson in myriad directions, many of which you and I can’t even anticipate. Innovation is externalized to some extent. (If you’re a developer curious to get started, knock yourself out.)
Continue reading the article and you’ll see that Watson 2.0 “ships” not only with an API, but an SDK, an app store, and a data marketplace. That is, the more data Watson has, the more it can learn. Can someone say network effect?
Think about it for a minute. A data marketplace? Really? Doesn’t information really want to be free?
Well, yes and no. There’s no dearth of open data on the Internet, a trend that shows no signs of abating. But let’s not overdo it. The success of Kaggle has shown that thousands of organizations are willing to pay handsomely for data that solves important business problems, especially if that data is timely, accurate, and aggregated well. As a result, data marketplaces are becoming increasingly important and profitable.
Simon Says: Embrace Data and Platform Thinking
The market for data is nothing short of vibrant. Big Data has arrived, but not all data is open, public, free, and usable.
Combine the explosion of data with platform thinking. It’s not just about the smart cookies who work for you. There’s no shortage of ways to embrace platforms and ecosystems, even if you’re a mature company. Don’t just look inside your organization’s walls for answers to vexing questions. Look outside. You just might be amazed at what you’ll find.
What say you?