Archive for the ‘Information Management’ Category
Facebook has fallen on hard financial times lately. That 100-billion-dollar valuation now seems wildly optimistic, to put it mildly. To be sure, Facebook is hardly the only company to slide after its IPO–and it won’t be the last. But is there something more dangerous going on at the world’s largest social network–something that may threaten its very existence?
The Data Problem?
As James Ball writes in The Guardian about the larger problem facing Mark Zuckerberg’s company:
While Google’s revenues are growing – not a bad feat in the current economy – the huge amounts of extra data it’s accumulating aren’t improving its actual ads: the money the company gets for each advert is actually falling. If more data doesn’t make these companies more cash, the rationale falls away. Google’s adverts make it a huge amount of money, and will continue to do so, but there’s no evidence that more user data is making those adverts more effective at generating profit than they already were.
A large chunk of Facebook’s business model is based on the “more data is better than less data” assumption. In theory, this will bring in advertisers and, ultimately, profits. (Parenthetically, this is precisely why Facebook scares the hell out of Google.)
But The Guardian piece raises important and fundamental questions about the value of data:
- Does data eventually reach a point of diminishing returns?
- Is more data always better than less data?
We’ll probably find out over the next few quarters or years if Facebook is able to monetize what is perhaps the largest trove of data in the history of the world. What’s more, if Facebook can’t do it, will any organization be able to make sense of–and, more important, money from–vast amounts of user-generated information?
Your Organization is Not Facebook
Now, don’t for one minute dismiss the need for (and value of) Big Data, sentiment analysis, semantic technologies, and other modern data management techniques. Facebook’s struggles hardly prove that there is no legitimate value to be gleaned from such things. Let’s say, for the sake of argument, that Facebook can’t justify such a lofty valuation (now or in the future). This in no way means that its data is worthless. It just might not be worth as much as some think. In other words, the argument here centers around how much that data is worth–not whether the data is worth anything.
It’s my firm belief that the vast majority of organizations need to manage their existing data better. I can think of few that wouldn’t benefit from increasing the types and amount of data they manage. If there is such a thing as a point of diminishing returns to the value of data, Facebook is much, much closer to reaching it than your organization is. Even if Facebook’s stock plummeted to zero (and Larry Page bought drinks for everyone in Silicon Valley), it would still behoove organizations to embrace the “more is better” data theory. Structured, unstructured, and semi-structured data are extremely valuable assets, not liabilities.
As with any company, the big question for Facebook is not what kind of data it has. Rather, it’s what the company can do with that information.
What say you?
Nate Silver is a really smart guy. Not even 40, he has already developed statistical analysis tools that were ultimately bought by Major League Baseball. He writes about politics and stats for the New York Times. And, at the Data Science Summit here in Las Vegas, I saw him speak about Big Data. (You can watch the video of that talk here.)
DSS was all about Big Data, a topic with which I’m not terribly familiar. (It is a fairly new term, after all.) I suspected that Silver would be speaking over my head. But a funny thing happened when I watched him speak: I was able to follow most of what he was saying.
By way of background, I’m a numbers guy and have been for a very long time. Whether in sports or work, I like statistics. I like data. I remember most of the material from my probability and statistics courses in college. I understand terms like p-values, statistical significance, and confidence intervals. This stuff isn’t that difficult to understand; it’s not like I’m a full-time poet or ballet dancer.
Wide, Not Long
The interesting thing about new data sources and streams is that some of the old tools just don’t cut it. Consider the relational database. Data from CRM or ERP applications fits nicely into transactional tables linked by primary and foreign keys. On many levels, however, social media is far from CRM and ERP–and this has profound ramifications for data management. Case in point: Twitter famously built much of its back end in Scala rather than relying on a traditional relational database stack. Why? To make a long story short, the type of data that Twitter generates and serves simply can’t move fast enough through a traditional RDBMS architecture. This type of data is wide, not long. Think columns, not rows.
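To make the columns-versus-rows distinction concrete, here’s a minimal Python sketch; the tweet data and field names are invented purely for illustration:

```python
# Toy illustration of "wide, not long": the same three tweets stored
# row-wise (how a relational table thinks) versus column-wise (how a
# columnar store thinks). All names and values here are invented.

# Row-oriented: each record lives together -- great for transactional
# lookups ("fetch tweet 2 and everything about it").
rows = [
    {"tweet_id": 1, "user": "alice", "text": "hello", "retweets": 3},
    {"tweet_id": 2, "user": "bob",   "text": "hi",    "retweets": 0},
    {"tweet_id": 3, "user": "carol", "text": "hey",   "retweets": 7},
]

# Column-oriented: each attribute lives together -- great for scanning
# one field across millions of records ("average retweets per tweet").
columns = {
    "tweet_id": [1, 2, 3],
    "user":     ["alice", "bob", "carol"],
    "retweets": [3, 0, 7],
}

# An analytical query touches one column and ignores everything else.
print(sum(columns["retweets"]) / len(columns["retweets"]))  # 3.33...
```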
To this end, Big Data requires some different tools, and Scala is just one of many. Throw Hadoop in there as well. But our need to use “some different tools” does not mean that tried-and-true statistical principles fall by the wayside. On the contrary, old stalwarts like Bayes’ Theorem are as relevant as ever, as Silver pointed out during his speech.
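As a quick refresher, Bayes’ Theorem fits in a few lines of code. The fraud-detection numbers below are entirely hypothetical; the point is that the same old formula still drives very modern analysis:

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """Bayes' Theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    evidence = (true_positive_rate * prior
                + false_positive_rate * (1 - prior))
    return true_positive_rate * prior / evidence

# Hypothetical: 1% of transactions are fraudulent; a model flags 90% of
# fraud but also 5% of legitimate transactions. How likely is a flagged
# transaction to actually be fraud?
print(posterior(prior=0.01, true_positive_rate=0.90,
                false_positive_rate=0.05))
# ~0.154 -- even a decent detector is usually wrong about rare events.
```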
Simon Says: The Best of Both Worlds
In an era of Big Data, there will be winners and losers. The organizations that see success will be the ones that combine the best of the old with the best of the new. Retrofitting unstructured or semi-structured data into RDBMSs won’t cut it, nor will ignoring still-relevant statistical techniques. Use the best of the old and the best of the new to succeed in the Big Data world.
What say you?
Anyone who has ever tried to negotiate a standard for data storage or communication will confirm that it is difficult to get agreement and even harder to gain adoption. Decades of debate over both analogue and digital communications standards for radio, television and telecommunications have been used as evidence by the information technology sector that there must be a better way.
In the 1990s there was a great desire to avoid having the internet break into the type of divides that saw the television world split into NTSC and PAL camps. If the newly commercial internet was to be open and connected, then it was assumed that everyone needed to speak the same technical standard language. One of the first examples of this was HTML and the creation of the W3C to govern it as a standard.
Moving beyond the rendering of web pages, it was argued that communication of information needed to be open and subject to specifications and standards. XML was born of this push to an open, systematic approach to data specifications. A quick review of XML-based standards shows how many groups have taken this ambition to heart.
XML is meant to be different. By providing a generic umbrella with standardised technology, theoretically any information can be encoded, from documents to financial statements.
The big question is whether standards-based specifications work or at least whether they are as important as they once were.
In an era when technologies were deployed and then static, such as traditional televisions, telephones or radio, the approach to information decoding was fixed at the time of manufacture. A decision on information formats needed to be made before sending large numbers of devices out to the public, otherwise no two telephones could talk and television stations would have had to broadcast in a myriad of formats (assuming they could find enough spectrum to do so). In such a world, it was up to government and major industry bodies to decide on standards, which is why they tended to differ by country. Today, the last remaining bastion of this era is the division of spectrum which continues to lead to frustrating differences in the deployment of mobile technology between countries.
At the very time that the internet was pushing for adoption of standards, the user community seemed to be voting with their feet (or at least fingers) by adopting some of the least standardised formats in the market. A good example is the rapid adoption of Adobe’s PDF format for rendered documents. The PDF format breaks all of the rules that W3C hoped to set with XML and yet it met a very real business and consumer need – an accurate and efficient onscreen rendering of paper documents.
As much as Adobe used the PDF format to its commercial advantage, it was ultimately only able to sustain its position by handing it over as an open standard. In this case, standards have followed commercial evolution rather than the other way around.
Despite appearances, HTML has followed a similar pattern, with the dominant browser of the time effectively defining extensions to the format which have then ultimately been adopted as part of the standard. Attempts to drive HTML in the opposite direction, through initiatives such as the semantic web, seem to fail on both agreement and, even more importantly, adoption.
The internet is fundamentally different from anything that has preceded it as a vehicle for communication. Very few devices are locked down, with even connected televisions receiving constant software updates. Browsers, word processors, spreadsheets, reporting tools and a myriad of other products used for reading and authoring files support “add-ins” which allow new file formats to be supported.
The ability for products to rapidly adopt formats and allow for relatively seamless information interchange has been highly evident with the take-up of mobile devices spawning a wide range of software products supporting traditional office documents. It is unthinkable today that you wouldn’t be able to read and update your MS Word document on your iPad using your tool of choice and then send it on to someone else’s Android tablet.
Clearly standards are still important, but our assumptions on drivers and sequencing might need to change. Standard formats need to benefit all parties involved in a way that they can immediately see. The only alternative is for them to be mandated by the party that benefits (such as government or regulatory data submissions). If a standard is to develop and leverage innovation across a sector, it cannot rely on regulation alone.
At the same time, there is less need to be afraid of having several different approaches to communication in play at the same time. Because the internet is “always on”, the market will ensure that those that benefit from translators will have access to them. The market will also naturally encourage convergence over time.
The future of information standards is perhaps to encourage the right markets and economic motivations rather than rely on regulation and expert committees.
The house was quiet before Henry (foreground) arrived. Ginger (background) would rarely bark. Now, when Henry hears something even the slightest bit unfamiliar, he will utter a soft grrrrr. Ginger then follows with a louder grrrrrrrrrr. Then, Henry barks softly. Then, Ginger barks. Within a matter of 5 seconds, they’re both going ballistic.
Now, I can handle it, and I will go Dog Whisperer on them, but it got me thinking: this is how frenzies get created in all walks of life, including information management. Analysts, bloggers, vendors and the media alike don’t want to get left behind, so the ante continually gets upped. Does Big Data affect your life? The two-word phrase has almost lost its meaning, with so many vendors claiming it for so many different things.
I like MIKE2.0, and one of the reasons is that it’s written by practitioners. You don’t see a lot of “survey says” here. You see a lot of first-hand experiences being shared. You can gauge the hype level better that way.
Now, excuse me while I see what the fuss is about in the other room. Probably nothing.
Photo Credit: » Zitona «
“Simplicity is the ultimate sophistication.”
–Leonardo da Vinci
Author’s note: In a series of posts over the next few months, I’ll be delving into a nascent trend: the Applefication of the Enterprise. Today’s introductory post lays a bit of the groundwork for the series.
In early February of 2012, Halliburton, one of the world’s largest oilfield service companies, became the latest enterprise to abandon RIM’s BlackBerry. Halliburton’s new smartphone of choice: Apple’s iPhone.
Even two years ago, this would have been earth-shattering news. Companies of this size just didn’t buy Apple products. These days, however, announcements like these have become almost commonplace. That is, Halliburton is hardly alone in adopting Apple’s iPhone throughout the company. In late 2011, Pfizer announced that it would purchase a rumored 37,000 iPads for its scientists and sales and manufacturing employees. In the same year, biotech giant Genentech announced that it had rolled out 30 company-specific apps in its own private app store.
Government Goes Apple Too
If you think that this trend is limited to private-sector companies rife with cash, think again. In mid-February of 2012, another government institution went Apple. As David Zax writes, “the National Oceanic and Atmospheric Administration (NOAA) is throwing their BlackBerrys overboard, opting for Apple products instead. Though NOAA already put some 3,000 BlackBerry devices in circulation among its 20,000 workers, it will only be supporting the devices until May 12. NOAA CIO Joe Klimavicz cited the cost of Research in Motion’s software as the chief reason for the switch.” The cash-strapped public sector is realizing that Apple products are not just cooler; they may actually be more cost-efficient than existing applications and devices.
The reasons for these moves aren’t terribly difficult to understand. “With a relatively small investment, companies can re-create the whole information-on-the-fly scenario that was nearly impossible before,” says Pierfrancesco Manenti, an analyst at information technology research outfit IDC. More large organizations will doubtless follow the mass exodus away from once-omnipresent BlackBerrys. PCs and laptops will give way to iPads. Apps will continue to supplant many complex software programs in new and exciting ways in enterprises across the globe.
The key point of these stories is not the demise of any one product like the BlackBerry, whose rate of innovation has clearly trailed that of its competitors. Rather, they illustrate a new mind-set and the transformative power of Apple and its products. They are no longer merely the sole purview of stylish consumers, small design firms, and niche start-ups. Increasingly, Apple products are accepted as real products in real companies that solve real problems.
Definition: The Applefication of the Enterprise
The Applefication of the Enterprise is not just about enterprises purchasing and deploying Apple products. Rather, it’s about what companies like Apple and Google represent: simplicity, ease of use, self-service, and rapid deployment. Not every organization will use Apple products—and Apple couldn’t produce enough iPads, iPhones, and MacBooks even if every last CIO signed up. Rather, Apple is forcing all organizations to reevaluate their existing technologies, applications, and infrastructure with the intent of making them simpler, more user-friendly, more Apple. In short, Apple is causing large organizations to think different.
What say you?
In the next installment of this series, I’ll be taking a look at some of the factors driving the Applefication of the Enterprise.
The MIKE 2.0 framework provides specific tools to promote effective data migration. Today I’d like to discuss the paramount importance of the initial estimate.
First up, let me state the obvious: Not all data migrations are created equal. While particulars vary, they can be safely categorized into three big buckets: light, medium, and heavy. Which one most aptly describes your organization? This simple question is often the most important one on any information management (IM) project. What’s more, it’s often answered incorrectly by the following folks:
- clients eager to minimize the budgets of their projects
- software vendors eager to book a sale, especially before the end of a quarter or fiscal year
- consulting firms afraid to push back for fear of not winning the business
In my experience, far too many senior executives severely underestimate the amount of time, money, and effort involved in moving from one or more systems to one or more different ones. It’s hard to overstate the importance of being honest with yourself before an IM project commences.
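If it helps, you can even think of the bucketing as a crude scoring exercise. The sketch below is purely hypothetical (the inputs, weights, and thresholds are all invented), but it makes the point that honest inputs matter far more than any formula:

```python
# A purely hypothetical back-of-the-envelope heuristic for bucketing a
# data migration as light, medium, or heavy. Real projects should weigh
# these inputs based on their own experience.

def migration_bucket(source_systems, undocumented_rules, millions_of_records):
    score = (source_systems * 2
             + undocumented_rules * 3
             + millions_of_records * 0.5)
    if score < 10:
        return "light"
    if score < 25:
        return "medium"
    return "heavy"

# Three legacy systems, four undocumented rules you know about (there
# are always more), and 10 million records:
print(migration_bucket(3, 4, 10))  # "medium"
```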
For instance, let’s say that the picture below accurately represents what you believe to be the state of your organization’s systems and data–and their ultimate destination (see #7):
On the basis of the above diagram, you and your consulting partners estimate that it will take about three months to extract, cleanse, transfer, and load your data into the new target systems. Upon commencing the project, however, you discover that your organization actually supports more systems, applications, and standalone databases than previously known. (Don’t believe that this happens? Trust me. I’ve seen it more than a few times in my days as an IT consultant.)
As a result of these discoveries, the new architecture looks more like this:
Suddenly, that three-month estimate looks laughably low. The fallout begins. Consulting firms press for change orders and more money. Clients resist–and start complaining about their new partner’s customer focus. Internally, fingers are pointed and obscenities are exchanged. People resign or are fired.
Simon Says: Truth in Advertising
Few things are more important than understanding the amount of time, money, and effort required to successfully migrate data. If in doubt, overestimate. It’s always easier to come in under a higher budget than over a lower one.
If you suspect that your organization falls into the medium scenario, don’t be afraid to call a spade a spade. Representing your company as light when it’s really medium (or medium when it’s really heavy) is a recipe for disaster.
Finally, realize that a good-faith estimate is just that. Anyone who believes that he or she can predict with absolute certainty how many hours or how much money an IM project of any size will require is delusional. Take estimates with a grain of salt. Problems always emerge. Just be prepared when they do.
What say you?
The 1993 documentary The War Room tells the story of the 1992 US presidential campaign from a behind-the-scenes perspective. The film shows first-hand how Bill Clinton’s campaign team responded to different crises, including allegations of marital infidelity. While a bit dated today, it’s nonetheless a fascinating look at “rapid response” politics just as technology was starting to change traditional political media.
Today, we’re starting to see organizations set up their own data war rooms for essentially the same reasons: to respond to different crises and opportunities. Information Week editor Chris Murphy writes about one such company in “Why P&G CIO Is Quadrupling Analytics Expertise”:
[Procter & Gamble CIO Filippo] Passerini is investing in analytics expertise because the model for using data to run a company is changing. The old IT model was to figure out which reports people wanted, capture the data, and deliver it to the key people weeks or days after the fact. “That model is an obsolete model,” he says.
Murphy hits the nail on the head in this article. Now, let’s delve a bit deeper into the need for a new model.
The Need for a New Model
There are at least three factors driving the need for a new information management (IM) model in many organizations. First, consider IT’s track record. How many organizations invested heavily in the late 1990s and early 2000s in expensive, on-premise ERP, CRM, and BI applications–only to have those investments ultimately disappoint the vast majority of stakeholders? Today, on-premise isn’t the only option. Big Data and cloud computing are gaining traction in many organizations.
Next up: time to respond. Beyond the poor track record of many traditional IT investments, we live in different times relative to even ten years ago. Things happen so much faster now. Why? The usual suspects: the explosion of mobility, broadband, tablets, and social media. Ten years ago, the old, reactive, requirements-driven IM model might have made sense. Today, however, that model is becoming increasingly difficult to justify. For instance, a social media mention might cause a run on products. By the time proper requirements have been gathered, a crisis has probably worsened, or an opportunity has probably been squandered.
Third, data analysis and manipulation tools have become much more user-friendly. Long gone are the days in which people needed a computer science or programming background to play with data. Of course, data modeling, data warehousing, and other heavy lifting necessitate more technical skills and backgrounds. But the business layperson, equipped with the right tools and a modicum of training, can easily investigate and drill down on issues related to employees, consumers, sales, and the like.
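For instance, here’s a minimal sketch of the kind of self-service drill-down a trained business user can do today; the file name and columns are hypothetical:

```python
import pandas as pd

# Hypothetical sales extract with columns: region, rep, product, amount.
sales = pd.read_csv("sales_extract.csv")

# Top-level view: revenue by region.
by_region = sales.groupby("region")["amount"].sum().sort_values(ascending=False)
print(by_region)

# Drill down: which reps make up the weakest region?
weakest = by_region.index[-1]
print(sales.loc[sales["region"] == weakest]
           .groupby("rep")["amount"].sum()
           .sort_values())
```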
Against this new backdrop, which of the following makes more sense?
- IT analysts spending the next several weeks or months interacting with users and building reports?
- Skilled users creating their own reports, creating and interpreting their own analytics, and making business decisions with minimal IT involvement (aka self-service)?
Building a data war room is no elixir. You still have to employ people with the skills to manage your organization’s data–and hold people accountable for their decisions. Further, rapid response means making decisions without all of the pertinent information. If your organization crucifies those who make logical leaps of faith (but ultimately turn out to be “wrong” in their interpretation of the data), it’s unlikely that this new model will take hold.
What say you?
All too often on information management (IM) projects, the best-laid plans go awry. Despite scrupulously followed methodologies, extensive planning sessions, and a bevy of high-priced consultants, results frequently fail to meet expectations.
Why? There are many reasons, but poor interpersonal communication is often the culprit. In this post, I’ll discuss how communication breakdowns can cause a data nightmare to remember.
A Case in Point
Two years ago, I was managing data conversions on a large IM project at a regional hospital in New Jersey. I had built a Microsoft Access database with a number of involved automated routines that imported legacy data, mapped it to new values, and generated upload files for the new ERP application.
The database took months to develop and, because of the complexity and number of different legacy data sources, it wasn’t exactly intuitive to the layperson. Complicated ETL programs are that way because the data is often, well, complicated. I had created a number of temp tables, queries, and macros that would properly format employee and vendor data, taking into account myriad undocumented rules.
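To give a flavor of what such routines do, here’s a stripped-down sketch in Python rather than Access. The legacy field names, status codes, and the “blank means active” rule below are invented stand-ins for the real undocumented rules:

```python
# Stripped-down sketch of one mapping step in a legacy-data conversion.
# The field names, codes, and default rule are all hypothetical.

LEGACY_VENDOR_STATUS = {
    "A": "ACTIVE",
    "I": "INACTIVE",
    "H": "ON_HOLD",
}

def map_vendor(record):
    """Translate one legacy vendor record into the target format."""
    return {
        # Pad legacy vendor numbers to the new system's 8-digit format.
        "vendor_id": record["VNDR_NBR"].strip().zfill(8),
        # Undocumented rule: a blank status code historically meant "active."
        "status": LEGACY_VENDOR_STATUS.get(record["STAT_CD"].strip(), "ACTIVE"),
    }

print(map_vendor({"VNDR_NBR": " 1234 ", "STAT_CD": ""}))
# {'vendor_id': '00001234', 'status': 'ACTIVE'}
```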
Well, the ETL tool worked, until it didn’t.
One Monday morning, I loaded tens of thousands of what should have been accurate records into the test environment. A few hours later, a few people came to me with questions. They were seeing errant data and wanted to know what I had done wrong. Since I wasn’t senile yet, I had the same question. After all, my ETL program worked fine when I left on Friday.
I did some digging, asking a few people in my general work area if they had noodled with anything in my database. It turns out that one user (Kathy – not her real name) had in fact tweaked the Access database, changing an important date field from which other dates were derived.
I asked Kathy why she did this. Her response was priceless, “Why? Is it a big deal?”
The short answer was yes. It was a big deal. A really big deal. All of the data loaded into the test environment had to be purged and testing (already well behind schedule) was delayed even more. Fortunately, the team and I were able to undo her changes and resume testing a few days later. I then locked the database down to prevent future tampering.
On IM projects, it’s always better to ask permission than forgiveness, especially when making changes that can affect hundreds of thousands of records. Adding an individual vendor, paycheck, or sale is one thing; loading five years’ worth of data is another.
These days, everyone is reachable. If you have a question, pick up the phone and ask someone before potentially affecting everyone’s data.
What say you?
Let’s try a test.
Organization ABC has deployed top-tier enterprise software. It has hired an army of expensive consultants who advise that people should follow specific business practices designed to maximize data quality.
Contrast ABC with organization XYZ. The latter’s management never upgraded its mainframe or bought “modern” apps and, to be frank, some of its business processes are antiquated.
Based on this information, which organization manages its data better?
You’d probably guess ABC, right? Why? The answer can be found in Daniel Kahneman’s new book Thinking, Fast and Slow (affiliate link). In it, he describes how human thinking divides into two systems. From the book’s Amazon page:
System 1 is fast, intuitive, and emotional; System 2 is slower, more deliberative, and more logical. Kahneman exposes the extraordinary capabilities—and also the faults and biases—of fast thinking, and reveals the pervasive influence of intuitive impressions on our thoughts and behavior. The impact of loss aversion and overconfidence on corporate strategies, the difficulties of predicting what will make us happy in the future, the challenges of properly framing risks at work and at home, the profound effect of cognitive biases on everything from playing the stock market to planning the next vacation—each of these can be understood only by knowing how the two systems work together to shape our judgments and decisions.
When you read the start of this post, you were invoking System 1.
Kahneman has taken some flak from academics who argue that he has oversimplified years of research. Pay them no heed. Few people are going to read books written like dense theses rife with citations.
This notion of two systems is essential to understanding how we interpret–or fail to interpret–data. In Chapter 19 of the book, he writes about how intelligence gathered a few months before 9/11 was not reported directly to George W. Bush. Rather, that information went to Condoleezza Rice, then National Security Advisor.
Of course, hindsight is 20/20. It’s easy to point fingers because we know now what we didn’t know then. But how often is that the case?
System 1 dominates most of the time, fueled by our need to understand the world as quickly as possible. Case in point: we like simple stories with tactical, repeatable instructions. If I only do these ten things, then my company will be the next Wal-Mart or Apple. Books like The Halo Effect point out the facile nature of most management texts.
(Side note: I am not being hypocritical here. One of the things of which I am most proud in my most recent book, The Age of the Platform: How Amazon, Apple, Facebook, and Google Have Redefined Business, is that I don’t provide a ten-point plan on how to be the next Google. I’m just not that smart. In fact, I’d argue that if they launched today, these four companies wouldn’t become the companies they are right now. Luck and timing are huge.)
Are companies successful because their CEOs practice certain management techniques? Or is the causation reversed? Ultimately, it’s impossible to tell absent some kind of controlled experiment.
Many organizations mistakenly follow a me-too approach to data management. That is, they model their data, buy applications, and/or follow “best practices” as if they were scripture, because “successful” companies are doing the same. But successful data management is more art than science; there are necessary conditions but no sufficient ones. Those looking for recipes are probably going to be disappointed with the results.
What say you?
A few weeks ago, I wrote about the outsourcing of data analysis and discovery through a site called Kaggle. Today, I’d like to go deeper.
A look at the site reveals a number of fascinating data contests, including one that offers $3 million (USD) for identifying patients who will be admitted to a hospital within the next year, using historical claims data. For a look at the data, data dictionary, and the like, click here.
How, you ask, can an organization offer such a large prize? It’s actually not that hard to understand. From the site:
More than 71 million individuals in the United States are admitted to hospitals each year, according to the latest survey from the American Hospital Association. Studies have concluded that in 2006 well over $30 billion was spent on unnecessary hospital admissions. Is there a better way? Can we identify earlier those most at risk and ensure they get the treatment they need? The Heritage Provider Network (HPN) believes that the answer is “yes”.
Do the math. $3 million is one-hundredth of one percent of $30 billion. One could even argue that the prize for that kind of savings should be ten times higher than what is currently offered, but the current bounty is clearly not holding people back. At the time I wrote this post, 734 teams, individuals, and companies had entered to win the $3 million prize.
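You can verify that arithmetic in a couple of lines:

```python
prize = 3_000_000          # the Kaggle bounty, in USD
waste = 30_000_000_000     # estimated unnecessary-admission spend, in USD
print(prize / waste)       # 0.0001, i.e., one-hundredth of one percent
```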
Why are so many people competing? To quote Gordon Gekko, “It’s all about bucks, kid.” $3 million is clearly a great deal of incentive–and that doesn’t include the inevitable PR benefit of winning the prize.
In a way, the mere fact that this type of project has to be outsourced is, quite frankly, sad. Think about it. With more than $1 trillion wasted on healthcare in the United States, even moving the needle a little bit can result in massive savings. Yet, clearly something isn’t working.
Healthcare is just one of many industries that have become complacent and utterly incapable of fixing their own problems. (Of course, there are many others, as Jeff Jarvis’ wonderful book What Would Google Do? [affiliate link] points out.)
This is the beauty of the Internet. It has brought with it increased transparency, opportunity, and tools. No longer do people and organizations need to sit idly by on the sidelines as opportunities are squandered and poor practices are ossified. No, creative and/or frustrated folks can take their data or their causes online and circumvent traditional gatekeepers.
Now, no one is saying that developing this type of predictive algorithm is easy. It can’t be. But that’s a far cry from impossible. Perhaps the current level of waste is simply an example of a market failure.
In any event, the Kaggle example demonstrates how poorly many–if not most–large organizations treat the topic of information management. Maybe if organizations awarded major bonuses to individuals, teams, and departments for (one could argue) doing their jobs, they wouldn’t have to go elsewhere.
Then again, maybe more of them should.
What say you?