17 May 2013
Have you seen the latest offering in our Composite Solutions suite?
The Big Data Solution Offering provides an approach for storing, managing, and accessing data of very high volume, variety, or complexity.
14 May 2013
It seems rather obvious to state that choosing between a dictatorship and a democracy is such an easy choice it doesn’t even need to be discussed. However, when it comes to data, it seems like the choice is not so obvious, since for as long as I can remember the data management industry has been infatuated with the notion of instituting some form of a Data Dictatorship.
Providing the organization with a single system of record, a single version of the truth, a single view, a golden copy, or a consolidated repository of trusted data has long been the rallying cry and siren song of data warehousing, and more recently, of master data management.
I admit a data dictatorship has its appeal, especially since most information development concepts are easier to manage and govern when you only have to deal with one official source of enterprise data.
Of course, the reality is most organizations are a Data Democracy, which means that other data sources, both internal and external, will be used. During a recent Twitter chat, one discussion thread noted how the democratization of data has led to the consumerization of data, which has made the silo-ification of data easier than ever. Cloud-based services (and other consumerization of IT trends) make rolling your own data silo simple and inexpensive, at least in terms of financial cost, but arguably expensive in terms of splintering the enterprise’s data asset.
Forging one big data silo in the cloud to rule them all might soon be pitched as a new form of data dictatorship. For this to happen, users must surrender the freedoms consumerization brought them, but history has shown reverting back to dictatorship after democracy is difficult, if not impossible.
As Winston Churchill famously said, “No one pretends that democracy is perfect or all-wise. Indeed, it has been said that democracy is the worst form of government except all those other forms that have been tried from time to time.”
No one pretends that data democracy is perfect or all-wise. Indeed, most data professionals would say that data democracy is the worst form of data governance except all those other forms that have been tried from time to time. But perhaps it’s simply time to stop pursuing any form of data dictatorship.
01 May 2013
by: Phil Simon
Think for a minute about how much we spend on healthcare in the United States.
For more astonishing data on healthcare, click here. The stats are frightening. With so much waste and opportunity, it should be no surprise that quite a few software vendors are focusing on Big Data–and not just behemoths like IBM. Start-ups like Explorys, Humedica, Apixio, and scores of others have entered the space.
Where’s the Data?
With so much action surrounding Big Data and healthcare, you’d think that there would be a tremendous number of examples. You’d expect there to be more statistics on how Big Data has helped organizations save lives, reduce costs, and increase revenue.
And you’d be wrong.
I’ve worked in hospitals a great deal over my career, and the term risk aversion is entirely apropos. Forget for a minute the significant difficulty in isolating cause and effect. (It’s not easy to accurately claim that deploying Hadoop throughout the organization saved 187 lives in 2012.)
Say for a minute that you’re the CIO of a healthcare organization and you make such a claim. Think about the potential ramifications from lawsuit-happy attorneys. Imagine having to respond to inquiries from lawyers about why you waited so long to deploy software that would have saved so many lives. What were you waiting for? How much will you pay my clients to drop their suit?
This isn’t to say that you can’t find data on, well, Big Data and healthcare. You can. You just have to look really hard–and you’ll more than likely be less than satisfied with the results. For example, this Humedica case study shows increased diagnosis of patients with diabetes who fell through the cracks.
Large organizations are conservative by their nature. Toss in potential lawsuits and it’s easy to understand the paucity of results-oriented Big Data healthcare studies. What’s more, we’re still in the early innings. Expect more data on Big Data in healthcare over the coming years.
What say you?
30 Apr 2013
In her book Being Wrong: Adventures in the Margin of Error, Kathryn Schulz explained “the pivotal insight of the Scientific Revolution was that the advancement of knowledge depends on current theories collapsing in the face of new insights and discoveries. In this model of progress, errors do not lead us away from the truth. Instead, they edge us incrementally toward it.”
In his book Ignorance: How It Drives Science, Stuart Firestein explained “questions are more relevant than answers. Questions are bigger than answers. One good question can give rise to several layers of answers, inspire decades-long searches for solutions, generate whole new fields of inquiry, and prompt changes in entrenched thinking. Answers, on the other hand, often end the process.”
Unfortunately, some people seem to misunderstand the goal of big data and data science to be the pursuit to provide the answers to all of our questions. Some go so far as to claim that eventually we will know everything, that soon we will be able to foretell the future with absolute certainty.
These were a few of the misunderstandings addressed by Andrew McAfee in his recent Harvard Business Review blog post Pundits: Stop Sounding Ignorant About Data. “I’ve been talking and hanging out with a lot of data geeks over the past months and even though they’re highly ambitious people,” McAfee concluded, “they’re very circumspect when they talk about their work. They know that the universe is a ridiculously messy and complex place and that all we can do is chip away at its mysteries with whatever tools are available, our brains always first and foremost among them. The geeks are excited these days because in the current era of Big Data the tools just got a whole lot better.”
“The right question asked in the right way, rather than the accumulation of more data,” Firestein concluded, “allows a field to progress. Scientists don’t just design an experiment based on what they don’t know. The truly successful strategy is one that provides them even a glimpse of what’s on the other side of their ignorance and an opportunity to see if they can’t get the question to be bigger. Ignorance works as the engine of science because it is virtually unbounded, and that makes science much more expansive.”
Science has always been about bigger questions, not bigger data.
22 Apr 2013
The Internet of Things became a more frequently heard phrase over the last decade as more things embedded with radio-frequency identification (RFID) tags, or similar technology, allowed objects to be uniquely identified, inventoried, and tracked by computers. Early adopters focused on inventory control and supply chain management, but the growing fields of application include smart meters, smart appliances, and, of course, smart phones.
The concept is referred to as the Internet of Things to differentiate its machine-generated data from data generated directly by humans typing, taking pictures, recording videos, scanning bar codes, etc.
The Internet of Things is the source of the category of big data known as sensor data, which is often the new type of data you come across while defining big data, and the type that requires you to start getting to know NoSQL.
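To make that concrete, here is a minimal sketch (my illustration, not something from the original post) of why sensor data pairs naturally with a NoSQL document store: every device emits a differently shaped record, and a schemaless collection absorbs them all without a schema migration. It assumes a local MongoDB instance and the pymongo driver, and the database, collection, and field names are all hypothetical.

```python
# Hypothetical sketch: storing Internet of Things sensor readings in a
# document store. Assumes MongoDB running locally and pymongo installed.
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
readings = client["iot"]["readings"]  # database and collection names invented

# An RFID pallet tag and a smart meter emit differently shaped records;
# a document store accepts both side by side, which is the flexibility
# that tends to push sensor data toward NoSQL.
readings.insert_one({
    "device_type": "rfid",
    "tag_id": "EPC-12345",
    "location": "dock-7",
    "seen_at": datetime.now(timezone.utc),
})
readings.insert_one({
    "device_type": "smart_meter",
    "meter_id": "M-998",
    "kwh": 3.2,
    "interval_minutes": 15,
    "read_at": datetime.now(timezone.utc),
})
```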
In his book Too Big to Know, David Weinberger discussed another growing category of data facilitated by the Internet, namely the crowd-sourced knowledge of amateurs providing scientists with data to aid in their research. “Science has a long tradition of embracing amateurs,” Weinberger explained. “After all, truth is truth, no matter who utters it.”
The era of big data could be called the era of big utterance, and the Internet is the ultimate platform for crowd-sourcing the knowledge of amateurs. Weinberger provided several examples, including websites like GalaxyZoo.org, eBird.org, and PatientsLikeMe.com, which, as Weinberger explained, “not only enables patients to share details about their treatments and responses but gathers that data, anonymizes it, and provides it to researchers, including to pharmaceutical companies. The patients are providing highly pertinent information based on an expertise in a disease in which they have special credentials they have earned against their will.” The intriguing term Weinberger used to describe the source of this crowd-sourced amateur knowledge was human sensors.
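As a purely illustrative aside, the gather-anonymize-provide flow Weinberger describes can be sketched in a few lines of Python. To be clear, this is my toy illustration, not PatientsLikeMe’s actual pipeline; salted hashing of a patient identifier is only the crudest stand-in for real de-identification, and every field, name, and value below is invented.

```python
# Toy sketch of anonymizing patient-contributed records before sharing
# them with researchers. Real de-identification is far harder than this.
import hashlib

SALT = b"replace-with-a-secret-salt"  # hypothetical; never hard-code in practice

def anonymize(record: dict) -> dict:
    """Swap the direct identifier for a salted hash; keep the research fields."""
    pseudonym = hashlib.sha256(SALT + record["patient_id"].encode()).hexdigest()
    return {
        "pseudonym": pseudonym,
        "condition": record["condition"],
        "treatment": record["treatment"],
        "response": record["response"],
    }

shared = anonymize({
    "patient_id": "P-1001",   # invented example record
    "condition": "ALS",
    "treatment": "riluzole",
    "response": "stable",
})
print(shared)  # shareable document with the direct identifier removed
```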
In our increasingly data-constructed world, where more data might soon be constructed by things than by humans, I couldn’t help but wonder whether the phrase the Internet of Humans needs to be frequently heard in the coming decades to not only differentiate machine-generated data from human-generated data, but, more importantly, to remind us that humans (amateurs and professionals alike) are a vital source of knowledge that no amount of data from any source could ever replace.
17 Apr 2013
Big Data, and information more generally, is big business and top of mind in this era of cyber security as the vision of an integrated world moves from science fiction to a day-to-day reality. There are no trivial answers to managing this burgeoning resource even though there are a myriad of point solutions. With this in mind, MIKE2.0 Governance Association (MGA), the governing body of the open source standard for information management, announced today the release of their latest publication, Information Development Using MIKE2.0.
MIKE2.0, which stands for Method for an Integrated Knowledge Environment, is an open source delivery framework for Enterprise Information Management. It provides a comprehensive methodology (with 984 significant articles so far) that can be applied across the key domains of Information Management including Business Intelligence & Performance Management, Enterprise Data Management, Access/Search & Content Delivery, Enterprise Content Management, Information Asset Management, and Information Strategy, Architecture & Governance.
Since 2006, MIKE2.0 has acted as a free resource for information professionals by bringing together best practices in methods, techniques and benchmarking. It is now being made available in print publication to a wider audience, highlighting key wiki articles, blog posts, case studies and user applications of the methodology. With many governments and businesses thinking about how to respond to the latest round of challenges including cyber risks and privacy, the MIKE2.0 book elevates the discussion from day-to-day headlines to a strategic response.
The book has already received significant industry praise.
Authors for the book include Andreas Rindler, Sean McClowry, Robert Hillard, and Sven Mueller, with additional credit due to Deloitte, BearingPoint and over 7,000 members and key contributors of the MIKE2.0 community.
The book has been published in paperback (available on Amazon.com and Barnes & Noble) as well as all major e-book publishing platforms. For more information on MIKE2.0 or how to get involved with the online MIKE2.0 community, please contact us.
28 Mar 2013
“If you analyzed the flow of digital data in 1980,” Stephen Baker wrote in his 2011 book Final Jeopardy: Man vs. Machine and the Quest to Know Everything, “only a smidgen of the world’s information had found its way into computers.”
“Back then, the big mainframes and the new microcomputers housed business records, tax returns, real estate transactions, and mountains of scientific data. But much of the world’s information existed in the form of words—conversations at the coffee shop, phone calls, books, messages scrawled on Post-its, term papers, the play-by-play of the Super Bowl, the seven o’clock news. Far more than numbers, words spelled out when humans were thinking, what they knew, what they wanted, whom they loved. And most of those words, and the data they contained, vanished quickly. They faded in fallible human memories, they piled up in dumpsters and moldered in damp basements. Most of these words never reached computers, much less networks.”
However, during the era of big data, things have significantly changed. “In the last decade,” Baker continued, “as billions of people have migrated their work, mail, reading, phone calls, and webs of friendships to digital networks, a giant new species of data has arisen: unstructured data.”
“It’s the growing heap of sounds and images that we produce, along with trillions of words. Chaotic by nature, it doesn’t fit neatly into an Excel spreadsheet. Yet it describes the minute-by-minute goings-on of much of the planet. This gold mine is doubling in size every year. Of all the data stored in the world’s computers and coursing through its networks, the vast majority is unstructured.”
One of Melinda Thielbar’s three questions of data science is: “Are these results actionable?” As Baker explained, unstructured data describes the minute-by-minute goings-on of much of the planet, so the results of analyzing unstructured data must be actionable, right?
Although sentiment analysis of unstructured social media data is often lauded as a great example, late last year Augie Ray wrote a great blog post asking How Powerful Is Social Media Sentiment Really?
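For readers who have never looked under the hood, here is a deliberately naive sketch of lexicon-based sentiment scoring, illustrating the general technique rather than any vendor’s implementation; the word lists below are invented for this example.

```python
# Naive lexicon-based sentiment scoring: count positive and negative words
# and return their normalized difference. Word lists are invented examples.
POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "terrible", "awful", "angry"}

def sentiment(text: str) -> float:
    """Return a score in [-1, 1]: positive minus negative word share."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    scored = pos + neg
    return 0.0 if scored == 0 else (pos - neg) / scored

print(sentiment("I love this product, it is great!"))   # 1.0
print(sentiment("Terrible service, I hate waiting."))   # -1.0
```

Real tools layer negation handling and context on top of this, but the output is still a single number summarizing messy human speech, which is part of why Ray questions how much weight such scores deserve.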
My contrarian’s view of unstructured data is that it is, in large part, gigabytes of gossip and yottabytes of yada yada digitized, rumors and hearsay amplified by the illusion-of-truth effect and succumbing to the perception-is-reality effect until the noise amplifies so much that its static solidifies into a signal.
As Roberta Wohlstetter originally defined the terms, signal is the indication of an underlying truth behind a statistical or predictive problem, and noise is the sound produced by competing signals.
The competing signals from unstructured data are competing with other signals in a digital world of seemingly infinite channels broadcasting a cacophony that makes one nostalgic for a luddite’s dream of a world before word of mouth became word of data, and before private thoughts contained within the neural networks of our minds became public thoughts shared within social networks, such as Twitter, Facebook, and LinkedIn.
“While it may seem heretical to say,” Ray explained, “I believe there is ample evidence social media sentiment does not matter equally in every industry to every company in every situation. Social media sentiment has been elevated to God-like status when really it is more of a minor deity. In most situations, what others are saying does not trump our own personal experiences. In addition, while public sentiment may be a factor in our purchase decisions, we weigh it against many other important factors such as price, convenience, perception of quality, etc.”
Social media is not the only source of unstructured data, nor am I suggesting there’s no business value in this category of big data. However, sometimes a contrarian’s view is necessary to temper unchecked enthusiasm, and a lot of big data is not only unstructured, but enthusiasm for it is often unchecked.