Open Framework, Information Management Strategy & Collaborative Governance | Data & Social Methodology - MIKE2.0 Methodology

Posts Tagged ‘Amazon’

by: Phil Simon
09 Sep 2013

On Corkscrews, Cocaine, and Duplicate Data

In 2006, I was working on a large and particularly crazy ERP implementation in Philadelphia, PA. I had plenty to do with HR and payroll data conversions, but people there knew that I was pretty adept at data manipulation and ETL. It wasn’t totally surprising, then, that someone one day tapped on my shoulder.

The PM asked me to visit the folks in procurement. Turns out that the product list was a complete mess. Of the nearly 60,000 “different” items listed at this hospital, there were maybe 15,000 unique ones. Different product codes and descriptions had spiraled out of control over the previous twenty years. Could I work my magic and fix them?


Ah, if only I were that good. Forget the fact that I’m not a doctor. I know the difference between a syringe and a lab coat, but I could not possibly tell you the specific differences among types of medical supplies. I could run queries that counted the number of times cocaine appeared in the list, but I couldn’t tell you how type A differed from type B. (And I wasn’t willing to sample either to find out.)

After I looked at the data, I told the PM that it was just too messy. No number of SQL statements would fix the problem. Someone was going to have to roll up his or her sleeves and decide on the master list. Ideally, that person would have years of direct experience working with these supplies.
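To be fair, queries can still narrow the work. Here is a rough sketch (the product codes and descriptions are invented) of flagging near-duplicate descriptions for an expert to review. Note what the code does and does not do: it proposes candidate pairs, but a human who actually knows the supplies still has to pick the master record.

```python
from difflib import SequenceMatcher

def find_duplicate_candidates(items, threshold=0.85):
    """Flag pairs of item descriptions similar enough to be probable dupes.

    This only proposes candidates for review; deciding which record
    becomes the master still takes a human. (Naive O(n^2) comparison --
    a real pass over 60,000 items would first block on a coarse key.)
    """
    # Normalize case and collapse runs of whitespace before comparing.
    normalized = [(code, " ".join(desc.lower().split())) for code, desc in items]
    candidates = []
    for i, (code_a, desc_a) in enumerate(normalized):
        for code_b, desc_b in normalized[i + 1:]:
            score = SequenceMatcher(None, desc_a, desc_b).ratio()
            if score >= threshold:
                candidates.append((code_a, code_b, round(score, 2)))
    return candidates

catalog = [
    ("A-100", "Syringe 10ml Luer Lock"),
    ("B-205", "SYRINGE  10ML LUER-LOCK"),
    ("C-330", "Lab Coat, White, Large"),
]
print(find_duplicate_candidates(catalog))
```

The output pairs A-100 with B-205; only a subject-matter expert can say whether they are truly the same supply.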

He was none too pleased. Maybe I wasn’t as good as my reputation. Dupes ran rampant at that hospital, and not just with respect to medical supplies. I should just be able to dedupe immediately, according to the PM.

Amazon Gets It

[Image: OXO Good Grips Winged Corkscrew, Black]

Contrast that experience with my recent order of an OXO Good Grips Winged Corkscrew (Black) from Amazon. In a rare PICNIC (problem in chair, not in computer), I submitted the same order twice.

So, did Amazon accept my duplicate order?

No, the site made me confirm that I wanted to order a second wine opener.

Simon Says: Build in Dupe Protection

You can’t prevent user errors. You can, however, build in system safeguards and soft edits to minimize duplicate records. Do it.
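What might such a safeguard look like? A minimal sketch, with an invented 30-minute window and in-memory state (a real system would persist this and tune the window):

```python
from datetime import datetime, timedelta

class OrderGuard:
    """A soft edit: instead of silently accepting a repeat of the same
    item from the same customer, ask for confirmation first.
    (In-memory and single-process -- purely illustrative.)"""

    def __init__(self, window_minutes=30):
        self.window = timedelta(minutes=window_minutes)
        self._last_seen = {}  # (customer_id, sku) -> time of last order

    def needs_confirmation(self, customer_id, sku, when):
        key = (customer_id, sku)
        last = self._last_seen.get(key)
        self._last_seen[key] = when
        return last is not None and when - last <= self.window

guard = OrderGuard()
first = datetime(2013, 9, 9, 12, 0)
print(guard.needs_confirmation("phil", "OXO-CORKSCREW", first))
# False -- the first order goes straight through
print(guard.needs_confirmation("phil", "OXO-CORKSCREW", first + timedelta(minutes=5)))
# True -- make the customer confirm the apparent dupe
```

Note that this is a soft edit: the second order is questioned, not rejected, which is exactly what Amazon did with my corkscrew.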


What say you?


Category: Information Management, Information Value

by: Phil Simon
08 Jul 2013

The Increasing Meaninglessness of the Data-Metadata Distinction

“One man’s data is another man’s metadata”

As I pen these words, the PRISM scandal continues to unfold. The questions raised by the National Security Agency’s formerly furtive program strike at the very heart of a free society.

The fallout will continue for months, if not years. Maybe it will spark a deeper conversation about data ownership. Perhaps more people will echo the words of Jim Harris, who wrote recently on this site:

So, if we are so concerned about the government accessing this data, then why were we not similarly concerned about having voluntarily provided this data to those companies in the first place?  Because we agreed to a data privacy policy (which we have no choice but to accept, and most of us never read)? Or because those companies have comforting corporate mottos like “don’t be evil” (Google)?

The Metadata Cop-Out?

I for one noticed something interesting buried in many of the non-denial denials, the carefully scripted and lawyer-approved statements from Microsoft, Apple, Yahoo!, Facebook, and others. Many press releases claimed (truthfully, for all I know) that these companies didn’t provide data per se to the NSA. Rather, they provided metadata. In other words, Yahoo! didn’t give up the actual contents of any single email, just things like:

  • the sender’s email address
  • the receiver’s email address
  • the subject of the email
  • the time and date that the email was sent
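Python’s standard email module makes the split easy to see. The headers below are the “metadata” in question, and the body is the “data” itself (the addresses and message are made up):

```python
from email import message_from_string

raw = """\
From: alice@example.com
To: bob@example.com
Subject: Lunch on Friday?
Date: Mon, 08 Jul 2013 09:15:00 -0400

Let's try the new place downtown.
"""

msg = message_from_string(raw)

# The four bullet points above -- metadata -- without reading the body:
metadata = {
    "sender": msg["From"],
    "receiver": msg["To"],
    "subject": msg["Subject"],
    "sent": msg["Date"],
}
print(metadata)

# The "data" itself takes exactly one more call:
print(msg.get_payload())
```

That one extra line is the entire technical distance between “we only handed over metadata” and handing over the message.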

So, what is this distinction between data and metadata? And does it ultimately matter?

I discussed this very subject recently with my friend Melinda Thielbar, a real-life statistician and data scientist. She agreed with me that the distinction is becoming “essentially meaningless.” Equipped with enough of (the right) metadata, one can more or less figure out what’s going on–or at least identify potentially suspicious communications among persons of interest.

Simon Says

The quote at the beginning of this post is as true as it’s ever been. In a world of Big Data, metadata is increasingly important. It’s not just the video, picture, blog post, email, or customer record that matters. The data about or “behind” the data can be just as critical.


Is your organization paying attention to its metadata?

Category: Information Development, Metadata

by: Phil Simon
09 May 2013

Netflix: Understanding the Who, How, When, and Why

I’ve written before on this site on how Netflix uses data in fascinating ways. The company’s knowledge of its customers–and their viewing habits–is beyond impressive. Moreover, it’s instructive for companies attempting to navigate the era of Big Data.

Consider this Wired article explaining how Netflix operates and utilizes its data:

But Netflix doesn’t only know what its audience likes to watch; it also knows how viewers like to watch it—beyond taste, the company understands habits. It doesn’t just know that we like to watch Breaking Bad; it knows that we like to watch four episodes of Breaking Bad in a row instead of going to sleep. It knows, in other words, that we like to binge.

Think about the astonishing level of knowledge that Netflix has developed on its customers. Not just the what, but the how, the when and (increasingly) the why. Talk about the Holy Grail. This should be the goal of every for-profit enterprise. Period.
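Detecting binge behavior from raw viewing logs is, at bottom, simple session analysis. A minimal sketch (timestamps, gap, and episode thresholds all invented for illustration; Netflix’s real pipeline is obviously far richer):

```python
from datetime import datetime, timedelta

def count_binges(view_starts, gap_minutes=60, min_episodes=3):
    """Count runs of >= min_episodes views whose start times are no more
    than gap_minutes apart -- a crude binge detector over a sorted log."""
    binges, run = 0, 1
    for prev, cur in zip(view_starts, view_starts[1:]):
        if cur - prev <= timedelta(minutes=gap_minutes):
            run += 1
        else:
            binges += run >= min_episodes
            run = 1
    binges += run >= min_episodes  # close out the final run
    return binges

# Four episodes started back to back one evening, one lone episode the next day.
log = [datetime(2013, 5, 9, 21, 0) + timedelta(minutes=50 * i) for i in range(4)]
log.append(datetime(2013, 5, 10, 20, 0))
print(count_binges(log))  # 1
```

Run this over every customer and you have moved from the what (Breaking Bad) to the how (four in a row, at night).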

Minimizing Risk

Equipped with information like this, Netflix can make more informed business decisions. Case in point: Its decision to resurrect the very popular cult classic Arrested Development. While by no means an inexpensive or risk-free venture, Netflix management is reasonably confident that its bet will pay off for one simple reason: its data supports the decision.

Note that no move Netflix makes is guaranteed; no business decision comes with complete certainty. But, by virtue of its exceptional data management, Netflix has moved the needle considerably. Its odds of success are without question much higher because it has minimized risk.

Think about the way in which far too many organizations operate these days. Forget Big Data and effectively harnessing its power. I’ve personally seen many enterprises manage their data so poorly that a comprehensive and accurate list of customers could not be produced in a reasonable period of time. Without this information, questions like how, when, and why could not be answered. The downstream effect: Decisions were based upon conjecture, rank, culture, and policy. This type of scenario is hardly ideal.

Simon Says

Rome, as they say, was not built in a day–and neither was Netflix. Rather than fret over the state of your data, take the steps now to improve your ability to analyze data in a few years.


What say you?

Category: Information Value

by: Phil Simon
09 Mar 2013

A Key Data Management Lesson from Amazon

Few companies do data management better than Amazon–and I’m not just talking about their internal practices. Regardless of how well the company’s analytic systems generate über-accurate recommendations, it’s not perfect. Nor, for that matter, is it a substitute for human intuition.

To Amazon’s credit, it recognizes the inherent limitations of relying exclusively upon its sophisticated algorithms and machines. Why not let customers refine, customize, and even remove their own recommendations, à la Netflix? In fact, Amazon does just that.

When perusing books, Amazon lets each customer override its own algorithm-generated recommendations. In this case, I can tell Amazon that I have no desire to read Guy Kawasaki’s book. (This was random. I have no bone to pick with Apple’s former chief evangelizer.)

It’s evident to me that machines can spawn remarkable recommendations. Collaborative filtering is nothing short of amazing–and I’m more than willing to consider Netflix’s gentle suggestions. But organizations adept at Big Data realize the inherent limitations of a computer- or data-only approach to data management. In fact, there are typically legitimate reasons to ignore the results of even very accurate algorithms. Brass tacks: they’re not always right.
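The override itself is conceptually simple: the customer’s explicit exclusions sit above whatever the model produces. A minimal sketch (the titles are illustrative, and Amazon’s actual implementation is of course far more involved):

```python
def filter_recommendations(algorithm_picks, user_exclusions):
    """The customer's explicit 'not interested' list trumps the model."""
    excluded = set(user_exclusions)
    return [item for item in algorithm_picks if item not in excluded]

picks = ["Enchantment", "The Lean Startup", "Moneyball"]
# The customer removes the Kawasaki title; the model's other picks survive.
print(filter_recommendations(picks, ["Enchantment"]))
```

The design point is that explicit human feedback overrides the model’s output rather than being ignored by it; a smarter system would also feed the exclusion back in as a negative training signal.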

Even mighty Google–another Big Data stalwart–isn’t batting 1.000 vis-à-vis algorithm accuracy. Consider the recent story on Google Flu Trends, which “drastically overestimated peak flu levels.”

The Limits of Democratized Data

No one is saying that all data should be democratic. I can’t imagine allowing employees to update their own pay rates or companies letting vendors tweak their own invoices. (In fact, ERP self-service tools have been with us for more than a decade, although many organizations refuse to use them for a cauldron of reasons).

Still, it’s hard to see the downside of Amazon’s move here. After all, don’t customers know what they like better than some machine? What’s the real harm in allowing them to remove items from their search or browsing history that they have no intention of buying? I’d argue that the benefits of this type of move far exceed their costs.

Simon Says

Organizations ought to learn from the examples set by Big Data leaders such as Amazon, Apple, Facebook, and Google. No CTO, CIO, or individual employee should be so enamored with his or her algorithm or technology that common sense is ignored. As a starting point, yes, emerging technologies and fancy algorithms can do amazing things and tap into heretofore unknown insights. By the same token, though, a user-override can often improve a good but imperfect result.


What say you?

Category: Information Management

by: Phil Simon
03 Dec 2012

Can Big Data Save CMOs?

Half the money I spend on advertising is wasted; the trouble is I don’t know which half.

John Wanamaker

Executive turnover has always fascinated me, especially as of late. HP’s CEO Leo Apotheker had a very short run, and Yahoo! has been a veritable merry-go-round over the last five years. Beyond the CEO level, though, many executive tenures resemble those of Spinal Tap drummers. For instance, CMOs have notoriously short lifespans. While the average tenure of a CMO has increased from 23.6 to 43 months since 2004, it’s still not really a long-term position. And I wonder if Big Data can change that.

In a recent article for Chief Marketer, Wilson Raj, the global customer intelligence director of SAS, writes about the potential impact of Big Data on CMOs. From the piece:

CMOs today are better poised than ever not only to retain their roles, but to deliver broad, sustainable business impact. CMOs who capitalize on big data will reap big rewards, both personally and professionally. Bottom line: Businesses that exploit big data outperform their competition.

Necessary vs. Sufficient Conditions

The potential of Big Data is massive. To realize it to an optimal level, however, organizations need to effectively integrate transactional and analytical data and systems. Lamentably, many organizations are nowhere close to being able to do this. That is, for every Quantcast, Amazon, Target, and Wal-Mart, I suspect that dozens or even hundreds of organizations continue to struggle with what should be fairly standard blocking and tackling. Data silos continue to plague many if not most mature organizations.

Utilizing Big Data in any meaningful way involves a great deal more than merely understanding its importance. Big Data requires deploying new solutions like Hadoop and NoSQL databases such as Cassandra. Only then will CMOs be able to determine the true ROI of their marketing efforts. That is, accessing and analyzing enterprise and external (read: social) information guarantees nothing. A CMO will not necessarily be able to move the needle just because s/he has superior data. (Microsoft may have all of the data in the world, but so what? Bing hasn’t made too many inroads in the search business and Surface isn’t displacing the iPad anytime soon.)

Think of access to information as a necessary but insufficient condition to ensure success. As I look five and ten years out, I see fewer and fewer CMOs being able to survive on hunches and standard campaigns. The world is just moving too fast and what worked six months ago may very well not work today.

Simon Says

Some believe that Big Data represents the future of marketing. I for one believe that Big Data and related analytics can equip organizations with extremely valuable and previously unavailable information. And, with that information, they will make better decisions. At last, marketers will be able to see what’s actually going on with their campaigns. Perhaps problems like the one mentioned at the beginning of this post can finally be solved.


What say you?

Category: Information Value, Master Data Management

by: Phil Simon
16 Aug 2012

Big Data and the Gang of Four

At least to me, Big Data often seems like a bit of an amorphous term. Just what exactly is it, anyway?

Consider the following statistics from the Gang of Four:

  • Amazon: Excluding books, the company sells 160 million products on its website. Target sells merely 500,000. Amazon’s reported to have credit cards on file for 300 million customers. 300 million.
  • Apple: The company a few months ago passed 25 billion app downloads.
  • Facebook: 954 million registered users share more than one billion pieces of content every day.
  • Google: As of two years ago, Google handled 34,000 searches per second.

These numbers are nothing less than mind-blowing. While Facebook’s rate of growth seems to be waning, make no mistake: it’s still growing. (Decelerating growth shouldn’t be confused with decline.)

While we’re at it, let’s look at some of Twitter’s numbers. On its five-year anniversary, the company posted the following:

  • 3 years, 2 months and 1 day. The time it took from the first tweet to the billionth tweet.
  • 1 week. The time it now takes for users to send a billion tweets.
  • 50 million. The average number of tweets people sent per day, one year ago.
  • 140 million. The average number of tweets people sent per day, in the last month.
  • Oddly, 80 percent of all tweets involve Charlie Sheen.

OK, I’m making the last one up, but you get my drift.

A few things strike me about these numbers. First, this is a staggering amount of data. Second, all of this data is kept somewhere. To varying extents, these companies and others are turning data into information and, ultimately, knowledge.

What they do with that knowledge varies, but no one can doubt the potential of so much data–even if much of it is noise. Another issue: will people continue to use ad-supported platforms? Will we become sick of having our data sold to the highest bidder? Or will private, ad-free platforms flourish?

Even if the latter is true, those private platforms will still be generating data. So, in a way, the explosion of data does not hinge upon the continued growth of open or “somewhat-open” platforms.

Simon Says

If you think that consumers are going to be generating and using less data in the upcoming years, you’re living in an alternate reality. Take steps now to ensure that your organization has the software, hardware, and human capabilities to handle vastly increasing amounts of data.


What say you?

Category: Information Management, Information Strategy

by: Phil Simon
24 Jun 2012

The Semantic Web Inches Closer

I’ve written before on this site about the vast implications of the forthcoming semantic web. In short, it will be a game-changer–but it certainly won’t happen anytime soon. Every day, though, I hear about organizations taking one more step in that direction. Case in point: A few days ago, Harvard announced that it was “making public the information on more than 12 million books, videos, audio recordings, images, manuscripts, maps, and more things inside its 73 libraries.” From the piece:

Harvard can’t put the actual content of much of this material online, owing to intellectual property laws, but this so-called metadata of things like titles, publication or recording dates, book sizes or descriptions of what is in videos is also considered highly valuable. Frequently descriptors of things like audio recordings are more valuable for search engines than the material itself. Search engines frequently rely on metadata over content, particularly when it cannot easily be scanned and understood.

This might not seem like a terribly big deal to the average person. Five years ago, I wouldn’t have given this announcement much thought. But think for a moment about the ramifications of such a move. After all, Harvard is a prominent institution and others will no doubt follow its lead here. More metadata from schools, publishers, record companies, music labels, and businesses mean that the web will become smarter–much smarter. Search will continue to evolve in ways that relatively few of us appreciate or think about.

Understanding Why

And let’s not forget about data mining and business intelligence. Forget about knowing more about who buys which books, although this is of enormous importance. (Ask Jeff Bezos.) Think about knowing why these books or CDs or movies sell–or, perhaps more important, don’t sell. Consider the following questions and answers:

  • Are historical novels too long for the “average” reader? We’ll come closer to knowing because metadata includes page and word counts.
  • Which book designs result in more conversions? Are there specific fonts that readers find more appealing than others?
  • Are certain keywords registering more with a niche group of readers? We’ll know because tools will allow us to perform content and sentiment analysis.
  • Which authors’ books resonate with which readers? Executives at companies like Amazon and Apple must be frothing at the mouth here.
  • Which customers considered buying a book but ultimately did not? Why did they opt not to click the buy button?

I could go on but you get my drift. Metadata and the semantic web collectively mean that no longer will we have to look at a single book sale as a discrete event. We’ll be able to know so much more about who buys what and why. Ditto cars, MP3s, jelly beans, DVDs, and just about any other product out there.

Simon Says

In the next ten years, we still may not be able to answer every commerce-related question–or any question in its entirety. However, a more semantic web means that a significant portion of the mystery behind the purchase will be revealed. Every day, we get a little closer to a better, more semantic web.


What say you?

Category: Metadata, Semantic Web

by: Phil Simon
01 Feb 2012

Understanding the Paradox of the Middle

Back when MySpace, AOL, and Yahoo! ruled the world, people online were not always who they appeared to be. Yes, the Internet was still shaking out, but these erstwhile titans did not exactly take pains to authenticate their users. The dot-com era rewarded eyeballs, clicks, and page views–not authenticity.

A New Era

Fast forward ten years. Those three companies are shells of their former selves. Screen names like TennisFan_69 have given way to real names at companies that understand the importance of validating user identities. While forgeries are nearly impossible to prevent completely, current tech bellwethers like Twitter, Google (via Plus), LinkedIn, and Facebook make great efforts to ensure that people are who they claim to be. (By extension, sites that use tools like Facebook Connect benefit from these authentication steps.)

The point is that millions of people can effectively manage their own identities, their own data, much better than a centralized entity or a customer service department.

This is one end of the spectrum: the democratization of data. As Clive Thompson writes in Wired, we’ve seen this era of increased transparency play out before our very eyes over the last five years, although sites like eBay and Amazon have long enabled this type of data self-service. Thanks to Google, it’s harder than ever to pretend that you’re someone else. Ask Scott Adams–or at least one of his pseudonyms.

Let’s switch gears.

Contrast the “all hands on deck” approach to data management with what many small business owners have to face. Their data tends to be extremely accurate because so few hands are touching it. While far from perfect, at least errors tend to be made consistently. Rarely in my experience are 20 different people at a company entering hours, invoices, or purchase orders in 20 different ways. Mistakes can typically be rectified in a relatively short period of time after someone understands what was done.

To sum up, whether a handful of people or millions of people touch the data, the result tends to be the same: reasonably good data.

The Middle

The problem for most organizations lies somewhere in between these two extremes. When 50 or 200 or 1,000 people touch the data, things often go awry (absent some type of data quality tool, culture of data governance, routine audits, and the like). Data is often incomplete, inaccurate, dated, and/or duplicated.

Employees in big companies rarely make errors in consistent ways–and the business rules of enterprise applications can only do so much. Yes, I can prevent someone from adding an employee with the same Social Security number, but does the busy data entry clerk really care about data integrity when making minimum wage?
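Those business rules amount to a mix of hard and soft edits. A hypothetical sketch (field names, formats, and messages are all invented) of what such a rule might look like:

```python
def validate_new_employee(existing, candidate):
    """Hard edit: reject an exact SSN collision outright.
    Soft edit: warn on a name collision but let a trained user proceed."""
    errors, warnings = [], []
    if any(e["ssn"] == candidate["ssn"] for e in existing):
        errors.append("An employee with this SSN already exists.")
    if any(e["name"].lower() == candidate["name"].lower() for e in existing):
        warnings.append("An employee with this name already exists -- continue?")
    return errors, warnings

staff = [{"name": "Pat Lee", "ssn": "123-45-6789"}]
# Same name, different SSN: no error, but the clerk is asked to confirm.
print(validate_new_employee(staff, {"name": "pat lee", "ssn": "987-65-4321"}))
```

The hard edit stops the unambiguous error; the soft edit catches the plausible one without blocking legitimate namesakes. Neither helps, of course, if the clerk clicks through every warning.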

Adding to the mess is the fact that too often organizations fail to appropriately train employees. On-the-job training is, at least in my experience, sadly the norm.

Simon Says

You may allow vendors, customers, and even employees to manage their own information–or at least some of it. Of course, you can restrict access to editable data to only employees who have been properly trained and understand the consequences of their actions–and inactions.

Perhaps most important, however, understand that “the middle” represents a danger zone, a potential netherworld in which your data faces serious risk of being compromised.


What say you?


Category: Data Quality

by: Phil Simon
03 Jan 2012

Is 2012 the year?

Over the past two years, I have tried to dispense advice on this blog about intelligent information management, MDM, data quality, technology, and the like. Today, though, I’d like to ask a series of simple but vital questions about 2012.

Is this the year that your organization finally:

  • Decides to adopt data quality initiatives? Or, even better, tries to institutionalize them?
  • Attempts to make sense of its data?
  • Tries to consolidate multiple and disparate data sources?
  • Looks at mining its unstructured data for meaning?
  • Embraces semantic technologies?
  • Gets on board with MDM?
  • Retires legacy systems?

In all likelihood, this is not the year that your organization does all of the above. Perhaps it is already doing many of these things–and doing them well. Less likely, your organization has no need for MDM, data governance, etc.

Now Is the Time

Here’s the rub: data quality is not going to decline in importance. Nor is data governance. Unstructured data isn’t going away. The need to produce an accurate list of customers, vendors, and employees (and quickly, to boot) isn’t ephemeral. In fact, in each of these cases, 2012 and beyond will only intensify the need to do data right. Period.

No more excuses.

Open source software continues to make strides–as does the cloud. (And not just cute little apps that do this, that, or the other. I’m talking about enterprise-grade software like Scala.)

While the employee time needed to tackle these endeavors should not be underestimated, a major source of resistance–out-of-pocket expenditures for expensive, on-premise solutions–is now much less of a consideration.

So, we know that the costs have dropped. To make the case for significant IM improvements, I also contend that the benefits of data governance, MDM, et al. have never been higher. We continue to generate ungodly amounts of data–and in different forms to boot. And look at the companies that manage their data exceptionally well. Do you think that Amazon, Apple, Facebook, and Google would be remotely as successful if they didn’t excel at IM?

Simon Says

If not now, then when? Next year? 2014? If your organization continues to struggle with basic data management, how much longer will it be around? Will it gradually erode into irrelevance? Will it be usurped by nimble startups or much larger companies?

I can’t think of a better time to start adopting intelligent IM practices–many of which are detailed on this very site.


What say you?


Category: Data Quality, Enterprise Data Management, Information Development

by: Phil Simon
28 Dec 2011

Data Batting Averages

For many reasons, I have done a great deal of research about companies such as Amazon, Apple, Facebook, Twitter, and Google over the past year. You could say many things about these companies. First and foremost, they sport high data batting averages (DBAs). By DBAs, I mean that these companies’ records on their users, employees, customers, and vendors are exceptionally accurate.
Of course, this begs the question: Why?

A few reasons come to mind. Let’s explore them.


First up, the Gang of Four allows users and customers to maintain their own information. Consider Amazon for a moment. Its customers make any and all changes to their credit card numbers, mailing addresses, and communication preferences. That’s a given. But Vendor Central, Seller Central, and Author Central each allow affected parties to submit pertinent updates as needed. So, let’s say that I want to sell a copy of The World is Flat by Thomas Friedman. No problem. No one from Amazon needs to approve it.


Self-service is all fine and dandy, but surely mistakes are made. No one bats 1.000, right? Honest errors aside, there are some unscrupulous folks out there. For instance, a clown recently claimed that he wrote The Age of the Platform—and submitted a separate and fake listing to Amazon, including the cover of my actual book.

(I’m actually honored. The same thing happened to Seth Godin.)

While Amazon didn’t catch this, the author (in this case, yours truly) did. After a bit of bouncing around, I emailed the company and, after a few days, Amazon removed the fraudulent listing. (One small step for man…) Evidently, the company’s systemic checks and balances aren’t foolproof. At least the company provides a mechanism to correct this oversight. The result: Amazon is today a tiny bit more accurate because I noticed this issue and took the time and effort to resolve it. Now, Amazon and I can make more money.

A Recognition of the Cardinal Importance of Accurate Data

The above example demonstrates that Amazon gets it: data matters. Fixable errors should be, well, fixed.

And soon.

Now, let’s turn to Facebook. The company takes steps to ensure that, to paraphrase the famous Dennis Green post-game rant, “you are who it thinks you are.” That is, Facebook is all about authenticity among its users. While it doesn’t ask for proof of identification upon joining, try signing up as Barack Obama or Bruce Willis.

Go ahead. I’ll wait.

You see. You can’t do this—even if your name is Barack or Bruce. Of course, there’s an appeals process, but those with celebrity names have to endure an additional step. Annoying to these namesakes? Perhaps, but in the end it prevents at least 50 apocryphal accounts for every one “false positive.”

And Facebook is hardly alone. Twitter does the same thing with verified accounts, a service that it introduced a while back, although I’m sure that there are at least tens of thousands of people on Twitter posing as other people.

Simon Says

The companies that manage data exceptionally well today aren’t complacent. Even “hitting” .990 means potentially tens or hundreds of thousands of errors. While perfection may never be attainable, the Facebooks and Googles of the world are constantly tweaking their systems and processes in the hope of getting better and better. This is an important lesson for every organization.
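The .990 figure invites some quick arithmetic. A tiny sketch, borrowing the 300-million credit-cards-on-file number quoted earlier in this series, of how many bad records even an excellent batting average implies:

```python
def bad_records(total, accuracy):
    """Records expected to be wrong at a given 'data batting average'."""
    return round(total * (1 - accuracy))

total = 300_000_000  # the credit-cards-on-file figure cited above
for avg in (0.990, 0.999, 0.9999):
    print(f"{avg} -> {bad_records(total, avg):,} bad records")
```

At .990, that is three million bad records; even at .9999, thirty thousand. Which is exactly why the leaders keep tweaking rather than declaring victory.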


What say you?



Category: Information Development
