Mike Tyson, Big Data, and the Perils of the Project Mentality

Everybody has a plan until they get punched in the face.

–Mike Tyson

Who should do what on a Big Data project?

It seems like a logical and even necessary question. After all, Big Data is a big deal and requires assistance from each line of business, the top brass, and IT, right?

Defining Roles

Matt Ariker, Tim McGuire, and Jesko Perrey recently wrote an HBR post attempting to answer this question. In "Five Roles You Need on Your Big Data Team," the three advocate five "important roles to staff your advanced analytics bureau":

  1. Data Hygienists
  2. Data Explorers
  3. Business Solution Architects
  4. Data Scientists
  5. Campaign Experts

To be sure, not everyone can or should do everything in an era of Big Data. I can't tell you for certain that dividing roles the way the authors recommend won't work. Still, I just don't buy the argument that Big Data lends itself to everything fitting neatly into traditional roles.

Take data quality, for instance. As Jim Harris writes:

The quality of the data in the warehouse determines whether it’s considered a trusted source, but it faces a paradox similar to “which came first, the chicken or the egg?” Except for the data warehouse it’s “which comes first, delivery or quality?” However, since users can’t complain about the quality of data that hasn’t been delivered yet, delivery always comes first in data warehousing.

Agreed. Traditional data warehousing projects could be thought of in a more linear fashion. In most cases, organizations were attempting to aggregate, and report on, their own data (read: data internal to the enterprise). Once a source was added, maintaining it was fairly routine, at least compared to today's datasets. These projects tended to be more predictable.

But what happens when much, if not most, relevant data stems from outside the enterprise? What do we do when new data sources start popping up faster than ever? Mike Tyson's quote at the top of this post has never been more apropos.

Simon Says: Big Data Is Not Predictable

My point is that IT projects have start and end dates; Big Data does not. Amazon, Apple, Facebook, Twitter, Google, and other successful companies don't view Big Data as "IT projects." Treating it as one is a potentially lethal mistake. For its part, Netflix views both Big Data and data visualization as ongoing processes; they are never finished. I make the same point in my last book.

When you start thinking of Big Data as an initiative or project with traditionally defined roles, you're on the road to failure. Don't make "data hygiene" or "data exploring" the sole purview of a group, department, or individual. Encourage others to step out of their comfort zones, notice things, test hypotheses, and act upon them.

Feedback

What say you?
