09 Jan 2012
A few years ago, while its stock was still sky-high, Netflix ran an innovative contest with the intent of improving its movie recommendation algorithm. Ultimately, a small team figured out a way for the company to significantly increase the accuracy with which it gently suggests movies to its customers.
It turns out that these types of data analysis and improvement contests are starting to catch on. Indeed, with the rise of Big Data, cloud computing, open source software, and collaborative commerce, it has never been easier to outsource these “data science projects.”
From a recent BusinessWeek article:
In April 2010, Anthony Goldbloom, an Australian economist, [f]ounded a company called Kaggle to help businesses of any size run Netflix-style competitions. The customer supplies a data set, tells Kaggle the question it wants answered, and decides how much prize money it’s willing to put up. Kaggle shapes these inputs into a contest for the data-crunching hordes. To date, about 25,000 people—including thousands of PhDs—have flocked to Kaggle to compete in dozens of contests backed by Ford (F), Deloitte, Microsoft (MSFT), and other companies. The interest convinced investors, including PayPal co-founder Max Levchin, Google Chief Economist Hal Varian, and Web 2.0 kingpin Yuri Milner, to put $11 million into the company in November.
The potential for these types of projects is hard to overstate. Ditto the benefits.
Think about it. Organizations can publish even extremely large data sets online for the world at large. Interested groups, companies, and even individuals can use powerful tools such as Hadoop to analyze the information and provide recommendations. In the process, these insights can lead to new products and services and to dramatic enhancements to existing business processes (see Netflix).
Of course, these organizations will have to offer some type of prize or incentive. Building a better mousetrap may be exciting, but don’t expect too many people to volunteer their time without the expectation of significant reward. Remember that, of the millions of people who visit Wikipedia every day, only a very small percentage actually do any editing. If Wikipedia (a non-profit) offered actual remuneration, that number would probably be significantly higher (although the quality of its edits would probably suffer).
Consider the following examples:
- A pharmaceutical company has a raft of data on a new and potentially promising drug.
- A manufacturing company has years of historical data on its defects.
- A retailer is trying to understand its customer churn but can’t seem to get its arms around its data.
I could go on, but you get my drift.
While there will always be a need for proprietary data and attendant analysis, we may be entering an era of data democratization. Open Data is here to stay, and I can certainly see the growth of marketplaces and companies like Kaggle that match data analysis firms with companies in need of that very type of expertise.
Of course, this need has always existed, but the unprecedented power of contemporary tools, technologies, methodologies, and data means that outsourcing analysis and running contests have never been easier. No longer do you have to look down the hall, call IT, or call in a Big Four consulting firm to understand your data and learn from it.
What say you?