Posts Tagged ‘facebook’
In late December of 2013, Google Chairman Eric Schmidt admitted that ignoring social networking had been a big mistake. “I guess, in our defense, we were busy working on many other things, but we should have been in that area and I take responsibility for that,” he said.
Brass tacks: Google’s misstep and Facebook’s opportunistic land grab of social media have resulted in a striking data chasm between the two behemoths. As a result, Facebook can do something that Google just can’t.
To his credit, Mark Zuckerberg has not been complacent with this lead. This is an age of ephemera. He is building upon his company’s lead in social data. Case in point: the launch of Graph Search.
The rationale here is pretty straightforward: Why let Google catch up? With Graph Search, Facebook users can determine which of their friends have gone to a Mexican restaurant in the last six months in San Francisco. What about which friends like the Rolling Stones or The Beatles? (Need to resell a ticket? Why use StubHub here? Maybe Facebook gets a cut of the transaction?) These are questions and problems that Google can’t address but Facebook can.
All good for Zuck et al., right? Not really. It turns out that delivering relevant social data in a timely manner is proving remarkably elusive, even for the smart cookies at Facebook.
The New News Feeds
As Wired reported in May of 2013, Facebook “redesigned its News Feed with bolder images and special sections for friends, photos, and music, saying the activity stream will become more like a ‘personalized newspaper’ that fits better with people’s mobile lifestyles.” Of course, many users didn’t like the move, but that’s par for the course these days. You’re never going to make 1.2 billion users happy.
But Facebook quickly realized that it didn’t get the relaunch of News Feed right. Not even close. Just a few weeks before Schmidt’s revealing quote, Business Insider reported that Facebook was at it again, making major tweaks to its feed and then halting its new launch. This problem has no simple solution.
Simon Says: Big Data Is a Full-Time Job
Big Data is no picnic. “Managing” it isn’t easy, even for billion-dollar companies such as Facebook. The days of “set it and forget it” have long passed. Organizations need to be constantly monitoring the effectiveness of their data-driven products and services, to say nothing of testing for security issues. (Can someone say Target?)
What say you?
“One man’s data is another man’s metadata”
As I pen these words, the PRISM scandal continues to unfold. The questions raised by the National Security Agency’s formerly furtive program strike at the very heart of a free society.
The fallout will continue for months, if not years. Maybe it will spark a deeper conversation about data ownership. Perhaps more people will echo the words of Jim Harris, who wrote recently on this site:
The Metadata Cop-Out?
I for one noticed something interesting buried in many of the non-denial denials, the carefully scripted and lawyer-approved statements from Microsoft, Apple, Yahoo!, Facebook, and others. Many press releases claimed (truthfully, for all I know) that these companies didn’t provide data per se to the NSA. Rather, they provided metadata. In other words, Yahoo! didn’t give up the actual contents of any single email, just things like:
- the sender’s email address
- the receiver’s email address
- the subject of the email
- the time and date that the email was sent
So, what is this distinction between data and metadata? And does it ultimately matter?
I discussed this very subject recently with my friend Melinda Thielbar, a real-life statistician and data scientist. She agreed with me that the distinction is becoming “essentially meaningless.” Equipped with enough of (the right) metadata, one can more or less figure out what’s going on–or at least identify potentially suspicious communications among persons of interest.
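To make Melinda’s point concrete, here is a minimal Python sketch of what someone could do with nothing but the four fields listed above. The records, addresses, and the two helper functions are hypothetical and invented purely for illustration; they say nothing about how any agency or company actually analyzes metadata.

```python
from collections import Counter
from datetime import datetime

# Hypothetical metadata records: no message bodies, only sender,
# receiver, subject, and timestamp.
records = [
    {"sender": "alice@example.com", "receiver": "bob@example.com",
     "subject": "Lunch?", "sent_at": datetime(2013, 6, 3, 12, 5)},
    {"sender": "alice@example.com", "receiver": "bob@example.com",
     "subject": "Re: Lunch?", "sent_at": datetime(2013, 6, 3, 12, 40)},
    {"sender": "alice@example.com", "receiver": "bob@example.com",
     "subject": "Contract draft", "sent_at": datetime(2013, 6, 4, 9, 15)},
    {"sender": "bob@example.com", "receiver": "alice@example.com",
     "subject": "Re: Contract draft", "sent_at": datetime(2013, 6, 4, 10, 0)},
    {"sender": "carol@example.com", "receiver": "bob@example.com",
     "subject": "Call me", "sent_at": datetime(2013, 6, 5, 2, 30)},
]


def contact_counts(rows):
    """Count messages for each (sender, receiver) pair."""
    return Counter((r["sender"], r["receiver"]) for r in rows)


def late_night_senders(rows, start_hour=0, end_hour=5):
    """Return senders with mail sent from start_hour up to (not including) end_hour."""
    return sorted({r["sender"] for r in rows
                   if start_hour <= r["sent_at"].hour < end_hour})


pair_counts = contact_counts(records)
top_pair, top_count = pair_counts.most_common(1)[0]
night_owls = late_night_senders(records)

print("Most frequent pair:", top_pair, "with", top_count, "messages")
print("Senders active between midnight and 5 a.m.:", night_owls)
```

Without reading a single email body, the sketch already shows who talks to whom, how often, and at what hours.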
The quote at the beginning of this post is as true as it’s ever been. In a world of Big Data, metadata is increasingly important. It’s not just the video, picture, blog post, email, or customer record that matters. The data about or “behind” the data can be just as critical.
Is your organization paying attention to its metadata?
“It’s all about bucks, kid. The rest is conversation.”
–Michael Douglas as Gordon Gekko, Wall Street (1987)
Sporting more than 60 million users, Evernote is one of the most popular productivity apps out there these days. You may in fact use the app to store audio notes, video, pics, and websites, and to perform a whole host of other tasks.
Over the last few years, it has become very difficult for IT to police who brings personal devices into the enterprise (never mind what people do with them). If you’re reading this site, you’ve probably heard of BYOD, a trend that will surely continue. Google Glass and its ilk are coming soon. These devices pose additional risks for IT departments determined to prevent data theft, security breaches, and industrial espionage.
But what about bringing your own software? Is this a nascent trend about which IT has to worry?
Yammer: A Case Study
True enterprise collaboration software has existed for quite some time. More than a decade ago, we began to hear of corporate intranets and knowledge bases. One of the first proper collaboration applications: Microsoft SharePoint. Such promise!
Lamentably, employees did not consistently use these tools throughout the enterprise. For a host of reasons, many organizations continued to rely upon email as the killer collaboration app. Generally speaking, this was a mistake. Email doesn’t lend itself to true collaboration, and some CEOs even banned email.
Faced with the need to share information in a more efficient manner, many people began collaborating in the worst possible place: Facebook. From a recent study:
Facebook is a collaboration platform twice as popular as SharePoint — 74% to 39%. It’s also four times more popular than IBM Open Connections (17%) and six times more popular than Salesforce’s Chatter (12%).
The study, of 4,000 users and 1,000 business and IT decision-makers in 22 countries, also said 77% of managers and 68% of users say they now use some form of enterprise social networking technology. IT decision makers said such social technologies make their jobs more enjoyable (66%), more productive (62%) and “help them get work done faster” (57%). All in all, said Avanade, of those businesses currently using social collaboration tools, 82% want to use more of them in the future.
The governance, security, and privacy issues posed by using the world’s largest social network as an enterprise collaboration tool are hard to overstate. Yet, many experienced IT professionals became fed up with clunky top-down collaboration tools like SharePoint.
Consider the recent success of Yammer, a freemium-based and organic alternative to top-down tools like SharePoint. Yammer became so popular precisely because it was organic. That is, employees could download the application and kick its tires. IT did not need to deploy or bless it. Organizations could date before they got married. If they wanted to expand its use–or unlock key functionality–they could pay for Yammer. And that’s exactly what many organizations did. To its credit, about a year ago, Microsoft purchased Yammer for more than $1 billion in cash.
CXOs should think very carefully about whether their current applications enable their organizations and employees to be productive. These days, it doesn’t take a computer scientist to circumvent restrictive corporate IT policies. Yammer is not unique. In other words, if it can go viral within an organization, so can other applications.
What say you?
By now, the cat is out of the bag on Facebook’s Graph Search. I have no inside knowledge of the specific applications behind it. I haven’t talked to Zuck lately, and even the King of Facebook has admitted that the product is in very, very early beta. Translation: it’s coming, but not anytime soon.
What We Know–and What We Don’t
We know that Facebook uses MySQL to power at least part of its business, but we don’t know when Graph Search will be available to the masses. That doesn’t mean that we can’t have some fun examining Facebook’s forthcoming Big Data tool. We can certainly make some intelligent guesses about what’s going on under the hood of its nascent product.
Traditional SQL/Relational Database
If this were 1995, such a product would probably have run via very complex SQL statements. (I know because I used to write them back then.) In our attention-challenged world, there’s just no way that tens of millions of concurrent and complex queries could work efficiently with traditional SQL statements.
Imagine a SELECT statement on a table with billions of records and 20 conditions. Tell me all of my friends who have visited a Chinese restaurant in Manhattan in the last six months who also like Modern Family…
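For a sense of the shape of such a query, here is a toy sketch in Python using the standard sqlite3 module. Everything in it is an assumption made for illustration: the four tables, their columns, the sample rows, and the fixed “today” date are hypothetical, and nothing here reflects Facebook’s actual schema. It uses far fewer than 20 conditions and runs instantly on a handful of rows; the trouble described here starts when the same joins hit billions of records from millions of users at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema, invented for this example only.
conn.executescript("""
CREATE TABLE users       (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER);
CREATE TABLE checkins    (user_id INTEGER, cuisine TEXT, borough TEXT, visited_on TEXT);
CREATE TABLE likes       (user_id INTEGER, page TEXT);
""")

conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "Me"), (2, "Ann"), (3, "Raj"), (4, "Lee")])
conn.executemany("INSERT INTO friendships VALUES (?, ?)",
                 [(1, 2), (1, 3), (1, 4)])
conn.executemany("INSERT INTO checkins VALUES (?, ?, ?, ?)", [
    (2, "Chinese", "Manhattan", "2013-01-10"),  # matches every condition
    (3, "Chinese", "Brooklyn", "2013-01-12"),   # wrong borough
    (4, "Chinese", "Manhattan", "2012-03-01"),  # older than six months
])
conn.executemany("INSERT INTO likes VALUES (?, ?)",
                 [(2, "Modern Family"), (3, "Modern Family"), (4, "Modern Family")])

# Friends who visited a Chinese restaurant in Manhattan in the last
# six months and who also like Modern Family.
QUERY = """
SELECT DISTINCT u.name
FROM friendships f
JOIN users    u ON u.user_id = f.friend_id
JOIN checkins c ON c.user_id = f.friend_id
JOIN likes    l ON l.user_id = f.friend_id
WHERE f.user_id = :me
  AND c.cuisine = 'Chinese'
  AND c.borough = 'Manhattan'
  AND c.visited_on >= date(:today, '-6 months')
  AND l.page = 'Modern Family'
ORDER BY u.name
"""

matches = [row[0] for row in conn.execute(QUERY, {"me": 1, "today": "2013-02-01"})]
print(matches)
```

Each extra condition adds another join or filter, which is why a 20-condition version over billions of rows is a very different animal from this one.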
With “normal” SQL, this query would take hours or days and, I’d bet, crash more often than not.
Odds: Definitely not.
NoSQL
A very good possibility. NoSQL tools like Hadoop, Cassandra, and others have been shown to produce remarkably fast results on massive datasets. Because of fault tolerance and parallel processing, the odds of crashing are very low (if configured correctly with sufficiently powerful hardware, of course). Remember that NoSQL means “not only SQL”, not “does not use SQL.” This is a common misconception.
NewSQL
NewSQL is a fairly obscure but emerging technology that theoretically takes the old standby to the next level. I have heard very positive things about NewSQL, but its lack of adoption makes me wonder whether Facebook would bet the farm on something fairly immature.
A Hybrid or Something Else
Perhaps some of the engineers at Facebook have mixed and matched from the emerging technologies described above. Maybe they’ve taken a page from Google and developed an analog to BigTable or BigQuery and fused it with something else.
Big Data and related solutions are here. 2013 may well turn out to be the year of Big Data. Regardless of the specific solution implemented, it’s important to realize that old standbys are most likely not going to produce meaningful results on enormous datasets.
What is your organization doing with Big Data?
There’s very little doubt that we are generating more and more data every day. How much? Well, let me use images rather than words.
The following infographic comes from Fliptop:
In a word, wow.
While we know that there’s a boatload of data out there, we are much less certain about who owns this data. That’s why this TechCrunch article on the resurrection of now-defunct content sharing site Digg is so interesting.
To make a long story short, Digg on August 31st of 2012 “launched the Digg Archive, a tool to help users of the old Digg (before July 2012) retrieve a history of their Diggs, Submissions, Saved Articles, and Comments.” Digg had help from Kippt and Pinboard. From the post:
We believe that people own the data they create, so while we work to determine if and how this data makes its way into the new Digg, we wanted to provide a way for users to access their history. It took some digging through the old infrastructure, but the complete Digg Archive is now live.
To be sure, Digg is hardly the only company or organization to feel this way. Google’s Data Liberation Project also comes to mind. In short, the DLP enables users to move their data in and out of Google products.
It’s hard to discuss data ownership today without addressing the corresponding issues of privacy and security–and this leads us right to the elephant in the room: Facebook. Facebook doesn’t exactly make it easy for users to remove their data from the site.
Surprised? Don’t be. As I write in The Age of the Platform, companies like Amazon, Apple, Facebook, and Google try to make their platforms as sticky as possible. And I can think of few stickier ploys than “locking” user data into a site. For that very reason, many people won’t move off of Microsoft Outlook into a cloud-based alternative. There’s no native “transfer to Gmail” button in any version of Outlook I’ve seen. And, again, this is by design.
Well, the forces of data freedom aren’t standing still. Computational knowledge engine Wolfram Alpha is at least letting users see the specific data Facebook keeps on them. Note that Facebook can at any point turn off the API that Wolfram Alpha uses to pull this information. Translation: the forces of data liberation are not without their obstacles and opponents.
Stay tuned. The Data Liberation Wars are just heating up. I firmly expect Twitter, Google, Amazon, and other platform-based companies to face this growing issue in the near- and long-term.
What say you?
At least to me, Big Data often seems like a bit of an amorphous term. Just what exactly is it, anyway?
Consider the following statistics from the Gang of Four:
- Amazon: Excluding books, the company sells 160 million products on its website. Target sells merely 500,000. Amazon’s reported to have credit cards on file for 300 million customers. 300 million.
- Apple: The company a few months ago passed 25 billion app downloads.
- Facebook: 954 million registered users share more than one billion pieces of content every day.
- Google: As of two years ago, Google handled 34,000 searches per second.
These numbers are nothing less than mind-blowing. While Facebook’s rate of growth seems to be waning, make no mistake: it’s still growing. (A deceleration of growth shouldn’t be confused with a decline.)
While we’re at it, let’s look at some of Twitter’s numbers. On its five-year anniversary, the company posted the following:
- 3 years, 2 months and 1 day. The time it took from the first tweet to the billionth tweet.
- 1 week. The time it now takes for users to send a billion tweets.
- 50 million. The average number of tweets people sent per day, one year ago.
- 140 million. The average number of tweets people sent per day, in the last month.
- Oddly, 80 percent of all tweets involve Charlie Sheen.
OK, I’m making the last one up, but you get my drift.
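A quick back-of-the-envelope check in Python shows how the (real) Twitter figures above hang together. The only inputs are the numbers quoted in the list; the conversion of “3 years, 2 months and 1 day” into days uses 365-day years and 30-day months, which is my own rough assumption.

```python
# Figures quoted in the list above.
TWEETS_PER_DAY_YEAR_AGO = 50_000_000
TWEETS_PER_DAY_NOW = 140_000_000
ONE_BILLION = 1_000_000_000

# At the current daily rate, how long does a billion tweets take?
days_per_billion = ONE_BILLION / TWEETS_PER_DAY_NOW

# How much did daily volume grow in a year?
growth_factor = TWEETS_PER_DAY_NOW / TWEETS_PER_DAY_YEAR_AGO

# Rough length of "3 years, 2 months and 1 day" in days
# (assumes 365-day years and 30-day months).
days_first_billion = 3 * 365 + 2 * 30 + 1

# How much faster is the latest billion than the first one?
speedup = days_first_billion / days_per_billion

print(f"Days per billion tweets now: {days_per_billion:.1f}")
print(f"Year-over-year growth in daily tweets: {growth_factor:.1f}x")
print(f"Speedup versus the first billion: about {speedup:.0f}x")
```

At 140 million tweets a day, a billion tweets takes just over seven days, which squares with the “1 week” figure, and daily volume nearly tripled in a year.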
A few things strike me about these numbers. First, this is a staggering amount of data. Second, all of this data is kept somewhere. To varying extents, these companies and others are turning data into information and, ultimately, knowledge.
What they do with that knowledge varies, but no one can doubt the potential of so much data–even if much of it is noise. Another issue: will people continue to use ad-supported platforms? Will we become sick of having our data sold to the highest bidder? Or will private, ad-free platforms like app.net flourish?
Even if the latter is true, those private platforms will still be generating data. So, in a way, the explosion of data does not hinge upon the continued growth of open or “somewhat-open” platforms.
If you think that consumers are going to be generating and using less data in the upcoming years, you’re living in an alternate reality. Take steps now to ensure that your organization has the software, hardware, and human capabilities to handle vastly increasing amounts of data.
What say you?
Facebook has fallen on hard financial times lately. That 100-billion-dollar valuation now seems wildly optimistic, put mildly. To be sure, Facebook is hardly the only company to slide after its IPO–and it won’t be the last. But is there something more dangerous going on at the world’s largest social network–something that may threaten its very existence?
The Data Problem?
As James Ball writes in The Guardian about the larger problem facing Mark Zuckerberg’s company:
While Google’s revenues are growing – not a bad feat in the current economy – the huge amounts of extra data it’s accumulating aren’t improving its actual ads: the money the company gets for each advert is actually falling. If more data doesn’t make these companies more cash, the rationale falls away. Google’s adverts make it a huge amount of money, and will continue to do so, but there’s no evidence that more user data is making those adverts more effective at generating profit than they already were.
A large chunk of Facebook’s business model is based on the “more data is better than less data” assumption. In theory, this will bring in advertisers and, ultimately, profits. (Parenthetically, this is precisely why Facebook scares the hell out of Google.)
But The Guardian piece raises the following important and fundamental questions about the value of data:
- Does data eventually reach a point of diminishing returns?
- Is more data always better than less data?
We’ll probably find out over the next few quarters or years if Facebook is able to monetize what is perhaps the largest trove of data in the history of the world. What’s more, if Facebook can’t do it, will any organization be able to make sense–and, more important, money–from vast amounts of user-generated information?
Your Organization is Not Facebook
Now, don’t for one minute dismiss the need for (and value of) Big Data, sentiment analysis, semantic technologies, and other modern data management techniques. Facebook’s struggles hardly prove that there is no legitimate value to be gleaned from such things. Let’s say, for the sake of argument, that Facebook can’t justify such a lofty valuation (now or in the future). This in no way means that its data is worthless. It just might not be worth as much as some think. In other words, the argument here centers around how much that data is worth–not whether the data is worth anything.
It’s my firm belief that the vast majority of organizations need to manage their existing data better. I can think of few that wouldn’t benefit from increasing the types and amount of data they manage. If there is such a thing as diminishing returns to the value of data, Facebook is much, much closer to realizing it than your organization is. Even if Facebook’s stock plummets to zero (and Larry Page buys drinks for everyone in Silicon Valley), it behooves organizations to embrace the “more is better” data theory. Structured, unstructured, and semi-structured data are extremely valuable assets, not liabilities.
Much like any company, the big question for Facebook is not what kind of data it has. Rather, it’s “What can it do with that information?”
What say you?
Back when MySpace, AOL, and Yahoo! ruled the world, people online were not always who they appeared to be. Yes, the Internet was still shaking out, but these erstwhile titans did not exactly take pains to authenticate their users. The dot-com era rewarded eyeballs, clicks, and page views–not authenticity.
A New Era
Fast forward ten years. Those three companies are shells of their former selves. Screen names like TennisFan_69 have given way to real names at companies that understand the importance of validating user identities. While forgeries are nearly impossible to completely prevent, current tech bellwethers like Twitter, Google (via Plus), LinkedIn, and Facebook make great efforts to ensure that people are who they claim to be. (By extension, sites that use tools like Facebook Connect benefit from these authentication steps.)
The point is that millions of people can effectively manage their own identities, their own data, much better than a centralized entity or a customer service department.
This is one end of the spectrum: the democratization of data. As Clive Thompson writes in Wired, we’ve seen this era of increased transparency play out before our very eyes over the last five years, although sites like eBay and Amazon have long enabled this type of data self-service. Thanks to Google, it’s harder than ever to pretend that you’re someone else. Ask Scott Adams–or at least one of his pseudonyms.
Let’s switch gears.
Contrast the “all hands on deck approach to data management” with what many small business owners have to face. Their data tends to be extremely accurate because so few hands are touching it. While far from perfect, at least errors tend to be consistently made. Rarely in my experience are 20 different people at a company entering hours, invoices, or purchase orders in 20 different ways. Mistakes can typically be rectified in a relatively short period of time after someone understands what was done.
To sum up, whether millions of people or only a handful touch the data, the result tends to be the same: reasonably good data.
The problem for most organizations lies somewhere in between these two extremes. When 50 or 200 or 1,000 people touch the data, things often go awry (absent some type of data quality tool, culture of data governance, routine audits, and the like). Data is often incomplete, inaccurate, dated, and/or duplicated.
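A routine audit doesn’t have to be elaborate. The Python sketch below flags three of the four problems just mentioned: incomplete, dated, and duplicated records. The customer records, field names, and the two-year staleness threshold are all hypothetical choices of mine. Inaccuracy is deliberately left out, because catching it requires something to compare against.

```python
from collections import defaultdict
from datetime import date

# Hypothetical customer records, invented for illustration.
customers = [
    {"id": 1, "name": "Acme Corp", "email": "ap@acme.example", "updated": date(2012, 1, 5)},
    {"id": 2, "name": "ACME Corp.", "email": "AP@acme.example ", "updated": date(2011, 11, 20)},
    {"id": 3, "name": "Globex", "email": "", "updated": date(2012, 2, 1)},
    {"id": 4, "name": "Initech", "email": "billing@initech.example", "updated": date(2008, 6, 30)},
]


def audit(records, today, max_age_days=730):
    """Group record ids by the data quality issue they exhibit."""
    issues = defaultdict(list)
    first_seen = {}  # normalized email -> id of the first record using it
    for rec in records:
        # Incomplete: a required field is missing or blank.
        if not all(str(rec.get(field, "")).strip() for field in ("name", "email")):
            issues["incomplete"].append(rec["id"])
        # Dated: not touched within the allowed window.
        if (today - rec["updated"]).days > max_age_days:
            issues["dated"].append(rec["id"])
        # Duplicated: same email once case and whitespace are normalized.
        key = rec["email"].strip().lower()
        if key:
            if key in first_seen:
                issues["duplicated"].append((first_seen[key], rec["id"]))
            else:
                first_seen[key] = rec["id"]
    return dict(issues)


report = audit(customers, today=date(2012, 3, 1))
print(report)
```

Note that records 1 and 2 only show up as duplicates because the email is normalized first; two clerks typing the same customer in two different ways is exactly the inconsistency described here.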
Employees in big companies rarely make errors in consistent ways–and business rules of enterprise applications can only do so much. Yes, I can prevent someone from adding an employee with the same Social Security number, but does the busy data entry clerk really care about data integrity when making minimum wage?
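Here is what such a business rule might look like, sketched in Python with the standard sqlite3 module. The employees table, its columns, and the add_employee helper are hypothetical; a real enterprise application would enforce this in its own way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the UNIQUE constraint is the business rule.
conn.execute("""
CREATE TABLE employees (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    ssn  TEXT NOT NULL UNIQUE
)
""")


def add_employee(connection, name, ssn):
    """Insert an employee; return False if the SSN is already on file."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "INSERT INTO employees (name, ssn) VALUES (?, ?)", (name, ssn)
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = add_employee(conn, "Pat Smith", "000-00-0001")
second = add_employee(conn, "Patricia Smith", "000-00-0001")  # same SSN: rejected
third = add_employee(conn, "Pat Smith", "000-00-0002")        # mistyped SSN: accepted

employee_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(first, second, third, employee_count)
```

The constraint stops the exact duplicate, but the third insert shows its limits: if the clerk mistypes one digit, the same person sails through as a brand-new employee.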
Adding to the mess is the fact that too often organizations fail to appropriately train employees. On-the-job training is, at least in my experience, sadly the norm.
One option is to let vendors, customers, and even employees manage their own information–or at least some of it. Another is to restrict access to editable data to employees who have been properly trained and understand the consequences of their actions–and inactions.
Perhaps most important, however, understand that “the middle” represents a danger zone, a potential netherworld in which your data faces serious risk of being compromised.
What say you?
Over the past two years, I have tried to dispense advice on this blog about intelligent information management, MDM, data quality, technology, and the like. Today, though, I’d like to ask a series of simple but vital questions about 2012.
Is this the year that your organization finally:
- Decides to adopt data quality initiatives? Or, even better, tries to institutionalize them?
- Attempts to make sense of its data?
- Tries to consolidate multiple and disparate data sources?
- Looks at mining its unstructured data for meaning?
- Embraces semantic technologies?
- Gets on board with MDM?
- Retires legacy systems?
In all likelihood, this is not the year that your organization does all of the above. Perhaps it is already doing many of these things–and doing them well. Less likely, your organization has no need for MDM, data governance, etc.
Now Is the Time
Here’s the rub: data quality is not going to decline in importance. Nor is data governance. Unstructured data isn’t going away. The need to produce an accurate list of customers, vendors, and employees (and quickly, to boot) isn’t ephemeral. In fact, in each of these cases, 2012 and beyond will only intensify the need to do data right. Period.
No more excuses.
Open source software continues to make strides–as does the cloud. (And not just cute little apps that do this, that, or the other. I’m talking about enterprise-grade software like Scala.)
While employee time should not be underestimated in tackling these endeavors, a major source of resistance in the form of out-of-pocket expenditures for expensive, on-premise solutions is now less of a consideration.
So, we know that the costs have dropped. To make the case for significant IM improvements, I also contend that the benefits of data governance, MDM, et al. have never been higher. We continue to generate ungodly amounts of data–and in different forms to boot. And look at the companies that manage their data exceptionally well. Do you think that Amazon, Apple, Facebook, and Google would be remotely as successful if they didn’t excel at IM?
If not now, then when? Next year? 2014? If your organization continues to struggle with basic data management, how much longer will it be around? Will it gradually erode into irrelevance? Will it be usurped by nimble startups or much larger companies?
I can’t think of a better time to start adopting intelligent IM practices–many of which are detailed on this very site.
What say you?