What is Big Data?

February 22, 2012

You may have seen my recent Video Index interview talking about online video and Big Data. As a data scientist, I’m thrilled that Big Data is in the news lately. It got the spotlight treatment in last week's Sunday New York Times, which looked at America's need for 1.5 million data-literate managers and 140,000 to 190,000 more workers with deep analytical expertise. 

“Welcome to the Age of Big Data. The new megarich of Silicon Valley, first at Google and now Facebook, are masters at harnessing the data of the Web -- online searches, posts and messages -- with Internet advertising. At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, ‘Big Data, Big Impact,’ declared data a new class of economic asset, like currency or gold.”
Like currency or gold? That likely perked up the ears of some execs. But what is Big Data?
“A meme and a marketing term, for sure, but also shorthand for advancing trends in technology that open the door to a new approach to understanding the world and making decisions. There is a lot more data, all the time, growing at 50 percent a year, or more than doubling every two years, estimates IDC, a technology research firm.”
Big Data is a real thing -- not just a meme. It has two defining characteristics:
  1. It doesn't fit on one computer.
  2. It can't be consumed by a single person.
The first point explains the technological change that is occurring in Silicon Valley. New systems, such as Hadoop and Twitter's Storm, are popping up to process a deluge of information. Meanwhile, new classes of schema-less distributed data stores help bring structure to the huge data sets. All of these systems are distributed across a computing cluster. Each node in the cluster is an individual computer that can store and process data. By breaking out of a single box, the Big Data set can be scaled by simply adding more nodes to its cluster.
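
To make the cluster idea concrete, here is a minimal Python sketch of it. It is a toy under stated assumptions, not how Hadoop or Storm actually work: the "nodes" are just lists in one process, and the function names (node_for, partition, local_totals, cluster_totals) and the video play counts are hypothetical, invented for illustration. It shows records being split across nodes by a stable hash, each node summing its own share, and the partial results being combined.

```python
import zlib
from collections import Counter


def node_for(key: str, num_nodes: int) -> int:
    """Pick a node for a key using a stable hash (hypothetical placement rule)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(records, num_nodes):
    """Split (key, value) records into one shard (list) per node."""
    shards = [[] for _ in range(num_nodes)]
    for key, value in records:
        shards[node_for(key, num_nodes)].append((key, value))
    return shards


def local_totals(shard):
    """The work one node does alone: sum the values per key in its shard."""
    totals = Counter()
    for key, value in shard:
        totals[key] += value
    return totals


def cluster_totals(records, num_nodes):
    """Merge every node's partial totals into the overall answer."""
    combined = Counter()
    for shard in partition(records, num_nodes):
        combined.update(local_totals(shard))
    return combined


if __name__ == "__main__":
    # Made-up data: 10,000 play events spread over 50 hypothetical videos.
    records = [(f"video-{i % 50}", 1) for i in range(10_000)]
    for num_nodes in (1, 2, 4, 8):
        largest = max(len(shard) for shard in partition(records, num_nodes))
        print(num_nodes, "nodes, largest shard:", largest)
```

The totals come out the same however many nodes are used; what changes is how much data each node has to hold. A real cluster also has to handle replication, node failures and moving data when nodes are added, all of which this sketch leaves out.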
 
The second point helps explain the way companies are changing to take advantage of this vast data set. Small Data is a few distilled facts; it is the report you hand to your boss -- in essence, the executive summary and the bottom line. In contrast, Big Data cannot be reduced to a single number. Because there are so many numbers, and so many different uses for them, Big Data requires structural change so that employees and customers can access the information on their own and answer their questions quickly and easily. Accessed through dashboards, visualizations and search engines, Big Data empowers decision-making at every level of an enterprise.
 
Bottom line: Big Data is an ocean. You don't consume it, you swim in it.
 
Every part of an enterprise should have access to Big Data. To make sure people don’t drown in this ocean, companies need teams of Big Data engineers and analysts -- lifeguards, so to speak -- to facilitate the connections between employees, customers and the ever-growing data set. As the resident data scientist at Ooyala, I’m the catalyst for this connection. We have a multifaceted approach where publishers can pull relevant analytics to gain insight into the performance of their content. This approach liberates data and empowers individual employees and customers.
 
The New York Times article cites a study of 179 large companies, which found those “that adopted ‘data-driven decision making’ achieved productivity gains that were 5 percent to 6 percent higher than other factors could explain.” This gain should be even more pronounced for companies that focus heavily on analytics.
 
Big Data can help companies make more money, but is it cool?
“The culture has changed,” says Andrew Gelman, a statistician and political scientist at Columbia University. “There is this idea that numbers and statistics are interesting and fun. It’s cool now.”
We can dream, Professor Gelman, we can only dream.
 
Dr. Matt Pasienski is the data scientist at Ooyala. A version of this post originally appeared on his blog Tin Marine.
