A New High-Performance and Scalable Datastore

May 18, 2010

Maximizing online video viewership and online ad revenue requires insight into users' online video behavior. To that end, Ooyala collects and processes terabytes of log data per month. Our analytics data storage is growing more than five times as fast as the storage for our video content libraries.

As the tech lead of the analytics and monetization team, I have spent the past few months leading the effort to build a new storage and serving platform on top of Cassandra, a fault-tolerant, scalable, high-performance distributed data store. Cassandra uses a decentralized architecture with no single point of failure and scales out easily as new nodes are added. Combined with its automatic replication, horizontal partitioning, and flexible data model, this makes Cassandra an ideal data store layer for large data sets.
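To make the ideas of horizontal partitioning, replication, and adding nodes more concrete, here is a minimal Python sketch of a hash ring. It is an illustration only, not Cassandra's code and not our production setup: the `Ring` class, the `token` helper, the node names, and the choice of MD5 are all assumptions made for the example, and Cassandra's real partitioners and replication strategies are more involved.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring: each node owns one position; keys go to the next nodes clockwise."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self._tokens = []   # sorted ring positions
        self._owners = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place a node on the ring; no central coordinator is involved."""
        t = token(node)
        if t in self._owners:
            return  # node already present
        bisect.insort(self._tokens, t)
        self._owners[t] = node

    def replicas(self, key: str):
        """Return the nodes that store `key`: the first node at or after the
        key's position, followed by the next distinct nodes around the ring."""
        count = min(self.replication_factor, len(self._tokens))
        start = bisect.bisect_left(self._tokens, token(key))
        return [
            self._owners[self._tokens[(start + i) % len(self._tokens)]]
            for i in range(count)
        ]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
    print(ring.replicas("video:123"))   # three distinct nodes hold this key
    ring.add_node("node-e")             # scale out by adding a node
    print(ring.replicas("video:123"))   # unchanged, or now includes node-e
```

In this sketch, adding a node only affects keys whose replica list now includes the new node; every other key stays where it was, which is what makes scaling out inexpensive in this kind of design.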

Today, we're proud to announce that we've deployed Cassandra as the foundation for our new analytics platform. Cassandra will provide the tools and infrastructure we need to quickly innovate in processing real-time viewership data, slicing analytics data into finer-grained buckets, and powering new monetization frameworks.

Cassandra has been vetted by a number of other technology companies including Facebook, Twitter, Digg, Reddit, Rackspace, and Cloudkick, and has proven itself to be a powerful platform for storing and serving terabytes of data.

We owe a big thanks to Cassandra's open-source community, which has been very active in both building and supporting the platform.

More details are available in the formal announcement and white paper.