The State of NoSQL
Submitted by AdrienLamothe
Posted Oct 04 2011 12:39 PM
"NoSQL" databases have become the hot topic in the Web 2.0 and Cloud worlds. On a rainy Monday evening in San Francisco, ~150 highly experienced and talented technologists gathered to discuss the topic at NoSQLCamp. Besides the exceptional crowd and speakers, the event stood out for the venue at Microsoft's offices and because it was held during Oracle's "OpenWorld" convention just a block away. The event was organized by Dave Nielsen, who also produces events like CloudCamp and Social App Workshop. Dave offers free public events to showcase his workshop facilitation, which leads to private gigs for companies. I attended Dave's Social App Workshop last year and was looking forward to NoSQLCamp. I've yet to use a NoSQL database in any of my projects, but deal with clients who either currently do or are planning to. NoSQLCamp would serve as my executive overview of the technology and it didn't disappoint.
After an hour of socializing, the event kicked off with a series of lightning talks lasting 5 minutes each. The speakers did an outstanding job of getting their messages across in such a short time. The talks were:
- Manish Pandit, IGN, "MongoDB at IGN - Architecture, Deployment and Administration"
- J. Chris Anderson, CouchBase architectural overview
- Darren Wood, InfiniteGraph, "Intro to Graph Databases"
- Srini Srinivasan, Citrusleaf, "The Velocity of Data"
- Andy Twigg, Acunu, "Big data and cheap SSD"
- Edward M. Goldberg, MyCloudWatcher (cancelled his appearance)
- Bruno Terkaly, Microsoft, Azure Blobs and Tables
Manish Pandit started things off by giving a nice overview of how IGN uses NoSQL. Chris Anderson's talk on CouchBase Mobile was very interesting. He gave an example of how medical clinics in countries lacking strong network infrastructure use Couch to store data locally, then sync up with the main computers when connectivity is available. He also stated that "ground computing" (aka mobile computing) is now a $1.1 trillion (and growing) market when including all devices, services and infrastructure. Darren Wood of InfiniteGraph discussed the need for graph databases for certain types of applications. Srini Srinivasan talked about his company's real-time NoSQL database CitrusLeaf. CitrusLeaf boasts very impressive performance metrics; it is used in real-time online advertising. Andy Twigg of Acunu talked about his company's storage core for Cassandra and how it eliminates some severe problems involved with data writes to hard drives and SSDs. Bruno Terkaly of Microsoft finished off with an overview of Azure. He mentioned that Azure stores new posts captured from Facebook and Twitter, making the data available to the Bing search engine within 15 seconds of posting.
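The clinic example from Chris Anderson's talk (write locally, sync when a connection appears) can be illustrated with a toy sketch. This is not Couch's actual replication protocol, which tracks revision histories and detects conflicts. It is a simplified "highest revision wins" model, and the names `Store`, `Doc` and `sync` are hypothetical, invented here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    body: dict
    rev: int = 0  # revision counter, bumped on every write to a store


@dataclass
class Store:
    """Toy in-memory document store keyed by document id (hypothetical)."""
    docs: dict = field(default_factory=dict)

    def put(self, doc_id: str, body: dict) -> Doc:
        # Writing always succeeds locally; no network is involved.
        prev = self.docs.get(doc_id)
        rev = prev.rev + 1 if prev else 1
        doc = Doc(doc_id, dict(body), rev)
        self.docs[doc_id] = doc
        return doc


def sync(local: Store, remote: Store) -> int:
    """Copy the higher revision of each document in both directions.

    Assumption: the highest revision number wins. A real replicator
    would detect and surface conflicting edits instead.
    Returns the number of documents copied.
    """
    copied = 0
    for doc_id in set(local.docs) | set(remote.docs):
        l = local.docs.get(doc_id)
        r = remote.docs.get(doc_id)
        if r is None or (l is not None and l.rev > r.rev):
            remote.docs[doc_id] = Doc(l.doc_id, dict(l.body), l.rev)
            copied += 1
        elif l is None or r.rev > l.rev:
            local.docs[doc_id] = Doc(r.doc_id, dict(r.body), r.rev)
            copied += 1
    return copied


# The clinic works offline, then syncs once connectivity returns.
clinic, central = Store(), Store()
clinic.put("patient-1", {"name": "A"})
clinic.put("patient-1", {"name": "A", "visit": 2})
central.put("patient-2", {"name": "B"})
first_pass = sync(clinic, central)
second_pass = sync(clinic, central)
```

In this sketch the first `sync` call moves one document in each direction, and a second call finds nothing left to copy. The point is only that the application keeps working against the local store whether or not the network is up.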
Following the lightning talks, Dave asked the audience for questions and asked whether any experts were willing to answer them. The experts were then seated in front, forming what Dave calls an "UnPanel". UnPanel members had 1 minute to answer each question, and did an exceptional job of it.
The final phase of NoSQLCamp was a set of breakout sessions. The audience was asked to choose the session length from 1 hour, 45 minutes, 30 minutes or 15 minutes, and the sessions ran 30 minutes each.
My first breakout session was a demonstration of Tungsten Replicator by Robert Hodges of Continuent. Tungsten Replicator is an open-source data replication tool for MySQL and Postgres that can use MongoDB and Oracle as slaves. Hadoop functionality is planned for next year. Tungsten Replicator is a work in progress but apparently does some things quite well. I may use it to migrate data from MySQL to MongoDB.
My second session was a talk about the challenges of building a web service on Cassandra, by John Schneider of CloudTalk, with assistance from an engineer named Ram. The talk was co-presented by CloudTalk and DataStax. The software CloudTalk has built is quite ambitious. It includes a DDL and DML and a comprehensive transaction log that is useful in troubleshooting. They are considering open-sourcing their new system. One audience member talked about experiencing major problems with his Cassandra-based system. It seems his engineers keep extending the schema in ways that bog down the system, and he hasn't been able to figure out how to locate the offending data; perhaps this system can help him.
There were some notable people in attendance who didn't give talks. Neo Technology's Emil Eifrem was present with one of his engineers, Andreas Kollegger (who helped support the event). Neo Technology produces Neo4j, a mature graph database built in Java. The company recently received $10.6 million in Series A funding.
I came away from the event with a good overview of the state of NoSQL databases. Some observations:
- The NoSQL space is in an early stage, with much room for growth.
- Most of the NoSQL databases and products are still works in progress.
- NoSQL databases are SUPER fast.
- A number of presenters and attendees bemoaned the lack of a standard query language for NoSQL databases.
- You need to understand the capabilities and trade-offs of each NoSQL database and choose one that meets your application's requirements.
- You may not find a perfect fit of capability and requirements.
- You need to evaluate the total cost of ownership (TCO) for your NoSQL infrastructure.
- Some NoSQL databases require a different administration and management strategy than relational databases.
- Some older, well-established companies have entered the NoSQL space and enjoy an advantage due to existing relationships with their customers.
It was surprising no one talked about data security issues with NoSQL databases. This is an important issue and I suspect the difficulty of it tends to steer people away from the topic.
I've been consulting in the Ruby on Rails space. My experience to date is that most of the funded Rails SaaS startups adhere to cookie-cutter development methodologies and component stacks and are quite averse to considering alternatives. In particular, it seems most Rails-based startups reflexively think of MongoDB when considering a NoSQL database. Such "monkey see, monkey do" behavior is usually attributable to a notable successful company using the technology. Startups also tend to want to fill in all the check boxes on the buzzword compliance list when seeking venture capital, and for whatever reason certain technologies come into vogue and become "must have." My experience attending NoSQLCamp confirmed that you should thoroughly explore alternatives when deciding which NoSQL solution to use.