Sharding is the method MongoDB uses to split a large collection across several servers (called a cluster). While sharding has roots in relational database partitioning, it is (like most aspects of MongoDB) very different.
The biggest difference between any partitioning schemes you’ve probably used and MongoDB is that MongoDB does almost everything automatically. Once you tell MongoDB to distribute data, it will take care of keeping your data balanced between servers. You have to tell MongoDB to add new servers to the cluster, but once you do, MongoDB takes care of making sure that they get an even amount of the data, too.
Sharding is designed to fulfill three simple goals:
Make the cluster “invisible.”
To accomplish this, MongoDB comes with a special routing process called mongos. mongos sits in front of your cluster and looks like an ordinary mongod server to anything that connects to it. It forwards requests to the correct server or servers in the cluster, then assembles their responses and sends them back to the client. This makes it so that, in general, a client does not need to know that they’re talking to a cluster rather than a single server.
Make the cluster always available for reads and writes.
MongoDB ensures maximum uptime in a couple different ways. Every part of a cluster can and should have at least some redundant processes running on other machines (optimally in other data centers) so that if one process/machine/data center goes down, the other ones can immediately (and automatically) pick up the slack and keep going.
There is also the question of what to do when data is being migrated from one machine to another, which is actually a very interesting and difficult problem: how do you provide continuous and consistent access to data while it’s in transit? We’ve come up with some clever solutions to this, but it’s a bit beyond the scope of this book. However, under the covers, MongoDB is doing some pretty nifty tricks.
Let the cluster grow easily
MongoDB allows you to add as much capacity as you need as you need it.
These goals have some consequences: a cluster should be easy to use (as easy to use as a single node) and easy to administrate (otherwise adding a new shard would not be easy). MongoDB lets your application grow—easily, robustly, and naturally—as far as it needs to.
Learn more about this topic from Scaling MongoDB.
Create a MongoDB cluster that will to grow to meet the needs of your application. With this short and concise ebook, you'll get guidelines for setting up and using clusters to store a large volume of data, and learn how to access the data efficiently. In the process, you'll understand how to make your application work with a distributed database system.