Counting
When you do a count on a sharded collection, you may not get the results you expect. You may get quite a few more documents than actually exist.
The way a count works is the mongos forwards the count command to every shard in the cluster. Then, each shard does a count and sends its results back to the mongos, which totals them up and sends them to the user. If there is a migration occurring, many documents can be present (and thus counted) on more than one shard.
When MongoDB migrates a chunk, it starts copying it from one shard to another. It still routes all reads and writes to that chunk to the old shard, but it is gradually being populated on the other shard. Once the chunk has finished “moving,” it actually exists on both shards. As the final step, MongoDB updates the config servers and deletes the copy of the data from the original shard (see Figure 4-1).
Thus, when data is counted, it ends up getting counted twice. MongoDB may hack around this in the future, but for now, keep in mind that counts may overshoot the actual number of documents.
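The overshoot can be reproduced with a toy model. The sketch below is not MongoDB code: shards are plain arrays of made-up _id values, and `shardedCount` is a hypothetical stand-in for the way mongos adds up per-shard counts. It only illustrates why a chunk that exists on two shards is counted twice.

```javascript
// Toy model of a sharded count during a chunk migration.
// Assumption: each shard is just an array of _id values.
function shardedCount(shards) {
  // Scatter/gather: ask each shard for its count, then add up the answers.
  return shards.reduce((total, shard) => total + shard.length, 0);
}

const chunk = [1, 2, 3, 4];       // documents in the chunk being migrated
const shardA = [...chunk, 5, 6];  // original shard, still holds the chunk
const shardB = [7, 8];            // destination shard before the migration

const before = shardedCount([shardA, shardB]);

// The chunk has been copied to the new shard but not yet deleted from the old one.
const shardBWithCopy = [...shardB, ...chunk];
const during = shardedCount([shardA, shardBWithCopy]);

// Final step: the original shard deletes its copy of the chunk.
const shardAAfter = shardA.filter((id) => !chunk.includes(id));
const after = shardedCount([shardAAfter, shardBWithCopy]);

console.log(before, during, after); // 8 12 8
```

In this model the count is inflated by exactly the size of the chunk in flight, and it returns to the true value once the old copy is deleted.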
Unique Indexes
Suppose we were sharding on email and wanted to have a unique index on username. This is not possible to enforce with a cluster.
Let’s say we have two application servers processing users. One application server adds a new user document with the following fields:
{
"_id" : ObjectId("4d2a2e9f74de15b8306fe7d0"),
"username" : "andrew",
"email" : "awesome.guy@example.com"
}
The only way to check that “andrew” is the only “andrew” in the cluster is to go through every username entry on every machine. Let’s say MongoDB goes through all the shards and no one else has an “andrew” username, so it’s just about to write the document on Shard 3 when the second appserver sends this document to be inserted:
{
"_id" : ObjectId("4d2a2f7c56d1bb09196fe7d0"),
"username" : "andrew",
"email" : "cool.guy@example.com"
}
Once again, every shard checks that it has no users with username “andrew”. They still don’t because the first document hasn’t been written yet, so Shard 1 goes ahead and writes this document. Then Shard 3 finally gets around to writing the first document. Now there are two people with the same username!
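The interleaving described above can be replayed in a few lines. This is a minimal sketch, assuming each shard is an in-memory Map keyed by email. The helpers `usernameExists` and `write` are hypothetical and do not correspond to any MongoDB API. They stand in for the "check every shard, then write" sequence.

```javascript
// Toy model: three shards, each a Map from email (the shard key) to a document.
const shards = [new Map(), new Map(), new Map()];

// Step 1 of a naive unique insert: ask every shard whether the username exists.
function usernameExists(username) {
  return shards.some((shard) =>
    [...shard.values()].some((doc) => doc.username === username));
}

// Step 2: write the document to its target shard.
function write(shardIndex, doc) {
  shards[shardIndex].set(doc.email, doc);
}

const first = { username: "andrew", email: "awesome.guy@example.com" };
const second = { username: "andrew", email: "cool.guy@example.com" };

// Both checks complete before either write happens.
const firstSeesDuplicate = usernameExists(first.username);
const secondSeesDuplicate = usernameExists(second.username);
if (!secondSeesDuplicate) write(0, second); // Shard 1 writes the second document
if (!firstSeesDuplicate) write(2, first);   // Shard 3 then writes the first one

const andrews = shards
  .flatMap((shard) => [...shard.values()])
  .filter((doc) => doc.username === "andrew").length;

console.log(andrews); // 2
```

Neither check is wrong on its own. The duplicate appears because nothing stops another write from landing between a check and its write.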
The only way to guarantee no duplicates between shards in the general case is to lock down the entire cluster every time you do a write until the write has been confirmed successful. This is not performant for a system with a decent rate of writes.
Therefore, you cannot guarantee uniqueness on any key other than the shard key. You can guarantee uniqueness on the shard key because a given document can only go to one chunk, so it only has to be unique on that one shard, and it’ll be guaranteed unique in the whole cluster. You can also have a unique index that is prefixed by the shard key. For example, if we sharded the users collection on username (rather than on email, as in the example above) with the unique option, we could create a unique index on
{username : 1, email : 1}.
One interesting consequence of this is that, unless you’re sharding on _id, you can create non-unique _ids. This isn’t recommended (and it can get you into trouble if chunks move), but it is possible.
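To see why uniqueness on the shard key needs no cluster-wide check, consider the sketch below. It assumes the collection is sharded on username. The chunk ranges, shard names, and the `shardFor` and `insertUser` helpers are invented for illustration. Because every document with a given username is routed to the same shard, a purely local check is enough.

```javascript
// Hypothetical chunk ranges on the shard key "username".
const chunks = [
  { min: "", max: "h", shard: "shard1" },
  { min: "h", max: "p", shard: "shard2" },
  { min: "p", max: "\uffff", shard: "shard3" },
];

// A given shard key value falls in exactly one chunk, so on exactly one shard.
function shardFor(username) {
  return chunks.find((c) => username >= c.min && username < c.max).shard;
}

// Each shard keeps its own local set of usernames (a stand-in for a unique index).
const localIndexes = { shard1: new Set(), shard2: new Set(), shard3: new Set() };

function insertUser(doc) {
  const shard = shardFor(doc.username);
  const index = localIndexes[shard];
  if (index.has(doc.username)) return { ok: false, shard }; // local duplicate
  index.add(doc.username);
  return { ok: true, shard };
}

const r1 = insertUser({ username: "andrew", email: "awesome.guy@example.com" });
const r2 = insertUser({ username: "andrew", email: "cool.guy@example.com" });

console.log(r1.ok, r2.ok, r1.shard === r2.shard); // true false true
```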
Updating
Updates, by default, only update a single record. This means that they run into the same problem unique indexes do: there’s no good way of guaranteeing that something happens once across multiple shards. If you’re doing a single-document update, it must use the shard key in the criteria (update’s first argument). If you do not, you’ll get an error.
> db.adminCommand({shardCollection : "test.x", key : {"y" : 1}})
{ "shardedCollection" : "test.x", "ok" : 1 }
>
> // works okay
> db.x.update({y : 1}, {$set : {z : 2}}, true)
>
> // error
> db.x.update({z : 2}, {$set : {w : 4}})
can't do non-multi update with query that doesn't have the shard key
You can do a multiupdate using any criteria you want.
> db.x.update({z : 2}, {$set : {w : 4}}, false, true)
> // no error
If you run across an odd error message, consider whether the operation you’re trying to perform would have to atomically look at the entire cluster. Such operations are not allowed.
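The rule behind the shell session above can be summarized as a small routing function. This is a sketch only, not the real mongos logic. The `routeUpdate` function, the shard names, and the modulo placement rule are assumptions made for illustration, and only equality matches on a numeric shard key are handled.

```javascript
// Toy router for updates on a collection sharded on "y".
const SHARD_KEY = "y";
const SHARDS = ["shard1", "shard2", "shard3"];

function routeUpdate(criteria, { multi = false } = {}) {
  if (SHARD_KEY in criteria) {
    // Targeted: the shard key pins the update to a single shard.
    // Hypothetical placement rule: numeric key modulo the number of shards.
    const i = Math.abs(Number(criteria[SHARD_KEY])) % SHARDS.length;
    return [SHARDS[i]];
  }
  if (multi) {
    // Broadcast: every shard applies the update independently.
    return [...SHARDS];
  }
  // A single-document update with no shard key would need a cluster-wide
  // guarantee that exactly one document is changed, so it is refused.
  throw new Error(
    "can't do non-multi update with query that doesn't have the shard key");
}

const targeted = routeUpdate({ y: 1 });
const broadcast = routeUpdate({ z: 2 }, { multi: true });

let refused = false;
try {
  routeUpdate({ z: 2 });
} catch (e) {
  refused = true;
}

console.log(targeted, broadcast, refused);
```

The multiupdate is allowed because no shard needs to know what the others did. The single-document update is refused because "exactly one document in the whole cluster" is a promise no single shard can keep.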