This topic describes the key concepts of and best practices for MongoDB sharding.

Scenarios of sharded clusters

You can use a sharded cluster in the following scenarios:
  • The storage capacity of a single machine is limited.
  • The read and write capabilities of a single machine are limited by CPU, memory, or network interface controller (NIC) performance bottlenecks.

Number of required shards and mongos nodes

If you use a sharded cluster, you can determine the number of required shards and mongos nodes based on your business needs.

  • You may want to use MongoDB sharding to store a large amount of data that is less frequently accessed. Assume that the storage capacity of each shard is M, the storage threshold of each shard is 75%, and the total storage capacity required is N. If the data is less frequently accessed, two or more mongos nodes are enough for high availability. You can calculate the number of required shards and mongos nodes as follows:
    • Number of shards = N/M/0.75
    • Number of mongos nodes = 2 or more
  • You may want to use MongoDB sharding to write or read a small amount of data with high concurrency. The number of shards and mongos nodes to be deployed must meet the requirements for the read and write performance. The storage capacity is not an important factor. Assume that the maximum queries per second (QPS) of each shard is M, the maximum QPS of each mongos node is Ms, the total QPS required is Q, and the load threshold of each shard or mongos node is 75%. You can calculate the number of required shards and mongos nodes as follows:
    • Number of shards = Q/M/0.75
    • Number of mongos nodes = Q/Ms/0.75
    Note: The required performance of the mongos and mongod nodes depends on how frequently data is accessed in actual scenarios.

If you want to use MongoDB sharding to meet the preceding requirements at the same time, calculate the number of shards and mongos nodes based on more accurate metrics. In this topic, the number of shards and mongos nodes is calculated based on the ideal scenario where the data and requests are distributed evenly. However, the system load may be distributed unevenly in actual scenarios. To resolve this problem, you need to select a proper shard key.
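
The following mongo shell (JavaScript) sketch makes the formulas above concrete. All input values are hypothetical numbers used only for illustration, and the per-shard QPS is named Mq here to avoid reusing the storage capacity M.

// Hypothetical inputs; replace them with your own measurements.
var M  = 2048;    // storage capacity of each shard, in GB (assumption)
var N  = 30720;   // total storage capacity required, in GB (assumption)
var Mq = 20000;   // maximum QPS of each shard (assumption)
var Ms = 30000;   // maximum QPS of each mongos node (assumption)
var Q  = 100000;  // total QPS required (assumption)

// Storage-driven sizing: Number of shards = N/M/0.75
var shardsByStorage = Math.ceil(N / M / 0.75);  // 20

// QPS-driven sizing: Number of shards = Q/M/0.75, Number of mongos nodes = Q/Ms/0.75
var shardsByQps = Math.ceil(Q / Mq / 0.75);     // 7
var mongosByQps = Math.ceil(Q / Ms / 0.75);     // 5

// Take the larger shard count and deploy at least two mongos nodes.
print("shards:", Math.max(shardsByStorage, shardsByQps));
print("mongos:", Math.max(mongosByQps, 2));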

Shard key selection

MongoDB provides the following sharding strategies. A minimal example of each follows the list:

  • Ranged sharding: distributes data into ranges based on the shard key values.
  • Hashed sharding: distributes data evenly to each shard.
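
For example, assuming a hypothetical database db1 with collections coll1 and coll2 and a shard key field deviceId, the two strategies are enabled as follows:

// Enable sharding for the database before sharding its collections.
sh.enableSharding("db1")

// Ranged sharding: chunks cover contiguous ranges of the shard key values.
sh.shardCollection("db1.coll1", { deviceId: 1 })

// Hashed sharding: documents are distributed by the hash of the shard key.
sh.shardCollection("db1.coll2", { deviceId: "hashed" })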

The two sharding strategies cannot resolve the following problems:

  • The shard key has a small range of values. That is, the shard key has low cardinality. Assume that you deploy data across several data centers and use the data center as the shard key. Because the number of data centers is small, the sharding effect is generally not as good as expected.
  • A single shard key value is shared by a large number of documents. As a result, too many documents are stored in one chunk. If the chunk size exceeds the upper limit, the chunk is labeled as jumbo and the balancer cannot migrate it.
  • Queries and updates that are based on a field other than the shard key become scatter-gather operations that are broadcast to all shards. This lowers query and update efficiency.

A proper shard key has the following features:

  • Sufficient cardinality
  • Evenly distributed write operations
  • Targeted read operations

For example, an Internet of Things (IoT) application uses a MongoDB sharded cluster to store the logs of a large number of devices. Assume that millions of devices send logs to the MongoDB sharded cluster every 10 seconds and the logs contain information such as the device IDs and timestamps. The most common query request of the application is to query the logs of a device within a specific time range. Four solutions are listed in this topic for comparison. We recommend that you do not use the first three solutions. The fourth solution is optimal, and a configuration sketch of it follows the list.

  • Solution 1: Use the timestamp as the shard key and use the ranged sharding strategy.
    • The shard key is improper.
    • Because the timestamps of newly written data always increase, all write operations are routed to the same shard, so the write load is unevenly distributed.
    • The queries based on the device ID are spread across all shards. This lowers the query efficiency.
  • Solution 2: Use the timestamp as the shard key and use the hashed sharding strategy.
    • The shard key is improper.
    • All write operations are evenly distributed to shards.
    • The queries based on the device ID are spread across all shards. This lowers the query efficiency.
  • Solution 3: Use the device ID as the shard key and use the hashed sharding strategy. If the device IDs are random values, the ranged sharding strategy has the same effect.
    • The shard key is improper.
    • All write operations are evenly distributed to shards.
    • The data of a single device cannot be further split and is stored in the same chunk. As a result, the chunk may become a jumbo chunk. Queries based on the device ID are routed to a single shard, but because the timestamp is not part of the shard key, a query over a time range still requires scanning and sorting all data of that device.
  • Solution 4: Recommended. Use the combination of the device ID and timestamp as the shard key and use the ranged sharding strategy.
    • The shard key is proper.
    • All write operations are evenly distributed to shards.
    • The data of a device can be further split and distributed to multiple chunks based on the timestamps.
    • You can use a combination of the device ID and timestamp to query the data of a device within a specific time range.
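
The following mongo shell sketch shows how Solution 4 might be configured. The database name iot, the collection name logs, and the field names deviceId and timestamp are assumptions used for illustration.

// Shard the log collection on the compound key (device ID, timestamp).
sh.enableSharding("iot")
sh.shardCollection("iot.logs", { deviceId: 1, timestamp: 1 })

// The most common query contains the shard key prefix (deviceId), so the
// mongos node routes it only to the shard that holds the relevant chunks.
use iot
db.logs.find({
    deviceId: "device-0001",
    timestamp: {
        $gte: ISODate("2016-09-01T00:00:00Z"),
        $lt: ISODate("2016-09-02T00:00:00Z")
    }
})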

Jumbo chunk and chunk size

The default size of a chunk in a MongoDB sharded cluster is 64 MB. If the size of a chunk exceeds 64 MB and the chunk cannot be split, the chunk is labeled as jumbo. For example, if all documents in a chunk use the same shard key value, the chunk cannot be split. The balancer cannot migrate jumbo chunks, which may cause load imbalance. Avoid jumbo chunks whenever possible.

If load balancing is not a demanding requirement, jumbo chunks do not affect read or write operations. To deal with a chunk labeled as jumbo, use one of the following methods. A split example follows the list.

  • Split the jumbo chunk. If the jumbo chunk is split, the mongos node automatically removes the jumbo label.
  • If the chunk cannot be split but its size no longer exceeds the upper limit, which means that it is no longer an actual jumbo chunk, you can manually remove the jumbo label. Before you remove the label, back up the config database to guard against misoperations.
  • Increase the value of the chunk size parameter. If the size of the chunk does not exceed the upper limit, the jumbo label is removed. However, the chunk will be labeled as jumbo again when new data is written to the chunk and the size of the chunk exceeds the upper limit. The best solution is to select a proper shard key.
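
As a sketch of the first method, the following commands manually split a chunk of a hypothetical sharded collection db1.coll1 whose shard key is deviceId. sh.splitFind() splits the chunk that contains the matching document at its median point, and sh.splitAt() splits at an explicit shard key value.

// Split the chunk that contains { deviceId: "device-0001" } into two
// roughly equal halves at its median point.
sh.splitFind("db1.coll1", { deviceId: "device-0001" })

// Alternatively, split at an explicit shard key value of your choice.
sh.splitAt("db1.coll1", { deviceId: "device-5000" })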

The default chunk size can be used in most scenarios. In the following scenarios, you may need to change the chunk size to a proper value in the range of 1 to 1024 MB (an example of how to change the chunk size follows the list):

  • If the input and output (I/O) load is too high during chunk migration, you can reduce the chunk size.
  • During testing, you can reduce the chunk size to verify the performance.
  • The default chunk size is so small that a large number of chunks are labeled as jumbo and the load is imbalanced. In this case, you can try to increase the chunk size.
  • If you want to shard a collection that stores TBs of data, you must increase the chunk size. For more information, see Sharding Existing Collection Data Size.
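
The chunk size is stored in megabytes in the settings collection of the config database. The following sketch changes it to 256 MB, which is an arbitrary example value:

use config
db.settings.update(
    { _id: "chunksize" },
    { $set: { value: 256 } },
    { upsert: true }
)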

Tag aware sharding

Tag aware sharding is a useful feature of sharded clusters, allowing you to customize chunk distribution rules. You can use the tag aware sharding feature by following these steps:

  1. Use the sh.addShardTag() method to associate a shard with tag A.
  2. Use the sh.addTagRange() method to associate a specific range of chunks with tag A. In this way, the chunks in that range are distributed to the shard with tag A, as shown in the example after these steps.
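
A minimal sketch of the two steps, assuming a shard named shard0000, a sharded collection db1.coll1, and a shard key field deviceId (all hypothetical names):

// Step 1: associate the shard with tag "A".
sh.addShardTag("shard0000", "A")

// Step 2: associate a range of shard key values with tag "A". The chunks
// in this range are gradually migrated to shards that carry tag "A".
sh.addTagRange("db1.coll1",
    { deviceId: MinKey },
    { deviceId: "device-5000" },
    "A")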

Scenarios of tag aware sharding

  • Associate the shards in different data centers with data center-specific tags, and distribute specific ranges of chunks to specified data centers.
  • Associate the shards with tags based on their service capabilities and distribute more chunks to the shards with better performance.

Note:

Chunks are not distributed to the shards with the specified tags immediately. Instead, the distribution is completed gradually as chunk splits and migrations are triggered by write and update operations. Make sure that the balancer is enabled during this process. Therefore, for a period of time after chunks are associated with tags, newly written data may still be stored on shards whose tags do not match.

Sharded cluster balancer

Currently, automatic load balancing for MongoDB sharded clusters is implemented by background threads of the mongos nodes, and only one migration task can run for each collection at a time. Load balancing is triggered based on the number of chunks on each shard: when the difference in the number of chunks between the shard with the most chunks and the shard with the fewest chunks reaches the migration threshold, chunks are migrated between shards. The migration threshold depends on the total number of chunks in the collection.
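
To observe how chunks are distributed across shards before and after balancing, you can run the following commands. The sharded collection db1.coll1 is a hypothetical example.

// Print the sharding status of the cluster, including the number of
// chunks on each shard.
sh.status()

// Show how the data and documents of one sharded collection are
// distributed across shards.
use db1
db.coll1.getShardDistribution()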

By default, the balancer is enabled. To prevent the chunk migration from affecting online services, you can set the balancing window. For example, you can enable the balancer to migrate chunks only from 02:00 to 06:00.

use config
db.settings.update(
    { _id: "balancer" },
    { $set: { activeWindow: { start: "02:00", stop: "06:00" } } },
    { upsert: true }
)

Note: Disable the balancer when you back up a sharded cluster through mongos, or when you back up data on the config server and all shards separately. Otherwise, the backed-up data may be inconsistent.

sh.stopBalancer()
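
After the backup is complete, enable the balancer again. You can also check the balancer state before and after the backup, as shown in the following sketch:

// Check whether the balancer is enabled and whether a balancing round
// is currently in progress.
sh.getBalancerState()
sh.isBalancerRunning()

// Enable the balancer again after the backup is complete.
sh.startBalancer()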

Data archiving during chunk migration

When you use a sharded cluster of MongoDB 3.0 or earlier, you may find that the disk usage of the data directory keeps increasing even after you stop writing data.

The problem is caused by the sharding.archiveMovedChunks parameter. In MongoDB 3.0 or earlier, the sharding.archiveMovedChunks parameter is set to true by default. During chunk migration, the source shard archives the data of the migrated chunks in the moveChunk directory under its data path so that the archived data can be used for recovery if a problem arises. As a result, the space occupied by the migrated chunks on the source shard is not released, while the same data also occupies space on the target shard.

Note: In MongoDB 3.2, the parameter defaults to false, which indicates that the data of the migrated chunks is not archived on the source shard.

recoverShardingState parameter

When you use a MongoDB sharded cluster, the following problem may occur:

The shard does not work after it is started. The node that is expected to be the primary node of the shard returns ismaster: false, and other commands cannot be run on the shard. The following output of the db.isMaster() command shows the status of the shard.

mongo-9003:PRIMARY> db.isMaster()
{
    "hosts" : [
        "host1:9003",
        "host2:9003",
        "host3:9003"
    ],
    "setName" : "mongo-9003",
    "setVersion" : 9,
    "ismaster" : false,  // Indicates that the current node is not a primary node.
    "secondary" : true,
    "primary" : "host1:9003",
    "me" : "host1:9003",
    "electionId" : ObjectId("57c7e62d218e9216c70aa3cf"),
    "maxBsonObjectSize" : 16777216,
    "maxMessageSizeBytes" : 48000000,
    "maxWriteBatchSize" : 1000,
    "localTime" : ISODate("2016-09-01T12:29:27.113Z"),
    "maxWireVersion" : 4,
    "minWireVersion" : 0,
    "ok" : 1
}

The error log indicates that the shard cannot connect to the config server. Because the sharding.recoverShardingState parameter is set to true, the shard connects to the config server to initialize the sharding state when it starts. If the shard cannot connect to the config server, the initialization fails and the shard stays in this abnormal state.

This problem may arise when you migrate all shards of a sharded cluster to a new host: the information used to connect to the config server changes, whereas the restarted shard still tries to connect to the config server based on the original information. To resolve this problem, add --setParameter recoverShardingState=false to the command that is used to start the shard.