How do I configure sharding for a sharded cluster instance - ApsaraDB for MongoDB

You can configure sharding for each collection in a sharded cluster instance to make full use of the storage space of shards in the instance and maximize the computing performance of these shards.

Background information

If a collection is not sharded, all data in the collection is stored on the same shard node. In this case, the storage space of other shards cannot be fully used and the computing performance of these shards is compromised.

Prerequisites

The instance is a sharded cluster instance.

Usage notes

You cannot change or delete the configured shard key after sharding.
After you configure sharding, the balancer shards existing data that meets the specified criteria, which consumes the resources of an instance. We recommend that you perform this operation during off-peak hours.
Note
Before configuring sharding, you can set an active time window to limit the effective period of the balancer to off-peak hours. For more information, see Set an active time window for the balancer.
The choice of a shard key affects the performance of a sharded cluster instance. For more information about how to choose a shard key, see Shard Keys.

Sharding strategies

Sharding strategy

Description

Scenario

Ranged sharding

MongoDB divides data into contiguous ranges determined by the shard key values. Each chunk represents a contiguous range of data.

Advantage: The mongos can quickly locate the data being requested and forward the requests to the target shard.
Disadvantage: Data may be distributed unevenly among shards, causing hot shards for reads and writes and an uneven spread of writes.

The shard key value is not monotonically increasing or decreasing. The shard key has large cardinality and low frequency. Range-based queries are required.

Hashed sharding

MongoDB computes the hash value of a single field as the index value and divides data into chunks based on the range of hash values.

Advantage: Data can be distributed evenly among shards, guaranteeing an even spread of writes.
Disadvantage: This strategy is inapplicable to range-based queries because read requests must be distributed across all shards during the queries.

The shard key value is monotonically increasing or decreasing. The shard key has large cardinality and low frequency. Data writes are randomly distributed to shards. Data is read with high randomness.

In addition to the preceding two sharding strategies, you can also configure a compound shard key. For example, configure both a key with low cardinality and a monotonically increasing key. For more information, see Shard key selection.

Procedure

The following procedure uses the database named mongodbtest and the collection named customer as an example.

Connect to an ApsaraDB for MongoDB sharded cluster instance by using the mongo shell.
Enable sharding for the database where the collection to be sharded resides.
Important
If your instance runs MongoDB 6.0 or later, skip this step. For more information, see sh.enableSharding().
```
sh.enableSharding("<database>")
```
<database>: the name of the database.
Example:
```
sh.enableSharding("mongodbtest")
```
Note
You can run the sh.status() command to check whether sharding is enabled.
Create an index on the shard key field.
```
db.<collection>.createIndex(<keyPatterns>,<options>)
```
Parameters in the preceding command:
- <collection>: the name of the collection.
- <keyPatterns>: the field used for indexing and the index type.
  Common index types are as follows:
  - 1: an ascending index
  - -1: a descending index
  - "hashed": a hashed index
- <options>: the optional parameters. For more information, see db.collection.createIndex(). This field is not used in this example.
Sample command for creating an ascending index:
```
db.customer.createIndex({name:1})
```
Sample command for creating a hashed index:
```
db.customer.createIndex({name:"hashed"})
```
Configure sharding for the collection.
```
sh.shardCollection("<database>.<collection>",{ "<key>":<value> } ) 
```
Parameters in the preceding command:
- <database>: the name of the database.
- <collection>: the name of the collection.
- <key>: the shard key that MongoDB uses to shard data.
- <value>
  - 1: ranged sharding. This strategy supports efficient range-based queries based on the shard key.
  - "hashed": hashed sharding. This strategy distributes data writes evenly among shards.
Sample command for configuring ranged sharding:
```
sh.shardCollection("mongodbtest.customer",{"name":1})
```
Sample command for configuring hashed sharding:
```
sh.shardCollection("mongodbtest.customer",{"name":"hashed"})
```

What to do next

After the instance has been running and data has been written for a while, you can run the sh.status() command in the mongo shell to view data distribution among shards.

You can also run the db.stats() command to view the size of data stored on each shard.