Assistant Engineer
Assistant Engineer
  • UID626
  • Fans1
  • Follows1
  • Posts52

MongoDB sharding migration (1)

More Posted time:Oct 21, 2016 13:32 PM

If you have no idea about the MongoDB Sharded Cluster principle, please read
Principle of MongoDB sharded cluster architecture
What You Should Know about MongoDB Sharding
The sharding migration will be introduced in three parts. This article is the first part.
1. Load balancing and migration policies
2. Chunk migration process
3. Balancer O&M
Why chunk migration?
MongoDB sharding mainly has three scenarios requiring chunk migration.
Scenario 1
When the number of chunks is not balanced among multiple shards, MongoDB will migrate chunks automatically among shards to balance the distribution of chunks among various shards as much as possible, which is also the so-called load balancing.
Scenario 2
After a user calls the removeShard command, the chunks on the removed shard will be migrated to other shards. After the shard to be removed has no data, it will be removed. (Attention: If a shard has no sharded sets, you need to call the movePrimary command manually for the migration, because the system will not perform the migration automatically.)
Scenario 3
MongoDB sharding supports shard tags and you can tag the shard and shard key range. The system will automatically migrate the data in the corresponding range to the shard with the same tag. Example
mongos> sh.addShardTag("shard-hz", "hangzhou")
mongos> sh.addShardTag("shard-sh", "shanghai")
mongos> sh.addTagRange("shtest.coll", {x: 1}, {x: 1000}, "hangzhou")
mongos> sh.addTagRange("shtest.coll", {x: 2000}, {x: 5000}, "shanghai")

If you have tagged two shards and the shard key range of a certain set, all the documents with the x value between [1, 1000) in the set will be distributed to shard-hz, while the documents with the x value between [2000, 5000) in the set will be distributed to shard-sh.
Who will perform the migration?
In Version 3.2, Mongos has a background balancer task. The task keeps making judgments about whether to migrate chunks according to the three scenarios above. If migration is necessary, it will send the moveChunk command to the source shard to start migration. The entire migration process is comparatively complicated and will be detailed in the second part.
Apart from the above scenarios which will trigger automatic chunk migration, MongoDB also provides the moveChunk command for users to initiate data migration.
How does a balancer work?
There may be many Mongos in a sharded cluster. If the balancers of all the Mongos trigger migration at the same time, the whole cluster will be a mess. To avoid such a situation, only one balancer is allowed for load balancing at the same time.
Before starting the load balancing, balancers will first compete for the lock. The succeeding balancer continues the work, and others keep waiting and will compete for the lock again after a while.
The lock here is actually a special document in the config.locks set in the config server. Balancers use the findAndModify command to update the state field of the document (similar to the logic of set state=1 if state==0). Successful update means successful competition for the lock.
Afterwards, the balancer starts to traverse all the sharded sets, execute the following steps for every set and check whether chunk migration is required.
Step 1: Obtain chunk distribution information of the set
Obtain the meta information of the shard (draining indicates whether the shard is being removed).

Obtain chunk distribution information of the set.

Obtain tag information of the set.

Step 2: Check whether a chunk needs to be split
If the set hasn't configured the tag range, no operations are required in this step. It mainly checks whether there are overlapped tag ranges with the chunk. If yes, the chunk is split with the Range.min (the minimum value of the range) as the split point. Example
The tag of the aforementioned (20, 80) range is “tag0”, and is overlapped with the chunk (0, 100), so chunk splitting is triggered at the point of 20 into chunk (0, 20) and chunk (20, 100). Next, you can migrate chunk (20, 100) from shard1 to shard0, so that the tag distribution rule can be met. This step only serves to prepare for the migration and the migration is completed in Step 4.
Step 3: Migrate chunks on the draining shard
When a user runs the removeShard command to remove a shard, MongoDB will mark the shard as in the draining state. When the balancer is conducting the migration, if it finds a shard in the draining state, it will take the initiative to migrate the chunks on the shard to other shards. The balancer will select the shard with the fewest chunks as the target for migration for setting up the migration task.
Step 4: Migrate chunks with unmatched tags
In Step 2, the chunk has been split according to the tag range border. At this time, the balancer only needs to check which shards that the chunks belong to have unmatched tags. If such a shard is located, a migration task is set up to migrate the chunk to a shard with a matched tag.
Step 5: Load balancing migration
The balancer will also perform load balancing migration based on the number of chunks of various shards. If the difference between the chunk number of two shards in a set exceeds the threshold value, the migration will be triggered. (Through comparing the shard with the most chunks and that with the fewest chunks)

The migration threshold value is shown in the above table. It means that when the number of chunks in the set is smaller than 20, and the difference in the number of chunks between two shards is greater than or equal to 2, a migration task will be established to migrate a chunk from the shard with the most chunks to the shard with the fewest chunks.
Step 6: Execute migration
The true migration can start based on the migration task set up in Step 3 to Step 5.
It is worth noting that migration in Step 3 and Step 4, although necessary to ensure the normal operation of the system functions, is controlled by the balancer. If the balancer is disabled, the removeShard and shard tag logic may not function normally. So please be cautious when disabling the balancer. The O&M of the balancer will be illustrated in detail in Section 3.
[Charlene edited the post at Oct 24, 2016 9:53 AM]