This topic describes the features of ApsaraDB for MongoDB V4.4. As the exclusive strategic partner of MongoDB, Alibaba Cloud introduced MongoDB 4.4 and released ApsaraDB for MongoDB V4.4 in November 2020. MongoDB 4.4 was released on July 30, 2020. Compared with previous major versions, ApsaraDB for MongoDB V4.4 delivers the enhanced features that users have demanded most.

Hidden indexes

A large number of indexes degrades write performance, but complex business data makes it difficult for O&M engineers to identify and delete potentially ineffective indexes. If an effective index is deleted by accident, performance jitter may occur, and the cost of recreating the index is high.

To solve these problems, Alibaba Cloud and MongoDB jointly developed the hidden indexes feature based on their strategic cooperation. For more information, see the ApsaraDB for MongoDB product page. This feature allows you to use the collMod command to hide indexes. Hidden indexes are not evaluated as part of query plan selection. If hiding an index has no negative impact on your workloads, you can delete the hidden index.

The following syntax hides an index named key_1 in the testcoll collection:
db.runCommand( {
   collMod: 'testcoll',
   index: {
      name: 'key_1',
      hidden: true
   }
} )

Hidden indexes are invisible to the query planner of ApsaraDB for MongoDB. However, hidden indexes are still maintained and retain index properties such as unique constraints and time-to-live (TTL) expiration.

Note Hidden indexes are still updated while they are hidden. Therefore, they become active immediately after you unhide them.
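The typical workflow can be sketched as follows. The collection and index names are hypothetical, and the commands must be run against a MongoDB 4.4 server:

```javascript
// Hide a potentially unused index so that the query planner ignores it.
db.runCommand({ collMod: "testcoll", index: { name: "key_1", hidden: true } })

// Observe your workloads. The index is still maintained on writes while hidden.

// If no queries regressed, the index is safe to delete:
db.testcoll.dropIndex("key_1")

// If performance degraded, unhide the index; it is active immediately:
db.runCommand({ collMod: "testcoll", index: { name: "key_1", hidden: false } })
```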

Refinable shard keys

A shard key plays a vital role in a sharded cluster. A well-chosen shard key makes the sharded cluster more scalable under a specific workload. However, in some scenarios, workload changes may lead to jumbo chunks even if you carefully select the shard key. Jumbo chunks are chunks that grow beyond the specified chunk size.

In ApsaraDB for MongoDB V4.0 and earlier, both the shard key of a collection and the shard key values are immutable. Since MongoDB 4.2, you can change a shard key value. However, changing a shard key value requires data to be migrated across shards based on distributed transactions, which incurs high performance overhead and still cannot prevent jumbo chunks and query hotspots. For example, if you use the {customer_id:1} shard key to shard an order collection, the shard key can meet the business requirements in the early stage of business development because each customer places a small number of orders. However, as the business develops, a customer may place a growing number of orders, and frequent queries on that customer's order documents can create a query hotspot on a single shard. Because orders are inherently tied to the customer_id field, the problems caused by uneven queries cannot be solved by changing the shard key values.

In the preceding scenario, you can use the refineCollectionShardKey command that is provided by ApsaraDB for MongoDB V4.4 to add one or more suffix fields to the existing shard key. This refines the shard key so that documents are distributed across chunks at a finer granularity. In the preceding example, to avoid query hotspots on a single shard, you can use the refineCollectionShardKey command to change the shard key to {customer_id:1, order_id:1}.

The refineCollectionShardKey command incurs low performance overhead. The command modifies only the metadata on the Config Server node and does not migrate data. Data is redistributed gradually as chunks are split or migrated to other shards. A shard key must be backed by an index. Therefore, you must create an index that supports the new shard key before you run the refineCollectionShardKey command.
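For the order collection example, the refinement can be sketched as follows. The test.orders namespace is hypothetical, and the commands must be run against a sharded MongoDB 4.4 cluster:

```javascript
// Create an index that supports the new shard key (required beforehand).
db.orders.createIndex({ customer_id: 1, order_id: 1 })

// Refine the shard key. Only Config Server metadata is modified;
// data is redistributed gradually as chunks are split or migrated.
db.adminCommand({
  refineCollectionShardKey: "test.orders",
  key: { customer_id: 1, order_id: 1 }
})
```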

After the shard key is refined, existing documents do not contain the new suffix fields. To solve this problem, ApsaraDB for MongoDB V4.4 provides the missing shard key feature, which allows documents in sharded collections to miss some of the shard key fields. The downside of this feature is that it may lead to jumbo chunks. We recommend that you do not rely on missing shard keys unless necessary.

Compound hashed shard keys

Versions of ApsaraDB for MongoDB earlier than V4.4 do not support compound hashed indexes. You can specify only a single field as the hashed shard key. This may cause data in collections to be unevenly distributed across shards.

ApsaraDB for MongoDB V4.4 supports compound hashed indexes. You can specify a single hashed field in compound indexes as the prefix or suffix field.

Compound hashed indexes have the following syntax:

sh.shardCollection(
  "examples.compoundHashedCollection",
  { "region_id" : 1, "city_id" : 1, "field1" : "hashed" }
)
sh.shardCollection(
  "examples.compoundHashedCollection",
  { "_id" : "hashed", "fieldA" : 1 }
)
You can take full advantage of compound hashed indexes in the following scenarios:
  • To comply with relevant laws and regulations, you must use the zone sharding feature that is provided by ApsaraDB for MongoDB and still distribute data evenly across the shards that reside in a zone.
  • A collection uses a monotonically increasing value as the shard key. In the preceding {customer_id:1, order_id:1} example, if the value of the customer_id field is a monotonically increasing number, data from the newest customers is always written to the same shard. This results in one oversized shard and an uneven distribution of data across shards.

Without compound hashed indexes, you must compute the hash value of a single field at the application layer, store the hash value in a dedicated field of the document, and then use the ranged sharding feature to specify that field as your shard key.

In ApsaraDB for MongoDB V4.4, you need only to specify the field as "hashed". In the second scenario above, you need only to set the shard key to {customer_id:'hashed', order_id:1} to simplify the business logic.

Hedged reads

Network latencies may cause economic loss. Based on a report from Google, if a page takes more than three seconds to load, over half of visitors leave the page. For more information about the report, see 7 Page Speed Stats Every Marketer Should Know. To minimize network latencies, ApsaraDB for MongoDB V4.4 provides the hedged reads feature. In sharded clusters, a mongos node routes a read operation to two replica set members per queried shard and returns the result from the first respondent per shard to the client. This reduces P95 and P99 latencies, which are the latencies within which the fastest 95% and 99% of requests are completed.

Hedged reads are specified per operation as part of the read preference. If you set the read preference to nearest, the hedged reads feature is enabled by default. If you set the read preference to primary, the hedged reads feature is not supported. If you set the read preference to any other mode, you must explicitly enable hedged reads by setting the hedge option to { enabled: true }. In this case, hedged reads have the following syntax:
db.collection.find({ }).readPref(
   "secondary",                      // mode
   [ { "datacenter": "B" },  { } ],  // tag set
   { enabled: true }                 // hedge options
)
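The behavior can be sketched in standalone JavaScript: the role of the mongos node is played by Promise.race, which keeps the first response from the two targeted replica set members. The member names and latencies are made up for illustration:

```javascript
// Simulated replica set member: resolves with its result after `latencyMs`.
function readFrom(member, latencyMs) {
  return new Promise((resolve) =>
    setTimeout(() => resolve(`result from ${member}`), latencyMs)
  );
}

// A hedged read sends the same read to two members and keeps the first reply.
function hedgedRead() {
  return Promise.race([
    readFrom("member-A", 5),   // fast member
    readFrom("member-B", 200), // slow member (e.g., transient network delay)
  ]);
}

hedgedRead().then((result) => console.log(result)); // the fast member wins
```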

The hedged reads feature must also be enabled on mongos nodes. To enable it, set the readHedgingMode parameter to on.

You can use the following code to enable the support of mongos nodes for the hedged reads feature:
db.adminCommand( { setParameter: 1, readHedgingMode: "on" } )

Reduced replication latency

ApsaraDB for MongoDB V4.4 reduces the latency of primary/secondary replication. The latency of primary/secondary replication affects read/write operations in ApsaraDB for MongoDB. In some scenarios, secondary databases must replicate and apply the incremental updates of primary databases within a short period of time before read and write operations can continue. Lower latencies provide better primary/secondary consistency.

Streaming replication

Secondary databases in versions of ApsaraDB for MongoDB earlier than V4.4 must poll the upstream node to obtain incremental updates. To poll for updates, a secondary database sends a getMore command to the primary database to scan the oplog collection. If the oplog collection has new entries, the secondary database fetches a batch of oplog entries that can be up to 16 MB in size. After the scan reaches the end of the oplog collection, the secondary database uses the awaitData option to block the getMore command until new data is inserted into the oplog collection, and then fetches the next batch of oplog entries. Each fetch is performed by the OplogFetcher thread and requires a network round trip (RTT) between the source and target nodes. If the replica set runs under poor network conditions, network latency degrades replication performance.

In ApsaraDB for MongoDB V4.4, a primary database sends a continuous stream of oplog entries to a secondary database instead of waiting for secondary databases to poll upstream data. Compared with the method in which a secondary database polls upstream data, at least half of the RTT is saved for each batch of oplog entries. You can take full advantage of streaming replication in the following scenarios:

  • If you set the writeConcern parameter to "majority" for write operations, "majority" write operations must wait for replication multiple times. Even in high-latency network conditions, streaming replication can improve average performance by 50% for "majority" write operations.
  • If you use causal consistent sessions, you can read your own write operations in secondary databases. This feature requires secondary databases to immediately replicate the oplog collection of primary databases.
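The saving described above can be sketched with simple arithmetic. Under the assumption stated in the text that polling costs one full round trip per batch while streaming costs only the one-way trip, the network time for replicating a backlog of batches compares as follows:

```javascript
// Network travel time when replicating `batches` batches of oplog entries.
function pollingNetworkTime(batches, rttMs) {
  // getMore request + response: one full RTT per batch.
  return batches * rttMs;
}

function streamingNetworkTime(batches, rttMs) {
  // The primary pushes each batch as soon as it is ready: one-way trip per batch.
  return batches * (rttMs / 2);
}

const rtt = 10;      // milliseconds, hypothetical
const batches = 100; // hypothetical replication backlog

console.log(pollingNetworkTime(batches, rtt));   // polling: 1000 ms
console.log(streamingNetworkTime(batches, rtt)); // streaming: 500 ms, half the RTT saved per batch
```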

Simultaneous indexing

Versions of ApsaraDB for MongoDB earlier than V4.4 require indexes to be created in the primary database before the index creation is replicated to the secondary databases. The method used to create indexes in secondary databases varies between versions, and so does the impact on oplog application in the secondary databases.

In ApsaraDB for MongoDB V4.2, foreground and background index creation was unified. A collection holds the exclusive lock only at the beginning and end of the index creation process. Despite this fine-grained locking method, the CPU and I/O overhead of index creation still causes replication latency. Some special operations may also affect oplog application in secondary databases. For example, if you use the collMod command to modify the metadata of a collection, oplog application may be blocked. A secondary database may even enter the RECOVERING state because historical oplog entries on the primary database are overwritten.

In ApsaraDB for MongoDB V4.4, indexes are simultaneously created in the primary and secondary databases. This way, the primary/secondary latency is reduced. Secondary databases can read the latest data even in the index creation process.

The indexes become usable only after a majority of voting nodes have finished creating them. This reduces the performance differences between nodes that are caused by inconsistent indexes in read/write splitting scenarios.

Mirrored reads

A common phenomenon in ApsaraDB for MongoDB is that most users who purchase three-node replica set instances perform read and write operations only on the primary node, while the secondary nodes process no read traffic. In this case, an occasional failover causes noticeable access latency that subsides only after a period of time. The reason is that the newly elected primary node serves read traffic for the first time: it knows nothing about the characteristics of the frequently accessed data and has no corresponding cache. As a result, read operations encounter a large number of cache misses, data must be reloaded from disk, and access latency increases. This problem is especially pronounced on instances that have large memory capacities.

ApsaraDB for MongoDB V4.4 provides the mirrored reads feature. The primary node mirrors a subset of the read operations that it receives and sends them to a subset of electable secondary nodes. This helps the secondary nodes pre-warm their caches. Mirrored reads are fire-and-forget operations: they are non-blocking and have no impact on the performance of the primary node. However, they add some workload to the secondary nodes.

You can specify the rate of mirrored reads by using the mirrorReads parameter. The default rate is 1%.

To set the rate of mirrored reads to 10%, use the following code:

db.adminCommand( { setParameter: 1, mirrorReads: { samplingRate: 0.10 } } )

You can also use the db.serverStatus( { mirroredReads: 1 } ) command to collect statistics of mirrored reads:

SECONDARY> db.serverStatus( { mirroredReads: 1 } ).mirroredReads
{ "seen" : NumberLong(2), "sent" : NumberLong(0) }

Resumable initial synchronization

In versions of ApsaraDB for MongoDB earlier than V4.4, a secondary database must restart the entire initial synchronization process if network interruptions occur.

In ApsaraDB for MongoDB V4.4, a secondary database can attempt to resume the synchronization process. If the secondary database cannot resume the initial synchronization process within the configured period, the system selects a new synchronization source and restarts the initial synchronization process from the beginning. By default, the secondary database attempts to resume initial synchronization for 24 hours. You can use the replication.initialSyncTransientErrorRetryPeriodSeconds parameter to change this period.

If the network encounters a non-transient connection error, the secondary database must restart the entire initial synchronization process.

Time-based oplog retention

In ApsaraDB for MongoDB, the oplog collection records all operations that modify the data in your database. The oplog collection is a vital infrastructure that can be used for replication, incremental backup, data migration, and data subscription.

The oplog collection is a capped collection. Since ApsaraDB for MongoDB V3.6, you can use the replSetResizeOplog command to modify the size of the oplog collection. However, a size limit does not let you accurately control how long incremental oplog entries are retained. You can use the time-based oplog retention feature in the following scenarios:
  • A secondary node is scheduled to be shut down for maintenance from 02:00:00 to 04:00:00. If the oplog entries that are generated during this period are truncated on the upstream node, the secondary node triggers a full synchronization after it restarts. This full synchronization needs to be avoided.
  • If exceptions occur, the data subscription components in the downstream databases may fail to provide services. The services are restored within at most 3 hours, and incremental data is pulled again. In this case, the required incremental oplog entries must still exist in the upstream databases.

In most scenarios, you want to retain the oplog entries that are generated within a recent time window. However, the number of oplog entries generated within that window, and therefore the required oplog size, is difficult to estimate.

In ApsaraDB for MongoDB V4.4, you can use the storage.oplogMinRetentionHours parameter to specify the minimum number of hours to preserve an oplog entry. You can use the replSetResizeOplog command to modify the minimum number of hours. The following code shows how to modify the minimum number of hours:

// First, show current configured value
db.getSiblingDB("admin").serverStatus().oplogTruncation.oplogMinRetentionHours
// Modify
db.adminCommand({
  "replSetResizeOplog" : 1,
  "minRetentionHours" : 2
})

Union

Versions of ApsaraDB for MongoDB earlier than V4.4 provide the $lookup stage, which is similar to the LEFT OUTER JOIN feature in SQL, for union queries. In ApsaraDB for MongoDB V4.4, the $unionWith stage provides a feature similar to the UNION ALL operator in SQL. You can use the $unionWith stage to combine pipeline results from multiple collections into a single result set, and then query and filter data based on specified conditions. Unlike the $lookup stage, the $unionWith stage supports queries on sharded collections. You can use multiple $unionWith stages in one pipeline to blend multiple collections and aggregation pipelines. The $unionWith stage has the following syntax:

{ $unionWith: { coll: "<collection>", pipeline: [ <stage1>, ... ] } }

You can also specify stages in the pipeline parameter to filter data in the specified collection before aggregation, which makes the stage flexible to use. For example, assume that a business stores order data in separate collections by month. The following code inserts the data for the second quarter:

db.orders_april.insertMany([
  { _id:1, item: "A", quantity: 100 },
  { _id:2, item: "B", quantity: 30 },
]);
db.orders_may.insertMany([
  { _id:1, item: "C", quantity: 20 },
  { _id:2, item: "A", quantity: 50 },
]);
db.orders_june.insertMany([
  { _id:1, item: "C", quantity: 100 },
  { _id:2, item: "D", quantity: 10 },
]);

In versions of ApsaraDB for MongoDB earlier than V4.4, if you need a sales report of different products for the second quarter, you must read all the data and aggregate it at the application layer, or use a data warehouse to analyze the data. In ApsaraDB for MongoDB V4.4, you need only to run a single aggregate statement. The statement has the following syntax:

db.orders_april.aggregate( [
   { $unionWith: "orders_may" },
   { $unionWith: "orders_june" },
   { $group: { _id: "$item", total: { $sum: "$quantity" } } },
   { $sort: { total: -1 }}
] )
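For reference, the plain JavaScript below reproduces the same UNION ALL + group + sort logic on the sample documents, so you can verify the expected report without a server:

```javascript
// Sample documents from the three monthly collections above.
const ordersApril = [
  { _id: 1, item: "A", quantity: 100 },
  { _id: 2, item: "B", quantity: 30 },
];
const ordersMay = [
  { _id: 1, item: "C", quantity: 20 },
  { _id: 2, item: "A", quantity: 50 },
];
const ordersJune = [
  { _id: 1, item: "C", quantity: 100 },
  { _id: 2, item: "D", quantity: 10 },
];

// $unionWith ≈ UNION ALL: concatenate the documents.
const all = [...ordersApril, ...ordersMay, ...ordersJune];

// $group: sum the quantity per item.
const totals = {};
for (const doc of all) {
  totals[doc.item] = (totals[doc.item] || 0) + doc.quantity;
}

// $sort: descending by total.
const report = Object.entries(totals)
  .map(([item, total]) => ({ _id: item, total }))
  .sort((a, b) => b.total - a.total);

console.log(report);
// [ { _id: 'A', total: 150 }, { _id: 'C', total: 120 },
//   { _id: 'B', total: 30 }, { _id: 'D', total: 10 } ]
```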

Custom aggregation expressions

To execute complex queries in versions of ApsaraDB for MongoDB earlier than V4.4, you can use the $where operator in the find command, or run JavaScript on the server by using the mapReduce command. However, neither method can be combined with the aggregation pipeline.

In ApsaraDB for MongoDB V4.4, you can use the $accumulator and $function aggregation pipeline operators instead of relying on the $where operator and the mapReduce command. You can define custom expressions in JavaScript and execute them on the database server as part of an aggregation pipeline. This makes it convenient to express complex queries within the aggregation pipeline framework and improves the user experience.

The $accumulator operator works in a similar way to the mapReduce command. The init function initializes the state for input documents. The accumulate function then updates the state for each input document.

When multiple partial states must be combined, for example, when the $accumulator operator runs on sharded collections, the merge function merges the states. If you also specify a finalize function, it converts the final merged state into the returned result after all documents are processed.
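The init/accumulate/merge/finalize life cycle can be illustrated in standalone JavaScript. The sketch computes an average quantity the way a sharded $accumulator would: each shard accumulates its own state, the partial states are merged, and finalize produces the result. The functions and data are illustrative, not MongoDB's API:

```javascript
// User-defined accumulator pieces, mirroring the $accumulator options.
const init = () => ({ sum: 0, count: 0 });
const accumulate = (state, doc) => ({
  sum: state.sum + doc.quantity,
  count: state.count + 1,
});
const merge = (s1, s2) => ({ sum: s1.sum + s2.sum, count: s1.count + s2.count });
const finalize = (state) => state.sum / state.count;

// Two "shards", each holding part of the collection.
const shard1 = [{ quantity: 100 }, { quantity: 30 }];
const shard2 = [{ quantity: 20 }, { quantity: 50 }];

// Each shard accumulates independently...
const state1 = shard1.reduce(accumulate, init());
const state2 = shard2.reduce(accumulate, init());

// ...the partial states are merged, then finalized into the result.
const average = finalize(merge(state1, state2));
console.log(average); // 50
```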

The $function operator works in a similar way to the $where operator. The additional advantage is that the $function operator can work with other aggregation pipeline operators. You can also use the $function operator together with the $expr operator in the find command, which is equivalent to what the $where operator offers. The MongoDB official manual also recommends using the $function operator.

New aggregation pipeline operators

In addition to the $accumulator and $function operators, ApsaraDB for MongoDB V4.4 provides new aggregation pipeline operators for multiple purposes. For example, you can manipulate strings, obtain the first or last element in an array, and obtain the size of a document or binary string. The following list describes the new aggregation pipeline operators.
  • $accumulator: Returns the result of a user-defined accumulator operator.
  • $binarySize: Returns the size in bytes of a specified string or binary object.
  • $bsonSize: Returns the size in bytes of a specified BSON-encoded document.
  • $first: Returns the first element in an array.
  • $function: Defines a custom aggregation expression.
  • $last: Returns the last element in an array.
  • $isNumber: Returns the Boolean value true if the specified expression resolves to the integer, decimal, double, or long type. Returns the Boolean value false if the expression resolves to any other BSON type, null, or a missing field.
  • $replaceOne: Replaces the first instance of a matched string.
  • $replaceAll: Replaces all instances of a matched string.

Connection monitoring and pooling

In ApsaraDB for MongoDB V4.4, you can use drivers to configure and monitor connection pooling behavior. Drivers provide standard APIs to subscribe to events that are associated with a connection pool, such as establishing and closing connections in the pool and clearing the pool. You can also use the APIs to configure connection pool options, such as the maximum and minimum numbers of connections allowed for a pool, the maximum amount of time that a connection can remain idle, and the maximum amount of time that a thread can wait for a connection to become available. For more information, see Connection Monitoring and Pooling.
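As a sketch of what this looks like in practice, the following Node.js MongoDB driver snippet configures pool options and subscribes to the standardized connection pool (CMAP) events. The connection string is a placeholder, and the snippet requires the mongodb driver package and a reachable server:

```javascript
const { MongoClient } = require("mongodb");

const client = new MongoClient("mongodb://localhost:27017", {
  maxPoolSize: 50,          // maximum number of connections in the pool
  minPoolSize: 5,           // minimum number of connections kept open
  maxIdleTimeMS: 30000,     // how long an idle connection may remain in the pool
  waitQueueTimeoutMS: 2000, // how long to wait for a connection to become available
});

// Subscribe to standardized connection pool events.
client.on("connectionPoolCreated", (e) => console.log("pool created:", e.address));
client.on("connectionCheckedOut", () => console.log("connection checked out"));
client.on("connectionPoolCleared", () => console.log("pool cleared"));
```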

Global Read and Write Concerns

In versions of ApsaraDB for MongoDB earlier than V4.4, if you do not specify the readConcern or writeConcern parameter for an operation, the default value is used. The default value of readConcern is local, whereas the default value of writeConcern is {w: 1}. The default values of the two parameters cannot be modified. For example, if you want to use "majority" write concerns for all insert operations, you must set the writeConcern parameter to {w: majority} in all ApsaraDB for MongoDB access code.

In ApsaraDB for MongoDB V4.4, you can use the setDefaultRWConcern command to specify the global default readConcern and writeConcern settings. The following code shows the syntax of the setDefaultRWConcern command:

db.adminCommand({
  "setDefaultRWConcern" : 1,
  "defaultWriteConcern" : {
    "w" : "majority"
  },
  "defaultReadConcern" : { "level" : "majority" }
})

You can also use the getDefaultRWConcern command to obtain the current global default settings for the readConcern and writeConcern parameters.

In ApsaraDB for MongoDB V4.4, slow query logs and diagnostic logs also record the provenance of the read or write concern that applies to an operation. The following list describes the possible provenance values of a read or write concern.
  • clientSupplied: The read or write concern was specified in the application.
  • customDefault: The read or write concern was specified in the setDefaultRWConcern command.
  • implicitDefault: The read or write concern originated from the server in the absence of all other read or write concern specifications.

For write concerns, one additional provenance value is possible:

  • getLastErrorDefaults: The write concern originated from the settings.getLastErrorDefaults field of the replica set configuration.

New MongoDB Shell (beta)

MongoDB Shell is one of the most common DevOps tools in ApsaraDB for MongoDB. In ApsaraDB for MongoDB V4.4, a new version of MongoDB Shell provides enhanced usability features, such as syntax highlighting, command autocomplete, and easy-to-read error messages. The new version is available in beta, and more commands are being actively developed. We recommend that you use the new MongoDB Shell only for trial purposes.

Summary

ApsaraDB for MongoDB V4.4 is a maintenance-oriented release. In addition to the preceding key features, general improvements are provided, such as the optimization of the $indexStats aggregation operator, support for TCP Fast Open (TFO) connections, and the optimization of index deletion. Some major enhancements are also available, such as structured logging (logv2) and new security authentication mechanisms. For more information, see Release Notes for MongoDB 4.4.