MongoDB 5.0 marks the start of a new release cycle designed to deliver new features to users faster than before. The combination of the Versioned API and live resharding frees users from the burden of future database upgrades and business changes. The native time series platform allows MongoDB to support a wider range of workloads and business scenarios, and the new MongoDB Shell improves the user experience. This topic describes the features of MongoDB 5.0.

Native time series platform

MongoDB 5.0 natively supports the entire lifecycle of time series data: ingestion, storage, querying, real-time analysis, and visualization, through to online archiving or automatic expiration as data ages. This makes it faster and cheaper to build and run time series applications. MongoDB expands its universal application data platform to make it easier for developers to process time series data, which further extends MongoDB's use in scenarios such as IoT, financial analysis, and logistics.

MongoDB time series collections automatically store time series data in a highly optimized and compressed format, which reduces storage size and I/O to deliver better performance on a larger scale. Time series collections also shorten the development cycle, which enables you to quickly build a model optimized for the performance and analysis requirements of time series applications.

You can run the following command to create a time series data collection:
db.createCollection("collection_name", { timeseries: { timeField: "timestamp" } })
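A fuller form of the command is sketched below; the metaField, granularity, and expireAfterSeconds options are optional, and the collection and field names here are illustrative:

```javascript
// Create a time series collection for sensor readings.
// "timestamp" stores the time of each measurement, "metadata"
// identifies the data source, and documents expire after 24 hours.
db.createCollection("weather", {
  timeseries: {
    timeField: "timestamp",   // required: the field that stores the time
    metaField: "metadata",    // optional: rarely-changing source metadata
    granularity: "minutes"    // optional: hint for internal bucketing
  },
  expireAfterSeconds: 86400   // optional: automatic expiration as data ages
})

// Measurements are inserted like any other document.
db.weather.insertOne({
  timestamp: ISODate("2021-07-28T11:15:00Z"),
  metadata: { sensorId: 1001, location: "rooftop" },
  temperature: 21.3
})
```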

MongoDB can seamlessly adapt to the ingestion rate and automatically handle out-of-order measurements based on dynamically generated time partitions. The new MongoDB Connector for Apache Kafka natively supports time series: you can create a time series collection directly from Kafka topic messages, which allows you to process and aggregate data as it is collected and then write it to a MongoDB time series collection.

Time series collections automatically create a clustered index of data sorted by time to reduce query latency. The MongoDB query API has also been extended with window functions, which allow you to run analytical queries such as moving averages and cumulative sums. In relational database systems, these are usually called SQL analytic functions and support windows defined in units of rows, such as a three-row moving average. MongoDB goes a step further and adds powerful time series functions such as exponential moving average (EMA), derivative, and integral, which allow you to define a window in units of time, such as a 15-minute moving average. Window functions can be used to query both time series and regular collections, which provides new analysis methods for many application types. MongoDB 5.0 also provides new date operators, including $dateAdd, $dateSubtract, $dateDiff, and $dateTrunc, which allow you to summarize and query data by using custom time windows.
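As an illustrative sketch, a 15-minute moving average and a cumulative sum can be computed with the $setWindowFields aggregation stage introduced in MongoDB 5.0; the collection and field names are assumptions:

```javascript
db.sensorReadings.aggregate([
  {
    $setWindowFields: {
      partitionBy: "$metadata.sensorId",   // one window per sensor
      sortBy: { timestamp: 1 },            // order by time within each partition
      output: {
        movingAvgTemp: {
          $avg: "$temperature",
          // time-based window: the last 15 minutes up to the current document
          window: { range: [-15, 0], unit: "minute" }
        },
        cumulativeTemp: {
          $sum: "$temperature",
          // document-based window: running total from the start of the partition
          window: { documents: ["unbounded", "current"] }
        }
      }
    }
  },
  // Truncate timestamps to the hour with the new $dateTrunc operator.
  { $set: { hour: { $dateTrunc: { date: "$timestamp", unit: "hour" } } } }
])
```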

You can combine MongoDB time series data with other data of your enterprise. Time series collections can be put together with regular MongoDB collections in the same database. You do not need to select a dedicated time series database that is unable to provide services for other types of applications, nor do you need complex integration to mix time series data and other data. MongoDB provides a unified platform that allows you to build high-performance and efficient time series applications and also provides support for other use cases or workloads. This eliminates the cost and complexity of integrating and running multiple different databases.

Live data resharding

The following comparison shows how resharding has changed:

Versions earlier than MongoDB 5.0: the resharding process is complex and requires manual operations.
  • Method 1: Dump the entire collection, and then reload the database into a new collection with the new shard key.

    This process requires offline processing. Your application must be suspended for a long period of time until the reload is complete. For example, dumping and reloading a collection larger than 10 TB on a three-shard cluster may take several days.

  • Method 2: Create a new sharded cluster, configure shard keys for the collections, and then perform a custom migration that writes the collection you want to reshard from the existing sharded cluster to the new one based on the configured shard keys.
    • During this process, you must handle the query routing and migration logic yourself and constantly check the migration progress to ensure that all data is migrated.
    • Custom migration is a highly complex, labor-intensive, and time-consuming task that carries risk. For example, one MongoDB user spent three months migrating 10 billion documents.

MongoDB 5.0: resharding is built in and automated.
  • You can run the reshardCollection command to start resharding.
  • The resharding process is efficient.

    Instead of simply rebalancing data, the resharding process copies and rewrites all the data of the current collection to a new collection in the background, while new writes from the application are synchronized.

  • The resharding process is fully automated.

    The time spent on resharding shrinks from weeks or months to minutes or hours, and lengthy, complex manual data migrations are avoided.

  • With online resharding, you can easily evaluate the effects of different shard keys in a development or test environment, and you can modify shard keys as your needs change.
Live resharding allows you to change the shard key of a collection on demand as your workload grows and evolves. No database downtime or complex data migration is required. You can run the reshardCollection command in the Mongo Shell to select the database and collection that you want to reshard and specify the new shard key.
db.adminCommand({ reshardCollection: "<database>.<collection>", key: <shardkey> })
Note
  • <database>: the name of the database that you want to reshard.
  • <collection>: the name of the collection that you want to reshard.
  • <shardkey>: the name of the shard key.
  • When you run the reshardCollection command, MongoDB clones the existing collection into a new collection and then applies all operation logs from the existing collection to the new one. When all operation logs have been applied, MongoDB switches your application traffic over to the new collection and deletes the existing collection.
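For example, to reshard a hypothetical orders collection in the test database by a new customerId key (the namespace and key are illustrative), you could run:

```javascript
// Start live resharding; MongoDB clones the collection, applies
// oplog entries, and switches over automatically when it is done.
db.adminCommand({
  reshardCollection: "test.orders",
  key: { customerId: 1 }
})

// Monitor progress while the operation runs.
db.getSiblingDB("admin").aggregate([
  { $currentOp: { allUsers: true, localOps: false } },
  { $match: { type: "op", "originatingCommand.reshardCollection": "test.orders" } }
])
```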

Versioned API

  • Compatibility with applications

    Starting with MongoDB 5.0, the Versioned API defines the set of commands and parameters that are most commonly used in applications. These commands do not change, regardless of whether a database release is an annual major release or a quarterly rapid release. Pinning the driver to a specific version of the MongoDB API decouples the application lifecycle from the database lifecycle: even as the database is upgraded and improved, your application can continue to run for years without modifications to its code.

  • New features and improvements flexibly added

    The Versioned API allows new features and improvements to be flexibly added to the database of each version with full backward compatibility. When you want to change an API, you can add a new version of the API and run it on the same server at the same time as the existing version of the API. With the accelerated release of MongoDB versions, the Versioned API enables quicker and easier access to the features of the latest MongoDB version.
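With the Node.js driver, for instance, an application can pin itself to version 1 of the Versioned API when constructing the client; the connection string here is a placeholder:

```javascript
const { MongoClient, ServerApiVersion } = require("mongodb");

// Pin the client to API version 1. "strict" makes the server reject
// any command that is not part of the pinned API version.
const client = new MongoClient("mongodb://localhost:27017", {
  serverApi: { version: ServerApiVersion.v1, strict: true }
});
```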

Default majority write concern

Starting with MongoDB 5.0, the default write concern is majority: a write operation is committed and acknowledged as successful to the application only after it is applied on the primary node and persisted to the journals of a majority of the replica set members. This means MongoDB 5.0 provides stronger data durability guarantees out of the box.
Note The write concern is fully tunable. You can customize the write concern to strike a balance between database performance and data durability.
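For example, a latency-sensitive write can relax the new default on a per-operation basis; the collection and documents below are illustrative:

```javascript
// Default in MongoDB 5.0: equivalent to { w: "majority" }.
db.orders.insertOne({ item: "abc", qty: 1 })

// Per-operation override: acknowledge after the primary alone,
// trading some durability for lower latency.
db.orders.insertOne(
  { item: "def", qty: 2 },
  { writeConcern: { w: 1 } }
)
```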

Optimization of connection management

By default, each client connection corresponds to a thread on the backend MongoDB server; that is, net.serviceExecutor is set to synchronous. Creating, switching between, and destroying these threads consumes significant system resources, so when the number of connections is large, the threads occupy a large share of the MongoDB server's resources.

A situation in which the number of connections is large or connection creation gets out of control is called a "connection storm". Connection storms can arise for a variety of reasons and often occur when the service is already degraded.

In response, MongoDB 5.0 takes the following measures:
  • It limits the number of connections that the driver attempts to create, which simply and effectively prevents the database server from being overloaded.
  • It reduces the frequency at which the driver monitors connection pools, which gives unresponsive or overloaded server nodes room to buffer and recover.
  • It makes the driver direct workloads to the faster server with the healthiest connection pool, rather than selecting a server at random from the available options.
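On the driver side, you can also cap the connection pool yourself via connection string options to avoid contributing to a connection storm; the values here are illustrative:

```javascript
// maxPoolSize caps the number of connections per server;
// waitQueueTimeoutMS bounds how long an operation waits for a free
// connection before failing fast instead of piling up.
const uri = "mongodb://localhost:27017/?maxPoolSize=50&waitQueueTimeoutMS=2000";
```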

These measures, together with the improvements to the mongos query routing layer in the previous version, further enhance MongoDB's ability to withstand highly concurrent loads.

Long-running snapshot queries

Long-running snapshot queries improve the versatility and flexibility of applications. By default, the snapshots read by this feature remain available for 5 minutes, and you can customize this duration. The feature maintains strong consistency with snapshot isolation guarantees without impacting the performance of your live transactional workloads, and snapshot queries can be executed on secondary nodes. This allows you to run different workloads in a single cluster and scale them across different shards.
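A snapshot read outside a transaction can be expressed as follows; the 5-minute default window is controlled by the minSnapshotHistoryWindowInSeconds server parameter, and the collection name is illustrative:

```javascript
// Run a query against a single point-in-time snapshot.
db.runCommand({
  find: "sensorReadings",
  filter: { "metadata.sensorId": 1001 },
  readConcern: { level: "snapshot" }
})

// Extend the snapshot history window beyond the 300-second default.
db.adminCommand({ setParameter: 1, minSnapshotHistoryWindowInSeconds: 600 })
```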

MongoDB implements long-running snapshot queries by means of a capability called durable history in the underlying storage engine, which was introduced in MongoDB 4.4. Durable history stores a snapshot of all field values that have changed since the start of a query, and queries use it to maintain snapshot isolation. When data changes, durable history also helps relieve cache pressure on the storage engine, enabling higher query throughput under heavy write loads.

New MongoDB Shell

For enhanced user experience, the new MongoDB Shell has been redesigned from the ground up to provide a modern command-line experience, enhanced usability features, and a powerful scripting environment. The new MongoDB Shell has become the default shell for MongoDB. The new MongoDB Shell introduces syntax highlighting, intelligent auto-complete, contextual help, and useful error messages to create an intuitive and interactive experience.

  • Enhanced user experience
    • Easier implementation of queries and aggregations and improved readability

      The new MongoDB Shell supports syntax highlighting, which allows you to distinguish fields, values, and data types to avoid syntax errors. If an error still occurs, the new MongoDB Shell can pinpoint the issue and provide solutions to fix it.

    • Faster query and command typing

      The new MongoDB Shell provides the intelligent auto-complete feature. The new MongoDB Shell can give auto-complete prompts for methods, commands, and MQL expressions based on your MongoDB version.

      If you forget the syntax of a command, you can quickly look it up without leaving the MongoDB Shell by using the built-in contextual help.
  • Advanced scripting environment

    The scripting environment of the new MongoDB Shell is built on top of the Node.js REPL interactive interpreter. You can use all Node.js APIs and npm modules in your scripts. You can also load and run scripts from file systems. In the new MongoDB Shell, you can continue to use the load() method and eval() function to execute scripts as you would do in the old MongoDB Shell.
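For example, in the new shell you can mix Node.js modules with classic shell helpers; the collection and file paths below are illustrative:

```javascript
// Use a Node.js core module directly inside the new MongoDB Shell.
const fs = require("fs");
const ids = db.orders.find({}, { _id: 1 }).toArray();
fs.writeFileSync("ids.json", JSON.stringify(ids));

// Classic helpers still work as they did in the old shell.
load("scripts/cleanup.js");
```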

  • Expandability and plug-ins

    The new MongoDB Shell can be easily expanded and allows you to use all the features of MongoDB to increase productivity.

    In the new MongoDB Shell, you can install the Snippets plug-in. Snippets are automatically loaded into the MongoDB Shell and can use all Node.js APIs and npm packages. MongoDB also maintains a Snippets repository that provides useful functionality, such as a plug-in that analyzes the schema of a specified collection. You can also configure the MongoDB Shell to use plug-ins of your choice.
    Note The plug-in is currently an experimental feature of the MongoDB Shell.
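For instance, the MongoDB-maintained snippet that analyzes collection schemas can be installed from within the shell; the snippet name and helper follow the official snippets repository, and availability may vary since the feature is experimental:

```javascript
// Run inside mongosh; "snippet" is an experimental shell command.
snippet install analyze-schema

// After installation, the snippet's helper becomes available,
// printing a field-by-field summary of a collection's schema.
schema(db.orders)
```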

PyMongoArrow and data science

With the release of the new PyMongoArrow API, you can use Python to run complex analysis and machine learning tasks on MongoDB. PyMongoArrow can quickly convert simple MongoDB query results to popular data formats such as Pandas DataFrames and NumPy arrays to simplify your data science workflows.

Schema validation improvements

Schema validation is MongoDB's mechanism for applying data governance controls. In MongoDB 5.0, schema validation has become simpler and more user-friendly: when an operation fails validation, a descriptive error message is generated that highlights which documents do not conform to the collection's validation rules and why. This lets you quickly identify and correct the code that violates those rules.
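As an illustrative sketch, a validator can be attached when creating a collection; in MongoDB 5.0, a document that fails validation produces a detailed error naming the offending field and rule. The collection and field names are assumptions:

```javascript
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["email", "age"],
      properties: {
        email: { bsonType: "string", description: "must be a string" },
        age: {
          bsonType: "int",
          minimum: 0,
          description: "must be a non-negative integer"
        }
      }
    }
  }
})

// This insert fails validation; the 5.0 error details point to the
// "age" field and the "minimum" rule instead of a generic message.
db.users.insertOne({ email: "a@example.com", age: -5 })
```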

Resumable index creation tasks

MongoDB 5.0 allows an ongoing index creation task to automatically resume from where it left off after the node where that task takes place restarts. This reduces the impact of planned maintenance operations on business. For example, when database nodes are restarted or upgraded, you do not need to worry about the ongoing index creation tasks for large collections failing.

Version release adjustment

MongoDB supports many versions and platforms, and each release must be verified on more than 20 MongoDB-supported platforms. This heavy verification workload slows the delivery of new features. Therefore, starting with the 5.0 release, MongoDB is released in two different series: Rapid Releases and Major Releases. Rapid Releases are available for evaluation and development purposes; we recommend that you do not use them in a production environment.