ApsaraDB for MongoDB:Restore downloaded disk backup data to a self-managed database

Last Updated: Mar 30, 2026

Downloaded ApsaraDB for MongoDB disk backups store collection data as split .bson files in a layout that mongorestore can consume. Use this guide to decompress a downloaded backup archive, merge the split .bson files it contains, and load the data into a self-managed MongoDB database with mongorestore.

Usage notes

  • Version compatibility: Older versions of mongorestore may not work with newer MongoDB versions. Make sure the version you use is compatible with your ApsaraDB for MongoDB instance. See mongorestore compatibility.

  • Single-part `.bson` files: Even if a collection has only one .bson file (for example, myDatabase/myCollection/data/myCollection_0_part0.bson), you must still merge or rename it. mongorestore processes .bson files based on their filename prefix.

  • Empty collections: Downloaded disk backups include empty .bson files that carry the database and collection name. mongorestore processes these files without extra handling.

  • Sharded cluster backups: Downloaded disk backup files do not include shard routing information. You can restore the data to a single-node, replica set, or sharded cluster instance. If you restore to a sharded cluster instance, perform pre-sharding operations before importing the data.
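
If you restore to a sharded cluster, pre-sharding the target collections before the import distributes inserted documents across shards instead of routing them all through a single shard. A minimal mongosh sketch, assuming a target database testDB and collection coll1 (both hypothetical names; pick a shard key that matches your workload):

```javascript
// Enable sharding on the target database (names are placeholders)
sh.enableSharding("testDB")

// Shard the collection on a hashed _id key so inserts distribute evenly
sh.shardCollection("testDB.coll1", { _id: "hashed" })
```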

Prerequisites

Before you begin, ensure that you have:

  • MongoDB installed on the client that hosts your self-managed database (a local server or an Elastic Compute Service (ECS) instance). The MongoDB version must match your ApsaraDB for MongoDB instance version. See Install MongoDB.

  • A downloaded disk backup file. See Download a backup file.
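
A quick way to confirm the client tools are in place and check their versions before you start (a sketch; adjust the tool names to your installation):

```shell
# Report the version of each required client tool, or flag it as missing
for tool in mongod mongorestore; do
  if command -v "$tool" >/dev/null 2>&1; then
    "$tool" --version | head -n 1
  else
    echo "$tool not found in PATH"
  fi
done
```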

Restore disk backup data

Step 1: Copy the backup file to the target device

Copy the downloaded backup file to the device where mongorestore is installed.

Step 2: Decompress the backup file

Downloaded backup files come in two formats. Decompress the file based on the format you downloaded.

You can select the backup file format using the UseZstd parameter of the CreateDownload API operation.

tar.zst (console download)

The zstd tool must be installed locally, and the decompression directory must exist.

zstd -d -c <tar.zst_backup_package> | tar -xvf - -C <decompression_directory_path>

Example:

mkdir -p ./download_test/test1
zstd -d -c test1.tar.zst | tar -xvf - -C /Users/xxx/Desktop/download_test/test1/

tar.gz (default OpenAPI download format)

The decompression directory must exist.

tar -zxvf <tar.gz_backup_package> -C <decompression_directory_path>

Example:

mkdir -p ./download_test/test1
tar -zxvf testDB.tar.gz -C /Users/xxx/Desktop/download_test/test1/
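
The extracted archive follows a database/collection/data hierarchy, which the merge script in the next step relies on. A self-contained sketch that builds a throwaway archive with that layout (all names hypothetical) and extracts it the same way:

```shell
# Build a dummy backup layout: db "testDB", collection "coll1", one empty part file
mkdir -p demo_src/testDB/coll1/data
: > demo_src/testDB/coll1/data/coll1_0_part0.bson
tar -czf testDB.tar.gz -C demo_src .

# Extract it the same way as a real downloaded tar.gz backup
mkdir -p ./download_test/test1
tar -zxvf testDB.tar.gz -C ./download_test/test1

# The db/collection/data hierarchy should now be visible
ls ./download_test/test1/testDB/coll1/data
```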

Step 3: Merge the .bson files

Disk backup archives split collection data into multiple .bson part files (for example, myCollection_0_part0.bson, myCollection_0_part1.bson). Before restoring, merge each set of part files into a single .bson file that mongorestore can read.

Copy the following merge_bson_files.py script to a device with a Python environment.

import os
import struct
import sys
import argparse
import shutil
import re


def merge_single_bson_dir(input_dir: str, output_dir: str, namespace: str) -> None:
    """
    Merges .bson files in a single directory.

    Args:
        input_dir (str): The path to the directory that contains the .bson files.
        output_dir (str): The path to the directory for the output file.
        namespace (str): The name of the output file, without the extension.
    """
    try:
        # Get all .bson files that match the ***_*_part*.bson pattern and sort them by name
        files = [f for f in os.listdir(input_dir) if re.match(r'^.+_.+_part\d+\.bson$', f)]
        files.sort()  # Sort by file name

        if not files:
            print("No matching .bson files found in {}".format(input_dir))
            return

        output_file = os.path.join(output_dir, "{}.bson".format(namespace))
        if os.path.exists(output_file):
            print("Output file {} already exists, skipping...".format(output_file))
            return

        print("Merging {} files into {}...".format(len(files), output_file))

        # Stream and merge the files
        total_files = len(files)
        with open(output_file, "wb") as out_f:
            for index, filename in enumerate(files, 1):
                file_path = os.path.join(input_dir, filename)
                print("  Processing file {}/{}: {}...".format(index, total_files, filename))

                try:
                    with open(file_path, "rb") as in_f:
                        while True:
                            # Read the BSON document size
                            size_data = in_f.read(4)
                            if not size_data or len(size_data) < 4:
                                break

                            # Parse the document size (little-endian)
                            doc_size = struct.unpack("<i", size_data)[0]

                            # Rewind 4 bytes, then read the full document
                            # (the size field counts the length prefix itself)
                            in_f.seek(-4, os.SEEK_CUR)
                            doc_data = in_f.read(doc_size)

                            if len(doc_data) != doc_size:
                                break

                            out_f.write(doc_data)
                except Exception as e:
                    print("Error reading {}: {}".format(filename, str(e)))
    except Exception as e:
        print("Error in merge_single_bson_dir: {}".format(str(e)))


def merge_bson_files_recursive(input_root: str, output_root: str = None) -> None:
    """
    Recursively traverses directories and merges all .bson files.

    Args:
        input_root (str): The path to the root directory that contains the .bson files.
        output_root (str): The path to the root directory for the output files. Defaults to input_root.
    """
    if output_root is None:
        output_root = input_root

    # Make sure the output root directory exists
    if not os.path.exists(output_root):
        os.makedirs(output_root)

    print("Scanning directories in {}...".format(input_root))

    # Traverse all items in the input root directory
    for item in os.listdir(input_root):
        item_path = os.path.join(input_root, item)

        # If the item is a directory, process it
        if os.path.isdir(item_path):
            print("Processing directory: {}".format(item))

            # Create the corresponding output directory
            output_item_path = os.path.join(output_root, item)
            if not os.path.exists(output_item_path):
                os.makedirs(output_item_path)

            # Traverse all subdirectories and files in the directory
            for item_d in os.listdir(item_path):
                sub_item_path = os.path.join(item_path, item_d)
                # Skip stray files; only collection directories are expected here
                if not os.path.isdir(sub_item_path):
                    continue
                for sub_item in os.listdir(sub_item_path):
                    data_path = os.path.join(sub_item_path, sub_item)
                    # If it is a "data" directory, merge the .bson files in it
                    if os.path.isdir(data_path) and sub_item == "data":
                        # Extract the namespace (parent directory name)
                        namespace = os.path.basename(sub_item_path)
                        merge_single_bson_dir(data_path, output_item_path, namespace)
                    # If it is a .metadata.json file, copy it directly to the corresponding output directory
                    elif sub_item.endswith(".metadata.json"):
                        src_file = os.path.join(sub_item_path, sub_item)
                        dst_file = os.path.join(output_item_path, sub_item)
                        shutil.copy(src_file, dst_file)
                        print("Copied metadata file: {}".format(sub_item))
            print("Finished processing directory: {}".format(item))


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Recursively merge .bson files")
    parser.add_argument("input_root", help="The path to the root directory that contains the .bson files")
    parser.add_argument("-o", "--output_root", help="The path to the root directory for the output files. Defaults to the input root directory")

    args = parser.parse_args()
    merge_bson_files_recursive(args.input_root, args.output_root)

Run the script:

python3 merge_bson_files.py <input_directory> -o <output_directory>
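
After merging, you can sanity-check a merged file by walking the same int32 length prefixes the merge script uses and counting the documents. A small sketch (the helper name count_bson_docs and the file merged_demo.bson are hypothetical), demonstrated on a file built from two minimal empty BSON documents:

```python
import struct

def count_bson_docs(path):
    """Count documents in a .bson file by following the int32 length prefixes."""
    count = 0
    with open(path, "rb") as f:
        while True:
            size_data = f.read(4)
            if len(size_data) < 4:
                break
            # Little-endian int32 length; it counts the 4 prefix bytes too
            doc_size = struct.unpack("<i", size_data)[0]
            rest = f.read(doc_size - 4)
            if len(rest) != doc_size - 4:
                raise ValueError("truncated document in {}".format(path))
            count += 1
    return count

# Demo: the minimal valid BSON document is 5 bytes (int32 length 5 + trailing 0x00)
empty_doc = struct.pack("<i", 5) + b"\x00"
with open("merged_demo.bson", "wb") as f:
    f.write(empty_doc * 2)

print(count_bson_docs("merged_demo.bson"))  # 2
```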

Step 4: Restore the data with mongorestore

Run mongorestore to load the merged .bson files into your self-managed database. The following examples cover three restore scopes.

Restore a single collection

mongorestore \
    --uri=<mongodb-uri> \
    --db <db> \
    --collection <collection> \
    <xxx.bson>

Example:

mongorestore \
    --uri='mongodb://127.x.x.x:27017/?authSource=admin' \
    --db testDB \
    --collection coll1 \
    ./testDB/coll1.bson

Restore a single database

mongorestore \
    --uri=<mongodb-uri> \
    --db <db> \
    --dir <path/to/bson/dir>

Example:

mongorestore \
    --uri='mongodb://127.x.x.x:27017/?authSource=admin' \
    --db testDB \
    --dir ./testDB

Restore an entire instance

mongorestore \
    --uri=<mongodb-uri> \
    --dir <path/to/bson/dir>

Example:

mongorestore \
    --uri='mongodb://127.x.x.x:27017/?authSource=admin' \
    --dir ./

Parameter descriptions

  • <mongodb-uri>: The high availability (HA) address of your self-managed database or ApsaraDB for MongoDB instance. It includes the username, password, server IP address, and port. See the MongoDB connection string documentation.

  • <db>: The name of the database to restore.

  • <collection>: The name of the collection to restore.

  • <xxx.bson>: The .bson file for restoring a single collection.

  • <path/to/bson/dir>: The directory that contains the .bson files to restore.
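
Two optional mongorestore flags are often useful during these restores (shown as a sketch; verify both against your mongorestore version): --dryRun walks the restore without writing any data, and --drop removes each target collection before restoring it, which avoids duplicate-key errors when you rerun a restore.

```
mongorestore \
    --uri='mongodb://127.x.x.x:27017/?authSource=admin' \
    --db testDB \
    --drop \
    --dryRun \
    --dir ./testDB
```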

FAQ

What if my instance architecture doesn't support downloading backup files?

Use Data Transmission Service (DTS) to migrate the instance data to your self-managed database. See Migrate data from a self-managed MongoDB database or an ApsaraDB for MongoDB instance. Alternatively, back up and restore the instance data directly using mongodump and mongorestore provided by ApsaraDB for MongoDB.