
ApsaraDB for MongoDB: Restore downloaded disk backup data to a self-managed database

Last Updated: Oct 31, 2025

This topic describes how to use mongorestore to restore data from disk backup files of an ApsaraDB for MongoDB instance to a self-managed MongoDB database.

Background information

MongoDB provides official backup and restore tools: mongodump and mongorestore. Logical backups of ApsaraDB for MongoDB are created by using mongodump. The downloaded disk backup data is also provided as .bson files, so you can use mongorestore to restore it to a self-managed database.

Precautions

  • Because MongoDB is frequently updated, older versions of mongorestore may be incompatible with newer MongoDB versions. Make sure that you use a mongorestore version that is compatible with your MongoDB version. For more information, see the mongorestore documentation.

  • You must merge or rename the .bson files of every collection, even if a collection contains only a small amount of data and has only one .bson file, such as myDatabase/myCollection/data/myCollection_0_part0.bson. mongorestore derives the collection name from the .bson file name, so this file would otherwise be restored to a collection named myCollection_0_part0 instead of myCollection.

  • Empty collections are also included in the downloaded disk backup. For each empty collection, an empty .bson file is generated, and its path identifies the database and collection. mongorestore can process these empty files, so the empty collections are re-created during the restore.

  • For sharded cluster instances, the downloaded disk backup files do not contain shard routing information. You can restore the backup data to a single-node, replica set, or sharded cluster instance. If you restore the data to a sharded cluster instance, you must pre-shard the target collections (for example, by running `sh.shardCollection()`) before you run mongorestore. Otherwise, the restored collections are unsharded.
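If the target is a sharded cluster, the pre-sharding step can be scripted. The following Python 3 sketch is an illustration under stated assumptions, not an official tool: `build_presharding_commands` and `apply_commands` are hypothetical helpers, PyMongo is assumed to be installed only if you call `apply_commands`, the URI must point to a mongos, and the shard keys are placeholders that you must choose for your own data. The command documents themselves are the standard `enableSharding` and `shardCollection` admin commands.

```python
# Hypothetical helpers (not part of MongoDB or Alibaba Cloud tooling) that
# pre-shard target collections before mongorestore runs.
from typing import Dict, List


def build_presharding_commands(db: str, shard_keys: Dict[str, dict]) -> List[dict]:
    """Return enableSharding/shardCollection command documents for `db`.

    shard_keys maps a collection name to its shard key document, for example
    {"coll1": {"_id": "hashed"}}. The keys here are placeholders.
    """
    commands = [{"enableSharding": db}]
    for collection, key in shard_keys.items():
        # The command name must be the first field of the document
        commands.append({"shardCollection": "{}.{}".format(db, collection), "key": key})
    return commands


def apply_commands(uri: str, commands: List[dict]) -> None:
    """Run the commands against a mongos. Assumes PyMongo is installed."""
    from pymongo import MongoClient  # third-party, imported only when needed
    from bson.son import SON  # ordered mapping that keeps the command name first

    client = MongoClient(uri)
    try:
        for cmd in commands:
            client.admin.command(SON(cmd))
    finally:
        client.close()
```

For example, `apply_commands("mongodb://<username>:<password>@<mongos-host>:<port>/?authSource=admin", build_presharding_commands("testDB", {"coll1": {"_id": "hashed"}}))` would shard `testDB.coll1` on a hashed `_id` key before the restore. Starting with MongoDB 6.0, `enableSharding` is no longer required before `shardCollection`, but it is still accepted.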

Prerequisites

  • Download and install MongoDB on the client that hosts your self-managed database. The client can be a local server or an Elastic Compute Service (ECS) instance. Make sure that the MongoDB version is the same as that of your ApsaraDB for MongoDB instance. Starting with MongoDB 4.4, mongorestore is distributed separately as part of the MongoDB Database Tools, so you may also need to install the Database Tools. For more information about the installation, see Install MongoDB.

  • Download the disk backup file. For more information, see Download a backup file.
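Before you start, you can confirm that the command-line tools used in the following procedure are available on the client. The following Python 3 sketch is a hypothetical helper (`check_tools` is not part of any MongoDB or Alibaba Cloud tooling). It only checks whether `mongorestore`, `tar`, and `zstd` are on `PATH`; it does not verify that the mongorestore version is compatible with your MongoDB version.

```python
# Hypothetical preflight check: reports whether the tools used in this
# procedure are on PATH. It does not check tool versions.
import shutil
import sys
from typing import Dict, Iterable, Optional

# zstd is needed only for tar.zst packages
REQUIRED_TOOLS = ("mongorestore", "tar", "zstd")


def check_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    # The merge script in step 3 requires Python 3
    print("Python {}.{}".format(*sys.version_info[:2]))
    for tool, path in check_tools().items():
        print("{:<13} {}".format(tool, path or "NOT FOUND"))
```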

Procedure

  1. Copy the downloaded backup file to the device where the client for your self-managed MongoDB database is located. The mongorestore tool must be installed on this device.

  2. Decompress the backup file.

    Downloaded backup files are available in the `tar.zst` and `tar.gz` formats, which use the zstd and gzip compression algorithms, respectively. You can select the format using the `UseZstd` parameter of the CreateDownload API operation.

    tar.zst (console download)

    zstd -d -c <tar.zst_backup_package> | tar -xvf - -C <decompression_directory_path>

    Make sure that the zstd tool is available locally and the decompression directory exists.

    Example:

    mkdir -p ./download_test/test1
    zstd -d -c test1.tar.zst | tar -xvf - -C ./download_test/test1/

    tar.gz (default OpenAPI download format)

    tar -zxvf <tar.gz_backup_package> -C <decompression_directory_path>

    Make sure that the decompression directory exists.

    Example:

    mkdir -p ./download_test/test1
    tar -zxvf testDB.tar.gz -C ./download_test/test1/

  3. Merge the .bson files.

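    Save the following script as `merge_bson_files.py` on a device that has Python 3 installed. For each collection directory (`<database>/<collection>/data/`), the script concatenates the `<collection>_<n>_part<m>.bson` files into `<database>/<collection>.bson` in the output directory and copies the collection's `.metadata.json` file next to it, which is the layout that mongorestore expects.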

    import argparse
    import os
    import re
    import shutil
    import struct
    import sys
    from typing import Optional

    # Requires Python 3 (the script uses type annotations).

    # Part files are named <collection>_<n>_part<m>.bson, for example myCollection_0_part0.bson
    PART_FILE_PATTERN = re.compile(r"^.+_.+_part\d+\.bson$")


    def natural_sort_key(name: str) -> list:
        """Sort key that compares embedded numbers numerically, so part2 sorts before part10."""
        return [int(token) if token.isdigit() else token for token in re.split(r"(\d+)", name)]


    def merge_single_bson_dir(input_dir: str, output_dir: str, namespace: str) -> bool:
        """
        Merges the .bson part files in a single directory into one file.

        Args:
            input_dir (str): The path to the directory that contains the .bson files.
            output_dir (str): The path to the directory for the output file.
            namespace (str): The name of the output file (the collection name), without the extension.

        Returns:
            bool: False if an error occurred, otherwise True.
        """
        # Get all .bson files that match the ***_*_part*.bson pattern, in numeric part order
        files = [f for f in os.listdir(input_dir) if PART_FILE_PATTERN.match(f)]
        files.sort(key=natural_sort_key)

        if not files:
            print("No matching .bson files found in {}".format(input_dir))
            return True

        output_file = os.path.join(output_dir, "{}.bson".format(namespace))
        if os.path.exists(output_file):
            print("Output file {} already exists, skipping...".format(output_file))
            return True

        print("Merging {} files into {}...".format(len(files), output_file))

        # Stream and merge the files one document at a time
        total_files = len(files)
        try:
            with open(output_file, "wb") as out_f:
                for index, filename in enumerate(files, 1):
                    file_path = os.path.join(input_dir, filename)
                    print("  Processing file {}/{}: {}...".format(index, total_files, filename))
                    with open(file_path, "rb") as in_f:
                        while True:
                            # Each BSON document starts with its total size (int32, little-endian)
                            size_data = in_f.read(4)
                            if not size_data:
                                break  # End of this part file
                            if len(size_data) < 4:
                                raise ValueError("truncated size field in {}".format(filename))
                            doc_size = struct.unpack("<i", size_data)[0]
                            # The smallest valid BSON document (an empty one) is 5 bytes
                            if doc_size < 5:
                                raise ValueError("invalid document size {} in {}".format(doc_size, filename))
                            # The size includes the 4 bytes that were already read
                            body = in_f.read(doc_size - 4)
                            if len(body) != doc_size - 4:
                                raise ValueError("truncated document in {}".format(filename))
                            # Copy the complete document to the merged file
                            out_f.write(size_data)
                            out_f.write(body)
        except (OSError, ValueError) as e:
            print("Error merging {}: {}".format(input_dir, e))
            # Remove the incomplete output so that a rerun does not skip this collection
            if os.path.exists(output_file):
                os.remove(output_file)
            return False
        return True


    def merge_bson_files_recursive(input_root: str, output_root: Optional[str] = None) -> int:
        """
        Traverses <input_root>/<database>/<collection>/ and writes
        <output_root>/<database>/<collection>.bson plus the .metadata.json files.

        Args:
            input_root (str): The path to the root directory that contains the .bson files.
            output_root (str): The path to the root directory for the output files. Defaults to input_root.

        Returns:
            int: The number of collections that could not be merged.
        """
        if output_root is None:
            output_root = input_root

        # Make sure the output root directory exists
        os.makedirs(output_root, exist_ok=True)
        failures = 0

        print("Scanning directories in {}...".format(input_root))

        # Each top-level directory is a database
        for item in sorted(os.listdir(input_root)):
            item_path = os.path.join(input_root, item)
            if not os.path.isdir(item_path):
                continue
            print("Processing directory: {}".format(item))

            # Create the corresponding output directory
            output_item_path = os.path.join(output_root, item)
            os.makedirs(output_item_path, exist_ok=True)

            # Each subdirectory of a database directory is a collection
            for item_d in sorted(os.listdir(item_path)):
                sub_item_path = os.path.join(item_path, item_d)
                # Skip files, such as merged output written here by an earlier run
                if not os.path.isdir(sub_item_path):
                    continue
                for sub_item in os.listdir(sub_item_path):
                    data_path = os.path.join(sub_item_path, sub_item)
                    # If it is a "data" directory, merge the .bson files in it
                    if sub_item == "data" and os.path.isdir(data_path):
                        # The collection name is the name of the parent directory
                        namespace = os.path.basename(sub_item_path)
                        if not merge_single_bson_dir(data_path, output_item_path, namespace):
                            failures += 1
                    # mongorestore expects <collection>.metadata.json next to <collection>.bson
                    elif sub_item.endswith(".metadata.json"):
                        target_file = os.path.join(output_item_path, sub_item)
                        shutil.copy(data_path, target_file)
                        print("Copied metadata file: {}".format(sub_item))
            print("Finished processing directory: {}".format(item))
        return failures


    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description="Recursively merge .bson files")
        parser.add_argument("input_root", help="The path to the root directory that contains the .bson files")
        parser.add_argument("-o", "--output_root", help="The path to the root directory for the output files. Defaults to the input root directory")

        args = parser.parse_args()
        failed = merge_bson_files_recursive(args.input_root, args.output_root)
        if failed:
            print("{} collection(s) could not be merged".format(failed))
            sys.exit(1)

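    Run the following command. `<input_directory>` is the decompression directory from step 2, which contains one subdirectory per database. The merged `<collection>.bson` and `<collection>.metadata.json` files are written to `<output_directory>/<database>/`. If you omit `-o`, the merged files are written to the input directory. Do not place the output directory inside the input directory, because the script would then scan it as a database directory.

    python3 merge_bson_files.py <input_directory> -o <output_directory>
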
  4. Use mongorestore to restore the merged backup data to the target database instance.

    # Run the following examples from the output directory of merge_bson_files.py
    # Restore a single collection
    mongorestore --uri=<mongodb-uri> --db <db> --collection <collection> <xxx.bson>
    # Example of restoring a single collection
    mongorestore --uri='mongodb://127.x.x.x:27017/?authSource=admin' --db testDB --collection coll1 ./testDB/coll1.bson
    # Restore a single database
    mongorestore --uri=<mongodb-uri> --db <db> --dir </path/to/bson/dir>
    # Example of restoring a single database
    mongorestore --uri='mongodb://127.x.x.x:27017/?authSource=admin' --db testDB --dir ./testDB
    # Restore an entire instance
    mongorestore --uri=<mongodb-uri> --dir </path/to/bson/dir>
    # Example of restoring an entire instance
    mongorestore --uri='mongodb://127.x.x.x:27017/?authSource=admin' --dir ./

    Parameter descriptions:

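    • `<mongodb-uri>`: The connection string URI of the target database, that is, the high availability (HA) address of the self-managed database or ApsaraDB for MongoDB instance. The URI contains the username, password, server IP address, and port, for example `mongodb://<username>:<password>@<host>:<port>/?authSource=admin`. For more information, see Connection String URI Format in the MongoDB documentation.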

    • `<db>`: The name of the database to restore.

    • `<collection>`: The name of the collection to restore.

    • `<xxx.bson>`: The merged .bson file of the collection to restore, for example `./testDB/coll1.bson`.

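    • `</path/to/bson/dir>`: The directory that contains the .bson files to restore. To restore a single database, specify the database directory, for example `./testDB`. To restore an entire instance, specify the output directory of merge_bson_files.py, which contains one subdirectory per database.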
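After mongorestore finishes, you can compare the number of documents in each merged .bson file with the document count of the restored collection (for example, the result of `db.<collection>.countDocuments({})` in mongosh). The following Python 3 sketch is a hypothetical helper, not part of MongoDB tooling. It assumes the `<database>/<collection>.bson` layout produced by merge_bson_files.py and relies only on the BSON rule that every document starts with its total size as a little-endian int32.

```python
# Hypothetical helper: counts BSON documents in each merged
# <db>/<collection>.bson file so the totals can be compared after a restore.
import os
import struct
from typing import Dict


def count_bson_documents(path: str) -> int:
    """Count length-prefixed BSON documents in a .bson file."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break  # Clean end of file
            if len(header) < 4:
                raise ValueError("Truncated size field in {}".format(path))
            size = struct.unpack("<i", header)[0]
            if size < 5:  # An empty BSON document is 5 bytes
                raise ValueError("Invalid BSON document size {} in {}".format(size, path))
            body = f.read(size - 4)  # The size includes the 4-byte header
            if len(body) != size - 4:
                raise ValueError("Truncated BSON document in {}".format(path))
            count += 1
    return count


def count_dump_dir(root: str) -> Dict[str, int]:
    """Return {"db.collection": document count} for every <db>/<collection>.bson under root."""
    counts = {}
    for db in sorted(os.listdir(root)):
        db_path = os.path.join(root, db)
        if not os.path.isdir(db_path):
            continue
        for name in sorted(os.listdir(db_path)):
            if name.endswith(".bson"):
                namespace = "{}.{}".format(db, name[:-len(".bson")])
                counts[namespace] = count_bson_documents(os.path.join(db_path, name))
    return counts
```

Counts can differ from the restored collection if, for example, mongorestore skipped documents because of duplicate key errors, so a mismatch is a signal to check the mongorestore output rather than proof of data loss.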

FAQ

How do I restore instance data to a self-managed database if the instance architecture does not support downloading backup files?