This topic describes how to use mongorestore to restore data from disk backup files of an ApsaraDB for MongoDB instance to a self-managed MongoDB database.
Background information
MongoDB provides a pair of official backup and restore tools: mongodump and mongorestore. Logical backups of ApsaraDB for MongoDB are created by using mongodump. To restore a logical backup to a self-managed database, use mongorestore.
Precautions
Because MongoDB is frequently updated, older versions of mongorestore may be incompatible with newer MongoDB versions. Make sure that you use a compatible mongorestore version. For more information, see mongorestore.
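As a quick check before you begin, you can print the version of the locally installed tool and compare it against the compatibility information in the official mongorestore documentation. A minimal sketch:
```sh
# Print the version of the locally installed mongorestore tool
mongorestore --version
```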
Even if a collection contains a small amount of data and has only one .bson file, such as myDatabase/myCollection/data/myCollection_0_part0.bson, you must merge or rename the file. This is because mongorestore processes .bson files based on their file name prefix.
When you download a disk backup, the download also includes empty collections that retain their schema. This creates an empty .bson file that contains the database and collection name information. mongorestore can process these empty files.
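For a collection that has only a single part file, renaming it is enough. The following is a minimal sketch that assumes the example path above; replace the database and collection names with the ones in your backup.
```sh
# Rename a single part file so that mongorestore sees <collection>.bson
# (the paths and names follow the example above and are placeholders)
mv myDatabase/myCollection/data/myCollection_0_part0.bson myDatabase/myCollection.bson
```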
For sharded cluster instances, the downloaded disk backup files do not contain shard routing information. You can restore the backup data to a single-node, replica set, or sharded cluster instance. If you want to restore data to a sharded cluster instance, you must perform pre-sharding operations first.
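The following is a minimal pre-sharding sketch. It assumes a hypothetical database named testDB, a collection named coll1, and a hashed shard key on _id; use the database, collection, and shard key that match your source data, and run the commands against the mongos address of the target sharded cluster.
```sh
# Connect to the mongos address of the target sharded cluster instance.
# The URI, database name, collection name, and shard key below are placeholders.
mongosh "<mongos-uri>" --eval 'sh.enableSharding("testDB")'
mongosh "<mongos-uri>" --eval 'sh.shardCollection("testDB.coll1", { _id: "hashed" })'
```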
Prerequisites
Download and install MongoDB on the client that hosts your self-managed database. The client can be a local server or an Elastic Compute Service (ECS) instance. Make sure that the MongoDB version is the same as that of your ApsaraDB for MongoDB instance. For more information about the installation, see Install MongoDB.
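To confirm that the versions match, compare the version displayed for your ApsaraDB for MongoDB instance in the console with the version of the self-managed server. A minimal check, assuming the self-managed instance is reachable on the default local port:
```sh
# Version of the locally installed MongoDB server binary
mongod --version

# Version reported by the running self-managed instance (placeholder address)
mongosh "mongodb://127.0.0.1:27017" --quiet --eval "db.version()"
```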
Download the logical backup file. For more information, see Download a backup file.
Procedure
Copy the downloaded backup file to the device where the client for your self-managed MongoDB database is located. The mongorestore tool must be installed on this device.
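If the backup package was downloaded on a different machine, copy it to the client with any file transfer tool you prefer. The scp command below is a sketch with placeholder file, user, host, and path values.
```sh
# Copy the downloaded backup package to the device that runs mongorestore
scp ./test1.tar.zst <user>@<client_host>:/path/to/backup/
```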
Decompress the backup file.
Downloaded backup files are available in the `tar.zst` and `tar.gz` formats, which use the zstd and gzip compression algorithms, respectively. You can select the format using the `UseZstd` parameter of the CreateDownload API operation.
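If you are unsure which format a downloaded package uses, the file extension (or the file command) tells you, and you can confirm that the matching decompression tool is installed. A minimal check with a placeholder file name:
```sh
# Identify the archive format of the downloaded backup package
file test1.tar.zst

# Confirm that the decompression tools are available
zstd --version
tar --version
```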
tar.zst (console download)
zstd -d -c <tar.zst_backup_package> | tar -xvf - -C <decompression_directory_path>
Make sure that the zstd tool is available locally and that the decompression directory exists.
Example:
mkdir -p ./download_test/test1
zstd -d -c test1.tar.zst | tar -xvf - -C /Users/xxx/Desktop/download_test/test1/
tar.gz (default OpenAPI download format)
tar -zxvf <tar.gz_backup_package> -C <decompression_directory_path>
Make sure that the decompression directory exists.
Example:
mkdir -p ./download_test/test1
tar -zxvf testDB.tar.gz -C /Users/xxx/Desktop/download_test/test1/
Merge the .bson files.
Copy the following `merge_bson_files.py` file to a device that has a Python environment.
import os
import struct
import sys
import argparse
import shutil
import re

# Handle strings for Python 2 and 3 compatibility
if sys.version_info[0] >= 3:
    unicode = str


def merge_single_bson_dir(input_dir: str, output_dir: str, namespace: str) -> None:
    """
    Merges .bson files in a single directory.

    Args:
        input_dir (str): The path to the directory that contains the .bson files.
        output_dir (str): The path to the directory for the output file.
        namespace (str): The name of the output file, without the extension.
    """
    try:
        # Get all .bson files that match the ***_*_part*.bson pattern and sort them by name
        files = [f for f in os.listdir(input_dir) if re.match(r'^.+_.+_part\d+\.bson$', f)]
        files.sort()  # Sort by file name
        if not files:
            print("No matching .bson files found in {}".format(input_dir))
            return

        output_file = os.path.join(output_dir, "{}.bson".format(namespace))
        if os.path.exists(output_file):
            print("Output file {} already exists, skipping...".format(output_file))
            return

        print("Merging {} files into {}...".format(len(files), output_file))

        # Stream and merge the files
        total_files = len(files)
        with open(output_file, "wb") as out_f:
            for index, filename in enumerate(files, 1):
                file_path = os.path.join(input_dir, filename)
                print("  Processing file {}/{}: {}...".format(index, total_files, filename))
                try:
                    with open(file_path, "rb") as in_f:
                        while True:
                            # Read the BSON document size
                            size_data = in_f.read(4)
                            if not size_data or len(size_data) < 4:
                                break
                            # Parse the document size (little-endian)
                            doc_size = struct.unpack("<i", size_data)[0]
                            # Reread the full document data
                            in_f.seek(in_f.tell() - 4)
                            doc_data = in_f.read(doc_size)
                            if len(doc_data) != doc_size:
                                break
                            out_f.write(doc_data)
                except Exception as e:
                    print("Error reading {}: {}".format(filename, str(e)))
    except Exception as e:
        print("Error in merge_single_bson_dir: {}".format(str(e)))


def merge_bson_files_recursive(input_root: str, output_root: str = None) -> None:
    """
    Recursively traverses directories and merges all .bson files.

    Args:
        input_root (str): The path to the root directory that contains the .bson files.
        output_root (str): The path to the root directory for the output files. Defaults to input_root.
    """
    if output_root is None:
        output_root = input_root

    # Make sure the output root directory exists
    if not os.path.exists(output_root):
        os.makedirs(output_root)

    print("Scanning directories in {}...".format(input_root))

    # Traverse all items in the input root directory
    for item in os.listdir(input_root):
        item_path = os.path.join(input_root, item)

        # If the item is a directory, process it
        if os.path.isdir(item_path):
            print("Processing directory: {}".format(item))

            # Create the corresponding output directory
            output_item_path = os.path.join(output_root, item)
            if not os.path.exists(output_item_path):
                os.makedirs(output_item_path)

            # Traverse all subdirectories and files in the directory
            for item_d in os.listdir(item_path):
                sub_item_path = os.path.join(item_path, item_d)
                for sub_item in os.listdir(sub_item_path):
                    data_path = os.path.join(sub_item_path, sub_item)
                    # If it is a "data" directory, merge the .bson files in it
                    if os.path.isdir(data_path) and sub_item == "data":
                        # Extract the namespace (parent directory name)
                        namespace = os.path.basename(sub_item_path)
                        merge_single_bson_dir(data_path, output_item_path, namespace)
                    # If it is a .metadata.json file, copy it directly to the corresponding output directory
                    elif sub_item.endswith(".metadata.json"):
                        src_file = os.path.join(sub_item_path, sub_item)
                        target_dir = os.path.join(output_item_path, sub_item)
                        shutil.copy(src_file, target_dir)
                        print("Copied metadata file: {}".format(sub_item))

            print("Finished processing directory: {}".format(item))


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Recursively merge .bson files")
    parser.add_argument("input_root", help="The path to the root directory that contains the .bson files")
    parser.add_argument("-o", "--output_root", help="The path to the root directory for the output files. Defaults to the input root directory")
    args = parser.parse_args()
    merge_bson_files_recursive(args.input_root, args.output_root)

Run the command:
python merge_bson_files.py <input_directory> -o <output_directory>
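For reference, the sketch below shows what the merge step produces, assuming a hypothetical database named testDB with a collection named coll1; your database, collection, and metadata file names will differ.
```sh
# Input layout (extracted backup, placeholder names):
#   testDB/coll1/data/coll1_0_part0.bson
#   testDB/coll1/data/coll1_0_part1.bson
#   testDB/coll1/coll1.metadata.json
#
# Output layout after merging (what mongorestore expects):
#   <output_directory>/testDB/coll1.bson
#   <output_directory>/testDB/coll1.metadata.json
```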
Use the mongorestore tool to restore the backup data to the database instance.
# Restore a single collection
mongorestore --uri=<mongodb-uri> --db <db> --collection <collection> <xxx.bson>
# Example of restoring a single collection
mongorestore --uri='mongodb://127.x.x.x:27017/?authSource=admin' --db testDB --collection coll1 ./testDB/coll1.bson
# Restore a single database
mongorestore --uri=<mongodb-uri> --db <db> --dir </path/to/bson/dir>
# Example of restoring a single database
mongorestore --uri='mongodb://127.x.x.x:27017/?authSource=admin' --db testDB --dir ./testDB
# Restore an entire instance
mongorestore --uri=<mongodb-uri> --dir </path/to/bson/dir>
# Example of restoring an entire instance
mongorestore --uri='mongodb://127.x.x.x:27017/?authSource=admin' --dir ./
Parameter descriptions:
`<mongodb-uri>`: The high availability (HA) connection address of the self-managed database or ApsaraDB for MongoDB instance. The URI contains the username, password, server IP address, and port. For more information, see the official MongoDB documentation. An example of the URI format is shown after this list.
`<db>`: The name of the database to restore.
`<collection>`: The name of the collection to restore.
`<xxx.bson>`: The backup .bson file for restoring a single collection.
`</path/to/bson/dir>`: The directory that contains the .bson files to be restored.
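The sketch below shows one common shape of the connection URI, with placeholder credentials, host, and port; substitute the values for your own deployment.
```sh
# Generic connection string used with mongorestore (all values are placeholders)
mongorestore --uri='mongodb://<username>:<password>@<host>:<port>/?authSource=admin' --dir ./
```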