ApsaraDB for MongoDB logical backups are created with mongodump. Use this guide to decompress a downloaded backup archive, merge the split .bson files it contains, and load the data into a self-managed MongoDB database with mongorestore.
Usage notes
- Version compatibility: Older versions of mongorestore may not work with newer MongoDB versions. Make sure the version you use is compatible with your ApsaraDB for MongoDB instance. See mongorestore compatibility.
- Single-part `.bson` files: Even if a collection has only one .bson file (for example, myDatabase/myCollection/data/myCollection_0_part0.bson), you must still merge or rename it. mongorestore processes .bson files based on their filename prefix.
- Empty collections: Downloaded disk backups include empty .bson files that carry the database and collection name. mongorestore processes these files without extra handling.
- Sharded cluster backups: Downloaded disk backup files do not include shard routing information. You can restore the data to a single-node, replica set, or sharded cluster instance. If you restore to a sharded cluster instance, perform pre-sharding operations before importing the data.
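For the single-part case in the notes above, the rename can be sketched in a few lines of Python. This is only an illustration: copy_single_part is a hypothetical helper, and naming the output after the collection directory is an assumption borrowed from how the merge_bson_files.py script in Step 3 names its output.

```python
import os
import re
import shutil

# Same part-file pattern that merge_bson_files.py matches,
# for example myCollection_0_part0.bson
PART_RE = re.compile(r"^.+_.+_part\d+\.bson$")


def copy_single_part(collection_dir, output_dir):
    """Copy the only part file of a collection to <output_dir>/<collection>.bson.

    Hypothetical helper: collection_dir is expected to look like
    myDatabase/myCollection and to contain a "data" subdirectory.
    """
    name = os.path.basename(os.path.normpath(collection_dir))
    data_dir = os.path.join(collection_dir, "data")
    parts = [f for f in os.listdir(data_dir) if PART_RE.match(f)]
    if len(parts) != 1:
        raise ValueError("expected exactly one part file, found {}".format(len(parts)))
    os.makedirs(output_dir, exist_ok=True)
    target = os.path.join(output_dir, name + ".bson")
    # Copy instead of move so the downloaded backup stays untouched
    shutil.copyfile(os.path.join(data_dir, parts[0]), target)
    return target
```

Collections with more than one part file still need the merge described in Step 3; this helper deliberately refuses them.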
Prerequisites
Before you begin, ensure that you have:
- MongoDB installed on the client that hosts your self-managed database (a local server or an Elastic Compute Service (ECS) instance). The MongoDB version must match your ApsaraDB for MongoDB instance version. See Install MongoDB.
- A downloaded disk backup file. See Download a backup file.
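Before starting, it can help to confirm that the command-line tools used in this guide are on the PATH. The following is a minimal sketch; find_tool and tool_version are hypothetical helper names, and the sketch assumes each tool prints its version when called with --version.

```python
import shutil
import subprocess


def find_tool(name):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


def tool_version(name):
    """Return the first line printed by `<name> --version`, or None if unavailable."""
    path = find_tool(name)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    # zstd is only needed for tar.zst downloads
    for tool in ("mongorestore", "zstd"):
        print("{}: {}".format(tool, tool_version(tool) or "not found on PATH"))
```

Compare the reported mongorestore version against your instance version manually; the sketch does not attempt to judge compatibility.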
Restore disk backup data
Step 1: Copy the backup file to the target device
Copy the downloaded backup file to the device where mongorestore is installed.
Step 2: Decompress the backup file
Downloaded backup files come in two formats. Decompress the file based on the format you downloaded.
You can select the backup file format using the UseZstd parameter of the CreateDownload API operation.
tar.zst (console download)
The zstd tool must be installed locally, and the decompression directory must exist.
zstd -d -c <tar.zst_backup_package> | tar -xvf - -C <decompression_directory_path>
Example:
mkdir -p ./download_test/test1
zstd -d -c test1.tar.zst | tar -xvf - -C ./download_test/test1/
tar.gz (default OpenAPI download format)
The decompression directory must exist.
tar -zxvf <tar.gz_backup_package> -C <decompression_directory_path>
Example:
mkdir -p ./download_test/test1
tar -zxvf testDB.tar.gz -C ./download_test/test1/
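If the tar command is not available, the tar.gz case can also be handled from Python. The sketch below uses only the standard-library tarfile module; extract_tar_gz is a hypothetical helper name. It does not cover tar.zst archives, which still need the zstd tool shown earlier.

```python
import os
import tarfile


def extract_tar_gz(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir, creating the directory if needed."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python releases: reject members that would escape dest_dir
            archive.extractall(dest_dir, filter="data")
        else:
            archive.extractall(dest_dir)
    return dest_dir


if __name__ == "__main__":
    # File and directory names mirror the example above
    extract_tar_gz("testDB.tar.gz", "./download_test/test1")
```

Unlike the tar command, this helper creates the decompression directory itself, so the mkdir step is not required.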
Step 3: Merge the .bson files
Disk backup archives split collection data into multiple .bson part files (for example, myCollection_0_part0.bson, myCollection_0_part1.bson). Before restoring, merge each set of part files into a single .bson file that mongorestore can read.
Copy the following merge_bson_files.py script to a device with a Python 3 environment (the script uses type annotations, which Python 2 cannot parse).
import os
import struct
import sys
import argparse
import shutil
import re

# Handle strings for Python 2 and 3 compatibility
if sys.version_info[0] >= 3:
    unicode = str


def merge_single_bson_dir(input_dir: str, output_dir: str, namespace: str) -> None:
    """
    Merges .bson files in a single directory.

    Args:
        input_dir (str): The path to the directory that contains the .bson files.
        output_dir (str): The path to the directory for the output file.
        namespace (str): The name of the output file, without the extension.
    """
    try:
        # Get all .bson files that match the ***_*_part*.bson pattern and sort them by name
        files = [f for f in os.listdir(input_dir) if re.match(r'^.+_.+_part\d+\.bson$', f)]
        files.sort()  # Sort by file name
        if not files:
            print("No matching .bson files found in {}".format(input_dir))
            return
        output_file = os.path.join(output_dir, "{}.bson".format(namespace))
        if os.path.exists(output_file):
            print("Output file {} already exists, skipping...".format(output_file))
            return
        print("Merging {} files into {}...".format(len(files), output_file))
        # Stream and merge the files
        total_files = len(files)
        with open(output_file, "wb") as out_f:
            for index, filename in enumerate(files, 1):
                file_path = os.path.join(input_dir, filename)
                print("  Processing file {}/{}: {}...".format(index, total_files, filename))
                try:
                    with open(file_path, "rb") as in_f:
                        while True:
                            # Read the BSON document size
                            size_data = in_f.read(4)
                            if not size_data or len(size_data) < 4:
                                break
                            # Parse the document size (little-endian)
                            doc_size = struct.unpack("<i", size_data)[0]
                            # A BSON document is at least 5 bytes long; a smaller value
                            # means corrupt data and would otherwise loop forever
                            if doc_size < 5:
                                print("Invalid document size {} in {}, stopping".format(doc_size, filename))
                                break
                            # Reread the full document data
                            in_f.seek(in_f.tell() - 4)
                            doc_data = in_f.read(doc_size)
                            if len(doc_data) != doc_size:
                                break
                            out_f.write(doc_data)
                except Exception as e:
                    print("Error reading {}: {}".format(filename, str(e)))
    except Exception as e:
        print("Error in merge_single_bson_dir: {}".format(str(e)))


def merge_bson_files_recursive(input_root: str, output_root: str = None) -> None:
    """
    Recursively traverses directories and merges all .bson files.

    Args:
        input_root (str): The path to the root directory that contains the .bson files.
        output_root (str): The path to the root directory for the output files. Defaults to input_root.
    """
    if output_root is None:
        output_root = input_root
    # Make sure the output root directory exists
    if not os.path.exists(output_root):
        os.makedirs(output_root)
    print("Scanning directories in {}...".format(input_root))
    # Traverse all items in the input root directory
    for item in os.listdir(input_root):
        item_path = os.path.join(input_root, item)
        # If the item is a directory, process it
        if os.path.isdir(item_path):
            print("Processing directory: {}".format(item))
            # Create the corresponding output directory
            output_item_path = os.path.join(output_root, item)
            if not os.path.exists(output_item_path):
                os.makedirs(output_item_path)
            # Traverse all subdirectories and files in the directory
            for item_d in os.listdir(item_path):
                sub_item_path = os.path.join(item_path, item_d)
                # Skip plain files (for example, .bson files merged by an earlier run),
                # because os.listdir fails on anything that is not a directory
                if not os.path.isdir(sub_item_path):
                    continue
                for sub_item in os.listdir(sub_item_path):
                    data_path = os.path.join(sub_item_path, sub_item)
                    # If it is a "data" directory, merge the .bson files in it
                    if os.path.isdir(data_path) and sub_item == "data":
                        # Extract the namespace (parent directory name)
                        namespace = os.path.basename(sub_item_path)
                        merge_single_bson_dir(data_path, output_item_path, namespace)
                    # If it is a .metadata.json file, copy it directly to the corresponding output directory
                    elif sub_item.endswith(".metadata.json"):
                        src_file = os.path.join(sub_item_path, sub_item)
                        target_file = os.path.join(output_item_path, sub_item)
                        shutil.copy(src_file, target_file)
                        print("Copied metadata file: {}".format(sub_item))
            print("Finished processing directory: {}".format(item))


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Recursively merge .bson files")
    parser.add_argument("input_root", help="The path to the root directory that contains the .bson files")
    parser.add_argument("-o", "--output_root", help="The path to the root directory for the output files. Defaults to the input root directory")
    args = parser.parse_args()
    merge_bson_files_recursive(args.input_root, args.output_root)
Run the script:
python merge_bson_files.py <input_directory> -o <output_directory>
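After merging, one way to sanity-check the result is to count the documents in each file. A .bson file is a plain concatenation of BSON documents, and each document begins with a 4-byte little-endian length that includes the length field itself, so the count can be obtained without a BSON library. The helper name count_bson_documents is hypothetical, and this is a sketch rather than a full validator.

```python
import struct


def count_bson_documents(path):
    """Count documents in a .bson file by walking the 4-byte length prefixes."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                return count  # clean end of file
            if len(header) < 4:
                raise ValueError("truncated length prefix after {} documents".format(count))
            (size,) = struct.unpack("<i", header)
            # The smallest valid document is 5 bytes: the length plus a trailing NUL
            if size < 5:
                raise ValueError("invalid document size {} after {} documents".format(size, count))
            body = f.read(size - 4)
            if len(body) != size - 4 or body[-1:] != b"\x00":
                raise ValueError("truncated or malformed document after {} documents".format(count))
            count += 1
```

The count for a merged file should equal the sum of the counts of its part files. A mismatch, or a ValueError, suggests that a part file was truncated during download or decompression.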
Step 4: Restore the data with mongorestore
Run mongorestore to load the merged .bson files into your self-managed database. The following examples cover three restore scopes.
Restore a single collection
mongorestore \
--uri=<mongodb-uri> \
--db <db> \
--collection <collection> \
<xxx.bson>
Example:
mongorestore \
--uri='mongodb://127.x.x.x:27017/?authSource=admin' \
--db testDB \
--collection coll1 \
./testDB/coll1.bson
Restore a single database
mongorestore \
--uri=<mongodb-uri> \
--db <db> \
--dir <path/to/bson/dir>
Example:
mongorestore \
--uri='mongodb://127.x.x.x:27017/?authSource=admin' \
--db testDB \
--dir ./testDB
Restore an entire instance
mongorestore \
--uri=<mongodb-uri> \
--dir <path/to/bson/dir>
Example:
mongorestore \
--uri='mongodb://127.x.x.x:27017/?authSource=admin' \
--dir ./
Parameter descriptions
| Parameter | Description |
|---|---|
| `<mongodb-uri>` | The high availability (HA) address of your self-managed database or ApsaraDB for MongoDB instance. It includes the username, password, server IP address, and port. See the MongoDB connection string documentation. |
| `<db>` | The name of the database to restore. |
| `<collection>` | The name of the collection to restore. |
| `<xxx.bson>` | The .bson file for restoring a single collection. |
| `<path/to/bson/dir>` | The directory that contains the .bson files to restore. |
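When restores are scripted, the three scopes above differ only in which options are passed. The sketch below assembles the argument list from the same flags used in the examples (--uri, --db, --collection, --dir). The function names build_restore_command and run_restore are hypothetical, and the sketch assumes mongorestore is on the PATH.

```python
import subprocess


def build_restore_command(uri, path, db=None, collection=None):
    """Build a mongorestore argument list for one of the three restore scopes."""
    if collection is not None and db is None:
        raise ValueError("a collection restore also needs a database name")
    command = ["mongorestore", "--uri={}".format(uri)]
    if db is not None:
        command += ["--db", db]
    if collection is not None:
        # Single collection: path is the merged .bson file itself
        command += ["--collection", collection, path]
    else:
        # Single database or entire instance: path is a directory of .bson files
        command += ["--dir", path]
    return command


def run_restore(command):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Prints the single-collection example from this guide as an argument list
    print(build_restore_command(
        "mongodb://127.x.x.x:27017/?authSource=admin",
        "./testDB/coll1.bson",
        db="testDB",
        collection="coll1",
    ))
```

Passing the arguments as a list avoids shell quoting problems with characters such as ? and & in the connection string.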
FAQ
What if my instance architecture doesn't support downloading backup files?
Use Data Transmission Service (DTS) to migrate the instance data to your self-managed database. See Migrate data from a self-managed MongoDB database or an ApsaraDB for MongoDB instance. Alternatively, back up and restore the instance data directly using mongodump and mongorestore provided by ApsaraDB for MongoDB.