Restore downloaded disk backup data to a self-managed MongoDB database - ApsaraDB for MongoDB

This topic describes how to use mongorestore to restore data from the disk backup files of an ApsaraDB for MongoDB instance to a self-managed MongoDB database.

Background information

MongoDB provides mongodump and mongorestore as its official backup and restore tools. ApsaraDB for MongoDB creates logical backups with mongodump, so you can use mongorestore to restore a logical backup to a self-managed database.

Usage notes

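MongoDB is updated frequently, so an older version of mongorestore may be incompatible with a newer MongoDB version. Make sure that you use a compatible mongorestore version. For more information, see "mongorestore".
Even if a collection contains little data and has only one .bson file, such as myDatabase/myCollection/data/myCollection_0_part0.bson, you must still merge or rename the file. This is because mongorestore processes .bson files based on their file name prefix.
The downloaded disk backup also includes empty collections so that their schema is preserved. For each empty collection, an empty .bson file that carries the database and collection names is created. mongorestore can process these empty files.
For a sharded cluster instance, the downloaded disk backup files do not contain shard routing information. You can restore the backup data to a standalone instance, a replica set instance, or a sharded cluster instance. To restore data to a sharded cluster instance, you must perform a pre-sharding operation first.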
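
The naming rule in the second note can be sketched in Python as follows. This is a minimal illustration, assuming that part files follow the `<collection>_<n>_part<m>.bson` pattern seen in the example path and that the directory above `data` is named after the collection. `merged_bson_path` is a hypothetical helper, not part of any ApsaraDB for MongoDB or MongoDB tool.

```python
import posixpath
import re

# Pattern inferred from the example file name myCollection_0_part0.bson (an assumption).
PART_FILE = re.compile(r"^.+_\d+_part\d+\.bson$")


def merged_bson_path(part_path):
    """Return the <database>/<collection>.bson path that a part file should end up in.

    Hypothetical helper for illustration: expects <database>/<collection>/data/<part file>.
    """
    database, collection, data_dir, filename = part_path.split("/")[-4:]
    if data_dir != "data" or not PART_FILE.match(filename):
        raise ValueError("not a backup part file: {}".format(part_path))
    return posixpath.join(database, collection + ".bson")


print(merged_bson_path("myDatabase/myCollection/data/myCollection_0_part0.bson"))
# prints: myDatabase/myCollection.bson
```

The result is the layout that mongorestore can consume directly: one directory per database and one .bson file per collection.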

Prerequisites

Download and install MongoDB on the client that hosts the self-managed database. The client can be a local server or an Elastic Compute Service (ECS) instance. Make sure that the MongoDB version is the same as the version of the ApsaraDB for MongoDB instance. For installation details, see "Install MongoDB".
Download the logical backup file. For more information, see "Download a backup file".

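Procedure

Copy the downloaded backup file to the device that hosts the client of the self-managed MongoDB database. The mongorestore tool must be installed on this device.
Decompress the backup file.
Downloaded backup files are available in the `tar.zst` and `tar.gz` formats, which use the zstd and gzip compression algorithms, respectively. You can select the format with the `UseZstd` parameter of the CreateDownload API operation.
tar.zst (console download)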
```
zstd -d -c <tar.zst_backup_package> | tar -xvf - -C <decompression_directory_path>
```
Make sure that the zstd tool is available locally and that the decompression directory exists.
Example:
```
mkdir -p ./download_test/test1
zstd -d -c test1.tar.zst | tar -xvf - -C /Users/xxx/Desktop/download_test/test1/
```
tar.gz (default format for OpenAPI downloads)
```
tar -zxvf <tar.gz_backup_package> -C <decompression_directory_path>
```
Make sure that the decompression directory exists.
Example:
```
mkdir -p ./download_test/test1
tar -zxvf testDB.tar.gz -C /Users/xxx/Desktop/download_test/test1/
```
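If you prefer to script the tar.gz case, the same extraction can be done with the `tarfile` module from the Python standard library. The following is a minimal sketch, not part of the official procedure. `extract_tar_gz` is a hypothetical helper name, and the tar.zst format is not covered because it relies on the external zstd tool shown earlier.

```python
import os
import tarfile


def extract_tar_gz(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive into dest_dir, creating it if needed."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        # Only extract archives that you downloaded yourself from a trusted source.
        archive.extractall(dest_dir)
    # Return the top-level entries so the caller can see what was unpacked.
    return sorted(os.listdir(dest_dir))
```

For example, `extract_tar_gz("testDB.tar.gz", "./download_test/test1")` creates the target directory if it does not exist and unpacks the archive into it.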

Merge the .bson files.

Copy the following `merge_bson_files.py` file to a device that has a Python environment.

import os
import struct
import sys
import argparse
import shutil
import re

# This script uses type annotations and therefore requires Python 3.


def merge_single_bson_dir(input_dir: str, output_dir: str, namespace: str) -> None:
    """
    Merge the .bson files in a single directory.

    Args:
        input_dir (str): Path to the directory that contains the .bson files.
        output_dir (str): Path to the directory for the output file.
        namespace (str): Name of the output file, without the extension.
    """
    try:
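        # Collect every file named like <name>_<n>_part<m>.bson and sort the names
        # in natural order, so that part2 comes before part10
        files = [f for f in os.listdir(input_dir) if re.match(r'^.+_.+_part\d+\.bson$', f)]
        files.sort(key=lambda f: [int(t) if t.isdigit() else t for t in re.split(r'(\d+)', f)])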

        if not files:
            print("No matching .bson files found in {}".format(input_dir))
            return

        output_file = os.path.join(output_dir, "{}.bson".format(namespace))
        if os.path.exists(output_file):
            print("Output file {} already exists, skipping...".format(output_file))
            return

        print("Merging {} files into {}...".format(len(files), output_file))

        # Stream each part file into the merged output file
        total_files = len(files)
        with open(output_file, "wb") as out_f:
            for index, filename in enumerate(files, 1):
                file_path = os.path.join(input_dir, filename)
                print("  Processing file {}/{}: {}...".format(index, total_files, filename))

                try:
                    with open(file_path, "rb") as in_f:
                        while True:
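                            # Read the 4-byte length prefix of the next BSON document
                            size_data = in_f.read(4)
                            if len(size_data) < 4:
                                break

                            # Parse the document size (little-endian int32 that includes the prefix itself)
                            doc_size = struct.unpack("<i", size_data)[0]
                            if doc_size < 5:
                                print("Invalid BSON document size {} in {}, stopping".format(doc_size, filename))
                                break

                            # Read the rest of the document that follows the length prefix
                            doc_data = size_data + in_f.read(doc_size - 4)

                            if len(doc_data) != doc_size:
                                break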

                            out_f.write(doc_data)
                except Exception as e:
                    print("Error reading {}: {}".format(filename, str(e)))
    except Exception as e:
        print("Error in merge_single_bson_dir: {}".format(str(e)))


def merge_bson_files_recursive(input_root: str, output_root: str = None) -> None:
    """
    Walk the directory tree and merge all .bson files.

    Args:
        input_root (str): Path to the root directory that contains the .bson files.
        output_root (str): Path to the root directory for the output files. Defaults to input_root.
    """
    if output_root is None:
        output_root = input_root

    # Make sure that the output root directory exists
    if not os.path.exists(output_root):
        os.makedirs(output_root)

    print("Scanning directories in {}...".format(input_root))
    
    # Walk every entry in the input root directory
    for item in os.listdir(input_root):
        item_path = os.path.join(input_root, item)

        # Only directories are processed
        if os.path.isdir(item_path):
            print("Processing directory: {}".format(item))

            # Create the matching output directory
            output_item_path = os.path.join(output_root, item)
            if not os.path.exists(output_item_path):
                os.makedirs(output_item_path)

            # Walk the subdirectories of this directory and their contents
            for item_d in os.listdir(item_path):
                sub_item_path = os.path.join(item_path, item_d)
                # Skip plain files, such as an already merged .bson file
                if not os.path.isdir(sub_item_path):
                    continue
                for sub_item in os.listdir(sub_item_path):
                    data_path = os.path.join(sub_item_path, sub_item)
                    # A "data" directory holds the .bson part files to merge
                    if os.path.isdir(data_path) and sub_item == "data":
                        # Use the parent directory name as the namespace
                        namespace = os.path.basename(sub_item_path)
                        merge_single_bson_dir(data_path, output_item_path, namespace)
                    # A .metadata.json file is copied to the output directory as is
                    elif sub_item.endswith(".metadata.json"):
                        src_file = os.path.join(sub_item_path, sub_item)
                        target_dir = os.path.join(output_item_path, sub_item)
                        shutil.copy(src_file, target_dir)
                        print("Copied metadata file: {}".format(sub_item))
            print("Finished processing directory: {}".format(item))


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Recursively merge .bson files")
    parser.add_argument("input_root", help="The path to the root directory that contains the .bson files")
    parser.add_argument("-o", "--output_root", help="The path to the root directory for the output files. Defaults to the input root directory")

    args = parser.parse_args()
    merge_bson_files_recursive(args.input_root, args.output_root)

Run the command:

python merge_bson_files.py <input_directory> -o <output_directory>
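
After the merge, it can be useful to check that each merged file is well-formed before you restore it. The sketch below counts the documents in a .bson file by walking the length prefixes, in the same way that the merge script reads them. `count_bson_documents` is a hypothetical helper and is not part of the merge script.

```python
import struct


def count_bson_documents(path):
    """Count the BSON documents in a file; raise ValueError if it is truncated or malformed."""
    count = 0
    with open(path, "rb") as f:
        while True:
            prefix = f.read(4)
            if not prefix:
                return count  # clean end of file
            if len(prefix) < 4:
                raise ValueError("truncated length prefix after {} documents".format(count))
            # The int32 length is little-endian and includes the 4 prefix bytes.
            (size,) = struct.unpack("<i", prefix)
            if size < 5:
                raise ValueError("invalid document size {}".format(size))
            body = f.read(size - 4)
            # Every BSON document ends with a 0x00 terminator byte.
            if len(body) != size - 4 or body[-1:] != b"\x00":
                raise ValueError("malformed document after {} documents".format(count))
            count += 1
```

Calling `count_bson_documents("./testDB/coll1.bson")` (an example path) returns 0 for the empty files that represent empty collections.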

Use the mongorestore tool to restore the backup data to the database instance.
```
# Restore a single collection
mongorestore --uri=<mongodb-uri> --db <db> --collection <collection> <xxx.bson>
# Example: restore a single collection
mongorestore --uri='mongodb://127.x.x.x:27017/?authSource=admin' --db testDB --collection coll1 ./testDB/coll1.bson
# Restore a single database
mongorestore --uri=<mongodb-uri> --db <db> --dir </path/to/bson/dir>
# Example: restore a single database
mongorestore --uri='mongodb://127.x.x.x:27017/?authSource=admin' --db testDB --dir ./testDB
# Restore an entire instance
mongorestore --uri=<mongodb-uri> --dir </path/to/bson/dir>
# Example: restore an entire instance
mongorestore --uri='mongodb://127.x.x.x:27017/?authSource=admin' --dir ./
```
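Parameters:
- `<mongodb-uri>`: The high-availability (HA) connection address of the self-managed database or the ApsaraDB for MongoDB instance. The URI contains the username, password, server IP address, and port. For more information, see the official MongoDB documentation.
- `<db>`: The name of the database to restore.
- `<collection>`: The name of the collection to restore.
- `<xxx.bson>`: The backup .bson file that is used to restore a single collection.
- `</path/to/bson/dir>`: The directory that contains the .bson files to restore.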
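
To run the same commands from a script, you can assemble the argument list in Python and pass it to `subprocess.run`. The following is a minimal sketch that uses only the options listed above (`--uri`, `--db`, `--collection`, and `--dir`). `build_mongorestore_command` and `run_mongorestore` are hypothetical helper names, and the URI and paths are placeholders.

```python
import subprocess


def build_mongorestore_command(uri, path, db=None, collection=None):
    """Build the mongorestore argument list for a collection, database, or full restore."""
    if collection and not db:
        raise ValueError("collection requires db")
    cmd = ["mongorestore", "--uri={}".format(uri)]
    if db:
        cmd += ["--db", db]
    if collection:
        # A single collection is restored from one .bson file passed positionally.
        cmd += ["--collection", collection, path]
    else:
        # A database or an entire instance is restored from a directory.
        cmd += ["--dir", path]
    return cmd


def run_mongorestore(cmd):
    """Run the command; raises CalledProcessError if mongorestore exits with an error."""
    subprocess.run(cmd, check=True)
```

For example, `build_mongorestore_command("mongodb://127.x.x.x:27017/?authSource=admin", "./testDB", db="testDB")` produces the argument list for the single-database restore shown above. Passing a list instead of a shell string avoids quoting problems with special characters in the URI.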

FAQ

If the instance architecture does not allow backup files to be downloaded, how do I restore instance data to a self-managed database?

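You can use Data Transmission Service (DTS) to migrate the instance data to a self-managed database. For more information, see "Migrate data from a self-managed MongoDB database or an ApsaraDB for MongoDB instance".
Alternatively, you can use the mongodump and mongorestore tools to back up and restore the instance.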
