When retiring a semi-hosted JindoFS cluster, you need to move its metadata and data files to a new storage layer without losing incremental writes that occur during the transition. Use jindo distjob to run a two-phase migration: a full data migration followed by an optional incremental sync to keep OSS-HDFS up to date while the source cluster remains in service.
Prerequisites
Before you begin, ensure that you have:
OSS-HDFS enabled on the bucket that stores the semi-hosted JindoFS cluster data
Audit logging enabled on the semi-hosted JindoFS cluster
JindoSDK installed and configured. For installation instructions, see JindoSDK download
Step 1: Migrate full data
Full data migration copies directory metadata from the semi-hosted JindoFS cluster to OSS-HDFS. Each run migrates one source directory to one destination directory, and the destination must be a first-level subdirectory in OSS-HDFS.
Run the following command:
jindo distjob -migrateImport -srcPath <srcPath> -destPath <destPath> -backendLoc <backendLoc>
Parameters
| Parameter | Description | Notes |
|---|---|---|
| -srcPath | Source directory in the semi-hosted JindoFS cluster | Use the jfs:// scheme, for example, jfs://mycluster/foo |
| -destPath | Destination directory in OSS-HDFS | Must be a first-level subdirectory; use the oss:// scheme, for example, oss://examplebucket/bar/ |
| -backendLoc | OSS path that stores the raw data of the semi-hosted JindoFS cluster | This is the data backend for the JindoFS cluster, not the migration destination |
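Because the destination must be a first-level subdirectory that uses the oss:// scheme, a small pre-flight check can catch a mistyped path before a migration run starts. The following is a minimal sketch, not part of jindo: the function name is hypothetical, and it assumes that "first-level" means exactly one path segment below the bucket, as in oss://examplebucket/bar/.

```shell
# Hypothetical helper (not a jindo feature): succeeds only when the argument
# has the shape oss://<bucket>/<dir> or oss://<bucket>/<dir>/.
is_first_level_dest() {
  # Scheme, a bucket name without slashes, one directory name,
  # and an optional trailing slash.
  printf '%s\n' "$1" | grep -Eq '^oss://[^/]+/[^/]+/?$'
}
```

For example, is_first_level_dest "oss://examplebucket/bar/" succeeds, while a nested path such as oss://examplebucket/bar/baz/ or a jfs:// path is rejected.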
Example
Migrate the jfs://mycluster/foo directory to the bar subdirectory in examplebucket:
jindo distjob -migrateImport -srcPath jfs://mycluster/foo -destPath oss://examplebucket/bar/
Step 2: Migrate incremental data (optional)
If the source JindoFS cluster continues to receive writes after full migration completes, run incremental migration to sync those changes to OSS-HDFS. Incremental migration converts audit logs into change logs, then applies those changes to the destination.
2.1 Convert audit logs to change logs
Run the following command to convert the cluster's audit logs into change logs that jindo distjob can process:
jindo distjob -mkchangelog -auditLogDir <auditLogDir> -changeLogDir <changeLogDir> -startTime <startTime>
Parameters
| Parameter | Description | Notes |
|---|---|---|
| -auditLogDir | Path that stores the audit logs of the semi-hosted JindoFS cluster | |
| -changeLogDir | Path where the generated change logs are stored | Choose a path accessible to both the source cluster and OSS-HDFS |
| -startTime | Start time for converting audit logs | ISO 8601 format: YYYY-MM-DDTHH:MM:SSZ. Only audit logs from this time onward are converted |
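The -startTime value has to follow the YYYY-MM-DDTHH:MM:SSZ shape, so it can help to generate and check it in a script instead of typing it by hand. The sketch below is illustrative only: both function names are hypothetical, and the check covers the shape of the string, not whether the date itself is valid.

```shell
# Print the current UTC time as YYYY-MM-DDTHH:MM:SSZ.
utc_now() {
  date -u +%Y-%m-%dT%H:%M:%SZ
}

# Hypothetical check: succeeds only when the argument has the
# YYYY-MM-DDTHH:MM:SSZ shape (digits in every numeric field).
is_start_time() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$'
}
```

A value such as 2022-06-01T12:00:00Z passes the check, while 2022-06-01 12:00:00 does not.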
Example
Convert audit logs stored at oss://examplebucket/sysinfo/auditlog, starting from 12:00:00 UTC on June 1, 2022, and write the change logs to oss://examplebucket/sysinfo/changelog:
jindo distjob -mkchangelog \
-auditLogDir oss://examplebucket/sysinfo/auditlog \
-changeLogDir oss://examplebucket/sysinfo/changelog \
-startTime 2022-06-01T12:00:00Z
2.2 Apply incremental changes
After generating the change logs, apply the incremental changes to OSS-HDFS:
jindo distjob -migrateImport -srcPath <srcPath> -destPath <destPath> -changeLogDir <changeLogDir> -backendLoc <backendLoc> -update
Parameters
| Parameter | Description | Notes |
|---|---|---|
| -srcPath | Source directory in the semi-hosted JindoFS cluster | |
| -destPath | Destination directory in OSS-HDFS | |
| -changeLogDir | Path that stores the generated change logs | Must match the -changeLogDir path used in step 2.1 |
| -backendLoc | OSS path that stores the raw data of the semi-hosted JindoFS cluster | |
| -update | Enables incremental migration mode | Only copies files that are new or modified since the last run; unchanged files are skipped |
Example
Apply incremental changes from jfs://mycluster/foo to oss://examplebucket/bar/, using change logs at oss://logbucket/logdir/:
jindo distjob -migrateImport \
-srcPath jfs://mycluster/foo \
-destPath oss://examplebucket/bar/ \
-changeLogDir oss://logbucket/logdir/ \
-backendLoc oss://examplebucket/jfsdataDir \
-update
2.3 Run multiple incremental migrations (optional)
To sync a series of changes over time, repeat steps 2.1 and 2.2 with adjusted -startTime values to cover successive time ranges.
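One round of steps 2.1 and 2.2 can be wrapped in a small shell function so that each repetition only needs a new start time. This is a sketch under stated assumptions rather than a recommended procedure: the function name and the JINDO override are hypothetical, the paths are placeholders borrowed from the examples in this guide, and a single CHANGE_LOG_DIR variable is reused so that both commands receive the same change log path.

```shell
# Placeholder values taken from the examples in this guide; replace them.
SRC_PATH="jfs://mycluster/foo"
DEST_PATH="oss://examplebucket/bar/"
AUDIT_LOG_DIR="oss://examplebucket/sysinfo/auditlog"
CHANGE_LOG_DIR="oss://examplebucket/sysinfo/changelog"
BACKEND_LOC="oss://examplebucket/jfsdataDir"

# Hypothetical wrapper: run step 2.1, then step 2.2 only if 2.1 succeeded.
# $1 is the -startTime value for this round.
# Setting JINDO="echo jindo" prints the commands instead of running them
# (this relies on sh/bash word splitting of the unquoted expansion).
incremental_round() {
  start_time="$1"
  ${JINDO:-jindo} distjob -mkchangelog \
    -auditLogDir "$AUDIT_LOG_DIR" \
    -changeLogDir "$CHANGE_LOG_DIR" \
    -startTime "$start_time" &&
  ${JINDO:-jindo} distjob -migrateImport \
    -srcPath "$SRC_PATH" \
    -destPath "$DEST_PATH" \
    -changeLogDir "$CHANGE_LOG_DIR" \
    -backendLoc "$BACKEND_LOC" \
    -update
}
```

A dry run such as JINDO="echo jindo" followed by incremental_round 2022-06-01T12:00:00Z prints the two commands for review. How to choose each successive start time, and whether overlapping time ranges are safe to re-apply, is not covered here and should be verified against your own audit logs.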