As HDFS clusters grow, capacity bottlenecks become inevitable. JindoTable's moveTo command migrates Hive tables and partitions from HDFS to Alibaba Cloud Object Storage Service (OSS) or to OSS-HDFS, a storage service that exposes an HDFS-compatible API on top of OSS, without restructuring your Hive workloads. Partition filtering lets you migrate exactly the data you need.
Quick reference: common migration scenarios
| Scenario | Key parameters | Example |
|---|---|---|
| Preview partitions before migrating | -e (explain mode) | jindotable -moveTo -t tdb.test_table -d <dest> -c "dt > 'v'" -e |
| Migrate partitions matching a condition | -c "<condition>" | jindotable -moveTo -t tdb.test_table -d <dest> -c "dt > 'v'" |
| Migrate the full table | -fullTable | jindotable -moveTo -t tdb.test_table -d <dest> -fullTable |
| Migrate partitions older than N days | -b <days> | jindotable -moveTo -t tdb.test_table -d <dest> -b 30 |
| Migrate back to HDFS | HDFS path as -d | jindotable -moveTo -t tdb.test_table -d hdfs://<path> -c "dt > 'v'" |
| Delete source data after migration | -r/-removeSource | Add -r only after verifying the migration succeeded |
Prerequisites
Before you begin, make sure you have:
An E-MapReduce (EMR) cluster of V3.42.0 or a later minor version, or V5.8.0 or a later minor version. See Create a cluster.
A partitioned Hive table with data written to it. The examples in this topic use a table named test_table in database tdb, with partition key dt and partition value value.
OSS-HDFS enabled with access permissions granted. See Enable OSS-HDFS and grant access permissions.
Usage notes
JindoTable uses lock files to prevent two copy tasks from writing to the same destination directory simultaneously. Add the jindotable.moveto.tablelock.base.dir configuration item to core-site.xml or hdfs-site.xml in the $HADOOP_CONF_DIR directory.
Set the value to an HDFS directory where lock files will be stored. Only the moveTo tool should have access to that directory. If you skip this step, the default directory hdfs:///tmp/jindotable-lock/ is used. If moveTo lacks permissions to access the directory, an error is reported.
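As a reference for this setting, the property can be added to core-site.xml as shown below. The lock path is a hypothetical example; choose any HDFS directory that only the moveTo tool can access.

```xml
<!-- In core-site.xml or hdfs-site.xml under $HADOOP_CONF_DIR -->
<property>
  <name>jindotable.moveto.tablelock.base.dir</name>
  <!-- Example path; use a dedicated HDFS directory restricted to the moveTo tool -->
  <value>hdfs:///user/jindotable/moveto-locks/</value>
</property>
```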
The moveTo command
jindotable -moveTo -t <dbName.tableName> -d <destination path> [-c "<condition>" | -fullTable] [-b/-before <before days>] [-p/-parallel <parallelism>] [-s/-storagePolicy <OSS storage policy>] [-o/-overWrite] [-r/-removeSource] [-skipTrash] [-e/-explain] [-q/-queue <yarn queue>] [-w/-workingDir <working directory>] [-l/-logDir <log directory>]
To get the full help output, run:
jindotable -help moveTo
Parameters
| Parameter | Description | Required | Default | Example |
|---|---|---|---|---|
| -t <dbName.tableName> | The table to move. | Yes | — | -t tdb.test_table |
| -d <destination path> | The destination directory. Partition subdirectories are created automatically. | Yes | — | -d oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table |
| -c "<condition>" or -fullTable | A partition filter expression, or -fullTable to select all partitions. Basic comparison operators are supported; user-defined functions (UDFs) are not. | No | — | -c "dt > 'v'" |
| -b/-before <before days> | Migrates only partitions created more than N days ago. Unit: days. | No | — | -b 30 |
| -p/-parallel <parallelism> | Maximum parallelism of the migration task. | No | 1 | -p 4 |
| -s/-storagePolicy <OSS storage policy> | Storage class for data files copied to OSS. Valid values: Standard, IA, Archive, ColdArchive. | No | Standard | -s IA |
| -o/-overWrite | Overwrites data in the destination directories of migrated partitions. Directories of non-migrated partitions remain unchanged. Use only when the destination can be safely cleared. | No | — | — |
| -r/-removeSource | Deletes source data after the migration completes. Irreversible. Use only after verifying that the migration succeeded. | No | — | — |
| -skipTrash | Bypasses the recycle bin when deleting source data, so deleted data cannot be recovered. Use only when rollback is not needed. | No | — | — |
| -e/-explain | Explain mode: lists the partitions to be migrated without migrating them. Run this before every migration to confirm the scope. | No | — | — |
| -q/-queue <yarn queue> | YARN queue for the distributed copy job. | No | — | -q default |
| -w/-workingDir <working directory> | Temporary working directory for the distributed copy job. | No | — | — |
| -l/-logDir <log directory> | Directory for log files. | No | /tmp/<current user>/ | -l /var/log/jindotable |
-r/-removeSource and -skipTrash permanently delete source data and cannot be undone. Always verify the migration result before using them. -o/-overWrite overwrites destination data in migrated partitions; use it only when you are sure the destination can be cleared.
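The safe workflow implied by these flags (preview, migrate, verify, then delete) can be sketched as a short shell script. The table, bucket, and filter below are the example values from this topic; the script only prints each command so you can review it before running it on your cluster.

```shell
#!/bin/sh
# Example values from this topic; substitute your own table, bucket, and filter.
TABLE="tdb.test_table"
DEST="oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table"
COND="dt > 'v'"

BASE="jindotable -moveTo -t $TABLE -d $DEST -c \"$COND\""

# 1. Preview: -e lists the matching partitions without moving any data.
echo "$BASE -e"
# 2. Migrate: the same command without -e.
echo "$BASE"
# 3. Only after verifying the result in Hive: -r deletes the source (irreversible).
echo "$BASE -r"
```

Printing the commands first makes the irreversible step (-r) an explicit, reviewed decision rather than part of an automated pipeline.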
Migrate partitions to OSS-HDFS
Log on to the master node of your cluster via SSH. See Log on to a cluster.
Preview the partitions to migrate (recommended). Run moveTo in explain mode to see which partitions match your filter, without moving any data.
jindotable -moveTo -t tdb.test_table -d oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table -c " dt > 'v' " -e
Review the output and confirm the listed partitions are correct before proceeding.
Migrate the partitions to OSS-HDFS.
jindotable -moveTo -t tdb.test_table -d oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table -c " dt > 'v' "
Verify the migration in the Hive CLI.
desc formatted test_table partition (dt='value');
Check the Location field in the output. It should point to the OSS-HDFS destination path you specified.
(Optional) Clean up source data after verifying the migration. Once you have confirmed that the migrated data is accessible and correct, remove the source data from HDFS by adding -r/-removeSource to the original command:
Warning: -r/-removeSource permanently deletes source data. Run the verification step above first. To skip the recycle bin entirely, add -skipTrash; data deleted this way cannot be recovered.
jindotable -moveTo -t tdb.test_table -d oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table -c " dt > 'v' " -r
Migrate partitions back to HDFS
To move data from OSS-HDFS back to HDFS, run moveTo with an HDFS destination path:
jindotable -moveTo -t tdb.test_table -d hdfs://<hdfs-path>/user/hive/warehouse/tdb.db/test_table -c " dt > 'v' "
If the command returns No successfully moved partition, the destination HDFS directory is not empty. To overwrite the existing data, add -overWrite:
jindotable -moveTo -t tdb.test_table -d hdfs://<hdfs-path>/user/hive/warehouse/tdb.db/test_table -c " dt > 'v' " -overWrite
Exception handling
JindoTable automatically checks the destination directory before copying to prevent two tasks from writing to the same location simultaneously. If a conflict is detected, the current migration command fails. Stop all other copy tasks, clear the destination directory, then re-run the command.
For a non-partitioned table, clear the table's storage directory.
For a partitioned table, clear only the directories for the specific partitions you are migrating. Directories for other partitions are not affected.
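As a sketch of the per-partition cleanup, the directory to clear is the partition's subdirectory under the moveTo destination. The destination path and partition value below are the examples from this topic; the script composes and prints the cleanup command rather than running it.

```shell
#!/bin/sh
# Example destination and partition from this topic; adjust to your migration.
DEST="oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table"
PART="dt=value"          # the partition whose copy conflicted or failed
PART_DIR="$DEST/$PART"   # moveTo writes each partition to its own subdirectory

# Stop all other copy tasks first, then clear only this partition's directory
# (printed here for review; run it yourself once you are sure):
echo hdfs dfs -rm -r "$PART_DIR"
```

After clearing the conflicting partition directory, re-run the original moveTo command; directories of other partitions are untouched.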
If a migration is interrupted mid-run, no data loss occurs — the copy is incomplete, and the source data and Hive metadata remain unchanged. However, manual intervention may be required to clear any partial state before retrying.
Common causes of interrupted migrations:
The command process is killed before it finishes.
An exception such as memory overflow terminates the process.
What's next
If you are running JindoTable outside an EMR cluster, install and deploy JindoSDK first. See Deploy JindoSDK in an environment other than EMR.