Use the JindoTable MoveTo command to migrate Apache Hive tables and partitions to OSS-HDFS. The command copies the underlying data and automatically updates the Hive metastore, so queries continue to work without modification after migration.
Prerequisites
Before you begin, ensure that you have:
- An E-MapReduce (EMR) cluster running version 3.36.0 or later (excluding 3.39.x), or version 5.2.0 or later (excluding 5.5.x)
- A partitioned Hive table with data written to it. The examples in this topic use a table named `test_table` with a partition key `dt`
- OSS-HDFS enabled on your bucket, with access permissions configured. See Connect non-EMR clusters to OSS-HDFS
How it works
MoveTo migrates a Hive table or partition in two steps: it copies the data to the destination path, then updates the partition location in the Hive metastore. Because the metadata update is automatic, downstream queries point to the new location without any manual intervention. JindoTable also provides protective measures to ensure data integrity and security during migration.
To narrow the scope of migration, pass a filter condition with -c. For example, migrate only partitions whose key value exceeds a threshold, or only partitions created before a certain date.
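As a sketch of how such conditions look in practice (the partition key and values below are placeholders, not values from your table), wrap the condition in double quotes so the inner single quotes survive the shell:

```shell
# Hypothetical filter conditions; adjust the partition key and values to
# match your table. Double quotes preserve the inner single quotes.
COND_THRESHOLD=" dt > 'value-1' "   # partitions whose key exceeds a value
COND_DATE=" dt < '2024-01-01' "     # partitions dated before a cutoff

# Pass either one with -c, for example:
#   sudo jindotable -moveTo -t tdb.test_table -d <destination> -c "$COND_THRESHOLD" -e
printf '%s\n' "$COND_THRESHOLD"
```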
Only one MoveTo process can run on a cluster at a time. If you start a second process while one is running, the request is rejected because the configuration lock is held. Either wait for the running process to finish, or terminate it and start a new one.
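If you script migrations, one way to cope with the single-process lock is a generic retry wrapper around the command. This is a sketch of ordinary shell retry logic, not a JindoTable feature:

```shell
# Generic retry helper (not part of JindoTable): re-attempt a command until
# it succeeds or the attempt limit is reached.
retry() {
  max=$1; shift
  n=0
  until "$@"; do
    n=$((n + 1))
    if [ "$n" -ge "$max" ]; then
      return 1
    fi
    sleep 60   # wait before retrying while the lock may still be held
  done
}

# Usage sketch (placeholders):
#   retry 10 sudo jindotable -moveTo -t tdb.test_table -d <destination> -fullTable
```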
Before running a full migration, use the -e (explain) flag to preview which partitions will be moved without transferring any data. This confirms the scope before you commit.
Migrate partitions to OSS-HDFS
1. Log on to your EMR cluster over SSH. See Log on to a cluster.

2. Preview the partitions to migrate using explain mode. Replace `" dt > 'v' "` with the filter condition that matches your partitions.

   ```shell
   sudo jindotable -moveTo \
     -t tdb.test_table \
     -d oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table \
     -c " dt > 'v' " \
     -e
   ```

   Expected output:

   ```
   Found 1 partitions to move:
   dt=value-2
   MoveTo finished for table tdb.test_table to destination oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table with condition " dt > 'v' " (explain only).
   ```

3. Run the migration. Remove the `-e` flag to move the data.

   ```shell
   sudo jindotable -moveTo \
     -t tdb.test_table \
     -d oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table \
     -c " dt > 'v' "
   ```

   Expected output:

   ```
   Found 1 partitions in total, and all are successfully moved.
   Successfully moved partitions:
   dt=value-2
   No failed partition.
   MoveTo finished for table tdb.test_table to destination oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table with condition " dt > 'v' ".
   ```

4. Verify the partition location. Start the Hive CLI with `sudo hive`, then run:

   ```
   hive> desc formatted test_table partition (dt='value-2');
   ```

   In the output, confirm that `Location` points to the OSS-HDFS path:

   ```
   Location: oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table/dt=value-2
   ```

5. (Optional) Migrate the partition back to Hadoop Distributed File System (HDFS). If the destination directory in HDFS already contains data, the migration fails with `New location is not empty but -overWrite is not enabled`. Add `-o` to overwrite it.

   ```shell
   sudo jindotable -moveTo \
     -t tdb.test_table \
     -d hdfs://<hdfs-path>/user/hive/warehouse/tdb.db/test_table \
     -c " dt > 'v' " \
     -o
   ```

   Expected output:

   ```
   Found 1 partitions in total, and all are successfully moved.
   Successfully moved partitions:
   dt=value-2
   No failed partition.
   MoveTo finished for table tdb.test_table to destination hdfs:///user/hive/warehouse/tdb.db/test_table with condition " dt > 'v' ", overwriting new locations.
   ```
Parameters
Both `sudo jindo table -moveTo` and `sudo jindotable -moveTo` accept the same parameters.
```shell
sudo jindo table -moveTo \
  -t <dbName.tableName> \
  -d <destination path> \
  [-c "<condition>" | -fullTable] \
  [-b/-before <before days>] \
  [-p/-parallel <parallelism>] \
  [-s/-storagePolicy <OSS storage policy>] \
  [-o/-overWrite] \
  [-r/-removeSource] \
  [-skipTrash] \
  [-e/-explain] \
  [-l/-logDir <log directory>]
```

| Parameter | Required | Description | Notes |
|---|---|---|---|
| -t <dbName.tableName> | Yes | The table to migrate, in database.table format. Example: tdb.test_table | Supports both partitioned and non-partitioned tables |
| -d <destination path> | Yes | The table-level destination path. Example: oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table | For partitioned tables, the full partition path is <destination path>/<partition key>=<value> |
| -c "<condition>" | One of -c or -fullTable is required | A filter condition for selecting partitions. Example: -c " dt > 'v' " | Supports common operators such as >. User-defined functions (UDFs) are not yet supported. Cannot be used together with -fullTable |
| -fullTable | One of -c or -fullTable is required | Migrates the entire table, including all partitions | Cannot be used together with -c |
| -b/-before <before days> | No | Migrates only tables or partitions created at least N days ago. Example: -b 30 | Based on creation time, not last modified time |
| -p/-parallel <parallelism> | No | Maximum number of concurrent partition migrations. Default: 1. Example: -p 4 | |
| -s/-storagePolicy <OSS storage policy> | No | OSS storage class for the destination. Valid values: Standard (default), IA, Archive, ColdArchive. Example: -s IA | Not applicable to OSS-HDFS destinations. To use ColdArchive, enable Cold Archive on the bucket first |
| -o/-overWrite | No | Overwrites the destination path if it already contains data | For partitioned tables, only the destination path of the migrated partition is overwritten |
| -r/-removeSource | No | Deletes the source path after a successful migration | For partitioned tables, only the source path of the migrated partition is deleted |
| -skipTrash | No | Skips the Trash directory when deleting the source path, so data is immediately removed from the file system | Must be used together with -r/-removeSource |
| -e/-explain | No | Explain mode: lists the partitions that would be migrated without moving any data | Use this to verify scope before running a full migration |
| -l/-logDir <log directory> | No | Directory for log files. Default: /tmp/<current user>/. Example: -l /var/log/jindo | |
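As an illustration of the `-b` cutoff, the snippet below mirrors the date arithmetic locally. This is only a sketch: jindotable evaluates the cutoff itself against each partition's creation time, and GNU `date` is assumed here.

```shell
# Mirror of the -b/-before selection logic (illustration only; jindotable
# applies this check itself based on partition creation time).
DAYS=30
CUTOFF=$(date -d "$DAYS days ago" +%Y-%m-%d)   # GNU date assumed
for created in 2020-01-01 2099-01-01; do
  # String comparison works because the dates are zero-padded ISO format.
  if [ "$created" \< "$CUTOFF" ]; then
    echo "created=$created is older than $DAYS days; -b $DAYS would select it"
  fi
done
```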
To get help from the command line, run:
```shell
sudo jindo table -help moveTo
```

Troubleshooting
Conflicts found
If the migration fails with a Conflicts found error:
1. Make sure no other tools, such as DistCp or JindoDistCp, are writing data to the same destination path at the same time.
2. Delete the conflicting destination directory. Do not delete the source directory.
   - For non-partitioned tables, delete the table-level directory.
   - For partitioned tables, delete the conflicting partition-level directory.
3. Re-run the MoveTo command.
Destination directory not empty
If you see New location is not empty but -overWrite is not enabled, the destination path already contains data. Add -o to the command to overwrite it.