
E-MapReduce:Use JindoTable to migrate Hive tables and partitions to OSS or OSS-HDFS

Last Updated: Mar 25, 2026

As HDFS clusters grow, capacity bottlenecks become inevitable. JindoTable's moveTo command migrates Hive tables and partitions from HDFS to Alibaba Cloud Object Storage Service (OSS) or OSS-HDFS — an OSS-compatible HDFS API layer — without restructuring your Hive workloads. Partition filtering lets you migrate exactly the data you need.

Quick reference: common migration scenarios

| Scenario | Key parameters | Example |
| --- | --- | --- |
| Preview partitions before migrating | -e (explain mode) | jindotable -moveTo -t tdb.test_table -d <dest> -c "dt > 'v'" -e |
| Migrate partitions matching a condition | -c "<condition>" | jindotable -moveTo -t tdb.test_table -d <dest> -c "dt > 'v'" |
| Migrate the full table | -fullTable | jindotable -moveTo -t tdb.test_table -d <dest> -fullTable |
| Migrate partitions older than N days | -b <days> | jindotable -moveTo -t tdb.test_table -d <dest> -b 30 |
| Migrate back to HDFS | HDFS path as -d | jindotable -moveTo -t tdb.test_table -d hdfs://<path> -c "dt > 'v'" |
| Delete source data after migration | -r/-removeSource | Add -r only after verifying the migration succeeded |

Prerequisites

Before you begin, make sure you have:

  • An E-MapReduce (EMR) cluster of V3.42.0 or a later minor version, or V5.8.0 or a later minor version. See Create a cluster.

  • A partitioned Hive table with data written to it. The examples in this topic use a table named test_table in database tdb, with partition key dt and partition value value.

  • OSS-HDFS enabled with access permissions granted. See Enable OSS-HDFS and grant access permissions.

Usage notes

JindoTable uses lock files to prevent two copy tasks from writing to the same destination directory simultaneously. Add the jindotable.moveto.tablelock.base.dir configuration item to core-site.xml or hdfs-site.xml in the $HADOOP_CONF_DIR directory.

Set the value to an HDFS directory where lock files will be stored. Only the moveTo tool should have access to that directory. If you skip this step, the default directory hdfs:///tmp/jindotable-lock/ is used. If moveTo lacks permissions to access the directory, an error is reported.
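
For example, the property could be declared as follows in core-site.xml; the lock directory hdfs:///jindotable/locks is an illustrative path, not a default:

```xml
<!-- core-site.xml: directory where JindoTable moveTo stores its lock files.
     hdfs:///jindotable/locks is an example path; restrict access to it so
     that only the moveTo tool can read and write lock files. -->
<property>
  <name>jindotable.moveto.tablelock.base.dir</name>
  <value>hdfs:///jindotable/locks</value>
</property>
```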

The moveTo command

jindotable -moveTo -t <dbName.tableName> -d <destination path> [-c "<condition>" | -fullTable] [-b/-before <before days>] [-p/-parallel <parallelism>] [-s/-storagePolicy <OSS storage policy>] [-o/-overWrite] [-r/-removeSource] [-skipTrash] [-e/-explain] [-q/-queue <yarn queue>] [-w/-workingDir <working directory>] [-l/-logDir <log directory>]

To get the full help output, run:

jindotable -help moveTo

Parameters

| Parameter | Description | Required | Default | Example |
| --- | --- | --- | --- | --- |
| -t <dbName.tableName> | The table to move. | Yes | None | -t tdb.test_table |
| -d <destination path> | The destination directory. Partition subdirectories are created automatically. | Yes | None | -d oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table |
| -c "<condition>" or -fullTable | A partition filter expression. Basic operators are supported; user-defined functions (UDFs) are not. -fullTable migrates the entire table. | No | None | -c "dt > 'v'" |
| -b/-before <before days> | Migrates only partitions created more than N days ago. Unit: days. | No | None | -b 30 |
| -p/-parallel <parallelism> | Maximum parallelism for the task. | No | 1 | -p 4 |
| -s/-storagePolicy <OSS storage policy> | Storage class for data files copied to OSS. Valid values: Standard, IA, Archive, ColdArchive. | No | Standard | -s IA |
| -o/-overWrite | Overwrites data in the destination directories of migrated partitions. Directories of non-migrated partitions remain unchanged. Use only when the destination can be safely cleared. | No | None | None |
| -r/-removeSource | Deletes source data after migration completes. Irreversible. Use only after verifying the migration succeeded. | No | None | None |
| -skipTrash | Bypasses the recycle bin when deleting source data. Irreversible — deleted data cannot be recovered. Use only when rollback is not needed. | No | None | None |
| -e/-explain | Explain mode: lists the partitions to be migrated without actually migrating them. Run this before every migration to confirm the scope. | No | None | None |
| -q/-queue <yarn queue> | YARN queue for distributed copy. | No | None | -q default |
| -w/-workingDir <working directory> | Temporary working directory for distributed copy. | No | None | None |
| -l/-logDir <log directory> | Directory for log files. | No | /tmp/<current user>/ | -l /var/log/jindotable |
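
Putting several parameters together, the following shell sketch assembles a moveTo command and always previews it with -e first. The table, bucket path, and filter condition are the example values from this topic; build_moveto is a hypothetical helper, not part of JindoTable.

```shell
#!/usr/bin/env bash
# Sketch: assemble a moveTo command from the parameters above and preview it.
# build_moveto is a hypothetical helper; jindotable must be on PATH when the
# printed commands are actually executed.
set -euo pipefail

TABLE="tdb.test_table"
DEST="oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table"
COND="dt > 'v'"

build_moveto() {
  # $1: trailing flags, e.g. "-e" for explain mode
  printf 'jindotable -moveTo -t %s -d %s -c "%s" %s\n' \
    "$TABLE" "$DEST" "$COND" "$1"
}

# Preview first (-e); run the second command once the partition list looks right.
build_moveto "-e"
build_moveto "-p 4 -s IA"
```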
Warning

-r/-removeSource and -skipTrash permanently delete source data and cannot be undone. Always verify the migration result before using them. -o/-overWrite overwrites destination data in migrated partitions; use it only when you are sure the destination can be cleared.

Migrate partitions to OSS-HDFS

  1. Log on to the master node of your cluster via SSH. See Log on to a cluster.

  2. Preview the partitions to migrate (recommended). Run moveTo in explain mode to see which partitions match your filter, without moving any data.

    jindotable -moveTo -t tdb.test_table -d oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table -c " dt > 'v' " -e

    Review the output and confirm the listed partitions are correct before proceeding. This step does not move any data.

  3. Migrate the partitions to OSS-HDFS.

    jindotable -moveTo -t tdb.test_table -d oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table -c " dt > 'v' "
  4. Verify the migration in the Hive CLI.

    desc formatted test_table partition (dt='value');

    Check the Location field in the output. It should point to the OSS-HDFS destination path you specified.

  5. (Optional) Clean up source data after verifying the migration. Once you have confirmed that the migrated data is accessible and correct, remove the source data from HDFS by adding -r/-removeSource to the original command:

    Warning

    -r/-removeSource permanently deletes source data. Run the verification in step 4 first. To skip the recycle bin entirely, add -skipTrash — deleted data cannot be recovered.

    jindotable -moveTo -t tdb.test_table -d oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table -c " dt > 'v' " -r
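
As a guardrail, the destructive flag can be gated behind an explicit verification switch. This is a sketch, not part of JindoTable: safe_moveto is a hypothetical wrapper, and the verification in step 4 still has to be done by hand before passing it "yes".

```shell
#!/usr/bin/env bash
# Sketch: print the moveTo command, appending -r only after verification.
# safe_moveto is a hypothetical wrapper around the command from this topic.
set -euo pipefail

CMD="jindotable -moveTo -t tdb.test_table -d oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table -c \"dt > 'v'\""

safe_moveto() {
  # $1: "yes" only after the step-4 verification has been completed
  if [ "$1" = "yes" ]; then
    echo "$CMD -r"   # destructive: deletes source data after the copy
  else
    echo "$CMD"      # safe default: source data is kept
  fi
}

safe_moveto no
```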

Migrate partitions back to HDFS

To move data from OSS-HDFS back to HDFS, run moveTo with an HDFS destination path:

jindotable -moveTo -t tdb.test_table -d hdfs://<hdfs-path>/user/hive/warehouse/tdb.db/test_table -c " dt > 'v' "

If the command returns "No successfully moved partition", the destination HDFS directory is not empty. To overwrite the existing data, add -overWrite:

jindotable -moveTo -t tdb.test_table -d hdfs://<hdfs-path>/user/hive/warehouse/tdb.db/test_table -c " dt > 'v' " -overWrite

Exception handling

JindoTable automatically checks the destination directory before copying to prevent two tasks from writing to the same location simultaneously. If a conflict is detected, the current migration command fails. Stop all other copy tasks, clear the destination directory, then re-run the command.

  • For a non-partitioned table, clear the table's storage directory.

  • For a partitioned table, clear only the directories for the specific partitions you are migrating. Directories for other partitions are not affected.
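
For a partitioned table, the directories to clear can be enumerated before deleting anything. In this sketch, DEST and the partition values are illustrative and conflict_dirs is a hypothetical helper; pipe its output to a deletion command only after all other copy tasks have been stopped.

```shell
#!/usr/bin/env bash
# Sketch: list the destination partition directories affected by a write
# conflict, so they can be reviewed before being cleared.
set -euo pipefail

DEST="oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table"

conflict_dirs() {
  # $@: values of the partition key dt for the partitions being migrated
  for v in "$@"; do
    printf '%s/dt=%s\n' "$DEST" "$v"
  done
}

conflict_dirs value1 value2
# e.g. clear them with: conflict_dirs value1 value2 | xargs -n1 hadoop fs -rm -r -f
```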

If a migration is interrupted mid-run, no data loss occurs — the copy is incomplete, and the source data and Hive metadata remain unchanged. However, manual intervention may be required to clear any partial state before retrying.

Common causes of interrupted migrations:

  • The command process is killed before it finishes.

  • An exception such as memory overflow terminates the process.

What's next

If you are running JindoTable outside an EMR cluster, install and deploy JindoSDK first. See Deploy JindoSDK in an environment other than EMR.