JindoTable allows you to run the MoveTo command to migrate data in tables or partitions. This topic describes how to use the MoveTo command.
Prerequisites
- Java Development Kit (JDK) 8 is installed on your computer.
- An E-MapReduce (EMR) cluster is created. For more information, see Create a cluster.
The MoveTo command can automatically update metadata after the command copies the underlying data. This way, data in a table or partitions can be fully migrated to the destination path. You can configure filter conditions for the MoveTo command to migrate a large number of partitions at the same time. JindoTable also provides some protective measures to ensure data integrity and security when the MoveTo command is used to migrate data.
The MoveTo command is supported only in EMR V3.36.0 and later minor versions, and in EMR V5.2.0 and later minor versions.
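The version rule above can be sketched as a small shell check. This is only an illustration: the function name `moveto_supported` and the accepted input formats are assumptions, and the sketch reads "later minor versions" as any 3.x release from 3.36 onward or any 5.x release from 5.2 onward. Major versions that the rule does not mention are treated as unsupported.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of JindoTable): returns 0 when the given EMR
# version is in a range where MoveTo is documented as supported.
moveto_supported() {
  local version="${1#EMR-}"   # tolerate an optional "EMR-" prefix
  version="${version#V}"      # tolerate an optional "V" prefix
  local major minor
  IFS=. read -r major minor _ <<< "$version"
  case "$major" in
    3) [ "${minor:-0}" -ge 36 ] ;;   # EMR V3.36.0 and later 3.x versions
    5) [ "${minor:-0}" -ge 2 ] ;;    # EMR V5.2.0 and later 5.x versions
    *) return 1 ;;                   # assumption: anything else is unsupported
  esac
}
```

For example, `moveto_supported "3.36.0" && echo "MoveTo available"` prints the message, while `moveto_supported "3.35.0"` returns a non-zero status.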
Use the MoveTo command
- Log on to your cluster in SSH mode. For more information, see Log on to a cluster.
- Run the following command to obtain help information:
    jindo table -help moveTo
  Information similar to the following output is returned:
    <dbName.tableName>  The table to move.
    <destination path>  The destination base directory which is always at the same level of a 'table location', where the moved partitions or un-partitioned data would located in.
    <condition>/-fullTable  A filter condition to determine which partitions should be moved, supporting common operators (like '>') and built-in UDFs (like to_date) (UDFs not supported yet...), while -fullTable means that all partitions (or a whole un-partitioned table) should be moved. One but only one option must be specified among -c "<condition>" and -fullTable.
    <before days>  Optional, saying that table/partitions should be moved only when they are created (not updated or modified) more than some days before from now.
    <parallelism>  The maximum concurrency when copying partitions, 1 by default.
    <OSS storage policy>:  Storage policy for OSS destination, which can be Standard (by default), IA, Archive, or ColdArchive. Not applicable for destinations other than OSS. NOTE: if you are willing to use ColdArchive storage policy, please make sure that Cold Archive has been enabled for your OSS bucket.
    -o/-overWrite  Overwriting the final paths where the data would be moved. For partitioned tables this overwrites partitions' locations which are subdirectories of <destination path>; for un-partitioned table this overwrites the <destination path> itself.
    -r/-removeSource  Let the source data be removed when the corresponding table/partition is successfully moved to the new destination. Otherwise (by default), the source data would be left as it was.
    -skipTrash  Applicable only when [-r/-removeSource] is enabled. If present, source data would be immediately deleted from the file system, bypassing the trash.
    -e/-explain  If present, the command would not really move data, but only prints the table/partitions that would be moved for given conditions.
    <log directory>  A directory to locate log files, '/tmp/<current user>/' by default.
MoveTo syntax:
    jindo table -moveTo \
      -t <dbName.tableName> \
      -d <destination path> \
      [-c "<condition>" | -fullTable] \
      [-b/-before <before days>] \
      [-p/-parallel <parallelism>] \
      [-s/-storagePolicy <OSS storage policy>] \
      [-o/-overWrite] \
      [-r/-removeSource] \
      [-skipTrash] \
      [-e/-explain] \
      [-l/-logDir <log directory>]
| Parameter | Description | Required |
| --- | --- | --- |
| -t <dbName.tableName> | The name of the table that you want to migrate, in the Database name.Table name format. Separate the database name and table name with a period (.). The table can be a partitioned table or a non-partitioned table. | Yes |
| -d <destination path> | The destination path. Whether you migrate specific partitions or an entire non-partitioned table, this parameter specifies a table-level path. For a partition, the complete path is composed of the value of this parameter and the name of the partition. | Yes |
| -c "<condition>" or -fullTable | You must specify either -c "<condition>" or -fullTable, but not both. If you specify -fullTable, the entire partitioned or non-partitioned table is migrated. If you specify -c "<condition>", only the partitions that meet the filter condition are migrated. Common operators, such as the greater-than sign (>), are supported. For example, if the partition key column is the ds column of the String type and you want to migrate partitions whose values are greater than 'd', use -c " ds > 'd' ". | Yes (exactly one of the two) |
| -b/-before <before days> | Only tables or partitions that were created at least the specified number of days ago are migrated. | No |
| -p/-parallel <parallelism> | The maximum number of partitions that are copied concurrently. Default value: 1. | No |
| -s/-storagePolicy <OSS storage policy> | The storage class to use after data is migrated to Object Storage Service (OSS). Valid values, as listed in the help output: Standard (default), IA, Archive, and ColdArchive. Note: Make sure that the storage class you want to use is enabled on the destination OSS bucket. | No |
| -o/-overWrite | Forcibly clears the destination path. For a partitioned table, only the destination path of each partition that you migrate is cleared. | No |
| -r/-removeSource | Clears the source path after data is migrated and metadata is updated. For a partitioned table, only the source path of each migrated partition is cleared. | No |
| -skipTrash | Skips the trash when the source path is cleared. Note: You can specify this option only if -r/-removeSource is specified. | No |
| -e/-explain | Enables explain mode. In explain mode, the list of partitions to be migrated is displayed, but no data is migrated. | No |
| -l/-logDir <log directory> | The directory in which log files are stored. Default value: '/tmp/<current user>/'. | No |
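The parameters above can be combined as in the following sketch. The table name, OSS bucket, and filter condition are hypothetical placeholders, and the wrapper function `moveto` and the `JINDO` override variable are conveniences invented for this example; only the flags themselves come from the syntax described above. A reasonable workflow is to preview with -e first, and to add -r only after the preview lists the partitions you expect.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders: replace with your own table, bucket, and condition.
TABLE="sales_db.orders"
DEST="oss://examplebucket/warehouse/orders"   # table-level destination path
CONDITION="ds > '2021-01-01'"                 # filter on the partition key column ds

# Run MoveTo for the partitions that match CONDITION and were created at least
# 30 days ago, copying up to 4 partitions at a time into the IA storage class.
# Extra flags are appended as-is. Setting JINDO=echo prints the arguments
# instead of executing the command (an illustration aid, not a jindo feature).
moveto() {
  "${JINDO:-jindo}" table -moveTo \
    -t "$TABLE" \
    -d "$DEST" \
    -c "$CONDITION" \
    -b 30 \
    -p 4 \
    -s IA \
    "$@"
}

# 1. Preview: list the partitions that would be migrated, without moving data.
#      moveto -e
# 2. Migrate, then clear the source paths after metadata is updated.
#      moveto -r
```

The calls are left commented out so that nothing is migrated until you run them deliberately.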
Configure a lock directory
- Go to the HDFS service page.
  - Log on to the Alibaba Cloud EMR console.
  - In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
  - Click the Cluster Management tab.
  - On the Cluster Management page, find your cluster and click Details in the Actions column.
  - In the left-side navigation pane of the Cluster Overview page, go to the HDFS service page.
- Add a custom item.
  - Click the Configure tab. Then, click the hdfs-site or core-site tab in the Service Configuration section.
  - In the upper-right corner of the Service Configuration section, click Custom Configuration.
  - In the Add Configuration Item dialog box, add the jindotable.moveto.tablelock.base.dir parameter and set it to an existing HDFS path.
    Notice: When you customize a lock directory, make sure that no MoveTo process is running on any node of the cluster. Otherwise, the MoveTo process may fail and may even corrupt data.
- Save the configuration.
  - In the upper-right corner of the Service Configuration section, click Save.
  - In the Confirm Changes dialog box, specify Description and click OK.
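Before you change the lock directory, the two conditions named above (the HDFS path exists, and no MoveTo process is running) can be checked with a sketch like the following. The path `/tmp/jindotable-lock` and both function names are hypothetical. The process check also assumes that a running MoveTo job shows "-moveTo" in its command line, and it inspects only the node where it runs, so it would need to be repeated on every node of the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical lock directory: use the path you plan to set for
# jindotable.moveto.tablelock.base.dir.
LOCK_DIR="/tmp/jindotable-lock"

# Succeeds only if LOCK_DIR already exists as a directory in HDFS.
lock_dir_exists() {
  hdfs dfs -test -d "$LOCK_DIR"
}

# Succeeds only if no process on THIS node has "-moveTo" in its command line.
# Assumption: running MoveTo jobs can be recognized by that string.
no_local_moveto() {
  ! pgrep -f -- '-moveTo' >/dev/null
}

# Example (commented out so that nothing runs until you call it yourself):
#   lock_dir_exists && no_local_moveto && echo "Safe to change the lock directory"
```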