JindoTable's archiveTable and unarchiveTable commands move table data in Object Storage Service (OSS) between storage classes — Archive, Infrequent Access (IA), and Standard. Unlike the original archive and unarchive commands, these commands do not require the Jindo Namespace Service component of SmartData, so you can run them on clusters without the SmartData service deployed.
Prerequisites
Before you begin, ensure that you have:
Java Development Kit (JDK) 8 installed on your computer
An E-MapReduce (EMR) cluster (see Create a cluster)
Table data stored in OSS — only table data (partitioned or non-partitioned) can be archived
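Before you run JindoTable, you can check the local JDK version from the shell. The following is a minimal POSIX sh sketch. It assumes that the java binary on your PATH is the JDK you plan to use. The helper name java_major_version is hypothetical and is not part of JindoTable.

```shell
#!/bin/sh
# Sketch: confirm that the local JDK is version 8 before running JindoTable.
set -eu

# Hypothetical helper: map a Java version string to its major version.
# Releases before Java 9 use "1.<major>..." (for example 1.8.0_292 -> 8).
# Later releases start with the major version (for example 11.0.2 -> 11).
java_major_version() {
  _v="$1"
  case "$_v" in
    1.*) _v="${_v#1.}" ;;
  esac
  # Drop everything from the first "." or "_" onward.
  printf '%s\n' "${_v%%[._]*}"
}

# `java -version` writes to stderr, for example: openjdk version "1.8.0_292"
if command -v java >/dev/null 2>&1; then
  version="$(java -version 2>&1 | awk -F'"' '/version/ && !done { print $2; done = 1 }')"
  major="$(java_major_version "$version")"
  if [ "$major" = "8" ]; then
    echo "JDK 8 found ($version)"
  else
    echo "Found Java $version, but JDK 8 is expected" >&2
  fi
else
  echo "java not found on PATH" >&2
fi
```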
How archiveTable and unarchiveTable differ from the original commands
The original archive and unarchive commands in JindoTable rely on the Jindo Namespace Service component of SmartData. The archiveTable and unarchiveTable commands remove that dependency and add the following capabilities:
Run on clusters where SmartData is not deployed, including self-managed clusters
Use filter conditions to archive or unarchive large numbers of partitions concurrently across multiple threads
Run Hadoop MapReduce jobs across the entire cluster when local multithreading is not enough
For information about the original commands, see Use JindoTable.
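The following sketch contrasts the two execution modes for a filtered, multi-partition job, using the -c, -p, -mr, and -w flags described for archiveTable later in this topic. It assumes the commands are invoked through the jindo table prefix used by the help command. The table logs_db.events, its ds partition column, the filter value, and the HDFS working directory are hypothetical. The run wrapper only prints each command unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: local multithreading vs. a cluster-wide MapReduce job.
# Hypothetical values: table name, partition filter, working directory.
set -eu

TABLE="logs_db.events"
CONDITION=" ds > '20200101' "   # same form as the documented -c example

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

# Local mode: up to 8 concurrent threads on the node running the command.
run jindo table -archiveTable -t "$TABLE" -i -c "$CONDITION" -p 8

# Cluster mode: run the work as a Hadoop MapReduce job instead.
# The working directory must not be a local file system path.
run jindo table -archiveTable -t "$TABLE" -i -c "$CONDITION" -mr -w hdfs:///tmp/jindotable-work/
```

Local threads are usually enough for a moderate number of partitions; the MapReduce mode spreads the work across the cluster when a single node becomes the bottleneck.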
Limits
These commands are supported in EMR V3.36.0 and later minor versions, and in EMR V5.2.0 and later minor versions.
Only table data can be archived. Only OSS-backed tables (partitioned or non-partitioned) are supported.
When you use -i (IA) with archiveTable, files already in Archive storage class are skipped.
When you use -o with unarchiveTable, files in Standard storage class, files in IA storage class, and files that were previously temporarily unarchived are all skipped to prevent repeated unarchiving.
Archive table data
Use archiveTable to move OSS table data to Archive or IA storage class.
- Log on to your cluster in SSH mode. For more information, see Log on to a cluster.
- Run the following command to view the available parameters:
jindo table -help archiveTable
- Run archiveTable with the appropriate parameters. Parameters without brackets are required, and parameters in brackets ([...]) are optional. You must specify exactly one of -a and -i.
jindo table -archiveTable -t <dbName.tableName> \
  -a/-i \
  [-c "<condition>" | -fullTable] \
  [-b/-before <before days>] \
  [-p/-parallel <parallelism>] \
  [-mr/-mapReduce] \
  [-e/-explain] \
  [-w/-workingDir <working directory>] \
  [-l/-logDir <log directory>]
The parameters are described as follows:
-t <dbName.tableName> (required): The table to archive, in <database name>.<table name> format. Both partitioned and non-partitioned tables are supported.
-a/-i (required, specify one): The target storage class. Use -a for Archive or -i for IA. When -i is specified, files already in Archive storage class are skipped.
-c "<condition>"/-fullTable (optional): Specifies which data to archive. Use -fullTable to archive the entire table, or -c "<condition>" to archive only the partitions that match a filter condition. Common operators such as > are supported, for example -c " ds > 'd' ". Do not specify both.
-b/-before <before days> (optional): Archives only tables or partitions that were created at least the specified number of days ago.
-p/-parallel <parallelism> (optional): The maximum number of concurrent archiving threads. Default: 1.
-mr/-mapReduce (optional): Runs the job as a cluster-level Hadoop MapReduce job instead of using local multithreading.
-e/-explain (optional): Explain mode. Prints the list of partitions that would be archived without archiving any data.
-w/-workingDir <working directory> (optional): The working directory for the MapReduce job. It must not be a local file system path. Default: hdfs:///tmp/<current user>/jindotable-policy/. Temporary files are created in this directory during the job and are deleted automatically after the job completes.
-l/-logDir <log directory> (optional): The directory for log files. Default: /tmp/<current user>/.
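A common pattern is to preview a job in explain mode and then run it with the same filter. The following POSIX sh sketch shows this with the documented flags. It assumes the command is invoked as jindo table -archiveTable, matching the jindo table prefix of the help command, and that the < operator is accepted in the same way as the documented > operator. The table sales_db.orders, its ds partition column, the cutoff date, and the run wrapper are hypothetical; the script only prints each command unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: preview, then archive, old partitions of a hypothetical table.
# Assumption: "<" is supported in -c conditions like the documented ">".
set -eu

TABLE="sales_db.orders"
CONDITION=" ds < '20210101' "

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

# Step 1: explain mode. List matching partitions created at least
# 90 days ago without archiving any data.
run jindo table -archiveTable -t "$TABLE" -a -c "$CONDITION" -b 90 -e

# Step 2: archive the same partitions to the Archive storage class
# with up to 4 local threads.
run jindo table -archiveTable -t "$TABLE" -a -c "$CONDITION" -b 90 -p 4
```

Keeping the filter in one variable ensures that the explain run and the real run select the same partitions.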
Unarchive table data
Use unarchiveTable to change the storage class of archived OSS table data back to Standard or IA, or to restore it temporarily without changing its storage class. The syntax mirrors archiveTable, with one key difference: the required -a/-i flag is replaced by the optional -i/-o flag.
- Log on to your cluster in SSH mode. For more information, see Log on to a cluster.
- Run the following command to view the available parameters:
jindo table -help unarchiveTable
- Run unarchiveTable with the appropriate parameters. All parameters are the same as for archiveTable except -i/-o, which is optional.
jindo table -unarchiveTable -t <dbName.tableName> \
  [-i/-o] \
  [-c "<condition>" | -fullTable] \
  [-b/-before <before days>] \
  [-p/-parallel <parallelism>] \
  [-mr/-mapReduce] \
  [-e/-explain] \
  [-w/-workingDir <working directory>] \
  [-l/-logDir <log directory>]
The -i/-o option behaves as follows:
Not specified: Changes the storage class of archived data to Standard.
-i: Changes the storage class of archived data to IA. Files already in Standard storage class are skipped.
-o: Temporarily restores archived data without changing its storage class. Files in Standard storage class, files in IA storage class, and files that were previously temporarily unarchived are skipped.
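The following POSIX sh sketch shows the three -i/-o behaviors side by side. It assumes the command is invoked as jindo table -unarchiveTable, matching the jindo table prefix of the help command. The table sales_db.orders, its ds partition column, the filter value, and the run wrapper are hypothetical; the script only prints each command unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: preview, temporarily restore, or permanently unarchive partitions.
# Hypothetical values: table name and partition filter.
set -eu

TABLE="sales_db.orders"
CONDITION=" ds > '20201201' "

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

# Preview which partitions would be unarchived.
run jindo table -unarchiveTable -t "$TABLE" -o -c "$CONDITION" -e

# -o: temporarily restore the archived files; their storage class is unchanged.
run jindo table -unarchiveTable -t "$TABLE" -o -c "$CONDITION" -p 4

# -i: change the storage class of the archived files to IA.
run jindo table -unarchiveTable -t "$TABLE" -i -c "$CONDITION" -p 4

# No -i/-o: change the storage class of the archived files to Standard.
run jindo table -unarchiveTable -t "$TABLE" -c "$CONDITION" -p 4
```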
What's next
To learn about additional JindoTable features such as usage statistics and table optimization, see Use JindoTable.