Archive or Restore OSS Tables via JindoTable SDK (No Namespace) - E-MapReduce

JindoTable's archiveTable and unarchiveTable commands move table data in Object Storage Service (OSS) between storage classes — Archive, Infrequent Access (IA), and Standard. Unlike the original archive and unarchive commands, these commands do not require the Jindo Namespace Service component of SmartData, so you can run them on clusters without the SmartData service deployed.

Prerequisites

Before you begin, ensure that you have:

Java Development Kit (JDK) 8 installed on your computer
An E-MapReduce (EMR) cluster (see Create a cluster)
Table data stored in OSS — only table data (partitioned or non-partitioned) can be archived

How archiveTable and unarchiveTable differ from the original commands

The original archive and unarchive commands in JindoTable rely on the Jindo Namespace Service component of SmartData. The archiveTable and unarchiveTable commands remove that dependency and add the following capabilities:

Run on clusters where SmartData is not deployed, including self-managed clusters
Use filter conditions to archive or unarchive large numbers of partitions concurrently across multiple threads
Run Hadoop MapReduce jobs across the entire cluster when local multithreading is not enough

For information about the original commands, see Use JindoTable.

Limits

Supported in EMR V3.36.0 and later minor versions, and EMR V5.2.0 and later minor versions.
Only table data can be archived. Only OSS-backed tables (partitioned or non-partitioned) are supported.
When using -i (IA) in archiveTable, files already in Archive storage class are skipped.
When using -o in unarchiveTable, files in Standard storage class, IA storage class, and files that were previously temporarily unarchived are all skipped to prevent repeated unarchiving.

Archive table data

Use archiveTable to move OSS table data to Archive or IA storage class.

Log on to your cluster in SSH mode. For more information, see Log on to a cluster.
Run the following command to view the available parameters:
```
jindo table -help archiveTable
```

Run archiveTable with the appropriate parameters. Parameters without brackets are required, and parameters in brackets ([...]) are optional. You must specify one of -a or -i, but not both.

| Parameter | Description | Required |
| --- | --- | --- |
| `-t <dbName.tableName>` | The table to archive, in `Database name.Table name` format. Supports both partitioned and non-partitioned tables. | Yes |
| `-a` / `-i` | The target storage class: `-a` for Archive, `-i` for IA. When `-i` is specified, files already in Archive storage class are skipped. | Yes (one of the two) |
| `-c "<condition>"` / `-fullTable` | Specifies which data to archive. Use `-fullTable` to archive the entire table. Use `-c "<condition>"` to archive partitions matching a filter condition. Common operators such as `>` are supported. Example: `-c " ds > 'd' "`. Specify one but not both. | No |
| `-b` / `-before <before days>` | Archives only tables or partitions created at least the specified number of days ago. | No |
| `-p` / `-parallel <parallelism>` | Maximum number of concurrent archiving threads. Default: `1`. | No |
| `-mr` / `-mapReduce` | Uses a cluster-level Hadoop MapReduce job instead of local multithreading. | No |
| `-e` / `-explain` | Explain mode: prints the list of partitions that would be archived without actually archiving any data. | No |
| `-w` / `-workingDir <working directory>` | Working directory for the MapReduce job. Must not be a local file system path. Default: `hdfs:///tmp/<current user>/jindotable-policy/`. Temporary files are created during the job and deleted automatically after it completes. | No |
| `-l` / `-logDir <log directory>` | Directory for log files. Default: `/tmp/<current user>/`. | No |

```
-archiveTable -t <dbName.tableName> \
-a/-i \
[-c "<condition>" | -fullTable] \
[-b/-before <before days>] \
[-p/-parallel <parallelism>] \
[-mr/-mapReduce] \
[-e/-explain] \
[-w/-workingDir <working directory>] \
[-l/-logDir <log directory>]
```
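
As a sketch of how these parameters combine, the script below walks through a preview in explain mode, a filtered archive with local threads, and a full-table move to IA with a MapReduce job. The table name `sales_db.orders`, the partition column `ds`, the date, and the working directory are hypothetical. The script also assumes that archiveTable is invoked as `jindo table -archiveTable ...`, by analogy with the help command shown earlier; check the help output on your cluster before relying on it. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only. Table name, partition column, date, and paths are hypothetical.
# Assumption: archiveTable is invoked through `jindo table`, like the help command.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; set to 0 to execute them

# Print the command in dry-run mode, otherwise execute it as given.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

TABLE="sales_db.orders"           # hypothetical dbName.tableName
CONDITION=" ds > '2023-01-01' "   # hypothetical filter on a partition column ds
WORK_DIR="hdfs:///tmp/jindotable-work/"   # hypothetical, must not be a local path

# 1. Explain mode (-e): list the partitions that match, without archiving.
run jindo table -archiveTable -t "${TABLE}" -a -c "${CONDITION}" -e

# 2. Archive the same partitions to Archive storage (-a) with 4 local threads.
run jindo table -archiveTable -t "${TABLE}" -a -c "${CONDITION}" -p 4

# 3. Move the whole table to IA (-i) with a MapReduce job (-mr), limited to
#    data created at least 30 days ago (-b 30).
run jindo table -archiveTable -t "${TABLE}" -i -fullTable -b 30 -mr -w "${WORK_DIR}"
```

Running the explain step first makes it possible to review the partition list before any storage class changes. Set `DRY_RUN=0` only after the printed commands look correct.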

Unarchive table data

Use unarchiveTable to restore OSS table data to Standard or IA storage class. The syntax mirrors archiveTable, with one key difference: the required -a/-i flag is replaced by the optional -i/-o flag.

Log on to your cluster in SSH mode. For more information, see Log on to a cluster.
Run the following command to view the available parameters:
```
jindo table -help unarchiveTable
```

Run unarchiveTable with the appropriate parameters. All parameters are the same as in archiveTable except for -i/-o, which selects the restore behavior:

| `-i/-o` option | Behavior |
| --- | --- |
| Not specified | Changes the storage class of archived data to Standard. |
| `-i` | Changes the storage class to IA. Files already in Standard storage class are skipped. |
| `-o` | Temporarily restores archived data without changing its storage class. Skips Standard files, IA files, and files that were previously temporarily unarchived. |

```
-unarchiveTable -t <dbName.tableName> \
[-i/-o] \
[-c "<condition>" | -fullTable] \
[-b/-before <before days>] \
[-p/-parallel <parallelism>] \
[-mr/-mapReduce] \
[-e/-explain] \
[-w/-workingDir <working directory>] \
[-l/-logDir <log directory>]
```
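
The three restore behaviors can be illustrated with the following sketch, which previews the affected partitions and then shows one invocation per `-i/-o` choice. The table name, partition column, and date are hypothetical, and the `jindo table -unarchiveTable ...` invocation form is an assumption based on the help command shown earlier. The script prints the commands by default instead of running them.

```shell
#!/bin/sh
# Sketch only. The table name and filter are hypothetical.
# Assumption: unarchiveTable is invoked through `jindo table`, like the help command.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; set to 0 to execute them

# Print the command in dry-run mode, otherwise execute it as given.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

TABLE="sales_db.orders"           # hypothetical dbName.tableName
CONDITION=" ds > '2023-01-01' "   # hypothetical filter on a partition column ds

# Preview which partitions would be affected (-e), without changing anything.
run jindo table -unarchiveTable -t "${TABLE}" -c "${CONDITION}" -e

# No -i/-o: change archived data in the matching partitions to Standard.
run jindo table -unarchiveTable -t "${TABLE}" -c "${CONDITION}" -p 4

# -i: change the storage class to IA instead.
run jindo table -unarchiveTable -t "${TABLE}" -i -c "${CONDITION}" -p 4

# -o: temporarily restore the data and leave its storage class unchanged.
run jindo table -unarchiveTable -t "${TABLE}" -o -c "${CONDITION}" -p 4
```

In practice you would pick only one of the last three commands, depending on whether the data should end up in Standard, in IA, or only be readable temporarily.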

What's next

To learn about additional JindoTable features such as usage statistics and table optimization, see Use JindoTable.