JindoTable allows you to run the archiveTable and unarchiveTable commands in SDK mode to archive and unarchive data in Object Storage Service (OSS). The commands do not rely on the Jindo Namespace Service component of SmartData. This topic describes how to use the archiveTable and unarchiveTable commands.

Prerequisites

  • An E-MapReduce (EMR) cluster is created. For more information, see Create a cluster.
  • The partitioned table or non-partitioned table that you want to archive is stored in OSS. Only table data can be archived.

Background information

You can use the original archive and unarchive commands of JindoTable to archive or unarchive tables or partitions in OSS. However, these commands rely on the Jindo Namespace Service component of SmartData. The new commands archiveTable and unarchiveTable do not rely on the Jindo Namespace Service component.

The archiveTable and unarchiveTable commands have the following advantages over the archive and unarchive commands:
  • You can run the archiveTable and unarchiveTable commands even if the SmartData service is not deployed in your cluster. For example, you can run the commands on a self-managed cluster.
  • You can configure filter parameters in the archiveTable or unarchiveTable command to archive or unarchive a large number of partitions on multiple threads at the same time. If local multithreading cannot meet your business requirements, you can run MapReduce tasks on the entire cluster to archive or unarchive data.
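
The two execution modes in the second point can be sketched as follows. This is a minimal illustration, not an official example: the table name demo_db.demo_tbl, the parallelism value, and the working directory are hypothetical placeholders, and the run helper is not part of JindoTable. The helper only prints the command when jindo is not on the PATH. On a cluster where jindo is installed, the commands archive data. The options themselves (-a, -c, -p, -mr, -w) are the ones listed in the archiveTable syntax in this topic.

```shell
# Helper for this sketch only (not part of JindoTable): runs the command when
# jindo is installed, otherwise prints the arguments separated by spaces.
run() {
  if command -v jindo >/dev/null 2>&1; then
    "$@"
  else
    echo "dry run: $*"
  fi
}

TABLE="demo_db.demo_tbl"        # hypothetical table name
WORK_DIR="/tmp/jindo-archive"   # hypothetical MapReduce working directory

# Local multithreading: archive the partitions that match the filter to the
# Archive storage class (-a) with a parallelism of 8 (-p).
run jindo table -archiveTable -t "$TABLE" -a -c "ds > 'd'" -p 8

# MapReduce: same filter, but the work is distributed across the cluster
# (-mr) and temporary files go to the working directory (-w).
run jindo table -archiveTable -t "$TABLE" -a -c "ds > 'd'" -mr -w "$WORK_DIR"
```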

For more information about the original archive and unarchive commands, see Use JindoTable.

Limits

The archiveTable and unarchiveTable commands are supported only in EMR V3.36.0 and later minor versions, and EMR V5.2.0 and later minor versions.

archiveTable

You can use the archiveTable command to archive tables or partitions in OSS.

  1. Log on to your cluster in SSH mode. For more information, see Log on to a cluster.
  2. Run the following command to obtain help information:
    jindo table -help archiveTable
    archiveTable syntax:
    -archiveTable -t <dbName.tableName> \
    -i/-a/-ca \
    [-c "<condition>" | -fullTable] \
    [-b/-before <before days>] \
    [-p/-parallel <parallelism>] \
    [-mr/-mapReduce] \
    [-e/-explain] \
    [-w/-workingDir <working directory>] \
    [-l/-logDir <log directory>]
    Parameter Description Required
    -t <dbName.tableName> The table that you want to archive, specified as the database name and the table name separated by a period (.). The table can be a partitioned table or a non-partitioned table.

    Yes
    -i/-a/-ca The storage class in which you want to archive data. You can use one of the following options to specify a storage class:
    • -i: Infrequent Access (IA)
    • -a: Archive
    • -ca: Cold Archive

    If you use the -i option in the command, the files whose storage class is Archive or Cold Archive are skipped. If you use the -a option in the command, the files whose storage class is Cold Archive are skipped.

    Yes
    -c "<condition>" | -fullTable You must specify either -fullTable or -c "<condition>".
    • If you specify -fullTable, the entire partitioned or non-partitioned table is archived.
    • If you specify -c "<condition>", only the partitions that meet the filter condition are archived. Common operators, such as greater-than signs (>), are supported.

      For example, if the partition key column is ds, its data type is String, and you want to archive the partitions whose ds values are greater than 'd', use -c " ds > 'd' ".

    Yes
    -b/-before <before days> Only the tables or partitions that were created at least the specified number of days ago are archived. No
    -p/-parallel <parallelism> The parallelism of archiving operations. No
    -mr/-mapReduce Uses Hadoop MapReduce instead of local multithreading to archive data. No
    -e/-explain The explain mode is used. In explain mode, the list of partitions to be archived is displayed, but no data is archived. No
    -w/-workingDir <working directory> The working directory of the MapReduce job. This option takes effect only when you use a MapReduce job to archive data. You must have read and write permissions on the directory. The directory does not need to be empty. Temporary files are created in the directory while the MapReduce job runs and are automatically deleted after the job is completed. No
    -l/-logDir <log directory> The directory in which log files are stored. No
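
A common way to combine these parameters is to preview the affected partitions in explain mode and then run the same command without -e. The following is a minimal sketch under stated assumptions: the table name, the 30-day threshold, the parallelism, and the log directory are hypothetical values, and the run helper is not part of JindoTable. The helper prints the command when jindo is not installed.

```shell
# Helper for this sketch only (not part of JindoTable): runs the command when
# jindo is installed, otherwise prints the arguments separated by spaces.
run() {
  if command -v jindo >/dev/null 2>&1; then
    "$@"
  else
    echo "dry run: $*"
  fi
}

TABLE="demo_db.demo_tbl"            # hypothetical table name
LOG_DIR="/tmp/jindo-archive-logs"   # hypothetical log directory

# Step 1: explain mode (-e). Lists the partitions with ds > 'd' that were
# created at least 30 days ago (-b 30); no data is archived.
run jindo table -archiveTable -t "$TABLE" -i -c "ds > 'd'" -b 30 -e

# Step 2: same filter without -e. Moves the matching partitions to the
# Infrequent Access storage class (-i) with a parallelism of 4 and writes
# logs to the log directory (-l).
run jindo table -archiveTable -t "$TABLE" -i -c "ds > 'd'" -b 30 -p 4 -l "$LOG_DIR"
```

Running the explain step first makes it easier to catch a filter condition that matches more partitions than intended.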

unarchiveTable

The syntax of the unarchiveTable command is similar to the syntax of the archiveTable command. You can use the unarchiveTable command to unarchive tables or partitions in OSS.

  1. Log on to your cluster in SSH mode. For more information, see Log on to a cluster.
  2. Run the following command to obtain help information:
    jindo table -help unarchiveTable
    unarchiveTable syntax:
    -unarchiveTable -t <dbName.tableName> \
    -i/-a/-o/-cr \
    [-notWait] \
    [-c "<condition>" | -fullTable] \
    [-b/-before <before days>] \
    [-p/-parallel <parallelism>] \
    [-mr/-mapReduce] \
    [-e/-explain] \
    [-w/-workingDir <working directory>] \
    [-l/-logDir <log directory>]
The parameters of the unarchiveTable command differ from the parameters of the archiveTable command in the following aspects:
  • The optional parameter -i/-a/-o/-cr is used instead of the required parameter -i/-a/-ca.
  • The optional parameter -notWait is used.
Parameter Description Required
-i/-a/-o/-cr The storage class to which data in the Cold Archive storage class is converted, or the action to perform on the data.
  • If you do not specify -i/-a/-o/-cr, the storage class of the data is changed from Cold Archive to Standard.
    Note: The storage class of the data can be changed from Cold Archive to Standard, IA, or Archive only after the data is completely unarchived.
  • If you specify -i/-a/-o/-cr, the behavior depends on the option:
    • -i: The storage class of the data is changed from Cold Archive to IA. Files whose storage class is Standard are skipped.
    • -a: The storage class of the data is changed from Cold Archive to Archive. Files whose storage class is Standard or IA are skipped.
    • -o: The data is only temporarily unarchived and its storage class is retained. Files whose storage class is Standard or IA are skipped. Files that were previously unarchived are also skipped, so they are not unarchived again.
    • -cr: Checks whether the files in the Archive or Cold Archive storage class have been unarchived.
No
-notWait Applies when you unarchive data that is stored in the Cold Archive storage class. If you specify this parameter, the system only sends the unarchive request and does not wait for the unarchiving operation to complete. No
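
The options above can be combined as in the following sketch, which requests a temporary unarchive without waiting and checks the result later. It assumes that the subcommand is invoked as -unarchiveTable, matching the name passed to -help. The table name is a hypothetical placeholder, and the run helper is not part of JindoTable. The helper prints the command when jindo is not installed.

```shell
# Helper for this sketch only (not part of JindoTable): runs the command when
# jindo is installed, otherwise prints the arguments separated by spaces.
run() {
  if command -v jindo >/dev/null 2>&1; then
    "$@"
  else
    echo "dry run: $*"
  fi
}

TABLE="demo_db.demo_tbl"   # hypothetical table name

# Temporarily unarchive the matching partitions and keep their storage class
# (-o). With -notWait, only the unarchive request is sent; the command does
# not wait for the unarchiving operation to complete.
run jindo table -unarchiveTable -t "$TABLE" -o -notWait -c "ds > 'd'"

# Later: check whether the files in the Archive or Cold Archive storage class
# have been unarchived (-cr).
run jindo table -unarchiveTable -t "$TABLE" -cr -c "ds > 'd'"
```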