JindoTable is used to collect access statistics on tables and partitions and, based on how frequently they are accessed, to implement tiered storage and optimize table files. This topic describes how to use JindoTable.

Prerequisites

  • Java Development Kit (JDK) 8 is installed on your on-premises machine.
  • An E-MapReduce (EMR) cluster of V3.30.0 or later is created. For more information about how to create a cluster, see Create a cluster.

Use JindoTable

The following sections describe common commands.
Notice: Specify tables in the format of database.table. Specify partitions in the format of partitionCol1=1,partitionCol2=2,....
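When partition specs are assembled in scripts, a small helper can keep them in this format. The following is a minimal bash sketch; partition_spec is a hypothetical helper, not part of JindoTable.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of JindoTable): joins column=value pairs
# into the partitionCol1=1,partitionCol2=2 format expected by -p.
partition_spec() {
  if [[ $# -eq 0 ]]; then
    echo "partition_spec: at least one column=value pair is required" >&2
    return 1
  fi
  local pair
  for pair in "$@"; do
    # Reject arguments that are not of the form column=value.
    if [[ "$pair" != *=* ]]; then
      echo "partition_spec: '$pair' is not in column=value form" >&2
      return 1
    fi
  done
  # With IFS set to a comma, "$*" joins the arguments with commas.
  local IFS=','
  printf '%s\n' "$*"
}

# Prints: date=2020-03-16,category=1
partition_spec date=2020-03-16 category=1
```

The result can then be passed to -p, for example: jindo table -status -t db1.t1 -p "$(partition_spec date=2020-03-16 category=1)".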

-accessStat

  • Syntax

    jindo table -accessStat {-d} <days> {-n} <topNums>

  • Description

    This command is used to query the tables or partitions that are accessed most frequently within a specified time range.

    <days> specifies the time range in days, and <topNums> specifies the number of tables or partitions to return. Both must be positive integers. If <days> is 1, the access records generated from 00:00 (local time) on the current day to the current time are queried.

  • Example: Query the 20 tables or partitions that were most frequently accessed within the last seven days.
    jindo table -accessStat -d 7 -n 20
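Because -accessStat accepts only positive integers for <days> and <topNums>, a wrapper can reject bad values before the command runs. This is a bash sketch; access_stat and is_positive_int are hypothetical helpers, not part of JindoTable.

```shell
#!/usr/bin/env bash
# Sketch of a wrapper that validates -d and -n before calling
# jindo table -accessStat. access_stat and is_positive_int are
# hypothetical helpers, not part of JindoTable.

# Succeeds only for positive integers without leading zeros, such as 1, 7, or 20.
is_positive_int() {
  [[ "$1" =~ ^[1-9][0-9]*$ ]]
}

# Usage: access_stat <days> <topNums>
access_stat() {
  local days="$1" top_nums="$2"
  if ! is_positive_int "$days" || ! is_positive_int "$top_nums"; then
    echo "access_stat: <days> and <topNums> must be positive integers" >&2
    return 1
  fi
  jindo table -accessStat -d "$days" -n "$top_nums"
}
```

For example, access_stat 7 20 runs the same query as the example above, while access_stat 0 20 fails before jindo is called.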

-cache

  • Syntax

    jindo table -cache {-t} <dbName.tableName> [-p] <partitionSpec> [-pin]

  • Description

    This command is used to cache data of specified tables or partitions to local disks.

    The data of the tables or partitions must be stored in Object Storage Service (OSS) or JindoFileSystem (JindoFS). Specify tables in the format of database.table. Specify partitions in the format of partitionCol1=1,partitionCol2=2,.... If you specify -pin, the cached data is retained as far as possible when the cache space is insufficient.

  • Example: Cache the data in the date=2020-03-16 partition of the db1.t1 table to local disks.
    jindo table -cache -t db1.t1 -p date=2020-03-16
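To cache several consecutive days, you can call -cache once per partition. The following bash sketch assumes that the table is partitioned by a single date column in the YYYY-MM-DD format and that GNU date is available; cache_date_range is a hypothetical helper, not part of JindoTable.

```shell
#!/usr/bin/env bash
# Sketch: cache one partition per day for an inclusive date range.
# Assumptions: the table is partitioned by a single "date" column in
# YYYY-MM-DD format, and GNU date is installed (for "date -I -d").
# cache_date_range is a hypothetical helper, not part of JindoTable.

# Usage: cache_date_range <db.table> <firstDay> <lastDay>
cache_date_range() {
  local table="$1" day="$2" last="$3"
  # YYYY-MM-DD strings sort in calendar order, so a string
  # comparison is enough to detect the end of the range.
  while [[ ! "$day" > "$last" ]]; do
    jindo table -cache -t "$table" -p "date=$day" || return 1
    day=$(date -I -d "$day + 1 day") || return 1
  done
}
```

For example, cache_date_range db1.t1 2020-03-16 2020-03-18 issues three -cache commands, one for each day. Append -pin inside the loop if the cached data should be kept as far as possible when cache space runs low.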

-uncache

  • Syntax

    jindo table -uncache {-t} <dbName.tableName> [-p] <partitionSpec>

  • Description

    This command is used to delete cached data of specified tables or partitions from local disks.

    The data of the tables or partitions must be stored in OSS or JindoFS. Specify tables in the format of database.table. Specify partitions in the format of partitionCol1=1,partitionCol2=2,....

  • Examples:
    • Delete the cached data of the db1.t2 table from local disks.
      jindo table -uncache -t db1.t2
    • Delete the cached data of the date=2020-03-16,category=1 partition of the db1.t1 table from local disks.
      jindo table -uncache -t db1.t1 -p date=2020-03-16,category=1
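-cache and -uncache can be paired around a job that reads the same partition repeatedly. The following bash sketch caches a partition, runs a command, and deletes the cached data even if the command fails. with_cached_partition is a hypothetical helper, not part of JindoTable.

```shell
#!/usr/bin/env bash
# Sketch: cache a partition, run a job, and then delete the cached data
# whether or not the job succeeded. with_cached_partition is a
# hypothetical helper, not part of JindoTable.

# Usage: with_cached_partition <db.table> <partitionSpec> <command> [args...]
with_cached_partition() {
  local table="$1" spec="$2"
  shift 2
  jindo table -cache -t "$table" -p "$spec" || return 1
  local rc=0
  "$@" || rc=$?
  # Release the local cache even if the job failed.
  if ! jindo table -uncache -t "$table" -p "$spec"; then
    echo "with_cached_partition: -uncache failed for $table $spec" >&2
  fi
  # Report the job's own exit status to the caller.
  return "$rc"
}
```

For example, with_cached_partition db1.t1 date=2020-03-16 ./run_report.sh, where run_report.sh stands for your own job.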

-archive

  • Syntax

    jindo table -archive [-a|-i] {-t} <dbName.tableName> [-p] <partitionSpec>

  • Description

    This command is used to move the data of specified tables or partitions to a colder (lower-cost) storage class. By default, the Archive storage class is used.

    To use the Infrequent Access (IA) storage class instead, add -i to the command. Specify tables in the format of database.table. Specify partitions in the format of partitionCol1=1,partitionCol2=2,....

  • Example: Archive the data in the date=2020-10-12 partition of the db1.t1 table.
    jindo table -archive -t db1.t1 -p date=2020-10-12
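For tiered storage, -archive can be driven by a simple age rule. The following bash sketch archives every date partition that is older than a cutoff date. It assumes that the table is partitioned by a single date column in the YYYY-MM-DD format and that you supply the partition dates yourself; archive_older_than and the USE_IA variable are hypothetical, not part of JindoTable.

```shell
#!/usr/bin/env bash
# Sketch of an age-based tiering rule: archive every date=YYYY-MM-DD
# partition that is older than a cutoff date. The partition dates are
# passed as arguments; this sketch does not parse -showTable output.
# archive_older_than and USE_IA are hypothetical names.

# Usage: archive_older_than <db.table> <cutoff> <day>...
archive_older_than() {
  local table="$1" cutoff="$2"
  shift 2
  # Without a class flag, -archive uses the Archive storage class;
  # USE_IA=1 adds -i to request Infrequent Access instead.
  local class_flag=()
  if [[ "${USE_IA:-0}" == 1 ]]; then
    class_flag=(-i)
  fi
  local day
  for day in "$@"; do
    # YYYY-MM-DD strings compare in calendar order.
    if [[ "$day" < "$cutoff" ]]; then
      jindo table -archive "${class_flag[@]}" -t "$table" -p "date=$day" || return 1
    fi
  done
}
```

For example, archive_older_than db1.t1 2020-10-01 2020-09-30 2020-10-12 archives only the date=2020-09-30 partition.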

-unarchive

  • Syntax

    jindo table -unarchive [-o|-i] {-t} <dbName.tableName> [-p] <partitionSpec>

  • Description

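    This command is used to change the storage class of specified tables or partitions from Archive to Standard.

    If you add -o to the command, archived objects are only temporarily restored. If you add -i to the command, archived objects are changed to IA objects.

  • Examples:
    • Temporarily restore the archived data in the date=2020-03-16,category=1 partition of the db1.t1 table.
      jindo table -unarchive -o -t db1.t1 -p date=2020-03-16,category=1
    • Change the storage class of the db1.t2 table from Archive to IA.
      jindo table -unarchive -i -t db1.t2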

-status

  • Syntax

    jindo table -status {-t} <dbName.tableName> [-p] <partitionSpec>

  • Description

    This command is used to view the data storage status of specified tables or partitions.

  • Examples:
    • View the data storage status of the db1.t2 table.
      jindo table -status -t db1.t2
    • View the data storage status of the db1.t1 table on March 16, 2020.
      jindo table -status -t db1.t1 -p date=2020-03-16

-optimize

  • Syntax

    jindo table -optimize {-t} <dbName.tableName>

  • Description

    This command is used to optimize the data organization of tables at the storage layer.

  • Example: Optimize the data organization of the db1.t1 table at the storage layer.
    jindo table -optimize -t db1.t1

-showTable

  • Syntax

    jindo table -showTable {-t} <dbName.tableName>

  • Description

    This command is used to display all partitions in a partitioned table or display the data storage of a non-partitioned table.

  • Example: Display all partitions in the db1.t1 partitioned table.
    jindo table -showTable -t db1.t1

-showPartition

  • Syntax

    jindo table -showPartition {-t} <dbName.tableName> [-p] <partitionSpec>

  • Description

    This command is used to display the data storage of partitions.

  • Example: Display the data storage of the date=2020-10-12 partition of the db1.t1 table.
    jindo table -showPartition -t db1.t1 -p date=2020-10-12

-listTables

  • Syntax

    jindo table -listTables [-db] <dbName>

  • Description

    This command is used to display all tables in a specified database. If you do not specify -db, the tables in the default database are displayed.

  • Examples:
    • Display tables in the default database.
      jindo table -listTables
    • Display tables in the db1 database.
      jindo table -listTables -db db1

-dumpmc

  • Syntax

    jindo table -dumpmc {-i} <accessId> {-k} <accessKey> {-m} <numMaps> {-t} <tunnelUrl> {-project} <projectName> {-table} <tablename> [-p] <partitionSpec> {-f} <csv|tfrecord> {-o} <outputPath>

    Parameters:
    • -i: the AccessKey ID of your Alibaba Cloud account. Required.
    • -k: the AccessKey secret of your Alibaba Cloud account. Required.
    • -m: the number of map tasks. Required.
    • -t: the VPC Tunnel endpoint of MaxCompute. Required.
    • -project: the name of the MaxCompute project. Required.
    • -table: the name of the MaxCompute table. Required.
    • -p: the partition information, such as pt=xxx. To specify values for multiple partition columns, separate them with commas (,), such as pt=xxx,dt=xxx. Optional.
    • -f: the file format. Valid values: tfrecord and csv. Required.
    • -o: the destination path. Required.

  • Description

    This command is used to dump MaxCompute tables to an EMR cluster or OSS in the CSV or TFRecord format.

  • Examples:
    • Dump a MaxCompute table in the TFRECORD format to an EMR cluster.
      jindo table -dumpmc -m 10 -project mctest_project -table t1 -t http://dt.xxx.maxcompute.aliyun-inc.com -k xxxxxxxxx -i XXXXXX -o /tmp/outputtf1 -f tfrecord
    • Dump a MaxCompute table in the CSV format to OSS.
      jindo table -dumpmc -m 10 -project mctest_project -table t1 -t http://dt.xxx.maxcompute.aliyun-inc.com -k xxxxxxxxx -i XXXXXX -o oss://bucket1/tmp/outputcsv -f csv
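The examples above pass the AccessKey pair literally on the command line. The following bash sketch reads it from environment variables instead, so that the secret is not written into the script itself. The variable names ALIBABA_CLOUD_ACCESS_KEY_ID, ALIBABA_CLOUD_ACCESS_KEY_SECRET, MC_TUNNEL_ENDPOINT, and NUM_MAPS, as well as the dump_mc function, are assumptions of this example, not part of JindoTable.

```shell
#!/usr/bin/env bash
# Sketch: call jindo table -dumpmc with the AccessKey pair taken from
# environment variables instead of being written into the script.
# The variable names and dump_mc are assumptions of this example.

# Usage: dump_mc <project> <table> <csv|tfrecord> <outputPath> [partitionSpec]
dump_mc() {
  local project="$1" table="$2" format="$3" output="$4" partition="${5:-}"
  local var
  for var in ALIBABA_CLOUD_ACCESS_KEY_ID ALIBABA_CLOUD_ACCESS_KEY_SECRET MC_TUNNEL_ENDPOINT; do
    # ${!var} expands the variable whose name is stored in var.
    if [[ -z "${!var:-}" ]]; then
      echo "dump_mc: environment variable $var is not set" >&2
      return 1
    fi
  done
  case "$format" in
    csv|tfrecord) ;;
    *) echo "dump_mc: format must be csv or tfrecord" >&2; return 1 ;;
  esac
  local args=(
    -i "$ALIBABA_CLOUD_ACCESS_KEY_ID"
    -k "$ALIBABA_CLOUD_ACCESS_KEY_SECRET"
    -m "${NUM_MAPS:-10}"
    -t "$MC_TUNNEL_ENDPOINT"
    -project "$project"
    -table "$table"
    -f "$format"
    -o "$output"
  )
  # -p is optional; pass it only when a partition spec is given.
  if [[ -n "$partition" ]]; then
    args+=(-p "$partition")
  fi
  jindo table -dumpmc "${args[@]}"
}
```

Note that the AccessKey secret is still passed to jindo as a command-line argument, so other users on the same host may be able to see it in the process list.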