JindoTable collects access-frequency statistics on tables and partitions, provides tiered storage, and optimizes tables. This topic describes how to use JindoTable.

Prerequisites

  • Java Development Kit (JDK) 8 is installed on your on-premises machine.
  • An E-MapReduce (EMR) cluster of V3.30.0 or later is created. For more information about how to create a cluster, see Create a cluster.

Use JindoTable

Common commands:
Notice: Specify tables in the format of database.table. Specify partitions in the format of partitionCol1=1,partitionCol2=2,...
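
The two argument formats above can be checked before a command is run. The following is a minimal sketch, not part of JindoTable: the helper names is_table_name and is_partition_spec are hypothetical, and the sketch assumes that database, table, and partition column names contain only letters, digits, and underscores.

```shell
# Hypothetical helpers (not part of JindoTable) that check argument formats.
# Assumption: database, table, and partition column names use only letters,
# digits, and underscores.

# Succeeds if $1 has the form database.table.
is_table_name() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9_]+\.[A-Za-z0-9_]+$'
}

# Succeeds if $1 has the form col1=val1,col2=val2,...
is_partition_spec() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9_]+=[^=,]+(,[A-Za-z0-9_]+=[^=,]+)*$'
}

if is_table_name "db1.t1" && is_partition_spec "date=2020-03-16,category=1"; then
  echo "arguments look well-formed"
fi
```

A check of this kind only catches malformed arguments. It does not confirm that the table or partition exists.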

-accessStat

  • Syntax

    jindo table -accessStat {-d} <days> {-n} <topNums>

  • Description

    This command is used to query the tables or partitions that are visited most frequently within a specified period of time, and the number of times each of them is visited.

    <days> and <topNums> must be positive integers. For example, if days is set to 1 and topNums is not specified, the frequent-access statistics on all the tables or partitions that are visited on the current day (from 00:00 to the current time) are queried.

  • Example: Run the following command to query the first 20 tables or partitions that have been visited most frequently within the last seven days, and the number of times each of them has been visited:
    jindo table -accessStat -d 7 -n 20
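
The positive-integer requirement on <days> and <topNums> can be enforced in a small wrapper. This is a sketch only: access_stat_cmd is a hypothetical name, the function prints the command instead of running it, and it conservatively rejects values with leading zeros.

```shell
# Hypothetical wrapper (not part of JindoTable): print a -accessStat command
# after checking that both arguments are positive integers.
# $1 = days, $2 = topNums
access_stat_cmd() {
  for arg in "$1" "$2"; do
    case "$arg" in
      # Reject empty strings, anything with a non-digit, and leading zeros.
      ''|*[!0-9]*|0*)
        echo "expected a positive integer, got '$arg'" >&2
        return 1
        ;;
    esac
  done
  echo "jindo table -accessStat -d $1 -n $2"
}

access_stat_cmd 7 20
```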

-cache

  • Syntax

    jindo table -cache {-t} <dbName.tableName> [-p] <partitionSpec> [-pin]

  • Description

    This command is used to cache the data of specified tables or partitions to local disks.

    The data of the tables or partitions must be stored in Object Storage Service (OSS) or JindoFS. Specify tables in the format of database.table. Specify partitions in the format of partitionCol1=1,partitionCol2=2,... If you specify -pin, the related data is not deleted when the available cache space becomes insufficient.

  • Example: Cache the data of the db1.t1 table that is generated on March 16, 2020 to local disks.
    jindo table -cache -t db1.t1 -p date=2020-03-16

-uncache

  • Syntax

    jindo table -uncache {-t} <dbName.tableName> [-p] <partitionSpec>

  • Description

    This command is used to delete the cached data of specified tables or partitions from local disks.

    The data of the tables or partitions must be stored in OSS or JindoFS. Specify tables in the format of database.table. Specify partitions in the format of partitionCol1=1,partitionCol2=2,...

  • Examples:
    • Delete the cached data of the db1.t2 table from local disks.
      jindo table -uncache -t db1.t2
    • Delete the cached data of the date=2020-03-16,category=1 partition of the db1.t1 table from local disks.
      jindo table -uncache -t db1.t1 -p date=2020-03-16,category=1

-archive

  • Syntax

    jindo table -archive [-a|-i] {-t} <dbName.tableName> [-p] <partitionSpec>

  • Description

    This command is used to lower the level of the data storage policy of specific tables or partitions. By default, the Archive storage class is used.

    To use the Infrequent Access (IA) storage class, add -i to the command. Specify tables in the format of database.table. Specify partitions in the format of partitionCol1=1,partitionCol2=2,...

  • Example: Change the storage class of the date=2020-10-12 partition of the db1.t1 table to Archive.
    jindo table -archive -t db1.t1 -p date=2020-10-12
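
When several partitions need to be moved to a colder storage class, the commands can be generated in a loop. This is a minimal sketch under stated assumptions: archive_cmds is a hypothetical helper, the partition column is assumed to be named date as in the example above, and -i is used so that the IA storage class is selected. The helper prints the commands for review and does not run them.

```shell
# Hypothetical helper (not part of JindoTable): print one -archive command
# per partition value.
# $1 = table in database.table format
# remaining arguments = values of the assumed partition column "date"
archive_cmds() {
  table=$1
  shift
  for d in "$@"; do
    echo "jindo table -archive -i -t $table -p date=$d"
  done
}

archive_cmds db1.t1 2020-10-10 2020-10-11 2020-10-12
```

After the printed commands are reviewed, they can be piped to sh to run them.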

-unarchive

  • Syntax

    jindo table -unarchive [-o|-i] {-t} <dbName.tableName> [-p] <partitionSpec>

  • Description

    This command is used to change the storage class from Archive to Standard.

    If -o is added to the command, an Archived object is temporarily restored. If -i is added to the command, an Archived object is changed to an IA object.

  • Examples:
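    • Temporarily restore the archived data of the date=2020-03-16,category=1 partition of the db1.t1 table.
      jindo table -unarchive -o -t db1.t1 -p date=2020-03-16,category=1
    • Change the storage class of the db1.t2 table from Archive to IA.
      jindo table -unarchive -i -t db1.t2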

-status

  • Syntax

    jindo table -status {-t} <dbName.tableName> [-p] <partitionSpec>

  • Description

    This command is used to view the data storage status of specific tables or partitions.

  • Examples:
    • View the data storage status of the db1.t2 table.
      jindo table -status -t db1.t2
    • View the data storage status of the date=2020-03-16 partition of the db1.t1 table.
      jindo table -status -t db1.t1 -p date=2020-03-16

-optimize

  • Syntax

    jindo table -optimize {-t} <dbName.tableName>

  • Description

    This command is used to optimize the data organization of tables at the storage layer.

  • Example: Optimize the data organization of the db1.t1 table at the storage layer.
    jindo table -optimize -t db1.t1

-showTable

  • Syntax

    jindo table -showTable {-t} <dbName.tableName>

  • Description

    This command is used to display all the partitions in a partitioned table or display the data storage of a non-partitioned table.

  • Example: Display all partitions in the db1.t1 partitioned table.
    jindo table -showTable -t db1.t1

-showPartition

  • Syntax

    jindo table -showPartition {-t} <dbName.tableName> [-p] <partitionSpec>

  • Description

    This command is used to display the data storage of partitions.

  • Example: Display the data storage of all partitions in the db1.t1 partitioned table on October 12, 2020.
    jindo table -showPartition -t db1.t1 -p date=2020-10-12

-listTables

  • Syntax

    jindo table -listTables [-db] <dbName>

  • Description

    This command is used to display all the tables in a specified database. If you do not specify -db, the tables in the default database are displayed.

  • Examples:
    • Display the tables in the default database.
      jindo table -listTables
    • Display the tables in the db1 database.
      jindo table -listTables -db db1

-dumpmc

  • Syntax
    jindo table -dumpmc {-i} <accessId> {-k} <accessKey> {-m} <numMaps> {-t} <tunnelUrl> {-project} <projectName> {-table} <tablename> [-p] <partitionSpec> {-f} <csv|tfrecord> {-o} <outputPath>

    Parameter  Required  Description
    -i         Yes       The AccessKey ID of your Alibaba Cloud account.
    -k         Yes       The AccessKey secret of your Alibaba Cloud account.
    -m         Yes       The number of map tasks.
    -t         Yes       The Tunnel endpoint of the virtual private cloud (VPC) where the MaxCompute project resides.
    -project   Yes       The name of the MaxCompute project.
    -table     Yes       The name of the MaxCompute table.
    -p         No        The partition information. Example: pt=xxx. Separate multiple partitions with commas (,), such as pt=xxx,dt=xxx.
    -f         Yes       The file format. Valid values: tfrecord and csv.
    -o         Yes       The destination path.

  • Description

    This command is used to dump MaxCompute tables to an EMR cluster or OSS. The formats CSV and TFRECORD are supported.

  • Examples:
    • Dump a MaxCompute table in the TFRECORD format to an EMR cluster.
      jindo table -dumpmc -m 10 -project mctest_project -table t1 -t http://dt.xxx.maxcompute.aliyun-inc.com -k xxxxxxxxx -i XXXXXX -o /tmp/outputtf1 -f tfrecord
    • Dump a MaxCompute table in the CSV format to OSS.
      jindo table -dumpmc -m 10 -project mctest_project -table t1 -t http://dt.xxx.maxcompute.aliyun-inc.com -k xxxxxxxxx -i XXXXXX -o oss://bucket1/tmp/outputcsv -f csv
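
Because -dumpmc takes the AccessKey pair as arguments, a wrapper can read the credentials from the environment so that they do not have to be written into scripts. This is a sketch only: the function name dumpmc and the variable names MC_ACCESS_ID, MC_ACCESS_KEY, and DRY_RUN are assumptions and are not defined by JindoTable.

```shell
# Hypothetical wrapper (not part of JindoTable). Assumed variable names:
#   MC_ACCESS_ID, MC_ACCESS_KEY  credentials, set outside the script
#   DRY_RUN                      if non-empty, print the command instead of running it
# Arguments: numMaps project table tunnelUrl format outputPath
dumpmc() {
  if [ -z "${MC_ACCESS_ID:-}" ] || [ -z "${MC_ACCESS_KEY:-}" ]; then
    echo "MC_ACCESS_ID and MC_ACCESS_KEY must be set" >&2
    return 1
  fi
  # Only the two formats listed in the parameter description are accepted.
  case "$5" in
    csv|tfrecord) ;;
    *) echo "format must be csv or tfrecord, got '$5'" >&2; return 1 ;;
  esac
  ${DRY_RUN:+echo} jindo table -dumpmc -m "$1" -project "$2" -table "$3" \
    -t "$4" -k "$MC_ACCESS_KEY" -i "$MC_ACCESS_ID" -o "$6" -f "$5"
}
```

For example, with both credential variables set, dumpmc 10 mctest_project t1 http://dt.xxx.maxcompute.aliyun-inc.com csv oss://bucket1/tmp/outputcsv runs the second command shown above. With DRY_RUN=1 the full command, including the AccessKey secret, is printed instead, so dry-run output should not be logged.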

-leastUseStat

  • Syntax

    jindo table -leastUseStat -n <num> [-i/-ignoreNever]

  • Description

    This command is used to display the tables or partitions that have not been accessed for the longest time.

    num indicates the number of tables or partitions displayed. Set this parameter to a positive integer. -i/-ignoreNever is an optional parameter. If it is set, the tables or partitions that have never been accessed are filtered out.

  • Example: Query the first 20 tables or partitions that have not been accessed for the longest time.
    jindo table -leastUseStat -n 20