JindoTable collects access-frequency statistics for tables and partitions, manages tiered storage, and optimizes how table data is organized at the storage layer.
Prerequisites
Before you begin, make sure that you have:
- Java Development Kit (JDK) 8 installed on your on-premises machine
- An E-MapReduce (EMR) cluster of V3.30.0 or later
For details on creating an EMR cluster, see Create a cluster.
Commands
JindoTable provides the following commands. Specify tables in the format database.table. Specify partitions as comma-separated column=value pairs: partitionCol1=val1,partitionCol2=val2,...
| Command | Description |
|---|---|
| -accessStat | Query the most frequently accessed tables or partitions in a time window |
| -leastUseStat | Query the tables or partitions that have been idle the longest |
| -cache | Cache table or partition data to local disk |
| -uncache | Remove cached table or partition data from local disk |
| -archive | Lower the storage class of table or partition data to Archive or Infrequent Access |
| -unarchive | Restore archived data to Standard or Infrequent Access storage class |
| -status | View the storage status of a table or partition |
| -optimize | Optimize table data organization at the storage layer |
| -showTable | List all partitions in a partitioned table, or show storage details of a non-partitioned table |
| -showPartition | Show storage details of a specific partition |
| -listTables | List all tables in a database |
| -dumpmc | Dump a MaxCompute table to an EMR cluster or Object Storage Service (OSS) |
For SDK-mode archive operations and data migration, see -archiveTable, -unarchiveTable, and -moveTo.
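The partition format described in this section can be assembled from individual column=value pairs with a small shell helper. The following is a minimal sketch; `partition_spec` is a hypothetical helper name and is not part of JindoTable.

```shell
# partition_spec (hypothetical helper): join column=value pairs with commas
# to produce a spec such as partitionCol1=val1,partitionCol2=val2.
partition_spec() {
  spec=""
  for pair in "$@"; do
    case "$pair" in
      ?*=?*) ;;   # accept only pairs with a non-empty name and value
      *) echo "invalid pair: $pair" >&2; return 1 ;;
    esac
    spec="${spec:+$spec,}$pair"
  done
  printf '%s\n' "$spec"
}

# Prints: date=2020-03-16,category=1
partition_spec date=2020-03-16 category=1
```

The result can be passed to any command that accepts -p <partitionSpec>.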
-accessStat
Query the tables or partitions accessed most frequently within a specified number of days, along with their access counts.
Syntax
jindo table -accessStat -d <days> [-n <topNums>]
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| -d <days> | Yes | None | Number of past days to include in the query. Must be a positive integer. If set to 1, the query covers from 00:00 to the current time on the current day. |
| -n <topNums> | No | All results | Number of top results to return. Must be a positive integer. |
Example
Query the 20 most-accessed tables or partitions over the last seven days:
jindo table -accessStat -d 7 -n 20
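Because both -d and -n must be positive integers, a wrapper script may want to validate them before calling the command. The sketch below only prints the command it would run; `is_positive_int` and `access_stat_cmd` are hypothetical helper names.

```shell
# is_positive_int (hypothetical helper): succeed only for integers > 0.
is_positive_int() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
  esac
  [ "$1" -gt 0 ]
}

# access_stat_cmd (hypothetical helper): print the -accessStat command
# for a window of <days> and an optional number of top results.
access_stat_cmd() {
  days=$1 top=${2:-}
  is_positive_int "$days" || { echo "-d must be a positive integer" >&2; return 1; }
  if [ -n "$top" ]; then
    is_positive_int "$top" || { echo "-n must be a positive integer" >&2; return 1; }
    echo "jindo table -accessStat -d $days -n $top"
  else
    echo "jindo table -accessStat -d $days"
  fi
}

# Prints: jindo table -accessStat -d 7 -n 20
access_stat_cmd 7 20
```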
-leastUseStat
Query the tables or partitions that have been idle the longest, ranked by time since last access.
Syntax
jindo table -leastUseStat -n <num> [-i | -ignoreNever]
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| -n <num> | Yes | None | Number of results to return. Must be a positive integer. |
| -i / -ignoreNever | No | Include all | When specified, excludes tables or partitions that have never been accessed. |
Example
Query the 20 tables or partitions that have been idle the longest:
jindo table -leastUseStat -n 20
-cache
Cache the data of a table or partition from OSS or JindoFS to local disk, speeding up subsequent reads.
To remove cached data, use -uncache.
Syntax
jindo table -cache -t <dbName.tableName> [-p <partitionSpec>] [-pin]
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| -t <dbName.tableName> | Yes | None | The table to cache. The data must be stored in OSS or JindoFS. |
| -p <partitionSpec> | No | Entire table | The partition to cache. Format: partitionCol1=val1,partitionCol2=val2,... |
| -pin | No | Not pinned | When specified, prevents the cached data from being evicted when cache space runs low. |
Example
Cache the March 16, 2020 partition of db1.t1 to local disk:
jindo table -cache -t db1.t1 -p date=2020-03-16
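Each -cache call takes one partition spec, so caching several partitions means issuing one command per partition. The sketch below prints those commands rather than running them; `cache_cmds` is a hypothetical helper, and the second partition value is illustrative.

```shell
# cache_cmds (hypothetical helper): print one -cache command per partition.
# Pass -pin as the first argument to add the -pin flag to every command.
cache_cmds() {
  pin=""
  if [ "${1:-}" = "-pin" ]; then
    pin=" -pin"
    shift
  fi
  table=$1; shift
  for part in "$@"; do
    echo "jindo table -cache -t $table -p $part$pin"
  done
}

# Prints two commands, one for each partition, both with -pin.
cache_cmds -pin db1.t1 date=2020-03-16 date=2020-03-17
```

To execute the commands instead of printing them, the output could be piped to `sh` after review.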
-uncache
Remove the cached data of a table or partition from local disk.
To cache data, use -cache.
Syntax
jindo table -uncache -t <dbName.tableName> [-p <partitionSpec>]
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| -t <dbName.tableName> | Yes | None | The table whose cached data to remove. The data must be stored in OSS or JindoFS. |
| -p <partitionSpec> | No | Entire table | The partition whose cached data to remove. Format: partitionCol1=val1,partitionCol2=val2,... |
Examples
Remove cached data for the entire db1.t2 table:
jindo table -uncache -t db1.t2
Remove cached data for a specific partition of db1.t1:
jindo table -uncache -t db1.t1 -p date=2020-03-16,category=1
-archive
Lower the storage class of table or partition data. The default target is Archive storage class. To use Infrequent Access (IA) instead, add -i.
To restore archived data, use -unarchive. For SDK-mode archiving that does not rely on the Jindo Namespace Service, see -archiveTable.
Syntax
jindo table -archive [-a | -i] -t <dbName.tableName> [-p <partitionSpec>]
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| -t <dbName.tableName> | Yes | None | The table to archive. |
| -a | No | Archive storage class | When specified, explicitly archives data to Archive storage class (default behavior). |
| -i | No | Archive storage class | When specified, archives data to Infrequent Access (IA) storage class instead of Archive. |
| -p <partitionSpec> | No | Entire table | The partition to archive. Format: partitionCol1=val1,partitionCol2=val2,... |
Example
Archive the October 12, 2020 partition of db1.t1:
jindo table -archive -t db1.t1 -p date=2020-10-12
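A common tiering pattern is to archive every partition older than a cutoff date. The sketch below assumes a partition column named date with YYYY-MM-DD values, as in the example above, and takes the list of dates as arguments. `archive_older_than` is a hypothetical helper that prints the commands instead of running them.

```shell
# archive_older_than (hypothetical helper): print an -archive command for
# each date partition that falls strictly before the cutoff date.
# Usage: archive_older_than <dbName.tableName> <cutoff> <date>...
archive_older_than() {
  table=$1
  cutoff=$(printf '%s' "$2" | tr -d '-')   # 2020-11-01 -> 20201101
  shift 2
  for day in "$@"; do
    if [ "$(printf '%s' "$day" | tr -d '-')" -lt "$cutoff" ]; then
      echo "jindo table -archive -t $table -p date=$day"
    fi
  done
}

# Prints: jindo table -archive -t db1.t1 -p date=2020-10-12
archive_older_than db1.t1 2020-11-01 2020-10-12 2020-11-05
```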
-unarchive
Restore archived table or partition data to a higher storage class.
- No flag: Converts archived data to Standard storage class.
- -o: Temporarily restores an archived object. The object remains in Archive storage class after the restore window expires.
- -i: Converts an archived object to Infrequent Access (IA) storage class.
To archive data, use -archive.
Syntax
jindo table -unarchive [-o | -i] -t <dbName.tableName> [-p <partitionSpec>]
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| -t <dbName.tableName> | Yes | None | The table to unarchive. |
| -o | No | Standard | Temporarily restores an archived object without permanently changing its storage class. |
| -i | No | Standard | Converts archived data to Infrequent Access (IA) storage class. |
| -p <partitionSpec> | No | Entire table | The partition to unarchive. Format: partitionCol1=val1,partitionCol2=val2,... |
Examples
Temporarily restore a specific partition of db1.t1:
jindo table -unarchive -o -t db1.t1 -p date=2020-03-16,category=1
Convert all partitions of db1.t2 from Archive to Infrequent Access:
jindo table -unarchive -i -t db1.t2
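The three restore behaviors map to no flag, -o, or -i. A script that chooses between them could encode the mapping as follows. This is a sketch only: `unarchive_cmd` and the mode names standard, temporary, and ia are hypothetical, and the function prints the command rather than running it.

```shell
# unarchive_cmd (hypothetical helper): print the -unarchive command for a
# restore mode. Usage: unarchive_cmd <mode> <dbName.tableName> [partitionSpec]
unarchive_cmd() {
  mode=$1 table=$2 part=${3:-}
  case "$mode" in
    standard)  flag="" ;;      # no flag: convert to Standard
    temporary) flag=" -o" ;;   # temporary restore; object stays in Archive
    ia)        flag=" -i" ;;   # convert to Infrequent Access
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  cmd="jindo table -unarchive$flag -t $table"
  if [ -n "$part" ]; then
    cmd="$cmd -p $part"
  fi
  echo "$cmd"
}

# Prints the same two commands as the examples above.
unarchive_cmd temporary db1.t1 date=2020-03-16,category=1
unarchive_cmd ia db1.t2
```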
-status
View the data storage status of a table or partition.
Syntax
jindo table -status -t <dbName.tableName> [-p <partitionSpec>]
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| -t <dbName.tableName> | Yes | None | The table to inspect. |
| -p <partitionSpec> | No | Entire table | The partition to inspect. Format: partitionCol1=val1,partitionCol2=val2,... |
Examples
View the storage status of db1.t2:
jindo table -status -t db1.t2
View the storage status of the March 16, 2020 partition of db1.t1:
jindo table -status -t db1.t1 -p date=2020-03-16
-optimize
Optimize the data organization of a table at the storage layer, improving read efficiency for downstream queries.
Syntax
jindo table -optimize -t <dbName.tableName>
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| -t <dbName.tableName> | Yes | None | The table to optimize. |
Example
Optimize the storage layout of db1.t1:
jindo table -optimize -t db1.t1
-showTable
For a partitioned table, list all partitions and their storage details. For a non-partitioned table, show storage details of the table itself.
Syntax
jindo table -showTable -t <dbName.tableName>
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| -t <dbName.tableName> | Yes | None | The table to inspect. |
Example
List all partitions in db1.t1:
jindo table -showTable -t db1.t1
-showPartition
Show the storage details of a specific partition.
Syntax
jindo table -showPartition -t <dbName.tableName> [-p <partitionSpec>]
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| -t <dbName.tableName> | Yes | None | The partitioned table to inspect. |
| -p <partitionSpec> | No | None | The partition to inspect. Format: partitionCol1=val1,partitionCol2=val2,... |
Example
Show the storage details of the October 12, 2020 partition of db1.t1:
jindo table -showPartition -t db1.t1 -p date=2020-10-12
-listTables
List all tables in a database. If no database is specified, tables in the default database are listed.
Syntax
jindo table -listTables [-db <dbName>]
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| -db <dbName> | No | Default database | The database to list tables from. |
Examples
List tables in the default database:
jindo table -listTables
List tables in db1:
jindo table -listTables -db db1
-dumpmc
Dump a MaxCompute table to an EMR cluster or OSS. Supported output formats are CSV and TFRECORD.
Do not hardcode your AccessKey ID and AccessKey secret in commands. Use environment variables or a secure credential store instead.
Syntax
jindo table -dumpmc -i <accessId> -k <accessKey> -m <numMaps> -t <tunnelUrl> -project <projectName> -table <tableName> [-p <partitionSpec>] -f <csv|tfrecord> -o <outputPath>
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| -i <accessId> | Yes | None | The AccessKey ID of your Alibaba Cloud account. |
| -k <accessKey> | Yes | None | The AccessKey secret of your Alibaba Cloud account. |
| -m <numMaps> | Yes | None | The number of map tasks. |
| -t <tunnelUrl> | Yes | None | The Tunnel endpoint of the virtual private cloud (VPC) where the MaxCompute project resides. |
| -project <projectName> | Yes | None | The name of the MaxCompute project. |
| -table <tableName> | Yes | None | The name of the MaxCompute table. |
| -p <partitionSpec> | No | All partitions | Partition filter. Example: pt=xxx. Separate multiple partitions with commas: pt=xxx,dt=xxx. |
| -f <csv\|tfrecord> | Yes | None | Output file format. Valid values: csv, tfrecord. |
| -o <outputPath> | Yes | None | The output path. Use a local path (for example, /tmp/output) for an EMR cluster, or an OSS path (for example, oss://bucket/path) for OSS. |
Examples
Dump a MaxCompute table in TFRECORD format to an EMR cluster:
jindo table -dumpmc -m 10 -project mctest_project -table t1 \
-t http://dt.xxx.maxcompute.aliyun-inc.com \
-k xxxxxxxxx -i XXXXXX \
-o /tmp/outputtf1 -f tfrecord
Dump a MaxCompute table in CSV format to OSS:
jindo table -dumpmc -m 10 -project mctest_project -table t1 \
-t http://dt.xxx.maxcompute.aliyun-inc.com \
-k xxxxxxxxx -i XXXXXX \
-o oss://bucket1/tmp/outputcsv -f csv
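In line with the recommendation not to hardcode credentials, the AccessKey pair can be read from environment variables. The following is a minimal sketch: the variable names MC_ACCESS_KEY_ID and MC_ACCESS_KEY_SECRET and the function `dumpmc_from_env` are hypothetical, and the project, table, and Tunnel endpoint are the placeholder values from the examples above.

```shell
# dumpmc_from_env (hypothetical helper): run -dumpmc with credentials taken
# from the environment instead of being written into the command.
# Usage: dumpmc_from_env <outputPath> <csv|tfrecord>
dumpmc_from_env() {
  # Refuse to run if either (hypothetical) variable is unset or empty.
  if [ -z "${MC_ACCESS_KEY_ID:-}" ] || [ -z "${MC_ACCESS_KEY_SECRET:-}" ]; then
    echo "MC_ACCESS_KEY_ID and MC_ACCESS_KEY_SECRET must be set" >&2
    return 1
  fi
  jindo table -dumpmc -m 10 -project mctest_project -table t1 \
    -t http://dt.xxx.maxcompute.aliyun-inc.com \
    -k "$MC_ACCESS_KEY_SECRET" -i "$MC_ACCESS_KEY_ID" \
    -o "$1" -f "$2"
}
```

With both variables exported in the shell session, `dumpmc_from_env oss://bucket1/tmp/outputcsv csv` would run the second example without the key appearing in the script or shell history.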