JindoTable collects access-frequency (popularity) statistics for tables and partitions, implements tiered storage based on those statistics, and optimizes table files. This topic describes how to use JindoTable.
Prerequisites
Before you begin, ensure that you have:
- JDK 8 installed on your on-premises machine
- An EMR cluster of version 3.30.0 or later. For more information, see Create a cluster.
Usage notes
- Specify tables in the format database.table.
- Specify partitions in the format partitionCol1=1,partitionCol2=2,....
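Because a partition spec is a comma-separated list of key=value pairs, it can be assembled in a script instead of being typed by hand. The following is a minimal sketch: partition_spec is a hypothetical helper name, not part of JindoTable, and its validation only checks that each pair has a non-empty key and a non-empty value.

```shell
# partition_spec is a hypothetical helper, not a JindoTable command.
# It joins key=value arguments into a spec such as
# "date=2020-03-16,category=1" for use with the -p option.
partition_spec() {
  spec=""
  for pair in "$@"; do
    case "$pair" in
      ?*=?*) ;;  # accept only pairs with a non-empty key and value
      *) echo "invalid partition pair: $pair" >&2; return 1 ;;
    esac
    # Prepend a comma only when the spec already holds a pair.
    spec="${spec:+$spec,}$pair"
  done
  printf '%s\n' "$spec"
}

partition_spec date=2020-03-16 category=1
```

The result can be passed to any command that accepts -p, for example jindo table -status -t db1.t1 -p "$(partition_spec date=2020-03-16 category=1)".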
Commands
JindoTable supports the following commands:
- -accessStat: Query the most-accessed tables or partitions in a time range
- -cache: Cache table or partition data to local disks
- -uncache: Remove cached data from local disks
- -archive: Move table or partition data to a lower-cost storage class
- -unarchive: Restore archived data to Standard or Infrequent Access storage
- -status: View the storage status of a table or partition
- -optimize: Optimize data organization at the storage layer
- -showTable: List partitions in a table or view storage of a non-partitioned table
- -showPartition: View storage details for a specific partition
- -listTables: List all tables in a database
- -dumpmc: Dump MaxCompute tables to an EMR cluster or OSS
-accessStat
Use this command to identify which tables or partitions have the highest access frequency in a given time range. This helps you determine which data to cache for performance or which data is cold enough to archive.
Syntax
jindo table -accessStat -d <days> -n <topNums>
Parameters
| Parameter | Description | Required |
|---|---|---|
| -d <days> | Number of days to look back. Must be a positive integer. If set to 1, all access records from 00:00 (local time) on the current day to the current time are returned. | Yes |
| -n <topNums> | Number of top results to return. Must be a positive integer. | Yes |
Example
Return the top 20 most-accessed tables or partitions in the last 7 days:
jindo table -accessStat -d 7 -n 20
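Both parameters must be positive integers, so a script that calls -accessStat on a schedule may want to validate them first. The sketch below assumes a hypothetical wrapper named access_stat and a hypothetical JINDO_CMD variable that replaces the jindo binary (for example with echo) for dry runs. Neither is part of JindoTable.

```shell
# access_stat and JINDO_CMD are hypothetical names used for illustration.
# Usage: access_stat <days> <topNums>
access_stat() {
  days=${1:-}
  top=${2:-}
  for value in "$days" "$top"; do
    case "$value" in
      # Reject empty strings and anything containing a non-digit.
      ''|*[!0-9]*) echo "expected a positive integer, got '$value'" >&2; return 1 ;;
    esac
    # Reject zero.
    [ "$value" -gt 0 ] || { echo "expected a positive integer, got '$value'" >&2; return 1; }
  done
  # Runs jindo unless JINDO_CMD overrides the executable.
  "${JINDO_CMD:-jindo}" table -accessStat -d "$days" -n "$top"
}

# Example: access_stat 7 20
# Dry run: JINDO_CMD=echo access_stat 7 20
```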
-cache
Use this command to cache data of a table or partition to local disks. This speeds up subsequent queries on frequently accessed data stored in Object Storage Service (OSS) or JindoFileSystem (JindoFS).
Syntax
jindo table -cache -t <dbName.tableName> [-p <partitionSpec>] [-pin]
Parameters
| Parameter | Description | Required |
|---|---|---|
| -t <dbName.tableName> | The table to cache. Use the format database.table. | Yes |
| -p <partitionSpec> | The partition to cache. Use the format partitionCol1=1,partitionCol2=2,.... | No |
| -pin | When set, pinned data is not evicted even if cache space is insufficient. | No |
Data must be stored in OSS or JindoFS to be cached.
Example
Cache the March 16, 2020 partition of db1.t1:
jindo table -cache -t db1.t1 -p date=2020-03-16
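To cache and pin several partitions in one pass, the command can be called in a loop. This is a sketch under stated assumptions: cache_partitions is a hypothetical helper name, and JINDO_CMD is a hypothetical variable that substitutes another executable (such as echo) for jindo so that the generated commands can be reviewed before they run.

```shell
# cache_partitions and JINDO_CMD are hypothetical names used for illustration.
# Usage: cache_partitions <dbName.tableName> <partitionSpec>...
cache_partitions() {
  table=$1
  shift
  for spec in "$@"; do
    # -pin keeps the cached data from being evicted when cache space runs low.
    # Stop at the first partition that fails to cache.
    "${JINDO_CMD:-jindo}" table -cache -t "$table" -p "$spec" -pin || return 1
  done
}

# Example: cache_partitions db1.t1 date=2020-03-16 date=2020-03-17
```

Pinned data stays in the cache until it is removed with -uncache, so pin only the partitions that are queried often.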
-uncache
Use this command to remove cached data of a table or partition from local disks, freeing up cache space.
Syntax
jindo table -uncache -t <dbName.tableName> [-p <partitionSpec>]
Parameters
| Parameter | Description | Required |
|---|---|---|
| -t <dbName.tableName> | The table whose cache to remove. Use the format database.table. | Yes |
| -p <partitionSpec> | The partition whose cache to remove. Use the format partitionCol1=1,partitionCol2=2,.... | No |
Data must be stored in OSS or JindoFS.
Examples
Remove all cached data for db1.t2:
jindo table -uncache -t db1.t2
Remove cached data for a specific partition of db1.t1:
jindo table -uncache -t db1.t1 -p date=2020-03-16,category=1
-archive
Use this command to move data of a table or partition to a lower-cost storage class. Use Archive for data that is rarely accessed. Use Infrequent Access (IA) for data that is accessed less frequently but may still need to be retrieved without a restore step.
Syntax
jindo table -archive [-a | -i] -t <dbName.tableName> [-p <partitionSpec>]
Parameters
| Parameter | Description | Required |
|---|---|---|
| -a | Move to Archive storage class. This is the default if neither -a nor -i is specified. | No |
| -i | Move to Infrequent Access (IA) storage class instead of Archive. | No |
| -t <dbName.tableName> | The table to archive. Use the format database.table. | Yes |
| -p <partitionSpec> | The partition to archive. Use the format partitionCol1=1,partitionCol2=2,.... | No |
Example
Move the October 12, 2020 partition of db1.t1 to Archive storage:
jindo table -archive -t db1.t1 -p date=2020-10-12
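The choice between -a and -i can be made explicit in a script so that the target storage class is always stated rather than left to the default. The following sketch assumes a hypothetical helper named archive_table that takes the class as a word (archive or ia). JINDO_CMD is likewise a hypothetical override for dry runs.

```shell
# archive_table and JINDO_CMD are hypothetical names used for illustration.
# Usage: archive_table <archive|ia> <dbName.tableName> [partitionSpec]
archive_table() {
  class=${1:-}
  table=${2:-}
  spec=${3:-}
  case "$class" in
    archive) flag=-a ;;  # Archive storage class
    ia)      flag=-i ;;  # Infrequent Access storage class
    *) echo "storage class must be 'archive' or 'ia'" >&2; return 1 ;;
  esac
  # Rebuild the positional parameters as the jindo argument list.
  set -- table -archive "$flag" -t "$table"
  if [ -n "$spec" ]; then
    set -- "$@" -p "$spec"
  fi
  "${JINDO_CMD:-jindo}" "$@"
}

# Example: archive_table ia db1.t2
# Example: archive_table archive db1.t1 date=2020-10-12
```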
-unarchive
Use this command to restore archived data. You can temporarily restore an Archived object so that it can be read (-o), or change it to the Infrequent Access (IA) storage class (-i).
Syntax
jindo table -unarchive [-o | -i] -t <dbName.tableName> [-p <partitionSpec>]
Parameters
| Parameter | Description | Required |
|---|---|---|
| -o | Temporarily restore an Archived object. | No |
| -i | Change an Archived object to Infrequent Access (IA) storage class. | No |
| -t <dbName.tableName> | The table to restore. Use the format database.table. | Yes |
| -p <partitionSpec> | The partition to restore. Use the format partitionCol1=1,partitionCol2=2,.... | No |
Examples
Temporarily restore an Archived partition of db1.t1:
jindo table -unarchive -o -t db1.t1 -p date=2020-03-16,category=1
Change db1.t2 from Archive to Infrequent Access storage class:
jindo table -unarchive -i -t db1.t2
-status
Use this command to check the current storage status of a table or partition, including which storage class the data is in.
Syntax
jindo table -status -t <dbName.tableName> [-p <partitionSpec>]
Parameters
| Parameter | Description | Required |
|---|---|---|
| -t <dbName.tableName> | The table to check. Use the format database.table. | Yes |
| -p <partitionSpec> | The partition to check. Use the format partitionCol1=1,partitionCol2=2,.... | No |
Examples
View the storage status of db1.t2:
jindo table -status -t db1.t2
View the storage status of the March 16, 2020 partition of db1.t1:
jindo table -status -t db1.t1 -p date=2020-03-16
-optimize
Use this command to optimize the data organization of a table at the storage layer.
Syntax
jindo table -optimize -t <dbName.tableName>
Parameters
| Parameter | Description | Required |
|---|---|---|
| -t <dbName.tableName> | The table to optimize. Use the format database.table. | Yes |
Example
Optimize the storage organization of db1.t1:
jindo table -optimize -t db1.t1
-showTable
Use this command to list all partitions in a partitioned table, or view the data storage details of a non-partitioned table.
Syntax
jindo table -showTable -t <dbName.tableName>
Parameters
| Parameter | Description | Required |
|---|---|---|
| -t <dbName.tableName> | The table to display. Use the format database.table. | Yes |
Example
List all partitions in db1.t1:
jindo table -showTable -t db1.t1
-showPartition
Use this command to view the storage details of a specific partition.
Syntax
jindo table -showPartition -t <dbName.tableName> [-p <partitionSpec>]
Parameters
| Parameter | Description | Required |
|---|---|---|
| -t <dbName.tableName> | The table containing the partition. Use the format database.table. | Yes |
| -p <partitionSpec> | The partition to display. Use the format partitionCol1=1,partitionCol2=2,.... | No |
Example
View the storage details of the October 12, 2020 partition of db1.t1:
jindo table -showPartition -t db1.t1 -p date=2020-10-12
-listTables
Use this command to list all tables in a database. If no database is specified, tables in the default database are returned.
Syntax
jindo table -listTables [-db <dbName>]
Parameters
| Parameter | Description | Required |
|---|---|---|
| -db <dbName> | The database to list tables from. If omitted, the default database is used. | No |
Examples
List all tables in the default database:
jindo table -listTables
List all tables in db1:
jindo table -listTables -db db1
-dumpmc
Use this command to dump a MaxCompute table to an EMR cluster or OSS bucket. Both CSV and TFRecord formats are supported.
Syntax
jindo table -dumpmc -i <accessId> -k <accessKey> -m <numMaps> -t <tunnelUrl> -project <projectName> -table <tableName> [-p <partitionSpec>] -f <csv|tfrecord> -o <outputPath>
Parameters
| Parameter | Description | Required |
|---|---|---|
| -i <accessId> | The AccessKey ID of your Alibaba Cloud account. | Yes |
| -k <accessKey> | The AccessKey secret of your Alibaba Cloud account. | Yes |
| -m <numMaps> | The number of map tasks. | Yes |
| -t <tunnelUrl> | The VPC Tunnel endpoint of MaxCompute. | Yes |
| -project <projectName> | The name of the MaxCompute project. | Yes |
| -table <tableName> | The name of the MaxCompute table. | Yes |
| -p <partitionSpec> | The partition to dump. Example: pt=xxx. Separate multiple partitions with commas, for example, pt=xxx,dt=xxx. | No |
| -f <csv\|tfrecord> | The output file format. Valid values: csv, tfrecord. | Yes |
| -o <outputPath> | The destination path. Use a local path for an EMR cluster or an OSS path (for example, oss://bucket/path) for OSS. | Yes |
Examples
Dump a MaxCompute table in TFRecord format to an EMR cluster:
jindo table -dumpmc -m 10 -project mctest_project -table t1 -t http://dt.xxx.maxcompute.aliyun-inc.com -k xxxxxxxxx -i XXXXXX -o /tmp/outputtf1 -f tfrecord
Dump a MaxCompute table in CSV format to OSS:
jindo table -dumpmc -m 10 -project mctest_project -table t1 -t http://dt.xxx.maxcompute.aliyun-inc.com -k xxxxxxxxx -i XXXXXX -o oss://bucket1/tmp/outputcsv -f csv
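Typing the AccessKey pair directly on the command line leaves it in the shell history. One way to avoid that is to read the credentials from environment variables. The sketch below is an assumption-laden example, not a JindoTable feature: dump_mc_table is a hypothetical helper, and MC_ACCESS_ID, MC_ACCESS_KEY, MC_TUNNEL_URL, MC_NUM_MAPS, and JINDO_CMD are hypothetical variable names chosen for illustration.

```shell
# dump_mc_table and all MC_* / JINDO_CMD variables are hypothetical names.
# Usage: dump_mc_table <projectName> <tableName> <csv|tfrecord> <outputPath>
dump_mc_table() {
  project=${1:-}
  table=${2:-}
  format=${3:-}
  output=${4:-}
  # Refuse to run when a credential or the Tunnel endpoint is missing.
  if [ -z "${MC_ACCESS_ID:-}" ] || [ -z "${MC_ACCESS_KEY:-}" ] || [ -z "${MC_TUNNEL_URL:-}" ]; then
    echo "set MC_ACCESS_ID, MC_ACCESS_KEY, and MC_TUNNEL_URL first" >&2
    return 1
  fi
  case "$format" in
    csv|tfrecord) ;;
    *) echo "format must be csv or tfrecord" >&2; return 1 ;;
  esac
  # MC_NUM_MAPS falls back to 10 map tasks, the value used in the examples above.
  "${JINDO_CMD:-jindo}" table -dumpmc \
    -i "$MC_ACCESS_ID" -k "$MC_ACCESS_KEY" -m "${MC_NUM_MAPS:-10}" \
    -t "$MC_TUNNEL_URL" -project "$project" -table "$table" \
    -f "$format" -o "$output"
}

# Example: dump_mc_table mctest_project t1 csv oss://bucket1/tmp/outputcsv
```

The credentials are still passed to jindo as arguments, so they remain visible in the process list while the command runs.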