JindoTable is used to implement tiered storage, optimize table files, and collect data statistics based on the popularity of tables or partitions. This topic describes how to use JindoTable.
Prerequisites
Before you begin, ensure that you have:
- Java Development Kit (JDK) 8 installed on your on-premises machine
- An E-MapReduce (EMR) cluster of version 3.30.0 or later (Create a cluster)
How it works
JindoTable commands follow a storage management workflow:
- Run -accessStat to identify which tables or partitions are accessed most frequently.
- Run -cache to pull hot data onto local disks, or -archive to move cold data to a lower-cost storage class.
- Run -status to verify the current storage state of a table or partition.
- Run -optimize to optimize the data organization of tables at the storage layer.
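The steps above can be strung together in a small shell function. This is a minimal sketch, not an official script: the function name apply_tiering is hypothetical, and it assumes you have already run -accessStat by hand and picked one hot partition and one cold partition from its output.

```shell
# Step 1 is interactive: run it first and read the result.
#   jindo table -accessStat -d 7 -n 20

# apply_tiering <hot db.table> <hot partitionSpec> <cold db.table> <cold partitionSpec>
# (hypothetical helper name; table and partition values are supplied by the caller)
apply_tiering() {
  if [ "$#" -ne 4 ]; then
    echo "usage: apply_tiering <hotTable> <hotPartition> <coldTable> <coldPartition>" >&2
    return 2
  fi
  # Step 2: cache the hot partition locally and archive the cold one.
  jindo table -cache -t "$1" -p "$2" || return 1
  jindo table -archive -t "$3" -p "$4" || return 1
  # Step 3: verify the storage state of both partitions.
  jindo table -status -t "$1" -p "$2" || return 1
  jindo table -status -t "$3" -p "$4" || return 1
  # Step 4: optimize the data organization of the hot table.
  jindo table -optimize -t "$1"
}
```

For example, apply_tiering db1.t1 date=2020-03-16 db1.t2 date=2019-01-01 would run the five commands in order and stop at the first one that fails.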
Specify tables in the format database.table. Specify partitions in the format partitionCol1=val1,partitionCol2=val2,....
Command reference
JindoTable provides 11 commands. In the syntax descriptions, {-flag} indicates a required parameter and [-flag] indicates an optional parameter.
| Command | Description |
|---|---|
| -accessStat | Query the most-accessed tables or partitions in a time range |
| -cache | Cache table or partition data to local disks |
| -uncache | Remove cached data from local disks |
| -archive | Move data to Archive or Infrequent Access (IA) storage class |
| -unarchive | Restore archived data to Standard storage class |
| -status | View the storage status of a table or partition |
| -optimize | Optimize table data organization at the storage layer |
| -showTable | List all partitions in a partitioned table, or show storage details of a non-partitioned table |
| -showPartition | Show storage details of a specific partition |
| -listTables | List all tables in a database |
| -dumpmc | Dump a MaxCompute table to an EMR cluster or Object Storage Service (OSS) |
-accessStat
Query the tables or partitions with the most access records in a specified time range.
Syntax
jindo table -accessStat {-d} <days> {-n} <topNums>
Parameters
| Parameter | Required | Description |
|---|---|---|
| -d <days> | Yes | Number of days to look back. Must be a positive integer. If set to 1, all access records from 00:00 local time on the current day to the current time are returned. |
| -n <topNums> | Yes | Number of top results to return. Must be a positive integer. |
Example
Return the top 20 most-accessed tables or partitions in the last 7 days:
jindo table -accessStat -d 7 -n 20
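Because both -d and -n must be positive integers, a script that takes these values from user input may want to check them before calling the command. The following is a sketch under that assumption; the helper names is_positive_int and access_stat are hypothetical and not part of JindoTable.

```shell
# is_positive_int <value>: succeeds only for integers greater than zero.
is_positive_int() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
  esac
  [ "$1" -gt 0 ]
}

# access_stat <days> <topNums>: validate both arguments, then run -accessStat.
access_stat() {
  if ! is_positive_int "${1:-}" || ! is_positive_int "${2:-}"; then
    echo "usage: access_stat <days> <topNums> (both positive integers)" >&2
    return 2
  fi
  jindo table -accessStat -d "$1" -n "$2"
}
```

Calling access_stat 7 20 runs the same command as the example above, while access_stat 0 20 is rejected before jindo is invoked.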
-cache
Cache data of a table or partition from OSS or JindoFileSystem (JindoFS) to local disks.
Syntax
jindo table -cache {-t} <dbName.tableName> [-p] <partitionSpec> [-pin]
Parameters
| Parameter | Required | Description |
|---|---|---|
| -t <dbName.tableName> | Yes | Table to cache. Format: database.table. |
| -p <partitionSpec> | No | Partition to cache. Format: partitionCol1=val1,partitionCol2=val2,.... If omitted, the entire table is cached. |
| -pin | No | Pin the cached data. If cache space is insufficient, this data is kept in the cache whenever possible. |
Example
Cache the date=2020-03-16 partition of db1.t1:
jindo table -cache -t db1.t1 -p date=2020-03-16
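To cache several partitions of the same table, the command can be repeated in a loop. The sketch below is one way to do that; the function name cache_partitions is hypothetical, and adding -pin to every call is an illustrative choice rather than a recommendation.

```shell
# cache_partitions <db.table> <partitionSpec>...
# Caches each listed partition with -pin and stops at the first failure.
cache_partitions() {
  if [ "$#" -lt 2 ]; then
    echo "usage: cache_partitions <db.table> <partitionSpec>..." >&2
    return 2
  fi
  local table="$1" spec
  shift
  for spec in "$@"; do
    jindo table -cache -t "$table" -p "$spec" -pin || return 1
  done
}
```

For example, cache_partitions db1.t1 date=2020-03-16 date=2020-03-17 issues one -cache command per partition.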
-uncache
Remove cached data of a table or partition from local disks.
Syntax
jindo table -uncache {-t} <dbName.tableName> [-p] <partitionSpec>
Parameters
| Parameter | Required | Description |
|---|---|---|
| -t <dbName.tableName> | Yes | Table whose cache to remove. Format: database.table. |
| -p <partitionSpec> | No | Partition whose cache to remove. Format: partitionCol1=val1,partitionCol2=val2,.... If omitted, the entire table's cache is removed. |
Examples
Remove the cached data of the entire db1.t2 table:
jindo table -uncache -t db1.t2
Remove the cached data of the date=2020-03-16,category=1 partition of db1.t1:
jindo table -uncache -t db1.t1 -p date=2020-03-16,category=1
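A multi-column partition spec such as date=2020-03-16,category=1 is a comma-separated list of column=value pairs. When the pairs come from separate variables, a small helper can join them. This is a sketch only; partition_spec is a hypothetical name, and the check is a loose shape test, not a validation against the table schema.

```shell
# partition_spec <col=val>...: join column=value pairs with commas.
partition_spec() {
  local spec="" pair
  for pair in "$@"; do
    case "$pair" in
      ?*=?*) ;;   # non-empty column name, '=', non-empty value
      *) echo "invalid partition pair: $pair" >&2; return 2 ;;
    esac
    # Prepend a comma only when spec already holds an earlier pair.
    spec="${spec:+$spec,}$pair"
  done
  printf '%s\n' "$spec"
}

# Usage (not run here):
#   jindo table -uncache -t db1.t1 -p "$(partition_spec date=2020-03-16 category=1)"
```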
-archive
Move data of a table or partition to a lower-cost storage class. The default target is the Archive storage class. Add -i to use Infrequent Access (IA) instead.
Syntax
jindo table -archive [-a|-i] {-t} <dbName.tableName> [-p] <partitionSpec>
Parameters
| Parameter | Required | Description |
|---|---|---|
| -a | No | Archive to the Archive storage class (default behavior). |
| -i | No | Archive to the Infrequent Access (IA) storage class instead of Archive. |
| -t <dbName.tableName> | Yes | Table to archive. Format: database.table. |
| -p <partitionSpec> | No | Partition to archive. Format: partitionCol1=val1,partitionCol2=val2,.... If omitted, the entire table is archived. |
Example
Archive the date=2020-10-12 partition of db1.t1:
jindo table -archive -t db1.t1 -p date=2020-10-12
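The choice between -a and -i, and the optional -p, can be wrapped so that callers name the storage class instead of remembering the flag. The following is a sketch; archive_data and the class keywords archive and ia are hypothetical names chosen for illustration.

```shell
# archive_data <archive|ia> <db.table> [partitionSpec]
archive_data() {
  local flag
  case "${1:-}" in
    archive) flag="-a" ;;   # Archive storage class (the default target)
    ia)      flag="-i" ;;   # Infrequent Access storage class
    *) echo "storage class must be 'archive' or 'ia'" >&2; return 2 ;;
  esac
  if [ -z "${2:-}" ]; then
    echo "usage: archive_data <archive|ia> <db.table> [partitionSpec]" >&2
    return 2
  fi
  if [ -n "${3:-}" ]; then
    jindo table -archive "$flag" -t "$2" -p "$3"
  else
    # No partition given: the entire table is archived.
    jindo table -archive "$flag" -t "$2"
  fi
}
```

For example, archive_data ia db1.t1 date=2020-10-12 moves that partition to IA, and archive_data archive db1.t2 archives the whole table.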
-unarchive
Restore archived data to Standard storage class, or change it to IA storage class.
Syntax
jindo table -unarchive [-o|-i] {-t} <dbName.tableName> [-p] <partitionSpec>
Parameters
| Parameter | Required | Description |
|---|---|---|
| -o | No | Temporarily restore an archived object. |
| -i | No | Change an archived object to IA storage class. |
| -t <dbName.tableName> | Yes | Table to unarchive. Format: database.table. |
| -p <partitionSpec> | No | Partition to unarchive. Format: partitionCol1=val1,partitionCol2=val2,.... If omitted, the entire table is unarchived. |
Examples
Temporarily restore the date=2020-03-16,category=1 partition of db1.t1 from Archive:
jindo table -unarchive -o -t db1.t1 -p date=2020-03-16,category=1
Change the entire db1.t2 table from Archive to IA:
jindo table -unarchive -i -t db1.t2
-status
View the data storage status of a table or partition.
Syntax
jindo table -status {-t} <dbName.tableName> [-p] <partitionSpec>
Parameters
| Parameter | Required | Description |
|---|---|---|
| -t <dbName.tableName> | Yes | Table to inspect. Format: database.table. |
| -p <partitionSpec> | No | Partition to inspect. Format: partitionCol1=val1,partitionCol2=val2,.... If omitted, the status of the entire table is returned. |
Examples
View the storage status of the entire db1.t2 table:
jindo table -status -t db1.t2
View the storage status of the date=2020-03-16 partition of db1.t1:
jindo table -status -t db1.t1 -p date=2020-03-16
-optimize
Optimize the data organization of a table at the storage layer to improve query performance.
Syntax
jindo table -optimize {-t} <dbName.tableName>
Parameters
| Parameter | Required | Description |
|---|---|---|
| -t <dbName.tableName> | Yes | Table to optimize. Format: database.table. |
Example
Optimize the data organization of db1.t1:
jindo table -optimize -t db1.t1
-showTable
Display all partitions in a partitioned table, or show the data storage details of a non-partitioned table.
Syntax
jindo table -showTable {-t} <dbName.tableName>
Parameters
| Parameter | Required | Description |
|---|---|---|
| -t <dbName.tableName> | Yes | Table to display. Format: database.table. |
Example
Display all partitions in db1.t1:
jindo table -showTable -t db1.t1
-showPartition
Display the data storage details of a specific partition.
Syntax
jindo table -showPartition {-t} <dbName.tableName> [-p] <partitionSpec>
Parameters
| Parameter | Required | Description |
|---|---|---|
| -t <dbName.tableName> | Yes | Table that contains the partition. Format: database.table. |
| -p <partitionSpec> | No | Partition to display. Format: partitionCol1=val1,partitionCol2=val2,.... |
Example
Display the storage details of the date=2020-10-12 partition in db1.t1:
jindo table -showPartition -t db1.t1 -p date=2020-10-12
-listTables
List all tables in a database.
Syntax
jindo table -listTables [-db] <dbName>
Parameters
| Parameter | Required | Description |
|---|---|---|
| -db <dbName> | No | Database to list tables from. If omitted, tables in the default database are listed. |
Examples
List all tables in the default database:
jindo table -listTables
List all tables in db1:
jindo table -listTables -db db1
-dumpmc
Dump a MaxCompute table to an EMR cluster or OSS. Supported output formats are CSV and TFRECORD.
Syntax
jindo table -dumpmc {-i} <accessId> {-k} <accessKey> {-m} <numMaps> {-t} <tunnelUrl> {-project} <projectName> {-table} <tableName> [-p] <partitionSpec> {-f} <csv|tfrecord> {-o} <outputPath>
Parameters
| Parameter | Required | Description |
|---|---|---|
| -i <accessId> | Yes | AccessKey ID of your Alibaba Cloud account. |
| -k <accessKey> | Yes | AccessKey secret of your Alibaba Cloud account. |
| -m <numMaps> | Yes | Number of map tasks. |
| -t <tunnelUrl> | Yes | VPC Tunnel endpoint of MaxCompute. |
| -project <projectName> | Yes | Name of the MaxCompute project. |
| -table <tableName> | Yes | Name of the MaxCompute table. |
| -p <partitionSpec> | No | Partition to dump. Example: pt=xxx. Separate multiple partitions with commas: pt=xxx,dt=xxx. |
| -f <csv\|tfrecord> | Yes | Output file format. Valid values: csv, tfrecord. |
| -o <outputPath> | Yes | Destination path. Use a local EMR path (for example, /tmp/output) or an OSS path (for example, oss://bucket/path). |
Examples
Dump a MaxCompute table to an EMR cluster in TFRECORD format:
jindo table -dumpmc -m 10 -project mctest_project -table t1 -t http://dt.xxx.maxcompute.aliyun-inc.com -k xxxxxxxxx -i XXXXXX -o /tmp/outputtf1 -f tfrecord
Dump a MaxCompute table to OSS in CSV format:
jindo table -dumpmc -m 10 -project mctest_project -table t1 -t http://dt.xxx.maxcompute.aliyun-inc.com -k xxxxxxxxx -i XXXXXX -o oss://bucket1/tmp/outputcsv -f csv
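Because -dumpmc takes many required flags, including credentials, a wrapper that reads the fixed values from environment variables can keep them out of the script file. The sketch below assumes variable names MC_ACCESS_ID, MC_ACCESS_KEY, MC_TUNNEL_URL, and MC_NUM_MAPS and a helper name dump_mc_table, all of which are hypothetical. The fallback of 10 map tasks is simply the value used in the examples above, not a documented default.

```shell
# dump_mc_table <project> <table> <csv|tfrecord> <outputPath> [partitionSpec]
dump_mc_table() {
  if [ "$#" -lt 4 ]; then
    echo "usage: dump_mc_table <project> <table> <csv|tfrecord> <outputPath> [partitionSpec]" >&2
    return 2
  fi
  if [ -z "${MC_ACCESS_ID:-}" ] || [ -z "${MC_ACCESS_KEY:-}" ] || [ -z "${MC_TUNNEL_URL:-}" ]; then
    echo "set MC_ACCESS_ID, MC_ACCESS_KEY, and MC_TUNNEL_URL first" >&2
    return 2
  fi
  case "$3" in
    csv|tfrecord) ;;
    *) echo "format must be csv or tfrecord" >&2; return 2 ;;
  esac
  local project="$1" table="$2" format="$3" output="$4" partition="${5:-}"
  # Rebuild the positional parameters as the jindo argument list.
  set -- -dumpmc -i "$MC_ACCESS_ID" -k "$MC_ACCESS_KEY" -m "${MC_NUM_MAPS:-10}" \
    -t "$MC_TUNNEL_URL" -project "$project" -table "$table"
  # Add -p only when a partition spec was given.
  if [ -n "$partition" ]; then
    set -- "$@" -p "$partition"
  fi
  set -- "$@" -f "$format" -o "$output"
  jindo table "$@"
}
```

With the three variables exported, dump_mc_table mctest_project t1 csv oss://bucket1/tmp/outputcsv corresponds to the OSS example above. The AccessKey secret is still passed to jindo on the command line, so this only avoids hard-coding it.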