
E-MapReduce: JindoTable user guide

Last Updated: Mar 26, 2026

JindoTable collects access statistics for tables and partitions, and uses how frequently the data is accessed as the basis for tiered storage (caching hot data, archiving cold data) and for optimizing table files. This topic describes how to use JindoTable.

Prerequisites

Before you begin, ensure that you have:

  • Java Development Kit (JDK) 8 installed on your on-premises machine

  • An E-MapReduce (EMR) cluster of version 3.30.0 or later (for details, see Create a cluster)

How it works

A typical storage management workflow with JindoTable looks like this:

  1. Run -accessStat to identify which tables or partitions are accessed most frequently.

  2. Run -cache to cache frequently accessed (hot) data on local disks, or -archive to move rarely accessed (cold) data to a lower-cost storage class.

  3. Run -status to verify the current storage state of a table or partition.

  4. Run -optimize to optimize the data organization of tables at the storage layer.
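Once -accessStat (step 1) has pointed you at a hot and a cold partition, steps 2 to 4 can be chained in a small shell function. This is a sketch, not an official script: the function name tier_partitions and the JINDO variable (set JINDO=echo to print the commands instead of running them) are hypothetical. The partitions are passed in by hand because this guide does not document the -accessStat output format, so nothing here parses it.

```shell
# Hypothetical helper for steps 2-4 of the workflow, run after you have
# picked a hot and a cold partition from the -accessStat output (step 1).
# Usage: tier_partitions <database.table> <hotPartitionSpec> <coldPartitionSpec>
# Hypothetical JINDO override: set JINDO=echo for a dry run that only prints each command.
tier_partitions() {
  local table=$1 hot=$2 cold=$3 jindo=${JINDO:-jindo}
  # 2. Cache the hot partition on local disks and archive the cold one.
  "$jindo" table -cache -t "$table" -p "$hot" || return
  "$jindo" table -archive -t "$table" -p "$cold" || return
  # 3. Verify the storage state of both partitions.
  "$jindo" table -status -t "$table" -p "$hot" || return
  "$jindo" table -status -t "$table" -p "$cold" || return
  # 4. Optimize the data organization of the table.
  "$jindo" table -optimize -t "$table"
}
```

For example, after reviewing jindo table -accessStat -d 7 -n 20, a dry run for db1.t1 is JINDO=echo tier_partitions db1.t1 date=2020-03-16 date=2020-10-12. The function stops at the first command that fails and returns its exit status.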

Important: Specify tables in the format database.table. Specify partitions in the format partitionCol1=val1,partitionCol2=val2,....
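Because every command takes these two formats, a pair of small shell helpers can check a table name and assemble a partition spec before you call jindo table. Both function names are hypothetical, and the identifier characters accepted by is_table_ref (letters, digits, and underscores) are an assumption based on common Hive naming rules, not a documented JindoTable constraint.

```shell
# Hypothetical helper: succeed only if $1 has the form database.table
# (exactly one dot, non-empty names; letters/digits/underscores assumed).
is_table_ref() {
  case $1 in
    *.*.* | .* | *. | *[![:alnum:]_.]*) return 1 ;;
    *.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: join key=value arguments into a partition spec,
# e.g. "date=2020-03-16 category=1" -> date=2020-03-16,category=1.
partition_spec() {
  local kv
  for kv in "$@"; do
    case $kv in
      ?*=?*) ;;  # non-empty key and non-empty value
      *) echo "invalid partition key-value pair: '$kv'" >&2; return 1 ;;
    esac
  done
  local IFS=,
  printf '%s\n' "$*"
}
```

For example, jindo table -uncache -t db1.t1 -p "$(partition_spec date=2020-03-16 category=1)" passes the spec date=2020-03-16,category=1.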

Command reference

JindoTable provides 11 commands. In the syntax descriptions, {-flag} indicates a required parameter and [-flag] indicates an optional parameter.

Command Description
-accessStat Query the most-accessed tables or partitions in a time range
-cache Cache table or partition data to local disks
-uncache Remove cached data from local disks
-archive Move data to Archive or Infrequent Access (IA) storage class
-unarchive Restore archived data to Standard storage class
-status View the storage status of a table or partition
-optimize Optimize table data organization at the storage layer
-showTable List all partitions in a partitioned table, or show storage details of a non-partitioned table
-showPartition Show storage details of a specific partition
-listTables List all tables in a database
-dumpmc Dump a MaxCompute table to an EMR cluster or Object Storage Service (OSS)

-accessStat

Query the tables or partitions with the most access records in a specified time range.

Syntax

jindo table -accessStat {-d} <days> {-n} <topNums>

Parameters

Parameter Required Description
-d <days> Yes Number of days to look back. Must be a positive integer. If set to 1, all access records from 00:00 local time on the current day to the current time are returned.
-n <topNums> Yes Number of top results to return. Must be a positive integer.

Example

Return the top 20 most-accessed tables or partitions in the last 7 days:

jindo table -accessStat -d 7 -n 20
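Both -accessStat arguments must be positive integers, so a wrapper can reject other values before jindo is invoked. A minimal sketch; the function names and the JINDO override (set JINDO=echo for a dry run) are hypothetical.

```shell
# Hypothetical helper: succeed only for a positive decimal integer.
is_positive_int() {
  case $1 in
    '' | *[!0-9]*) return 1 ;;
  esac
  [ "$1" -gt 0 ]
}

# Hypothetical wrapper: access_stat <days> <topNums>
# Hypothetical JINDO override: set JINDO=echo to print instead of run.
access_stat() {
  if ! is_positive_int "${1:-}" || ! is_positive_int "${2:-}"; then
    echo "usage: access_stat <days> <topNums> (both positive integers)" >&2
    return 2
  fi
  "${JINDO:-jindo}" table -accessStat -d "$1" -n "$2"
}
```

access_stat 7 20 runs the example above, and access_stat 1 10 returns the top 10 for the current day since 00:00 local time.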

-cache

Cache data of a table or partition from OSS or JindoFileSystem (JindoFS) to local disks.

Syntax

jindo table -cache {-t} <dbName.tableName> [-p] <partitionSpec> [-pin]

Parameters

Parameter Required Description
-t <dbName.tableName> Yes Table to cache. Format: database.table.
-p <partitionSpec> No Partition to cache. Format: partitionCol1=val1,partitionCol2=val2,.... If omitted, the entire table is cached.
-pin No Pin the cached data. If cache space is insufficient, the pinned data is retained instead of being evicted, where possible.

Example

Cache the date=2020-03-16 partition of db1.t1:

jindo table -cache -t db1.t1 -p date=2020-03-16
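Because -p and -pin are both optional, a script that caches different tables or partitions has to add them conditionally. One way to do that is a shell function that builds the argument list with set --. The name cache_table and the JINDO override (set JINDO=echo for a dry run) are hypothetical.

```shell
# Hypothetical wrapper: cache_table <database.table> [partitionSpec] [pin]
# An empty partitionSpec caches the whole table. A third argument equal to
# the word "pin" adds -pin; any other third argument is ignored.
cache_table() {
  local table=$1 spec=${2:-} pin=${3:-}
  set -- table -cache -t "$table"
  if [ -n "$spec" ]; then set -- "$@" -p "$spec"; fi
  if [ "$pin" = pin ]; then set -- "$@" -pin; fi
  "${JINDO:-jindo}" "$@"
}
```

cache_table db1.t1 date=2020-03-16 reproduces the example above, and cache_table db1.t1 "" pin caches and pins the entire table.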

-uncache

Remove cached data of a table or partition from local disks.

Syntax

jindo table -uncache {-t} <dbName.tableName> [-p] <partitionSpec>

Parameters

Parameter Required Description
-t <dbName.tableName> Yes Table whose cache to remove. Format: database.table.
-p <partitionSpec> No Partition whose cache to remove. Format: partitionCol1=val1,partitionCol2=val2,.... If omitted, the entire table's cache is removed.

Examples

Remove the cached data of the entire db1.t2 table:

jindo table -uncache -t db1.t2

Remove the cached data of the date=2020-03-16,category=1 partition of db1.t1:

jindo table -uncache -t db1.t1 -p date=2020-03-16,category=1

-archive

Move data of a table or partition to a lower-cost storage class. The default target is the Archive storage class. Add -i to use Infrequent Access (IA) instead.

Syntax

jindo table -archive [-a|-i] {-t} <dbName.tableName> [-p] <partitionSpec>

Parameters

Parameter Required Description
-a No Archive to the Archive storage class (default behavior).
-i No Archive to the Infrequent Access (IA) storage class instead of Archive.
-t <dbName.tableName> Yes Table to archive. Format: database.table.
-p <partitionSpec> No Partition to archive. Format: partitionCol1=val1,partitionCol2=val2,.... If omitted, the entire table is archived.

Example

Archive the date=2020-10-12 partition of db1.t1:

jindo table -archive -t db1.t1 -p date=2020-10-12
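Since -a and -i are alternatives, a wrapper can take the target storage class as a word and pass exactly one of them. A sketch under these assumptions: archive_table, the class names archive and ia, and the JINDO override (set JINDO=echo for a dry run) are all hypothetical.

```shell
# Hypothetical wrapper: archive_table <archive|ia> <database.table> [partitionSpec]
# "archive" passes -a (the default class), "ia" passes -i.
archive_table() {
  local class=$1 table=$2 spec=${3:-} flag
  case $class in
    archive) flag=-a ;;
    ia) flag=-i ;;
    *) echo "storage class must be 'archive' or 'ia'" >&2; return 2 ;;
  esac
  set -- table -archive "$flag" -t "$table"
  if [ -n "$spec" ]; then set -- "$@" -p "$spec"; fi
  "${JINDO:-jindo}" "$@"
}
```

archive_table archive db1.t1 date=2020-10-12 is equivalent to the example above with -a spelled out, and archive_table ia db1.t2 moves the whole table to IA.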

-unarchive

Restore archived data of a table or partition to the Standard storage class. Add -o to restore the data only temporarily, or -i to change it to the IA storage class.

Syntax

jindo table -unarchive [-o|-i] {-t} <dbName.tableName> [-p] <partitionSpec>

Parameters

Parameter Required Description
-o No Temporarily restore an archived object.
-i No Change an archived object to IA storage class.
-t <dbName.tableName> Yes Table to unarchive. Format: database.table.
-p <partitionSpec> No Partition to unarchive. Format: partitionCol1=val1,partitionCol2=val2,.... If omitted, the entire table is unarchived.

Examples

Temporarily restore the date=2020-03-16,category=1 partition of db1.t1 from Archive:

jindo table -unarchive -o -t db1.t1 -p date=2020-03-16,category=1

Change the entire db1.t2 table from Archive to IA:

jindo table -unarchive -i -t db1.t2
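The three restore modes can likewise be selected by name. This sketch assumes, as the command table suggests, that omitting both -o and -i restores the data to the Standard storage class. unarchive_table, the mode names, and the JINDO override (set JINDO=echo for a dry run) are hypothetical.

```shell
# Hypothetical wrapper: unarchive_table <standard|temporary|ia> <database.table> [partitionSpec]
# Assumption: passing neither -o nor -i restores to the Standard storage class.
unarchive_table() {
  local mode=$1 table=$2 spec=${3:-}
  set -- table -unarchive
  case $mode in
    standard) ;;                    # no flag
    temporary) set -- "$@" -o ;;   # temporary restore
    ia) set -- "$@" -i ;;          # change to IA storage class
    *) echo "mode must be standard, temporary, or ia" >&2; return 2 ;;
  esac
  set -- "$@" -t "$table"
  if [ -n "$spec" ]; then set -- "$@" -p "$spec"; fi
  "${JINDO:-jindo}" "$@"
}
```

unarchive_table temporary db1.t1 date=2020-03-16,category=1 and unarchive_table ia db1.t2 reproduce the two examples above.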

-status

View the data storage status of a table or partition.

Syntax

jindo table -status {-t} <dbName.tableName> [-p] <partitionSpec>

Parameters

Parameter Required Description
-t <dbName.tableName> Yes Table to inspect. Format: database.table.
-p <partitionSpec> No Partition to inspect. Format: partitionCol1=val1,partitionCol2=val2,.... If omitted, the status of the entire table is returned.

Examples

View the storage status of the entire db1.t2 table:

jindo table -status -t db1.t2

View the storage status of the date=2020-03-16 partition of db1.t1:

jindo table -status -t db1.t1 -p date=2020-03-16

-optimize

Optimize the data organization of a table at the storage layer to improve query performance.

Syntax

jindo table -optimize {-t} <dbName.tableName>

Parameters

Parameter Required Description
-t <dbName.tableName> Yes Table to optimize. Format: database.table.

Example

Optimize the data organization of db1.t1:

jindo table -optimize -t db1.t1

-showTable

Display all partitions in a partitioned table, or show the data storage details of a non-partitioned table.

Syntax

jindo table -showTable {-t} <dbName.tableName>

Parameters

Parameter Required Description
-t <dbName.tableName> Yes Table to display. Format: database.table.

Example

Display all partitions in db1.t1:

jindo table -showTable -t db1.t1

-showPartition

Display the data storage details of a specific partition.

Syntax

jindo table -showPartition {-t} <dbName.tableName> [-p] <partitionSpec>

Parameters

Parameter Required Description
-t <dbName.tableName> Yes Table that contains the partition. Format: database.table.
-p <partitionSpec> No Partition to display. Format: partitionCol1=val1,partitionCol2=val2,....

Example

Display the storage details of the date=2020-10-12 partition in db1.t1:

jindo table -showPartition -t db1.t1 -p date=2020-10-12

-listTables

List all tables in a database.

Syntax

jindo table -listTables [-db] <dbName>

Parameters

Parameter Required Description
-db <dbName> No Database to list tables from. If omitted, tables in the default database are listed.

Examples

List all tables in the default database:

jindo table -listTables

List all tables in db1:

jindo table -listTables -db db1

-dumpmc

Dump a MaxCompute table to an EMR cluster or OSS. Supported output formats are CSV and TFRECORD.

Syntax

jindo table -dumpmc {-i} <accessId> {-k} <accessKey> {-m} <numMaps> {-t} <tunnelUrl> {-project} <projectName> {-table} <tableName> [-p] <partitionSpec> {-f} <csv|tfrecord> {-o} <outputPath>

Parameters

Parameter Required Description
-i <accessId> Yes AccessKey ID of your Alibaba Cloud account.
-k <accessKey> Yes AccessKey secret of your Alibaba Cloud account.
-m <numMaps> Yes Number of map tasks.
-t <tunnelUrl> Yes VPC Tunnel endpoint of MaxCompute.
-project <projectName> Yes Name of the MaxCompute project.
-table <tableName> Yes Name of the MaxCompute table.
-p <partitionSpec> No Partition to dump. Example: pt=xxx. For a table with multiple partition columns, separate the key-value pairs with commas: pt=xxx,dt=xxx.
-f <csv|tfrecord> Yes Output file format. Valid values: csv, tfrecord.
-o <outputPath> Yes Destination path. Use a local EMR path (for example, /tmp/output) or an OSS path (for example, oss://bucket/path).

Examples

Dump a MaxCompute table to an EMR cluster in TFRECORD format:

jindo table -dumpmc -m 10 -project mctest_project -table t1 -t http://dt.xxx.maxcompute.aliyun-inc.com -k xxxxxxxxx -i XXXXXX -o /tmp/outputtf1 -f tfrecord

Dump a MaxCompute table to OSS in CSV format:

jindo table -dumpmc -m 10 -project mctest_project -table t1 -t http://dt.xxx.maxcompute.aliyun-inc.com -k xxxxxxxxx -i XXXXXX -o oss://bucket1/tmp/outputcsv -f csv
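The examples above put the AccessKey pair directly on the command line. A wrapper that reads the credentials and the Tunnel endpoint from environment variables keeps them out of scripts and shell history, although the values still appear in the jindo process arguments. A sketch under these assumptions: dump_mc, MC_TUNNEL_URL, MC_NUM_MAPS (defaulting to 10 as in the examples), and the JINDO override (set JINDO=echo for a dry run) are hypothetical; the ALIBABA_CLOUD_ACCESS_KEY_* names follow the Alibaba Cloud SDK convention and are read by this function, not by jindo itself.

```shell
# Hypothetical wrapper: dump_mc <project> <table> <csv|tfrecord> <outputPath> [partitionSpec]
# Required env: ALIBABA_CLOUD_ACCESS_KEY_ID, ALIBABA_CLOUD_ACCESS_KEY_SECRET,
#               MC_TUNNEL_URL (hypothetical name). Optional: MC_NUM_MAPS (default 10).
dump_mc() {
  local project=$1 table=$2 format=$3 output=$4 spec=${5:-}
  case $format in
    csv | tfrecord) ;;
    *) echo "format must be csv or tfrecord" >&2; return 2 ;;
  esac
  if [ -z "${ALIBABA_CLOUD_ACCESS_KEY_ID:-}" ] || [ -z "${ALIBABA_CLOUD_ACCESS_KEY_SECRET:-}" ] ||
     [ -z "${MC_TUNNEL_URL:-}" ]; then
    echo "set ALIBABA_CLOUD_ACCESS_KEY_ID, ALIBABA_CLOUD_ACCESS_KEY_SECRET and MC_TUNNEL_URL" >&2
    return 2
  fi
  set -- table -dumpmc -i "$ALIBABA_CLOUD_ACCESS_KEY_ID" -k "$ALIBABA_CLOUD_ACCESS_KEY_SECRET" \
    -m "${MC_NUM_MAPS:-10}" -t "$MC_TUNNEL_URL" -project "$project" -table "$table"
  if [ -n "$spec" ]; then set -- "$@" -p "$spec"; fi
  set -- "$@" -f "$format" -o "$output"
  "${JINDO:-jindo}" "$@"
}
```

With the variables exported, dump_mc mctest_project t1 csv oss://bucket1/tmp/outputcsv issues the same dump as the CSV example above.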