E-MapReduce: Use JindoTable

Last Updated: Mar 26, 2026

JindoTable is used to implement tiered storage, optimize table files, and collect data statistics based on the popularity of tables or partitions. This topic describes how to use JindoTable.

Prerequisites

Before you begin, ensure that you have:

  • JDK 8 installed on your on-premises machine

  • An EMR cluster of version 3.30.0 or later. For more information, see Create a cluster.

Usage notes

  • Specify tables in the format database.table.

  • Specify partitions in the format partitionCol1=1,partitionCol2=2,....
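
The two formats above can be checked before a command is run. The following is a minimal sketch, not part of JindoTable: the helper names is_table_name and partition_spec are hypothetical, and the check only verifies that a table name contains exactly one dot with text on both sides.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; JindoTable does not provide them.

# Succeeds only if the argument has the shape database.table
# (exactly one dot, with non-empty text on both sides).
is_table_name() {
  case "$1" in
    *.*.*|.*|*.) return 1 ;;
    *.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Joins col=value arguments with commas, for example:
#   partition_spec date=2020-03-16 category=1
partition_spec() {
  spec=""
  for kv in "$@"; do
    case "$kv" in
      *=*) ;;
      *) echo "expected col=value, got: $kv" >&2; return 1 ;;
    esac
    if [ -z "$spec" ]; then spec="$kv"; else spec="$spec,$kv"; fi
  done
  printf '%s\n' "$spec"
}
```

For example, partition_spec date=2020-03-16 category=1 prints date=2020-03-16,category=1, which can be passed to the -p parameter.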

Commands

JindoTable supports the following commands:

  • -accessStat: Query the most-accessed tables or partitions in a time range

  • -cache: Cache table or partition data to local disks

  • -uncache: Remove cached data from local disks

  • -archive: Move table or partition data to a lower-cost storage class

  • -unarchive: Restore archived data to Standard or Infrequent Access storage

  • -status: View the storage status of a table or partition

  • -optimize: Optimize data organization at the storage layer

  • -showTable: List partitions in a table or view storage of a non-partitioned table

  • -showPartition: View storage details for a specific partition

  • -listTables: List all tables in a database

  • -dumpmc: Dump MaxCompute tables to an EMR cluster or OSS

-accessStat

Use this command to identify which tables or partitions have the highest access frequency in a given time range. This helps you determine which data to cache for performance or which data is cold enough to archive.

Syntax

jindo table -accessStat -d <days> -n <topNums>

Parameters

Parameter | Description | Required
-d <days> | Number of days to look back. Must be a positive integer. If set to 1, all access records from 00:00 (local time) on the current day to the current time are returned. | Yes
-n <topNums> | Number of top results to return. Must be a positive integer. | Yes

Example

Return the top 20 most-accessed tables or partitions in the last 7 days:

jindo table -accessStat -d 7 -n 20
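
Both parameters must be positive integers, so a script can validate them before it calls the command. The following sketch assumes a hypothetical helper named access_stat_cmd. It only prints the command line and does not run it. As a simplification, it also rejects values with a leading zero.

```shell
#!/bin/sh
# Hypothetical helper: prints the -accessStat command after checking
# that <days> and <topNums> are positive integers.
access_stat_cmd() {
  for n in "$1" "$2"; do
    case "$n" in
      ''|*[!0-9]*|0*)
        echo "not a positive integer: $n" >&2
        return 1
        ;;
    esac
  done
  printf 'jindo table -accessStat -d %s -n %s\n' "$1" "$2"
}
```

Calling access_stat_cmd 7 20 prints the same command as the example above. Calling access_stat_cmd 0 20 fails with an error message.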

-cache

Use this command to cache data of a table or partition to local disks. This speeds up subsequent queries on frequently accessed data stored in Object Storage Service (OSS) or JindoFileSystem (JindoFS).

Syntax

jindo table -cache -t <dbName.tableName> [-p <partitionSpec>] [-pin]

Parameters

Parameter | Description | Required
-t <dbName.tableName> | The table to cache. Use the format database.table. | Yes
-p <partitionSpec> | The partition to cache. Use the format partitionCol1=1,partitionCol2=2,.... | No
-pin | When set, pinned data is not evicted even if cache space is insufficient. | No

Note: Data must be stored in OSS or JindoFS to be cached.

Example

Cache the March 16, 2020 partition of db1.t1:

jindo table -cache -t db1.t1 -p date=2020-03-16
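
Each call with -p caches one partition, so caching several partitions takes one command per partition. The following sketch assumes a hypothetical helper named cache_cmds that prints one -cache command for each value of a single partition column. It does not run the commands.

```shell
#!/bin/sh
# Hypothetical helper: prints one -cache command per partition value.
# Usage: cache_cmds <dbName.tableName> <partitionCol> <value>...
cache_cmds() {
  table="$1"
  col="$2"
  shift 2
  for value in "$@"; do
    printf 'jindo table -cache -t %s -p %s=%s\n' "$table" "$col" "$value"
  done
}
```

For example, cache_cmds db1.t1 date 2020-03-16 2020-03-17 prints two commands, one for each date partition. Review the printed lines before you run them on a cluster node.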

-uncache

Use this command to remove cached data of a table or partition from local disks, freeing up cache space.

Syntax

jindo table -uncache -t <dbName.tableName> [-p <partitionSpec>]

Parameters

Parameter | Description | Required
-t <dbName.tableName> | The table whose cache to remove. Use the format database.table. | Yes
-p <partitionSpec> | The partition whose cache to remove. Use the format partitionCol1=1,partitionCol2=2,.... | No

Note: Data must be stored in OSS or JindoFS.

Examples

Remove all cached data for db1.t2:

jindo table -uncache -t db1.t2

Remove cached data for a specific partition of db1.t1:

jindo table -uncache -t db1.t1 -p date=2020-03-16,category=1

-archive

Use this command to move data of a table or partition to a lower-cost storage class. Use Archive for data that is rarely accessed. Use Infrequent Access (IA) for data that is accessed less frequently but may still need to be retrieved without a restore step.

Syntax

jindo table -archive [-a | -i] -t <dbName.tableName> [-p <partitionSpec>]

Parameters

Parameter | Description | Required
-a | Move to Archive storage class. This is the default if neither -a nor -i is specified. | No
-i | Move to Infrequent Access (IA) storage class instead of Archive. | No
-t <dbName.tableName> | The table to archive. Use the format database.table. | Yes
-p <partitionSpec> | The partition to archive. Use the format partitionCol1=1,partitionCol2=2,.... | No

Example

Move the October 12, 2020 partition of db1.t1 to Archive storage:

jindo table -archive -t db1.t1 -p date=2020-10-12
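
The -a and -i options are alternatives, so a script can map a storage class name to exactly one of them. The following sketch assumes a hypothetical helper named archive_cmd. The names archive and ia are labels chosen for this example, not JindoTable keywords. The helper only prints the command.

```shell
#!/bin/sh
# Hypothetical helper: prints an -archive command for the chosen class.
# Usage: archive_cmd <archive|ia> <dbName.tableName> [partitionSpec]
archive_cmd() {
  case "$1" in
    archive) flag="-a" ;;   # Archive storage class
    ia)      flag="-i" ;;   # Infrequent Access storage class
    *) echo "unknown storage class: $1" >&2; return 1 ;;
  esac
  if [ -n "$3" ]; then
    printf 'jindo table -archive %s -t %s -p %s\n' "$flag" "$2" "$3"
  else
    printf 'jindo table -archive %s -t %s\n' "$flag" "$2"
  fi
}
```

For example, archive_cmd ia db1.t1 date=2020-10-12 prints jindo table -archive -i -t db1.t1 -p date=2020-10-12.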

-unarchive

Use this command to restore archived data. Use -o to temporarily restore an Archived object so that it can be read, or use -i to change an Archived object to the Infrequent Access (IA) storage class.

Syntax

jindo table -unarchive [-o | -i] -t <dbName.tableName> [-p <partitionSpec>]

Parameters

Parameter | Description | Required
-o | Temporarily restore an Archived object. | No
-i | Change an Archived object to Infrequent Access (IA) storage class. | No
-t <dbName.tableName> | The table to restore. Use the format database.table. | Yes
-p <partitionSpec> | The partition to restore. Use the format partitionCol1=1,partitionCol2=2,.... | No

Examples

Temporarily restore an Archived partition of db1.t1:

jindo table -unarchive -o -t db1.t1 -p date=2020-03-16,category=1

Change db1.t2 from Archive to Infrequent Access storage class:

jindo table -unarchive -i -t db1.t2

-status

Use this command to check the current storage status of a table or partition, including which storage class the data is in.

Syntax

jindo table -status -t <dbName.tableName> [-p <partitionSpec>]

Parameters

Parameter | Description | Required
-t <dbName.tableName> | The table to check. Use the format database.table. | Yes
-p <partitionSpec> | The partition to check. Use the format partitionCol1=1,partitionCol2=2,.... | No

Examples

View the storage status of db1.t2:

jindo table -status -t db1.t2

View the storage status of the March 16, 2020 partition of db1.t1:

jindo table -status -t db1.t1 -p date=2020-03-16

-optimize

Use this command to optimize the data organization of a table at the storage layer.

Syntax

jindo table -optimize -t <dbName.tableName>

Parameters

Parameter | Description | Required
-t <dbName.tableName> | The table to optimize. Use the format database.table. | Yes

Example

Optimize the storage organization of db1.t1:

jindo table -optimize -t db1.t1

-showTable

Use this command to list all partitions in a partitioned table, or view the data storage details of a non-partitioned table.

Syntax

jindo table -showTable -t <dbName.tableName>

Parameters

Parameter | Description | Required
-t <dbName.tableName> | The table to display. Use the format database.table. | Yes

Example

List all partitions in db1.t1:

jindo table -showTable -t db1.t1

-showPartition

Use this command to view the storage details of a specific partition.

Syntax

jindo table -showPartition -t <dbName.tableName> [-p <partitionSpec>]

Parameters

Parameter | Description | Required
-t <dbName.tableName> | The table containing the partition. Use the format database.table. | Yes
-p <partitionSpec> | The partition to display. Use the format partitionCol1=1,partitionCol2=2,.... | No

Example

View the storage details of the October 12, 2020 partition of db1.t1:

jindo table -showPartition -t db1.t1 -p date=2020-10-12

-listTables

Use this command to list all tables in a database. If no database is specified, tables in the default database are returned.

Syntax

jindo table -listTables [-db <dbName>]

Parameters

Parameter | Description | Required
-db <dbName> | The database to list tables from. If omitted, the default database is used. | No

Examples

List all tables in the default database:

jindo table -listTables

List all tables in db1:

jindo table -listTables -db db1

-dumpmc

Use this command to dump a MaxCompute table to an EMR cluster or OSS bucket. Both CSV and TFRecord formats are supported.

Syntax

jindo table -dumpmc -i <accessId> -k <accessKey> -m <numMaps> -t <tunnelUrl> -project <projectName> -table <tableName> [-p <partitionSpec>] -f <csv|tfrecord> -o <outputPath>

Parameters

Parameter | Description | Required
-i <accessId> | The AccessKey ID of your Alibaba Cloud account. | Yes
-k <accessKey> | The AccessKey secret of your Alibaba Cloud account. | Yes
-m <numMaps> | The number of map tasks. | Yes
-t <tunnelUrl> | The VPC Tunnel endpoint of MaxCompute. | Yes
-project <projectName> | The name of the MaxCompute project. | Yes
-table <tableName> | The name of the MaxCompute table. | Yes
-p <partitionSpec> | The partition to dump. Example: pt=xxx. Separate multiple partitions with commas, for example, pt=xxx,dt=xxx. | No
-f <csv|tfrecord> | The output file format. Valid values: csv, tfrecord. | Yes
-o <outputPath> | The destination path. Use a local path for an EMR cluster or an OSS path (for example, oss://bucket/path) for OSS. | Yes

Examples

Dump a MaxCompute table in TFRecord format to an EMR cluster:

jindo table -dumpmc -m 10 -project mctest_project -table t1 -t http://dt.xxx.maxcompute.aliyun-inc.com -k xxxxxxxxx -i XXXXXX -o /tmp/outputtf1 -f tfrecord

Dump a MaxCompute table in CSV format to OSS:

jindo table -dumpmc -m 10 -project mctest_project -table t1 -t http://dt.xxx.maxcompute.aliyun-inc.com -k xxxxxxxxx -i XXXXXX -o oss://bucket1/tmp/outputcsv -f csv
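
The examples above place the AccessKey pair directly in the command. A wrapper script can read the pair from environment variables instead, so that the credentials are not stored in the script. The following is a minimal sketch under stated assumptions: the function name dumpmc and the variables MC_ACCESS_ID, MC_ACCESS_KEY, and JINDO are hypothetical names chosen for this example. The -p parameter is omitted for brevity.

```shell
#!/bin/sh
# Hypothetical wrapper around "jindo table -dumpmc".
# Usage: dumpmc <numMaps> <tunnelUrl> <projectName> <tableName> <csv|tfrecord> <outputPath>
# Reads the AccessKey pair from MC_ACCESS_ID and MC_ACCESS_KEY (assumed names).
# Set JINDO=echo to print the arguments instead of running jindo.
dumpmc() {
  if [ -z "${MC_ACCESS_ID:-}" ] || [ -z "${MC_ACCESS_KEY:-}" ]; then
    echo "MC_ACCESS_ID and MC_ACCESS_KEY must be set" >&2
    return 1
  fi
  case "$5" in
    csv|tfrecord) ;;
    *) echo "format must be csv or tfrecord, got: $5" >&2; return 1 ;;
  esac
  "${JINDO:-jindo}" table -dumpmc \
    -i "$MC_ACCESS_ID" -k "$MC_ACCESS_KEY" \
    -m "$1" -t "$2" -project "$3" -table "$4" \
    -f "$5" -o "$6"
}
```

With JINDO=echo set, the call dumpmc 10 http://dt.xxx.maxcompute.aliyun-inc.com mctest_project t1 csv oss://bucket1/tmp/outputcsv prints the arguments that would be passed to jindo, which makes it possible to review the command before it runs.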