E-MapReduce: Use JindoTable to collect access frequency statistics on tables and partitions

Last Updated: Mar 26, 2026

Use JindoTable to collect access frequency statistics on tables and partitions, then separate cold and hot data based on those statistics to reduce storage costs and improve cache efficiency.

Prerequisites

Before you begin, ensure that you have:

How it works

JindoTable tracks how often each table and partition is accessed and stores the statistics in the namespaces of the SmartData service on your cluster.

SmartData 3.2.X and later versions support statistics collection for Spark, Hive, and Presto. The following table lists, for each engine, whether collection is enabled by default and which configuration parameter controls it:

Engine | Default state | Configuration parameter
Spark | Enabled | spark.sql.queryExecutionListeners
Presto | Enabled | event-listener.name
Hive | Disabled | hive.exec.post.hooks

To enable statistics collection for Hive, see Enable statistics collection for Hive. To disable it for Spark or Presto, see Disable statistics collection.

Query access frequency statistics

Use the jindo table -accessStat command to answer questions such as:

  • Which tables or partitions are accessed most this week?

  • Which partitions are good candidates for cold storage migration?

Syntax

jindo table -accessStat <-d [days]> <-n [topNums]>

Both days and topNums must be positive integers. If you omit -n, the command returns all tables and partitions accessed in the specified period.

Parameters

Parameter | Description | Example
-d | Number of days to look back. A value of 1 covers from 00:00 on the current day to the current time. | 7
-n | Number of top results to return, ranked by access frequency. | 20

Example

Run the following command to get the top 20 tables or partitions by access frequency over the last seven days:

jindo table -accessStat -d 7 -n 20
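Because both values must be positive integers, it can help to validate them before invoking the command from a script. The following is a minimal sketch, not part of JindoTable: the function names is_positive_int and build_access_stat_cmd are hypothetical, and the helper only prints the command line rather than running it, so it can be tried on a machine without JindoTable installed.

```shell
#!/bin/sh
# Hypothetical helpers (not part of JindoTable): validate the arguments
# and print the resulting command line instead of executing it.

# Succeed only if $1 is a positive integer (1, 2, 3, ...).
is_positive_int() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
  esac
  [ "$1" -gt 0 ]
}

# Usage: build_access_stat_cmd <days> [topNums]
# Prints the jindo command for the given look-back window.
# When topNums is omitted, -n is left out so that all results are returned.
build_access_stat_cmd() {
  days=$1
  top=${2:-}
  if ! is_positive_int "$days"; then
    echo "days must be a positive integer: '$days'" >&2
    return 1
  fi
  if [ -z "$top" ]; then
    echo "jindo table -accessStat -d $days"
    return 0
  fi
  if ! is_positive_int "$top"; then
    echo "topNums must be a positive integer: '$top'" >&2
    return 1
  fi
  echo "jindo table -accessStat -d $days -n $top"
}
```

For example, build_access_stat_cmd 7 20 prints the command shown above, and build_access_stat_cmd 1 prints a command without -n that covers the current day only. On a cluster node, run the printed command as is.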

For more information about JindoTable, see Use JindoTable.

Enable statistics collection for Hive

  1. Log on to the Alibaba Cloud EMR console.

  2. In the top navigation bar, select the region where your cluster resides and select a resource group.

  3. Click the Cluster Management tab.

  4. On the Cluster Management page, find your cluster and click Details in the Actions column.

  5. In the left-side navigation pane, choose Cluster Service > Hive.

  6. On the Hive service page, click the Configure tab.

  7. Search for the hive.exec.post.hooks parameter and append com.aliyun.emr.table.hive.HivePostHook to the existing value, separated by a comma (,).

  8. Save the configuration:

    1. In the upper-right corner of the Service Configuration section, click Save.

    2. In the Confirm Changes dialog box, enter a description and turn on Auto-update Configuration.

    3. Click OK.

  9. Restart HiveServer2:

    1. In the upper-right corner of the Hive service page, choose Actions > Restart HiveServer2.

    2. In the Cluster Activities dialog box, enter a description and click OK.

    3. In the confirmation message, click OK.
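The append in step 7 is easy to get wrong when the parameter is empty or already contains the hook. The sketch below shows one way to compute the new value before pasting it into the console. It is illustrative only: append_hook is a hypothetical helper, org.example.AuditHook is a placeholder for whatever hook your cluster already uses, and the sketch assumes the existing value has no spaces around its commas.

```shell
#!/bin/sh
# Hook class named in step 7.
HIVE_HOOK="com.aliyun.emr.table.hive.HivePostHook"

# Hypothetical helper: append hook $2 to the comma-separated list $1
# unless it is already present. Prints the resulting value.
append_hook() {
  current=$1
  hook=$2
  case ",$current," in
    *",$hook,"*) printf '%s\n' "$current" ;;        # already listed: unchanged
    ",,")        printf '%s\n' "$hook" ;;           # empty value: hook only
    *)           printf '%s\n' "$current,$hook" ;;  # otherwise append with a comma
  esac
}

# Placeholder existing value; replace it with the value shown in the console.
append_hook "org.example.AuditHook" "$HIVE_HOOK"
```

Wrapping both strings in commas before matching means the check compares whole entries, so a class whose name merely contains the hook name as a substring is not mistaken for it.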

Disable statistics collection

Follow the steps below to disable statistics collection for the relevant engine.

  1. Log on to the Alibaba Cloud EMR console.

  2. In the top navigation bar, select the region where your cluster resides and select a resource group.

  3. Click the Cluster Management tab.

  4. On the Cluster Management page, find your cluster and click Details in the Actions column.

  5. Navigate to the service configuration page for your engine and remove the relevant parameter value:

    • Hive: Choose Cluster Service > Hive > Configure tab. Search for hive.exec.post.hooks and remove com.aliyun.emr.table.hive.HivePostHook from the value.

    • Spark: Choose Cluster Service > Spark > Configure tab. Search for spark.sql.queryExecutionListeners and remove com.aliyun.emr.table.spark.SparkSQLQueryListener from the value.

    • Presto: Choose Cluster Service > Presto > Configure tab. Search for event-listener.name and clear the parameter value.

  6. Save the configuration:

    1. In the upper-right corner of the Service Configuration section, click Save.

    2. In the Confirm Changes dialog box, enter a description and turn on Auto-update Configuration.

    3. Click OK.

  7. Restart the service for your engine:

    • Hive: Choose Actions > Restart HiveServer2. In the Cluster Activities dialog box, enter a description, click OK, then click OK in the confirmation message.

    • Spark: Choose Actions > Restart ThriftServer. In the Cluster Activities dialog box, enter a description, click OK, then click OK in the confirmation message.

    • Presto: Choose Actions > Restart All Components. In the Cluster Activities dialog box, enter a description, click OK, then click OK in the confirmation message.
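When the Hive or Spark parameter holds several comma-separated classes, only the JindoTable entry should be removed in step 5. The following sketch computes the remaining value. It is illustrative only: remove_entry is a hypothetical helper, org.example.OtherListener is a placeholder for any other listener configured on your cluster, and the sketch assumes the value has no spaces around its commas.

```shell
#!/bin/sh
# Listener class named in the Spark bullet of step 5.
SPARK_LISTENER="com.aliyun.emr.table.spark.SparkSQLQueryListener"

# Hypothetical helper: print the comma-separated list $1 with every entry
# equal to $2 removed. The body runs in a subshell, so the IFS and globbing
# changes do not leak into the caller.
remove_entry() (
  value=$1
  entry=$2
  result=""
  IFS=','   # split the unquoted $value on commas
  set -f    # disable glob expansion while splitting
  for item in $value; do
    if [ -n "$item" ] && [ "$item" != "$entry" ]; then
      result="${result:+$result,}$item"
    fi
  done
  printf '%s\n' "$result"
)

# Placeholder existing value; replace it with the value shown in the console.
remove_entry "org.example.OtherListener,$SPARK_LISTENER" "$SPARK_LISTENER"
```

If the JindoTable class was the only entry, the helper prints an empty line, which corresponds to clearing the parameter value in the console.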

What's next

  • Use the access frequency statistics from jindo table -accessStat to identify infrequently accessed tables and partitions, then move them to cold storage to reduce costs.

  • For a full reference of JindoTable commands and tiered storage operations, see Use JindoTable.