You can use JindoTable to collect access frequency statistics for tables and partitions, and separate cold data from hot data based on the statistics. This helps you reduce storage costs and improve cache efficiency.

Collect access records

This feature is supported by Spark and Hive in SmartData 3.1.X, and by Spark, Hive, and Presto in SmartData 3.2.X. JindoTable collects access records of Hive tables and saves the collected data in the namespaces of the SmartData service of the cluster.

By default, the collection of access records is enabled. To disable this feature, perform the steps described in Disable the collection of access records.

Query access frequency statistics

JindoTable provides a command that queries access frequency statistics.
  • Syntax
    jindo table -accessStat <-d [days]> <-n [topNums]>

    Set days and topNums to positive integers. days specifies the number of days in the query window, and topNums specifies the maximum number of records to return. For example, if days is set to 1, all access records generated from 00:00 (local time) on the current day to the current time are queried.

  • Description

    This command queries the tables or partitions that were accessed most frequently within the specified period of time, and returns the specified number of records.

  • Example: Query the top 20 access records of tables or partitions that were accessed most frequently within the last seven days.
    jindo table -accessStat -d 7 -n 20
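The command above can be wrapped in a small script that checks its arguments first. This is only a sketch: the variable names are illustrative, and the jindo CLI is assumed to be on the PATH of a cluster node, so the sketch prints the command instead of executing it.

```shell
#!/bin/sh
# Sketch: build an accessStat query after validating its arguments.
days=7
topNums=20

# Both values must be positive integers, per the command syntax above.
case "$days$topNums" in
  ''|*[!0-9]*) echo "days and topNums must be positive integers" >&2; exit 1 ;;
esac

cmd="jindo table -accessStat -d $days -n $topNums"
# On a cluster node you would execute $cmd; here we only print it.
echo "$cmd"
```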

For more information about how to use JindoTable, see Use JindoTable.

Disable the collection of access records

  1. Log on to the Alibaba Cloud EMR console.
  2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
  3. Click the Cluster Management tab.
  4. On the Cluster Management page, find your cluster and click Details in the Actions column.
  5. Modify parameters.
    • Hive:
      1. In the left-side navigation pane, choose Cluster Service > Hive.
      2. On the Hive service page, click the Configure tab.
      3. On the hive-site tab, find the hive.exec.post.hooks parameter and delete com.aliyun.emr.table.hive.HivePostHook from the parameter value.
    • Spark:
      1. In the left-side navigation pane, choose Cluster Service > Spark.
      2. On the Spark service page, click the Configure tab.
      3. On the spark_default tab, find the spark.sql.queryExecutionListeners parameter and delete com.aliyun.emr.table.spark.SparkSQLQueryListener from the parameter value.
    • Presto:
      1. In the left-side navigation pane, choose Cluster Service > Presto.
      2. On the Presto service page, click the Configure tab.
      3. Find the event-listener.name parameter and delete the parameter value.
  6. Save the configurations.
    1. In the upper-right corner of the Service Configuration section, click Save.
    2. In the Confirm Changes dialog box, specify Description and turn on Auto-update Configuration.
    3. Click OK.
  7. Restart the related service.
    • Hive:
      1. In the upper-right corner of the Hive service page, choose Actions > Restart HiveServer2.
      2. In the Cluster Activities dialog box, specify Description.
      3. Click OK.
      4. In the Confirm message, click OK.
    • Spark:
      1. In the upper-right corner of the Spark service page, choose Actions > Restart ThriftServer.
      2. In the Cluster Activities dialog box, specify Description.
      3. Click OK.
      4. In the Confirm message, click OK.
    • Presto:
      1. In the upper-right corner of the Presto service page, choose Actions > Restart All Components.
      2. In the Cluster Activities dialog box, specify Description.
      3. Click OK.
      4. In the Confirm message, click OK.
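The parameter edits in step 5 amount to removing one class name from a comma-separated list of values. The following sketch illustrates this with the Hive hook; the first entry in the example value is hypothetical and is only there to show that any other hooks must be kept:

```shell
# Hypothetical current value of hive.exec.post.hooks (example only).
hooks="org.apache.hadoop.hive.ql.hooks.ATSHook,com.aliyun.emr.table.hive.HivePostHook"

# Split on commas, drop the JindoTable hook, and rejoin the rest.
new_hooks=$(printf '%s\n' "$hooks" | tr ',' '\n' \
  | grep -Fxv 'com.aliyun.emr.table.hive.HivePostHook' \
  | paste -sd, -)

echo "$new_hooks"   # org.apache.hadoop.hive.ql.hooks.ATSHook
```

The same kind of edit applies to spark.sql.queryExecutionListeners with com.aliyun.emr.table.spark.SparkSQLQueryListener; in the console, make the change on the Configure tab of the service rather than on the command line.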