You can use JindoTable to collect access frequency statistics of tables and partitions and to separate hot and cold data based on the statistics. This helps you reduce storage costs and improve cache efficiency.

Collect access records

JindoTable can collect access records of Hive tables. This feature is supported by the Spark and Hive engines. The collected records are stored in the namespaces of the SmartData service of your cluster.

By default, the collection of access records is enabled. If you want to disable this feature, perform the operations described in Disable the collection of access records.

Query access frequency statistics

JindoTable allows you to run a command to query access frequency statistics.
  • Syntax
    jindo table -accessStat <-d [days]> <-n [topNums]>

    Set days and topNums to positive integers. If you set days to 1, all access records generated from 00:00 (local time) on the current day to the current time are queried (see the second example below).

  • Description

    This command queries the tables or partitions that are most frequently accessed within the specified time range. The topNums parameter limits the number of returned entries.

  • Example: Query the 20 tables or partitions that are most frequently accessed within the last seven days.
    jindo table -accessStat -d 7 -n 20
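
    To query only the records generated on the current day, set days to 1. For example, the following command returns the 10 most frequently accessed tables or partitions since 00:00 (local time) today. The value 10 is only an illustrative choice for topNums.
    jindo table -accessStat -d 1 -n 10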

For more information about how to use JindoTable, see Use JindoTable.

Disable the collection of access records

  1. Log on to the Alibaba Cloud EMR console.
  2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
  3. Click the Cluster Management tab.
  4. On the Cluster Management page, find your cluster and click Details in the Actions column.
  5. Modify parameters.
    Notice In the following operations, you delete only the JindoTable-related class from the parameter value and keep the rest of the value unchanged.
    • Hive:
      1. In the left-side navigation pane, choose Cluster Service > Hive.
      2. Click the Configure tab.
      3. Click the hive-site tab in the Service Configuration section.
      4. Search for the hive.exec.post.hooks parameter in the Configuration Filter section and delete com.aliyun.emr.table.hive.HivePostHook from the parameter value. For example parameter values, see the sketch after this procedure.
    • Spark:
      1. In the left-side navigation pane, choose Cluster Service > Spark.
      2. Click the Configure tab.
      3. Click the spark-defaults tab in the Service Configuration section.
      4. Search for the spark.sql.queryExecutionListeners parameter in the Configuration Filter section and delete com.aliyun.emr.table.spark.SparkSQLQueryListener from the parameter value. For example parameter values, see the sketch after this procedure.
  6. Save the configurations.
    1. In the upper-right corner of the Service Configuration section, click Save.
    2. In the Confirm Changes dialog box, specify Description and turn on Auto-update Configuration.
    3. Click OK.
  7. Restart the related service.
    • Hive:
      1. In the upper-right corner of the page, choose Actions > Restart HiveServer2.
      2. In the Cluster Activities dialog box, specify the related parameters.
      3. Click OK.
      4. In the Confirm message, click OK.
    • Spark:
      1. In the upper-right corner of the page, choose Actions > Restart ThriftServer.
      2. In the Cluster Activities dialog box, specify the related parameters.
      3. Click OK.
      4. In the Confirm message, click OK.
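
The following sketch shows hypothetical before-and-after values for the two parameters that you modify in this procedure. The classes org.example.OtherHook and org.example.OtherListener are placeholders for any other hooks or listeners that your cluster may already configure; keep such classes and remove only the JindoTable classes.

    # hive-site: hive.exec.post.hooks (hypothetical values)
    Before: com.aliyun.emr.table.hive.HivePostHook,org.example.OtherHook
    After:  org.example.OtherHook

    # spark-defaults: spark.sql.queryExecutionListeners (hypothetical values)
    Before: com.aliyun.emr.table.spark.SparkSQLQueryListener,org.example.OtherListener
    After:  org.example.OtherListener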