Collecting access frequency for JindoTable tables or partitions helps you distinguish between cold and hot data, reduce storage costs, and improve cache efficiency.
Prerequisites
You need an Alibaba Cloud E-MapReduce cluster. For more information, see Create a cluster.
Background
JindoTable supports collecting access records for Hive tables. The collected data is stored in the namespace of the SmartData service.
SmartData 3.2.x and later support access frequency collection for the Spark, Hive, and Presto engines. This feature is enabled by default for Spark and Presto. To disable it, see Disable access frequency collection. This feature is disabled by default for Hive. To enable it, see Enable access frequency collection for Hive.
Query data
You can use a JindoTable command to query access frequency.
-
Syntax
jindo table -accessStat <-d [days]> <-n [topNums]>
The values for days and topNums must be positive integers. If you set days to 1, the command queries all access records from 00:00 (local time) on the current day to the current time.
-
Function
Queries the top N most frequently accessed tables or partitions within a specified time range.
-
Example: Query the 20 most frequently accessed tables or partitions in the last seven days.
jindo table -accessStat -d 7 -n 20
For more information about JindoTable, see the JindoTable User Guide.
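Because days and topNums must be positive integers, a script that calls the command may want to check its inputs first. The following is a minimal sketch, not part of JindoTable: the helper names is_positive_int and build_access_stat_cmd are hypothetical, and the script only prints the command instead of running it.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not provided by JindoTable): succeeds only if the
# argument is a positive integer, as required for days and topNums.
is_positive_int() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
  esac
  [ "$1" -gt 0 ]               # digits only, but still reject zero
}

# Hypothetical helper: prints the jindo command for the given arguments,
# or returns 1 with a message on stderr if either argument is invalid.
build_access_stat_cmd() {
  local days="$1" top_n="$2"
  if ! is_positive_int "$days" || ! is_positive_int "$top_n"; then
    echo "days and topNums must be positive integers" >&2
    return 1
  fi
  echo "jindo table -accessStat -d ${days} -n ${top_n}"
}

# Prints: jindo table -accessStat -d 7 -n 20
build_access_stat_cmd 7 20
```

To run the query instead of printing it, the output of build_access_stat_cmd can be passed to the shell on a node where the jindo command is available.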
Enable access frequency collection for Hive
-
Log on to the Alibaba Cloud E-MapReduce console.
-
In the top navigation bar, select your region and resource group.
-
Click the Clusters tab.
-
On the Clusters page, find your cluster and click Details in the Actions column.
-
Modify the Hive service configuration.
-
In the navigation pane on the left, choose .
-
On the Hive service page, click the Configure tab.
-
Search for the hive.exec.post.hooks parameter and append com.aliyun.emr.table.hive.HivePostHook to its value. If the parameter already contains other hooks, separate the entries with commas.
-
-
Save the configuration.
-
In the upper-right corner, click Save.
-
In the Confirm dialog box, enter an Execution Reason and enable auto-update configuration.
-
Click OK.
-
-
Restart the service.
-
On the Hive service page, choose in the upper-right corner.
-
In the Execute Cluster Operation dialog box, enter an Execution Reason.
-
Click OK.
-
In the Confirm dialog box, click OK.
-
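The edit to hive.exec.post.hooks in the procedure above amounts to appending one class name to a comma-separated list. The following sketch shows that string operation only; it does not modify any cluster configuration. The helper name append_csv_entry and the class org.example.ExistingHook are hypothetical, and the sketch assumes the value contains no spaces around the commas.

```shell
#!/usr/bin/env bash

# Hypothetical helper: appends an entry to a comma-separated value
# unless the entry is already present, so repeated runs do not add duplicates.
append_csv_entry() {
  local value="$1" entry="$2"
  case ",${value}," in
    *",${entry},"*) echo "$value" ;;            # already present: unchanged
    ",,")           echo "$entry" ;;            # empty value: entry only
    *)              echo "${value},${entry}" ;; # otherwise: append with a comma
  esac
}

HOOK="com.aliyun.emr.table.hive.HivePostHook"

# org.example.ExistingHook is a placeholder for a hook that is already configured.
append_csv_entry "org.example.ExistingHook" "$HOOK"
# Prints: org.example.ExistingHook,com.aliyun.emr.table.hive.HivePostHook
```

The printed string is what you would enter as the new parameter value in the console.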
Disable access frequency collection
-
Log on to the Alibaba Cloud E-MapReduce console.
-
In the top navigation bar, select your region and resource group.
-
Click the Clusters tab.
-
On the Clusters page, find your cluster and click Details in the Actions column.
-
Modify parameter values.
-
Hive service:
-
In the navigation pane on the left, choose .
-
On the Hive service page, click the Configure tab.
-
Search for the hive.exec.post.hooks parameter and remove com.aliyun.emr.table.hive.HivePostHook from its value. To find this parameter, enter hive.exec.post.hooks in the Configuration Search box. The parameter is in the hive-site section.
-
-
Spark service:
-
In the navigation pane on the left, choose .
-
On the Spark service page, click the Configure tab.
-
Search for the spark.sql.queryExecutionListeners parameter and remove com.aliyun.emr.table.spark.SparkSQLQueryListener from its value. To find this parameter, enter spark.sql.queryExecutionListeners in the Configuration Search box. The parameter is in the spark-defaults section, and its current value is com.aliyun.emr.table.spark.SparkSQLQueryListener.
-
-
Presto service:
-
In the navigation pane on the left, choose .
-
On the Presto service page, click the Configure tab.
-
Search for the event-listener.name parameter and clear its value.
-
-
-
Save the configuration.
-
In the upper-right corner, click Save.
-
In the Confirm dialog box, enter an Execution Reason and enable auto-update configuration.
-
Click OK.
-
-
Restart the services.
-
Hive service:
-
On the Hive service page, choose in the upper-right corner.
-
In the Execute Cluster Operation dialog box, enter an Execution Reason.
-
Click OK.
-
In the Confirm dialog box, click OK.
-
-
Spark service:
-
On the Spark service page, choose in the upper-right corner.
-
In the Execute Cluster Operation dialog box, enter an Execution Reason.
-
Click OK.
-
In the Confirm dialog box, click OK.
-
-
Presto service:
-
On the Presto service page, choose in the upper-right corner.
-
In the Execute Cluster Operation dialog box, enter an Execution Reason.
-
Click OK.
-
In the Confirm dialog box, click OK.
-
-
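Disabling collection for Hive and Spark comes down to removing one class name from a comma-separated parameter value while leaving any other entries in place. The following sketch shows that string operation only; it does not modify any cluster configuration. The helper name remove_csv_entry and the class org.example.OtherHook are hypothetical, and the sketch assumes the value contains no spaces around the commas.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the comma-separated value with every
# exact occurrence of the given entry removed.
remove_csv_entry() {
  local value="$1" entry="$2" item result=""
  local -a items
  IFS=',' read -r -a items <<< "$value"   # split the value on commas
  for item in "${items[@]}"; do
    if [ "$item" != "$entry" ]; then
      # Rejoin the kept entries, adding a comma only between entries.
      result="${result:+${result},}${item}"
    fi
  done
  echo "$result"
}

# Hive: other hooks are kept. org.example.OtherHook is a placeholder.
remove_csv_entry \
  "org.example.OtherHook,com.aliyun.emr.table.hive.HivePostHook" \
  "com.aliyun.emr.table.hive.HivePostHook"
# Prints: org.example.OtherHook

# Spark: the listener is the only entry, so the result is an empty value.
remove_csv_entry \
  "com.aliyun.emr.table.spark.SparkSQLQueryListener" \
  "com.aliyun.emr.table.spark.SparkSQLQueryListener"
```

For Presto, the procedure clears the event-listener.name value entirely, so no list handling is needed.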