Upgrade the EMR-HOOK component in an E-MapReduce (EMR) gateway to enable access frequency metrics on the Data Overview tab and data access frequency rules on the Lifecycle Management page in the Data Lake Formation (DLF) console.
The upgrade does not affect running computing tasks. After the upgrade, newly started tasks pick up the changes automatically.
Prerequisites
Before you begin, make sure that:
- DLF manages metadata in your EMR cluster
- Your gateway was deployed using EMR-CLI
Before you begin
Review the following before running any commands.
JAR file names — Hive
| Hive version | EMR-5.10.1, EMR-3.44.1, and later | Earlier than EMR-3.44.1 |
|---|---|---|
| Hive 2 | hive-hook-hive23.jar | hive-hook-<emrhook-version>-hive23.jar |
| Hive 3 | hive-hook-hive31.jar | hive-hook-<emrhook-version>-hive31.jar |
JAR file names — Spark
| Spark version | EMR-5.10.1, EMR-3.44.1, and later | Earlier than EMR-3.44.1 |
|---|---|---|
| Spark 2 | spark-hook-spark24.jar | spark-hook-<emrhook-version>-spark24.jar |
| Spark 3 | spark-hook-spark30.jar | spark-hook-<emrhook-version>-spark30.jar |
For versions earlier than EMR-3.44.1, replace <emrhook-version> with the EMR-HOOK minor version for your EMR release. For example, EMR-3.43.1 ships EMR-HOOK version 1.1.4, so the Hive 2 JAR file is hive-hook-1.1.4-hive23.jar.
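The naming rule in the tables above can be sketched as a small shell function. This is only an illustration: `hook_jar_name` is a hypothetical helper, not part of EMR-HOOK or EMR-CLI. It assumes you pass the component (`hive` or `spark`), the suffix from the tables (for example `hive23`), and, for releases earlier than EMR-3.44.1, the EMR-HOOK version.

```shell
# Hypothetical helper: build the EMR-HOOK JAR file name.
# $1 = component (hive or spark)
# $2 = suffix from the tables (hive23, hive31, spark24, spark30)
# $3 = EMR-HOOK version; leave empty for EMR-5.10.1, EMR-3.44.1, and later
hook_jar_name() {
  component="$1"
  suffix="$2"
  version="${3:-}"
  if [ -n "$version" ]; then
    # Earlier than EMR-3.44.1: the version is part of the file name.
    printf '%s-hook-%s-%s.jar\n' "$component" "$version" "$suffix"
  else
    # EMR-5.10.1, EMR-3.44.1, and later: no version in the file name.
    printf '%s-hook-%s.jar\n' "$component" "$suffix"
  fi
}

hook_jar_name hive hive23 1.1.4   # prints hive-hook-1.1.4-hive23.jar
hook_jar_name spark spark30       # prints spark-hook-spark30.jar
```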
Configuration value separators
| Component | Separator |
|---|---|
| Hive | Comma (,) |
| Spark | Colon (:) |
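As an illustration of how the separators apply, the following sketch appends an entry to an existing delimited value and skips the append when the entry is already present. `append_entry` is a hypothetical helper and the existing values shown are made-up placeholders; it only computes the new string and does not edit any configuration file.

```shell
# Hypothetical helper: append an entry to a delimited list unless it is already there.
# $1 = current value (may be empty), $2 = entry to append, $3 = separator ("," or ":")
append_entry() {
  current="$1"
  entry="$2"
  sep="$3"
  if [ -z "$current" ]; then
    # Nothing configured yet: the entry becomes the whole value, with no leading separator.
    printf '%s\n' "$entry"
    return
  fi
  case "${sep}${current}${sep}" in
    *"${sep}${entry}${sep}"*) printf '%s\n' "$current" ;;          # already present
    *) printf '%s%s%s\n' "$current" "$sep" "$entry" ;;             # append with separator
  esac
}

# Hive values use a comma; Spark values use a colon.
append_entry "/path/to/existing.jar" "/opt/apps/EMRHOOK/emrhook-current/hive-hook-hive23.jar" ","
append_entry "/path/to/a.jar:/path/to/b.jar" "/opt/apps/EMRHOOK/emrhook-current/spark-hook-spark30.jar" ":"
```

Checking for an existing entry first keeps the value from accumulating duplicates if the upgrade is repeated.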
Upgrade procedure
Select the procedure for your EMR version:
EMR-5.10.1, EMR-3.44.1, and later versions
Step 1: Upgrade the JAR files
Log in to the gateway via SSH using an account with root privileges. Replace <region> with your region ID (for example, cn-hangzhou), then run:
sudo mkdir -p /opt/apps/EMRHOOK/upgrade/
sudo wget https://dlf-repo-<region>.oss-<region>-internal.aliyuncs.com/emrhook/latest/emrhook.tar.gz -P /opt/apps/EMRHOOK/upgrade
sudo tar -p -zxf /opt/apps/EMRHOOK/upgrade/emrhook.tar.gz -C /opt/apps/EMRHOOK/upgrade/
sudo cp -p /opt/apps/EMRHOOK/upgrade/emrhook/* /opt/apps/EMRHOOK/emrhook-current/
Step 2: Update Hive configuration
Use the correct <hive-jar> for your Hive version. See Before you begin for the file name.
| Configuration file | Configuration item | Value to append/add |
|---|---|---|
| hive-site.xml (/etc/taihao-apps/hive-conf/hive-site.xml) | hive.aux.jars.path | Append ,/opt/apps/EMRHOOK/emrhook-current/<hive-jar> (separator: ,) |
| hive-site.xml | hive.exec.post.hooks | Add com.aliyun.emr.meta.hive.hook.LineageLoggerHook |
| hive-env.sh (/etc/taihao-apps/hive-conf/hive-env.sh) | hive_aux_jars_path | Append ,/opt/apps/EMRHOOK/emrhook-current/<hive-jar> (separator: ,) |
Step 3: Update Spark configuration
Use the correct <spark-jar> for your Spark version. See Before you begin for the file name.
| Configuration file | Configuration item | Value to append/add |
|---|---|---|
| spark-defaults.conf (/etc/taihao-apps/spark-conf/spark-defaults.conf) | spark.driver.extraClassPath | Append :/opt/apps/EMRHOOK/emrhook-current/<spark-jar> (separator: :) |
| spark-defaults.conf | spark.executor.extraClassPath | Append :/opt/apps/EMRHOOK/emrhook-current/<spark-jar> (separator: :) |
| spark-defaults.conf | spark.sql.queryExecutionListeners | Add com.aliyun.emr.meta.spark.listener.EMRQueryLogger |
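After editing, a quick read-only check can confirm that the Spark settings contain the expected values. The sketch below is an assumption-laden example: `conf_has` is a hypothetical helper, it assumes `spark-defaults.conf` uses whitespace-separated `key value` lines, and it uses the Spark 3 JAR name for illustration.

```shell
# Hypothetical check: succeed if a line whose first field is the given key
# also contains the expected text. The file is only read, never modified.
# $1 = config file, $2 = key, $3 = expected text
conf_has() {
  file="$1"
  key="$2"
  expected="$3"
  awk -v k="$key" -v e="$expected" '
    $1 == k && index($0, e) > 0 { found = 1 }
    END { exit found ? 0 : 1 }
  ' "$file"
}

conf=/etc/taihao-apps/spark-conf/spark-defaults.conf
jar=/opt/apps/EMRHOOK/emrhook-current/spark-hook-spark30.jar   # adjust to your Spark version
if [ -f "$conf" ]; then
  for key in spark.driver.extraClassPath spark.executor.extraClassPath; do
    if conf_has "$conf" "$key" "$jar"; then
      echo "$key: OK"
    else
      echo "$key: missing $jar"
    fi
  done
  conf_has "$conf" spark.sql.queryExecutionListeners com.aliyun.emr.meta.spark.listener.EMRQueryLogger \
    || echo "spark.sql.queryExecutionListeners: listener not set"
fi
```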
Versions earlier than EMR-3.44.1
Step 1: Upgrade the JAR files
Log in to the gateway via SSH using an account with root privileges. Replace <region> with your region ID (for example, cn-hangzhou), then run the following script to download and extract the latest EMR-HOOK JAR files:
sudo mkdir -p /opt/apps/EMRHOOK/upgrade/
sudo wget https://dlf-repo-<region>.oss-<region>-internal.aliyuncs.com/emrhook/latest/emrhook.tar.gz -P /opt/apps/EMRHOOK/upgrade
sudo tar -p -zxf /opt/apps/EMRHOOK/upgrade/emrhook.tar.gz -C /opt/apps/EMRHOOK/upgrade/
Rename the extracted JAR files to match the EMR-HOOK minor version for your EMR release. The following example uses EMR-3.43.1, which ships EMR-HOOK version 1.1.4:
cd /opt/apps/EMRHOOK/upgrade/emrhook
mv hive-hook-hive20.jar hive-hook-1.1.4-hive20.jar
mv hive-hook-hive23.jar hive-hook-1.1.4-hive23.jar
mv hive-hook-hive31.jar hive-hook-1.1.4-hive31.jar
mv spark-hook-spark24.jar spark-hook-1.1.4-spark24.jar
mv spark-hook-spark30.jar spark-hook-1.1.4-spark30.jar
Copy the renamed JAR files to the active directory:
sudo cp -p /opt/apps/EMRHOOK/upgrade/emrhook/* /opt/apps/EMRHOOK/emrhook-current/
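For other EMR-HOOK versions, the rename step in this procedure could be scripted rather than typed file by file. The following is a minimal sketch, not an official tool: `rename_hook_jars` is a hypothetical helper, and it assumes the directory holds only freshly extracted, unversioned JAR files named `hive-hook-*.jar` and `spark-hook-*.jar`.

```shell
# Hypothetical helper: insert the EMR-HOOK version into each extracted JAR name,
# for example hive-hook-hive23.jar -> hive-hook-1.1.4-hive23.jar.
# $1 = directory with the extracted JAR files, $2 = EMR-HOOK version
rename_hook_jars() {
  dir="$1"
  version="$2"
  for path in "$dir"/hive-hook-*.jar "$dir"/spark-hook-*.jar; do
    [ -e "$path" ] || continue             # glob matched nothing
    name=$(basename "$path")
    case "$name" in
      *-hook-"$version"-*) continue ;;     # already carries this version
    esac
    prefix=${name%%-hook-*}                # "hive" or "spark"
    suffix=${name#*-hook-}                 # for example "hive23.jar"
    mv "$path" "$dir/${prefix}-hook-${version}-${suffix}"
  done
}

# Example call (run as root, because the directory was created with sudo):
# rename_hook_jars /opt/apps/EMRHOOK/upgrade/emrhook 1.1.4
```

Files that already contain the given version are skipped, so running the function twice does not rename them again.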
Step 2: Update Hive configuration
Use the correct <hive-jar> for your Hive version. The file name includes the EMR-HOOK version. See Before you begin for the file name.
| Configuration file | Configuration item | Value to append/add |
|---|---|---|
| hive-site.xml (/etc/taihao-apps/hive-conf/hive-site.xml) | hive.aux.jars.path | Append ,/opt/apps/EMRHOOK/emrhook-current/<hive-jar> (separator: ,) |
| hive-site.xml | hive.exec.post.hooks | Add com.aliyun.emr.meta.hive.hook.LineageLoggerHook |
| hive-env.sh (/etc/taihao-apps/hive-conf/hive-env.sh) | hive_aux_jars_path | Append ,/opt/apps/EMRHOOK/emrhook-current/<hive-jar> (separator: ,) |
Step 3: Update Spark configuration
Use the correct <spark-jar> for your Spark version. The file name includes the EMR-HOOK version. See Before you begin for the file name.
| Configuration file | Configuration item | Value to append/add |
|---|---|---|
| spark-defaults.conf (/etc/taihao-apps/spark-conf/spark-defaults.conf) | spark.driver.extraClassPath | Append :/opt/apps/EMRHOOK/emrhook-current/<spark-jar> (separator: :) |
| spark-defaults.conf | spark.executor.extraClassPath | Append :/opt/apps/EMRHOOK/emrhook-current/<spark-jar> (separator: :) |
| spark-defaults.conf | spark.sql.queryExecutionListeners | Add com.aliyun.emr.meta.spark.listener.EMRQueryLogger |