This topic describes how to create an E-MapReduce (EMR) Hive node. EMR Hive nodes allow you to use SQL-like statements to read data from, write data to, and manage data warehouses with large volumes of data stored in a distributed storage system. You can use EMR Hive nodes to efficiently analyze large amounts of log data.
Prerequisites
- An EMR cluster is created. The inbound rules of the security group to which the cluster belongs include a rule with the following settings:
  - Action: Allow
  - Protocol type: Custom TCP
  - Port range: 8898/8898
  - Authorization object: 100.104.0.0/16
- An EMR compute engine instance is bound to the required workspace. The EMR option is displayed only after you bind an EMR compute engine instance to the workspace on the Workspace Management page. For more information, see Configure a workspace.
- If you integrate Hive with Ranger in EMR, you must modify the whitelist configurations and restart Hive before you develop EMR nodes in DataWorks. Otherwise, the error message "Cannot modify spark.yarn.queue at runtime" or "Cannot modify SKYNET_BIZDATE at runtime" is returned when you run EMR nodes.
  - You can modify the whitelist configurations by using custom parameters in EMR. Append key-value pairs to the value of a custom parameter. In this example, the custom parameter for Hive components is used. The following sample code provides an example:
    hive.security.authorization.sqlstd.confwhitelist.append=tez.*|spark.*|mapred.*|mapreduce.*|ALISA.*|SKYNET.*
    Note: In the code, ALISA.* and SKYNET.* are special configurations for DataWorks.
  - After the whitelist configurations are modified, restart the Hive service to make the configurations take effect. For more information about how to restart a service, see Restart a service.
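To see why the appended whitelist value resolves the two error messages, the following minimal Python sketch checks parameter names against that value. It assumes that Hive treats the value as a single regular expression and requires the whole parameter name to match. Python's `re` module stands in for Hive's Java regular expressions here, which behave the same way for these simple patterns. The helper name `is_whitelisted` and the sample parameter `hive.exec.parallel` are hypothetical and used only for illustration.

```python
import re

# Whitelist value taken from the sample configuration in this topic.
WHITELIST_APPEND = "tez.*|spark.*|mapred.*|mapreduce.*|ALISA.*|SKYNET.*"


def is_whitelisted(param_name: str, pattern: str = WHITELIST_APPEND) -> bool:
    """Return True if the whole parameter name matches one of the alternatives.

    Assumption: Hive requires a full match of the name against the pattern.
    """
    return re.fullmatch(pattern, param_name) is not None


if __name__ == "__main__":
    # The first two names appear in the error messages described above.
    # The third is a hypothetical name that the appended patterns do not cover.
    for name in ("spark.yarn.queue", "SKYNET_BIZDATE", "hive.exec.parallel"):
        status = "allowed" if is_whitelisted(name) else "not covered"
        print(f"{name}: {status}")
```

In this sketch, `spark.yarn.queue` is covered by `spark.*` and `SKYNET_BIZDATE` is covered by `SKYNET.*`, which is consistent with the two error messages no longer being returned after the whitelist is modified and Hive is restarted.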