This topic describes how to use Hive on an E-MapReduce (EMR) cluster to process data stored in OSS-HDFS.
Prerequisites
Before you begin, make sure that you have:
An EMR cluster of version V3.42.0 or later, or V5.8.0 or later, created with the Hive service selected. For more information, see Create a cluster.
OSS-HDFS enabled for a bucket with access permissions granted. For more information, see Enable OSS-HDFS and grant access permissions.
Add partitions to a Hive table backed by OSS-HDFS
This procedure walks you through logging on to an EMR cluster, connecting to the Hive CLI, and adding partitions that map to OSS-HDFS paths:
Log on to the EMR cluster: Access the cluster node through the EMR console and open a terminal session.
Connect to Hive: Launch the Hive CLI on the cluster terminal.
Add partitions to an existing table: Use the
ALTER TABLE ... ADD PARTITIONcommand to map Hive table partitions to OSS-HDFS paths.
Step 1: Log on to the EMR cluster
Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
Click the EMR cluster that you created.
Click the Nodes tab, and then click
on the left side of the node group.Click the ID of the ECS instance. On the Instances page, click Connect next to the instance ID.
For more information about logging on from Windows or Linux using an SSH key pair or SSH password, see Log on to a cluster.
Step 2: Connect to Hive
On the terminal, run the following command to start the Hive CLI:
hiveFor other connection methods, see Connect to Hive.
Step 3: Add partitions to an existing table
Adding partitions stores table data in smaller units. Queries that specify partition conditions scan only matching partitions, which improves query performance.
Syntax
ALTER TABLE <table_name> ADD [IF NOT EXISTS] PARTITION <pt_spec> [PARTITION <pt_spec> ...] LOCATION 'location';Parameters
| Parameter | Required | Description |
|---|---|---|
table_name | Yes | The name of the table to which you want to add partitions. |
IF NOT EXISTS | No | Skips the operation without returning an error if a partition with the same name already exists. If omitted, adding a duplicate partition name causes the operation to fail. |
pt_spec | Yes | The partition specification in the format (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...). Partition key column names are case-insensitive; values are case-sensitive. |
location | Yes | The OSS path used to store data in the partition. |
Example
The following example adds a partition to the sale_detail table for the China (Hangzhou) region in December 2021:
ALTER TABLE sale_detail ADD IF NOT EXISTS PARTITION (sale_date='202112', region='hangzhou') LOCATION 'oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/path/2021/';