After integrating Impala with Kudu, you can use Impala SQL to query and manage data in Kudu tables. This topic describes how to connect Impala to a Kudu cluster using the E-MapReduce (EMR) console or the CLI.
Prerequisites
Before you begin, ensure that you have:
An EMR cluster with Impala and Kudu selected as optional services. For more information, see Create a cluster.
How it works
There are two ways to tell Impala where the Kudu master nodes are:
Global flag (kudu_master_hosts): Set once in the Impala service configuration. All Kudu tables created through Impala automatically use this setting.
Per-table property (kudu.master_addresses): Specified in the TBLPROPERTIES clause of each CREATE TABLE statement. Use this approach when you configure Impala through the CLI without setting the global flag.
Integrate Impala with Kudu using the EMR console
Step 1: Configure the Impala service
Go to the Configure tab of the Impala service page. For more information, see Manage configuration items.
Click impalad.flgs, then click Add Configuration Item. Add the following configuration item:
| Parameter | Value |
|---|---|
| kudu_master_hosts | master-1-1:7051 |
kudu_master_hosts specifies the hostname and port of the Kudu master node. For multiple master nodes, separate each hostname:port pair with a comma, for example: master-1-1:7051,master-1-2:7051,master-1-3:7051.
Click the catalogd.flgs tab, then click Add Configuration Item. Add the same configuration item:
| Parameter | Value |
|---|---|
| kudu_master_hosts | master-1-1:7051 |
Step 2 (Optional): Verify the integration
Connect to Impala. For more information, see Use the Impala shell tool.
Create a test table:
CREATE TABLE my_first_table
(
id BIGINT,
name STRING,
PRIMARY KEY(id)
)
PARTITION BY HASH PARTITIONS 16
STORED AS KUDU
TBLPROPERTIES(
'kudu.num_tablet_replicas' = '1');
If the output contains Table has been created., Impala is successfully integrated with Kudu.
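As an additional check, you can inspect the table metadata that Impala recorded. The following is a minimal sketch, assuming the my_first_table test table from this step exists. The table properties in the output typically include the Kudu master address that Impala resolved from the global flag, although the exact layout depends on the Impala version.

```sql
-- Show the full DDL that Impala stores for the table,
-- including its TBLPROPERTIES.
SHOW CREATE TABLE my_first_table;

-- Alternatively, list the table parameters in tabular form.
DESCRIBE FORMATTED my_first_table;
```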
Integrate Impala with Kudu using the CLI
Step 1: Connect to Impala
Connect to Impala using the Impala shell tool. For more information, see Use the Impala shell tool.
Step 2: Create a Kudu table
Run the following command to create a table. The kudu.master_addresses property specifies the Kudu master node.
CREATE TABLE my_first_table
(
id BIGINT,
name STRING,
PRIMARY KEY(id)
)
PARTITION BY HASH PARTITIONS 16
STORED AS KUDU
TBLPROPERTIES(
'kudu.master_addresses' = 'master-1-1:7051',
'kudu.num_tablet_replicas' = '1');
Key parameters:
| Parameter | Description |
|---|---|
| my_first_table | Table name. Replace with a name of your choice. |
| kudu.master_addresses | Hostname and port of the Kudu master node. For multiple master nodes, separate each hostname:port pair with a comma, for example: master-1-1:7051,master-1-2:7051,master-1-3:7051. For a Hadoop cluster, replace master-1-1 with emr-header-1. |
| kudu.num_tablet_replicas | Number of tablet replicas. The example uses '1'. |
If the output contains Table has been created., the table is created successfully.
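If the Kudu master addresses change after the table is created, the per-table property can be updated in place. The following is a sketch only, assuming a cluster with the three master nodes used in the multi-master example above; substitute the hostnames and ports of your own cluster.

```sql
-- Point the existing table at a new set of Kudu master nodes.
-- The three addresses are illustrative and must match your cluster.
ALTER TABLE my_first_table
SET TBLPROPERTIES(
  'kudu.master_addresses' = 'master-1-1:7051,master-1-2:7051,master-1-3:7051');
```

Because the property is stored per table, this statement has to be repeated for every table that was created with an explicit kudu.master_addresses value.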
Step 3 (Optional): Insert data
INSERT INTO my_first_table VALUES(1, "ss");
Step 4 (Optional): Query data
SELECT * FROM my_first_table;
Expected output:
+----+------+
| id | name |
+----+------+
| 1  | ss   |
+----+------+
To drop the table, run DROP TABLE my_first_table;
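Before dropping the table, you can also try the row-level statements that Impala supports for Kudu tables. The following is a brief sketch, assuming my_first_table still contains the row inserted earlier; the values are made up for illustration.

```sql
-- Insert a new row, or overwrite the row with the same primary key.
UPSERT INTO my_first_table VALUES (1, 'ss2'), (2, 'tt');

-- Change a non-key column of an existing row.
UPDATE my_first_table SET name = 'uu' WHERE id = 2;

-- Remove a row by primary key.
DELETE FROM my_first_table WHERE id = 1;

-- Check the result.
SELECT * FROM my_first_table;
```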