
E-MapReduce: Integrate Impala with Kudu

Last Updated: Mar 26, 2026

After integrating Impala with Kudu, you can use Impala SQL to query and manage data in Kudu tables. This topic describes how to connect Impala to a Kudu cluster using the E-MapReduce (EMR) console or the CLI.

Prerequisites

Before you begin, ensure that you have:

  • An EMR cluster with Impala and Kudu selected as optional services. For more information, see Create a cluster.

How it works

There are two ways to tell Impala where the Kudu master nodes are:

  • Global flag (kudu_master_hosts): Set once in the Impala service configuration. Kudu tables created through Impala use this value when their CREATE TABLE statement does not specify master addresses.

  • Per-table property (kudu.master_addresses): Specified in the TBLPROPERTIES clause of each CREATE TABLE statement. Use this approach when the global flag is not set, for example when you work only through the Impala shell (CLI).
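The precedence between the two settings can be expressed in a few lines. The following Python sketch is a simplified model, not Impala's implementation. It assumes, as in upstream Impala, two things:

  • A kudu.master_addresses value in TBLPROPERTIES is used as-is.
  • The kudu_master_hosts flag supplies the default only when that property is absent.

The function name effective_master_addresses is hypothetical.

```python
from typing import Mapping, Optional


def effective_master_addresses(
    tblproperties: Mapping[str, str],
    kudu_master_hosts_flag: Optional[str],
) -> str:
    """Return the Kudu master addresses a new Impala-created Kudu table would use.

    Hypothetical, simplified model: the per-table property wins; otherwise the
    global kudu_master_hosts flag (impalad/catalogd) supplies the default.
    """
    per_table = tblproperties.get("kudu.master_addresses", "").strip()
    if per_table:
        return per_table
    if kudu_master_hosts_flag and kudu_master_hosts_flag.strip():
        return kudu_master_hosts_flag.strip()
    # Neither source is set: Impala rejects the CREATE TABLE statement in this case.
    raise ValueError(
        "set the kudu_master_hosts flag or the kudu.master_addresses table property"
    )
```

For example, effective_master_addresses({}, "master-1-1:7051") returns "master-1-1:7051". A table that sets 'kudu.master_addresses' gets its own value regardless of the flag.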

Integrate Impala with Kudu using the EMR console

Step 1: Configure the Impala service

  1. Go to the Configure tab of the Impala service page. For more information, see Manage configuration items.

  2. Click the impalad.flgs tab, then click Add Configuration Item. Add the following configuration item:

    Parameter            Value
    kudu_master_hosts    master-1-1:7051

    kudu_master_hosts specifies the hostname and port of the Kudu master node. If the Kudu service has multiple master nodes, separate the hostname:port pairs with commas, for example: master-1-1:7051,master-1-2:7051,master-1-3:7051. For a Hadoop cluster, replace master-1-1 with emr-header-1.

  3. Click the catalogd.flgs tab, then click Add Configuration Item. Add the same configuration item:

    Parameter            Value
    kudu_master_hosts    master-1-1:7051

  4. Save the configuration and restart the Impala service. impalad and catalogd read these flags only at startup.
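When the cluster has several Kudu master nodes, the kudu_master_hosts value is a comma-separated list of hostname:port pairs. The following Python sketch builds that string from a list of hostnames. The helper name is hypothetical. It assumes every master listens on port 7051, the default Kudu master RPC port used in the examples on this page.

```python
from typing import Iterable


def kudu_master_hosts_value(hosts: Iterable[str], port: int = 7051) -> str:
    """Join Kudu master hostnames into the hostname:port,hostname:port format."""
    # Strip surrounding whitespace and skip empty entries.
    cleaned = [host.strip() for host in hosts if host.strip()]
    if not cleaned:
        raise ValueError("at least one Kudu master hostname is required")
    return ",".join(f"{host}:{port}" for host in cleaned)
```

For example, kudu_master_hosts_value(["master-1-1", "master-1-2", "master-1-3"]) returns "master-1-1:7051,master-1-2:7051,master-1-3:7051". Use the same value in both impalad.flgs and catalogd.flgs.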

Step 2 (Optional): Verify the integration

  1. Connect to Impala. For more information, see Use the Impala shell tool.

  2. Create a test table:

    CREATE TABLE my_first_table
    (
      id BIGINT,
      name STRING,
      PRIMARY KEY(id)
    )
    PARTITION BY HASH PARTITIONS 16
    STORED AS KUDU
    TBLPROPERTIES(
      'kudu.num_tablet_replicas' = '1');

    If the output contains "Table has been created.", Impala is successfully integrated with Kudu.
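To script this check, the following Python sketch runs a statement through impala-shell and looks for the success message. It assumes impala-shell is on the PATH of the node where you run it. -i (the impalad to connect to) and -q (the statement to run) are standard impala-shell options. The function names and any impalad hostname you pass in are hypothetical.

```python
import subprocess
from typing import List


def impala_shell_command(impalad: str, query: str) -> List[str]:
    """Build an impala-shell invocation that runs a single statement."""
    # -i: impalad host (optionally host:port) to connect to; -q: statement to run.
    return ["impala-shell", "-i", impalad, "-q", query]


def table_created(output: str) -> bool:
    """Return True if impala-shell output contains the CREATE TABLE success message."""
    return "Table has been created." in output


def verify_integration(impalad: str, ddl: str) -> bool:
    """Run the test DDL through impala-shell and report whether the table was created."""
    result = subprocess.run(
        impala_shell_command(impalad, ddl),
        capture_output=True,
        text=True,
        check=False,
    )
    # Search both streams so the check does not depend on where the message is printed.
    return table_created(result.stdout + result.stderr)
```

verify_integration is the only function that calls impala-shell. The two helpers are plain string functions and can be checked without a cluster.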

Integrate Impala with Kudu using the CLI

Step 1: Connect to Impala

Connect to Impala using the Impala shell tool. For more information, see Use the Impala shell tool.

Step 2: Create a Kudu table

Run the following statement to create a Kudu table. This approach does not rely on the global kudu_master_hosts flag, so the kudu.master_addresses table property specifies the Kudu master node.

CREATE TABLE my_first_table
(
   id BIGINT,
   name STRING,
   PRIMARY KEY(id)
)
PARTITION BY HASH PARTITIONS 16
STORED AS KUDU
TBLPROPERTIES(
  'kudu.master_addresses' = 'master-1-1:7051',
  'kudu.num_tablet_replicas' = '1');

Key parameters:

  • my_first_table: The table name. Replace it with a name of your choice.

  • kudu.master_addresses: The hostname and port of the Kudu master node. If the Kudu service has multiple master nodes, separate the hostname:port pairs with commas, for example: master-1-1:7051,master-1-2:7051,master-1-3:7051. For a Hadoop cluster, replace master-1-1 with emr-header-1.

  • kudu.num_tablet_replicas: The number of replicas for each tablet. The example uses 1, which suits only a test environment. Kudu requires an odd number of replicas, and the value cannot exceed the number of Kudu tablet servers.

If the output contains "Table has been created.", the table is created successfully.
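If you create test tables from scripts, the statement above can be generated with the parameters filled in. The following Python sketch is hypothetical: the function name and its arguments are not part of EMR or Impala. It reproduces the two-column schema of my_first_table and rejects even replica counts, because Kudu requires an odd number of tablet replicas. It does not validate the table name or the address strings, so pass only trusted values.

```python
from typing import Sequence


def create_kudu_table_ddl(
    table: str,
    master_addresses: Sequence[str],
    num_replicas: int = 1,
    hash_partitions: int = 16,
) -> str:
    """Build an Impala CREATE TABLE ... STORED AS KUDU statement like the example above."""
    if not master_addresses:
        raise ValueError("at least one hostname:port Kudu master address is required")
    # Kudu rejects even replication factors by default, so fail early.
    if num_replicas < 1 or num_replicas % 2 == 0:
        raise ValueError("kudu.num_tablet_replicas must be a positive odd number")
    masters = ",".join(master_addresses)
    return (
        f"CREATE TABLE {table}\n"
        "(\n"
        "   id BIGINT,\n"
        "   name STRING,\n"
        "   PRIMARY KEY(id)\n"
        ")\n"
        f"PARTITION BY HASH PARTITIONS {hash_partitions}\n"
        "STORED AS KUDU\n"
        "TBLPROPERTIES(\n"
        f"  'kudu.master_addresses' = '{masters}',\n"
        f"  'kudu.num_tablet_replicas' = '{num_replicas}');"
    )
```

For example, create_kudu_table_ddl("my_first_table", ["master-1-1:7051"]) reproduces the statement in Step 2. Passing three master addresses with num_replicas=3 requires at least three Kudu tablet servers.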

Step 3 (Optional): Insert data

INSERT INTO my_first_table VALUES (1, 'ss');

Step 4 (Optional): Query data

SELECT * FROM my_first_table;

Expected output:

+----+------+
| id | name |
+----+------+
| 1  | ss   |
+----+------+
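If a script needs to check this result, the boxed table that impala-shell prints can be turned into rows. The following Python sketch is a hypothetical helper, not part of impala-shell. It assumes that no cell value contains a "|" character and returns every value as a string.

```python
from typing import Dict, List


def parse_impala_shell_table(text: str) -> List[Dict[str, str]]:
    """Parse impala-shell's boxed result table into a list of row dictionaries.

    Assumes cell values contain no '|' characters; all values are returned as strings.
    """
    rows: List[List[str]] = []
    for line in text.splitlines():
        line = line.strip()
        # Keep only "| ... |" lines; "+----+" borders and blank lines are skipped.
        if not line.startswith("|"):
            continue
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]
```

Applied to the expected output above, the helper returns [{"id": "1", "name": "ss"}]. For heavier scripting, impala-shell's -B (delimited output) option avoids parsing the box layout altogether.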

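To drop the test table, run the following statement. my_first_table is an internal (non-external) table, so dropping it in Impala also deletes the underlying Kudu table and its data.

DROP TABLE my_first_table;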