E-MapReduce: Integrate Impala with Kudu

Last Updated: Sep 05, 2023

After you integrate Impala with Kudu, you can use Impala to access data tables in Kudu. This topic describes how to integrate Impala with Kudu.

Prerequisites

An E-MapReduce (EMR) cluster is created with Impala and Kudu selected from the optional services. For more information, see Create a cluster.

Procedure

Use the EMR console

  1. On the Configure tab of the Impala service page, add configuration items. For more information, see Manage configuration items.

    1. On the Configure tab of the Impala service page, click impalad.flgs.

    2. On the impalad.flgs tab, click Add Configuration Item to add a configuration item whose name is kudu_master_hosts and value is master-1-1:7051.

      Note

      kudu_master_hosts specifies the hostname and port number of the master node of the Kudu cluster that Impala connects to. If the Kudu cluster contains multiple master nodes, separate the hostname:port entries with commas (,). Example: master-1-1:7051,master-1-2:7051,master-1-3:7051.

    3. Click the catalogd.flgs tab. On the catalogd.flgs tab, click Add Configuration Item to add a configuration item whose name is kudu_master_hosts and value is master-1-1:7051.

  2. Optional. Log on to the cluster to check whether Impala is integrated with Kudu.

    1. Connect to Impala. For more information, see Use the Impala shell tool.

    2. Run the following command to create a table:

      create table my_first_table
      (
        id bigint,
        name string,
        primary key(id)
      )
      partition by hash partitions 16
      stored as kudu
      tblproperties(
        'kudu.num_tablet_replicas' = '1');

      If the output contains "Table has been created.", the table is created, which indicates that Impala is integrated with Kudu.
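
As an additional check, the statements below are a minimal sketch of how the new table could be inspected from the Impala shell. It assumes that my_first_table was created in the current database by the preceding step. The table definition that Impala returns is expected to show that the table is stored in Kudu.

```sql
-- Sketch only: assumes my_first_table exists in the current database.
-- Print the full table definition, including the storage clause and table properties.
show create table my_first_table;

-- List the columns of the table.
describe my_first_table;
```

No kudu.master_addresses property is needed in the create statement here because the Kudu master is already set through kudu_master_hosts in the Impala configuration.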

Use a CLI

  1. Connect to Impala. For more information, see Use the Impala shell tool.

  2. Run the following command to create a table.

    The kudu.master_addresses property in the statement specifies the Kudu cluster to connect to. Example:

    create table my_first_table
    (
      id bigint,
      name string,
      primary key(id)
    )
    partition by hash partitions 16
    stored as kudu
    tblproperties(
      'kudu.master_addresses' = 'master-1-1:7051',
      'kudu.num_tablet_replicas' = '1');

    Note

    Parameters in the sample code:

    • my_first_table: The name of the table. You can specify a custom name.

    • kudu.master_addresses: The hostname and port number of the master node in the Kudu cluster. If your cluster contains multiple master nodes, separate the hostname:port entries with commas (,). Example: master-1-1:7051,master-1-2:7051,master-1-3:7051. If your cluster is a Hadoop cluster, change master-1-1 to emr-header-1.

    If the output contains "Table has been created.", the table is created, which indicates that Impala is integrated with Kudu.

  3. Optional. Run the following command to insert data into the table:

    insert into my_first_table values(1,"ss");

  4. Optional. Run the following command to query data in the table:

    select * from my_first_table;

    The following output is returned:

    +----+------+
    | id | name |
    +----+------+
    | 1  | ss   |
    +----+------+

    Note

    To drop the table, run the drop table my_first_table; statement.
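
For a Kudu cluster with multiple master nodes, the create statement could look like the following sketch. The table name my_second_table is hypothetical, and the three master addresses are the example values given for kudu.master_addresses in this topic. Replace them with the master nodes of your own cluster.

```sql
-- Sketch only: my_second_table is a hypothetical table name.
-- The master addresses are example values; use the masters of your own Kudu cluster.
create table my_second_table
(
  id bigint,
  name string,
  primary key(id)
)
partition by hash partitions 16
stored as kudu
tblproperties(
  'kudu.master_addresses' = 'master-1-1:7051,master-1-2:7051,master-1-3:7051',
  'kudu.num_tablet_replicas' = '1');

-- Remove the sketch table when it is no longer needed.
drop table if exists my_second_table;
```

The only difference from the single-master statement is the comma-separated list of hostname:port entries in kudu.master_addresses.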