Connect to Impala on an E-MapReduce (EMR) cluster using the command-line interface (CLI). Two tools are supported:
impala-shell — the native Impala CLI, recommended for interactive queries
Beeline — a JDBC-based client that connects to Impala over the HiveServer2 protocol
Prerequisites
Before you begin, ensure that you have:
An EMR cluster on which the Impala service was selected during cluster creation. For details, see Create a cluster.
Connect using impala-shell
You can connect to any Impalad node. The node you connect to coordinates the execution of the queries you send to it, so any node works. To see all available options, run impala-shell --help before connecting.
Common cluster
Log on to the master node over SSH. For details, see Log on to a cluster.
Connect to Impala:
impala-shell -i <impalad-node>
Replace <impalad-node> with the hostname of any Impalad node. To find available nodes, go to the Status tab of the Impala service page for your cluster in the EMR console. Node hostnames are listed in the Topology List section (for example, core-1-1 or core-1-2).
(Optional) Run quit; to exit impala-shell.
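For scripted use, impala-shell can also run a single statement and exit instead of opening an interactive session. The following is a minimal sketch, assuming impala-shell is on the PATH of the master node. It wraps the standard -i (Impalad host) and -q (query) options in a helper whose name, run_impala_query, is hypothetical.

```shell
# Hypothetical helper: run one SQL statement through impala-shell and exit.
# -i selects the Impalad node to connect to; -q runs the given query
# non-interactively instead of opening an interactive shell.
run_impala_query() {
  local node="$1" sql="$2"
  if [ -z "$node" ] || [ -z "$sql" ]; then
    echo "usage: run_impala_query <impalad-node> <sql>" >&2
    return 2
  fi
  impala-shell -i "$node" -q "$sql"
}

# Usage (core-1-1 is an example hostname from the Topology List):
#   run_impala_query core-1-1 'SHOW DATABASES;'
```

Because any Impalad node can coordinate a query, the node argument can be any hostname from the Topology List.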
High-security cluster
High-security clusters use Kerberos authentication. The -k flag enables Kerberos when connecting.
Log on to the master node over SSH. For details, see Log on to a cluster.
Initialize a Kerberos credential.
Check whether a valid credential already exists:
klist
If the output shows klist: No credentials cache found, continue with the next step to initialize a credential. If a valid credential is listed, skip ahead to connecting to Impala.
Find the principal name in the keytab:
klist -k $IMPALA_CONF_DIR/impala.keytab
Save the principal name from the first entry in the output. You need it in the next step. For example:
impala/master-1-1.c-45dcb9bbe234****.cn-hangzhou.emr.aliyuncs.com@EMR.C-45DCB9BBE23****.COM
Initialize the credential:
kinit -k -t $IMPALA_CONF_DIR/impala.keytab <principal>
Replace <principal> with the principal name you saved in the previous step.
Connect to Impala with Kerberos authentication:
impala-shell -k -i <impalad-node>
Replace <impalad-node> with the hostname of any Impalad node.
(Optional) Run quit; to exit impala-shell.
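The credential steps above can be scripted. The following sketch assumes the MIT Kerberos client tools, where klist -s exits with a non-zero status when no valid credential is cached, and the MIT klist -k output layout, where principal entries follow a dashed separator line. The function names first_principal and ensure_impala_ticket are hypothetical.

```shell
# Hypothetical helper: print the principal from the first entry of
# `klist -k` output read on stdin. Assumes the MIT layout: a
# "Keytab name:" line, a "KVNO Principal" header, a dashed separator,
# then one "<kvno> <principal>" entry per line.
first_principal() {
  awk 'seen && NF >= 2 { print $2; exit } /^----/ { seen = 1 }'
}

# Hypothetical helper: obtain a ticket from the Impala keytab unless a
# valid credential is already cached.
ensure_impala_ticket() {
  local keytab principal
  if [ -z "${IMPALA_CONF_DIR:-}" ]; then
    echo "IMPALA_CONF_DIR is not set" >&2
    return 1
  fi
  keytab="$IMPALA_CONF_DIR/impala.keytab"
  # klist -s prints nothing; its exit status tells whether a valid
  # credential cache exists.
  if klist -s; then
    return 0
  fi
  principal=$(klist -k "$keytab" | first_principal)
  if [ -z "$principal" ]; then
    echo "no principal found in $keytab" >&2
    return 1
  fi
  kinit -k -t "$keytab" "$principal"
}

# Usage (replace <impalad-node> with an Impalad hostname):
#   ensure_impala_ticket && impala-shell -k -i <impalad-node>
```

Checking with klist -s first avoids re-running kinit when a valid ticket is already cached, which mirrors the "skip ahead" branch in the manual steps.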
Connect using Beeline
Beeline connects to Impala over JDBC using the HiveServer2 protocol on port 28000.
Common cluster
Log on to the master node over SSH. For details, see Log on to a cluster.
Connect to Impala:
beeline -u 'jdbc:hive2://<impalad-node>:28000/default;transportMode=http;auth=noSasl'
The connection string contains the following parameters:
<impalad-node>: the hostname of any Impalad node. Find node names in the Topology List section on the Status tab of the Impala service page in the EMR console.
28000: the port on which Impala listens for HiveServer2 connections over HTTP.
transportMode=http: uses HTTP as the transport layer.
auth=noSasl: disables SASL authentication. Use it only on clusters that do not use Kerberos.
(Optional) Run !quit to exit Beeline.
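Beeline can also run a single statement non-interactively with its -e option. A minimal sketch, assuming beeline is on the PATH. The helper name run_beeline_query is hypothetical, and the URL uses auth=noSasl, the session parameter name that the Hive JDBC driver recognizes.

```shell
# Hypothetical helper: run one SQL statement through Beeline against an
# Impalad node on a cluster without Kerberos, then exit.
run_beeline_query() {
  local node="$1" sql="$2"
  if [ -z "$node" ] || [ -z "$sql" ]; then
    echo "usage: run_beeline_query <impalad-node> <sql>" >&2
    return 2
  fi
  # Port 28000 with transportMode=http, as in the connection string above.
  beeline -u "jdbc:hive2://${node}:28000/default;transportMode=http;auth=noSasl" -e "$sql"
}

# Usage (core-1-1 is an example hostname from the Topology List):
#   run_beeline_query core-1-1 'SHOW DATABASES;'
```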
High-security cluster
For high-security clusters, run Beeline as the root user on a core node (not the master node).
Log on to a core node over SSH as the root user. For details, see Log on to a cluster.
Initialize a Kerberos credential as the root user.
Check whether a valid credential already exists:
klist
If the output shows klist: No credentials cache found, continue with the next step to initialize a credential. If a valid credential is listed, skip ahead to connecting to Impala.
Find the principal name in the keytab:
klist -k $IMPALA_CONF_DIR/impala.keytab
Save the principal name from the first entry in the output. For example:
impala/master-1-1.c-45dcb9bbe234****.cn-hangzhou.emr.aliyuncs.com@EMR.C-45DCB9BBE23****.COM
Initialize the credential:
kinit -k -t $IMPALA_CONF_DIR/impala.keytab <principal>
Replace <principal> with the principal name you saved in the previous step.
Connect to Impala as the root user:
beeline -u 'jdbc:hive2://<impalad-node>:28000/default;principal=<principal>;transportMode=http'
Replace <impalad-node> with the hostname of an Impalad node and <principal> with the full principal name you saved earlier.
(Optional) Run !quit to exit Beeline. The exit command for Beeline is !quit, not quit; as in impala-shell.
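When you connect repeatedly, it can help to build the Kerberos connection string from the hostname and principal, and to reject a malformed principal before Beeline starts. A sketch under those assumptions: kerberos_impala_url and beeline_kerberos are hypothetical helper names, and the port and URL parameters are the ones used in the steps above. Run it as the root user after kinit has succeeded.

```shell
# Hypothetical helper: print the Kerberos JDBC URL for an Impalad node.
kerberos_impala_url() {
  local node="$1" principal="$2"
  if [ -z "$node" ]; then
    echo "missing Impalad hostname" >&2
    return 1
  fi
  # Expect a full principal of the form service/host@REALM.
  case "$principal" in
    */*@*) ;;
    *)
      echo "expected a principal like impala/<host>@<REALM>, got: '$principal'" >&2
      return 1
      ;;
  esac
  printf 'jdbc:hive2://%s:28000/default;principal=%s;transportMode=http\n' "$node" "$principal"
}

# Hypothetical helper: start Beeline with the URL built above.
beeline_kerberos() {
  local url
  url=$(kerberos_impala_url "$1" "$2") || return 1
  beeline -u "$url"
}

# Usage, as root after kinit has succeeded:
#   beeline_kerberos <impalad-node> '<principal>'
```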