This topic describes how to use a command-line interface (CLI) to connect to Impala on E-MapReduce (EMR).
Prerequisites
You have created a cluster and selected the Impala service. For more information, see Create a cluster.
Impala-shell connection
Before you connect to Impala, you can run the impala-shell --help command for more information.
Standard cluster
- Connect to the master node of the cluster over SSH. For more information, see Log on to a cluster.
- Run the following command to connect to Impala.
  impala-shell -i <impalad_node_name>
  The <impalad_node_name> placeholder in this topic represents the name of an Impalad node. To obtain the node name, go to the Status tab of the Impala service in the EMR console, expand the Impalad component, and then find the name in the Node Name column of the topology list. For example, the node names can be core-1-1 and core-1-2. You can use any of these names to connect to Impala. Replace the <impalad_node_name> placeholder in the preceding command with the node name that you obtain.
- Optional: Run the quit; command to exit the Impala CLI.
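For scripted use, a single statement can be run without opening the interactive CLI. The following is a minimal sketch, assuming the -q option of impala-shell, which runs one query and exits. IMPALAD_NODE and run_query are hypothetical names introduced for illustration, and core-1-1 is the example node name mentioned above.

```shell
# Hypothetical variable: set it to a node name from the EMR console.
IMPALAD_NODE="${IMPALAD_NODE:-core-1-1}"

# Hypothetical helper: run one statement against the chosen Impalad node.
run_query() {
  # -i selects the Impalad node, -q runs a single query and exits.
  impala-shell -i "$IMPALAD_NODE" -q "$1"
}

# Only attempt the call where impala-shell is installed (for example, on a cluster node).
if command -v impala-shell >/dev/null 2>&1; then
  run_query 'SHOW DATABASES;'
fi
```

Wrapping the call in a function keeps the node name in one place, so switching to another Impalad node only requires changing IMPALAD_NODE.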
High-security cluster
- Connect to the master node of the cluster over SSH. For more information, see Log on to a cluster.
- Initialize a credential.
  - Run the following command to check for a Kerberos credential.
    klist
    If the output contains klist: No credentials cache found, you must initialize a credential in the following steps. If the output already contains credential information, you can skip credential initialization and connect to Impala.
  - Run the following command to view the principal.
    klist -k $IMPALA_CONF_DIR/impala.keytab
    Record the principal from the output. You will need it for the next step. For example, the principal is impala/master-1-1.c-45dcb9bbe234****.cn-hangzhou.emr.aliyuncs.com@EMR.C-45DCB9BBE23****.COM.
    [root@master-1-1(192.16xxx) ~]# klist -k $IMPALA_CONF_DIR/impala.keytab
    Keytab name: FILE:/etc/taihao-apps/impala-conf/runtime-conf/impala.keytab
    KVNO Principal
    ---- ----------------------------------------------------------------
       2 impala/master-1-1.c-45dcb9bbe234xxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-45DCB9BBE234xxx.COM
       2 impala/master-1-1.c-45dcb9bbe234xxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-45DCB9BBE234xxx.COM
       2 impala/master-1-1.c-45dcb9bbe234xxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-45DCB9BBE234xxx.COM
       2 impala/master-1-1.c-45dcb9bbe234xxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-45DCB9BBE234xxx.COM
       2 impala/master-1-1.c-45dcb9bbe234xxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-45DCB9BBE234xxx.COM
       2 impala/master-1-1.c-45dcb9bbe234xxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-45DCB9BBE234xxx.COM
       2 impala/master-1-1.c-45dcb9bbe234xxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-45DCB9BBE234xxx.COM
       2 impala/master-1-1.c-45dcb9bbe234xxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-45DCB9BBE234xxx.COM
  - Run the following command to initialize the credential.
    kinit -k -t $IMPALA_CONF_DIR/impala.keytab <principal>
    Note: Replace <principal> with the principal you recorded in the previous step.
- Run the following command to connect to Impala.
  impala-shell -k -i <impalad_node_name>
- Optional: Run the quit; command to exit the Impala CLI.
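The credential check and initialization described above can be combined into one helper. This is a sketch, assuming the MIT Kerberos klist -s option (silent mode, whose exit status reports whether a valid credential cache exists). first_impala_principal and ensure_ticket are hypothetical names, and picking the first impala/ entry in the keytab is an assumption; confirm the principal against your own klist -k output.

```shell
# Hypothetical helper: read `klist -k` output on stdin and print the
# first principal that starts with impala/ (second column of the listing).
first_impala_principal() {
  awk '$2 ~ /^impala\// { print $2; exit }'
}

# Hypothetical helper: initialize a credential only when none is cached.
ensure_ticket() {
  keytab="$IMPALA_CONF_DIR/impala.keytab"
  # klist -s prints nothing; a non-zero status means no usable credential cache.
  if ! klist -s; then
    principal="$(klist -k "$keytab" | first_impala_principal)"
    kinit -k -t "$keytab" "$principal"
  fi
}

# Example usage on a cluster node:
# ensure_ticket && impala-shell -k -i <impalad_node_name>
```

Checking the cache first mirrors the manual procedure: an existing credential is reused, and kinit runs only when klist reports nothing usable.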
Beeline JDBC connection
Standard cluster
- Connect to the master node of the cluster over SSH. For more information, see Log on to a cluster.
- Run the following command to connect to Impala.
  beeline -u 'jdbc:hive2://<impalad_node_name>:28000/default;transportMode=http;auth=noSasl'
- Optional: Run the !quit command to exit the Beeline CLI.
High-security cluster
- Connect to a core node of the cluster over SSH. For more information, see Log on to a cluster.
- Initialize a credential as the root user.
  - Run the following command to check for a Kerberos credential.
    klist
    If the output contains klist: No credentials cache found, you must initialize a credential in the following steps. If the output already contains credential information, you can skip credential initialization and connect to Impala by using Beeline.
  - Run the following command to view the principal.
    klist -k $IMPALA_CONF_DIR/impala.keytab
    Record the principal from the output. You will need it for the next step. For example, the principal is impala/core-1-1.c-ee5cfb2d6306****.cn-hangzhou.emr.aliyuncs.com@EMR.C-EE5CFB2D6306****.COM.
    [root@core-1-1(xxx) ~]# klist -k $IMPALA_CONF_DIR/impala.keytab
    Keytab name: FILE:/etc/taihao-apps/impala-conf/runtime-conf/impala.keytab
    KVNO Principal
    ---- ----------------------------------------------------------------
       2 impala/core-1-1.c-ee5cfb2dxxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-EE5CFB2Dxxx.COM
       2 impala/core-1-1.c-ee5cfb2dxxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-EE5CFB2Dxxx.COM
       2 impala/core-1-1.c-ee5cfb2dxxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-EE5CFB2Dxxx.COM
       2 impala/core-1-1.c-ee5cfb2dxxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-EE5CFB2Dxxx.COM
       2 impala/core-1-1.c-ee5cfb2dxxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-EE5CFB2Dxxx.COM
       2 impala/core-1-1.c-ee5cfb2dxxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-EE5CFB2Dxxx.COM
       2 HTTP/core-1-1.c-ee5cfb2d63xxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-EE5CFB2D63xxx.COM
       2 HTTP/core-1-1.c-ee5cfb2d63xxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-EE5CFB2D63xxx.COM
       2 HTTP/core-1-1.c-ee5cfb2d63xxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-EE5CFB2D63xxx.COM
       2 HTTP/core-1-1.c-ee5cfb2d63xxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-EE5CFB2D63xxx.COM
       2 HTTP/core-1-1.c-ee5cfb2d63xxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-EE5CFB2D63xxx.COM
       2 HTTP/core-1-1.c-ee5cfb2d63xxx.cn-hangzhou.emr.aliyuncs.com@EMR.C-EE5CFB2D63xxx.COM
  - Run the following command to initialize the credential.
    kinit -k -t $IMPALA_CONF_DIR/impala.keytab <principal>
    Note: Replace <principal> with the principal you recorded in the previous step.
- Run the following command as the root user to connect to Impala.
  beeline -u 'jdbc:hive2://<impalad_node_name>:28000/default;principal=<principal>;transportMode=http'
- Optional: Run the !quit command to exit the Beeline CLI.
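The two Beeline connection strings in this section differ only in their authentication part, so they can be assembled by one small function. The following is a sketch under stated assumptions: impala_jdbc_url is a hypothetical helper, port 28000 and the default database are taken from the commands above, and the branch without a principal assumes the Hive JDBC auth=noSasl setting for a standard cluster.

```shell
# Hypothetical helper: print the JDBC URL for an Impalad node.
# $1 = Impalad node name, $2 = Kerberos principal (omit on a standard cluster).
impala_jdbc_url() {
  node="$1"
  principal="${2:-}"
  if [ -n "$principal" ]; then
    # High-security cluster: pass the Kerberos principal.
    printf 'jdbc:hive2://%s:28000/default;principal=%s;transportMode=http\n' "$node" "$principal"
  else
    # Standard cluster: no SASL authentication (assumption, see lead-in).
    printf 'jdbc:hive2://%s:28000/default;transportMode=http;auth=noSasl\n' "$node"
  fi
}

# Example usage on a cluster node (Beeline's -e option runs one statement and exits):
# beeline -u "$(impala_jdbc_url core-1-1)" -e 'SHOW DATABASES;'
```

Quoting the command substitution matters here because the URL contains semicolons, which the shell would otherwise treat as command separators.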
FAQ
Where are Impala logs stored?
Impala logs are stored in the /var/log/taihao-apps/impala/ directory by default. You can log on to a cluster node and navigate to this directory to view the logs.
Impala logs are stored on the nodes where Impalad processes are running (typically core nodes). If you cannot find logs on a specific node, verify that the Impalad process is deployed on that node.
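To inspect the most recent log on a node, something like the following can help. It is only a sketch: latest_impalad_log is a hypothetical helper, and the impalad.INFO file-name prefix is taken from the log query shown later in this FAQ.

```shell
# Hypothetical helper: print the most recently modified impalad.INFO* file.
# $1 = log directory (defaults to the EMR location named above).
latest_impalad_log() {
  dir="${1:-/var/log/taihao-apps/impala}"
  # ls -t sorts by modification time, newest first.
  ls -t "$dir"/impalad.INFO* 2>/dev/null | head -n 1
}

# Example usage on a node where Impalad runs:
# tail -n 100 "$(latest_impalad_log)"
```

If the function prints nothing, no matching log file exists in that directory, which usually means Impalad is not deployed on the node.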
How do I query historical access users for Impala?
If Apache Ranger is not enabled for the cluster, Impala uses Hadoop Distributed File System (HDFS) file read and write permissions for access control.
To list the accounts that have connected to Impala, run the following command on a node where the Impalad process is running:
grep "Successfully authenticated client user" /var/log/taihao-apps/impala/impalad.INFO* | awk -F'"' '{print $2}' | sort | uniq -c | sort -nr
This command searches the impalad.INFO log files for successful authentication records, extracts the username enclosed in double quotation marks from each record, and displays the usernames in descending order by connection count.
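The same pipeline can be wrapped in a function so that it runs against any set of log files, for example logs copied from several nodes. This is a minimal sketch; count_impala_users is a hypothetical name, and the -h option of grep is added here to suppress file-name prefixes when several files are searched.

```shell
# Hypothetical helper: count successful authentications per user.
# Arguments: one or more Impalad log files.
count_impala_users() {
  grep -h "Successfully authenticated client user" "$@" \
    | awk -F'"' '{ print $2 }' \
    | sort | uniq -c | sort -nr
}

# Example usage with the default EMR log location from this topic:
# count_impala_users /var/log/taihao-apps/impala/impalad.INFO*
```

Each output line holds a count followed by a username, with the most frequently connecting user first.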
How do I create a read-only account for Impala?
Creating a read-only Impala account requires Apache Ranger for permission control. If Ranger is not deployed in your cluster, you must add the Ranger service before you can configure fine-grained access control such as read-only permissions.
Does Impala on EMR 4.9.0 support Ranger authorization?
No. Impala on EMR 4.9.0 does not support Ranger authorization. To use Ranger for Impala permission management, we recommend that you create a cluster that runs a later EMR version and migrate your data to the new cluster.