This topic describes how to use the Impala shell tool in E-MapReduce (EMR).
Prerequisites
An EMR Hadoop cluster is created, and Impala is selected from the optional services when you create the cluster. For more information, see Create a cluster.
Procedure
- Log on to the cluster in SSH mode. For more information, see Log on to a cluster.
- Run the following command to start the Impala shell tool:
- Common cluster:
impala-shell
- High-security cluster:
impala-shell -k
Note: Make sure that the account that you use to connect to Impala has passed security authentication. For more information, see Configure MIT Kerberos authentication.
If the returned information contains the following content, the Impala shell tool is started: Welcome to the Impala shell.
Before you connect to Impala, you can run the impala-shell --help command to obtain help information about Impala.
-h, --help
    show this help message and exit
-i IMPALAD, --impalad=IMPALAD
    <host:port> of impalad to connect to [default: emr-header-1.cluster-20****:2****]
-q QUERY, --query=QUERY
    Execute a query without the shell [default: none]
-f QUERY_FILE, --query_file=QUERY_FILE
    Execute the queries in the query file, delimited by ;. If the argument to -f is "-", then queries are read from stdin and terminated with ctrl-d. [default: none]
-k, --kerberos
    Connect to a kerberized impalad [default: False]
-o OUTPUT_FILE, --output_file=OUTPUT_FILE
    If set, query results are written to the given file. Results from multiple semicolon-terminated queries will be appended to the same file [default: none]
-B, --delimited
    Output rows in delimited mode [default: False]
--print_header
    Print column names in delimited mode when pretty-printed. [default: False]
--output_delimiter=OUTPUT_DELIMITER
    Field delimiter to use for output in delimited mode [default: \t]
-s KERBEROS_SERVICE_NAME, --kerberos_service_name=KERBEROS_SERVICE_NAME
    Service name of a kerberized impalad [default: impala]
-V, --verbose
    Verbose output [default: True]
-p, --show_profiles
    Always display query profiles after execution [default: False]
--quiet
    Disable verbose output [default: False]
-v, --version
    Print version information [default: False]
-c, --ignore_query_failure
    Continue on query failure [default: False]
-r, --refresh_after_connect
    Refresh Impala catalog after connecting [default: False]
-d DEFAULT_DB, --database=DEFAULT_DB
    Issues a use database command on startup [default: none]
-l, --ldap
    Use LDAP to authenticate with Impala. Impala must be configured to allow LDAP authentication. [default: False]
-u USER, --user=USER
    User to authenticate with. [default: root]
--ssl
    Connect to Impala via SSL-secured connection [default: False]
--ca_cert=CA_CERT
    Full path to certificate file used to authenticate Impala's SSL certificate. May either be a copy of Impala's certificate (for self-signed certs) or the certificate of a trusted third-party CA. If not set, but SSL is enabled, the shell will NOT verify Impala's server certificate [default: none]
--config_file=CONFIG_FILE
    Specify the configuration file to load options. The following sections are used: [impala], [impala.query_options]. Section names are case sensitive. Specifying this option within a config file will have no effect. Only specify this as an option in the commandline. [default: /root/.impalarc]
--live_summary
    Print a query summary every 1s while the query is running. [default: False]
--live_progress
    Print a query progress every 1s while the query is running. [default: False]
--auth_creds_ok_in_clear
    If set, LDAP authentication may be used with an insecure connection to Impala. WARNING: Authentication credentials will therefore be sent unencrypted, and may be vulnerable to attack. [default: none]
--ldap_password_cmd=LDAP_PASSWORD_CMD
    Shell command to run to retrieve the LDAP password [default: none]
--var=KEYVAL
    Defines a variable to be used within the Impala session. Can be used multiple times to set different variables. It must follow the pattern "KEY=VALUE", KEY starts with an alphabetic character and contains alphanumeric characters or underscores. [default: none]
-Q QUERY_OPTIONS, --query_option=QUERY_OPTIONS
    Sets the default for a query option. Can be used multiple times to set different query options. It must follow the pattern "KEY=VALUE", KEY must be a valid query option. Valid query options can be listed by command 'set'. [default: none]
- Optional: Run the quit; command to exit the Impala shell tool.
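The options listed in the help output can also be combined to run a query without entering the interactive shell. The following is a minimal sketch, not an official EMR script: it uses only flags that appear in the help output above (-B, --print_header, --output_delimiter, -q, -o, and -k). The variable names QUERY, OUT_FILE, KERBEROS, and DRY_RUN, the function name run_query, and the output path are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: run one query non-interactively and write delimited results to a file.
# QUERY, OUT_FILE, KERBEROS, DRY_RUN, and run_query are illustrative names,
# not part of Impala or EMR.
set -eu

QUERY="${QUERY:-SHOW DATABASES}"
OUT_FILE="${OUT_FILE:-/tmp/impala_result.csv}"   # hypothetical output path

run_query() {
  # Build the argument list from flags shown in the impala-shell --help output.
  set -- -B --print_header --output_delimiter=, -q "$QUERY" -o "$OUT_FILE"

  # On a high-security cluster, set KERBEROS=1 to add the -k flag.
  if [ "${KERBEROS:-0}" = "1" ]; then
    set -- "$@" -k
  fi

  # Print the command instead of running it when DRY_RUN=1 is set
  # or when impala-shell is not installed on this host.
  if [ "${DRY_RUN:-0}" = "1" ] || ! command -v impala-shell >/dev/null 2>&1; then
    echo "would run: impala-shell $*"
  else
    impala-shell "$@"
  fi
}

run_query
```

On a cluster node, running the script as is executes the default query and writes comma-separated rows to the file named by OUT_FILE. Setting DRY_RUN=1 only prints the command that would be run, which is a convenient way to check the flags first.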