This topic describes how to connect to Impala from an E-MapReduce (EMR) cluster.

Prerequisites

An EMR Hadoop cluster is created, and Impala is selected from the optional services during cluster creation. For more information, see Create a cluster.

Procedure

  1. Connect to the master node of the EMR cluster over SSH.
  2. Run the impala-shell command to go to the Impala console.
    If the returned information contains the following content, you have logged on to the Impala console:
    Welcome to the Impala shell.
    To view the available options before you connect to Impala, run the impala-shell --help command. The help output includes the following options:
      -h, --help            show this help message and exit
      -i IMPALAD, --impalad=IMPALAD
                            <host:port> of impalad to connect to
                            [default: emr-header-1.cluster-20****:2****]
      -q QUERY, --query=QUERY
                            Execute a query without the shell [default: none]
      -f QUERY_FILE, --query_file=QUERY_FILE
                            Execute the queries in the query file, delimited by ;.
                            If the argument to -f is "-", then queries are read
                            from stdin and terminated with ctrl-d. [default: none]
      -k, --kerberos        Connect to a kerberized impalad [default: False]
      -o OUTPUT_FILE, --output_file=OUTPUT_FILE
                            If set, query results are written to the given file.
                            Results from multiple semicolon-terminated queries
                            will be appended to the same file [default: none]
      -B, --delimited       Output rows in delimited mode [default: False]
      --print_header        Print column names in delimited mode when pretty-
                            printed. [default: False]
      --output_delimiter=OUTPUT_DELIMITER
                            Field delimiter to use for output in delimited mode
                            [default: \t]
      -s KERBEROS_SERVICE_NAME, --kerberos_service_name=KERBEROS_SERVICE_NAME
                            Service name of a kerberized impalad [default: impala]
      -V, --verbose         Verbose output [default: True]
      -p, --show_profiles   Always display query profiles after execution
                            [default: False]
      --quiet               Disable verbose output [default: False]
      -v, --version         Print version information [default: False]
      -c, --ignore_query_failure
                            Continue on query failure [default: False]
      -r, --refresh_after_connect
                            Refresh Impala catalog after connecting
                            [default: False]
      -d DEFAULT_DB, --database=DEFAULT_DB
                            Issues a use database command on startup
                            [default: none]
      -l, --ldap            Use LDAP to authenticate with Impala. Impala must be
                            configured to allow LDAP authentication.
                            [default: False]
      -u USER, --user=USER  User to authenticate with. [default: root]
      --ssl                 Connect to Impala via SSL-secured connection
                            [default: False]
      --ca_cert=CA_CERT     Full path to certificate file used to authenticate
                            Impala's SSL certificate. May either be a copy of
                            Impala's certificate (for self-signed certs) or the
                            certificate of a trusted third-party CA. If not set,
                            but SSL is enabled, the shell will NOT verify Impala's
                            server certificate [default: none]
      --config_file=CONFIG_FILE
                            Specify the configuration file to load options. The
                            following sections are used: [impala],
                            [impala.query_options]. Section names are case
                            sensitive. Specifying this option within a config file
                            will have no effect. Only specify this as an option in
                            the commandline. [default: /root/.impalarc]
      --live_summary        Print a query summary every 1s while the query is
                            running. [default: False]
      --live_progress       Print a query progress every 1s while the query is
                            running. [default: False]
      --auth_creds_ok_in_clear
                            If set, LDAP authentication may be used with an
                            insecure connection to Impala. WARNING: Authentication
                            credentials will therefore be sent unencrypted, and
                            may be vulnerable to attack. [default: none]
      --ldap_password_cmd=LDAP_PASSWORD_CMD
                            Shell command to run to retrieve the LDAP password
                            [default: none]
      --var=KEYVAL          Defines a variable to be used within the Impala
                            session. Can be used multiple times to set different
                            variables. It must follow the pattern "KEY=VALUE", KEY
                            starts with an alphabetic character and contains
                            alphanumeric characters or underscores. [default:
                            none]
      -Q QUERY_OPTIONS, --query_option=QUERY_OPTIONS
                            Sets the default for a query option. Can be used
                            multiple times to set different query options. It must
                            follow the pattern "KEY=VALUE", KEY must be a valid
                            query option. Valid query options  can be listed by
                            command 'set'. [default: none]
  3. Optional: Run the quit; command to exit the Impala console.
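
The options listed in the help output can also be combined to run a query without opening the interactive console. The following is a minimal sketch, assuming it runs on the master node where impala-shell is installed and the default impalad address applies. The function name run_impala_query, the IMPALA_SHELL variable, and the file path are hypothetical names chosen for illustration. Only the -B, --print_header, --output_delimiter, -q, and -o options come from the help output above.

```shell
# Sketch only: run_impala_query and IMPALA_SHELL are illustrative names,
# not part of EMR or Impala.

# Binary to invoke; override it to point at a different impala-shell.
IMPALA_SHELL="${IMPALA_SHELL:-impala-shell}"

# Usage: run_impala_query QUERY OUTPUT_FILE
# Runs one query non-interactively and writes comma-delimited rows,
# preceded by a header line, to OUTPUT_FILE.
run_impala_query() {
  if [ "$#" -ne 2 ]; then
    echo "usage: run_impala_query QUERY OUTPUT_FILE" >&2
    return 2
  fi
  # -B                  output rows in delimited mode
  # --print_header      print column names in delimited mode
  # --output_delimiter  field delimiter for delimited mode
  # -q                  execute a query without the shell
  # -o                  write query results to the given file
  "$IMPALA_SHELL" -B --print_header --output_delimiter=',' \
    -q "$1" -o "$2"
}

# Example call (hypothetical output path):
# run_impala_query 'SHOW DATABASES' /tmp/databases.csv
```

Wrapping the call in a function keeps the option set in one place. To read statements from a file instead, the -f option from the help output could replace -q in the same way.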