Impala is a real-time SQL query engine that provides high performance and low latency. You can use Impala to query data stored in Apache Hadoop in real time.
Impala uses the same metadata, SQL syntax (Hive SQL), and ODBC driver as Apache Hive to provide a familiar and unified platform for real-time or batch-oriented queries.
For more information about Impala, see Apache Impala.
- Impala Daemon
The core component of Impala is the Impala Daemon, which runs on each node of the cluster and is represented by a process named Impalad. The Impala Daemon reads and writes data files; accepts query statements submitted through the impala-shell command, Hue, JDBC, or ODBC; parallelizes queries and distributes the work to the Impala nodes in the cluster; and returns its locally computed query results to the coordinator node. The Impala Daemon that receives a query acts as the coordinator for that query.
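The coordinator pattern described above can be sketched for a simple `COUNT(*)`/`SUM` query. This is a conceptual illustration only, not Impala's implementation: `ImpaladNode`, `execute_fragment`, and `coordinate_query` are hypothetical names, and each node's local data is assumed to be an in-memory list of integers.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ImpaladNode:
    """Hypothetical stand-in for one Impalad and the data stored on its node."""
    host: str
    local_rows: List[int]  # assumed: values of an 'amount' column held locally

    def execute_fragment(self) -> Tuple[int, int]:
        # Compute a partial COUNT(*) and SUM(amount) over local data only.
        return len(self.local_rows), sum(self.local_rows)


def coordinate_query(nodes: List[ImpaladNode]) -> Tuple[int, int]:
    """Play the coordinator role: run fragments in parallel, then merge them."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda node: node.execute_fragment(), nodes))
    # Each node returned only its local result; the coordinator combines them.
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_count, total_sum


cluster = [
    ImpaladNode("node-1", [10, 20, 30]),
    ImpaladNode("node-2", [5, 5]),
    ImpaladNode("node-3", [100]),
]
print(coordinate_query(cluster))  # (6, 170)
```

Threads stand in for separate hosts here. In a real cluster the fragments run in different Impalad processes, and the coordinator merges the partial results that each node sends back.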
- StateStore Daemon
The StateStore Daemon is represented by a process named Statestored. It checks the health of all Impalad processes in the cluster and relays their status to every Impalad process. If an Impalad process becomes unavailable because of a node failure, a network error, or a software problem, the StateStore Daemon notifies the other Impalad processes, so that new query requests are no longer sent to the unavailable Impalad process.
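The health tracking described above can be illustrated with a minimal sketch, assuming a simple timeout-based heartbeat model. `StateStore`, `record_heartbeat`, `healthy_impalads`, `schedule_query`, and the 10-second timeout are hypothetical, not Statestored's actual interface or default settings. Timestamps are passed in explicitly so that the example is deterministic.

```python
from typing import Dict, List


class StateStore:
    """Hypothetical model of Statestored's view of cluster membership."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}  # Impalad name -> last heartbeat time

    def record_heartbeat(self, impalad: str, now: float) -> None:
        # Record that an Impalad responded at time `now` (in seconds).
        self.last_seen[impalad] = now

    def healthy_impalads(self, now: float) -> List[str]:
        # An Impalad is treated as healthy if it responded within the timeout.
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen <= self.timeout_s
        )


def schedule_query(statestore: StateStore, now: float) -> List[str]:
    """Pick execution nodes for a new query, skipping unavailable Impalads."""
    nodes = statestore.healthy_impalads(now)
    if not nodes:
        raise RuntimeError("no healthy Impalad available")
    return nodes


store = StateStore(timeout_s=10.0)
store.record_heartbeat("impalad-1", now=100.0)
store.record_heartbeat("impalad-2", now=100.0)
store.record_heartbeat("impalad-3", now=85.0)  # has stopped responding
print(schedule_query(store, now=105.0))  # ['impalad-1', 'impalad-2']
```

Keeping the membership view in one service means each Impalad only has to consult the list it receives instead of probing every other node itself.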
- Catalog Daemon
The Catalog Daemon is represented by a process named Catalogd. It propagates metadata changes made through any Impalad process to all other Impalad processes in the same cluster. We recommend that you run the StateStore Daemon and the Catalog Daemon on the same node, because the Catalogd process delivers these metadata updates through the Statestored process.
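The following sketch illustrates why metadata changes flow through Statestored, assuming a simple publish/subscribe model with a version number attached to each update. `Catalogd`, `StateStoreTopic`, `ImpaladCache`, and their methods are hypothetical names chosen for illustration; the real daemons use their own internal protocol and data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ImpaladCache:
    """Hypothetical local metadata cache held by one Impalad."""
    name: str
    version: int = 0
    tables: Dict[str, str] = field(default_factory=dict)  # table -> schema text

    def apply(self, version: int, table: str, schema: str) -> None:
        # Ignore stale or duplicate updates so that re-delivery is harmless.
        if version > self.version:
            self.tables[table] = schema
            self.version = version


class StateStoreTopic:
    """Hypothetical statestore topic that fans each update out to all subscribers."""

    def __init__(self) -> None:
        self.subscribers: List[ImpaladCache] = []

    def subscribe(self, impalad: ImpaladCache) -> None:
        self.subscribers.append(impalad)

    def publish(self, version: int, table: str, schema: str) -> None:
        for impalad in self.subscribers:
            impalad.apply(version, table, schema)


class Catalogd:
    """Hypothetical catalog service: versions each change and publishes it."""

    def __init__(self, topic: StateStoreTopic) -> None:
        self.topic = topic
        self.version = 0

    def on_ddl(self, table: str, schema: str) -> None:
        # Called when an Impalad reports a metadata change such as CREATE TABLE.
        self.version += 1
        self.topic.publish(self.version, table, schema)


topic = StateStoreTopic()
a, b = ImpaladCache("impalad-1"), ImpaladCache("impalad-2")
topic.subscribe(a)
topic.subscribe(b)
catalog = Catalogd(topic)
catalog.on_ddl("sales", "id INT, amount DOUBLE")
print(b.tables)  # {'sales': 'id INT, amount DOUBLE'}
```

Because every update passes through the statestore topic in this model, placing Catalogd next to Statestored keeps that extra network hop short.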