Choose between HBase SQL (Phoenix) and Spark - ApsaraDB for HBase

This topic describes the scenarios that Phoenix and Spark are suited to and compares the two engines to help you choose between them.

Scenarios

ApsaraDB Phoenix is an SQL query engine that is provided by ApsaraDB for HBase. It is intended for simple queries that require high concurrency and low latency, and it can also handle simple analytical operations. Queries must hit indexes and return a small amount of data. For a JOIN query, fewer than 100,000 data entries can be retrieved from a table, and the columns in conditional statements must hit indexes. To ensure the stability of clusters, the platform rejects some complex and time-consuming SQL statements.
ApsaraDB Spark is an analytics engine that is provided by ApsaraDB for HBase. It is intended for complex queries that run at low concurrency and can tolerate higher latency, and it can handle all types of complex queries. ApsaraDB Spark supports SQL, Scala, Java, and Python. It also supports streaming, online analytical processing (OLAP), offline analytics, data cleansing, and multiple data sources, including HBase, MongoDB, Redis, and Object Storage Service (OSS). Spark Streaming supports near-real-time stream processing, which is not covered in this topic.
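The requirement that queries "hit indexes" can be illustrated with a small, self-contained sketch. It uses SQLite from the Python standard library as a stand-in, not Phoenix itself, and the table `orders` and index `idx_orders_customer` are hypothetical names. The idea carries over: a filter on an indexed column lets the engine look up matching rows directly, whereas a filter on a non-indexed column forces a full scan. Phoenix provides its own `EXPLAIN` statement for inspecting query plans, and its output format differs from what is shown here.

```python
import sqlite3

# SQLite is only a stand-in to illustrate indexed vs. non-indexed lookups.
# The table and index names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)


def plan(sql, params=()):
    """Return the query plan that SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


# The filter column is indexed, so the plan is an index search.
indexed_plan = plan("SELECT id, amount FROM orders WHERE customer_id = ?", (42,))

# The filter column has no index, so the plan is a full table scan.
scan_plan = plan("SELECT id, amount FROM orders WHERE amount > ?", (1000.0,))

print(indexed_plan)
print(scan_plan)
```

The second kind of query, a scan over the whole table, is what becomes expensive on large HBase tables and is the pattern better suited to an analytics engine such as Spark.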


Item	Phoenix	Spark
Complexity of SQL queries	Phoenix supports simple queries. Queries must hit indexes and return a small amount of data. For a JOIN query, fewer than 100,000 data entries can be retrieved from a table, and the columns in conditional statements must hit indexes. To ensure the stability of clusters, the platform rejects some complex and time-consuming SQL statements.	Spark supports all kinds of queries and can map queries to Phoenix, which allows it to deliver the same performance as Phoenix for simple SQL queries. However, Spark is designed for analytics, and analytics scenarios are fundamentally different from the online transaction processing (OLTP) scenarios that Phoenix targets.
Cluster	Phoenix is used as a plug-in in your ApsaraDB for HBase cluster to support SQL queries. You do not need to purchase another cluster.	You must purchase another cluster for Spark.
Concurrency	10,000 to 50,000 transactions per second (TPS) per node.	Up to 1 million TPS.
Latency	The response latency is measured in milliseconds. For SQL queries on large amounts of data, the latency can reach several seconds.	Typically, the response latency is more than 300 milliseconds. It takes seconds, minutes, or even hours to execute most SQL statements.
Update	Supported.	Not supported.
Supported workloads	Online workloads.	Offline workloads and near-real-time workloads.

Choose Phoenix for simple queries in online workloads that require high concurrency and low latency.
Choose Spark for complex queries in offline or near-real-time workloads, where concurrency is low and higher latency is acceptable.
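
The guidance above can be condensed into a rough decision helper. The following is a minimal sketch, not an official tool: the `QueryProfile` fields and the `choose_engine` function are hypothetical names introduced for illustration, and the only number taken from this topic is the 100,000-entry JOIN limit.

```python
from dataclasses import dataclass

# Limit stated in this topic: a JOIN should retrieve fewer than
# 100,000 data entries from a table to be a good fit for Phoenix.
PHOENIX_JOIN_ROW_LIMIT = 100_000


@dataclass
class QueryProfile:
    """Hypothetical description of a query; all field names are illustrative."""
    hits_index: bool               # columns in conditions are covered by an index
    small_result: bool             # the query returns a small amount of data
    join_rows_per_table: int = 0   # entries retrieved per table in a JOIN (0 = no JOIN)


def choose_engine(q: QueryProfile) -> str:
    """Return "Phoenix" when the query matches the simple-query profile, else "Spark"."""
    fits_phoenix = (
        q.hits_index
        and q.small_result
        and q.join_rows_per_table < PHOENIX_JOIN_ROW_LIMIT
    )
    return "Phoenix" if fits_phoenix else "Spark"


# An indexed point lookup with a small JOIN fits Phoenix.
print(choose_engine(QueryProfile(hits_index=True, small_result=True, join_rows_per_table=500)))

# A large JOIN or a query that misses indexes is routed to Spark.
print(choose_engine(QueryProfile(hits_index=True, small_result=True, join_rows_per_table=100_000)))
print(choose_engine(QueryProfile(hits_index=False, small_result=True)))
```

The helper deliberately ignores factors such as update support and cluster cost, which the comparison table covers and which may override this simple rule in practice.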