Use SELECT statements to query data from one or multiple tables. This topic covers SELECT syntax and the clauses supported in ApsaraDB for ClickHouse.
Syntax
Only expr_list is required. All other clauses are optional, and are listed below in the order they are executed. For the full SELECT reference, see SELECT Query.
SELECT [DISTINCT] expr_list
[FROM [db.]table | (subquery) | table_function] [FINAL]
[SAMPLE sample_coeff]
[ARRAY JOIN ...]
[GLOBAL] ANY|ALL INNER|LEFT JOIN (subquery)|table USING columns_list
[PREWHERE expr]
[WHERE expr]
[GROUP BY expr_list] [WITH TOTALS]
[HAVING expr]
[ORDER BY expr_list]
[LIMIT [n, ]m]
[UNION ALL ...]
[INTO OUTFILE filename]
[FORMAT format]
[LIMIT n BY columns]Example:
SELECT
OriginCityName,
DestCityName,
count(*) AS flights,
bar(flights, 0, 20000, 40)
FROM ontime_distributed
WHERE Year = 1988
GROUP BY OriginCityName, DestCityName
ORDER BY flights DESC
LIMIT 20;SAMPLE clause
The SAMPLE clause runs a query on a subset of data instead of the full dataset, trading exactness for speed. Use it when strict latency requirements or large data volumes make full scans impractical.
SAMPLE is supported only for tables in the MergeTree family that define a sampling expression at creation time.
The syntax is SAMPLE k, where k is either a fractional number or a large integer:
| Syntax | Behavior | Approximate total |
|---|---|---|
SAMPLE k (decimal, 0 < k < 1) | Scans a k fraction of the data. SAMPLE 0.1 reads 10% of rows. | Multiply the result by 1/k. |
SAMPLE n (large integer) | Scans up to n rows. SAMPLE 10000000 reads up to 10,000,000 rows. | Cannot be used to obtain an approximate result from all the data. |
Example:
The following query scans 10% of the data. Multiply count() by 10 to get the approximate total page views.
SELECT
Title,
count() * 10 AS PageViews
FROM hits_distributed
SAMPLE 0.1
WHERE
CounterID = 34
AND toDate(EventDate) >= toDate('2013-01-29')
AND toDate(EventDate) <= toDate('2013-02-04')
AND NOT DontCountHits
AND NOT Refresh
AND Title != ''
GROUP BY Title
ORDER BY PageViews DESC
LIMIT 1000;ARRAY JOIN clause
ARRAY JOIN expands array columns into individual rows, making it straightforward to aggregate or filter on each array element. It is similar to the arrayJoin function but supports more scenarios, including nested data structures.
Example:
Create a test table.
CREATE TABLE arrays_test ( s String, arr Array(UInt8) ) ENGINE = Memory;Insert test data.
INSERT INTO arrays_test VALUES ('Hello', [1,2]), ('World', [3,4,5]), ('Goodbye', []);Check the raw data.
SELECT * FROM arrays_test;┌─s───────┬─arr─────┐ │ Hello │ [1,2] │ │ World │ [3,4,5] │ │ Goodbye │ [] │ └─────────┴─────────┘Use ARRAY JOIN to expand the array column. The
Goodbyerow (empty array) is dropped.SELECT s, arr FROM arrays_test ARRAY JOIN arr;┌─s─────┬─arr─┐ │ Hello │ 1 │ │ Hello │ 2 │ │ World │ 3 │ │ World │ 4 │ │ World │ 5 │ └───────┴─────┘
JOIN clause
JOIN combines rows from two or more tables on a join condition. For details on NULL handling, see join_use_nulls, Nullable, and Null.
Usage notes
When running JOIN or IN subqueries across multiple distributed tables in a multi-node cluster (Community-compatible Edition), the following error may appear:
Exception: Double-distributed IN/JOIN subqueries is denied (distributed_product_mode = 'deny').Do not run JOIN or IN subqueries directly on distributed tables. For workarounds, see FAQ.
WHERE clause
WHERE filters rows before aggregation. The filter expression must evaluate to a UInt8 value (typically a conditional expression). If the table engine supports indexes, ClickHouse evaluates whether the WHERE expression can use them to skip data.
PREWHERE clause
PREWHERE applies an early filter before the full column set is read, reducing I/O on wide tables.
PREWHERE is supported only for engines in the MergeTree family. For the full reference, see PREWHERE Clause.
Usage notes
PREWHERE executes before WHERE. If both are present, PREWHERE filters first.
Indexed fields are not suitable for PREWHERE. For indexed fields, ClickHouse already reads only the matching data blocks.
If you set the
optimize_move_to_prewhereparameter to1and do not use the PREWHERE clause in a SELECT statement, the system automatically checks whether a part of the filter condition in the WHERE clause is suitable for the PREWHERE clause. If so, the system moves the part from the WHERE clause to the PREWHERE clause.
When to use PREWHERE manually
Manually specify PREWHERE when your filter condition meets all of the following:
Involves only a few columns (not all columns in the query)
Provides strong row reduction (filters out most rows)
Does not rely on indexed columns
For example, when querying many columns but filtering on one or two non-indexed fields, putting those fields in PREWHERE avoids loading all other columns for rows that will be filtered out.
WITH TOTALS modifier
Add WITH TOTALS to a GROUP BY clause to include an extra row in the results that shows aggregate values computed across all rows.
The totals row uses the default value for each key column (zeros or empty strings) and the aggregate function result over all rows.
The totals row appears as a separate section only when the output format is JSON, TabSeparated, or Pretty. In other formats, it is appended after the regular rows.
Example:
Create a test table.
CREATE TABLE default.t1 ( `a` Int32, `b` Int32, `c` String, `d` Date ) ENGINE = MergeTree PARTITION BY toYYYYMM(d) ORDER BY (a, b) SETTINGS index_granularity = 8192;Insert test data.
INSERT INTO t1(a, b, c, d) VALUES (3,4,'a','2022-09-01'),(12,7,'b','2022-07-01'),(21,4,'c','2022-05-01'),(11,1,'d','2022-06-01');Group by column
dand sum columna.SELECT sum(a) FROM t1 GROUP BY d;┌─sum(a)─┐ │ 21 │ │ 11 │ │ 12 │ │ 3 │ └────────┘Add
WITH TOTALSto include the grand total.SELECT sum(a) FROM t1 GROUP BY d WITH TOTALS;┌─sum(a)─┐ │ 21 │ │ 11 │ │ 12 │ │ 3 │ └────────┘ Totals: ┌─sum(a)─┐ │ 47 │ └────────┘