SAMPLE hashes rows into x equal portions and returns only the rows in the yth portion. Use it in a WHERE clause to sample large tables without scanning the full result set.
Syntax
boolean SAMPLE(<x>, <y>, [<column_name1>, <column_name2>[,...]])Returns BOOLEAN. Rows where the function evaluates to TRUE are included in the output.
Parameters
Parameter | Required | Type | Constraints |
| Yes | BIGINT integer constant | x >= 1 |
| No | BIGINT integer constant | 1 <= y <= x; defaults to 1 (first portion) |
| No | Any data type | Cannot be explicitly set to |
Supported data types for random sampling (when `column_name` is omitted): BIGINT, DATETIME, BOOLEAN, DOUBLE, STRING, BINARY, CHAR, and VARCHAR.
Error conditions:
Condition | Result |
| Error |
| Error |
| Error |
| Error |
| Returns |
Usage notes
Specify `column_name` to avoid data skew. When column_name is omitted, rows are randomly assigned to portions at runtime, which can produce uneven portion sizes. When column_name is specified, SAMPLE applies uniform hashing on the column values — including null values — distributing them evenly across x portions.
Examples
All examples use the mf_sample table:
+----+--------+--------+---------+-----------+
| id | col1 | col2 | col3 | col4 |
+----+--------+--------+---------+-----------+
| 3 | eee | rrr | tttt | ggggg |
| 4 | yyy | uuuu | iiiii | ccccccc |
| 1 | "abc" | "bcd" | "rthg" | "ahgjeog" |
| 2 | "a1bc" | "bc1d" | "rt1hg" | "ahgjeog" |
+----+--------+--------+---------+-----------+Random sampling (no column specified)
Hash all rows into 2 portions and return the first portion.
SELECT * FROM mf_sample WHERE SAMPLE(2, 1);Result:
+----+-------+-------+--------+-----------+
| id | col1 | col2 | col3 | col4 |
+----+-------+-------+--------+-----------+
| 3 | eee | rrr | tttt | ggggg |
| 1 | "abc" | "bcd" | "rthg" | "ahgjeog" |
+----+-------+-------+--------+-----------+Column-based sampling
Hash the values in the id column into 2 portions and return the first portion.
SELECT * FROM mf_sample WHERE SAMPLE(2, 1, id);Result:
+----+--------+--------+---------+-----------+
| id | col1 | col2 | col3 | col4 |
+----+--------+--------+---------+-----------+
| 4 | yyy | uuuu | iiiii | ccccccc |
| 2 | "a1bc" | "bc1d" | "rt1hg" | "ahgjeog" |
+----+--------+--------+---------+-----------+What's next
For other sampling and utility functions, see Other functions.