All Products
Search
Document Center

MaxCompute:SAMPLE

Last Updated:Mar 27, 2026

SAMPLE hashes rows into x equal portions and returns only the rows in the yth portion. Use it in a WHERE clause to sample large tables without scanning the full result set.

Syntax

boolean SAMPLE(<x>, <y>, [<column_name1>, <column_name2>[,...]])

Returns BOOLEAN. Rows where the function evaluates to TRUE are included in the output.

Parameters

Parameter

Required

Type

Constraints

x

Yes

BIGINT integer constant

x >= 1

y

No

BIGINT integer constant

1 <= y <= x; defaults to 1 (first portion)

column_name

No

Any data type

Cannot be explicitly set to null; no implicit type conversions

Supported data types for random sampling (when `column_name` is omitted): BIGINT, DATETIME, BOOLEAN, DOUBLE, STRING, BINARY, CHAR, and VARCHAR.

Error conditions:

Condition

Result

x or y is not an integer type

Error

x or y <= 0

Error

y > x

Error

column_name is explicitly set to null

Error

x or y is null

Returns null

Usage notes

Specify `column_name` to avoid data skew. When column_name is omitted, rows are randomly assigned to portions at runtime, which can produce uneven portion sizes. When column_name is specified, SAMPLE applies uniform hashing on the column values — including null values — distributing them evenly across x portions.

Examples

All examples use the mf_sample table:

+----+--------+--------+---------+-----------+
| id | col1   | col2   | col3    | col4      |
+----+--------+--------+---------+-----------+
| 3  | eee    | rrr    | tttt    | ggggg     |
| 4  | yyy    | uuuu   | iiiii   | ccccccc   |
| 1  | "abc"  | "bcd"  | "rthg"  | "ahgjeog" |
| 2  | "a1bc" | "bc1d" | "rt1hg" | "ahgjeog" |
+----+--------+--------+---------+-----------+

Random sampling (no column specified)

Hash all rows into 2 portions and return the first portion.

SELECT * FROM mf_sample WHERE SAMPLE(2, 1);

Result:

+----+-------+-------+--------+-----------+
| id | col1  | col2  | col3   | col4      |
+----+-------+-------+--------+-----------+
| 3  | eee   | rrr   | tttt   | ggggg     |
| 1  | "abc" | "bcd" | "rthg" | "ahgjeog" |
+----+-------+-------+--------+-----------+

Column-based sampling

Hash the values in the id column into 2 portions and return the first portion.

SELECT * FROM mf_sample WHERE SAMPLE(2, 1, id);

Result:

+----+--------+--------+---------+-----------+
| id | col1   | col2   | col3    | col4      |
+----+--------+--------+---------+-----------+
| 4  | yyy    | uuuu   | iiiii   | ccccccc   |
| 2  | "a1bc" | "bc1d" | "rt1hg" | "ahgjeog" |
+----+--------+--------+---------+-----------+

What's next

For other sampling and utility functions, see Other functions.