Sampling function - PolarDB - Alibaba Cloud Documentation Center

A sampling function allows you to select random sample data from a table as the input rather than using the entire table as the input.

Syntax

SAMPLE [BLOCK] (sample_percent[, dummy_arg]) [SEED (seed_value)]

Parameters

Parameter	Description
BLOCK	The basic unit of sampling. If the BLOCK keyword is not specified, the basic unit of sampling is row. If the BLOCK keyword is specified, the basic unit of sampling is block. This means that all data or no data in a block is selected.
sample_percent	The percentage of data to be sampled from a source table. Valid values: [0, 100].
dummy_arg	An optional parameter, which exists for compatibility and has no actual impact on the sampling result. If you specify this parameter, the value must be equal to or greater than 1.
seed_value	An optional parameter that specifies the random seed used for sampling. If other parameters and conditions are the same, you can specify the same random seed to ensure that two sampling results are the same.

Return value

A subset of the source table after sampling is returned. The number of rows in the returned subset is less than or equal to that in the source table.

Example

Create a table named sample_table_test and insert data into the table.

CREATE TABLE sample_table_test (id int, name text);
INSERT INTO sample_table_test 
  SELECT i, repeat(i::text, 10) FROM generate_series(0, 10000) s(i);

Specify the BLOCK keyword. Sample 10% of the data from the sample_table_test table. Set the dummy_arg parameter to 1 and the seed_value parameter to 10.
```
SELECT COUNT(*) FROM sample_table_test SAMPLE BLOCK (10, 1) SEED (10);

# A value of 873 is returned. The result can be reproduced.
```
Note
The percentage of data sampled from the source table for each sampling may not be exactly the percentage specified by the sample_pecent parameter, but is roughly around the specified percentage.
Do not specify the BLOCK keyword. Sample 5% of the data from the sample_table_test table. Do not specify a random seed.
```
SELECT COUNT(*) FROM sample_table_test SAMPLE (5);

# A value of 485 is returned. The result cannot be reproduced.
```