A sampling function allows you to select random sample data from a table as the input rather than using the entire table as the input.
Syntax
SAMPLE [BLOCK] (sample_percent[, dummy_arg]) [SEED (seed_value)]Parameters
Parameter | Description |
BLOCK | The basic unit of sampling.
|
sample_percent | The percentage of data to be sampled from a source table. Valid values: [0, 100]. |
dummy_arg | An optional parameter, which exists for compatibility and has no actual impact on the sampling result. If you specify this parameter, the value must be equal to or greater than 1. |
seed_value | An optional parameter that specifies the random seed used for sampling. If other parameters and conditions are the same, you can specify the same random seed to ensure that two sampling results are the same. |
Return value
A subset of the source table after sampling is returned. The number of rows in the returned subset is less than or equal to that in the source table.
Example
Create a table named sample_table_test and insert data into the table.
CREATE TABLE sample_table_test (id int, name text); INSERT INTO sample_table_test SELECT i, repeat(i::text, 10) FROM generate_series(0, 10000) s(i);Specify the BLOCK keyword. Sample 10% of the data from the
sample_table_testtable. Set the dummy_arg parameter to 1 and the seed_value parameter to 10.SELECT COUNT(*) FROM sample_table_test SAMPLE BLOCK (10, 1) SEED (10); # A value of 873 is returned. The result can be reproduced.NoteThe percentage of data sampled from the source table for each sampling may not be exactly the percentage specified by the sample_pecent parameter, but is roughly around the specified percentage.
Do not specify the BLOCK keyword. Sample 5% of the data from the
sample_table_testtable. Do not specify a random seed.SELECT COUNT(*) FROM sample_table_test SAMPLE (5); # A value of 485 is returned. The result cannot be reproduced.