This performance test uses 1 TB of TPC-DS data as the test data. The data is generated by the TPC-DS data generation tool dsdgen.
Note: This implementation of TPC-DS is derived from the TPC-DS Benchmark and is not comparable to published TPC-DS Benchmark results, because this implementation does not comply with all the requirements of the TPC-DS Benchmark.
- Download the TPC-DS data generation tool dsdgen from the official TPC website, and compile the downloaded source code to generate an executable binary named dsdgen.
- Create a directory to store data files.
mkdir data1tb
- Generate the test data.
./dsdgen -sc 1000 -dir data1tb -TERMINATE N
The following table describes the parameters.

| Parameter | Description | Example |
| --- | --- | --- |
| -sc | The volume of test data to generate. Unit: GB. | 10 or 1000 |
| -dir | The directory to which the data files are written. | data1tb |
| -TERMINATE | Specifies whether to add a field separator at the end of each row. Valid values: N: no field separator is added at the end of each row. Y: a field separator, such as a vertical bar, is added at the end of each row. | N |
| -PARALLEL | The total number of chunks into which the data is split. Each dsdgen command generates only one chunk, so the command must be run once for each chunk. | 5 |
| -CHILD | The sequence number of the chunk that the current command generates. The value ranges from 1 to the value of -PARALLEL. | 1 |

In the following sample code, the 1 TB of test data is split into five chunks:
mkdir data1tb_5
./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 1
./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 2
./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 3
./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 4
./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 5
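The five dsdgen commands in the sample code above are independent of one another, so they can also be started concurrently instead of one after another. The following is a minimal sketch, assuming a POSIX-compatible shell and a dsdgen binary in the current directory. The gen_chunks function and the DSDGEN environment variable are hypothetical names introduced for illustration; they are not part of the TPC-DS toolkit.

```shell
# gen_chunks (hypothetical helper): runs one dsdgen process per chunk
# in the background and waits for all of them.
# Arguments: <scale in GB> <number of chunks> <output directory>
gen_chunks() {
  scale=$1                      # passed to -sc
  chunks=$2                     # passed to -PARALLEL
  out_dir=$3                    # passed to -dir
  dsdgen=${DSDGEN:-./dsdgen}    # assumption: dsdgen is in the current directory

  mkdir -p "$out_dir" || return 1

  # Start chunk 1..N in the background and remember each process ID.
  pids=""
  child=1
  while [ "$child" -le "$chunks" ]; do
    "$dsdgen" -sc "$scale" -dir "$out_dir" -TERMINATE N \
      -PARALLEL "$chunks" -CHILD "$child" &
    pids="$pids $!"
    child=$((child + 1))
  done

  # Wait for every chunk; return non-zero if any dsdgen process failed.
  status=0
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

For example, `gen_chunks 1000 5 data1tb_5` starts the same five commands as the sample code. Running all chunks at once only helps if the machine has enough CPU cores and disk bandwidth; on smaller machines, a lower -PARALLEL value may be a better choice.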
The dsdgen commands generate the test data files in text format. The vertical bar (|) is the default field separator, and each row contains exactly one record. The following data files are generated:
call_center.dat
catalog_page.dat
catalog_returns.dat
catalog_sales.dat
customer_address.dat
customer.dat
customer_demographics.dat
date_dim.dat
dbgen_version.dat
household_demographics.dat
income_band.dat
inventory.dat
item.dat
promotion.dat
reason.dat
ship_mode.dat
store.dat
store_returns.dat
store_sales.dat
time_dim.dat
warehouse.dat
web_page.dat
web_returns.dat
web_sales.dat
web_site.dat
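To spot-check the generated files, you can count the rows in each .dat file and the number of fields in its first row. The following is a minimal sketch. check_dat is a hypothetical helper, and the field count assumes the default vertical bar (|) separator.

```shell
# check_dat (hypothetical helper): prints "<file> <rows> <fields in first row>"
# for every .dat file in the given directory.
check_dat() {
  dir=$1
  for f in "$dir"/*.dat; do
    [ -f "$f" ] || continue              # skip if the glob matched no files
    rows=$(wc -l < "$f" | tr -d ' ')     # one record per row
    # Split the first row on the vertical bar and count the fields.
    fields=$(head -n 1 "$f" | awk -F'|' '{ print NF }')
    printf '%s %s %s\n' "$(basename "$f")" "$rows" "$fields"
  done
}
```

For example, `check_dat data1tb` summarizes every file in the output directory. Because each row holds one record, the row count equals the number of records in the file. If the data was generated with -TERMINATE Y, the trailing separator produces an extra empty field, so the reported field count is one higher than the number of columns.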
For more information about how to use dsdgen, see TPC-DS.