This performance test uses 1 TB of TPC-DS data, which is generated by using the dsdgen tool.

Note: This implementation of TPC-DS is derived from the TPC-DS Benchmark and is not comparable to published TPC-DS Benchmark results, as this implementation does not comply with all the requirements of the TPC-DS Benchmark.
  1. Download the TPC-DS data generation tool dsdgen from the official TPC website, then compile the downloaded source to build an executable file named dsdgen.

  2. Create a directory to store data files.
    mkdir data1tb
  3. Generate the test data.
    ./dsdgen -sc 1000 -dir data1tb -TERMINATE N
    The following table describes the parameters.

    Parameter    Description                                               Example
    -sc          The volume of test data to generate. Unit: GB.            10 or 1000
    -dir         The directory to which data files are written.            data1tb
    -TERMINATE   Specifies whether to add a field separator at the end     N or Y
                 of each row.
                 • N: No field separator is added at the end of each row.
                 • Y: A field separator, such as a vertical bar (|), is
                   added at the end of each row.
    -PARALLEL    The total number of chunks. Each command generates only   5
                 one chunk, so the command must be run once per chunk.
    -CHILD       The sequence number of the chunk that the current         1
                 command generates.
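The -TERMINATE setting changes what a loader has to parse. The sketch below counts the fields in a row with and without a trailing separator. The row `1|AAAA|2001` is a hypothetical stand-in, not real dsdgen output, and `count_fields` is a helper name introduced only for this illustration.

```shell
# count_fields is a hypothetical helper: it prints the number of
# vertical-bar-separated fields in each line read from stdin.
count_fields() {
    awk -F'|' '{ print NF }'
}

# Hypothetical row as written with -TERMINATE N (no trailing separator).
printf '1|AAAA|2001\n' | count_fields

# The same row as written with -TERMINATE Y (trailing separator).
# awk counts an additional, empty field after the last vertical bar.
printf '1|AAAA|2001|\n' | count_fields
```

If the target system treats the trailing separator as an extra column, generating the data with -TERMINATE N avoids the mismatch.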
    In the following sample code, 1 TB of test data is split into five chunks:
    mkdir data1tb_5
    
    ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 1
    ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 2
    ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 3
    ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 4
    ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 5
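The five commands above differ only in the -CHILD value, so they can be produced by a loop. The sketch below is one possible way to do that, assuming dsdgen sits in the current directory; `chunk_cmd` is a hypothetical helper name. It only prints each command so that the list can be reviewed before anything is run.

```shell
# chunk_cmd is a hypothetical helper that prints the dsdgen command
# for one chunk. Arguments: scale in GB, output directory,
# total number of chunks, sequence number of this chunk.
chunk_cmd() {
    printf './dsdgen -sc %s -dir %s -TERMINATE N -PARALLEL %s -CHILD %s\n' \
        "$1" "$2" "$3" "$4"
}

total=5
child=1
while [ "$child" -le "$total" ]; do
    chunk_cmd 1000 data1tb_5 "$total" "$child"
    child=$((child + 1))
done
```

Piping the printed list to `sh` runs the commands one after another. Because each command writes its own chunk, they can also be started in the background with `&` and followed by `wait`, if the machine has enough CPU and disk bandwidth.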
    The dsdgen commands generate the test data as text files. A vertical bar (|) is the default field separator, and each row contains one record. The following data files are generated:
    call_center.dat
    catalog_page.dat
    catalog_returns.dat
    catalog_sales.dat
    customer_address.dat
    customer.dat
    customer_demographics.dat
    date_dim.dat
    dbgen_version.dat
    household_demographics.dat
    income_band.dat
    inventory.dat
    item.dat
    promotion.dat
    reason.dat
    ship_mode.dat
    store.dat
    store_returns.dat
    store_sales.dat
    time_dim.dat
    warehouse.dat
    web_page.dat
    web_returns.dat
    web_sales.dat
    web_site.dat
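Before the data is loaded, it can help to confirm that every table file is present. The sketch below assumes the single-command run from step 3 (directory data1tb, file names exactly as listed above). `missing_files` is a hypothetical helper name, and chunked runs may name their files differently.

```shell
# missing_files is a hypothetical helper: it prints the name of every
# expected .dat file that is absent or empty in the given directory.
missing_files() {
    dir=$1
    for t in call_center catalog_page catalog_returns catalog_sales \
             customer_address customer customer_demographics date_dim \
             dbgen_version household_demographics income_band inventory \
             item promotion reason ship_mode store store_returns \
             store_sales time_dim warehouse web_page web_returns \
             web_sales web_site; do
        # -s is true only if the file exists and has a size greater than zero.
        [ -s "$dir/$t.dat" ] || echo "$t.dat"
    done
}

missing_files data1tb
```

No output means that all 25 files exist and are not empty.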

For more information about how to use dsdgen, see tpc-ds.