MaxCompute provides public TPC-DS datasets in four sizes (10 GB, 100 GB, 1 TB, and 10 TB) for product testing. This topic describes the dataset and how to query it.
Dataset and tables
TPC-DS is a standard benchmark from the Transaction Processing Performance Council (TPC) for evaluating data management systems. MaxCompute generates the datasets with the official TPC-DS tool and stores each dataset size in its own schema within the BIGDATA_PUBLIC_DATASET project.
Dataset size | Project name | Schema name |
10 GB | BIGDATA_PUBLIC_DATASET | TPCDS_10G |
100 GB | BIGDATA_PUBLIC_DATASET | TPCDS_100G |
1 TB | BIGDATA_PUBLIC_DATASET | TPCDS_1T |
10 TB | BIGDATA_PUBLIC_DATASET | TPCDS_10T |
Each schema contains the following tables:
call_center, catalog_page, catalog_returns, catalog_sales, customer, customer_address, customer_demographics, date_dim, household_demographics, income_band, inventory, item, promotion, reason, ship_mode, store, store_returns, store_sales, tab_reducenum, tab_reducenum_100, time_dim, warehouse, web_page, web_returns, web_sales, web_site
For details about table schemas and content, see TPC Benchmark DS (TPC-DS) v2.5.0.
Available regions
Region | Region ID |
China (Hangzhou) | cn-hangzhou |
China (Shanghai) | cn-shanghai |
China (Beijing) | cn-beijing |
China (Zhangjiakou) | cn-zhangjiakou |
China (Ulanqab) | cn-wulanchabu |
China (Shenzhen) | cn-shenzhen |
China (Chengdu) | cn-chengdu |
China (Hong Kong) | cn-hongkong |
Singapore | ap-southeast-1 |
Japan (Tokyo) | ap-northeast-1 |
Malaysia (Kuala Lumpur) | ap-southeast-3 |
Indonesia (Jakarta) | ap-southeast-5 |
US (Silicon Valley) | us-west-1 |
US (Virginia) | us-east-1 |
UK (London) | eu-west-1 |
Germany (Frankfurt) | eu-central-1 |
UAE (Dubai) | me-east-1 |
China (Shanghai) Finance Cloud | cn-shanghai-finance-1 |
China (Beijing) Finance Cloud (Invitational Preview) | cn-beijing-finance-1 |
South China 1 Finance Cloud | cn-shenzhen-finance-1 |
China (Beijing) Alibaba Gov Cloud 1 | cn-north-2-gov-1 |
Prerequisites
Before you begin, make sure that you have:
A MaxCompute project. For more information, see Create a MaxCompute project.
Query the data
Because your account is not a member of the BIGDATA_PUBLIC_DATASET project, you query TPC-DS tables through cross-project access. Specify the full table path in project.schema.table format, for example bigdata_public_dataset.tpcds_10g.store_sales.
Supported tools
The Data Map feature in DataWorks cannot discover tables in this public dataset because the data requires cross-project access.
Required session flags
The TPC-DS dataset uses schemas for storage and data types such as DECIMAL and INT. Set the following flags before running queries:
-- Enable session-level schema syntax.
SET odps.namespace.schema=true;
-- Enable data type compatibility.
SET odps.sql.hive.compatible=true;
SET odps.sql.type.system.odps2=true;
SET odps.sql.decimal.odps2=true;
-- Allow ORDER BY without LIMIT clause.
-- New projects use this setting by default. Existing projects may need it set explicitly
-- to avoid errors or suboptimal join order for the Q72 query.
SET odps.sql.validate.orderby.limit=false;
-- Allow Cartesian products (required for Q77).
SET odps.sql.allow.cartesian=true;
If tenant-level schema syntax is not enabled, the public dataset does not appear in DataWorks Data Analysis. SQL queries still work.
Query example
The following query retrieves 100 rows from the store_sales table in the 10 GB dataset. To query a different dataset, replace the schema name, for example with tpcds_100g.
-- Enable session-level schema syntax.
SET odps.namespace.schema=true;
-- Query the tpcds_10g dataset. Replace the schema name to query other datasets.
SELECT * FROM bigdata_public_dataset.tpcds_10g.store_sales LIMIT 100;
Sample output:
+-----------------+-----------------+------------+----------------+-------------+-------------+------------+-------------+-------------+------------------+-------------+-------------------+---------------+----------------+---------------------+--------------------+-----------------------+-------------------+------------+---------------+-------------+---------------------+---------------+
| ss_sold_date_sk | ss_sold_time_sk | ss_item_sk | ss_customer_sk | ss_cdemo_sk | ss_hdemo_sk | ss_addr_sk | ss_store_sk | ss_promo_sk | ss_ticket_number | ss_quantity | ss_wholesale_cost | ss_list_price | ss_sales_price | ss_ext_discount_amt | ss_ext_sales_price | ss_ext_wholesale_cost | ss_ext_list_price | ss_ext_tax | ss_coupon_amt | ss_net_paid | ss_net_paid_inc_tax | ss_net_profit |
+-----------------+-----------------+------------+----------------+-------------+-------------+------------+-------------+-------------+------------------+-------------+-------------------+---------------+----------------+---------------------+--------------------+-----------------------+-------------------+------------+---------------+-------------+---------------------+---------------+
| NULL | NULL | 39073 | NULL | 1420876 | 1738 | 56600 | NULL | NULL | 41171 | 90 | 53.3 | NULL | 72.87 | 0 | NULL | 4797 | 7626.6 | 459.08 | 0 | NULL | NULL | NULL |
| NULL | NULL | 22434 | 98163 | NULL | NULL | NULL | 1 | NULL | 8909 | NULL | 15.22 | NULL | 9.2 | NULL | 690 | NULL | 1380.75 | NULL | NULL | NULL | NULL | -451.5 |
| NULL | NULL | 82219 | NULL | NULL | 1572 | 209531 | 38 | 285 | 14907 | 48 | 84.64 | 132.03 | NULL | 0 | NULL | NULL | NULL | 51.96 | 0 | NULL | 2650.2 | -1464.48 |
| NULL | NULL | 97573 | 214533 | 1298744 | NULL | NULL | NULL | 77 | 26167 | NULL | 92.55 | 143.45 | 91.8 | 0 | 8353.8 | NULL | NULL | NULL | 0 | NULL | NULL | -68.25 |
| NULL | NULL | 60120 | 376494 | NULL | 1678 | 13917 | NULL | NULL | 35953 | 9 | 46.97 | NULL | NULL | NULL | NULL | NULL | 714.33 | NULL | NULL | NULL | NULL | 34.38 |
+-----------------+-----------------+------------+----------------+-------------+-------------+------------+-------------+-------------+------------------+-------------+-------------------+---------------+----------------+---------------------+--------------------+-----------------------+-------------------+------------+---------------+-------------+---------------------+---------------+
... (100 rows returned)
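As a further illustration of the project.schema.table path, the following sketch joins store_sales to date_dim and totals the net paid amount per calendar year in the 10 GB dataset. The columns d_date_sk and d_year are assumed from the TPC-DS specification (they do not appear in the sample output above), and the aliases are chosen only for this example. Treat it as a starting point under those assumptions, not as one of the 99 benchmark queries.

```sql
-- Enable schema syntax and the data type flags listed under "Required session flags".
SET odps.namespace.schema=true;
SET odps.sql.hive.compatible=true;
SET odps.sql.type.system.odps2=true;
SET odps.sql.decimal.odps2=true;

-- Total net paid per calendar year in the 10 GB dataset.
-- Assumption: d_date_sk and d_year are the standard TPC-DS date_dim columns.
SELECT d.d_year           AS sales_year,
       COUNT(*)           AS sales_rows,
       SUM(s.ss_net_paid) AS total_net_paid
FROM bigdata_public_dataset.tpcds_10g.store_sales s
JOIN bigdata_public_dataset.tpcds_10g.date_dim d
  ON s.ss_sold_date_sk = d.d_date_sk
GROUP BY d.d_year
ORDER BY sales_year
LIMIT 100;
```

Because this is an inner join, rows whose ss_sold_date_sk is NULL (as in the sample output) are excluded from the totals.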
Sample query files
MaxCompute provides sample query files for each dataset size. Each file contains 99 queries that vary in complexity and data scan volume.
Select queries carefully to avoid high computing costs, especially for larger datasets.
Dataset size | Query file |
10 GB | |
100 GB | |
1 TB | |
10 TB | |
To generate different versions of the queries, use the TPC-DS benchmark suite tools. For more information, see the official TPC-DS documentation.
Billing
Storage of this public dataset is free. However, running queries incurs computing charges. For more information, see Pay-as-you-go computing pricing.
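Because queries incur computing charges, it can help to estimate a statement before running it. The sketch below uses the COST SQL command of the MaxCompute client, which reports an estimate (such as input size and complexity) without executing the statement. It assumes that COST SQL is available in your client and that estimation is permitted for tables in the public dataset, neither of which this topic confirms. The aggregate query is only an illustration.

```sql
-- Enable session-level schema syntax so the three-part table name resolves.
SET odps.namespace.schema=true;

-- Estimate the statement instead of running it.
-- Assumption: COST SQL is supported by your client for this public dataset.
COST SQL
SELECT ss_item_sk,
       SUM(ss_net_paid) AS total_net_paid
FROM bigdata_public_dataset.tpcds_1t.store_sales
GROUP BY ss_item_sk;
```

If the estimate is larger than expected, consider trying the same statement against a smaller schema such as tpcds_10g first.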
Disclaimer
The TPC-DS data generation and analysis are based on the TPC-DS benchmark. Results cannot be compared with any officially published TPC-DS benchmark results because the test environment does not meet all TPC-DS benchmark requirements.
This TPC-DS dataset is for product testing and evaluation only. The data is not updated regularly and must not be used in a production environment.
The TPC-DS data originates from TPC. You can also generate TPC-DS data independently. For more information, see the official TPC-DS documentation.