Run the standard TPC-H benchmark to measure how much rds_duckdb accelerates complex analytical queries on ApsaraDB RDS for PostgreSQL.
Prerequisites
Before you begin, ensure that you have:
An Alibaba Cloud account with permissions to create ECS and RDS instances
An ECS instance running Linux, used as the test client
An ApsaraDB RDS for PostgreSQL instance with rds_duckdb enabled — see Use the rds_duckdb extension for prerequisites
PostgreSQL client tools installed on the ECS instance — download from postgresql.org
Resource estimates
Confirm your ECS instance has enough disk space before you start.
This guide uses 100 GB (100 SF) as the test data size. Scale Factor (SF) is the TPC-H unit of data volume: 1 SF = 1 GB of raw table data across all eight benchmark tables. Index data and other overhead are not included in this figure, so reserve additional storage beyond the raw data size.
Step 1: Generate test data on the ECS instance
Download and extract dbgen, the TPC-H data generator.
wget https://github.com/electrum/tpch-dbgen/archive/refs/heads/master.zip yum install -y unzip zip unzip master.zip cd tpch-dbgen-master/Add the
EOL_HANDLINGflag so dbgen omits the trailing|delimiter at the end of each row, then build the binary.echo "#define EOL_HANDLING 1" >> config.h makeGenerate 100 GB of test data and move the output files to a dedicated directory. Adjust the path and scale factor as needed.
./dbgen -s 100 mkdir -p /data/test/tpch_data mv *.tbl /data/test/tpch_data
Step 2: Create and populate the RDS PostgreSQL database
Create an ApsaraDB RDS for PostgreSQL instance and create a privileged account.
Connect to the instance from your ECS instance using the PostgreSQL client.
Create the eight TPC-H tables.
CREATE TABLE customer(c_custkey BIGINT NOT NULL, c_name VARCHAR NOT NULL, c_address VARCHAR NOT NULL, c_nationkey INTEGER NOT NULL, c_phone VARCHAR NOT NULL, c_acctbal DECIMAL(15,2) NOT NULL, c_mktsegment VARCHAR NOT NULL, c_comment VARCHAR NOT NULL); CREATE TABLE lineitem(l_orderkey BIGINT NOT NULL, l_partkey BIGINT NOT NULL, l_suppkey BIGINT NOT NULL, l_linenumber BIGINT NOT NULL, l_quantity DECIMAL(15,2) NOT NULL, l_extendedprice DECIMAL(15,2) NOT NULL, l_discount DECIMAL(15,2) NOT NULL, l_tax DECIMAL(15,2) NOT NULL, l_returnflag VARCHAR NOT NULL, l_linestatus VARCHAR NOT NULL, l_shipdate DATE NOT NULL, l_commitdate DATE NOT NULL, l_receiptdate DATE NOT NULL, l_shipinstruct VARCHAR NOT NULL, l_shipmode VARCHAR NOT NULL, l_comment VARCHAR NOT NULL); CREATE TABLE nation(n_nationkey INTEGER NOT NULL, n_name VARCHAR NOT NULL, n_regionkey INTEGER NOT NULL, n_comment VARCHAR NOT NULL); CREATE TABLE orders(o_orderkey BIGINT NOT NULL, o_custkey BIGINT NOT NULL, o_orderstatus VARCHAR NOT NULL, o_totalprice DECIMAL(15,2) NOT NULL, o_orderdate DATE NOT NULL, o_orderpriority VARCHAR NOT NULL, o_clerk VARCHAR NOT NULL, o_shippriority INTEGER NOT NULL, o_comment VARCHAR NOT NULL); CREATE TABLE part(p_partkey BIGINT NOT NULL, p_name VARCHAR NOT NULL, p_mfgr VARCHAR NOT NULL, p_brand VARCHAR NOT NULL, p_type VARCHAR NOT NULL, p_size INTEGER NOT NULL, p_container VARCHAR NOT NULL, p_retailprice DECIMAL(15,2) NOT NULL, p_comment VARCHAR NOT NULL); CREATE TABLE partsupp(ps_partkey BIGINT NOT NULL, ps_suppkey BIGINT NOT NULL, ps_availqty BIGINT NOT NULL, ps_supplycost DECIMAL(15,2) NOT NULL, ps_comment VARCHAR NOT NULL); CREATE TABLE region(r_regionkey INTEGER NOT NULL, r_name VARCHAR NOT NULL, r_comment VARCHAR NOT NULL); CREATE TABLE supplier(s_suppkey BIGINT NOT NULL, s_name VARCHAR NOT NULL, s_address VARCHAR NOT NULL, s_nationkey INTEGER NOT NULL, s_phone VARCHAR NOT NULL, s_acctbal DECIMAL(15,2) NOT NULL, s_comment VARCHAR NOT NULL);Verify the tables were created:
\dtImport the generated data into each table.
\copy customer FROM '/data/test/tpch_data/customer.tbl' DELIMITER '|'; \copy lineitem FROM '/data/test/tpch_data/lineitem.tbl' DELIMITER '|'; \copy nation FROM '/data/test/tpch_data/nation.tbl' DELIMITER '|'; \copy orders FROM '/data/test/tpch_data/orders.tbl' DELIMITER '|'; \copy part FROM '/data/test/tpch_data/part.tbl' DELIMITER '|'; \copy partsupp FROM '/data/test/tpch_data/partsupp.tbl' DELIMITER '|'; \copy region FROM '/data/test/tpch_data/region.tbl' DELIMITER '|'; \copy supplier FROM '/data/test/tpch_data/supplier.tbl' DELIMITER '|';Verify row counts and table sizes:
\dt+
Step 3: Install rds_duckdb and create column-oriented tables
Install the rds_duckdb extension.
CREATE EXTENSION rds_duckdb;Convert each PostgreSQL table to its column-oriented counterpart used by DuckDB.
SELECT rds_duckdb.create_duckdb_table('customer'); SELECT rds_duckdb.create_duckdb_table('lineitem'); SELECT rds_duckdb.create_duckdb_table('nation'); SELECT rds_duckdb.create_duckdb_table('orders'); SELECT rds_duckdb.create_duckdb_table('part'); SELECT rds_duckdb.create_duckdb_table('partsupp'); SELECT rds_duckdb.create_duckdb_table('region'); SELECT rds_duckdb.create_duckdb_table('supplier');Set the thread count and memory limit based on your instance's available resources.
SET rds_duckdb.worker_threads = 32; SET rds_duckdb.memory_limit = 8192; -- in MB
Step 4: Enable AP query acceleration
Enable the AP query acceleration engine.
SET rds_duckdb.execution = on;
Step 5: Run all 22 TPC-H queries
Enable query timing and redirect output to a file, then run all 22 standard TPC-H SQL statements.
\timing on
\o /data/test/tpch_data/tpch_outThe 22 queries cover a range of analytical workloads. The table below maps each query to its business scenario so you can evaluate results in context.
| Query | Business scenario |
|---|---|
| Q1 | Pricing summary — aggregate billed, shipped, and returned line items |
| Q2 | Minimum-cost supplier — find the lowest-cost supplier for a part in a region |
| Q3 | Shipping priority — rank unshipped orders by potential revenue |
| Q4 | Order priority check — measure how often suppliers meet commit dates |
| Q5 | Local supplier volume — revenue attributed to suppliers in a region |
| Q6 | Forecasting revenue change — estimate revenue impact of a discount scenario |
| Q7 | Volume shipping — trade volume between two nations over two years |
| Q8 | National market share — a nation's share of a product type's market |
| Q9 | Product type profit — annual profit by nation for a product type |
| Q10 | Returned item reporting — customers who returned items and their revenue impact |
| Q11 | Important stock identification — parts whose supply value exceeds a threshold |
| Q12 | Shipping mode and order priority — whether shipping mode affects order priority distribution |
| Q13 | Customer distribution — how customers are distributed by order count |
| Q14 | Promotion effect — revenue share from promotional parts in a given month |
| Q15 | Top supplier — the supplier with the highest revenue in a quarter |
| Q16 | Parts and supplier relationship — count distinct suppliers for specific part attributes |
| Q17 | Small-quantity order revenue — average yearly loss from small-quantity orders |
| Q18 | Large volume customer — customers who placed orders above a quantity threshold |
| Q19 | Discounted revenue — revenue from air-shipped items meeting specific criteria |
| Q20 | Potential part promotion — suppliers with excess inventory for a product in a region |
| Q21 | Suppliers who kept back orders — suppliers with late shipments on failed orders |
| Q22 | Global sales opportunity — customers who haven't placed orders but have high account balances |
-- Q1: Pricing summary report
SELECT
l_returnflag,
l_linestatus,
sum(l_quantity) AS sum_qty,
sum(l_extendedprice) AS sum_base_price,
sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
avg(l_quantity) AS avg_qty,
avg(l_extendedprice) AS avg_price,
avg(l_discount) AS avg_disc,
count(*) AS count_order
FROM
lineitem
WHERE
l_shipdate <= CAST('1998-09-02' AS date)
GROUP BY
l_returnflag,
l_linestatus
ORDER BY
l_returnflag,
l_linestatus;
-- Q2: Minimum-cost supplier
SELECT
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
FROM
part,
supplier,
partsupp,
nation,
region
WHERE
p_partkey = ps_partkey
AND s_suppkey = ps_suppkey
AND p_size = 15
AND p_type LIKE '%BRASS'
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'EUROPE'
AND ps_supplycost = (
SELECT
min(ps_supplycost)
FROM
partsupp,
supplier,
nation,
region
WHERE
p_partkey = ps_partkey
AND s_suppkey = ps_suppkey
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'EUROPE')
ORDER BY
s_acctbal DESC,
n_name,
s_name,
p_partkey
LIMIT 100;
-- Q3: Shipping priority
SELECT
l_orderkey,
sum(l_extendedprice * (1 - l_discount)) AS revenue,
o_orderdate,
o_shippriority
FROM
customer,
orders,
lineitem
WHERE
c_mktsegment = 'BUILDING'
AND c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND o_orderdate < CAST('1995-03-15' AS date)
AND l_shipdate > CAST('1995-03-15' AS date)
GROUP BY
l_orderkey,
o_orderdate,
o_shippriority
ORDER BY
revenue DESC,
o_orderdate
LIMIT 10;
-- Q4: Order priority checking
SELECT
o_orderpriority,
count(*) AS order_count
FROM
orders
WHERE
o_orderdate >= CAST('1993-07-01' AS date)
AND o_orderdate < CAST('1993-10-01' AS date)
AND EXISTS (
SELECT
*
FROM
lineitem
WHERE
l_orderkey = o_orderkey
AND l_commitdate < l_receiptdate)
GROUP BY
o_orderpriority
ORDER BY
o_orderpriority;
-- Q5: Local supplier volume
SELECT
n_name,
sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM
customer,
orders,
lineitem,
supplier,
nation,
region
WHERE
c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND l_suppkey = s_suppkey
AND c_nationkey = s_nationkey
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'ASIA'
AND o_orderdate >= CAST('1994-01-01' AS date)
AND o_orderdate < CAST('1995-01-01' AS date)
GROUP BY
n_name
ORDER BY
revenue DESC;
-- Q6: Forecasting revenue change
SELECT
sum(l_extendedprice * l_discount) AS revenue
FROM
lineitem
WHERE
l_shipdate >= CAST('1994-01-01' AS date)
AND l_shipdate < CAST('1995-01-01' AS date)
AND l_discount BETWEEN 0.05
AND 0.07
AND l_quantity < 24;
-- Q7: Volume shipping
SELECT
supp_nation,
cust_nation,
l_year,
sum(volume) AS revenue
FROM (
SELECT
n1.n_name AS supp_nation,
n2.n_name AS cust_nation,
extract(year FROM l_shipdate) AS l_year,
l_extendedprice * (1 - l_discount) AS volume
FROM
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2
WHERE
s_suppkey = l_suppkey
AND o_orderkey = l_orderkey
AND c_custkey = o_custkey
AND s_nationkey = n1.n_nationkey
AND c_nationkey = n2.n_nationkey
AND ((n1.n_name = 'FRANCE'
AND n2.n_name = 'GERMANY')
OR (n1.n_name = 'GERMANY'
AND n2.n_name = 'FRANCE'))
AND l_shipdate BETWEEN CAST('1995-01-01' AS date)
AND CAST('1996-12-31' AS date)) AS shipping
GROUP BY
supp_nation,
cust_nation,
l_year
ORDER BY
supp_nation,
cust_nation,
l_year;
-- Q8: National market share
SELECT
o_year,
sum(
CASE WHEN nation = 'BRAZIL' THEN
volume
ELSE
0
END) / sum(volume) AS mkt_share
FROM (
SELECT
extract(year FROM o_orderdate) AS o_year,
l_extendedprice * (1 - l_discount) AS volume,
n2.n_name AS nation
FROM
part,
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2,
region
WHERE
p_partkey = l_partkey
AND s_suppkey = l_suppkey
AND l_orderkey = o_orderkey
AND o_custkey = c_custkey
AND c_nationkey = n1.n_nationkey
AND n1.n_regionkey = r_regionkey
AND r_name = 'AMERICA'
AND s_nationkey = n2.n_nationkey
AND o_orderdate BETWEEN CAST('1995-01-01' AS date)
AND CAST('1996-12-31' AS date)
AND p_type = 'ECONOMY ANODIZED STEEL') AS all_nations
GROUP BY
o_year
ORDER BY
o_year;
-- Q9: Product type profit measure
SELECT
nation,
o_year,
sum(amount) AS sum_profit
FROM (
SELECT
n_name AS nation,
extract(year FROM o_orderdate) AS o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount
FROM
part,
supplier,
lineitem,
partsupp,
orders,
nation
WHERE
s_suppkey = l_suppkey
AND ps_suppkey = l_suppkey
AND ps_partkey = l_partkey
AND p_partkey = l_partkey
AND o_orderkey = l_orderkey
AND s_nationkey = n_nationkey
AND p_name LIKE '%green%') AS profit
GROUP BY
nation,
o_year
ORDER BY
nation,
o_year DESC;
-- Q10: Returned item reporting
SELECT
c_custkey,
c_name,
sum(l_extendedprice * (1 - l_discount)) AS revenue,
c_acctbal,
n_name,
c_address,
c_phone,
c_comment
FROM
customer,
orders,
lineitem,
nation
WHERE
c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND o_orderdate >= CAST('1993-10-01' AS date)
AND o_orderdate < CAST('1994-01-01' AS date)
AND l_returnflag = 'R'
AND c_nationkey = n_nationkey
GROUP BY
c_custkey,
c_name,
c_acctbal,
c_phone,
n_name,
c_address,
c_comment
ORDER BY
revenue DESC
LIMIT 20;
-- Q11: Important stock identification
SELECT
ps_partkey,
sum(ps_supplycost * ps_availqty) AS value
FROM
partsupp,
supplier,
nation
WHERE
ps_suppkey = s_suppkey
AND s_nationkey = n_nationkey
AND n_name = 'GERMANY'
GROUP BY
ps_partkey
HAVING
sum(ps_supplycost * ps_availqty) > (
SELECT
sum(ps_supplycost * ps_availqty) * 0.0001000000
FROM
partsupp,
supplier,
nation
WHERE
ps_suppkey = s_suppkey
AND s_nationkey = n_nationkey
AND n_name = 'GERMANY')
ORDER BY
value DESC;
-- Q12: Shipping mode and order priority
SELECT
l_shipmode,
sum(
CASE WHEN o_orderpriority = '1-URGENT'
OR o_orderpriority = '2-HIGH' THEN
1
ELSE
0
END) AS high_line_count,
sum(
CASE WHEN o_orderpriority <> '1-URGENT'
AND o_orderpriority <> '2-HIGH' THEN
1
ELSE
0
END) AS low_line_count
FROM
orders,
lineitem
WHERE
o_orderkey = l_orderkey
AND l_shipmode IN ('MAIL', 'SHIP')
AND l_commitdate < l_receiptdate
AND l_shipdate < l_commitdate
AND l_receiptdate >= CAST('1994-01-01' AS date)
AND l_receiptdate < CAST('1995-01-01' AS date)
GROUP BY
l_shipmode
ORDER BY
l_shipmode;
-- Q13: Customer distribution
SELECT
c_count,
count(*) AS custdist
FROM (
SELECT
c_custkey,
count(o_orderkey)
FROM
customer
LEFT OUTER JOIN orders ON c_custkey = o_custkey
AND o_comment NOT LIKE '%special%requests%'
GROUP BY
c_custkey) AS c_orders (c_custkey,
c_count)
GROUP BY
c_count
ORDER BY
custdist DESC,
c_count DESC;
-- Q14: Promotion effect
SELECT
100.00 * sum(
CASE WHEN p_type LIKE 'PROMO%' THEN
l_extendedprice * (1 - l_discount)
ELSE
0
END) / sum(l_extendedprice * (1 - l_discount)) AS promo_revenue
FROM
lineitem,
part
WHERE
l_partkey = p_partkey
AND l_shipdate >= date '1995-09-01'
AND l_shipdate < CAST('1995-10-01' AS date);
-- Q15: Top supplier
SELECT
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
FROM
supplier,
(
SELECT
l_suppkey AS supplier_no,
sum(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM
lineitem
WHERE
l_shipdate >= CAST('1996-01-01' AS date)
AND l_shipdate < CAST('1996-04-01' AS date)
GROUP BY
supplier_no) revenue0
WHERE
s_suppkey = supplier_no
AND total_revenue = (
SELECT
max(total_revenue)
FROM (
SELECT
l_suppkey AS supplier_no,
sum(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM
lineitem
WHERE
l_shipdate >= CAST('1996-01-01' AS date)
AND l_shipdate < CAST('1996-04-01' AS date)
GROUP BY
supplier_no) revenue1)
ORDER BY
s_suppkey;
-- Q16: Parts and supplier relationship
SELECT
p_brand,
p_type,
p_size,
count(DISTINCT ps_suppkey) AS supplier_cnt
FROM
partsupp,
part
WHERE
p_partkey = ps_partkey
AND p_brand <> 'Brand#45'
AND p_type NOT LIKE 'MEDIUM POLISHED%'
AND p_size IN (49, 14, 23, 45, 19, 3, 36, 9)
AND ps_suppkey NOT IN (
SELECT
s_suppkey
FROM
supplier
WHERE
s_comment LIKE '%Customer%Complaints%')
GROUP BY
p_brand,
p_type,
p_size
ORDER BY
supplier_cnt DESC,
p_brand,
p_type,
p_size;
-- Q17: Small-quantity order revenue
SELECT
sum(l_extendedprice) / 7.0 AS avg_yearly
FROM
lineitem,
part
WHERE
p_partkey = l_partkey
AND p_brand = 'Brand#23'
AND p_container = 'MED BOX'
AND l_quantity < (
SELECT
0.2 * avg(l_quantity)
FROM
lineitem
WHERE
l_partkey = p_partkey);
-- Q18: Large volume customer
SELECT
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice,
sum(l_quantity)
FROM
customer,
orders,
lineitem
WHERE
o_orderkey IN (
SELECT
l_orderkey
FROM
lineitem
GROUP BY
l_orderkey
HAVING
sum(l_quantity) > 300)
AND c_custkey = o_custkey
AND o_orderkey = l_orderkey
GROUP BY
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice
ORDER BY
o_totalprice DESC,
o_orderdate
LIMIT 100;
-- Q19: Discounted revenue
SELECT
sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM
lineitem,
part
WHERE (p_partkey = l_partkey
AND p_brand = 'Brand#12'
AND p_container IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
AND l_quantity >= 1
AND l_quantity <= 1 + 10
AND p_size BETWEEN 1 AND 5
AND l_shipmode IN ('AIR', 'AIR REG')
AND l_shipinstruct = 'DELIVER IN PERSON')
OR (p_partkey = l_partkey
AND p_brand = 'Brand#23'
AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
AND l_quantity >= 10
AND l_quantity <= 10 + 10
AND p_size BETWEEN 1 AND 10
AND l_shipmode IN ('AIR', 'AIR REG')
AND l_shipinstruct = 'DELIVER IN PERSON')
OR (p_partkey = l_partkey
AND p_brand = 'Brand#34'
AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
AND l_quantity >= 20
AND l_quantity <= 20 + 10
AND p_size BETWEEN 1 AND 15
AND l_shipmode IN ('AIR', 'AIR REG')
AND l_shipinstruct = 'DELIVER IN PERSON');
-- Q20: Potential part promotion
SELECT
s_name,
s_address
FROM
supplier,
nation
WHERE
s_suppkey IN (
SELECT
ps_suppkey
FROM
partsupp
WHERE
ps_partkey IN (
SELECT
p_partkey
FROM
part
WHERE
p_name LIKE 'forest%')
AND ps_availqty > (
SELECT
0.5 * sum(l_quantity)
FROM
lineitem
WHERE
l_partkey = ps_partkey
AND l_suppkey = ps_suppkey
AND l_shipdate >= CAST('1994-01-01' AS date)
AND l_shipdate < CAST('1995-01-01' AS date)))
AND s_nationkey = n_nationkey
AND n_name = 'CANADA'
ORDER BY
s_name;
-- Q21: Suppliers who kept back orders
SELECT
s_name,
count(*) AS numwait
FROM
supplier,
lineitem l1,
orders,
nation
WHERE
s_suppkey = l1.l_suppkey
AND o_orderkey = l1.l_orderkey
AND o_orderstatus = 'F'
AND l1.l_receiptdate > l1.l_commitdate
AND EXISTS (
SELECT
*
FROM
lineitem l2
WHERE
l2.l_orderkey = l1.l_orderkey
AND l2.l_suppkey <> l1.l_suppkey)
AND NOT EXISTS (
SELECT
*
FROM
lineitem l3
WHERE
l3.l_orderkey = l1.l_orderkey
AND l3.l_suppkey <> l1.l_suppkey
AND l3.l_receiptdate > l3.l_commitdate)
AND s_nationkey = n_nationkey
AND n_name = 'SAUDI ARABIA'
GROUP BY
s_name
ORDER BY
numwait DESC,
s_name
LIMIT 100;
-- Q22: Global sales opportunity
SELECT
cntrycode,
count(*) AS numcust,
sum(c_acctbal) AS totacctbal
FROM (
SELECT
substring(c_phone FROM 1 FOR 2) AS cntrycode,
c_acctbal
FROM
customer
WHERE
substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17')
AND c_acctbal > (
SELECT
avg(c_acctbal)
FROM
customer
WHERE
c_acctbal > 0.00
AND substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17'))
AND NOT EXISTS (
SELECT
*
FROM
orders
WHERE
o_custkey = c_custkey)) AS custsale
GROUP BY
cntrycode
ORDER BY
cntrycode;After all 22 queries complete, the results are saved to /data/test/tpch_data/tpch_out. Compare execution times with and without rds_duckdb.execution = on to quantify the acceleration effect.
What's next
Use the rds_duckdb extension — configure rds_duckdb for production workloads
Create an ApsaraDB RDS for PostgreSQL instance — set up your instance with the right specifications for AP workloads