By digoal
In e-commerce operations, the following scenario is common: because of the relationship between upstream and downstream vendors, when a user places an order, both parties generate an order record in their own systems and send feedback to each other. After receiving a feedback order, the local system first caches the order record in a queue; a background process then retrieves orders from the queue for processing.
The key steps in this process are: high-speed writing to the queue, sequential extraction from the queue and high-speed processing, and removal of processed order records from the queue.
During periods of high concurrency, every consumer accesses the queue in the same order, which creates hotspots and can lead to bottlenecks such as lock conflicts, wasted I/O scans, and wasted CPU computation. A further bottleneck can arise when index entries for already-processed order records are not cleaned up promptly: each lookup of a dead index entry incurs extra I/O and CPU for heap visits to verify tuple visibility.
This article presents optimization strategies and a demo showing how to boost database performance for queue processing, achieving a 10- to 20-fold performance improvement.
1. Test Environment
MacBook Pro (15-inch, 2018)
2.2 GHz six-core Intel Core i7
32 GB 2400 MHz DDR4
PostgreSQL 15.1
Since the environment is macOS, you may need to raise ulimit:
ulimit -n 1000000
2. Queue table for the orders written by the upstream
create table t_order_q (
id serial8 primary key, -- Auto-increment primary key
order_id uuid unique, -- The order number passed from the upstream
cts timestamp not null -- The creation time of the order passed from the upstream
);
-- create index on t_order_q (cts); -- If orders are fetched by order creation time, create an index on the time field. If orders are processed in auto-increment primary key order instead, the time index is not needed.
3. State table of orders taken out and processed
create table t_order_u (
id serial8 primary key, -- Auto-increment primary key
order_id uuid unique, -- The order number passed from the upstream
cts timestamp not null, -- The creation time of the order passed from the upstream
uts timestamp not null, -- Order processing time
status int not null -- Order processing status marker
);
4. Write 1 million orders into the queue
insert into t_order_q (order_id, cts) select gen_random_uuid(), clock_timestamp() from generate_series(1,1000000);
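As a quick sanity check (not part of the original demo), the load can be verified:
select count(*), min(cts), max(cts) from t_order_q; -- expect 1,000,000 rows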
5. Write a pgbench stress test script that takes orders out of the queue. An advisory lock (AdLock) is taken on the queue ID to mark the record as being processed; a transaction-level AdLock is released automatically when the transaction ends. AdLock is also commonly used to relieve contention in flash sale scenarios.
vi t.sql
with tmp as
(delete from t_order_q where ctid = (select ctid from t_order_q where pg_try_advisory_xact_lock(id) order by id limit 1) returning order_id, cts)
insert into t_order_u (order_id,cts,uts,status) select tmp.order_id, tmp.cts, now(), 1 from tmp;
Or
begin;
select id as v_id from t_order_q where pg_try_advisory_xact_lock(id) order by id limit 1 \gset
with tmp as (delete from t_order_q where id = :v_id returning order_id, cts)
insert into t_order_u (order_id,cts,uts,status) select tmp.order_id, tmp.cts, now(), 1 from tmp;
end;
Or (use \sleep to simulate the extra application-side processing time after the application receives the order to be processed):
begin;
select id as v_id from t_order_q where pg_try_advisory_xact_lock(id) order by id limit 1 \gset
\sleep 10ms
with tmp as (delete from t_order_q where id = :v_id returning order_id, cts)
insert into t_order_u (order_id,cts,uts,status) select tmp.order_id, tmp.cts, now(), 1 from tmp;
end;
6. Stress-test the queue consumption with 256 concurrent clients; each connection handles an average of 3906 transactions:
select 1000000/256.0;
3906.2500000000000
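Note that 256 clients × 3906 transactions each = 999,936 transactions, so the run consumes 999,936 of the 1,000,000 queued orders and leaves 64 rows in the queue.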
7. Results of stress test
pgbench -M extended -f ./t.sql -n -r -P 1 -c 256 -j 2 -t 3906
transaction type: ./t.sql
scaling factor: 1
query mode: extended
number of clients: 256
number of threads: 2
maximum number of tries: 1
number of transactions per client: 3906
number of transactions actually processed: 999936/999936
number of failed transactions: 0 (0.000%)
latency average = 8.111 ms
latency stddev = 5.376 ms
initial connection time = 429.698 ms
tps = 25379.081141 (without initial connection time)
statement latencies in milliseconds and failures:
8.114 0 with tmp as
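As a cross-check added here (not part of the original output), the tables can be inspected after the run:
select count(*) from t_order_q; -- expected: 64 rows left (1,000,000 - 999,936)
select count(*) from t_order_u; -- expected: 999,936 processed orders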
1. For comparison, write a pgbench stress test script that takes orders out of the queue with select ... for update, so that concurrent consumers contend on the same rows instead of skipping locked ones via AdLock.
vi t1.sql
begin;
select id as vid from t_order_q order by id limit 1 for update \gset
with tmp as
(delete from t_order_q where id = :vid returning order_id, cts)
insert into t_order_u (order_id,cts,uts,status) select tmp.order_id, tmp.cts, now(), 1 from tmp;
end;
2. Results of stress test
pgbench -M extended -f ./t1.sql -n -r -P 1 -c 256 -j 2 -t 3906
TPS is about 1200.
After adding skip locked, TPS only reaches about 2500. With reduced concurrency, the skip locked variant improves to about 8K TPS (see the example invocation after the script):
begin;
select id as vid from t_order_q order by id limit 1 for update skip locked \gset
with tmp as
(delete from t_order_q where id = :vid returning order_id, cts)
insert into t_order_u (order_id,cts,uts,status) select tmp.order_id, tmp.cts, now(), 1 from tmp;
end;
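The original does not show the reduced-concurrency invocation. Assuming the skip locked script above is saved as t2.sql, a run along these lines (16 clients × 62500 transactions = 1,000,000 orders) would exercise it:
pgbench -M extended -f ./t2.sql -n -r -P 1 -c 16 -j 2 -t 62500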
The bottlenecks above suggest several optimizations:
1. Reduce wasted I/O and CPU computation:
• Under concurrency, order by id limit 1 has to scan several rows rather than one, because some IDs are already held by advisory locks. With n concurrent consumers, the wasted pg_try_advisory_xact_lock() calls add up to roughly n + (n-1) + (n-2) + ... + 1 = n(n+1)/2, and the wasted I/O per fetch is roughly n rows.
Optimization method:
• Fix N connections and have each connection consume a different data shard based on id hash mod N, reducing the wasted I/O and CPU (see the sharding sketch after this list).
• Or split the queue table into several partitions by id hash mod at insert time and assign each partition to a different consumer process, thereby reducing conflicts and wasted scans and improving concurrency.
2. Vacuum indexes more frequently to reduce the CPU and I/O wasted on heap visits when index entries pointing to dead tuples must be rechecked for visibility. Increase autovacuum_work_mem so that all dead tuple CTIDs fit in memory and each index is scanned only once per vacuum.
Optimization method:
• Configure the following parameters: autovacuum_naptime and autovacuum_work_mem (maintenance_work_mem in older versions); illustrative values appear in the sketch after this list.
3. Use parallel vacuum by configuring max_parallel_maintenance_workers.
4. Configure vacuum to prefetch blocks, reducing vacuum runs prolonged by I/O latency. (Suitable for disks with high per-I/O latency but no throughput bottleneck.)
5. Fetch multiple orders at a time and process them in batches (see the batch sketch after this list).
6. Use a local NVMe SSD with high IOPS and low per-I/O latency.
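A minimal sketch of the hash-mod sharding idea from point 1, assuming 256 consumers and 256 shards. The expression index and the use of pgbench's built-in client_id variable are illustrative assumptions, not part of the original demo:
create index on t_order_q (mod(id, 256), id); -- expression index so each shard's scan stays narrow
begin;
select id as v_id from t_order_q where mod(id, 256) = :client_id and pg_try_advisory_xact_lock(id) order by id limit 1 \gset
with tmp as (delete from t_order_q where id = :v_id returning order_id, cts)
insert into t_order_u (order_id,cts,uts,status) select tmp.order_id, tmp.cts, now(), 1 from tmp;
end;
For point 2, illustrative parameter values (assumptions to be tuned per workload, not recommendations from the original):
# postgresql.conf
autovacuum_naptime = '5s'              # wake autovacuum more often than the 1min default
autovacuum_work_mem = '1GB'            # hold all dead-tuple CTIDs in a single pass
max_parallel_maintenance_workers = 4   # also enables parallel index vacuuming (point 3)
And a sketch of the batch consumption from point 5, dequeuing 100 orders per transaction; skip locked keeps concurrent consumers from blocking each other (the batch size of 100 is an arbitrary assumption):
with tmp as (
  delete from t_order_q
  where id in (select id from t_order_q order by id limit 100 for update skip locked)
  returning order_id, cts
)
insert into t_order_u (order_id,cts,uts,status)
select tmp.order_id, tmp.cts, now(), 1 from tmp;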