
Behavior and Audit Log Modeling: PostgreSQL Best Practice (1)

In this article series, we'll share best practices for behavior and audit log modeling (real-time indexing and real-time search) using PostgreSQL.

By Digoal

Many systems retain user behavior logs, which include information such as browsing, social, and operation behaviors. Typical application scenarios include SQL database auditing and behavior auditing of bastion hosts within enterprises.

The volume of behavior and audit logs depends on the business volume and the number of operations. To meet enterprises' real-time query requirements, a search engine is usually needed, built, for example, on the full-text search feature of Elasticsearch (ES) or PostgreSQL.

Using PostgreSQL to build the search engine has several advantages and can meet the following requirements:

1.  Store details. In addition to the fields that need indexing, a details field can also be stored in PostgreSQL.

2.  Create an index, that is, a full-text index on the log behavior field.

3.  Create multidimensional indexes. Besides the index on the log behavior field, other fields can also be indexed, such as the time dimension and the property dimensions. These indexes can be combined to satisfy multidimensional search requirements.

4.  Search in real time. No synchronization to an external search engine is required.

Example

Testing Machine

Use large, inexpensive SATA disks with an SSD as a bcache cache.

12 * N TB          

Create a directory on each disk.

/disk[1-12]          

Operating System Configuration

Skipped

Compile PostgreSQL

wget https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.bz2          
          
tar -jxvf postgresql-snapshot.tar.bz2          
cd postgresql-10beta1          
          
export USE_NAMED_POSIX_SEMAPHORES=1          
LIBS=-lpthread CFLAGS="-O3" ./configure --prefix=/home/digoal/pgsql10_8k --with-blocksize=8 --with-wal-segsize=1024          
LIBS=-lpthread CFLAGS="-O3" make world -j 128           
LIBS=-lpthread CFLAGS="-O3" make install-world           

Environment Variables

vi env_pg10.sh           
          
export PS1="$USER@`/bin/hostname -s`-> "          
export PGPORT=$(($1+1920))          
export PGDATA=/disk$1/digoal/pgdata/pg_root$(($1+1920))          
export LANG=en_US.utf8          
export PGHOME=/home/digoal/pgsql10_8k          
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH          
export LD_RUN_PATH=$LD_LIBRARY_PATH          
export DATE=`date +"%Y%m%d%H%M"`          
export PATH=$PGHOME/bin:$PATH:.          
export MANPATH=$PGHOME/share/man:$MANPATH          
export PGHOST=127.0.0.1          
export PGUSER=postgres          
export PGDATABASE=postgres          
alias rm='rm -i'          
alias ll='ls -lh'          
unalias vi          

Initialize Clusters

Create 12 database clusters corresponding to each disk to fully utilize the disk I/O.
vi init.sh

for ((i=1;i<13;i++))          
do          
  . ~/env_pg10.sh $i          
  initdb -D $PGDATA -E SQL_ASCII --locale=C -U postgres          
  echo "local   all             all                                     trust" > $PGDATA/pg_hba.conf          
  echo "host    all             all             127.0.0.1/32            trust" >> $PGDATA/pg_hba.conf          
  echo "host    all             all             ::1/128                 trust" >> $PGDATA/pg_hba.conf          
  echo "host    all             all             0.0.0.0/0               trust" >> $PGDATA/pg_hba.conf          
done          
. ./init.sh     

Configure Parameters

postgresql.auto.conf          
          
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
listen_addresses = '0.0.0.0'
max_connections = 400
unix_socket_directories = '.'
shared_buffers = 32GB
maintenance_work_mem = 2GB
dynamic_shared_memory_type = posix
vacuum_cost_delay = 0
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 500
bgwriter_lru_multiplier = 5.0
bgwriter_flush_after = 0  
max_parallel_workers_per_gather = 0
fsync=on
synchronous_commit = off
full_page_writes = off
wal_buffers = 128MB
wal_writer_delay = 2s
wal_writer_flush_after = 0  
checkpoint_timeout = 10min
max_wal_size = 32GB
min_wal_size = 4GB
checkpoint_completion_target = 0.5
checkpoint_flush_after = 128kB  
max_wal_senders = 0
random_page_cost = 1.5
effective_cache_size = 48GB
log_destination = 'csvlog'
logging_collector = on
log_truncate_on_rotation = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_timezone = 'PRC'
autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_freeze_max_age = 500000000
autovacuum_multixact_freeze_max_age = 600000000
vacuum_freeze_table_age = 400000000
vacuum_multixact_freeze_table_age = 400000000
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
autovacuum_work_mem = 2GB
autovacuum_max_workers = 6 
autovacuum_naptime = 20s
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.2
autovacuum_vacuum_cost_delay = 0
autovacuum_vacuum_cost_limit = 0
zero_damaged_pages = on  

Copy the configuration file to the data directory of each cluster.

cp postgresql.auto.conf /disk1/digoal/pgdata/pg_root1921/
...
...
cp postgresql.auto.conf /disk12/digoal/pgdata/pg_root1932/

Start Clusters

Bind each database instance to a different set of CPU cores.

vi start.sh          
          
for ((i=1;i<13;i++))
do
  . /home/digoal/env_pg10.sh $i

  # Instances 1-11 are bound to 5 cores each (1-5, 6-10, ..., 51-55);
  # instance 12 gets the remaining cores 56-63.
  if [ $i -eq 12 ]; then
  numactl --physcpubind=56-63 pg_ctl start -o "-c port=$PGPORT"
  else
  numactl --physcpubind="$((($i-1)*(5)+(1)))"-"$(($i*5))" pg_ctl start -o "-c port=$PGPORT"
  fi

done
. ./start.sh      

Stop Clusters

vi stop.sh           
          
for ((i=1;i<13;i++))          
do          
  . /home/digoal/env_pg10.sh $i          
          
  pg_ctl stop -m fast          
          
done          
. ./stop.sh        

Modeling

Table Structure

Four fields: PK (corresponding to the PK of the original detail data), time, user ID, and user behavior (a tsvector field).

Retrieval Requirements

Data can be retrieved by time interval, user ID, and tokenized (full-text) search criteria.

Retention Period

Logs are retained for a fixed period (for example, one month) and cleaned up afterward.

Partition

Create multiple partition tables in each cluster (12 partition tables per cluster in this example).

If practical, we recommend creating a partition per hour so that queries over a specific time interval only need to touch the relevant partitions, without requiring a time index.

If a single user has a very large amount of data, we recommend also creating hash or LIST partitions by UID so that queries by UID can rely on partition pruning instead of indexes. (This eliminates the need for an index on the UID field, or even for storing the UID field itself.) A sketch of such a layout follows.
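
A minimal sketch of this layout, assuming PostgreSQL 10 declarative partitioning (the test below uses inheritance-based partitions instead); the table names and hour boundaries are illustrative only:

-- Range partition by the epoch timestamp, one child per hour; queries on a time
-- interval then touch only the relevant partitions, so no index on ts is needed.
create table bptest_part (pk int8, uid int, ts int8, content tsvector)
partition by range (ts);

create table bptest_part_h00 partition of bptest_part
  for values from (1495000800) to (1495004400);

-- If a single UID is very hot, sub-partition an hourly partition by a modulus of uid
-- (LIST on an expression; native hash partitioning is not available in PostgreSQL 10).
create table bptest_part_h01 partition of bptest_part
  for values from (1495004400) to (1495008000)
  partition by list ((uid % 4));

create table bptest_part_h01_0 partition of bptest_part_h01 for values in (0);
create table bptest_part_h01_1 partition of bptest_part_h01 for values in (1);
create table bptest_part_h01_2 partition of bptest_part_h01 for values in (2);
create table bptest_part_h01_3 partition of bptest_part_h01 for values in (3);

With hourly partitions, enforcing the retention period is also simple: drop the partitions whose hour has fallen outside the retention window.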

Indexes

Use full-text indexes in the behavior field.

Use B-tree indexes in the user ID field.

Use BRIN indexes in the time field.

Simulate Data

Time (generated as a time series).

User ID (randomly generated within a specific range).

User behavior data (strings about 512 characters long, split into several tokens; about 40 tokens of varying lengths per record in this example).

Initialize Data Tables

vi init.sql          
          
drop table if exists bptest cascade;
do language plpgsql $$
declare
begin
  for i in 1..12 loop
    execute 'drop table if exists bptest'||i||' cascade';
  end loop;
end;
$$;
          
create table bptest(pk serial8, uid int, ts int8, content tsvector);          
create index idx_bptest_content on bptest using gin (content) with (fastupdate=on,gin_pending_list_limit=2048);          
create index idx_bptest_uid on bptest(uid);          
create index idx_bptest_ts on bptest using brin(ts);          
          
do language plpgsql $$
declare
begin
  for i in 1..12 loop
    execute 'create table bptest'||i||'(like bptest including all) inherits(bptest)';
  end loop;
end;
$$;
          
-- Generate random strings of arbitrary character length          
CREATE OR REPLACE FUNCTION public.gen_rand_str(integer)          
 RETURNS text          
 LANGUAGE sql          
 STRICT          
AS $function$          
  select string_agg(a[random()*6+1],'') from generate_series(1,$1), (select array['a','b','c','d','e','f',' ']) t(a);          
$function$;          

Run the initialization script against each of the 12 clusters.

psql -f ./init.sql -p 1921
...
...
psql -f ./init.sql -p 1932
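
A quick sanity check of the generator and of the tsvector conversion used in the write test below (the output is random, so results differ on every call):

-- Returns a random 64-character string built from the letters a-f and spaces,
-- then splits it into lexemes exactly as the write test does.
select to_tsvector('english', gen_rand_str(64));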

Test

Load the test data: for example, insert 200 million records into each table, which is 2.4 billion records per database (around 6 TB) and 28.8 billion records in total (around 72 TB).

Insert 10 records per batch.

vi test.sh          
          
for ((i=1;i<13;i++))          
do          
  echo "insert into bptest${i} (uid,ts,content) select random()*100000 , extract(epoch from now()), to_tsvector('english',gen_rand_str(512)) from generate_series(1,10);" > ~/test${i}.sql          
done          
          
# One pgbench stream per table: 12 tables x 12 clusters = 144 concurrent write streams.
for ((i=1;i<13;i++))
do          
  . /home/digoal/env_pg10.sh ${i}          
          
  for ((x=1;x<13;x++))          
  do          
    nohup pgbench -M prepared -n -r -P 3 -f ./test${x}.sql -c 1 -j 1 -t 200000000 >/tmp/bptest_${i}_${x}.log 2>&1 &          
  done          
done          
          
          
chmod 500 test.sh       

A sample of the test data is shown below. The data is highly random; each record's content contains about 40 elements and is no longer than 512 characters.

select * from bptest1 limit 1;          
          
pk      | 1        
uid     | 849185          
ts      | 1494928859          
content | 'aaaefba':14 'acddcfd':39 'acdeeaadbffdbbecceb':50 'aceeedfbaefbdfcbd':59 'adbbeddbecfdcffaeedcedaeeddaeaaeebfbdcdcecfbbebfcebabceffbfdbfbfa':60           
'adcdf':61 'aead':47 'afddf':70 'ba':8 'bae':37 'bbaacffbcafeffafefdf':38 'bbe':55 'bbecfdf':32 'bcbfd':27 'bdce':45 'bdeccbcdeaabefbeeebcdbfddd':19           
'bed':17 'beedeadccbbbecbfcbf':44 'bfccaeddaddbc':2 'cafdfcf':5 'cbcacefaff':3 'cbcfc':52 'cbfef':63 'ccdcbedb':33 'ccdcd':20 'cd':6 'cfecfeeccabf':42           
'cffb':15 'dabdfddeeabfdcefb':16 'dacdeecfbcefebfabeedfabbaccec':57 'daee':1 'daffcdffadddbaffd':68 'dbcddacefcd':9           
'dbdbcbfadfffbdddaaabdcbcecdbecbbdecffbfcfecbbfebfebcadefecfceadaeffd':11 'dcdf':23 'dd':53 'ddec':31 'debdcdebfffebdbfdeefffbcfbccbececdbeaffffedfbefdcccbbccadedecfbeccccbbb':48           
'deefaeeaabdbbdafcfcbeecc':71 'df':26 'dfcbbcd':46 'e':7,51,56 'eafddcaac':43 'ecbaffa':21 'ecdeeceddbdcbfcabdc':10 'ecedcec':41 'ed':66           
'edcbaecfcdfbcbcdedeebdbfceeeececfac':35 'eeca':25 'eeebafeacfebfdbdbddaacabebabbfbfdefeddefccfbeaefdbf':29 'eefdbfcadebcbbfffaefcaecafbddbdbfcf':13           
'ef':58 'efbdc':67 'efccdddaebfbdaffcdfcbfdcbdeb':54 'efccebdddededdeda':64 'effcbfdfdeebfbbcfaabfd':12 'f':24,28,65 'fbbfccfcbcba':30 'fc':4           
'fcbbdbbaefcefefdf':34 'fd':18 'fdffcbe':69 'fea':62 'feeabdcd':36 'feeadcedecedebaedccffbfddadcfececbefddcbeaedbebfadefedcbd':22 'feffceceaeec':49 'ffaffde':40          

Launch a full-text retrieval request with four query parameters and return the PKs as a stream.

We recommend using a streaming (cursor-based) interface because the result set may be very large.

select pk from bptest1 where uid=$1 and ts between $2 and $3 and content @@ to_tsquery('english', $4);        
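
For example, the client can open a cursor and fetch the (potentially very large) PK list in batches instead of materializing it all at once; a minimal sketch with illustrative literal values:

begin;
declare bp_cur cursor for
  select pk from bptest1
  where uid = 849185
    and ts between 1494999617 and 1495999617
    and content @@ to_tsquery('english', 'abc');
fetch 1000 from bp_cur;   -- repeat until no rows are returned
close bp_cur;
commit;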

Stress Testing

./test.sh          

Resource Usage

dstat

The CPU is nearly saturated and the disks are under substantial load while data is being written.

Most of the CPU time is spent in user space (perf is used below to break this down).

dstat          
          
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--          
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw           
 92   7   1   0   0   1|1075M 2435M|4048B 2297B|   0     0 | 142k  167k          
 92   7   0   0   0   1|1137M 2075M|2391B 1945B|   0     0 | 135k  161k          
 91   8   0   0   0   0|1182M 2125M|3483B 2845B|   0     0 | 140k  166k          
 91   8   0   0   0   1|1193M 1971M|3788B 1633B|   0     0 | 135k  159k          
 91   8   0   0   0   0|1089M 2305M|2232B 1478B|   0     0 | 139k  159k          
 92   7   0   0   0   1| 986M 2795M|2176B 1568B|   0     0 | 127k  142k          
 92   7   1   0   0   0| 760M 2864M|6028B 1408B|   0     0 | 116k  118k          
 90   8   0   0   0   0|1029M 3057M|1565B 2116B|   0     0 | 132k  150k          
 90   9   1   0   0   1|1000M 3237M|2336B 4850B|   0     0 | 133k  154k          
 90   8   1   1   0   1| 659M 4399M|2872B 7992B|   0     0 | 115k  119k          
 91   7   0   0   0   1| 925M 2996M|1293B 1059B|   0     0 | 122k  127k          
 90   8   1   1   0   1| 996M 3350M| 664B  574B|   0     0 | 133k  148k          
 91   7   1   0   0   1| 948M 2927M|3525B 2500B|   0     0 | 132k  146k          
 90   8   0   1   0   0|1114M 2869M|1751B 2645B|   0     0 | 132k  150k          
 90   8   0   1   0   1|1267M 2408M|3003B 2244B|   0     0 | 137k  167k          
 91   8   0   1   0   1|1086M 2539M| 900B  347B|   0     0 | 133k  154k          
 91   8   0   0   0   1| 998M 2614M|1975B 1757B|   0     0 | 130k  151k          
 91   8   0   0   0   0|1120M 2150M|1466B 4911B|   0     0 | 130k  154k          
 92   7   0   0   0   0|1163M 2387M|1356B  498B|   0     0 | 136k  163k          
 90   8   1   1   0   1| 864M 2656M|2601B 3373B|   0     0 | 130k  143k          
 91   8   0   0   0   1| 987M 2651M|2052B  898B|   0     0 | 135k  154k          
 91   8   0   0   0   0|1073M 2205M|2479B 2319B|   0     0 | 130k  144k          
 90   8   1   1   0   1| 951M 2941M|1390B 1001B|   0     0 | 130k  148k         

Disk Usage

iostat -x           
          
avg-cpu:  %user   %nice %system %iowait  %steal   %idle          
          85.29    0.49    9.89    1.90    0.00    2.43          
          
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util          
sdb               0.00    92.40   99.40  389.00  2822.40 136435.20   285.13     2.25    4.60   0.65  31.64          
sdc               0.00    33.60  154.20  211.60  4838.40 85700.80   247.51     1.14    3.10   0.63  22.96          
sdd               0.00    63.00  232.40  238.40  7316.80 109648.00   248.44     2.17    4.61   0.99  46.54          
sde               0.00    78.80  269.60  340.80  7980.80 102419.20   180.87     2.53    4.14   0.94  57.62          
sdf               0.00    58.40  283.00  234.20  8204.80 99129.60   207.53     2.30    4.45   0.93  48.20          
sdg               0.00    50.80  207.60  236.60  6652.80 94337.60   227.35     1.42    3.19   0.68  30.34          
sdh               0.00   102.20  109.40  475.20  3489.60 131211.20   230.42     2.60    4.45   0.52  30.40          
sdi               0.00    70.20  107.00  337.00  3228.80 79603.20   186.56     1.35    3.04   0.53  23.38          
sdj               0.00    31.00   70.60  158.80  2534.40 85124.80   382.12     0.82    3.59   0.86  19.72          
sdk               0.20    58.40  190.60  295.80  5587.20 123539.20   265.47     1.74    3.57   0.68  33.28          
sdl               0.00    91.00  162.80  396.40  4441.60 119507.20   221.65     1.98    3.54   0.59  33.26          
sdm               0.00   274.80  103.20  359.20  2296.00 158908.80   348.63     3.81    8.23   1.08  50.06          

perf

The postgres process consumes most of the overhead.

perf top -ag
          
   PerfTop:    9171 irqs/sec  kernel:63.7%  exact:  0.0% [1000Hz cycles],  (all, 64 CPUs)        
----------------------------------------------------------------------------------------------                           
        
 samples  pcnt function                    DSO        
 _______ _____ ___________________________ _______________________________________        
        
23044.00  4.5% tsCompareString             /home/digoal/pgsql10_8k/bin/postgres         
19821.00  3.9% ExecInterpExpr              /home/digoal/pgsql10_8k/bin/postgres         
12258.00  2.4% gintuple_get_key            /home/digoal/pgsql10_8k/bin/postgres         
12208.00  2.4% pg_detoast_datum_packed     /home/digoal/pgsql10_8k/bin/postgres         
11111.00  2.2% hash_search_with_hash_value /home/digoal/pgsql10_8k/bin/postgres         
10318.00  2.0% memcpy                      /lib64/libc-2.12.so                            
 9078.00  1.8% AllocSetAlloc               /home/digoal/pgsql10_8k/bin/postgres         
 8944.00  1.7% advance_aggregates          /home/digoal/pgsql10_8k/bin/postgres         
 8547.00  1.7% cmpEntryAccumulator         /home/digoal/pgsql10_8k/bin/postgres         
 7311.00  1.4% array_seek                  /home/digoal/pgsql10_8k/bin/postgres         
 6744.00  1.3% gin_cmp_tslexeme            /home/digoal/pgsql10_8k/bin/postgres         
 6650.00  1.3% __closure_wake_up           [bcache]                                     
 6550.00  1.3% appendBinaryStringInfo      /home/digoal/pgsql10_8k/bin/postgres         
 6475.00  1.3% TParserGet                  /home/digoal/pgsql10_8k/bin/postgres         
 5578.00  1.1% ginFindLeafPage             /home/digoal/pgsql10_8k/bin/postgres         
 5543.00  1.1% PyParser_AddToken           /lib64/libpython2.7.so.1.0                  
 5412.00  1.1% array_get_element           /home/digoal/pgsql10_8k/bin/postgres         
 5355.00  1.0% heap_fill_tuple             /home/digoal/pgsql10_8k/bin/postgres         
 4936.00  1.0% entryLocateLeafEntry        /home/digoal/pgsql10_8k/bin/postgres         
 4732.00  0.9% heap_form_minimal_tuple     /home/digoal/pgsql10_8k/bin/postgres         
 4512.00  0.9% rb_insert                   /home/digoal/pgsql10_8k/bin/postgres        

top

top -c -u digoal          
          
top - 19:20:47 up 179 days,  5:38,  8 users,  load average: 183.79, 189.01, 166.41          
Tasks: 2939 total, 159 running, 2780 sleeping,   0 stopped,   0 zombie          
Cpu(s): 87.7%us,  8.9%sy,  0.9%ni,  0.9%id,  1.2%wa,  0.0%hi,  0.5%si,  0.0%st          
Mem:  529321828k total, 512395020k used, 16926808k free,   299780k buffers          
Swap:        0k total,        0k used,        0k free, 482162560k cached          
          
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                                   
11258 digoal  20   0 20.7g 2.9g 2.9g R 78.7  0.6   5:11.98 postgres: postgres postgres 127.0.0.1(52848) INSERT                                                            
11253 digoal  20   0 20.7g 3.0g 2.9g R 70.7  0.6   5:14.48 postgres: postgres postgres 127.0.0.1(52843) INSERT                                                            
11264 digoal  20   0 20.7g 3.0g 2.9g R 68.1  0.6   5:14.63 postgres: postgres postgres 127.0.0.1(52854) INSERT                                                            
11263 digoal  20   0 20.7g 3.0g 2.9g R 65.2  0.6   5:14.89 postgres: postgres postgres 127.0.0.1(52853) INSERT                                                            
11250 digoal  20   0 20.7g 3.0g 3.0g R 59.4  0.6   5:16.66 postgres: postgres postgres 127.0.0.1(52840) idle                                                              
11268 digoal  20   0 20.7g 2.9g 2.9g R 53.3  0.6   5:11.36 postgres: postgres postgres 127.0.0.1(52858) INSERT                                                            
11266 digoal  20   0 20.7g 3.0g 3.0g R 52.3  0.6   5:09.00 postgres: postgres postgres 127.0.0.1(52856) INSERT                                                            
11068 digoal  20   0 20.7g 2.6g 2.6g R 51.4  0.5   4:44.47 postgres: postgres postgres 127.0.0.1(45823) INSERT                                                            
11097 digoal  20   0 20.7g 2.6g 2.5g R 49.5  0.5   4:47.85 postgres: postgres postgres 127.0.0.1(45839) INSERT                                                            
11161 digoal  20   0 20.7g 2.6g 2.6g R 49.5  0.5   4:47.87 postgres: postgres postgres 127.0.0.1(44881) INSERT                                                            
11256 digoal  20   0 20.7g 3.0g 3.0g R 49.5  0.6   5:14.69 postgres: postgres postgres 127.0.0.1(52846) INSERT                                                            
10819 digoal  20   0 20.7g 2.7g 2.7g R 48.8  0.5   4:58.17 postgres: postgres postgres 127.0.0.1(47342) INSERT                                                            
11107 digoal  20   0 20.7g 2.7g 2.7g R 48.8  0.5   5:02.00 postgres: postgres postgres 127.0.0.1(59612) INSERT                                                            
11255 digoal  20   0 20.7g 3.0g 3.0g R 48.2  0.6   5:15.68 postgres: postgres postgres 127.0.0.1(52845) INSERT                                                            
11267 digoal  20   0 20.7g 3.0g 3.0g R 47.9  0.6   5:18.82 postgres: postgres postgres 127.0.0.1(52857) INSERT                                                            
11066 digoal  20   0 20.7g 2.6g 2.6g R 46.9  0.5   4:44.97 postgres: postgres postgres 127.0.0.1(45821) INSERT                                                            
11222 digoal  20   0 20.7g 2.6g 2.6g R 45.9  0.5   5:00.43 postgres: postgres postgres 127.0.0.1(40379) idle                                                              
11207 digoal  20   0 20.7g 2.6g 2.6g R 45.6  0.5   5:04.59 postgres: postgres postgres 127.0.0.1(46160) INSERT                                                            
11224 digoal  20   0 20.7g 2.6g 2.6g R 45.3  0.5   5:02.60 postgres: postgres postgres 127.0.0.1(40381) INSERT                                                            
11249 digoal  20   0 20.7g 2.6g 2.6g R 45.3  0.5   4:59.58 postgres: postgres postgres 127.0.0.1(46187) INSERT                                                            
11045 digoal  20   0 20.7g 2.6g 2.6g R 44.6  0.5   4:39.75 postgres: postgres postgres 127.0.0.1(64424) idle                                                              
11064 digoal  20   0 20.7g 2.6g 2.6g R 44.6  0.5   4:44.69 postgres: postgres postgres 127.0.0.1(45819) INSERT                                                            
11145 digoal  20   0 20.7g 2.6g 2.6g S 44.3  0.5   4:46.18 postgres: postgres postgres 127.0.0.1(44876) INSERT                                                            
10865 digoal  20   0 20.7g 2.6g 2.6g R 44.0  0.5   4:59.89 postgres: postgres postgres 127.0.0.1(49769) INSERT                                                            
11080 digoal  20   0 20.7g 2.6g 2.5g R 44.0  0.5   4:43.70 postgres: postgres postgres 127.0.0.1(45825) INSERT                                                            
11247 digoal  20   0 20.7g 2.6g 2.6g R 43.4  0.5   5:01.91 postgres: postgres postgres 127.0.0.1(40391) idle                                                              
11163 digoal  20   0 20.7g 2.6g 2.6g R 42.7  0.5   4:48.34 postgres: postgres postgres 127.0.0.1(44882) idle                                                              
11164 digoal  20   0 20.7g 2.6g 2.6g R 42.4  0.5   4:53.21 postgres: postgres postgres 127.0.0.1(44883) INSERT                                                            
10882 digoal  20   0 20.7g 2.6g 2.6g R 41.8  0.5   5:04.78 postgres: postgres postgres 127.0.0.1(49772) INSERT                                                            
10868 digoal  20   0 20.7g 2.6g 2.6g R 41.4  0.5   5:00.30 postgres: postgres postgres 127.0.0.1(49770) INSERT          

TPS of Data Writing

The overall write throughput is about 65,000 rows per second on a single machine (each transaction inserts 10 rows).

progress: 729.0 s, 55.0 tps, lat 11.610 ms stddev 4.836          
progress: 732.0 s, 59.7 tps, lat 20.071 ms stddev 107.984          
progress: 735.0 s, 57.0 tps, lat 20.492 ms stddev 125.445          
progress: 738.0 s, 38.7 tps, lat 25.891 ms stddev 154.607          
progress: 741.0 s, 41.0 tps, lat 24.405 ms stddev 140.247          
progress: 744.0 s, 43.0 tps, lat 13.550 ms stddev 10.448          
progress: 747.0 s, 60.0 tps, lat 20.691 ms stddev 131.640          
progress: 750.0 s, 60.0 tps, lat 17.394 ms stddev 83.385          
progress: 753.0 s, 44.3 tps, lat 25.510 ms stddev 146.719          
progress: 756.0 s, 25.0 tps, lat 39.819 ms stddev 213.642          
progress: 759.0 s, 50.0 tps, lat 11.439 ms stddev 5.319          
progress: 762.0 s, 60.0 tps, lat 20.979 ms stddev 106.782          
progress: 765.0 s, 60.0 tps, lat 18.778 ms stddev 167.714          
progress: 768.0 s, 58.0 tps, lat 18.017 ms stddev 99.949          
progress: 771.0 s, 51.0 tps, lat 19.636 ms stddev 124.429          

Write performance depends mainly on the number of elements (lexemes) in the tsvector field. Each record has about 40 elements in this example; if the number of elements were halved, throughput would roughly double.

postgres=# select array_length(tsvector_to_array(content),1) from bptest1 limit 10;        
 array_length         
--------------        
           40        
           37        
           40        
           45        
           35        
           42        
           38        
           46        
           30        
           40        
(10 rows)        

Evaluate How Many Index Entries Are Created per Second

1.  Full-text search index entries

Each record has about 40 elements, so at roughly 65,000 rows per second about 2.6 million full-text index entries are created per second.

65,000*40 = 2,600,000        

2.  The number of UID (B-tree) index entries is one per row and is negligible by comparison.

3.  The ts field uses a BRIN index, so its maintenance cost is negligible.

Creating the full-text index entries has the most significant impact on performance and consumes the most resources.

Query Performance

Example

postgres=# \dt+ bptest1      
                      List of relations      
 Schema |  Name   | Type  |  Owner   |  Size   | Description       
--------+---------+-------+----------+---------+-------------      
 public | bptest1 | table | postgres | 1689 MB |       
(1 row)      
      
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from bptest1 where ts between 1494999617 and 1495999617 and content @@ to_tsquery ('english','abc');      
                                                              QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------      
 Bitmap Heap Scan on public.bptest1  (cost=175.95..23691.41 rows=20015 width=811) (actual time=7.017..23.376 rows=19755 loops=1)      
   Output: uid, ts, content      
   Recheck Cond: (bptest1.content @@ '''abc'''::tsquery)      
   Filter: ((bptest1.ts >= 1494999617) AND (bptest1.ts <= 1495999617))      
   Heap Blocks: exact=18933      
   Buffers: shared hit=18948      
   ->  Bitmap Index Scan on bptest1_content_idx  (cost=0.00..170.94 rows=20019 width=0) (actual time=3.811..3.811 rows=19755 loops=1)      
         Index Cond: (bptest1.content @@ '''abc'''::tsquery)      
         Buffers: shared hit=15      
 Planning time: 0.097 ms      
 Execution time: 24.517 ms      
(11 rows)       
      
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from bptest1 where ts between 1494999617 and 1495999617 and content @@ to_tsquery ('english','abc & bc');      
                                                            QUERY PLAN                                                                   
-----------------------------------------------------------------------------------------------------------------------------------      
 Bitmap Heap Scan on public.bptest1  (cost=36.27..2598.42 rows=1996 width=811) (actual time=4.577..6.711 rows=2125 loops=1)      
   Output: uid, ts, content      
   Recheck Cond: (bptest1.content @@ '''abc'' & ''bc'''::tsquery)      
   Filter: ((bptest1.ts >= 1494999617) AND (bptest1.ts <= 1495999617))      
   Heap Blocks: exact=2116      
   Buffers: shared hit=2239      
   ->  Bitmap Index Scan on bptest1_content_idx  (cost=0.00..35.77 rows=1997 width=0) (actual time=4.291..4.291 rows=2125 loops=1)      
         Index Cond: (bptest1.content @@ '''abc'' & ''bc'''::tsquery)      
         Buffers: shared hit=123      
 Planning time: 0.125 ms      
 Execution time: 6.849 ms      
(11 rows)      

Performance of Data Writing to SSD with fsync=on

  1. Per-second write throughput

Up to 70,000 records are written per second, and around 2.8 million full-text index entries are created per second.

The performance is relatively smooth.

Performance of Data Writing to SATA and SSD Bcache with fsync=off

  1. Per-second write throughput

Up to 75,000 records are written per second, and around 3 million full-text index entries are created per second.

The performance is relatively smooth.

Summary

1.  Query aggregation

We recommend that you use plproxy to aggregate queries because log data is distributed among multiple clusters and tables.
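
A minimal plproxy sketch, assuming a plproxy cluster named 'bplog' has already been configured to point at the 12 instances (the cluster and function names are illustrative):

create or replace function search_bp(i_uid int, i_begin int8, i_end int8, i_query text)
returns setof int8 as $$
  cluster 'bplog';
  run on all;
  select pk from bptest
   where uid = i_uid
     and ts between i_begin and i_end
     and content @@ to_tsquery('english', i_query);
$$ language plproxy;

Calling search_bp() on the proxy node runs the query on all shards and merges the returned PK streams.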

2.  Write to shards

Sharding of writes can be handled in the application (business) layer, which scatters the data across the clusters and tables (see the sketch below).

In actual application scenarios, you can create more partitions as needed.
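
For example, the application can derive the target cluster and table from a hash of the UID; a hypothetical routing calculation for the 12-cluster x 12-table layout of this test, written in SQL only for illustration (the application would normally do this arithmetic itself):

select
  1 + mod(hashtext(uid::text) & 2147483647, 12)        as cluster_no,  -- which of the 12 clusters
  1 + mod((hashtext(uid::text) & 2147483647) / 12, 12) as table_no     -- which bptest[1-12] table
from (values (849185)) t(uid);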

3.  Most of the overhead is incurred in the postgres process. To analyze it in more detail, recompile PostgreSQL (for example, with debugging symbols enabled) and profile it with perf.

4.  GIN index optimization

https://www.postgresql.org/docs/9.6/static/sql-createindex.html

GIN indexes accept different parameters:          
          
1. fastupdate          
          
This setting controls usage of the fast update technique described in Section 63.4.1.           
It is a Boolean parameter: ON enables fast update, OFF disables it.           
(Alternative spellings of ON and OFF are allowed as described in Section 19.1.) The default is ON.          
          
Note: Turning fastupdate off via ALTER INDEX prevents future insertions from going into the list of pending index entries,           
but does not in itself flush previous entries.           
          
You might want to VACUUM the table or call gin_clean_pending_list function afterward to ensure the pending list is emptied.          
          
2. gin_pending_list_limit          
          
Overrides the global gin_pending_list_limit parameter for this index.
This value is specified in kilobytes.

The purpose of gin_pending_list_limit is to defer merging new entries into the main GIN structure, because a single record may contain many GIN keys. If the index were updated synchronously for every row, a large number of GIN index entries would have to be written immediately, hurting performance.

In this example, gin_pending_list_limit is set to 2 MB and the TPS is relatively stable. With a larger gin_pending_list_limit, significant fluctuation can occur when CPU resources are insufficient, because the pending list is merged in larger, burstier batches.

Choose a suitable gin_pending_list_limit value based on your own tests. An example of per-index tuning follows.
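
If needed, the pending list can also be tuned or flushed per index at run time; an illustrative example (PostgreSQL 9.6 or later), using the child index name that appears in the query plans above:

-- Enlarge (or shrink) the pending list for a single index; the value is in KB.
alter index bptest1_content_idx set (gin_pending_list_limit = 4096);

-- Flush the pending list into the main GIN structure without waiting for autovacuum.
select gin_clean_pending_list('bptest1_content_idx'::regclass);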

5.  If PostgreSQL is used purely as an index database and some data loss can be tolerated, you can turn fsync off. (fsync during checkpoints has a large impact on I/O performance; because SATA disks are used in this example, significant performance fluctuation occurs when fsync is on.)

postgresql.auto.conf      
      
fsync = off      
zero_damaged_pages = on        

The risk of data loss is much lower if the databases are deployed in a high-availability configuration. (However, after a server crash, rebuilding a standby database is not easy because of the massive data volume.)

We recommend running more, smaller instances (for example, under 2 TB each) so that the time needed to rebuild a standby stays manageable.

6.  To achieve a better response time (RT), we recommend storing detail data and indexes separately: writing detail data requires a short RT, while index writes can tolerate some latency. The reliability requirements for detail data and indexes also differ.
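
A hypothetical sketch of this split (not part of the test above): the detail database keeps the full payload keyed by PK, the index database keeps only the searchable columns, and queries hit the index database first and then fetch details by PK:

-- In the detail database: full payload, written synchronously with a short RT.
create table bp_detail (pk int8 primary key, uid int, ts int8, raw text);

-- In the index database: only the searchable columns, written with some tolerated latency.
create table bp_index (pk int8, uid int, ts int8, content tsvector);

-- Step 1 (index database): find matching PKs.
--   select pk from bp_index where uid = $1 and ts between $2 and $3 and content @@ to_tsquery('english', $4);
-- Step 2 (detail database): fetch the details.
--   select * from bp_detail where pk = any($1);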
