×
Community Blog Open Source PolarDB Uses duckdb_fdw to Support Parquet and Efficient OLAP

Open Source PolarDB Uses duckdb_fdw to Support Parquet and Efficient OLAP

This article describes how open source PolarDB uses duckdb_fdw to support parquet, a columnar storage file format, and efficient OLAP.

By digoal

Background

PolarDB's is a cloud-native database that features a computing-storage separation architecture. It offers affordable data storage, efficient scalability, high-speed parallel computing capabilities, and fast data search and processing. By combining PolarDB with computing algorithms, it enables the conversion of data into productivity and drives business data output.

This article discusses how the open-source version of PolarDB utilizes duckdb_fdw to support the parquet columnar storage file and efficient OLAP (Online Analytical Processing).

The test environment used in this article is macos + docker. For more information on deploying PolarDB, please see:

duckdb_fdw for PolarDB

1.  Deployment

A higher version of CMake is required.

https://cmake.org/download  
  
wget https://github.com/Kitware/CMake/releases/download/v3.25.1/cmake-3.25.1.tar.gz  
tar -zxvf cmake-3.25.1.tar.gz  
cd cmake-3.25.1  
./configure  
make -j 4  
make install  
  
  
[postgres@67e1eed1b4b6 duckdb]$ export PATH=/usr/local/bin:$PATH  
  
  
[postgres@67e1eed1b4b6 duckdb]$ which cmake  
/usr/local/bin/cmake  
git clone --depth 1 https://github.com/cwida/duckdb  
cd duckdb  
make -j 4  
cd ~/duckdb  
  
cp build/release/tools/sqlite3_api_wrapper/libsqlite3_api_wrapper.so ~/tmp_basedir_polardb_pg_1100_bld/lib/    
cp tools/sqlite3_api_wrapper/include/sqlite3.h ~/tmp_basedir_polardb_pg_1100_bld/include/server/    
cp build/release/src/libduckdb.so ~/tmp_basedir_polardb_pg_1100_bld/lib/    
cd ~    
git clone --depth 1 https://github.com/alitrack/duckdb_fdw    
  
cd duckdb_fdw    
USE_PGXS=1 make    
USE_PGXS=1 make install    
[postgres@67e1eed1b4b6 duckdb_fdw]$ USE_PGXS=1 make    
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation   -g -pipe -Wall -grecord-gcc-switches -I/usr/include/et -O3 -Wp,-D_FORTIFY_SOURCE=2 -fPIC -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal  -D_GNU_SOURCE -I/usr/include/libxml2   -c -o connection.o connection.c  
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation   -g -pipe -Wall -grecord-gcc-switches -I/usr/include/et -O3 -Wp,-D_FORTIFY_SOURCE=2 -fPIC -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal  -D_GNU_SOURCE -I/usr/include/libxml2   -c -o option.o option.c  
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation   -g -pipe -Wall -grecord-gcc-switches -I/usr/include/et -O3 -Wp,-D_FORTIFY_SOURCE=2 -fPIC -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal  -D_GNU_SOURCE -I/usr/include/libxml2   -c -o deparse.o deparse.c  
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation   -g -pipe -Wall -grecord-gcc-switches -I/usr/include/et -O3 -Wp,-D_FORTIFY_SOURCE=2 -fPIC -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal  -D_GNU_SOURCE -I/usr/include/libxml2   -c -o sqlite_query.o sqlite_query.c  
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation   -g -pipe -Wall -grecord-gcc-switches -I/usr/include/et -O3 -Wp,-D_FORTIFY_SOURCE=2 -fPIC -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal  -D_GNU_SOURCE -I/usr/include/libxml2   -c -o duckdb_fdw.o duckdb_fdw.c  
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation   -g -pipe -Wall -grecord-gcc-switches -I/usr/include/et -O3 -Wp,-D_FORTIFY_SOURCE=2 -fPIC -shared -o duckdb_fdw.so connection.o option.o deparse.o sqlite_query.o duckdb_fdw.o -L/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib  -Wl,-rpath,'$ORIGIN/../lib' -L/opt/rh/llvm-toolset-7.0/root/usr/lib64  -Wl,--as-needed -Wl,-rpath,'/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib',--enable-new-dtags  -lsqlite3_api_wrapper   
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2  -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal  -D_GNU_SOURCE -I/usr/include/libxml2  -flto=thin -emit-llvm -c -o connection.bc connection.c  
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2  -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal  -D_GNU_SOURCE -I/usr/include/libxml2  -flto=thin -emit-llvm -c -o option.bc option.c  
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2  -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal  -D_GNU_SOURCE -I/usr/include/libxml2  -flto=thin -emit-llvm -c -o deparse.bc deparse.c  
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2  -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal  -D_GNU_SOURCE -I/usr/include/libxml2  -flto=thin -emit-llvm -c -o sqlite_query.bc sqlite_query.c  
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2  -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal  -D_GNU_SOURCE -I/usr/include/libxml2  -flto=thin -emit-llvm -c -o duckdb_fdw.bc duckdb_fdw.c  
  
  
[postgres@67e1eed1b4b6 duckdb_fdw]$ USE_PGXS=1 make  install  
/usr/bin/mkdir -p '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib'  
/usr/bin/mkdir -p '/home/postgres/tmp_basedir_polardb_pg_1100_bld/share/extension'  
/usr/bin/mkdir -p '/home/postgres/tmp_basedir_polardb_pg_1100_bld/share/extension'  
/usr/bin/install -c -m 755  duckdb_fdw.so '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/duckdb_fdw.so'  
/usr/bin/install -c -m 644 .//duckdb_fdw.control '/home/postgres/tmp_basedir_polardb_pg_1100_bld/share/extension/'  
/usr/bin/install -c -m 644 .//duckdb_fdw--1.0.sql .//duckdb_fdw--1.0--1.1.sql  '/home/postgres/tmp_basedir_polardb_pg_1100_bld/share/extension/'  
/usr/bin/mkdir -p '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode/duckdb_fdw'  
/usr/bin/mkdir -p '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode'/duckdb_fdw/  
/usr/bin/install -c -m 644 connection.bc '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode'/duckdb_fdw/./  
/usr/bin/install -c -m 644 option.bc '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode'/duckdb_fdw/./  
/usr/bin/install -c -m 644 deparse.bc '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode'/duckdb_fdw/./  
/usr/bin/install -c -m 644 sqlite_query.bc '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode'/duckdb_fdw/./  
/usr/bin/install -c -m 644 duckdb_fdw.bc '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode'/duckdb_fdw/./  
cd '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode' && /opt/rh/llvm-toolset-7.0/root/usr/bin/llvm-lto -thinlto -thinlto-action=thinlink -o duckdb_fdw.index.bc duckdb_fdw/connection.bc duckdb_fdw/option.bc duckdb_fdw/deparse.bc duckdb_fdw/sqlite_query.bc duckdb_fdw/duckdb_fdw.bc  

2.  Load the plug-in

create extension duckdb_fdw;   

3.  Test

cd ~/duckdb/build/release  
./duckdb /home/postgres/db    
  
COPY (select generate_series as id, md5(random()::text) as info, now()::timestamp+(generate_series||' second')::interval as crt_time from generate_series(1,100)) TO '/home/postgres/t1.parquet' (FORMAT 'PARQUET');    
  
COPY (select generate_series as cid, md5(random()::text) as info, now()::timestamp+(generate_series||' second')::interval as crt_time from generate_series(1,100)) TO '/home/postgres/t2.parquet' (FORMAT 'PARQUET');    
  
COPY (select (floor(random()*100)+1)::int as gid, (floor(random()*100)+1)::int as cid, (random()*10)::int as c1, (random()*100)::int as c2, (random()*1000)::int as c3, (random()*10000)::int as c4, (random()*100000)::int as c5 from generate_series(1,1000000)) TO '/home/postgres/t3.parquet' (FORMAT 'PARQUET');    
    
    
create view t1 as select * from read_parquet('/home/postgres/t1.parquet');    
create view t2 as select * from read_parquet('/home/postgres/t2.parquet');    
create view t3 as select * from read_parquet('/home/postgres/t3.parquet');    
    
checkpoint;

4.  Check the performance

psql  
  
  
CREATE SERVER DuckDB_server FOREIGN DATA WRAPPER duckdb_fdw OPTIONS (database '/home/postgres/db');    
    
IMPORT FOREIGN SCHEMA public FROM SERVER DuckDB_server INTO public;    
    
postgres=# \det    
     List of foreign tables    
 Schema | Table |    Server         
--------+-------+---------------    
 public | t1    | duckdb_server    
 public | t2    | duckdb_server    
 public | t3    | duckdb_server    
(3 rows)    
    
postgres=# explain verbose select count(distinct gid) from t3;    
                         QUERY PLAN                              
-------------------------------------------------------------    
 Foreign Scan  (cost=1.00..1.00 rows=1 width=8)    
   Output: (count(DISTINCT gid))    
   SQLite query: SELECT count(DISTINCT "gid") FROM main."t3"    
(3 rows)    
    
postgres=# \timing    
Timing is on.    
  
select count(distinct gid) from t3;    
  
select count(distinct gid),count(*) from t3;    
  
explain verbose select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.gid;    
  
  
select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.gid;    
  
select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.gid join t2 on t3.cid=t2.cid;    
  
explain verbose select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.gid join t2 on t3.cid=t2.cid;    

5.  Import the data to the local PolarDB database and do the same query to check the time.

PostgreSQL local computing

postgres=# create unlogged table lt1 as select * from t1;    
SELECT 100    
postgres=# create unlogged table lt2 as select * from t2;    
SELECT 100    
postgres=# create unlogged table lt3 as select * from t3;    
SELECT 10000000    
postgres=# \timing    
Timing is on.    
postgres=# select count(distinct gid) from lt3;    
 count     
-------    
   100    
(1 row)    
    
Time: 14891.356 ms (00:14.891)      

References

https://github.com/alitrack/duckdb_fdw

0 0 0
Share on

digoal

276 posts | 24 followers

You may also like

Comments