ShardingSphere is an open source distributed database middleware ecosystem. It sits between your application and ApsaraDB RDS for PostgreSQL instances, routing queries and transparently applying sharding rules without requiring changes to your application code.
All PostgreSQL versions supported by ApsaraDB RDS work with ShardingSphere.
Choose a ShardingSphere product
ShardingSphere ships three independent products. The right choice depends on your deployment model and language stack:
-
Sharding-Proxy: a transparent database proxy that supports any client using the PostgreSQL or MySQL protocol. Choose Proxy if your application uses a non-Java language, or if you need centralized operations and maintenance.
-
Sharding-JDBC: a lightweight Java framework that runs inside your application. Choose JDBC for Java applications where you want to avoid an additional network hop.
-
Sharding-Sidecar: designed for cloud-native deployments as a sidecar container alongside each service instance.
| Attribute | Sharding-JDBC | Sharding-Proxy | Sharding-Sidecar |
|---|---|---|---|
| Supported database engine | All JDBC-compatible engines (MySQL, PostgreSQL, Oracle, SQL Server) | MySQL and PostgreSQL | MySQL and PostgreSQL |
| Connections consumed | High | Low | High |
| Supported language | Java only | Any language | Any language |
| Performance impact | Low | Moderate | Low |
| Centerless | Yes | No | Yes |
| Stateless API | No | Yes | No |
The steps in this topic use Sharding-Proxy (version apache-shardingsphere-incubating-4.0.0), which is the most common choice for integrating with ApsaraDB RDS for PostgreSQL.
Prerequisites
Before you begin, ensure that you have:
-
An Elastic Compute Service (ECS) instance with internet access
-
An ApsaraDB RDS for PostgreSQL instance (all PostgreSQL versions supported by ApsaraDB RDS work with ShardingSphere; the example in this topic uses PostgreSQL 12)
-
ShardingSphere Proxy (
apache-shardingsphere-incubating-4.0.0-sharding-proxy-bin) downloaded and extracted on the ECS instance
To create an ApsaraDB RDS for PostgreSQL instance, see Create an ApsaraDB RDS for PostgreSQL instance.
Understand the configuration files
Sharding-Proxy stores all configuration in the conf/ directory. Each file controls a distinct concern:
| File | Purpose |
|---|---|
config-sharding.yaml |
Data sharding rules and data source connections |
config-master_slave.yaml |
Read/write splitting rules |
config-encrypt.yaml |
Data encryption rules |
server.yaml |
Global proxy settings: authentication, thread pool, and feature flags |
To view the files on your ECS instance, run:
cd apache-shardingsphere-incubating-4.0.0-sharding-proxy-bin/conf
ll
Expected output:
total 24
-rw-r--r-- 1 501 games 3019 Jul 30 2019 config-encrypt.yaml
-rw-r--r-- 1 501 games 3582 Apr 22 2019 config-master_slave.yaml
-rw-r--r-- 1 501 games 4278 Apr 22 2019 config-sharding.yaml
-rw-r--r-- 1 501 games 1918 Jul 30 2019 server.yaml
config-sharding.yaml structure
schemaName: # Name of the logical data source (used as the database name when connecting)
dataSources: # One entry per physical database; no connection pool configuration needed (unlike Sharding-JDBC)
<data_source_name>:
url: # JDBC connection URL for the ApsaraDB RDS instance
username: # Database account username
password: # Database account password
connectionTimeoutMilliseconds: 30000 # Connection timeout
idleTimeoutMilliseconds: 60000 # Idle connection reclaim timeout
maxLifetimeMilliseconds: 1800000 # Maximum connection lifetime
maxPoolSize: 65 # Maximum connections in the pool
shardingRule: # Sharding rules; same structure as Sharding-JDBC
server.yaml structure
Proxy settings control thread pool size, transaction type, and diagnostic features:
props:
acceptor.size: # Worker threads for accepting client connections (default: CPU cores x 2)
proxy.transaction.type: # Transaction type: LOCAL (default), XA (uses Atomikos), or BASE
# Note: to use BASE, copy the .jar that implements ShardingTransactionManager to the lib directory
proxy.opentracing.enabled: # Enable link tracing (default: false)
check.table.metadata.enabled: # Check shard table metadata consistency at startup (default: false)
proxy.frontend.flush.threshold: # Batch packet size for complex queries
Authentication controls which users can log in and which logical databases they can access:
authentication:
users:
<username>:
password: <password>
authorizedSchemas: <db1>,<db2> # Omit to grant access to all schemas
For the full configuration reference, see the .
Set up the test environment
This section walks through a complete horizontal sharding example using four PostgreSQL databases and two sharded tables.
Step 1: Install Java on the ECS instance
sudo yum install -y java
Step 2: Configure the ApsaraDB RDS instance
-
Create a database account with username
r1and passwordPW123321!. -
Create four databases owned by
r1:db0,db1,db2, anddb3. -
Add the ECS instance's IP address to the ApsaraDB RDS instance's IP address whitelist.
For details, see Create a database and an account on an ApsaraDB RDS for PostgreSQL instance and Configure an IP address whitelist for an ApsaraDB RDS for PostgreSQL instance.
Step 3: Configure server.yaml
vi /home/apache-shardingsphere-incubating-4.0.0-sharding-proxy-bin/conf/server.yaml
Set the following:
authentication:
users:
r1:
password: PW123321!
authorizedSchemas: db0,db1,db2,db3
props:
executor.size: 16
sql.show: false
Test horizontal sharding
Step 1: Configure config-sharding.yaml
vi /home/apache-shardingsphere-incubating-4.0.0-sharding-proxy-bin/conf/config-sharding.yaml
The following configuration maps four physical databases (db0-db3) to a single logical data source named sdb. It shards t_order across 4 databases x 8 tables (32 physical shards total) using user_id for database routing and order_id for table routing.
schemaName: sdb
dataSources:
db0:
url: jdbc:postgresql://pgm-bpxxxxx.pg.rds.aliyuncs.com:1433/db0
username: r1
password: PW123321!
connectionTimeoutMilliseconds: 30000
idleTimeoutMilliseconds: 60000
maxLifetimeMilliseconds: 1800000
maxPoolSize: 65
db1:
url: jdbc:postgresql://pgm-bpxxxxx.pg.rds.aliyuncs.com:1433/db1
username: r1
password: PW123321!
connectionTimeoutMilliseconds: 30000
idleTimeoutMilliseconds: 60000
maxLifetimeMilliseconds: 1800000
maxPoolSize: 65
db2:
url: jdbc:postgresql://pgm-bpxxxxx.pg.rds.aliyuncs.com:1433/db2
username: r1
password: PW123321!
connectionTimeoutMilliseconds: 30000
idleTimeoutMilliseconds: 60000
maxLifetimeMilliseconds: 1800000
maxPoolSize: 65
db3:
url: jdbc:postgresql://pgm-bpxxxxx.pg.rds.aliyuncs.com:1433/db3
username: r1
password: PW123321!
connectionTimeoutMilliseconds: 30000
idleTimeoutMilliseconds: 60000
maxLifetimeMilliseconds: 1800000
maxPoolSize: 65
shardingRule:
tables:
t_order:
actualDataNodes: db${0..3}.t_order${0..7} # 4 databases x 8 tables = 32 shards
databaseStrategy:
inline:
shardingColumn: user_id
algorithmExpression: db${user_id % 4} # Routes to db0-db3 based on user_id
tableStrategy:
inline:
shardingColumn: order_id
algorithmExpression: t_order${order_id % 8} # Routes to t_order0-t_order7 based on order_id
keyGenerator:
type: SNOWFLAKE
column: order_id
t_order_item:
actualDataNodes: db${0..3}.t_order_item${0..7}
databaseStrategy:
inline:
shardingColumn: user_id
algorithmExpression: db${user_id % 4}
tableStrategy:
inline:
shardingColumn: order_id
algorithmExpression: t_order_item${order_id % 8}
keyGenerator:
type: SNOWFLAKE
column: order_item_id
bindingTables:
- t_order,t_order_item # Declares these tables share the same sharding strategy to avoid cross-shard joins
defaultTableStrategy:
none:
Step 2: Start Sharding-Proxy
Start the proxy and listen on port 8001:
cd ~/apache-shardingsphere-incubating-4.0.0-sharding-proxy-bin/bin/
./start.sh 8001
Step 3: Connect and create tables
Connect to the logical data source sdb as user r1:
psql -h 127.0.0.1 -p 8001 -U r1 sdb
Create the sharded tables:
CREATE TABLE t_order (
order_id int8 PRIMARY KEY,
user_id int8,
info text,
c1 int,
crt_time timestamp
);
CREATE TABLE t_order_item (
order_item_id int8 PRIMARY KEY,
order_id int8,
user_id int8,
info text,
c1 int, c2 int, c3 int, c4 int, c5 int,
crt_time timestamp
);
CREATE TABLE, Sharding-Proxy automatically creates the physical shards across all databases according to the actualDataNodes expression in config-sharding.yaml.Write and query data
Insert rows across multiple shards:
INSERT INTO t_order (user_id, info, c1, crt_time) VALUES (0, 'a', 1, now());
INSERT INTO t_order (user_id, info, c1, crt_time) VALUES (1, 'b', 2, now());
INSERT INTO t_order (user_id, info, c1, crt_time) VALUES (2, 'c', 3, now());
INSERT INTO t_order (user_id, info, c1, crt_time) VALUES (3, 'c', 4, now());
Query all rows (Sharding-Proxy merges results from all shards transparently):
SELECT * FROM t_order;
Expected output:
order_id | user_id | info | c1 | crt_time
--------------------+---------+------+----+----------------------------
433352561047633921 | 0 | a | 1 | 2020-02-09 19:48:21.856555
433352585668198400 | 1 | b | 2 | 2020-02-09 19:48:27.726815
433352610813050881 | 2 | c | 3 | 2020-02-09 19:48:33.721754
433352628370407424 | 3 | c | 4 | 2020-02-09 19:48:37.907683
(4 rows)
Query with a filter on the sharding column (routes to a single database shard):
SELECT * FROM t_order WHERE user_id = 1;
Expected output:
order_id | user_id | info | c1 | crt_time
--------------------+---------+------+----+----------------------------
433352585668198400 | 1 | b | 2 | 2020-02-09 19:48:27.726815
(1 row)
Enable SQL logging
To inspect the SQL parsing and routing statements that Sharding-Proxy generates, set sql.show: true in server.yaml:
authentication:
users:
r1:
password: PW123321!
authorizedSchemas: db0,db1,db2,db3
props:
executor.size: 16
sql.show: true # Log parsed and routed SQL statements
Logs are written to:
/home/apache-shardingsphere-incubating-4.0.0-sharding-proxy-bin/logs/stdout.log
Run stress tests with pgbench
Create a test script:
vi test.sql
Add the following:
\set user_id random(1,100000000)
\set order_id random(1,2000000000)
\set order_item_id random(1,2000000000)
INSERT INTO t_order (user_id, order_id, info, c1, crt_time)
VALUES (:user_id, :order_id, random()::text, random()*1000, now())
ON CONFLICT (order_id) DO UPDATE SET info = excluded.info, c1 = excluded.c1, crt_time = excluded.crt_time;
INSERT INTO t_order_item (order_item_id, user_id, order_id, info, c1, c2, c3, c4, c5, crt_time)
VALUES (:order_item_id, :user_id, :order_id, random()::text, random()*1000, random()*1000, random()*1000, random()*1000, random()*1000, now())
ON CONFLICT (order_item_id) DO NOTHING;
Run the benchmark with 24 concurrent clients for 120 seconds:
pgbench -M simple -n -r -P 1 -f ./test.sql -c 24 -j 24 -h 127.0.0.1 -p 8001 -U r1 sdb -T 120
Sample output:
progress: 1.0 s, 1100.9 tps, lat 21.266 ms stddev 6.349
progress: 2.0 s, 1253.0 tps, lat 18.779 ms stddev 7.913
progress: 3.0 s, 1219.0 tps, lat 20.083 ms stddev 13.212
What's next
-
ShardingSphere documentation - full reference for sharding strategies, read/write splitting, distributed transactions, and data encryption
-
Create a database and an account on an ApsaraDB RDS for PostgreSQL instance
-
Configure an IP address whitelist for an ApsaraDB RDS for PostgreSQL instance