
ApsaraDB RDS: Horizontal splitting (PL/Proxy)

Last Updated: Mar 28, 2026

PL/Proxy is a PostgreSQL extension that routes SQL calls from a proxy node to remote databases. On ApsaraDB RDS for PostgreSQL, use it in CLUSTER mode to distribute writes across shards or replicate data across nodes, or in CONNECT mode to route queries to a specific database.

Prerequisites

Before you begin, ensure that you have:

  • An ApsaraDB RDS for PostgreSQL instance to serve as the proxy node, with the plproxy extension available.

  • One or more ApsaraDB RDS for PostgreSQL instances to serve as data nodes.

  • Network connectivity between the proxy node and every data node (see Usage notes).

How it works

PL/Proxy runs on a dedicated proxy node. When a client calls a PL/Proxy function, PL/Proxy intercepts the call and forwards it to one or more data nodes based on routing rules embedded in the function body.

CLUSTER mode groups data nodes into a named cluster using the SQL/MED CREATE SERVER statement. Each proxy function specifies a RUN ON clause that controls how the call is routed:

Clause | Routing behavior
RUN ON ANY | Sends the call to a randomly selected data node
RUN ON ALL | Broadcasts the call to all data nodes (use for SQL replication or cross-shard reads)

CONNECT mode skips cluster configuration. Each function embeds a connection string directly and routes the query to a fixed target database.

Usage notes

  • RDS instances in the same virtual private cloud (VPC) can access each other directly.

  • To route queries across RDS instances in different VPCs, place an Elastic Compute Service (ECS) instance in the same VPC as an intermediary proxy.

  • The number of data nodes in a CLUSTER must be a power of 2 (2, 4, 8, and so on).
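In addition to ANY and ALL, PL/Proxy supports deterministic routing with RUN ON <expression>: the integer value of the expression, masked by the node count, selects the shard, which is why the node count must be a power of 2. A minimal sketch (insert_user_sharded is an illustrative name; it assumes a cluster named cluster_srv1 as configured later in this topic):

CREATE OR REPLACE FUNCTION insert_user_sharded(i_id int, i_name text)
RETURNS integer AS $$
    CLUSTER 'cluster_srv1';
    -- Route by key: partition index = i_id & (node_count - 1)
    RUN ON i_id;
$$ LANGUAGE plproxy;

Unlike RUN ON ANY, this keeps all rows with the same key on the same shard, so key-based lookups can target a single node.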

Install the extension

Connect to the proxy node database and run:

CREATE EXTENSION plproxy;
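To confirm that the extension is installed, query the pg_extension catalog:

SELECT extname, extversion FROM pg_extension WHERE extname = 'plproxy';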

Set up a CLUSTER (CLUSTER mode only)

Skip this section if you are using CONNECT mode.

The cluster is defined with CREATE SERVER. Each data node is registered as an option (p0, p1, and so on). The number of options must be a power of 2.

  1. Create the cluster server and register your data nodes:

    CREATE SERVER cluster_srv1 FOREIGN DATA WRAPPER plproxy
    OPTIONS (
        connection_lifetime '1800',
        disable_binary '1',
        p0 'dbname=pl_db0 host=100.xxx.xxx.72 port=5678',
        p1 'dbname=pl_db1 host=11.xxx.xxx.9 port=5678'
    );
  2. Grant access to the cluster:

    GRANT USAGE ON FOREIGN SERVER cluster_srv1 TO postgres;
  3. Create a user mapping so PL/Proxy can authenticate to the data nodes:

    CREATE USER MAPPING FOR postgres SERVER cluster_srv1 OPTIONS (user 'postgres');
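To verify the setup, inspect the standard SQL/MED catalogs; srvoptions lists the registered data nodes:

SELECT srvname, srvoptions FROM pg_foreign_server WHERE srvname = 'cluster_srv1';
SELECT srvname, usename, umoptions FROM pg_user_mappings WHERE srvname = 'cluster_srv1';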

Test setup

The following sections use three RDS instances to demonstrate both modes.

IP | Role | Database | Username
100.xx.xx.136 | Proxy node | postgres | postgres
100.xx.xx.72 | Data node | pl_db0 | postgres
11.xx.xx.9 | Data node | pl_db1 | postgres

Create a users table on each data node:

CREATE TABLE users (userid int, name text);

Test CLUSTER mode

Horizontal sharding

RUN ON ANY randomly routes each INSERT to one of the data nodes, distributing writes across the cluster.

  1. Create the insert_user function on each data node:

    CREATE OR REPLACE FUNCTION insert_user(i_id int, i_name text)
    RETURNS integer AS $$
        INSERT INTO users (userid, name) VALUES ($1, $2);
        SELECT 1;
    $$ LANGUAGE SQL;
  2. Create the proxy function on the proxy node. PL/Proxy matches calls by function name, so the proxy function and the data node function must share the same name.

    CREATE OR REPLACE FUNCTION insert_user(i_id int, i_name text)
    RETURNS integer AS $$
        CLUSTER 'cluster_srv1';
        RUN ON ANY;
    $$ LANGUAGE plproxy;
  3. Create a read function on the proxy node. RUN ON ALL queries every data node and merges the results.

    CREATE OR REPLACE FUNCTION get_user_name()
    RETURNS TABLE(userid int, name text) AS $$
        CLUSTER 'cluster_srv1';
        RUN ON ALL;
        SELECT userid, name FROM users;
    $$ LANGUAGE plproxy;
  4. Insert 10 records from the proxy node:

    SELECT insert_user(1001, 'Sven');
    SELECT insert_user(1002, 'Marko');
    SELECT insert_user(1003, 'Steve');
    SELECT insert_user(1004, 'lottu');
    SELECT insert_user(1005, 'rax');
    SELECT insert_user(1006, 'ak');
    SELECT insert_user(1007, 'jack');
    SELECT insert_user(1008, 'molica');
    SELECT insert_user(1009, 'pg');
    SELECT insert_user(1010, 'oracle');
  5. Query each data node to see how the data was distributed:

    Because RUN ON ANY routes writes randomly, the distribution is uneven and your results will differ.
    -- On pl_db0
    SELECT * FROM users;
     userid |  name
    --------+--------
       1001 | Sven
       1003 | Steve
       1004 | lottu
       1005 | rax
       1006 | ak
       1007 | jack
       1008 | molica
       1009 | pg
    (8 rows)
    
    -- On pl_db1
    SELECT * FROM users;
     userid |  name
    --------+--------
       1002 | Marko
       1010 | oracle
    (2 rows)
  6. Read all rows back through the proxy node:

    SELECT userid, name FROM get_user_name();
     userid |  name
    --------+--------
       1001 | Sven
       1003 | Steve
       1004 | lottu
       1005 | rax
       1006 | ak
       1007 | jack
       1008 | molica
       1009 | pg
       1002 | Marko
       1010 | oracle
    (10 rows)
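RUN ON ALL also supports cross-shard aggregation: each data node returns one row and the proxy returns them all, so totals are combined on the caller's side. A sketch (get_user_count is an illustrative name, not part of the walkthrough above):

CREATE OR REPLACE FUNCTION get_user_count()
RETURNS SETOF bigint AS $$
    CLUSTER 'cluster_srv1';
    RUN ON ALL;
    SELECT count(*) FROM users;
$$ LANGUAGE plproxy;

-- One row per data node; combine them on the proxy side:
SELECT sum(n) FROM get_user_count() AS n;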

SQL replication

RUN ON ALL broadcasts each write to every data node, keeping all nodes in sync.

  1. Create a trunc_user function on each data node and on the proxy node:

    -- On pl_db0 and pl_db1
    CREATE OR REPLACE FUNCTION trunc_user()
    RETURNS integer AS $$
        TRUNCATE TABLE users;
        SELECT 1;
    $$ LANGUAGE SQL;
    
    -- On the proxy node
    CREATE OR REPLACE FUNCTION trunc_user()
    RETURNS SETOF integer AS $$
        CLUSTER 'cluster_srv1';
        RUN ON ALL;
    $$ LANGUAGE plproxy;
  2. Truncate the table across all nodes:

    SELECT trunc_user();
     trunc_user
    ------------
              1
              1
    (2 rows)
  3. Create the replication insert function on the proxy node. TARGET insert_user tells PL/Proxy to call the insert_user function (already defined on the data nodes) on every node:

    CREATE OR REPLACE FUNCTION insert_user_2(i_id int, i_name text)
    RETURNS SETOF integer AS $$
        CLUSTER 'cluster_srv1';
        RUN ON ALL;
        TARGET insert_user;
    $$ LANGUAGE plproxy;
  4. Insert 4 records:

    SELECT insert_user_2(1004, 'lottu');
    SELECT insert_user_2(1005, 'rax');
    SELECT insert_user_2(1006, 'ak');
    SELECT insert_user_2(1007, 'jack');
  5. Verify that both data nodes contain identical data:

    -- On pl_db0
    SELECT * FROM users;
     userid | name
    --------+-------
       1004 | lottu
       1005 | rax
       1006 | ak
       1007 | jack
    (4 rows)
    
    -- On pl_db1
    SELECT * FROM users;
     userid | name
    --------+-------
       1004 | lottu
       1005 | rax
       1006 | ak
       1007 | jack
    (4 rows)
  6. Read through the proxy node using RUN ON ANY — since all nodes hold identical data, any node returns the full result:

    CREATE OR REPLACE FUNCTION get_user_name_2()
    RETURNS TABLE(userid int, name text) AS $$
        CLUSTER 'cluster_srv1';
        RUN ON ANY;
        SELECT userid, name FROM users;
    $$ LANGUAGE plproxy;
    
    SELECT userid, name FROM get_user_name_2();
     userid | name
    --------+-------
       1004 | lottu
       1005 | rax
       1006 | ak
       1007 | jack
    (4 rows)

Test CONNECT mode

CONNECT mode routes a query to a single specified database without requiring cluster setup. Embed the connection string directly in the function.

CREATE OR REPLACE FUNCTION get_user_name_3()
RETURNS TABLE(userid int, name text) AS $$
    CONNECT 'dbname=pl_db0 host=100.xx.xx.72 port=5678';
    SELECT userid, name FROM users;
$$ LANGUAGE plproxy;

SELECT userid, name FROM get_user_name_3();
 userid | name
--------+-------
   1004 | lottu
   1005 | rax
   1006 | ak
   1007 | jack
(4 rows)
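CONNECT-mode functions can also take parameters, referenced as $1, $2, and so on in the embedded query. A sketch (get_user_by_id is an illustrative name; connection details follow the test setup above):

CREATE OR REPLACE FUNCTION get_user_by_id(i_id int)
RETURNS TABLE(userid int, name text) AS $$
    CONNECT 'dbname=pl_db0 host=100.xx.xx.72 port=5678';
    SELECT userid, name FROM users WHERE userid = $1;
$$ LANGUAGE plproxy;

SELECT * FROM get_user_by_id(1005);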

Choose a mode

 | CLUSTER mode | CONNECT mode
Use when | You need horizontal sharding or want to replicate writes across nodes | You need to query a specific remote database by connection string
Cluster setup required | Yes (CREATE SERVER, user mapping) | No
Data node count constraint | Must be a power of 2 | None
Routing | RUN ON ANY (random) or RUN ON ALL (broadcast) | Fixed connection string per function
Typical use case | Distributing large datasets across multiple RDS instances | Cross-instance queries or ad hoc federation

What's next