
ApsaraDB RDS: Horizontal splitting (PL/Proxy)

Last Updated: Mar 28, 2026

PL/Proxy is a PostgreSQL extension that routes SQL calls from a proxy node to remote databases. On ApsaraDB RDS for PostgreSQL, use it in CLUSTER mode to distribute writes across shards or replicate data across nodes, or in CONNECT mode to route queries to a specific database.

Prerequisites

Before you begin, ensure that you have:

  • An ApsaraDB RDS for PostgreSQL instance to serve as the proxy node, with the plproxy extension available.

  • One or more ApsaraDB RDS for PostgreSQL instances to serve as data nodes.

  • Network connectivity between the proxy node and every data node (see Usage notes).

How it works

PL/Proxy runs on a dedicated proxy node. When a client calls a PL/Proxy function, PL/Proxy intercepts the call and forwards it to one or more data nodes based on routing rules embedded in the function body.

CLUSTER mode groups data nodes into a named cluster using the SQL/MED CREATE SERVER statement. Each proxy function specifies a RUN ON clause that controls how the call is routed:

Clause | Routing behavior
RUN ON ANY | Sends the call to a randomly selected data node
RUN ON ALL | Broadcasts the call to all data nodes (use for SQL replication or cross-shard reads)

CONNECT mode skips cluster configuration. Each function embeds a connection string directly and routes the query to a fixed target database.

Usage notes

  • RDS instances in the same virtual private cloud (VPC) can access each other directly.

  • To route queries across RDS instances in different VPCs, place an Elastic Compute Service (ECS) instance in the same VPC as an intermediary proxy.

  • The number of data nodes in a CLUSTER must be a power of 2 (2, 4, 8, and so on).
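In addition to ANY and ALL, PL/Proxy supports deterministic routing with RUN ON <expression>: the integer value of the expression, masked by the node count, selects the shard, which is why the node count must be a power of 2. A minimal sketch (insert_user_sharded is an illustrative name; it assumes a cluster named cluster_srv1 as configured later in this topic):

CREATE OR REPLACE FUNCTION insert_user_sharded(i_id int, i_name text)
RETURNS integer AS $$
    CLUSTER 'cluster_srv1';
    -- Route by key: partition index = i_id & (node_count - 1)
    RUN ON i_id;
$$ LANGUAGE plproxy;

Unlike RUN ON ANY, this keeps all rows with the same key on the same shard, so key-based lookups can target a single node.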

Install the extension

Connect to the proxy node database and run:

CREATE EXTENSION plproxy;
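To confirm that the extension is installed, query the pg_extension catalog:

SELECT extname, extversion FROM pg_extension WHERE extname = 'plproxy';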

Set up a CLUSTER (CLUSTER mode only)

Skip this section if you are using CONNECT mode.

The cluster is defined with CREATE SERVER. Each data node is registered as an option (p0, p1, and so on). The number of options must be a power of 2.

  1. Create the cluster server and register your data nodes:

    CREATE SERVER cluster_srv1 FOREIGN DATA WRAPPER plproxy
    OPTIONS (
        connection_lifetime '1800',
        disable_binary '1',
        p0 'dbname=pl_db0 host=100.xxx.xxx.72 port=5678',
        p1 'dbname=pl_db1 host=11.xxx.xxx.9 port=5678'
    );
  2. Grant access to the cluster:

    GRANT USAGE ON FOREIGN SERVER cluster_srv1 TO postgres;
  3. Create a user mapping so PL/Proxy can authenticate to the data nodes:

    CREATE USER MAPPING FOR postgres SERVER cluster_srv1 OPTIONS (user 'postgres');
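To verify the setup, inspect the standard SQL/MED catalogs; srvoptions lists the registered data nodes:

SELECT srvname, srvoptions FROM pg_foreign_server WHERE srvname = 'cluster_srv1';
SELECT srvname, usename, umoptions FROM pg_user_mappings WHERE srvname = 'cluster_srv1';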

Test setup

The following sections use three RDS instances to demonstrate both modes.

IP | Role | Database | Username
100.xx.xx.136 | Proxy node | postgres | postgres
100.xx.xx.72 | Data node | pl_db0 | postgres
11.xx.xx.9 | Data node | pl_db1 | postgres

Create a users table on each data node:

CREATE TABLE users (userid int, name text);

Test CLUSTER mode

Horizontal sharding

RUN ON ANY randomly routes each INSERT to one of the data nodes, distributing writes across the cluster.

  1. Create the insert_user function on each data node:

    CREATE OR REPLACE FUNCTION insert_user(i_id int, i_name text)
    RETURNS integer AS $$
        INSERT INTO users (userid, name) VALUES ($1, $2);
        SELECT 1;
    $$ LANGUAGE SQL;
  2. Create the proxy function on the proxy node. PL/Proxy matches calls by function name, so the proxy function and the data node function must share the same name.

    CREATE OR REPLACE FUNCTION insert_user(i_id int, i_name text)
    RETURNS integer AS $$
        CLUSTER 'cluster_srv1';
        RUN ON ANY;
    $$ LANGUAGE plproxy;
  3. Create a read function on the proxy node. RUN ON ALL queries every data node and merges the results.

    CREATE OR REPLACE FUNCTION get_user_name()
    RETURNS TABLE(userid int, name text) AS $$
        CLUSTER 'cluster_srv1';
        RUN ON ALL;
        SELECT userid, name FROM users;
    $$ LANGUAGE plproxy;
  4. Insert 10 records from the proxy node:

    SELECT insert_user(1001, 'Sven');
    SELECT insert_user(1002, 'Marko');
    SELECT insert_user(1003, 'Steve');
    SELECT insert_user(1004, 'lottu');
    SELECT insert_user(1005, 'rax');
    SELECT insert_user(1006, 'ak');
    SELECT insert_user(1007, 'jack');
    SELECT insert_user(1008, 'molica');
    SELECT insert_user(1009, 'pg');
    SELECT insert_user(1010, 'oracle');
  5. Query each data node to see how the data was distributed:

    Because RUN ON ANY routes writes randomly, the distribution is uneven and your results will differ.
    -- On pl_db0
    SELECT * FROM users;
     userid |  name
    --------+--------
       1001 | Sven
       1003 | Steve
       1004 | lottu
       1005 | rax
       1006 | ak
       1007 | jack
       1008 | molica
       1009 | pg
    (8 rows)
    
    -- On pl_db1
    SELECT * FROM users;
     userid |  name
    --------+--------
       1002 | Marko
       1010 | oracle
    (2 rows)
  6. Read all rows back through the proxy node:

    SELECT userid, name FROM get_user_name();
     userid |  name
    --------+--------
       1001 | Sven
       1003 | Steve
       1004 | lottu
       1005 | rax
       1006 | ak
       1007 | jack
       1008 | molica
       1009 | pg
       1002 | Marko
       1010 | oracle
    (10 rows)
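RUN ON ALL also supports cross-shard aggregation: each data node returns one row and the proxy returns them all, so totals are combined on the caller's side. A sketch (get_user_count is an illustrative name, not part of the walkthrough above):

CREATE OR REPLACE FUNCTION get_user_count()
RETURNS SETOF bigint AS $$
    CLUSTER 'cluster_srv1';
    RUN ON ALL;
    SELECT count(*) FROM users;
$$ LANGUAGE plproxy;

-- One row per data node; combine them on the proxy side:
SELECT sum(n) FROM get_user_count() AS n;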

SQL replication

RUN ON ALL broadcasts each write to every data node, keeping all nodes in sync.

  1. Create a trunc_user function on each data node and on the proxy node:

    -- On pl_db0 and pl_db1
    CREATE OR REPLACE FUNCTION trunc_user()
    RETURNS integer AS $$
        TRUNCATE TABLE users;
        SELECT 1;
    $$ LANGUAGE SQL;
    
    -- On the proxy node
    CREATE OR REPLACE FUNCTION trunc_user()
    RETURNS SETOF integer AS $$
        CLUSTER 'cluster_srv1';
        RUN ON ALL;
    $$ LANGUAGE plproxy;
  2. Truncate the table across all nodes:

    SELECT trunc_user();
     trunc_user
    ------------
              1
              1
    (2 rows)
  3. Create the replication insert function on the proxy node. TARGET insert_user tells PL/Proxy to call the insert_user function (already defined on the data nodes) on every node:

    CREATE OR REPLACE FUNCTION insert_user_2(i_id int, i_name text)
    RETURNS SETOF integer AS $$
        CLUSTER 'cluster_srv1';
        RUN ON ALL;
        TARGET insert_user;
    $$ LANGUAGE plproxy;
  4. Insert 4 records:

    SELECT insert_user_2(1004, 'lottu');
    SELECT insert_user_2(1005, 'rax');
    SELECT insert_user_2(1006, 'ak');
    SELECT insert_user_2(1007, 'jack');
  5. Verify that both data nodes contain identical data:

    -- On pl_db0
    SELECT * FROM users;
     userid | name
    --------+-------
       1004 | lottu
       1005 | rax
       1006 | ak
       1007 | jack
    (4 rows)
    
    -- On pl_db1
    SELECT * FROM users;
     userid | name
    --------+-------
       1004 | lottu
       1005 | rax
       1006 | ak
       1007 | jack
    (4 rows)
  6. Read through the proxy node using RUN ON ANY — since all nodes hold identical data, any node returns the full result:

    CREATE OR REPLACE FUNCTION get_user_name_2()
    RETURNS TABLE(userid int, name text) AS $$
        CLUSTER 'cluster_srv1';
        RUN ON ANY;
        SELECT userid, name FROM users;
    $$ LANGUAGE plproxy;
    
    SELECT userid, name FROM get_user_name_2();
     userid | name
    --------+-------
       1004 | lottu
       1005 | rax
       1006 | ak
       1007 | jack
    (4 rows)

Test CONNECT mode

CONNECT mode routes a query to a single specified database without requiring cluster setup. Embed the connection string directly in the function.

CREATE OR REPLACE FUNCTION get_user_name_3()
RETURNS TABLE(userid int, name text) AS $$
    CONNECT 'dbname=pl_db0 host=100.xx.xx.72 port=5678';
    SELECT userid, name FROM users;
$$ LANGUAGE plproxy;

SELECT userid, name FROM get_user_name_3();
 userid | name
--------+-------
   1004 | lottu
   1005 | rax
   1006 | ak
   1007 | jack
(4 rows)
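CONNECT-mode functions can also take parameters, referenced as $1, $2, and so on in the embedded query. A sketch (get_user_by_id is an illustrative name; connection details follow the test setup above):

CREATE OR REPLACE FUNCTION get_user_by_id(i_id int)
RETURNS TABLE(userid int, name text) AS $$
    CONNECT 'dbname=pl_db0 host=100.xx.xx.72 port=5678';
    SELECT userid, name FROM users WHERE userid = $1;
$$ LANGUAGE plproxy;

SELECT * FROM get_user_by_id(1005);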

Choose a mode

 | CLUSTER mode | CONNECT mode
Use when | You need horizontal sharding or want to replicate writes across nodes | You need to query a specific remote database by connection string
Cluster setup required | Yes (CREATE SERVER, user mapping) | No
Data node count constraint | Must be a power of 2 | None
Routing | RUN ON ANY (random) or RUN ON ALL (broadcast) | Fixed connection string per function
Typical use case | Distributing large datasets across multiple RDS instances | Cross-instance queries or ad hoc federation

What's next