All Products
Search
Document Center

AnalyticDB for PostgreSQL:Query data across instances

Last Updated:Jan 31, 2024

AnalyticDB for PostgreSQL uses a foreign data wrapper (FDW) to help you easily and quickly query data across instances within an Alibaba Cloud account. This ensures data timeliness and reduces data redundancy.

Overview

A company or organization may run multiple instances to support different business fields or applications. In specific cases, you must query data across instances to achieve associated analysis across business units. You may use one of the following methods to perform cross-instance query in the past:

  • Store the same data copies on different instances. This method may cause data disordering and data redundancy.

  • Store data in a shared storage, such as an Object Storage Service (OSS) bucket. This method cannot ensure data timeliness.

To resolve the preceding issues, the Alibaba Cloud team developed an FDW based on the massively parallel processing (MPP) architecture of AnalyticDB for PostgreSQL and the intercommunication between compute nodes. The FDW uses the performance capabilities of compute nodes to implement parallel data access between instances and improve the efficiency of data access. The performance of the FDW is several times higher than the performance of the native postgres_fdw extension.

image.png

Usage notes

  • The source and destination instances must belong to the same Alibaba Cloud account and reside in the same region and virtual private cloud (VPC).

  • If the source instance in Serverless mode is in the scaling state, you cannot access data of the instance.

  • The cross-instance query feature is supported only for the following engine versions:

    • AnalyticDB for PostgreSQL V7.0 in elastic storage mode: V7.0.1.x and later.

    • AnalyticDB for PostgreSQL V6.0 in elastic storage mode: V6.3.11.2 and later.

    • AnalyticDB for PostgreSQL in Serverless mode: V1.0.6.x and later.

  • The password verification method of AnalyticDB for PostgreSQL V7.0 is changed. If you want to access an AnalyticDB for PostgreSQL V7.0 instance from an AnalyticDB for PostgreSQL V6.0 instance or an AnalyticDB for PostgreSQL instance in Serverless mode, Submit a ticket.

  • FDW foreign tables support only the SELECT and INSERT operations. The UPDATE and DELETE operations are not supported.

  • Only AnalyticDB for PostgreSQL V7.0 in elastic storage mode supports join pushdown and aggregate pushdown.

  • Only the Orca optimizer in AnalyticDB for PostgreSQL V7.0 in elastic storage mode can generate execution plans for FDW foreign tables. AnalyticDB for PostgreSQL V6.0 in elastic storage mode and AnalyticDB for PostgreSQL in Serverless mode use the native optimizer to generate execution plans for FDW foreign tables.

Procedure

To enable data query across instances A and B that belong to the same Alibaba Cloud account, perform the following steps. The two instances reside in the same region and VPC. After you enable the feature, you can access tables of the db02 database of Instance B from the db01 database of Instance A. Then, you can use the intercommunication between compute nodes to facilitate associated query with tables of the db01 database.

  1. Use psql to connect to instances A and B. For more information, see Client connection.

  2. Create databases on instances A and B.

    Create a database named db01 on Instance A and switch to the db01 database.

    CREATE DATABASE db01;
    \c db01

    Create a database named db02 on Instance B and switch to the db02 database.

    CREATE DATABASE db02;
    \c db02
  3. Install the greenplum_fdw and gp_parallel_retrieve_cursor extensions for the db01 database of Instance A and the db02 database of Instance B.

    CREATE EXTENSION IF NOT EXISTS gp_parallel_retrieve_cursor;
    CREATE EXTENSION IF NOT EXISTS greenplum_fdw;
    GRANT USAGE ON FOREIGN DATA WRAPPER greenplum_fdw TO PUBLIC;
    Note
    • The gp_parallel_retrieve_cursor extension is not required for AnalyticDB for PostgreSQL V7.0.

    • AnalyticDB for PostgreSQL does not allow you to install extensions. If the required extensions are not installed on your instance, submit a ticket.

  4. Obtain the internal IP address of Instance A and add the IP address to a whitelist of Instance B. For information about how to configure a whitelist, see Configure an IP address whitelist.

    Execute the following statement on Instance A to obtain the internal IP address:

    SELECT dbid, address FROM gp_segment_configuration;
  5. Prepare test data on the db02 database of Instance B.

    CREATE SCHEMA s01;
    
    CREATE TABLE s01.t1(a int, b int, c text); 
    CREATE TABLE s01.t2(a int, b int, c text); 
    CREATE TABLE s01.t3(a int, b int, c text); 
    
    INSERT INTO s01.t1 VALUES(generate_series(1,10),generate_series(11,20),'t1');
    INSERT INTO s01.t2 VALUES(generate_series(11,20),generate_series(11,20),'t2');
    INSERT INTO s01.t3 VALUES(generate_series(21,30),generate_series(11,20),'t3')
  6. Create a server and a user mapping on the db01 database of Instance A.

    • Create a server.

      CREATE SERVER remote_adbpg FOREIGN DATA WRAPPER greenplum_fdw
          OPTIONS (host 'gp-xxxxxxxx-master.gpdb.zhangbei.rds.aliyuncs.com',
                   port '5432',
                   dbname 'db02');

      The following table describes the parameters.

      Parameter

      Description

      host

      The internal endpoint of Instance B.

      You can log on to the AnalyticDB for PostgreSQL console and go to the Basic Information page of Instance B to view the internal endpoint in the Database Connection Information section.

      port

      The internal port number of Instance B. Default value: 5432.

      dbname

      The name of the source database. In this example, this parameter is set to db02.

    • Create a user mapping. For more information about user mappings, see CREATE USER MAPPING.

      CREATE USER MAPPING FOR PUBLIC SERVER remote_adbpg
          OPTIONS (user 'report', password '******');

      The following table describes the parameters.

      Parameter

      Description

      user

      The name of the database account that is used to connect to Instance B.

      The database account must have read permissions on the db02 database. If you want to perform INSERT operations, write permissions are also required. For information about how to create a database account, see Create a database account.

      password

      The password of the database account.

  7. Enable cross-instance query for the db01 database of Instance A.

    You can use one of the following methods to enable cross-instance query:

    • Create foreign tables for the source tables.

      CREATE SCHEMA s01;
      CREATE FOREIGN TABLE s01.t1(a int, b int) 
      	server remote_adbpg options(schema_name 's01', table_name 't1');

      This method has the following advantages and disadvantages:

      • Advantages: You can specify a custom DDL structure for foreign tables. For example, the t1 table in the db02 database contains the a, b, and c fields, but the db01 database contains only the a and b fields. By using this method, you can specify fields when you create a foreign table.

      • Disadvantages: You must know the DDL structure of each table. A long period of time is required to import multiple foreign tables at a time.

    • Import all tables from the schema of the source database.

      CREATE SCHEMA s01;
      IMPORT FOREIGN SCHEMA s01 LIMIT TO (t1, t2, t3) 
      	FROM SERVER remote_adbpg INTO s01;

      This method has the following advantages and disadvantages:

      • Advantages: You can import foreign tables in a short period of time. You do not need to know the DDL structure of each table.

      • Disadvantages: Foreign tables must have the same names and fields as the tables in the source database.

      For more information, see IMPORT FOREIGN SCHEMA.

  8. Query data of the db02 database of Instance B from the db01 database of Instance A.

    SELECT * FROM s01.t1;

    Sample result:

     a  | b  | c
    ----+----+----
      2 | 12 | t1
      3 | 13 | t1
      4 | 14 | t1
      7 | 17 | t1
      8 | 18 | t1
      1 | 11 | t1
      5 | 15 | t1
      6 | 16 | t1
      9 | 19 | t1
     10 | 20 | t1
    (10 rows)

Performance test

The following figure shows a TPC-H performance test on a local query and a cross-instance query. The test dataset is 1 TB in size.image.png

  • When a large amount of data (1 TB in this example) is involved, the performance of the cross-instance query is approximately one half the performance of the local query.

  • To query data across instances, you must transmit data across networks. To reduce network I/O operations, we recommend that you add additional filter conditions to the WHERE clause for foreign tables.

References

AnalyticDB for PostgreSQL supports the cross-database query feature. For more information, see Query data across databases.