All Products
Search
Document Center

ApsaraDB RDS:Use pg_dump and pg_restore to migrate data from a self-managed PostgreSQL instance to an ApsaraDB RDS for PostgreSQL instance

Last Updated:Jan 10, 2024

This topic describes how to use pg_dump and pg_restore to migrate a database from a self-managed PostgreSQL instance to an ApsaraDB RDS for PostgreSQL instance. This applies if you are familiar with pg_dump and pg_restore and do not want to use the migration services and features provided by Alibaba Cloud, such as Data Transmission Service (DTS) and the cloud migration feature of ApsaraDB RDS for PostgreSQL.

Prerequisites

  • An RDS instance that runs the same PostgreSQL version as the self-managed PostgreSQL instance is created. For more information, see Create an ApsaraDB RDS for PostgreSQL instance.

    Note
    • You can use pg_dump and pg_restore to migrate a database across PostgreSQL versions. However, compatibility issues may occur. To ensure smooth migration, we recommend that you use the same PostgreSQL version for the self-managed PostgreSQL instance and the RDS instance.

    • You can run the psql --version command on the server on which the self-managed PostgreSQL instance is deployed to query the PostgreSQL version of the instance.

    • You can upgrade the major engine version of your RDS instance after the migration is complete. For more information, see Upgrade the major engine version.

    • If the self-managed PostgreSQL instance is deployed on an Elastic Compute Service (ECS) instance, we recommend that you create the RDS instance in the same region and virtual private cloud (VPC) as the ECS instance. This way, data is migrated over an internal network, which is faster and more secure.

  • A whitelist is configured for the RDS instance to allow access from the server on which the self-managed PostgreSQL instance is deployed. For more information, see Configure an IP address whitelist.

  • A privileged account is created for the RDS instance. For more information, see Create an account.

  • The endpoint and port number of the RDS instance are obtained. For more information, see View and change the endpoints and port numbers of an ApsaraDB RDS for PostgreSQL instance.

    Note

    If the self-managed PostgreSQL instance is deployed on an ECS instance that resides in the same region and VPC as the RDS instance, you must obtain the internal endpoint and port number of the RDS instance. In other cases, you must obtain the public endpoint and port number of the RDS instance.

  • An administrator account that assumes the superuser role is created for the self-managed PostgreSQL instance.

  • pg_dump and pg_restore are installed on the server on which the self-managed PostgreSQL instance is deployed.

    Note

    When you install the self-managed PostgreSQL instance on your on-premises machine, pg_dump and pg_restore are automatically installed. You can run the pg_dump --version and pg_restore --version commands to query the details.

Procedure

The migration solution in this topic is suitable for migrating a single database. If you want to migrate multiple databases or the entire self-managed PostgreSQL instance to an RDS instance, you can use the migration services and features provided by Alibaba Cloud.

Step 1: Check the information about the database that you want to migrate

Connect to the self-managed PostgreSQL instance. In the psql command line, run the \l command to query the information about databases and check the values of Owner, Encoding, Collate, and Ctype of the database that you want to migrate.

In this example, the database named testdb is migrated. The following information is returned for the database: Owner is testuser, Encoding is UTF8, Collate is en_US.UTF-8, and Ctype is en_US.UTF-8.

                                                 List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    | ICU Locale | Locale Provider |   Access privileges
-----------+----------+----------+-------------+-------------+------------+-----------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            | =Tc/postgres         +
           |          |          |             |             |            |                 | postgres=CTc/postgres
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            | =c/postgres          +
           |          |          |             |             |            |                 | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            | =c/postgres          +
           |          |          |             |             |            |                 | postgres=CTc/postgres
 testdb    | testuser | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            |
(4 rows)

Step 2: Migrate users

  1. Connect to the server on which the self-managed PostgreSQL instance is deployed and run the following command to export the owner named testuser of the database that you want to migrate:

    pg_dumpall -g -h 127.0.0.1 -p 5432 -U testuser

    Sample output:

    Note

    In the Roles section of the command output, you can obtain the statement that is related to the permissions of the testuser user.

    --
    -- PostgreSQL database cluster dump
    --
    
    SET default_transaction_read_only = off;
    
    SET client_encoding = 'UTF8';
    SET standard_conforming_strings = on;
    
    --
    -- Roles
    --
    
    CREATE ROLE postgres;
    ALTER ROLE postgres WITH SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN REPLICATION BYPASSRLS;
    CREATE ROLE testuser;
    ALTER ROLE testuser WITH SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN REPLICATION BYPASSRLS PASSWORD 'SCRAM-SHA-256$4096:0IEvNqWupQsGcaoDS3SX****3U=';
    
    --
    -- User Configurations
    --
    
    --
    -- PostgreSQL database cluster dump complete
    --
  2. Use the privileged account to connect to the RDS instance. For more information, see Connect to an ApsaraDB RDS for PostgreSQL instance.

  3. Change SUPERUSER in the obtained SQL statement that is related to the permissions of the testuser user to rds_SUPERUSER. On the RDS instance, create an account that has the same permissions as the testuser user of the self-managed PostgreSQL instance.

    CREATE ROLE testuser;
    ALTER ROLE testuser WITH rds_SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN REPLICATION BYPASSRLS PASSWORD 'SCRAM-SHA-256$4096:0IEvNqWupQsGcaoDS3SX****3U=';

Step 3: Create a destination database on the RDS instance

  1. Use the privileged account to connect to the RDS instance.

  2. Create a destination database. Make sure that the destination database has the same character set and the same Collate, and Ctype settings as the source database on the self-managed PostgreSQL instance. For example, the destination database must have the same settings as the source database that is obtained in Step 1: Encoding is UTF8, Collate is en_US.UTF-8, and Ctype is en_US.UTF-8.

    CREATE DATABASE <Name of the destination database> WITH OWNER <Database owner> TEMPLATE template0 ENCODING 'UTF8' LC_CTYPE 'en_US.UTF-8' LC_COLLATE 'en_US.UTF-8';
    Note
    • The name of the destination database can be different from the name of the source database on the self-managed PostgreSQL instance.

    • The owner of the destination database must be the same as the owner of the source database on the self-managed PostgreSQL instance.

Step 4: Migrate data

You can use one of the following methods to migrate data:

  • If the server on which the self-managed PostgreSQL instance is deployed can connect to the RDS instance, we recommend that you use online migration.

  • If the server on which the self-managed PostgreSQL instance is deployed cannot connect to the RDS instance, we recommend that you use offline migration. Specifically, use pg_dump to export the data to a file, copy the file to the host to which the RDS instance connects, and then use pg_restore to import the file to the RDS instance.

Online migration

  1. Configure a password file. For more information, see The Password File.

    1. Run the following command on the server on which the self-managed PostgreSQL instance is deployed to create and edit the .pgpass file to store the password:

      vi ~/.pgpass
    2. Press i to enter the insert mode and enter the password-related content.

      Note
      • The content of the password file is in the following format: host:port:dbname:username:password.

      • By default, the metadata of a PostgreSQL database is stored in lower case. Therefore, the values of the dbname and username parameters must be in lower case. If the database name and username that you specified use uppercase letters and are enclosed in a pair of double quotation marks ("), the preceding condition does not apply.

      Endpoint of the RDS instance:5432:Name of the destination database:Username of the owner of the destination database:Password
      127.0.0.1:5432:Name of the source database on the self-managed PostgreSQL instance:Username of the superuser:Password
    3. Press Esc and enter :wq to exit the insert mode.

    4. Modify permissions on the .pgpass file.

      chmod 700 ~/.pgpass
  2. Run the following command on the server on which the self-managed PostgreSQL instance is deployed to migrate data:

    nohup pg_dump -F p -h 127.0.0.1 -p 5432 -U <Superuser of the self-managed instance> -d <Name of the database that you want to migrate> --no-tablespaces | time psql -h <Endpoint of the RDS instance> -p 5432 -U <Username of the owner of the destination database> --single-transaction <Name of the destination database> > ./pg.dump.log 2>&1 & 
    Note
    • You can query the pg.dump.log file to check whether error logs exist. If the file is empty, the migration is successful. In this case, you can log on to the RDS instance to view the migrated data.

    • The --single-transaction option is specified in the preceding command. During data import, all operations are encapsulated in one transaction to ensure that all operations are either all successful or all fail. If all operations fail, you can resolve the issue and directly import the data again.

Offline migration

  1. Run the following command on the server on which the self-managed PostgreSQL instance is deployed to export the pg.dump file:

    nohup pg_dump -F c -h 127.0.0.1 -p 5432 -U <Superuser of the self-managed instance> -d <Name of the database that you want to migrate> --no-tablespaces -f ./pg.dump > ./pg.dump.log 2>&1 &
    Note

    Wait for the export to complete. Then, verify that no error logs exist in the pg.dump.log file.

  2. Copy the pg.dump file to a host to which the RDS instance can connect. Then, import the file.

    Note

    If you use this method, you must configure a whitelist for the RDS instance to allow the host to access the instance. For more information, see Configure an IP address whitelist.

    • Single-thread import

      pg_restore -h <Endpoint of the RDS instance> -p 5432 -U <Username of the owner of the destination database> -d <Name of the destination database> --no-tablespaces --single-transaction  pg.dump  >./pg.restore.log
      Note
      • Wait for the import to complete. You can query the pg.restore.log file to check whether error logs exist. If the file is empty, the migration is successful. In this case, you can log on to the RDS instance to view the migrated data.

      • The --single-transaction option is specified in the preceding command. During data import, all operations are encapsulated in one transaction to ensure that all operations are either all successful or all fail. If all operations fail, you can resolve the issue and directly import the data again.

    • Multi-thread parallel import

      pg_restore -h <Endpoint of the RDS instance> -p 5432 -U <Username of the owner of the destination database> -d <Name of the destination database> --no-tablespaces -j <Number of parallel threads> pg.dump  >./pg.restore.log 2>&1 &
      Note
      • You cannot use the --single-transaction option and the -j option that is used to enable parallelism at the same time.

      • The value of the -j option is a positive integer. We recommend that the maximum value of the option is twice or four times the number of CPU cores of the system. This ensures that system resources are fully utilized and prevents resource contention and performance degradation that are caused by a large number of parallel tasks.

      • This method does not use the --single-transaction option to encapsulate transactions. If an error is reported during data import, you must resolve the issue, clear all tables in the destination database on the RDS instance, and then import data again. This helps ensure data consistency.

References

You can configure more parameters for the commands in this topic based on your business requirements. For more information, see the following documents:

The migration solution in this topic is suitable for migrating a single database. If you want to migrate multiple databases or the entire self-managed PostgreSQL instance to an RDS instance, you can use the migration services and features provided by Alibaba Cloud.