This topic describes the development and O&M recommendations that can help you increase the security compliance, stability, and performance of your ApsaraDB RDS for PostgreSQL instance.

Connection pooling

  • We recommend that you store SQL statements in PreparedStatement objects. This way, hard parses are not required, which reduces CPU resource consumption and increases the performance of your RDS instance.
  • We recommend that you close idle connections. This way, you can reduce memory usage, improve the efficiency of GetSnapshotData(), and increase database performance.
  • We recommend that you enable the connection pool feature on your application to prevent the resources from being consumed by short-lived connections and prevent performance deterioration. If your application does not support the connection pool feature, we recommend that you configure a connection pool between your application and your RDS instance. For example, you can use PgBouncer or Pgpool-II as a connection pool.
  • We recommend that you configure the following parameters for connection pooling:
    • minimumIdle: specifies the minimum number of idle connections in connection pools. We recommend that you set this parameter to 1 to reduce idle connections.
      Note The maxIdle parameter is removed from the configurations of most connection pools. If the maxIdle parameter is available, we recommend that you set this parameter to 1.
    • maxLifetime: specifies the maximum time-to-live (TTL) of each connection in connection pools. We recommend that you set this parameter to 60 minutes. This way, you can reduce the probability of out of memory (OOM) errors that occur due to frequent connections to RelCache.
    • maximumPoolSize: specifies the maximum number of connections that are allowed in each connection pool. We recommend that you set this parameter to 15. A connection pool that supports up to 15 connections is suitable for most business scenarios. If the number of cached connections in a connection pool is small and your RDS instance processes only the workloads from the connections, you can set the maximumPoolSize parameter to a value greater than 15 on the database clients.
    Note
    We recommend that you use the following configurations for connection pooling:
    • We recommend that you use the following configuration for the HikariCP connection pool, which is the recommended connection pool in Java environments:
      minimumIdle=1, maximumPoolSize=15, idleTimeout=600000 (10 minutes), maxLifetime=3600000 (60 minutes)
    • We recommend that you use the following configuration for the GORM connection pool, which is the recommended connection pool in GO environments:
      sqlDB.SetMaxIdleConns(1), sqlDB.SetMaxOpenConns(15), sqlDB.SetConnMaxLifetime(time.Hour)
    • We recommend that you use the following configuration for the Druid connection pool, which is used in Java environments:
      initialSize=1, minIdle=1, maxIdle=1, maxActive=15, testOnBorrow=false, testOnReturn=false, testWhileIdle=true,minEvictableIdleTimeMillis=600000 (10 minutes), maxEvictableIdleTimeMillis=900000 (15 minutes), timeBetweenEvictionRunsMillis=60000 (1 minutes), maxWait=6000 (6 seconds).

    The preceding configurations do not include PreparedStatement objects. You must configure PreparedStatement objects based on your business requirements.

Performance and stability

  • We recommend that you do not create more than 5,000 tables in a single database and make sure that the number of subpartitions for a single partitioned table does not exceed 128. If you partition tables by time, we recommend that you use a granularity of month or year. We recommend that you do not partition tables by day. This way, you can reduce the memory consumption for establishing frequent connections to RelCache.
  • We recommend that you use CREATE INDEX CONCURRENTLY to create indexes for online workloads. This way, you can prevent the DML INSERT, UPDATE, and DELETE operations that are performed in other sessions on the table for which indexes are created from being blocked.
  • We recommend that you use REINDEX CONCURRENTLY to re-create indexes for RDS instances that run PostgreSQL 12 or a later version. For RDS instances run PostgreSQL 11 or an earlier version, we recommend that you use CONCURRENTLY to create indexes before you delete the original indexes.
  • Do not frequently create or delete temporary tables. This way, you can reduce the consumption of system table resources. Proceed with caution when you use ON COMMIT DROP. In most cases, you can use the WITH clause instead of creating temporary tables.
  • Compared with the previous versions, PostgreSQL 13 is optimized to improve partitioned tables, HashAggregate operations for GROUP BY clauses, and parallel queries. We recommend that you upgrade the major engine version of your RDS instance to PostgreSQL 13. For more information, see Upgrade the major engine version of an ApsaraDB RDS for PostgreSQL instance.
  • If you no longer use the cursor feature, we recommend that you disable this feature.
  • We recommend that you execute the TRUNCATE statement rather than the DELETE statement on tables to improve the performance of your RDS instance.
  • PostgreSQL supports the execution and rollback of DDL transactions. We recommend that you encapsulate DDL statements in transactions. This way, you can roll back DDL statements based on your business requirements. Take note that you must encapsulate DDL statements in transactions of appropriate lengths. If the transactions are long, the read operations on the objects that are being accessed by these transactions may be blocked for a long period of time.
  • If you want to write a large amount of data to your RDS instance, we recommend that you run the copy command or execute the INSERT INTO table VALUES (),(),...(); statement to increase the writing speed.

Minor engine version

  • If you want to use the Replication Slot feature, we recommend that you update the minor engine version of your RDS instance to 20201230 or later. In 20201230 and later minor engine versions, you can enable the Logical Replication Slot Failover feature and configure an alert rule for the Maximum Replication Slot Latency metric to prevent logical subscriptions from being delayed or interrupted. If logical subscriptions are delayed or interrupted, logical replication slots are lost and write-ahead logging (WAL) records may pile up. For more information, see Logical Replication Slot Failover and Manage the alert rules of an ApsaraDB RDS for PostgreSQL instance.
  • If you enable the audit log feature or the Performance Insight feature, we recommend that you update the minor engine version of your RDS instance to 20211031 or later.
    Note We recommend that you set the log_statement parameter to all. This way, you can improve the performance of your RDS instance by approximately four times in scenarios in which more than 50 active connections are established. However, if the minor engine version of your RDS instance is earlier than 20211031, the CPU utilization of your RDS instance abruptly increases after you set the log_statement parameter to all.

Monitoring and alerting

Troubleshooting

Design

Permission design
  • We recommend that you manage permissions at the schema level or the role level and create the following two roles for your RDS instance in compliance with the principle of least privilege (PoLP): one role with the read and write permissions and one role with only the read permissions. For more information, see Manage permissions in an ApsaraDB RDS for PostgeSQL instance.
  • If you enable read/write splitting at the application layer, we recommend that you follow PoLP and use the read-only role for read-only database clients.
Table design
  • The data types that are defined for the fields of the schema in your RDS instance must be the same as the data types that are defined in your application. In addition, the same rules must be used to check fields for all tables. This way, you can prevent errors and make sure that you can use indexes.
  • If you want to delete historical data on a regular basis, we recommend that you partition tables by time. We also recommend that you execute the DROP or TRUNCATE statement on tables to delete data. We recommend that you do not execute the DELETE statement on tables to delete data.
  • If you want to frequently update a table, we recommend that you set the FILLFACTOR parameter of the table to 85 and reserve 15% of the available storage per page when you create the table. The reserved storage is used to update the hot data in the table.
    CREATE TABLE test123(id int, info text) WITH(FILLFACTOR=85);  
  • We recommend that the names of temporary tables start with tmp_. We also recommend that the names of child tables end with the rule based on which the parent table of the child tables is partitioned. For example, if the name of a parent table that is partitioned by year is tbl, the names of the child tables of the parent table can be tbl_2016 and tbl_2017.
Index design
  • A B-tree index can contain fields whose total size is up to 2,000 bytes. If the total size of the fields exceeds 2,000 bytes, a new index is required. We recommend that you create a function index, such as a hash index. If you do not create a function index, we recommend that you use an analyzer to analyze the data before you create an index on the data.
  • Data, such as streaming data, time fields, and auto-increment fields, may be stored in a linear order. In most cases, range queries are run to query these types of data. We recommend that you create BRIN indexes to reduce the size per index and speed up data insertion.
    CREATE INDEX idx ON tbl using BRIN(id);
  • We recommend that you do not run full table scans, except when you want to scan and analyze a large amount of data. ApsaraDB RDS for PostgreSQL supports indexes of most data types.

    The following types of indexes are supported: B-tree, Hash, GIN, GiST, SP-GiST, BRIN, RUM, Bloom, and PASE. Among these types of indexes, RUN, Bloom, and PASE are extended indexes.

  • We recommend that the names of primary key indexes start with pk_, the names of unique indexes start with uk_, and the names of regular indexes start with idx_.
Data type design and character set design
  • We recommend that you select a suitable data type for the data you want to write. If you want to write numeric data or the data that you want to write can be stored in tree structures, we recommend that you do not select the string data type.

    A suitable data type increases query efficiency.

    ApsaraDB RDS for PostgreSQL supports the following data types: Numeric, Floating-Point, Monetary, String, Character, Binary, Date/Time, Boolean, Enumerated, Geometry, Network Address, Bit String, Text Search, UUID, XML, JSON, Array, Composite, Range, Object identifier, row number, large object, ltree structure, Data Cube, geography, H-Store, pg_trgm module, PostGIS, and HyperLogLog. PostGIS includes data types such as point, line segment, surface, path, latitude, longitude, raster, and topology. HyperLogLog is a fixed-size, set-like data structure that is used to count distinct values at a tunable precision.

  • We recommend that you set LC_COLLATE to C rather than UTF8. The UTF8 character set collation is inferior to the C character set collation. In addition, if you use the UTF8 character set collation, you must specify the text_pattern_ops operator class for indexes to support LIKE queries.
Stored procedure design
  • If the business logic is lengthy, we recommend that you reduce the number of interactions between your application and your RDS instance. We recommend that you use stored procedures, such as stored procedures that are based on PL/pgSQL, or built-in functions. PL/pgSQL is a procedural programming language that is supported by PostgreSQL and is used to process complex business logic. PostgreSQL supports the following common functions and complex functions: analytic functions, aggregate functions, window functions, mathematical functions, and geometric functions.

Data query

  • We recommend that you do not replace COUNT(column_name) or COUNT(constants) with COUNT(*). COUNT(*) is a standard function that is defined in SQL-92 to count the number of rows. COUNT(*) counts in NULL values when it calculates the actual number of rows, whereas COUNT(column_name) does not count in NULL values.
  • If COUNT(DISTINCT) is used, the names of the multiple columns that you want to specify must be enclosed in a pair of parentheses (). Example: COUNT( (col1,col2,col3) ). COUNT(DISTINCT) counts in all NULL values. Therefore, COUNT(DISTINCT) produces the same result as COUNT(*).
  • We recommend that you do not use SELECT * FROM t. Replace the wildcard (*) with an array of fields that you require. This way, ApsaraDB RDS returns only the fields that you specify and does not return the fields that you do not require.
  • We recommend that you prevent ApsaraDB RDS from returning large amounts of data to database clients, except for extract, transform, and load (ETL) operations. If the amount of data that is returned for a query is abnormally large, check whether the execution plan of the query is optimal.
  • If you want to perform range queries, we recommend that you use the Range data type and GiST indexes to improve query performance.
  • If your application frequently initiates queries for which a large number of results are returned, we recommend that you aggregate all results of such a query into a result set. For example, if the number of results that are returned for a query reaches 100, we recommend that you aggregate the 100 results of the query into a result set. In addition, if your application frequently accesses the results in the result set by ID, we recommend that you aggregate the results by ID on a regular basis. A small number of results returned indicates shorter response time.

Instance management

  • We recommend that you enable the SQL Explorer and Audit feature for your RDS instance. This feature allows you to query and export the information about the SQL statements that are executed on your RDS instance. The information includes the databases on which the SQL statements are executed, the status of the SQL statements, and the execution durations of the SQL statements. You can use this feature to diagnose the health status of the SQL statements, troubleshoot performance issues, and analyze business traffic. For more information, see Use the SQL Explorer and Audit feature on an ApsaraDB RDS for PostgreSQL instance.
  • If you want to monitor and record the activities within your Alibaba Cloud account, we recommend that you use ActionTrail. The activities that you can monitor and record include access to and use of cloud products and services by using the Alibaba Cloud Management Console, open API, and developer tools. ActionTrail records these actions as events. You can download the events from the ActionTrail console or configure ActionTrail to deliver the events to Log Service Logstores or Object Storage Service (OSS) buckets. Then, you can perform operations, such as action analysis, security analysis, resource change tracking, or compliance audit based on the events. For more information, see What is ActionTrail?.
  • DDL operations must be reviewed before they are performed. Make sure that you execute DDL operations during off-peak hours.
  • Before you commit the transactions that are run to delete or modify data, we recommend that you execute the SELECT statement to confirm the transactions. This way, you can prevent accidental operations. If you want to update only one row based on your business logic, add LIMIT 1.
  • If you want to perform DDL operations or other similar operations that may acquire locks on specific objects, we recommend that you configure a lock wait mechanism to prevent these operations from blocking queries on the locked objects. Such operations include VACUUM FULL and CREATE INDEX.
    begin;  
    SET local lock_timeout = '10s';  
    -- DDL query;  
    end;
  • You can execute the EXPLAIN ANALYZE statement to view the execution plan of a query. The EXPLAIN ANALYZE statement and the EXPLAIN statement work in a similar way. However, the EXPLAIN ANALYZE statement may involve data changes. If the execution plan of a query involves operations such as DML UPDATE, INSERT, or DELETE operations that cause data changes, you must execute the EXPLAIN ANALYZE statement in the transaction, and roll the transaction back after the statement is executed.
    begin;  
    EXPLAIN (ANALYZE) <DML(UPDATE/INSERT/DELETE) SQL>; 
    rollback;
  • If you want to delete data or update a large amount of data, we recommend that you divide the data into batches and delete or update each batch of data in an independent transaction. We recommend that you do not delete or update all data in one transaction. If you delete or update all data in one transaction, a large amount of junk data is generated.