This topic describes how to select AnalyticDB for PostgreSQL instance types.
Instance type definition
AnalyticDB for PostgreSQL instance types are determined by the following parameters:
- Storage Type
This parameter determines whether the storage medium is SSD or HDD. The differences between the two storage types are as follows:
- High-performance SSD storage: provides better I/O capabilities and higher analysis performance.
- High-capacity HDD storage: provides a larger and more affordable space to meet higher storage requirements.
- Node Cores
AnalyticDB for PostgreSQL uses the MPP architecture, where each instance consists of multiple nodes and each node is a data partition. The available node types are defined as follows:
| Node (partition) storage type | Cores per node | Memory | Valid storage space | Total dual-copy space | Description |
| --- | --- | --- | --- | --- | --- |
| High-performance SSD | 1 | 8 GB | 80 GB | 160 GB | Only suitable for scenarios with fewer than five concurrent queries and fewer than 32 nodes. This SSD configuration allows you to create an instance that consists of 4 to 64 nodes. |
| High-performance SSD | 4 | 32 GB | 320 GB | 640 GB | The recommended high-performance SSD type. This SSD configuration allows you to create an instance that consists of 8 to 4,096 nodes. |
| High-capacity HDD | 2 | 16 GB | 1 TB | 2 TB | Only suitable for scenarios with fewer than five concurrent queries and fewer than eight nodes. This HDD configuration allows you to create an instance that consists of 4 to 32 nodes. |
| High-capacity HDD | 4 | 32 GB | 2 TB | 4 TB | The recommended high-capacity HDD type. This HDD configuration allows you to create an instance that consists of 8 to 4,096 nodes. |
- Node Num
An instance consists of multiple nodes. Each node is a data partition in the MPP architecture. A single instance can have up to 4,096 nodes. Each node stores and processes part of the data. In the MPP architecture, the storage space increases linearly with the number of nodes, but the query response time does not change.
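Because valid storage scales linearly with the node count, the total capacity of a configuration can be sketched as follows. The per-node values come from the node type table above; the dictionary keys and function name are illustrative labels, not official instance type codes or product API:

```python
# Per-node valid storage (GB) from the node type table above.
# Keys are illustrative labels, not official instance type codes.
VALID_STORAGE_GB = {
    "ssd-1core": 80,    # High-performance SSD, 1 core, 8 GB memory
    "ssd-4core": 320,   # High-performance SSD, 4 cores, 32 GB memory
    "hdd-2core": 1024,  # High-capacity HDD, 2 cores, 16 GB memory (1 TB)
    "hdd-4core": 2048,  # High-capacity HDD, 4 cores, 32 GB memory (2 TB)
}

def total_valid_storage_gb(node_type: str, node_count: int) -> int:
    """In the MPP architecture, storage grows linearly with the node count."""
    return VALID_STORAGE_GB[node_type] * node_count

# For example, 32 four-core SSD nodes provide 32 x 320 GB = 10,240 GB (10 TB).
```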
Principles of selecting instance specifications
When you create an AnalyticDB for PostgreSQL instance or change its specifications, you must set Storage Type, Node Cores, and Node Num. AnalyticDB for PostgreSQL also supports OSS-based external tables and gzip compression for data in external storage. You can store data that is not required for real-time computing in external storage to further reduce storage costs.
- Storage type
- We recommend that you use SSD to create AnalyticDB for PostgreSQL instances in scenarios where performance takes precedence.
- We recommend that you use HDD in scenarios where data storage takes precedence.
- Number of cores per node
Each node stores and processes one data partition of each user table, that is, one data partition in the MPP architecture. The recommended number of cores per node is 4. The SSD configuration with one core per node is only suitable for low-concurrency workloads on an instance with fewer than 32 nodes. The HDD configuration with two cores per node is only suitable for low-concurrency workloads on an instance with fewer than eight nodes.
- Number of nodes
AnalyticDB for PostgreSQL uses the MPP architecture. The data processing capability increases linearly as the number of nodes increases, ensuring that the response time does not change as the data volume increases. You can select the appropriate number of nodes based on the application scenario and the volume of raw data.
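As a rough sizing aid, the node count can be estimated from the raw data volume and the per-node valid storage. This is a minimal sketch, not an official sizing tool; the `headroom` factor mirrors the reserve guidance given later in this topic, and all names are illustrative:

```python
import math

def estimate_node_count(raw_data_gb: float,
                        per_node_storage_gb: float,
                        headroom: float = 2.0,
                        min_nodes: int = 4) -> int:
    """Estimate the number of nodes so that the reserved space
    (raw data volume x headroom) fits in the combined valid storage.

    headroom=2.0 reflects the 2x reserve suggested for row store;
    min_nodes=4 is the smallest instance size in the node type table.
    """
    needed = math.ceil(raw_data_gb * headroom / per_node_storage_gb)
    return max(needed, min_nodes)
```

For example, 5 TB (5,120 GB) of raw data on four-core SSD nodes (320 GB each) with a 2x reserve yields 32 nodes.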
Row store and column store
When you create a table, AnalyticDB for PostgreSQL allows you to specify the data storage format to store data by row or by column.
- We recommend that you use row store in business scenarios that involve frequent data update or real-time write operations (INSERT, UPDATE, and DELETE).
If 1 TB of raw data is stored in a database by row, approximately 1 TB of storage space is required. Taking indexes, logs, and temporary files generated during computing into consideration, we recommend that you reserve 2 TB of user storage for each cluster instance. To improve query performance, you can increase the number of nodes to increase CPU and memory resources.
- In batch ETL scenarios, data is imported into the database in batches and rarely updated (UPDATE or DELETE), and full table data is aggregated and joined on a small number of columns. In this case, we recommend that you use column store.
Column store provides a high data compression rate, with compression ratios typically between 2x and 5x. If 1 TB of raw data is imported and compressed in column store, the compressed data occupies at most about 0.5 TB. In this case, you can reserve 1 TB of user storage for each cluster instance.
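The reserve guidance above can be summarized as simple arithmetic. This sketch assumes the 2x reserve for row store and the conservative end (2x) of the column store compression range; the function name is illustrative:

```python
def reserved_storage_gb(raw_data_gb: float, storage_format: str) -> float:
    """Suggested user storage reserve for a given raw data volume.

    Row store: reserve ~2x the raw data for indexes, logs, and
    temporary files generated during computing.
    Column store: with at least 2x compression, ~1x the raw data
    volume is enough.
    """
    if storage_format == "row":
        return raw_data_gb * 2.0
    if storage_format == "column":
        return raw_data_gb * 1.0  # conservative end of the 2x-5x range
    raise ValueError(f"unknown storage format: {storage_format}")
```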
Examples of selecting instance specifications
In high-performance analysis scenarios where the size of raw data is 5 TB and the number of concurrent queries is over 100, we recommend that you use an SSD configuration that supports four cores per node and 32 nodes per instance. In this case, a total of 10 TB of user data storage space will be available.
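The arithmetic behind this example can be checked directly; the numbers below come from the example and the node type table, and nothing here is product API:

```python
# Example: 5 TB raw data on four-core SSD nodes (320 GB valid storage
# each, per the node type table), 32 nodes per instance.
raw_data_gb = 5 * 1024   # 5 TB of raw data
per_node_gb = 320        # valid storage per four-core SSD node
node_count = 32

total_gb = per_node_gb * node_count   # 10,240 GB = 10 TB in total
assert total_gb == 10 * 1024
# The 2x reserve for indexes, logs, and temporary files still fits.
assert total_gb >= raw_data_gb * 2
```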