AnalyticDB for PostgreSQL provides three options to distribute table data across compute nodes: hash distribution, random distribution, and replicated distribution.

Syntax

CREATE TABLE table_name (...) [ DISTRIBUTED BY (column  [,..] ) | DISTRIBUTED RANDOMLY | DISTRIBUTED REPLICATED ]
Note AnalyticDB for PostgreSQL V4.3 supports only hash distribution and random distribution. Replicated distribution is a new feature in AnalyticDB for PostgreSQL V6.0.

Hash distribution

DISTRIBUTED BY (column, [ ... ])

This clause specifies hash distribution. Table rows are distributed across compute nodes based on their hash values in one or more distribution columns that are selected as the distribution key. Each row is assigned to a specific compute node. Rows that have identical values are always assigned to the same compute node. You can choose a unique distribution key, such as the primary key, to ensure even distribution of data. The default table distribution option is hash distribution. If you do not specify a DISTRIBUTED clause when you create a table, the table uses its primary key or the first identified suitable distribution column as the distribution key. If no suitable distribution column is identified, the system uses random distribution.

Execute the following statement to create a table:

CREATE TABLE products 
(name varchar(40),                   
 prod_id integer,                   
 supplier_id integer) DISTRIBUTED BY (prod_id);   

In the preceding example, a table that uses hash distribution is created. Each row is assigned to a specific compute node based on its hash value.

Random distribution

DISTRIBUTED RANDOMLY

This clause specifies random distribution. Table rows are evenly distributed across compute nodes by using a round-robin algorithm. Rows that have identical values may be assigned to different compute nodes. We recommend that you use random distribution only when no suitable distribution column is identified.

Execute the following statement to create a table:

CREATE TABLE random_stuff (things text,
                           doodads text,
                           etc text)  DISTRIBUTED RANDOMLY;

In the preceding example, a table that uses random distribution is created. The rows of the table are distributed across compute nodes by using a round-robin algorithm. If no suitable distribution column is identified, we recommend that you use random distribution.

Replicated distribution

DISTRIBUTED REPLICATED

This clause specifies replicated distribution. All data of a table is stored on each compute node. If you want to join large and small tables, you can specify replicated distribution for small tables to improve join performance.

Execute the following statement to create a table:

CREATE TABLE replicated_stuff (things text,
                               doodads text,
                               etc text) DISTRIBUTED REPLICATED;

In the preceding example, a table that uses replicated distribution is created. All data of the table is stored on each compute node.