AnalyticDB for PostgreSQL provides three options to distribute table data across compute nodes: hash distribution, random distribution, and replicated distribution.
Syntax
CREATE TABLE table_name (...) [ DISTRIBUTED BY (column [,..] ) | DISTRIBUTED RANDOMLY | DISTRIBUTED REPLICATED ]
Hash distribution
DISTRIBUTED BY (column, [ ... ])
This clause specifies hash distribution. Table rows are distributed across compute nodes based on their hash values in one or more distribution columns that are selected as the distribution key. Each row is assigned to a specific compute node. Rows that have identical values are always assigned to the same compute node. You can choose a unique distribution key, such as the primary key, to ensure even distribution of data. The default table distribution option is hash distribution. If you do not specify a DISTRIBUTED clause when you create a table, the table uses its primary key or the first identified suitable distribution column as the distribution key. If no suitable distribution column is identified, the system uses random distribution.
Execute the following statement to create a table:
CREATE TABLE products
(name varchar(40),
prod_id integer,
supplier_id integer) DISTRIBUTED BY (prod_id);
In the preceding example, a table that uses hash distribution is created. Each row is assigned to a specific compute node based on its hash value.
Random distribution
DISTRIBUTED RANDOMLY
This clause specifies random distribution. Table rows are evenly distributed across compute nodes by using a round-robin algorithm. Rows that have identical values may be assigned to different compute nodes. We recommend that you use random distribution only when no suitable distribution column is identified.
Execute the following statement to create a table:
CREATE TABLE random_stuff (things text,
doodads text,
etc text) DISTRIBUTED RANDOMLY;
In the preceding example, a table that uses random distribution is created. The rows of the table are distributed across compute nodes by using a round-robin algorithm. If no suitable distribution column is identified, we recommend that you use random distribution.
Replicated distribution
DISTRIBUTED REPLICATED
This clause specifies replicated distribution. All data of a table is stored on each compute node. If you want to join large and small tables, you can specify replicated distribution for small tables to improve join performance.
Execute the following statement to create a table:
CREATE TABLE replicated_stuff (things text,
doodads text,
etc text) DISTRIBUTED REPLICATED;
In the preceding example, a table that uses replicated distribution is created. All data of the table is stored on each compute node.