AnalyticDB for PostgreSQL allows you to define the columns of a table as arrays. The following types of arrays are supported: basic type, user-defined type, enumeration type, and composite type. This topic describes the usage of ARRAY data in AnalyticDB for PostgreSQL, such as table creation, data writes, query, modification, operators, and functions.
Declaration of the ARRAY type
An ARRAY type is named by appending brackets ([ ]) to the data type name of array elements.
For example, execute the following statement to create a table that contains columns of basic-type arrays.
CREATE TABLE sal_emp (
id int,
name text,
pay_by_quarter integer[],
schedule text[][]
) DISTRIBUTED BY(id);Parameters:
sal_emp: the name of the table.id: the employee ID, which is of the INT type.name: the employee name, which is of the TEXT type.pay_by_quarter: the quarterly salary, which is of the one-dimensional INTEGER ARRAY type.schedule: the weekly schedule, which is of the two-dimensional TEXT ARRAY type.id: The table is distributed by id.
Write array values
Execute the INSERT statement to write array values to a table. Sample statements:
INSERT INTO sal_emp
VALUES (1,
'Bill',
'{10000, 10000, 10000, 10000}',
'{{"meeting", "lunch"}, {"training", "presentation"}}');
INSERT INTO sal_emp
VALUES (2,
'Carol',
'{20000, 25000, 25000, 25000}',
'{{"breakfast", "consulting"}, {"meeting", "lunch"}}');Query the data that is written to the table.
SELECT * FROM sal_emp;Sample result:
id | name | pay_by_quarter | schedule
----+-------+---------------------------+-------------------------------------------
2 | Carol | {20000,25000,25000,25000} | {{breakfast,consulting},{meeting,lunch}}
1 | Bill | {10000,10000,10000,10000} | {{meeting,lunch},{training,presentation}}
(2 rows)Access arrays
AnalyticDB for PostgreSQL uses a 1-based numbering convention for arrays. An array with n elements starts with array[1] and ends with array[n]. You can also access arbitrary rectangular slices of an array, or subarrays. An array slice is denoted by writing lower-bound:upper-bound for one or more array dimensions.
The following examples describe how to access an array element in the sal_emp table:
Example 1: Query the names of the employees whose salary changed in the second quarter.
SELECT name FROM sal_emp WHERE pay_by_quarter[1] <> pay_by_quarter[2];Sample result:
name ------- Carol (1 row)Example 2: Query the third-quarter salary of all employees.
SELECT pay_by_quarter[3] FROM sal_emp;Sample result:
pay_by_quarter ---------------- 10000 25000 (2 rows)Example 3: Query the first item on Bill's schedule for the first two days of the week.
SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';Sample result:
schedule ------------------------ {{meeting},{training}} (1 row)
Array functions
The following table describes the array functions that are supported in AnalyticDB for PostgreSQL.
Function | Data type of return values | Description | Example | Result |
array_append(anyarray, anyelement) | anyarray | Appends elements to the end of an array. | array_append(ARRAY[1,2], 3) | {1,2,3} |
array_cat(anyarray, anyarray) | anyarray | Concatenates two arrays. | array_cat(ARRAY[1,2,3], ARRAY[4,5]) | {1,2,3,4,5} |
array_ndims(anyarray) | int | Returns the number of dimensions of an array. | array_ndims(ARRAY[[1,2,3], [4,5,6]]) | 2 |
array_dims(anyarray) | text | Returns the text representation of array dimensions. | array_dims(ARRAY[[1,2,3], [4,5,6]]) | [1:2][1:3] |
array_fill(anyelement, int[], [, int[]]) | anyarray | Returns an array that is initialized with designated values and dimensions. Optional lower bounds cannot be 1. | array_fill(7, ARRAY[3], ARRAY[2]) | [2:4]={7,7,7} |
array_length(anyarray, int) | int | Returns the length of a specified array dimension. | array_length(array[1,2,3], 1) | 3 |
array_lower(anyarray, int) | int | Returns the lower bound of a specified array dimension. | array_lower('[0:2]={1,2,3}'::int[], 1) | 0 |
array_prepend(anyelement, anyarray) | anyarray | Appends elements to the beginning of an array. | array_prepend(1, ARRAY[2,3]) | {1,2,3} |
array_remove(anyarray, anyelement) | anyarray | Removes all elements that are equal to a specified value from an array. This function is supported only for one-dimensional arrays. | array_remove(ARRAY[1,2,3,2], 2) | {1,3} |
array_replace(anyarray, anyelement, anyelement) | anyarray | Replaces the array elements that are equal to a specified value with a new value. | array_replace(ARRAY[1,2,5,4], 5, 3) | {1,2,3,4} |
array_to_string(anyarray, text [, text]) | text | Concatenates array elements by using a specified delimiter and null string. | array_to_string(ARRAY[1, 2, 3, NULL, 5], ',', '*') | 1,2,3,*,5 |
array_upper(anyarray, int) | int | Returns the upper bound of a specified array dimension. | array_upper(ARRAY[1,8,3,7], 1) | 4 |
cardinality(anyarray) | int | Returns the total number of elements in an array. If a value of 0 is returned, the array is empty. | cardinality(ARRAY[[1,2],[3,4]]) | 4 |
string_to_array(text, text [, text]) | text[] | Splits a string to an array by using a specified delimiter and null string. | string_to_array('xx~^~yy~^~zz', '~^~', 'yy') | {xx,NULL,zz} |
unnest(anyarray) | setof anyelement | Expands an array to a set of rows. | unnest(ARRAY[1,2]) | |
unnest(anyarray, anyarray [, ...]) | setof anyelement, anyelement [, ...] | Expands multiple arrays of the same or different types to a set of rows. This function can be used only in the FROM clause. | unnest(ARRAY[1,2],ARRAY['foo','bar','baz']) | |
In the
string_to_arrayfunction, if the delimiter parameter is NULL, each character in the input string becomes a separate element in the resulting array. If the delimiter is an empty string, the entire input string is returned as a one-element array. Otherwise, the input string is split at each occurrence of the delimiter string.In the
string_to_arrayfunction, if the null-string parameter is omitted or NULL, none of the substrings of the input string is replaced by NULL. In thearray_to_stringfunction, if the null-string parameter is omitted or NULL, any null elements in the array are skipped and not represented in the output string.
Examples
Example 1: Use the
array_dimsfunction to query the number of dimensions of an array.SELECT array_dims(schedule) FROM sal_emp WHERE name = 'Carol';Sample result:
array_dims ------------ [1:2][1:2] (1 row)Example 2: Use the
array_upperandarray_lowerfunctions to query the upper and lower bounds of a specified array dimension.SELECT array_upper(schedule, 1), array_lower(schedule, 1) FROM sal_emp WHERE name = 'Carol';Sample result:
array_upper | array_lower -------------+------------- 2 | 1 (1 row)Example 3: Use the
array_lengthfunction to query the length of a specified array dimension.SELECT array_length(schedule, 1) FROM sal_emp WHERE name = 'Carol';Sample result:
array_length -------------- 2 (1 row)Example 4: Use the
cardinalityfunction to query the total number of elements of all dimensions in an array.SELECT cardinality(schedule) FROM sal_emp WHERE name = 'Carol';Sample result:
cardinality ------------- 4 (1 row)
Modify arrays
AnalyticDB for PostgreSQL allows you to modify arrays by using the following methods:
Replace the value of an entire array.
UPDATE sal_emp SET pay_by_quarter = '{25000,25000,27000,27000}' WHERE name = 'Carol';Update a single element in an array.
UPDATE sal_emp SET pay_by_quarter[4] = 15000 WHERE name = 'Bill';Update a slice in an array.
UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}' WHERE name = 'Carol';Use the concatenation operator (
||) to construct array values. The concatenation operator allows a single element to be pushed onto the beginning or end of a one-dimensional array. Sample statement:SELECT ARRAY[1,2] || ARRAY[3,4];Sample result:
?column? ----------- {1,2,3,4} (1 row)The concatenation operator (
||) also accepts two N-dimensional arrays, or an N-dimensional and an N+1-dimensional array. Sample statement:SELECT ARRAY[5,6] || ARRAY[[1,2],[3,4]];Sample result:
?column? --------------------- {{5,6},{1,2},{3,4}} (1 row)Use the
array_prepend,array_append, orarray_catfunction to construct array values. Thearray_prependandarray_appendfunctions support only one-dimensional arrays. Thearray_catfunction supports multidimensional arrays.The concatenation operator (||) is preferred instead of the preceding functions. The purpose of the functions is to implement the concatenation operator. Using the functions in user-defined creation functions may be necessary. Sample statements:
SELECT array_prepend(1, ARRAY[2,3]); array_prepend --------------- {1,2,3} (1 row) SELECT array_append(ARRAY[1,2], 3); array_append -------------- {1,2,3} (1 row) SELECT array_cat(ARRAY[1,2], ARRAY[3,4]); array_cat ----------- {1,2,3,4} (1 row) SELECT array_cat(ARRAY[[1,2],[3,4]], ARRAY[5,6]); array_cat --------------------- {{1,2},{3,4},{5,6}} (1 row) SELECT array_cat(ARRAY[5,6], ARRAY[[1,2],[3,4]]); array_cat --------------------- {{5,6},{1,2},{3,4}}
Search in arrays
To search for a value in an array, each value must be checked. You can use array operators to search for records that match specific conditions.
Example 1: Query the rows of the
pay_by_quartercolumn in which the array contains a value of 10000.SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);Example 2: Query the rows of the
pay_by_quartercolumn in which all values of the array are equal to 10000.SELECT * FROM sal_emp WHERE 10000 = ALL (pay_by_quarter);Example 3: Use the && operator to check whether the left operand overlaps with the right operand.
SELECT * FROM sal_emp WHERE pay_by_quarter && ARRAY[10000];
The following table describes the array operators that are supported in AnalyticDB for PostgreSQL.
Operator | Description | Example | Result |
= | Equal to | ARRAY[1.1,2.1,3.1]::int[] = ARRAY[1,2,3] | t |
<> | Not equal to | ARRAY[1,2,3] <> ARRAY[1,2,4] | t |
< | Less than | ARRAY[1,2,3] < ARRAY[1,2,4] | t |
> | Greater than | ARRAY[1,4,3] > ARRAY[1,2,4] | t |
<= | Less than or equal to | ARRAY[1,2,3] <= ARRAY[1,2,3] | t |
>= | Greater than or equal to | ARRAY[1,4,3] >= ARRAY[1,4,3] | t |
@> | Contains | ARRAY[1,4,3] @> ARRAY[3,1] | t |
<@ | Is contained by | ARRAY[2,7] <@ ARRAY[1,7,4,2,6] | t |
&& | Overlaps (with elements in common) | ARRAY[1,4,3] && ARRAY[2,1] | t |
|| | Concatenates array to array (one-dimensional) | ARRAY[1,2,3] || ARRAY[4,5,6] | {1,2,3,4,5,6} |
|| | Concatenates array to array (multidimensional) | ARRAY[1,2,3] || ARRAY[[4,5,6],[7,8,9]] | {{1,2,3},{4,5,6},{7,8,9}} |
|| | Concatenates element to array | 3 || ARRAY[4,5,6] | {3,4,5,6} |
|| | Concatenates array to element | ARRAY[4,5,6] || 7 | {4,5,6,7} |
Use indexes to accelerate array search
If an array contains a large number of elements, you can use the @> operator to search for a specific element. More elements in the array cause lower search performance. In this case, you can create indexes to accelerate array search.
GIN indexes are supported only for AnalyticDB for PostgreSQL instances in elastic storage mode.
The following example describes how to use indexes to accelerate array search:
-- Create a table that contains a column of the INT ARRAY type.
REATE TABLE multivalue (
id int,
values int[]
) DISTRIBUTED BY(id);
-- Write 1,000 rows of data to the table. The values column must contain 1,000,000 array elements in each row.
INSERT INTO multivalue SELECT g, ARRAY(SELECT generate_series(1, 1000000)) FROM generate_series(1, 1000) as g;The table does not contain indexes. Query the rows of the values column in which the array contains a value of 800000. In this case, sequential scanning is used and delivers lower performance.
EXPLAIN ANALYZE SELECT id FROM multivalue WHERE values @> ARRAY[800000];
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..1.00 rows=1 width=4) (actual time=711.216..3478.225 rows=1000 loops=1)
-> Seq Scan on multivalue (cost=0.00..1.00 rows=1 width=4) (actual time=10.420..2629.403 rows=508 loops=1)
Filter: ("values" @> '{800000}'::integer[])
Planning time: 0.080 ms
(slice0) Executor memory: 39K bytes.
(slice1) Executor memory: 12733K bytes avg x 2 workers, 12733K bytes max (seg0).
Memory used: 1048576kB
Optimizer: Postgres query optimizer
Execution time: 3483.157 ms
(9 rows)
Time: 3483.667 msCreate a GIN index on the values column and perform the same query. In this case, index scanning is used and the query performance significantly increases. The amount of time that is consumed deceases from 3483.667 ms to 9.731 ms.
CREATE INDEX idx_values on multivalue USING GIN (values);
EXPLAIN ANALYZE SELECT id FROM multivalue WHERE values @> ARRAY[800000];
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Gather Motion 2:1 (slice1; segments: 2) (cost=12.04..14.10 rows=5 width=4) (actual time=3.727..4.127 rows=1000 loops=1)
-> Bitmap Heap Scan on multivalue (cost=12.04..14.10 rows=3 width=4) (actual time=1.826..1.872 rows=508 loops=1)
Recheck Cond: ("values" @> '{800000}'::integer[])
-> Bitmap Index Scan on idx_values (cost=0.00..12.04 rows=3 width=0) (actual time=1.462..1.462 rows=508 loops=1)
Index Cond: ("values" @> '{800000}'::integer[])
Planning time: 0.155 ms
(slice0) Executor memory: 49K bytes.
(slice1) Executor memory: 331K bytes avg x 2 workers, 331K bytes max (seg0). Work_mem: 9K bytes max.
Memory used: 1048576kB
Optimizer: Postgres query optimizer
Execution time: 9.138 ms
(11 rows)
Time: 9.731 msGIN indexes can improve array search performance. However, GIN indexes degrade write performance because indexes must be updated when data is written. In addition, indexes occupy additional storage space. You can select whether to use indexes based on the query performance, write performance, and storage space.
References
For more information about how to use the ARRAY type, see PostgreSQL documentation.