ARRAY type - AnalyticDB - Alibaba Cloud Documentation Center

AnalyticDB for PostgreSQL allows you to define the columns of a table as arrays. The following types of arrays are supported: basic type, user-defined type, enumeration type, and composite type. This topic describes the usage of ARRAY data in AnalyticDB for PostgreSQL, such as table creation, data writes, query, modification, operators, and functions.

Declaration of the ARRAY type

An ARRAY type is named by appending brackets ([ ]) to the data type name of array elements.

For example, execute the following statement to create a table that contains columns of basic-type arrays.

CREATE TABLE sal_emp (
    id              int,
    name            text,
    pay_by_quarter  integer[],
    schedule        text[][]
) DISTRIBUTED BY(id);

Parameters:

sal_emp: the name of the table.
id: the employee ID, which is of the INT type.
name: the employee name, which is of the TEXT type.
pay_by_quarter: the quarterly salary, which is of the one-dimensional INTEGER ARRAY type.
schedule: the weekly schedule, which is of the two-dimensional TEXT ARRAY type.
id: The table is distributed by id.

Write array values

Execute the INSERT statement to write array values to a table. Sample statements:

INSERT INTO sal_emp
    VALUES (1,
    'Bill',
    '{10000, 10000, 10000, 10000}',
    '{{"meeting", "lunch"}, {"training", "presentation"}}');

INSERT INTO sal_emp
    VALUES (2,
    'Carol',
    '{20000, 25000, 25000, 25000}',
    '{{"breakfast", "consulting"}, {"meeting", "lunch"}}');

Query the data that is written to the table.

SELECT * FROM sal_emp;

Sample result:

 id | name  |      pay_by_quarter       |                 schedule
----+-------+---------------------------+-------------------------------------------
  2 | Carol | {20000,25000,25000,25000} | {{breakfast,consulting},{meeting,lunch}}
  1 | Bill  | {10000,10000,10000,10000} | {{meeting,lunch},{training,presentation}}
(2 rows)

Access arrays

AnalyticDB for PostgreSQL uses a 1-based numbering convention for arrays. An array with n elements starts with array[1] and ends with array[n]. You can also access arbitrary rectangular slices of an array, or subarrays. An array slice is denoted by writing lower-bound:upper-bound for one or more array dimensions.

The following examples describe how to access an array element in the sal_emp table:

Example 1: Query the names of the employees whose salary changed in the second quarter.
```
SELECT name FROM sal_emp WHERE pay_by_quarter[1] <> pay_by_quarter[2];
```
Sample result:
```
 name
-------
 Carol
(1 row)
```

Example 2: Query the third-quarter salary of all employees.

SELECT pay_by_quarter[3] FROM sal_emp;

Sample result:

 pay_by_quarter
----------------
          10000
          25000
(2 rows)

Example 3: Query the first item on Bill's schedule for the first two days of the week.

SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';

Sample result:

        schedule
------------------------
 {{meeting},{training}}
(1 row)

Array functions

The following table describes the array functions that are supported in AnalyticDB for PostgreSQL.

Function	Data type of return values	Description	Example	Result
array_append(anyarray, anyelement)	anyarray	Appends elements to the end of an array.	array_append(ARRAY[1,2], 3)	{1,2,3}
array_cat(anyarray, anyarray)	anyarray	Concatenates two arrays.	array_cat(ARRAY[1,2,3], ARRAY[4,5])	{1,2,3,4,5}
array_ndims(anyarray)	int	Returns the number of dimensions of an array.	array_ndims(ARRAY[[1,2,3], [4,5,6]])	2
array_dims(anyarray)	text	Returns the text representation of array dimensions.	array_dims(ARRAY[[1,2,3], [4,5,6]])	[1:2][1:3]
array_fill(anyelement, int[], [, int[]])	anyarray	Returns an array that is initialized with designated values and dimensions. Optional lower bounds cannot be 1.	array_fill(7, ARRAY[3], ARRAY[2])	[2:4]={7,7,7}
array_length(anyarray, int)	int	Returns the length of a specified array dimension.	array_length(array[1,2,3], 1)	3
array_lower(anyarray, int)	int	Returns the lower bound of a specified array dimension.	array_lower('[0:2]={1,2,3}'::int[], 1)	0
array_prepend(anyelement, anyarray)	anyarray	Appends elements to the beginning of an array.	array_prepend(1, ARRAY[2,3])	{1,2,3}
array_remove(anyarray, anyelement)	anyarray	Removes all elements that are equal to a specified value from an array. This function is supported only for one-dimensional arrays.	array_remove(ARRAY[1,2,3,2], 2)	{1,3}
array_replace(anyarray, anyelement, anyelement)	anyarray	Replaces the array elements that are equal to a specified value with a new value.	array_replace(ARRAY[1,2,5,4], 5, 3)	{1,2,3,4}
array_to_string(anyarray, text [, text])	text	Concatenates array elements by using a specified delimiter and null string.	array_to_string(ARRAY[1, 2, 3, NULL, 5], ',', '*')	1,2,3,*,5
array_upper(anyarray, int)	int	Returns the upper bound of a specified array dimension.	array_upper(ARRAY[1,8,3,7], 1)	4
cardinality(anyarray)	int	Returns the total number of elements in an array. If a value of 0 is returned, the array is empty.	cardinality(ARRAY[[1,2],[3,4]])	4
string_to_array(text, text [, text])	text[]	Splits a string to an array by using a specified delimiter and null string.	string_to_array('xx~^~yy~^~zz', '~^~', 'yy')	{xx,NULL,zz}
unnest(anyarray)	setof anyelement	Expands an array to a set of rows.	unnest(ARRAY[1,2])	`1 2`
unnest(anyarray, anyarray [, ...])	setof anyelement, anyelement [, ...]	Expands multiple arrays of the same or different types to a set of rows. This function can be used only in the FROM clause.	unnest(ARRAY[1,2],ARRAY['foo','bar','baz'])	`1 foo 2 bar NULL baz`

Note

In the string_to_array function, if the delimiter parameter is NULL, each character in the input string becomes a separate element in the resulting array. If the delimiter is an empty string, the entire input string is returned as a one-element array. Otherwise, the input string is split at each occurrence of the delimiter string.
In the string_to_array function, if the null-string parameter is omitted or NULL, none of the substrings of the input string is replaced by NULL. In the array_to_string function, if the null-string parameter is omitted or NULL, any null elements in the array are skipped and not represented in the output string.

Examples

Example 1: Use the array_dims function to query the number of dimensions of an array.

SELECT array_dims(schedule) FROM sal_emp WHERE name = 'Carol';

Sample result:

array_dims
------------
 [1:2][1:2]
(1 row)

Example 2: Use the array_upper and array_lower functions to query the upper and lower bounds of a specified array dimension.

SELECT array_upper(schedule, 1), array_lower(schedule, 1) FROM sal_emp WHERE name = 'Carol';

Sample result:

 array_upper | array_lower
-------------+-------------
           2 |           1
(1 row)

Example 3: Use the array_length function to query the length of a specified array dimension.

SELECT array_length(schedule, 1) FROM sal_emp WHERE name = 'Carol';

Sample result:

 array_length
--------------
            2
(1 row)

Example 4: Use the cardinality function to query the total number of elements of all dimensions in an array.
```
SELECT cardinality(schedule) FROM sal_emp WHERE name = 'Carol';
```
Sample result:
```
 cardinality
-------------
           4
(1 row)
```

Modify arrays

AnalyticDB for PostgreSQL allows you to modify arrays by using the following methods:

Replace the value of an entire array.

UPDATE sal_emp SET pay_by_quarter = '{25000,25000,27000,27000}'
    WHERE name = 'Carol';

Update a single element in an array.

UPDATE sal_emp SET pay_by_quarter[4] = 15000
    WHERE name = 'Bill';

Update a slice in an array.

UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}'
    WHERE name = 'Carol';

Use the concatenation operator (||) to construct array values. The concatenation operator allows a single element to be pushed onto the beginning or end of a one-dimensional array. Sample statement:
```
SELECT ARRAY[1,2] || ARRAY[3,4];
```
Sample result:
```
 ?column?
-----------
 {1,2,3,4}
(1 row)
```
The concatenation operator (||) also accepts two N-dimensional arrays, or an N-dimensional and an N+1-dimensional array. Sample statement:
```
SELECT ARRAY[5,6] || ARRAY[[1,2],[3,4]];
```
Sample result:
```
      ?column?
---------------------
 {{5,6},{1,2},{3,4}}
(1 row)
```

Use the array_prepend, array_append, or array_cat function to construct array values. The array_prepend and array_append functions support only one-dimensional arrays. The array_cat function supports multidimensional arrays.

The concatenation operator (||) is preferred instead of the preceding functions. The purpose of the functions is to implement the concatenation operator. Using the functions in user-defined creation functions may be necessary. Sample statements:

SELECT array_prepend(1, ARRAY[2,3]);
 array_prepend
---------------
 {1,2,3}
(1 row)

SELECT array_append(ARRAY[1,2], 3);
 array_append
--------------
 {1,2,3}
(1 row)

SELECT array_cat(ARRAY[1,2], ARRAY[3,4]);
 array_cat
-----------
 {1,2,3,4}
(1 row)

SELECT array_cat(ARRAY[[1,2],[3,4]], ARRAY[5,6]);
      array_cat
---------------------
 {{1,2},{3,4},{5,6}}
(1 row)

SELECT array_cat(ARRAY[5,6], ARRAY[[1,2],[3,4]]);
      array_cat
---------------------
 {{5,6},{1,2},{3,4}}

Search in arrays

To search for a value in an array, each value must be checked. You can use array operators to search for records that match specific conditions.

Example 1: Query the rows of the pay_by_quarter column in which the array contains a value of 10000.
```
SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);
```
Example 2: Query the rows of the pay_by_quarter column in which all values of the array are equal to 10000.
```
SELECT * FROM sal_emp WHERE 10000 = ALL (pay_by_quarter);
```
Example 3: Use the && operator to check whether the left operand overlaps with the right operand.
```
SELECT * FROM sal_emp WHERE pay_by_quarter && ARRAY[10000];
```

The following table describes the array operators that are supported in AnalyticDB for PostgreSQL.

Operator	Description	Example	Result
=	Equal to	ARRAY[1.1,2.1,3.1]::int[] = ARRAY[1,2,3]	t
<>	Not equal to	ARRAY[1,2,3] <> ARRAY[1,2,4]	t
<	Less than	ARRAY[1,2,3] < ARRAY[1,2,4]	t
>	Greater than	ARRAY[1,4,3] > ARRAY[1,2,4]	t
<=	Less than or equal to	ARRAY[1,2,3] <= ARRAY[1,2,3]	t
>=	Greater than or equal to	ARRAY[1,4,3] >= ARRAY[1,4,3]	t
@>	Contains	ARRAY[1,4,3] @> ARRAY[3,1]	t
<@	Is contained by	ARRAY[2,7] <@ ARRAY[1,7,4,2,6]	t
&&	Overlaps (with elements in common)	ARRAY[1,4,3] && ARRAY[2,1]	t
\|\|	Concatenates array to array (one-dimensional)	ARRAY[1,2,3] \|\| ARRAY[4,5,6]	{1,2,3,4,5,6}
\|\|	Concatenates array to array (multidimensional)	ARRAY[1,2,3] \|\| ARRAY[[4,5,6],[7,8,9]]	{{1,2,3},{4,5,6},{7,8,9}}
\|\|	Concatenates element to array	3 \|\| ARRAY[4,5,6]	{3,4,5,6}
\|\|	Concatenates array to element	ARRAY[4,5,6] \|\| 7	{4,5,6,7}

Use indexes to accelerate array search

If an array contains a large number of elements, you can use the @> operator to search for a specific element. More elements in the array cause lower search performance. In this case, you can create indexes to accelerate array search.

Important

GIN indexes are supported only for AnalyticDB for PostgreSQL instances in elastic storage mode.

The following example describes how to use indexes to accelerate array search:

-- Create a table that contains a column of the INT ARRAY type. 
REATE TABLE multivalue (
    id      int,
    values  int[]
) DISTRIBUTED BY(id);

-- Write 1,000 rows of data to the table. The values column must contain 1,000,000 array elements in each row. 
INSERT INTO multivalue SELECT g, ARRAY(SELECT generate_series(1, 1000000)) FROM generate_series(1, 1000) as g;

The table does not contain indexes. Query the rows of the values column in which the array contains a value of 800000. In this case, sequential scanning is used and delivers lower performance.

EXPLAIN ANALYZE SELECT id FROM multivalue WHERE values @> ARRAY[800000];
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 2:1  (slice1; segments: 2)  (cost=0.00..1.00 rows=1 width=4) (actual time=711.216..3478.225 rows=1000 loops=1)
   ->  Seq Scan on multivalue  (cost=0.00..1.00 rows=1 width=4) (actual time=10.420..2629.403 rows=508 loops=1)
         Filter: ("values" @> '{800000}'::integer[])
 Planning time: 0.080 ms
   (slice0)    Executor memory: 39K bytes.
   (slice1)    Executor memory: 12733K bytes avg x 2 workers, 12733K bytes max (seg0).
 Memory used:  1048576kB
 Optimizer: Postgres query optimizer
 Execution time: 3483.157 ms
(9 rows)

Time: 3483.667 ms

Create a GIN index on the values column and perform the same query. In this case, index scanning is used and the query performance significantly increases. The amount of time that is consumed deceases from 3483.667 ms to 9.731 ms.

CREATE INDEX idx_values on multivalue USING GIN (values);

EXPLAIN ANALYZE SELECT id FROM multivalue WHERE values @> ARRAY[800000];
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Gather Motion 2:1  (slice1; segments: 2)  (cost=12.04..14.10 rows=5 width=4) (actual time=3.727..4.127 rows=1000 loops=1)
   ->  Bitmap Heap Scan on multivalue  (cost=12.04..14.10 rows=3 width=4) (actual time=1.826..1.872 rows=508 loops=1)
         Recheck Cond: ("values" @> '{800000}'::integer[])
         ->  Bitmap Index Scan on idx_values  (cost=0.00..12.04 rows=3 width=0) (actual time=1.462..1.462 rows=508 loops=1)
               Index Cond: ("values" @> '{800000}'::integer[])
 Planning time: 0.155 ms
   (slice0)    Executor memory: 49K bytes.
   (slice1)    Executor memory: 331K bytes avg x 2 workers, 331K bytes max (seg0).  Work_mem: 9K bytes max.
 Memory used:  1048576kB
 Optimizer: Postgres query optimizer
 Execution time: 9.138 ms
(11 rows)

Time: 9.731 ms

GIN indexes can improve array search performance. However, GIN indexes degrade write performance because indexes must be updated when data is written. In addition, indexes occupy additional storage space. You can select whether to use indexes based on the query performance, write performance, and storage space.

References

For more information about how to use the ARRAY type, see PostgreSQL documentation.