All Products
Search
Document Center

AnalyticDB:ARRAY type

Last Updated:Aug 30, 2023

AnalyticDB for PostgreSQL allows you to define the columns of a table as arrays. The following types of arrays are supported: basic type, user-defined type, enumeration type, and composite type. This topic describes the usage of ARRAY data in AnalyticDB for PostgreSQL, such as table creation, data writes, query, modification, operators, and functions.

Declaration of the ARRAY type

An ARRAY type is named by appending brackets ([ ]) to the data type name of array elements.

For example, execute the following statement to create a table that contains columns of basic-type arrays.

CREATE TABLE sal_emp (
    id              int,
    name            text,
    pay_by_quarter  integer[],
    schedule        text[][]
) DISTRIBUTED BY(id);

Parameters:

  • sal_emp: the name of the table.

  • id: the employee ID, which is of the INT type.

  • name: the employee name, which is of the TEXT type.

  • pay_by_quarter: the quarterly salary, which is of the one-dimensional INTEGER ARRAY type.

  • schedule: the weekly schedule, which is of the two-dimensional TEXT ARRAY type.

  • id: The table is distributed by id.

Write array values

Execute the INSERT statement to write array values to a table. Sample statements:

INSERT INTO sal_emp
    VALUES (1,
    'Bill',
    '{10000, 10000, 10000, 10000}',
    '{{"meeting", "lunch"}, {"training", "presentation"}}');

INSERT INTO sal_emp
    VALUES (2,
    'Carol',
    '{20000, 25000, 25000, 25000}',
    '{{"breakfast", "consulting"}, {"meeting", "lunch"}}');

Query the data that is written to the table.

SELECT * FROM sal_emp;

Sample result:

 id | name  |      pay_by_quarter       |                 schedule
----+-------+---------------------------+-------------------------------------------
  2 | Carol | {20000,25000,25000,25000} | {{breakfast,consulting},{meeting,lunch}}
  1 | Bill  | {10000,10000,10000,10000} | {{meeting,lunch},{training,presentation}}
(2 rows)

Access arrays

AnalyticDB for PostgreSQL uses a 1-based numbering convention for arrays. An array with n elements starts with array[1] and ends with array[n]. You can also access arbitrary rectangular slices of an array, or subarrays. An array slice is denoted by writing lower-bound:upper-bound for one or more array dimensions.

The following examples describe how to access an array element in the sal_emp table:

  • Example 1: Query the names of the employees whose salary changed in the second quarter.

    SELECT name FROM sal_emp WHERE pay_by_quarter[1] <> pay_by_quarter[2];

    Sample result:

     name
    -------
     Carol
    (1 row)
  • Example 2: Query the third-quarter salary of all employees.

    SELECT pay_by_quarter[3] FROM sal_emp;

    Sample result:

     pay_by_quarter
    ----------------
              10000
              25000
    (2 rows)
  • Example 3: Query the first item on Bill's schedule for the first two days of the week.

    SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';

    Sample result:

            schedule
    ------------------------
     {{meeting},{training}}
    (1 row)

Array functions

The following table describes the array functions that are supported in AnalyticDB for PostgreSQL.

Function

Data type of return values

Description

Example

Result

array_append(anyarray, anyelement)

anyarray

Appends elements to the end of an array.

array_append(ARRAY[1,2], 3)

{1,2,3}

array_cat(anyarray, anyarray)

anyarray

Concatenates two arrays.

array_cat(ARRAY[1,2,3], ARRAY[4,5])

{1,2,3,4,5}

array_ndims(anyarray)

int

Returns the number of dimensions of an array.

array_ndims(ARRAY[[1,2,3], [4,5,6]])

2

array_dims(anyarray)

text

Returns the text representation of array dimensions.

array_dims(ARRAY[[1,2,3], [4,5,6]])

[1:2][1:3]

array_fill(anyelement, int[], [, int[]])

anyarray

Returns an array that is initialized with designated values and dimensions. Optional lower bounds cannot be 1.

array_fill(7, ARRAY[3], ARRAY[2])

[2:4]={7,7,7}

array_length(anyarray, int)

int

Returns the length of a specified array dimension.

array_length(array[1,2,3], 1)

3

array_lower(anyarray, int)

int

Returns the lower bound of a specified array dimension.

array_lower('[0:2]={1,2,3}'::int[], 1)

0

array_prepend(anyelement, anyarray)

anyarray

Appends elements to the beginning of an array.

array_prepend(1, ARRAY[2,3])

{1,2,3}

array_remove(anyarray, anyelement)

anyarray

Removes all elements that are equal to a specified value from an array. This function is supported only for one-dimensional arrays.

array_remove(ARRAY[1,2,3,2], 2)

{1,3}

array_replace(anyarray, anyelement, anyelement)

anyarray

Replaces the array elements that are equal to a specified value with a new value.

array_replace(ARRAY[1,2,5,4], 5, 3)

{1,2,3,4}

array_to_string(anyarray, text [, text])

text

Concatenates array elements by using a specified delimiter and null string.

array_to_string(ARRAY[1, 2, 3, NULL, 5], ',', '*')

1,2,3,*,5

array_upper(anyarray, int)

int

Returns the upper bound of a specified array dimension.

array_upper(ARRAY[1,8,3,7], 1)

4

cardinality(anyarray)

int

Returns the total number of elements in an array. If a value of 0 is returned, the array is empty.

cardinality(ARRAY[[1,2],[3,4]])

4

string_to_array(text, text [, text])

text[]

Splits a string to an array by using a specified delimiter and null string.

string_to_array('xx~^~yy~^~zz', '~^~', 'yy')

{xx,NULL,zz}

unnest(anyarray)

setof anyelement

Expands an array to a set of rows.

unnest(ARRAY[1,2])

1
2

unnest(anyarray, anyarray [, ...])

setof anyelement, anyelement [, ...]

Expands multiple arrays of the same or different types to a set of rows. This function can be used only in the FROM clause.

unnest(ARRAY[1,2],ARRAY['foo','bar','baz'])

1 foo
2 bar
NULL baz
Note
  • In the string_to_array function, if the delimiter parameter is NULL, each character in the input string becomes a separate element in the resulting array. If the delimiter is an empty string, the entire input string is returned as a one-element array. Otherwise, the input string is split at each occurrence of the delimiter string.

  • In the string_to_array function, if the null-string parameter is omitted or NULL, none of the substrings of the input string is replaced by NULL. In the array_to_string function, if the null-string parameter is omitted or NULL, any null elements in the array are skipped and not represented in the output string.

Examples

  • Example 1: Use the array_dims function to query the number of dimensions of an array.

    SELECT array_dims(schedule) FROM sal_emp WHERE name = 'Carol';

    Sample result:

    array_dims
    ------------
     [1:2][1:2]
    (1 row)
  • Example 2: Use the array_upper and array_lower functions to query the upper and lower bounds of a specified array dimension.

    SELECT array_upper(schedule, 1), array_lower(schedule, 1) FROM sal_emp WHERE name = 'Carol';

    Sample result:

     array_upper | array_lower
    -------------+-------------
               2 |           1
    (1 row)
  • Example 3: Use the array_length function to query the length of a specified array dimension.

    SELECT array_length(schedule, 1) FROM sal_emp WHERE name = 'Carol';

    Sample result:

     array_length
    --------------
                2
    (1 row)
  • Example 4: Use the cardinality function to query the total number of elements of all dimensions in an array.

    SELECT cardinality(schedule) FROM sal_emp WHERE name = 'Carol';

    Sample result:

     cardinality
    -------------
               4
    (1 row)

Modify arrays

AnalyticDB for PostgreSQL allows you to modify arrays by using the following methods:

  • Replace the value of an entire array.

    UPDATE sal_emp SET pay_by_quarter = '{25000,25000,27000,27000}'
        WHERE name = 'Carol';
  • Update a single element in an array.

    UPDATE sal_emp SET pay_by_quarter[4] = 15000
        WHERE name = 'Bill';
  • Update a slice in an array.

    UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}'
        WHERE name = 'Carol';
  • Use the concatenation operator (||) to construct array values. The concatenation operator allows a single element to be pushed onto the beginning or end of a one-dimensional array. Sample statement:

    SELECT ARRAY[1,2] || ARRAY[3,4];

    Sample result:

     ?column?
    -----------
     {1,2,3,4}
    (1 row)

    The concatenation operator (||) also accepts two N-dimensional arrays, or an N-dimensional and an N+1-dimensional array. Sample statement:

    SELECT ARRAY[5,6] || ARRAY[[1,2],[3,4]];

    Sample result:

          ?column?
    ---------------------
     {{5,6},{1,2},{3,4}}
    (1 row)
  • Use the array_prepend, array_append, or array_cat function to construct array values. The array_prepend and array_append functions support only one-dimensional arrays. The array_cat function supports multidimensional arrays.

    The concatenation operator (||) is preferred instead of the preceding functions. The purpose of the functions is to implement the concatenation operator. Using the functions in user-defined creation functions may be necessary. Sample statements:

    SELECT array_prepend(1, ARRAY[2,3]);
     array_prepend
    ---------------
     {1,2,3}
    (1 row)
    
    SELECT array_append(ARRAY[1,2], 3);
     array_append
    --------------
     {1,2,3}
    (1 row)
    
    SELECT array_cat(ARRAY[1,2], ARRAY[3,4]);
     array_cat
    -----------
     {1,2,3,4}
    (1 row)
    
    SELECT array_cat(ARRAY[[1,2],[3,4]], ARRAY[5,6]);
          array_cat
    ---------------------
     {{1,2},{3,4},{5,6}}
    (1 row)
    
    SELECT array_cat(ARRAY[5,6], ARRAY[[1,2],[3,4]]);
          array_cat
    ---------------------
     {{5,6},{1,2},{3,4}}

Search in arrays

To search for a value in an array, each value must be checked. You can use array operators to search for records that match specific conditions.

  • Example 1: Query the rows of the pay_by_quarter column in which the array contains a value of 10000.

    SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);
  • Example 2: Query the rows of the pay_by_quarter column in which all values of the array are equal to 10000.

    SELECT * FROM sal_emp WHERE 10000 = ALL (pay_by_quarter);
  • Example 3: Use the && operator to check whether the left operand overlaps with the right operand.

    SELECT * FROM sal_emp WHERE pay_by_quarter && ARRAY[10000];

The following table describes the array operators that are supported in AnalyticDB for PostgreSQL.

Operator

Description

Example

Result

=

Equal to

ARRAY[1.1,2.1,3.1]::int[] = ARRAY[1,2,3]

t

<>

Not equal to

ARRAY[1,2,3] <> ARRAY[1,2,4]

t

<

Less than

ARRAY[1,2,3] < ARRAY[1,2,4]

t

>

Greater than

ARRAY[1,4,3] > ARRAY[1,2,4]

t

<=

Less than or equal to

ARRAY[1,2,3] <= ARRAY[1,2,3]

t

>=

Greater than or equal to

ARRAY[1,4,3] >= ARRAY[1,4,3]

t

@>

Contains

ARRAY[1,4,3] @> ARRAY[3,1]

t

<@

Is contained by

ARRAY[2,7] <@ ARRAY[1,7,4,2,6]

t

&&

Overlaps (with elements in common)

ARRAY[1,4,3] && ARRAY[2,1]

t

||

Concatenates array to array (one-dimensional)

ARRAY[1,2,3] || ARRAY[4,5,6]

{1,2,3,4,5,6}

||

Concatenates array to array (multidimensional)

ARRAY[1,2,3] || ARRAY[[4,5,6],[7,8,9]]

{{1,2,3},{4,5,6},{7,8,9}}

||

Concatenates element to array

3 || ARRAY[4,5,6]

{3,4,5,6}

||

Concatenates array to element

ARRAY[4,5,6] || 7

{4,5,6,7}

Use indexes to accelerate array search

If an array contains a large number of elements, you can use the @> operator to search for a specific element. More elements in the array cause lower search performance. In this case, you can create indexes to accelerate array search.

Important

GIN indexes are supported only for AnalyticDB for PostgreSQL instances in elastic storage mode.

The following example describes how to use indexes to accelerate array search:

-- Create a table that contains a column of the INT ARRAY type. 
REATE TABLE multivalue (
    id      int,
    values  int[]
) DISTRIBUTED BY(id);

-- Write 1,000 rows of data to the table. The values column must contain 1,000,000 array elements in each row. 
INSERT INTO multivalue SELECT g, ARRAY(SELECT generate_series(1, 1000000)) FROM generate_series(1, 1000) as g;

The table does not contain indexes. Query the rows of the values column in which the array contains a value of 800000. In this case, sequential scanning is used and delivers lower performance.

EXPLAIN ANALYZE SELECT id FROM multivalue WHERE values @> ARRAY[800000];
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 2:1  (slice1; segments: 2)  (cost=0.00..1.00 rows=1 width=4) (actual time=711.216..3478.225 rows=1000 loops=1)
   ->  Seq Scan on multivalue  (cost=0.00..1.00 rows=1 width=4) (actual time=10.420..2629.403 rows=508 loops=1)
         Filter: ("values" @> '{800000}'::integer[])
 Planning time: 0.080 ms
   (slice0)    Executor memory: 39K bytes.
   (slice1)    Executor memory: 12733K bytes avg x 2 workers, 12733K bytes max (seg0).
 Memory used:  1048576kB
 Optimizer: Postgres query optimizer
 Execution time: 3483.157 ms
(9 rows)

Time: 3483.667 ms

Create a GIN index on the values column and perform the same query. In this case, index scanning is used and the query performance significantly increases. The amount of time that is consumed deceases from 3483.667 ms to 9.731 ms.

CREATE INDEX idx_values on multivalue USING GIN (values);

EXPLAIN ANALYZE SELECT id FROM multivalue WHERE values @> ARRAY[800000];
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Gather Motion 2:1  (slice1; segments: 2)  (cost=12.04..14.10 rows=5 width=4) (actual time=3.727..4.127 rows=1000 loops=1)
   ->  Bitmap Heap Scan on multivalue  (cost=12.04..14.10 rows=3 width=4) (actual time=1.826..1.872 rows=508 loops=1)
         Recheck Cond: ("values" @> '{800000}'::integer[])
         ->  Bitmap Index Scan on idx_values  (cost=0.00..12.04 rows=3 width=0) (actual time=1.462..1.462 rows=508 loops=1)
               Index Cond: ("values" @> '{800000}'::integer[])
 Planning time: 0.155 ms
   (slice0)    Executor memory: 49K bytes.
   (slice1)    Executor memory: 331K bytes avg x 2 workers, 331K bytes max (seg0).  Work_mem: 9K bytes max.
 Memory used:  1048576kB
 Optimizer: Postgres query optimizer
 Execution time: 9.138 ms
(11 rows)

Time: 9.731 ms

GIN indexes can improve array search performance. However, GIN indexes degrade write performance because indexes must be updated when data is written. In addition, indexes occupy additional storage space. You can select whether to use indexes based on the query performance, write performance, and storage space.

References

For more information about how to use the ARRAY type, see PostgreSQL documentation.