Data types supported by ApsaraDB for SelectDB - ApsaraDB for SelectDB

Selecting a proper data type for table fields improves database performance and storage efficiency, helps maintain database data integrity and readability, and enhances database security and consistency. This topic describes the data types supported by ApsaraDB for SelectDB. You can execute the SHOW DATA TYPES; statement to query the data types that are supported by SelectDB.

Numeric data types

Data type	Number of bytes	Description
BOOLEAN	1	A Boolean value. Valid values: 0: false. 1: true.
TINYINT	1	A signed integer. Valid values: -128 to 127.
SMALLINT	2	A signed integer. Valid values: -32768 to 32767.
INT	4	A signed integer. Valid values: -2147483648 to 2147483647.
BIGINT	8	A signed integer. Valid values: -9223372036854775808 to 9223372036854775807.
LARGEINT	16	A signed integer. Valid values: -2^127 + 1 to 2^127 - 1.
FLOAT	4	A floating-point number. Valid values: -3.4 × 10^38 to 3.4 × 10^38.
DOUBLE	8	A floating-point number. Valid values: -1.79 × 10^308 to 1.79 × 10^308.
DECIMAL	4/8/16	A high-precision fixed-point number. Syntax: `DECIMAL(M[,D])`. M: Indicates the precision, which represents the total number of valid digits. Valid values: 1 to 38. D: Indicates the scale, which represents the total number of digits to the right of the decimal point. Valid values: 0 to the precision. If the precision is greater than or equal to 0 and smaller than or equal to 9, 4 bytes are occupied. If the precision is greater than 9 and smaller than or equal to 18, 8 bytes are occupied. If the precision is greater than 16 and smaller than or equal to 38, 16 bytes are occupied.

Date and time data types

Data type

Number of bytes

Description

DATE

A date.

Valid values: '0000-01-01' to '9999-12-31'.

Default output format: yyyy-MM-dd.

DATETIME

A date and time.

Syntax: DATETIME([P]).

Note

Indicates the precision.
Default value: 0.
Valid values: 0 to 6. Up to six decimal places are supported, which indicates that the value can be accurate to microseconds.

Valid values: '0000-01-01 00:00:00[.000000]' to '9999-12-31 23:59:59[.999999]'.

Default output format: yyyy-MM-dd HH:mm:ss.SSSSSS.

String data types

Data type	Number of bytes	Description
CHAR	M	A fixed-length string. Syntax: `CHAR(M)`. Note M indicates the number of bytes that a fixed-length string contains. Valid values of M: 1 to 255.
VARCHAR	Indefinite	A variable-length string. Syntax: `VARCHAR(M)`. Note M indicates the number of bytes that a variable-length string contains. Variable-length strings are stored in the UTF-8 format. Each English character occupies 1 byte and each Chinese character occupies 3 bytes. Valid values of M: 1 to 65533.
STRING	Indefinite	A variable-length string. Default and minimum length: 1,048,576 bytes, which is equal to 1 MB. Maximum length: 2,147,483,643 bytes, which is equal to 2 GB. Important You can modify the `jsonb_type_length_soft_limit_bytes` parameter to adjust the length. The STRING data type applies only to the value columns and does not apply to key, partition, or bucket columns.

Semi-structured data types

Data type	Number of bytes	Description
ARRAY	Indefinite	An array that consists of elements of Type T. Syntax: ARRAY<T>. Note T indicates the basic data type. Valid values of T: `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING` Usage notes: A MAP column cannot be used as a key column. MAP columns can be used only in tables in the Duplicate or Unique key model.
MAP	Indefinite	A map that consists of key-value pairs. Syntax: MAP<K,V>. Note The following types are supported for the key-value pairs: `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING` Usage notes: A MAP column cannot be used as a key column. MAP columns can be used only in tables in the Duplicate or Unique key model.
STRUCT	Indefinite	A structure composed of multiple fields. The structure is a collection of multiple columns. Syntax: STRUCT<field_name:field_type, ... > Note field_name: the identifier of the field, which is unique. field_type: the type of the field. Usage notes: A MAP column cannot be used as a key column. MAP columns can be used only in tables in the Duplicate key model.
JSON	Indefinite	The JSON data type stores data in a binary format. Internal fields are accessed by using JSON functions. Default length: 1,048,576 bytes, which is equal to 1 MB. Maximum length: 2,147,483,643 bytes, which is equal to 2 GB. Note You can modify the `jsonb_type_length_soft_limit_bytes` parameter to adjust the length.
VARIANT	Indefinite	The dynamic variable data type, which is developed for semi-structured data such as JSON data. This type of data can be stored in any JSON data field. The system automatically splits a JSON data field into sub-columns for storage. This improves the storage efficiency and query and analysis performance. The VARIANT data type is suitable for complex nested data structures, which can change at any time. Usage notes: The VARIANT data type applies only to the value columns and does not apply to key, partition, or bucket columns. Important Make sure that your instance runs SelectDB Core `4.0.0` or later. If the instance runs a kernel version earlier than SelectDB Core 4.0.0, upgrade the kernel version. For more information, see Update the kernel version of an instance.

Aggregation data types

Data type	Number of bytes	Description
HLL	Indefinite	An approximate count of distinct elements. The performance of HyperLogLog (HLL) is higher than Count Distinct if the amount of data is large. Usage notes: In most cases, the error rate of HLL is about 1%. Occasionally, the error rate of HLL reaches up to 2%. When you create a table that contains an HLL column, you cannot set the HLL column as a key column and must set the aggregation type of the HLL column to HLL_UNION. You do not need to specify the length and default value. The length automatically varies based on the degree of data aggregation. HLL columns can be queried or used only by using the `hll_union_agg`, `hll_raw_agg`, `hll_cardinality`, and `hll_hash` functions.
BITMAP	Indefinite	An exact count of distinct elements, which is applicable to scenarios such as unique visitor (UV) count or audience selection. Usage notes: BITMAP columns can be used in tables in the Aggregate, Unique, or Duplicate key model and cannot be set as key columns. When you create a table that contains a BITMAP column in the Aggregate key model, you must set the aggregation type of the BITMAP column to BITMAP_UNION. You do not need to specify the length and default value. The length is specified within the system based on the degree of data aggregation. BITMAP columns can be queried or used only by using functions such as `bitmap_union_count`, `bitmap_union`, `bitmap_hash`, and `bitmap_hash64`.
QUANTILE_STATE	Indefinite	The data type that can be used to calculate the approximate value of quantiles. Different values with the same key are pre-aggregated when data is imported. If the number of aggregated values does not exceed 2,048, all data is recorded in detail. If the number of aggregated values is greater than 2,048, the TDigest algorithm is used for data aggregation or clustering and to save the centroid points after the data aggregation or clustering. Usage notes: When you create a table that contains a QUANTILE_STATE column, you cannot set the QUANTILE_STATE column as a key column and must set the aggregation type of the QUANTILE_STATE column to QUANTILE_UNION. You do not need to specify the length and default value. The length is specified within the system based on the degree of data aggregation. QUANTILE_STATE columns can be queried or used only by using functions such as `QUANTILE_PERCENT`, `QUANTILE_UNION`, and `TO_QUANTILE_STATE`.
AGG_STATE	Indefinite	The aggregate function. It is used to accelerate aggregations and can be used only with the `state, merge, and union` combinators. Usage notes: An AGG_STATE column cannot be used as a key column. When you create a table that contains an AGG_STATE column, you must declare the signature of the aggregate function. You do not need to specify a length or a default value. The actual size of the stored data depends on the function implementation.

IP address data types

Important

The following data types are supported only for SelectDB Core 4.0.0 or later. If your instance runs a kernel version earlier than SelectDB Core 4.0.0, upgrade the kernel version. For more information, see Update the kernel version of an instance.

Data type	Number of bytes	Description
IPv4	4 bytes	Stores an IPv4 address in a 4-byte binary string. The IPv4 address data type is used with ipv4_* functions.
IPv6	16 bytes	Stores an IPv6 address in a 16-byte binary string. The IPv6 address data type is used with ipv6_* functions.