ApsaraDB for SelectDB supports data types across six categories. Choosing the right type for each column improves storage efficiency, query performance, and data integrity. Run SHOW DATA TYPES; to view all supported types in your instance.
Type categories
| Category | Types |
|---|---|
| Numeric | BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL |
| Date | DATE, DATETIME |
| String | CHAR, VARCHAR, STRING |
| Semi-structured | ARRAY, MAP, STRUCT, JSON, VARIANT |
| Aggregation | HLL, BITMAP, QUANTILE_STATE, AGG_STATE |
| IP | IPv4, IPv6 |
Numeric types
| Type | Bytes | Description |
|---|---|---|
| BOOLEAN | 1 | Boolean value. 0 = false, 1 = true. |
| TINYINT | 1 | Signed integer. Range: [-128, 127]. |
| SMALLINT | 2 | Signed integer. Range: [-32768, 32767]. |
| INT | 4 | Signed integer. Range: [-2147483648, 2147483647]. |
| BIGINT | 8 | Signed integer. Range: [-9223372036854775808, 9223372036854775807]. |
| LARGEINT | 16 | Signed integer. Range: [-2^127 + 1, 2^127 - 1]. |
| FLOAT | 4 | Floating-point number. Range: [-3.4x10^38, 3.4x10^38]. |
| DOUBLE | 8 | Floating-point number. Range: [-1.79x10^308, 1.79x10^308]. |
| DECIMAL | 4/8/16 | High-precision fixed-point number. See DECIMAL for details. |
DECIMAL
Definition format: DECIMAL(M[,D])
| Parameter | Description | Range |
|---|---|---|
M | Total number of significant digits (precision) | [1, 38] |
D | Number of digits after the decimal point (scale) | [0, M] |
Storage size is determined by the precision value:
| Precision (M) | Storage |
|---|---|
| 0 <= M <= 9 | 4 bytes |
| 9 < M <= 18 | 8 bytes |
| 16 < M <= 38 | 16 bytes |
Date types
| Type | Bytes | Description |
|---|---|---|
| DATE | 16 | Date only. Range: 0000-01-01 to 9999-12-31. Output format: yyyy-MM-dd. |
| DATETIME | 16 | Date and time. Supports up to microsecond precision. See DATETIME for details. |
DATETIME
Definition format: DATETIME([P])
P controls the sub-second precision (number of decimal places):
Default:
0(second-level precision)Range: [0, 6], where
6gives microsecond precision
Value range: 0000-01-01 00:00:00[.000000] to 9999-12-31 23:59:59[.999999]
Output format: yyyy-MM-dd HH:mm:ss.SSSSSS
String types
| Type | Bytes | Description |
|---|---|---|
| CHAR | M | Fixed-length string. Format: CHAR(M). Storage: 1–255 bytes. M is the byte length. |
| VARCHAR | Variable | Variable-length string. Format: VARCHAR(M). Storage: 1–65533 bytes. Stored in UTF-8 encoding (1 byte per English character, 3 bytes per Chinese character). |
| STRING | Variable | Variable-length string with large capacity. Default minimum: 1,048,576 bytes (1 MB). Maximum: 2,147,483,643 bytes (2 GB). |
STRING usage notes:
Adjust the default storage size with the
string_type_length_soft_limit_bytesparameter.STRING can only be used for Value columns. It cannot be used as a Key column, or as a partitioning or bucketing column.
Semi-structured types
SelectDB provides five semi-structured types. Use the table below to select the right type for your data.
| Type | Use when |
|---|---|
| ARRAY | Your data is an ordered list of elements with the same type (for example, a list of scores or tags). |
| MAP | Your data is a set of key-value pairs with consistent key and value types. |
| STRUCT | Your data has a fixed set of named fields with known types (for example, an address or a product record). |
| JSON | Your JSON data has a predictable, stable structure and you need to query specific fields using JSON functions. |
| VARIANT | Your JSON data has a dynamic or evolving structure with varying keys or schemas. VARIANT auto-splits JSON fields into sub-columns for better storage efficiency and query performance. |
| Type | Bytes | Description |
|---|---|---|
| ARRAY | Variable | An ordered collection of elements of a single type T. Format: ARRAY<T>. |
| MAP | Variable | A collection of key-value pairs. Format: MAP<K,V>. |
| STRUCT | Variable | A record of named fields, each with its own type. Format: STRUCT<field_name:field_type, ...>. |
| JSON | Variable | Stored in binary JSON format. Default storage: 1,048,576 bytes (1 MB). Maximum: 2,147,483,643 bytes (2 GB). Access fields using JSON functions. |
| VARIANT | Variable | A dynamic type for semi-structured JSON data. Automatically splits JSON fields into sub-columns for improved storage and query performance. Requires kernel 4.0.0+. |
ARRAY
Supported element types for T:
BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE,
DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRINGLimitations:
Cannot be used as a Key column.
Only supported in tables with the Duplicate or Unique model.
MAP
Supported types for K and V:
BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, DATE,
DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRINGLimitations:
Cannot be used as a Key column.
Only supported in tables with the Duplicate or Unique model.
STRUCT
field_name: Field identifier. Must be unique within the struct.field_type: Data type of the field.
Limitations:
Cannot be used as a Key column.
Only supported in tables with the Duplicate model.
JSON
Default storage size: 1,048,576 bytes (1 MB). Maximum: 2,147,483,643 bytes (2 GB).
Adjust the storage size with the
jsonb_type_length_soft_limit_bytesparameter.
VARIANT
Accepts any valid JSON input and automatically splits JSON fields into sub-columns for improved storage efficiency and query performance.
Useful for complex nested structures that may change over time.
For details, see VARIANT.
VARIANT is only supported in SelectDB kernel version 4.0.0 and later. If your instance runs an earlier version, upgrade the kernel version.
Aggregation types
Aggregation types pre-compute and store aggregate results during data ingestion. They require dedicated functions for queries — they cannot be queried directly with standard SQL expressions.
| Type | Description |
|---|---|
| HLL | Approximate deduplication using the HyperLogLog (HLL) algorithm. Typical error rate: ~1%, up to 2%. |
| BITMAP | Exact deduplication. Suited for UV counting and audience segmentation. |
| QUANTILE_STATE | Approximate quantile computation. Uses explicit storage for small datasets, and the TDigest algorithm for large datasets. |
| AGG_STATE | Stores intermediate aggregation state to accelerate aggregation computations. |
HLL
HLL provides approximate deduplication with a typical error rate of ~1% (up to 2%).
Limitations and requirements:
Cannot be defined as a Key column. The aggregation type must be
HLL_UNION.No length or default value is required — storage is managed internally based on data aggregation.
Query HLL columns using functions:
hll_union_agg,hll_raw_agg,hll_cardinality,hll_hash.
BITMAP
BITMAP provides exact deduplication and is commonly used for UV counting and audience segmentation.
Limitations and requirements:
Supported in Aggregate, Unique, and Duplicate tables, but only as non-Key columns. In Aggregate tables, use with the
BITMAP_UNIONaggregation type.No length or default value is required — storage is managed internally based on data aggregation.
Query BITMAP columns using functions:
bitmap_union_count,bitmap_union,bitmap_hash,bitmap_hash64.
QUANTILE_STATE
QUANTILE_STATE computes approximate quantiles. During ingestion, values with the same Key are pre-aggregated:
If the number of values <= 2048: all values are stored explicitly.
If the number of values > 2048: the TDigest algorithm clusters data and stores centroid points.
Limitations and requirements:
Cannot be defined as a Key column. The table's aggregation type must be
QUANTILE_UNION.No length or default value is required — storage is managed internally based on data aggregation.
Query QUANTILE_STATE columns using functions:
QUANTILE_PERCENT,QUANTILE_UNION,TO_QUANTILE_STATE.
AGG_STATE
AGG_STATE stores intermediate aggregation state to accelerate aggregation computations. Use it with the state, merge, and union function combinators.
Limitations and requirements:
Cannot be used as a Key column.
When creating a table, declare the signature of the aggregate function alongside the AGG_STATE column.
No length or default value is required — actual storage size depends on the function implementation.
IP types
IPv4 and IPv6 are only supported in SelectDB kernel version 4.0.0 and later. If your instance runs an earlier version, upgrade the kernel version.
| Type | Bytes | Description |
|---|---|---|
| IPv4 | 4 | Stores IPv4 addresses as 4-byte binary values. Use with ipv4_* family functions. |
| IPv6 | 16 | Stores IPv6 addresses as 16-byte binary values. Use with ipv6_* family functions. |