
ApsaraDB for SelectDB:Variable management

Last Updated: Jan 27, 2025

This topic describes how to manage and use variables in ApsaraDB for SelectDB. It also describes how to configure time zone variables. This topic helps you better use the variables of ApsaraDB for SelectDB to meet your business requirements.

Variables

The variables of ApsaraDB for SelectDB are divided into those specific to ApsaraDB for SelectDB and those compatible with MySQL client protocols. For more information, see Appendix.

Query and configure variables

Query variables

You can execute the SHOW VARIABLES [LIKE 'xxx']; statement to query all or specified variables. Sample code:

SHOW VARIABLES;
SHOW VARIABLES LIKE '%time_zone%';

Configure variables

You can configure global variables, which take effect globally. You can also configure session variables, which take effect only for the current session.

Session variables: If you want to configure a variable that takes effect only for the current session, you can execute the SET var_name=xxx; statement. Sample code:

SET exec_mem_limit = 137438953472;
SET forward_to_master = true;
SET time_zone = "Asia/Shanghai";

Global variables: If you want to configure a variable that takes effect globally, you can execute the SET GLOBAL var_name=xxx; statement. After a global variable takes effect, the configured value is used in the current and subsequent sessions. Sample code:

SET GLOBAL exec_mem_limit = 137438953472;
Important

Only the admin user can configure global variables.

The following variables can be configured to take effect globally or only for the current session:

  • time_zone

  • wait_timeout

  • sql_mode

  • enable_profile

  • query_timeout

  • insert_timeout

  • exec_mem_limit

  • batch_size

  • allow_partition_column_nullable

  • insert_visible_timeout_ms

  • enable_fold_constant_by_be

The following variables can be configured to take effect only globally:

  • default_rowset_type

  • default_password_lifetime

  • password_history

  • validate_password_policy

In addition, you can use constant expressions to configure variables. Sample code:

SET exec_mem_limit = 10 * 1024 * 1024 * 1024;
SET forward_to_master = concat('tr', 'u', 'e');

Configure variables in query statements

In some scenarios, you may need to configure variables for specific queries. You can use the SET_VAR hint to configure session variables in a query. The configured variables take effect only for a single statement. Sample code:

SELECT /*+ SET_VAR(exec_mem_limit = 8589934592) */ name FROM people ORDER BY name;
SELECT /*+ SET_VAR(query_timeout = 1, enable_partition_cache=true) */ sleep(3);
Note

The hint comment must start with /*+ and must immediately follow the SELECT keyword.

Time zone variables

As a critical type of variable, time zone variables affect many functions and variable values in ApsaraDB for SelectDB. To meet different business requirements in various regions, ApsaraDB for SelectDB supports custom time zone settings.

ApsaraDB for SelectDB provides the following two time zone variables:

  • system_time_zone: the system time zone of a cluster. When a server starts, this variable is automatically configured based on the time zone of the server. The configured time zone cannot be modified.

  • time_zone: the current time zone of a cluster. This time zone can be modified.

Query and configure time zone variables

Query time zone variables. Sample code:

SHOW VARIABLES LIKE '%time_zone%';

Configure the time zone of a session. Sample code:

SET [GLOBAL] time_zone = 'Asia/Shanghai';
Note

If the GLOBAL keyword is used, the configured time zone also takes effect for all subsequent sessions of the current ApsaraDB for SelectDB cluster.

Time zone formats

You can specify the values of time zone variables in multiple formats. ApsaraDB for SelectDB supports the following standard formats:

  • Standard named formats, such as Asia/Shanghai and America/Los_Angeles.

  • Standard offset formats, such as +02:30 and -10:00.

  • Abbreviated formats. The following formats are supported:

    • Greenwich Mean Time (GMT) and Coordinated Universal Time (UTC), which are equivalent to the +00:00 time zone.

    • Central Standard Time (CST), which is equivalent to the Asia/Shanghai time zone.

  • Single letter Z, which represents the Zulu time zone and is equivalent to the +00:00 time zone.
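
For example, each of the following statements sets the time_zone variable by using one of the preceding formats:

SET time_zone = 'Asia/Shanghai';   -- standard named format
SET time_zone = '+08:00';          -- standard offset format
SET time_zone = 'UTC';             -- abbreviated format, equivalent to +00:00
SET time_zone = 'Z';               -- single letter Z, equivalent to +00:00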

Note

Due to the implementation of different data import methods, ApsaraDB for SelectDB may support other time zone formats if specific data import methods are used. We recommend that you do not rely on other time zone formats in the production environment. ApsaraDB for SelectDB may change its processing methods of other time zone formats at any time.

Impacts of time zone settings

Functions

Time zone settings affect the values returned by time functions such as NOW() and CURTIME(), and also affect the time values in the results of the SHOW LOAD and SHOW BACKENDS statements. However, time zone settings do not affect the LESS THAN values that are specified for time-type partition key columns in the CREATE TABLE statement, and do not affect the display of values stored as the DATE or DATETIME type.

The following functions are affected by time zone settings:

  • FROM_UNIXTIME: returns the date and time in the specified time zone based on the specified UTC timestamp. If the CST time zone is used, FROM_UNIXTIME(0) returns 1970-01-01 08:00:00.

  • UNIX_TIMESTAMP: returns the UTC timestamp based on the specified date and time in the specified time zone. If the CST time zone is used, UNIX_TIMESTAMP('1970-01-01 08:00:00') returns 0.

  • CURTIME: returns the current time in the specified time zone.

  • NOW: returns the current date and time in the specified time zone.

  • CONVERT_TZ: converts the date and time in the specified time zone to those in another time zone.
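
For example, the following queries illustrate how time zone settings change the results of these functions. The values in the comments are the expected results for the time zones that are set in the sample:

SET time_zone = 'Asia/Shanghai';
SELECT FROM_UNIXTIME(0);                          -- 1970-01-01 08:00:00
SELECT UNIX_TIMESTAMP('1970-01-01 08:00:00');     -- 0

SET time_zone = '+00:00';
SELECT FROM_UNIXTIME(0);                          -- 1970-01-01 00:00:00
SELECT CONVERT_TZ('1970-01-01 00:00:00', '+00:00', 'Asia/Shanghai');   -- 1970-01-01 08:00:00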

Values of time types

When you import data of the DATE, DATEV2, DATETIME, or DATETIMEV2 type, ApsaraDB for SelectDB supports time zone conversion.

If the data contains a time zone, such as 2020-12-12 12:12:12+08:00, and the current time zone of the ApsaraDB for SelectDB cluster is +00:00, the imported data is converted to 2020-12-12 04:12:12.

If the data does not contain a time zone, such as 2020-12-12 12:12:12, the time is considered an absolute time and the imported data is not converted.

Daylight saving time

Daylight saving time is essentially a change in the actual time offset of a named time zone that takes effect on specific dates.

For example, the America/Los_Angeles time zone has a daylight saving time adjustment that lasts approximately from March to November each year. In the America/Los_Angeles time zone, the time offset changes from -08:00 to -07:00 when daylight saving time starts in March, and changes from -07:00 to -08:00 when daylight saving time ends in November.

If you do not use daylight saving time, you must set the time_zone variable to -08:00 instead of America/Los_Angeles.
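
The following CONVERT_TZ queries sketch this behavior. The expected results in the comments assume that the America/Los_Angeles offset is -08:00 in January and -07:00 in July:

SELECT CONVERT_TZ('2022-01-15 12:00:00', '+00:00', 'America/Los_Angeles');   -- 2022-01-15 04:00:00 (standard time, -08:00)
SELECT CONVERT_TZ('2022-07-15 12:00:00', '+00:00', 'America/Los_Angeles');   -- 2022-07-15 05:00:00 (daylight saving time, -07:00)
SELECT CONVERT_TZ('2022-07-15 12:00:00', '+00:00', '-08:00');                -- 2022-07-15 04:00:00 (fixed offset, no daylight saving time)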

Data sources

The time zone data contains the name of the time zone, the time offset, and the change of daylight saving time. On the machine where a backend (BE) node is deployed, the time zone data can be found in the following data sources:

  1. The directory returned by the TZDIR command.

  2. The /usr/share/zoneinfo directory.

  3. The zoneinfo directory generated in the BE deployment directory of ApsaraDB for SelectDB.

ApsaraDB for SelectDB searches the preceding data sources in sequence for the time zone data. If the time zone data is found, it is used. If no time zone data is found, the BE node of ApsaraDB for SelectDB fails to start.

Best practices

Time zone-sensitive data

ApsaraDB for SelectDB is compatible with the data imported from different time zones. Time zone settings are affected by the following three factors:

  • The current time zone of the cluster, which is configured by the time_zone session variable.

  • The time zone configured by the timezone header during data import by using a method such as Stream Load or Broker Load.

  • The literal time zone of the data, such as +08:00 in 2023-12-12 08:00:00+08:00.

In ApsaraDB for SelectDB, data of time types such as DATETIME does not contain a time zone. Data of time types in an ApsaraDB for SelectDB cluster can be divided into two categories: absolute time and time in a specific time zone.

  • Absolute time: The time format of the data does not contain a time zone, and no time zone is specified when the data is imported. Example: 2023-12-12 08:00:00. The scenario associated with the data is independent of a time zone. When data of this category is imported, it is not affected by time zone settings. If you run a function such as UNIX_TIMESTAMP based on data of this category, no time zone information can be obtained. In addition, if you modify the time_zone session variable, data of this category is not affected.

  • Time in a specific time zone: The time format of the data contains a time zone. When data of this category is imported, it is affected by time zone settings. Example: "2020-12-12 12:12:12+02:00". During data import, ApsaraDB for SelectDB processes data of this category based on time zone settings.

    • If the data is imported by using the INSERT INTO statement, the data is affected by the time_zone session variable.

    • If the data is imported by using a method such as Stream Load, the data is affected by the timezone header of Stream Load. If the setting of the timezone header conflicts with that of the time_zone session variable, the setting of the timezone header takes precedence.

    The specific time zone refers to the current time zone of the cluster, which is configured by the time_zone session variable. This variable is configured before the data is imported and is no longer modified. When data of this category is imported, ApsaraDB for SelectDB converts the time in the literal time zone of the data to be imported to the time in the time zone configured by the time_zone session variable.

In the best practices of time in a specific time zone, we recommend that you do not modify the time_zone session variable. The following sample code provides an example:

mysql> SELECT @@time_zone;
+----------------+
| @@time_zone    |
+----------------+
| Asia/Hong_Kong |
+----------------+
1 row in set (0.12 sec)
--- The literal time zone of the data to be imported is +02:00.
mysql> INSERT INTO dtv23 VALUES('2020-12-12 12:12:12+02:00'); 
Query OK, 1 row affected (0.27 sec)

mysql> SELECT * FROM dtv23;
--- The imported data is converted to the time in the Asia/Hong_Kong time zone, which is the current time zone of the ApsaraDB for SelectDB cluster. This time zone is also used for subsequent imports and queries.
+-------------------------+
| k0                      |
+-------------------------+
| 2020-12-12 18:12:12.000 | 
+-------------------------+
1 row in set (0.19 sec)

mysql> SET time_zone = 'America/Los_Angeles';
Query OK, 0 rows affected (0.15 sec)

mysql> SELECT * FROM dtv23;
--- If you modify the time_zone session variable, the time value does not change accordingly but its meaning varies based on the configured time zone. This causes confusion in subsequent queries. 
+-------------------------+
| k0                      |
+-------------------------+
| 2020-12-12 18:12:12.000 | 
+-------------------------+
1 row in set (0.18 sec)

mysql> INSERT INTO dtv23 VALUES('2020-12-12 12:12:12+02:00');
Query OK, 1 row affected (0.17 sec)

mysql> SELECT * FROM dtv23;
--- The query results are incorrect. 
+-------------------------+
| k0                      |
+-------------------------+
| 2020-12-12 02:12:12.000 |
| 2020-12-12 18:12:12.000 |
+-------------------------+ 
2 rows in set (0.19 sec)

In summary, take note of the following items for the best practices of time zone settings:

  • Before you import data to a cluster, confirm the current time zone of the cluster and configure the time_zone session variable based on your business requirements. Do not modify this variable after you import data.

  • If you use a data import method, make sure that the setting of the timezone header is the same as that of the time_zone session variable.

  • For absolute time, import the data without a time zone suffix. For time in a specific time zone, import the data with a specific time zone suffix. ApsaraDB for SelectDB converts the time in the specific time zone to the time in the time zone configured by the time_zone session variable.

Daylight saving time

The start time and end time of daylight saving time are based on the data source of the current time zone and may not exactly match those officially adopted by a time zone location for the current year. The time zone data is maintained by the Internet Assigned Numbers Authority (IANA). If you need the start time and end time of daylight saving time that are officially adopted by a time zone location for the current year, make sure that the data source selected by ApsaraDB for SelectDB contains the latest time zone data published by IANA. For more information about how to download the latest time zone data, see References.

Appendix

Supported variables

BOOLEAN-type variables

This type of variable is used to enable or disable a feature. Valid values are true and false.

Variable

Description

delete_without_partition

Specifies whether to delete data without specifying a partition when you execute the DELETE statement and automatically apply the DELETE operation to all partitions. Default value: false.

Important

If the DELETE operation is automatically applied to all partitions, a large number of tasks are triggered and the DELETE statement takes much longer to complete. We recommend that you do not enable this feature.

disable_colocate_join

Specifies whether to disable the Colocation Join feature. Default value: false.

A value of false specifies that the feature is enabled. A value of true specifies that the feature is disabled. If this feature is disabled, the execution plan does not attempt to perform a Colocation Join operation.

enable_bucket_shuffle_join

Specifies whether to enable the Bucket Shuffle Join feature. Default value: true.

A value of true specifies that the feature is enabled. A value of false specifies that the feature is disabled. If this feature is disabled, the execution plan does not attempt to perform a Bucket Shuffle Join operation.

disable_streaming_preaggregations

Specifies whether to disable streaming pre-aggregation. Default value: false.

A value of false specifies that the feature is enabled. This variable cannot be modified.

enable_insert_strict

Specifies whether to enable the strict mode when data is imported by using the INSERT statement. Default value: false.

A value of false specifies that the strict mode is disabled.

enable_spilling

Specifies whether to enable sorting for large amounts of data. Default value: false.

A value of false specifies that the feature is disabled. This feature is enabled if you set the enable_spilling variable to true and do not specify a LIMIT condition for the ORDER BY clause.

After this feature is enabled, the temporary data of a query is stored in the doris-scratch/ directory of the BE data directory. After the query is complete, the temporary data is cleared.

This feature is used to sort large amounts of data by using limited memory.

Important

This feature is experimental. We recommend that you do not enable this feature in the production environment.

forward_to_master

Specifies whether to forward some SHOW statements to the master frontend (FE) node for execution. Default value: true.

A value of true specifies that the feature is enabled. An ApsaraDB for SelectDB instance has multiple FE nodes, one of which is the master FE node. You can connect to one of the FE nodes to perform full-featured operations. However, after you execute some SHOW statements, detailed information can be obtained only from the master FE node. For example, if you do not forward the SHOW BACKENDS; statement to the master FE node, you can obtain only some basic information such as whether the BE node is alive. If you forward the statement to the master FE node, you can obtain more detailed information such as the startup time and last heartbeat time of the BE node.

If you forward the following statements to the master FE node, more detailed information can be returned:

  • SHOW FRONTENDS;

    Returns the last heartbeat information.

  • SHOW BACKENDS;

    Returns the startup time, last heartbeat information, and disk capacity information.

  • SHOW BROKER;

    Returns the startup time and last heartbeat information.

  • SHOW TABLET;, SHOW REPLICA DISTRIBUTION;, or SHOW REPLICA STATUS;

    Returns the tablet information stored in the metadata of the master FE node. In most cases, the same tablet information is stored in the metadata of different FE nodes. If an error occurs, you can use this method to compare the differences between the tablet information stored in the metadata of the current FE node and that in the metadata of the master FE node.

  • SHOW PROC;

    Returns the information about the proc system stored in the metadata of the master FE node. You can use the obtained information for metadata comparison.

enable_profile

Specifies whether to display the profile of a query. Default value: false.

A value of false specifies that the feature is disabled. By default, the BE sends the profile of a query to the FE only if an error occurs in the query, so that the error can be analyzed. Sending profiles incurs network overhead and reduces the performance of high-concurrency queries. Therefore, the BE does not send the profile of a successful query to the FE by default.

If you want to analyze the profile of a query, you can set this variable to true before you perform the query. After the query is complete, you can view the profile on the web page of the currently connected FE node by using the following URL:

fe_host:fe_http_port/query

The web page displays the most recent 100 queries after the enable_profile variable is set to true.

rewrite_count_distinct_to_bitmap_hll

Specifies whether to replace the COUNT DISTINCT function with the BITMAP_UNION_COUNT or HLL_UNION_AGG function for data in the bitmap or HyperLogLog (HLL) format. Default value: true.

allow_partition_column_nullable

Specifies whether to allow partition key columns to be defined as NULL when you create a table. Default value: true.

A value of true specifies that partition key columns can be defined as NULL. A value of false specifies that partition key columns must be defined as NOT NULL.

enable_exchange_node_parallel_merge

Specifies whether the exchange node starts multiple threads to merge data in parallel during the sorting process of a query. This reduces additional CPU and memory consumption. Default value: false.

In a query whose results are sorted, when an upstream node receives data from a downstream node, the data is sorted on the exchange node to ensure that the final data is ordered. However, if the exchange node starts a single thread to merge large amounts of data from multiple downstream nodes, a single point of merging bottleneck may occur on the exchange node.

To resolve this issue, ApsaraDB for SelectDB optimizes the merging process. If an upstream node receives data from multiple downstream nodes, the exchange node starts multiple threads to merge data in parallel to speed up the sorting process.

extract_wide_range_expr

Specifies whether to enable wide common factors. Default value: false.

enable_fold_constant_by_be

Specifies whether to perform the calculation of constant folding on the BE. Default value: false.

A value of false specifies that the calculation is performed on the FE. A value of true specifies that the calculation is performed on the BE by using remote procedure call (RPC) requests.

disable_join_reorder

Specifies whether to disable all automatic Join Reorder algorithms. Default value: false.

A value of false specifies that all automatic Join Reorder algorithms are enabled. A value of true specifies that all automatic Join Reorder algorithms are disabled. In this case, a JOIN operation is performed based on the order of tables in SQL statements.

enable_infer_predicate

Specifies whether to perform predicate derivation. Default value: true.

A value of true specifies that predicate derivation is performed. A value of false specifies that ApsaraDB for SelectDB does not perform predicate derivation, but uses the original predicate to perform related operations.

trim_tailing_spaces_for_external_table_query

Specifies whether to trim the trailing spaces of fields when you query Hive external tables. Default value: false.

enable_inverted_index_query

Specifies whether to enable inverted indexes. Default value: true.

skip_missing_version

Specifies whether to ignore missing versions. Default value: false.

In some extreme scenarios, all replicas of tablets in a table have missing versions. These tablets cannot be restored. As a result, the entire table cannot be queried.

This variable can be used to control the behavior of a query. If you set this variable to true, the query ignores the visible version of a tablet recorded on the FE and uses the replica version instead. If the replicas on the BE have missing versions, the query skips these missing versions and returns only the data of existing versions.

In addition, the query always tries to select the replica of the latest successful version among all surviving replicas on the BE to restore as much data as possible.

We recommend that you set this variable to true only in emergency scenarios where you need to restore the data of a table for queries.

Important

This variable conflicts with the use_fix_replica variable. This variable takes effect only if the use_fix_replica variable is set to -1.

enable_parquet_lazy_materialization

Specifies whether to enable the late materialization technology for Parquet Reader. Default value: true.

enable_orc_lazy_materialization

Specifies whether to enable the late materialization technology for ORC Reader. Default value: true.

enable_strong_consistency_read

Specifies whether to enable strong consistency for data reads. Default value: true.

By default, ApsaraDB for SelectDB supports strong consistency within the same session. The changes to data within the same session are visible in real time. If you want to enable strong consistency for data reads between sessions, set this variable to true.

truncate_char_or_varchar_columns

Specifies whether to truncate columns of the CHAR or VARCHAR type based on the maximum length in the table schema. Default value: false.

If you manage data by using an external table, the maximum length of columns of the CHAR or VARCHAR type in the schema of the table may be inconsistent with that in the schema of the underlying Parquet or ORC file.

If you set this variable to true, ApsaraDB for SelectDB truncates columns of the CHAR or VARCHAR type based on the maximum length in the table schema.

jdbc_clickhouse_query_final

Specifies whether to add the final keyword when you use the Java Database Connectivity (JDBC) catalog feature to query ClickHouse data. Default value: false.

This variable is used to remove duplicates together with the ReplacingMergeTree table engine of ClickHouse.

enable_memtable_on_sink_node

Specifies whether to enable the memory table on the Data Sink node during data import. Default value: false.

If you set this variable to true, ApsaraDB for SelectDB creates a memory table on the Data Sink node and sends segments to other BE nodes by using bRPC streaming. This reduces duplicate work among replicas, and saves time for data serialization and deserialization.

enable_unique_key_partial_update

Specifies whether to enable the semantics of partial column update for the INSERT INTO statement. Valid values:

  • false (default)

  • true

Important
  • This variable is available only when the table uses the Unique model that uses the Merge on Write (MOW) mode.

  • If you set both the enable_unique_key_partial_update and enable_insert_strict variables to true, you can execute the INSERT INTO statement only to update data. The default value true of the enable_insert_strict variable indicates that the strict mode is enabled for the INSERT statement. If the INSERT INTO statement contains keys that do not exist in the table, an error is reported.

  • After you set this variable to true, if you want to execute the INSERT INTO statement to update specific columns and insert data to the columns, you must set the enable_unique_key_partial_update session variable to true and the enable_insert_strict session variable to false. For more information about variables settings, see Configure variables.
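
The following statements provide a minimal sketch of this workflow. The table t_user and its columns id and age are hypothetical and assume a Unique key table in MOW mode:

SET enable_unique_key_partial_update = true;
SET enable_insert_strict = false;
-- Update only the age column of the row whose key is 1. The other columns keep their current values.
INSERT INTO t_user (id, age) VALUES (1, 30);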

describe_extend_variant_column

Specifies whether to display sub-columns for columns of the VARIANT type when you query a table. Default value: false.

dry_run_query

Specifies whether to return only the number of rows in the results of a query. Default value: false.

If you set this variable to true, ApsaraDB for SelectDB no longer returns the actual result set for a query, but returns only the number of rows in the results. If you import or insert data, the Data Sink node drops the data instead of writing the data.

This variable can be used to avoid the amount of time required to transmit the result set when you test a large number of datasets. ApsaraDB for SelectDB spends time only in executing the underlying query.
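
For example, you can enable the variable for the current session before you run a query. The table bigtable in the following sample is used only for illustration:

SET dry_run_query = true;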

SELECT * FROM bigtable;
+--------------+
| ReturnedRows |
+--------------+
| 10000000     |
+--------------+

drop_table_if_ctas_failed

Specifies whether to delete the created table when you execute the CREATE TABLE AS SELECT statement but a write error occurs. Default value: true.

show_user_default_role

Specifies whether to display the implicit role of each user in the results of the SHOW ROLES statement. Default value: false.

group_by_and_having_use_alias_first

Specifies whether to use column aliases in the GROUP BY and HAVING clauses instead of searching for column names in the FROM clause. Default value: false.

enable_file_cache

Specifies whether to enable the block file cache. Default value: false.

This variable takes effect only if it is set to true in the be.conf file of a BE node. If it is set to false in the be.conf file, the block file cache is disabled on the BE node.

enable_auto_analyze

Specifies whether to enable the automatic statistics collection feature. Default value: true.

enable_cost_based_join_reorder

Specifies whether to enable Join Reorder to automatically adjust the order of tables when you perform a JOIN operation. Default value: false.

enable_fallback_to_original_planner

Specifies whether to automatically fall back to the old optimizer based on the scenario after the new optimizer is enabled. Default value: false.

enable_odbc_transcation

Specifies whether to enable the transaction feature for writing data to a JDBC catalog. Default value: false.

enable_parallel_outfile

Specifies whether to enable parallel scan for the SELECT INTO OUTFILE statement. Default value: false.

enable_sql_cache

Specifies whether to enable the cache for SQL statements. Default value: false.

enable_unicode_name_support

Specifies whether to support the table and column names that contain Unicode characters. Default value: false.

If you set this variable to true, Chinese column names are supported.

enable_vectorized_engine

Specifies whether to enable the vectorized engine to accelerate queries. Default value: true.

experimental_enable_nereids_planner

Specifies whether to enable the new optimizer. This feature is experimental. Default value: true.

experimental_enable_pipeline_engine

Specifies whether to enable the Pipeline execution engine. This feature is experimental. Default value: true.

experimental_enable_single_replica_insert

Specifies whether to enable the single-replica import feature. Default value: false.

If you set this variable to true, ApsaraDB for SelectDB selects a replica from multiple replicas as the primary replica and performs computing only on the primary replica. Other replicas are secondary replicas. When all data files are written to the primary replica, the node on which each secondary replica resides can pull the data files from the primary replica. This synchronizes data between replicas.

faster_float_convert

Specifies whether to enable the faster floating-point number conversion algorithm. Take note that this algorithm affects the output format. Default value: false.

enable_page_cache

Specifies whether to enable the page-level cache. Default value: true.

strict_mode

Specifies whether to enable the strict mode for data import. Default value: false.

Configuration variables

The value and valid values of a configuration variable vary based on the configuration.

Variable

Description

auto_broadcast_join_threshold

The threshold for the maximum bytes of a table to be broadcast to all nodes when you perform a JOIN operation. A value of -1 specifies that the broadcast feature is disabled.

ApsaraDB for SelectDB provides the Broadcast Join and Shuffle Join methods.

If you use Broadcast Join, ApsaraDB for SelectDB filters the data of the small table based on conditions and broadcasts the data to each node on which the large table resides to generate an in-memory hash table. Then, ApsaraDB for SelectDB reads the data of the large table in streaming mode to perform a Hash Join operation.

If you use Shuffle Join, ApsaraDB for SelectDB hashes the data of the small table and large table based on the join key, and then performs distributed JOIN operations.

If the small table contains a small amount of data, Broadcast Join delivers better performance. If the small table contains a large amount of data, Shuffle Join delivers better performance.

ApsaraDB for SelectDB automatically tries to use Broadcast Join. You can also explicitly specify the implementation method of each join operator. ApsaraDB for SelectDB provides the auto_broadcast_join_threshold variable to limit the ratio of the memory used by the hash table to the overall execution memory when Broadcast Join is used. The maximum ratio can be in the range of 0 to 1. By default, the maximum ratio is 0.8. If the memory used by the hash table exceeds this limit, ApsaraDB for SelectDB automatically changes to use Shuffle Join. The overall execution memory is estimated by the query optimizer.

Note

We recommend that you do not use this variable to change the join method. If you want to specify the join method to be used, we recommend that you use a hint, such as join[shuffle].
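
For example, the following statement explicitly requests a Shuffle Join by using a join hint. The table and column names are placeholders for illustration:

SELECT a.k1, b.v1 FROM a JOIN [shuffle] b ON a.k1 = b.k1;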

batch_size

The number of rows in a single packet transmitted by each node during a query.

By default, a packet contains 1,024 rows. Each time the source node generates 1,024 rows of data, the data is packaged and sent to the destination node. A larger number of rows in a single packet increases the throughput of queries in scenarios where a large amount of data is scanned. However, a larger number of rows in a single packet may increase the query latency in scenarios where a small amount of data is queried, and also increases the memory overhead of queries.

We recommend that you set this variable to a value in the range of 1024 to 4096.

default_order_by_limit

The default number of data entries to be returned after the ORDER BY clause is used.

Default value: -1, which specifies that this variable does not limit the number of returned rows. The upper limit is the maximum value of the LONG type.

exec_mem_limit

The maximum memory used by a single query. By default, a single query can use up to 2 GB of memory. You can specify the maximum memory in units of bytes, KB, MB, GB, TB, or PB. If you do not specify a unit, bytes are used by default.

This variable is used to limit the memory that can be used by an instance of a fragment in an execution plan. An execution plan can be split into multiple instances. A BE node may run one or more instances. Therefore, this variable does not accurately limit the memory usage of a query in a cluster or the memory usage of a query on a BE node. The actual limit is determined based on the generated execution plan.

In most cases, a large amount of memory is consumed only on some blocking nodes, such as sorting nodes, aggregate nodes, and join nodes. However, on other nodes, such as scan nodes, data is streamed and does not occupy a large amount of memory.

If a Memory Exceed Limit error occurs, you can try to increase this limit exponentially, such as 4 GB, 8 GB, or 16 GB. Take note that this limit may fluctuate by several MB in practice.

lower_case_table_names

The case sensitivity of table names. Valid values:

  • 0 (default): Table names are case-sensitive.

  • 1: Table names are not case-sensitive. ApsaraDB for SelectDB converts table names to lowercase when data is stored or queried. The advantage is that you can use both lowercase and uppercase table names in an SQL statement. The following sample code provides an example:

    SHOW tables;  
    +------------------+
    | Tables_in_testdb |
    +------------------+
    | cost             |
    +------------------+
    
    SELECT * FROM COST WHERE COst.id < 100 ORDER BY cost.id;

    The disadvantage is that the uppercase table name specified in the CREATE TABLE statement cannot be obtained after a table is created. The table names returned by the SHOW TABLES statement are always in lowercase.

  • 2: Table names are not case-sensitive. ApsaraDB for SelectDB stores the table name specified in the CREATE TABLE statement and converts the specified table name to lowercase for comparison during queries. The advantage is that a table name returned by the SHOW TABLES statement is the one specified in the CREATE TABLE statement. The disadvantage is that you can use only lowercase or uppercase table names in an SQL statement. For example, you can use the table name COST to query the cost table. Sample code:

    SELECT * FROM COST WHERE COST.id < 100 ORDER BY COST.id;

    This variable is compatible with MySQL. You must configure the lower_case_table_names variable in the fe.conf file when a cluster is initialized. After the cluster is initialized, you cannot modify the variable by executing the SET statement, or by restarting or scaling up the cluster.

    The table names of system views in the information_schema database are not case-sensitive. In this case, a value of 0 is equivalent to a value of 2 for the lower_case_table_names variable.
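
    For example, a cluster that is initialized with this variable set to 1 might contain the following line in the fe.conf file (an illustrative sketch):

    lower_case_table_names = 1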

max_pushdown_conditions_per_column

The maximum number of conditions that can be pushed down to the storage engine for a single column in a query request. For more information, see the description of the BE configuration item max_pushdown_conditions_per_column. Default value: -1, which specifies that the configured value in the be.conf file is used. If you set this variable to a value greater than 0, the query in the current session uses the variable value and ignores the configured value in the be.conf file.

max_scan_key_num

The maximum number of scan keys that a scan node can split in a query request. For more information, see the description of the BE configuration item doris_max_scan_key_num. Default value: -1, which specifies that the configured value in the be.conf file is used. If you set this variable to a value greater than 0, the query in the current session uses the variable value and ignores the configured value in the be.conf file.

parallel_exchange_instance_num

The number of exchange nodes used by an upstream node to receive data from a downstream node in an execution plan. Default value: -1, which specifies that the number of exchange nodes is equal to the number of instances that are run on the downstream node.

If you set this variable to a value greater than 0 and less than the number of instances that are run on the downstream node, the number of exchange nodes is equal to the variable value.

In a distributed execution plan, an upstream node uses one or more exchange nodes to receive data from the instances that are run on different downstream BE nodes. In most cases, the number of exchange nodes is equal to the number of instances that are run on downstream nodes.

In some aggregate query scenarios, if a large amount of data is scanned at the underlying layer but the amount of aggregated data is small, you can try to set this variable to a smaller value. This reduces the resource overhead of aggregate queries, such as an aggregate query based on the Duplicate key model.

analyze_timeout

The timeout period of the ANALYZE statement. Unit: seconds. Default value: 43200.

auto_analyze_end_time

The end time of automatic statistics collection.

auto_analyze_start_time

The start time of automatic statistics collection.

auto_analyze_table_width_threshold

The maximum number of columns in a table to which automatic statistics collection applies. If the number of columns in a table exceeds this limit, the table does not participate in automatic statistics collection. Default value: 70.

div_precision_increment

The number of digits by which you want to increase the scale of the result of division operations performed by using the / operator. Default value: 4.

external_agg_bytes_threshold

The maximum memory used by the aggregate operation in a query. If the memory usage exceeds this limit, in-memory data is temporarily written to a disk.

external_sort_bytes_threshold

The maximum memory used by the sort operation in a query. If the memory usage exceeds this limit, in-memory data is temporarily written to a disk.

external_table_auto_analyze_interval_in_millis

The minimum interval at which automatic statistics collection is performed for external tables. During the specified interval, automatic statistics collection is performed only once for external tables. Unit: millisecond. Default value: 86400000.

fragment_transmission_compression_codec

The compression format to be used during data transmission. Default value: lz4.

huge_table_auto_analyze_interval_in_millis

The minimum interval at which automatic statistics collection is performed for large tables. During the specified interval, automatic statistics collection is performed only once for tables whose size is larger than five times the threshold specified by the huge_table_lower_bound_size_in_bytes variable. Default value: 0.

huge_table_default_sample_rows

The number of rows to be sampled for large tables. Default value: 4194304.

huge_table_lower_bound_size_in_bytes

The minimum size of the table to be sampled for automatic statistics collection. If the size of a table exceeds this threshold, the table is sampled during automatic statistics collection. Default value: 0.

ignore_runtime_filter_ids

The runtime filters to be ignored.

By default, this variable is empty.

max_filter_ratio

The maximum ratio of data that can be filtered out. For example, data is filtered out because it does not conform to standards. Default value: 0.

max_instance_num

The maximum number of Pipeline tasks that can be concurrently run for an SQL query if the parallel_pipeline_task_num variable is not configured. Default value: 64.

parallel_pipeline_task_num

The number of Pipeline tasks that are concurrently run for an SQL query. Default value: 0.

If you set this variable to 0, ApsaraDB for SelectDB automatically detects the number of CPU cores on each BE node and sets the number of concurrent Pipeline tasks to half the number of CPU cores, which cannot exceed the value of the max_instance_num variable.

runtime_filter_type

The type of the runtime filter to be used. The following runtime filters are available: Bloom filter, MinMax filter, IN predicate, IN or Bloom filter, and bitmap filter. By default, the IN or Bloom filter is used. In some cases, the performance is improved if you use the Bloom filter, MinMax filter, and IN predicate simultaneously.

runtime_filter_mode

The pushdown policy of the runtime filter. The OFF, LOCAL, and GLOBAL policies are available. By default, the GLOBAL policy is used.

runtime_filter_wait_time_ms

The duration that the scan node of the left table waits for each runtime filter. Unit: millisecond. Default value: 1000.

runtime_filters_max_num

The maximum number of Bloom filters that can be used in each query. Default value: 10.

runtime_bloom_filter_min_size

The minimum length of the Bloom filter. Unit: bytes. Default value: 1048576, which specifies 1 MB.

runtime_bloom_filter_max_size

The maximum length of the Bloom filter. Unit: bytes. Default value: 16777216, which specifies 16 MB.

runtime_bloom_filter_size

The default length of the Bloom filter. Unit: bytes. Default value: 2097152, which specifies 2 MB.

runtime_filter_max_in_num

The maximum number of rows in the right table. If the number of rows in the right table exceeds this limit, no IN predicate can be generated. Default value: 1024.

parallel_fragment_exec_instance_num

The number of instances that process scan ranges on each BE node. Default value: 1.

An execution plan generates a set of scan ranges to specify the range of data to be scanned. The data is distributed across multiple BE nodes. A BE node may have one or more scan ranges. By default, a set of scan ranges on each BE node is processed by only one instance. If machine resources are sufficient, you can modify this variable to allow more instances to process a set of scan ranges on each BE node. This improves query efficiency.

The number of instances that process scan ranges determines the number of other upstream nodes, such as aggregate nodes and join nodes. This also increases the concurrency of the execution plan. You can modify this variable to improve the efficiency of large queries. However, the increased concurrency consumes more machine resources, such as CPU, memory, and disk I/O.

query_timeout

The timeout period of a query.

This variable applies to all query statements in the current connection. We recommend that you use the insert_timeout variable for the INSERT statement. Unit: seconds. Default value: 900, which specifies 15 minutes.

insert_timeout

The timeout period of the INSERT statement. This variable applies only to the INSERT statement. We recommend that you use this variable if it takes an extended period of time to execute the INSERT statement.

Unit: seconds. Default value: 14400, which specifies 4 hours.

In earlier versions, the value of the query_timeout variable can be increased to prevent the timeout of the INSERT statement. To be compatible with user habits in earlier versions, the insert_timeout variable does not take effect if its value is smaller than the value of the query_timeout variable.

send_batch_parallelism

The default number of concurrent tasks that send the data to be processed in batches for the INSERT statement. If the value of this variable exceeds the value of the BE configuration item max_send_batch_parallelism_per_job, the Coordinator BE node uses the value of max_send_batch_parallelism_per_job.

sql_mode

The SQL mode, which helps accommodate different styles of SQL syntax and data validation strictness. For more information, see SQL Mode.

sql_select_limit

The default number of rows returned by the SELECT statement, including the SELECT clause of the INSERT statement.

By default, no limit is set.

time_zone

The time zone of the current session. The default value equals the value of the system_time_zone variable. This time zone affects the results of some time functions. For more information, see the Time zone variables section of this topic.

wait_timeout

The timeout period of an idle connection. If the idle connection does not interact with ApsaraDB for SelectDB within the timeout period, ApsaraDB for SelectDB closes the connection.

Unit: seconds. Default value: 28800, which specifies 8 hours.

default_rowset_type

The default storage format of the storage engine on compute nodes.

Valid values: alpha and beta. Default value: beta.

use_v2_rollup

Specifies whether to obtain data by using ROLLUP indexes in the Segment V2 storage format during queries.

This variable is used for verification when the Segment V2 feature is used. In other cases, we recommend that you do not use this variable.

prefer_join_method

The preferred join method between Broadcast Join and Shuffle Join if the cost of Broadcast Join equals the cost of Shuffle Join.

Valid values: broadcast and shuffle.

insert_visible_timeout_ms

The timeout period during which ApsaraDB for SelectDB waits for the transaction to commit and data becomes visible after the query and insert operations of the INSERT statement are complete.

Unit: millisecond. Default value: 10000. The minimum value is 1000.

cpu_resource_limit

The resource overhead of a query. This feature is experimental.

This variable limits the number of scan threads for a query on a single node. After the number of scan threads is limited, the underlying layer returns data at a lower speed. This way, the overall computing resource overhead of the query can be limited. If you set this variable to 2, a query can use up to two scan threads on a single node.

This variable overrides the parallel_fragment_exec_instance_num variable. If the parallel_fragment_exec_instance_num variable is set to 4 and this variable is set to 2, four instances on a single node share up to two scan threads.

This variable is overridden by the cpu_resource_limit user property.

Default value: -1, which specifies no limits.

return_object_data_as_binary

Specifies whether to return data in the bitmap or HLL format in the results of the SELECT statement.

After you execute the SELECT INTO OUTFILE statement, if the exported file is in the CSV format, data in the bitmap or HLL format is encoded in Base64. If the exported file is in the Parquet format, data in the bitmap or HLL format is stored as a byte array.

The following sample code provides an example in Java. For more information, see samples.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.roaringbitmap.RoaringBitmap;

// A minimal wrapper class is added so that the fragment can be compiled and run.
public class ReadBitmapExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:mysql://127.0.0.1:9030/test?user=root");
             Statement stmt = conn.createStatement()
        ) {
            stmt.execute("set return_object_data_as_binary=true"); // IMPORTANT!!!
            ResultSet rs = stmt.executeQuery("SELECT uids FROM t_bitmap");
            while (rs.next()) {
                byte[] bytes = rs.getBytes(1);
                RoaringBitmap bitmap32 = new RoaringBitmap();
                switch (bytes[0]) {
                    case 0: // empty bitmap
                        break;
                    case 1: // only one element in bitmap32
                        bitmap32.add(ByteBuffer.wrap(bytes, 1, bytes.length - 1)
                                .order(ByteOrder.LITTLE_ENDIAN)
                                .getInt());
                        break;
                    case 2: // more than one element in bitmap32
                        bitmap32.deserialize(ByteBuffer.wrap(bytes, 1, bytes.length - 1));
                        break;
                    // for more details, see https://github.com/apache/doris/tree/master/samples/read_bitmap
                }
            }
        }
    }
}

block_encryption_mode

The block encryption mode. Default value: NULL.

A value such as AES_128_ECB specifies the Advanced Encryption Standard (AES) algorithm. A value such as SM4_128_ECB specifies the ShangMi 4 (SM4) algorithm.

Valid values:

AES_128_ECB, AES_192_ECB, AES_256_ECB, AES_128_CBC, AES_192_CBC, AES_256_CBC, AES_128_CFB, AES_192_CFB, AES_256_CFB, AES_128_CFB1, AES_192_CFB1, AES_256_CFB1, AES_128_CFB8, AES_192_CFB8, AES_256_CFB8, AES_128_CFB128, AES_192_CFB128, AES_256_CFB128, AES_128_CTR, AES_192_CTR, AES_256_CTR, AES_128_OFB, AES_192_OFB, AES_256_OFB, SM4_128_ECB, SM4_128_CBC, SM4_128_CFB128, SM4_128_OFB, and SM4_128_CTR.

default_password_lifetime

The default expiration time of a password.

Default value: 0, which specifies that the password never expires. Unit: day.

This variable takes effect only if the PASSWORD_EXPIRE policy of a user is set to DEFAULT. Examples:

CREATE USER user1 IDENTIFIED BY "12345" PASSWORD_EXPIRE DEFAULT;
ALTER USER user1 PASSWORD_EXPIRE DEFAULT;

password_history

The default number of historical passwords.

Default value: 0, which specifies no limits.

This variable takes effect only if the PASSWORD_HISTORY policy of a user is set to DEFAULT. Examples:

CREATE USER user1 IDENTIFIED BY "12345" PASSWORD_HISTORY DEFAULT;
ALTER USER user1 PASSWORD_HISTORY DEFAULT;

validate_password_policy

The password strength verification policy.

Default value: NONE or 0, which specifies no verification. Valid values: STRONG and 2.

If you set this variable to STRONG or 2, the password that you set by executing the ALTER USER or SET PASSWORD statement must meet the following requirements: The password must be at least eight characters in length. The password must contain at least three of the following character types: uppercase letters, lowercase letters, digits, and special characters.

Special characters include: ~ ! @ # $ % ^ & * ( ) _ + | < > , . ? / : ; ' [ ] { } "
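
For example, the following statement enables strong password verification. This variable takes effect only globally, as described in the Configure variables section:

SET GLOBAL validate_password_policy = STRONG;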

rewrite_or_to_in_predicate_threshold

The default number of OR operators for rewriting OR to IN.

Default value: 2, which specifies that two OR operators can be rewritten as the IN predicate.

file_cache_base_path

The storage path of the block file cache on the BE node. Default value: random, which specifies that a storage path is randomly selected from the cache paths configured on the BE node.

topn_opt_limit_threshold

The threshold of the LIMIT clause for top N query optimization features. Default value: 1024.

For example, consider the statement SELECT * FROM t ORDER BY k LIMIT n. If n in the LIMIT clause is less than or equal to this threshold, top N query optimization features such as dynamic filtering and pushdown, two-phase retrieval of results, and data reading in key order are automatically enabled. Otherwise, these features are disabled.

use_fix_replica

Specifies whether to use a fixed replica for queries. The value starts from 0. If you set the use_fix_replica variable to 0, the replica with the smallest serial number is used. If you set the use_fix_replica variable to 1, the replica with the second smallest serial number is used.

Default value: -1, which specifies that this feature is disabled.

Compatible or reserved variables

This type of variable has no practical effect.

Variable

Description

SQL_AUTO_IS_NULL

This variable is used to ensure the compatibility with the JDBC connection pool c3p0. It has no practical effect.

auto_increment_increment

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. Although ApsaraDB for SelectDB supports auto-increment columns, this variable does not affect the behavior of auto-increment columns. This also applies to the auto_increment_offset variable.

autocommit

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

character_set_client

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

character_set_connection

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

character_set_results

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

character_set_server

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

collation_connection

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

collation_database

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

collation_server

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

have_query_cache

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

init_connect

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

interactive_timeout

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

language

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

max_allowed_packet

This variable is used to ensure the compatibility with the JDBC connection pool c3p0. It has no practical effect.

net_buffer_length

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

net_read_timeout

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

net_write_timeout

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

query_cache_size

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

query_cache_type

This variable is used to ensure the compatibility with the JDBC connection pool c3p0. It has no practical effect.

sql_safe_updates

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

tx_isolation

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

tx_read_only

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

transaction_read_only

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

transaction_isolation

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

version

This variable is used to ensure the compatibility with MySQL clients. It has no practical effect.

performance_schema

This variable is used to ensure the compatibility with MySQL JDBC 8.0.16 and later. It has no practical effect.

codegen_level

This variable is used to set the level of LLVM codegen. It does not take effect.

license

This variable is used to display the license of ApsaraDB for SelectDB. It has no other effects.

resource_group

This variable is reserved.

system_time_zone

This variable is set to the system time zone when a cluster is initialized. This variable cannot be modified.

version_comment

This variable is used to display the version of ApsaraDB for SelectDB. This variable cannot be modified.

dump_nereids_memo

This variable is used for regression testing.

group_concat_max_len

This variable is used to ensure the compatibility with some business intelligence (BI) tools. It has no practical effect.

be_number_for_test

This variable has no practical effect.

workload_group

This variable is reserved.

Timeout settings

ApsaraDB for SelectDB allows you to configure timeout settings by using the query_timeout and insert_timeout variables, and the query_timeout and insert_timeout user properties.

Timeout settings take effect based on the following priorities in descending order: Session variables > User properties > Global variables > Default values. If timeout settings with a higher priority are not configured, those with a lower priority automatically take effect.

query_timeout is used to control the timeout period of all statements. insert_timeout is used to control the timeout period of the INSERT statement. When you execute the INSERT statement, the longer timeout period between those configured by query_timeout and insert_timeout is used.

The query_timeout and insert_timeout user properties are configured only by the admin user for a specific user. These two user properties are used to change the default timeout settings of the specified user and do not have quota semantics. The timeout settings configured in user properties take effect after a client is reconnected.
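
The following statements sketch the two configuration levels. The user name user1 is a placeholder, and the SET PROPERTY syntax is shown as an assumption for illustration:

-- Session variables take effect only for the current session.
SET query_timeout = 1800;
SET insert_timeout = 7200;

-- User properties are configured by the admin user for a specific user.
SET PROPERTY FOR 'user1' 'query_timeout' = '1800';
SET PROPERTY FOR 'user1' 'insert_timeout' = '7200';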

References