This topic describes how to manage and use variables in ApsaraDB for SelectDB, including how to configure time zone variables, to help you meet your business requirements.
Variables
The variables of ApsaraDB for SelectDB are divided into those specific to ApsaraDB for SelectDB and those compatible with MySQL client protocols. For more information, see Appendix.
Query and configure variables
Query variables
You can execute the SHOW VARIABLES [LIKE 'xxx']; statement to query all or specified variables. Sample code:
SHOW VARIABLES;
SHOW VARIABLES LIKE '%time_zone%';
Configure variables
You can configure global variables, which take effect globally. You can also configure session variables, which take effect only for the current session.
Session variables: If you want to configure a variable that takes effect only for the current session, you can execute the SET var_name=xxx; statement. Sample code:
SET exec_mem_limit = 137438953472;
SET forward_to_master = true;
SET time_zone = "Asia/Shanghai";Global variables: If you want to configure a variable that takes effect globally, you can execute the SET GLOBAL var_name=xxx; statement. After a global variable takes effect, the configured value is used in the current and subsequent sessions. Sample code:
SET GLOBAL exec_mem_limit = 137438953472;
Only the admin user can configure global variables.
The following variables can be configured to take effect globally or only for the current session:
time_zone, wait_timeout, sql_mode, enable_profile, query_timeout, insert_timeout, exec_mem_limit, batch_size, allow_partition_column_nullable, insert_visible_timeout_ms, enable_fold_constant_by_be
The following variables can be configured to take effect only globally:
default_rowset_type, default_password_lifetime, password_history, validate_password_policy
In addition, you can use constant expressions to configure variables. Sample code:
SET exec_mem_limit = 10 * 1024 * 1024 * 1024;
SET forward_to_master = concat('tr', 'u', 'e');
Configure variables in query statements
In some scenarios, you may need to configure variables for specific queries. You can use the SET_VAR hint to configure session variables in a query. The configured variables take effect only for a single statement. Sample code:
SELECT /*+ SET_VAR(exec_mem_limit = 8589934592) */ name FROM people ORDER BY name;
SELECT /*+ SET_VAR(query_timeout = 1, enable_partition_cache=true) */ sleep(3);
The hint comment must start with /*+ and can only be placed after the SELECT keyword.
Time zone variables
As a critical type of variable, time zone variables affect many functions and variable values in ApsaraDB for SelectDB. To meet different business requirements in various regions, ApsaraDB for SelectDB supports custom time zone settings.
ApsaraDB for SelectDB provides the following two time zone variables:
system_time_zone: the system time zone of a cluster. When a server starts, this variable is automatically configured based on the time zone of the server. The configured time zone cannot be modified.
time_zone: the current time zone of a cluster. This time zone can be modified.
Query and configure time zone variables
Query time zone variables. Sample code:
SHOW VARIABLES LIKE '%time_zone%';
Configure the time zone of a session. Sample code:
SET [GLOBAL] time_zone = 'Asia/Shanghai';
If the GLOBAL keyword is used, the configured time zone also takes effect for all subsequent sessions of the current ApsaraDB for SelectDB cluster.
Time zone formats
You can specify the values of time zone variables in multiple formats. ApsaraDB for SelectDB supports the following standard formats:
Standard named formats, such as Asia/Shanghai and America/Los_Angeles.
Standard offset formats, such as +02:30 and -10:00.
Abbreviated formats. The following formats are supported:
Greenwich Mean Time (GMT) and Coordinated Universal Time (UTC), which are equivalent to the +00:00 time zone.
Central Standard Time (CST), which is equivalent to the Asia/Shanghai time zone.
Single letter Z, which represents the Zulu time zone and is equivalent to the +00:00 time zone.
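The following statements show, as a minimal sketch, how each of the preceding formats can be used to set the time_zone variable:
SET time_zone = 'Asia/Shanghai';   -- standard named format
SET time_zone = '+02:30';          -- standard offset format
SET time_zone = 'UTC';             -- abbreviated format, equivalent to +00:00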
Due to differences in the implementation of data import methods, ApsaraDB for SelectDB may support other time zone formats for specific data import methods. We recommend that you do not rely on these formats in the production environment because ApsaraDB for SelectDB may change how it processes them at any time.
Impacts of time zone settings
Functions
Time zone settings affect the values returned by time functions such as NOW() and CURTIME(), and the time values displayed in the SHOW LOAD and SHOW BACKENDS statements. However, time zone settings do not affect the LESS THAN values of time-type partition columns in the CREATE TABLE statement, and do not affect the display of values that are stored as the DATE or DATETIME type.
The following functions are affected by time zone settings:
FROM_UNIXTIME: returns the date and time in the specified time zone based on the specified UTC timestamp. If the CST time zone is used, FROM_UNIXTIME(0) returns 1970-01-01 08:00:00.
UNIX_TIMESTAMP: returns the UTC timestamp based on the specified date and time in the specified time zone. If the CST time zone is used, UNIX_TIMESTAMP('1970-01-01 08:00:00') returns 0.
CURTIME: returns the current time in the specified time zone.
NOW: returns the current date and time in the specified time zone.
CONVERT_TZ: converts the date and time in the specified time zone to those in another time zone.
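The following minimal sketch shows how the time_zone setting changes the results of these functions. The CST results are taken from the descriptions above, and the +00:00 results follow from the same UTC timestamp:
SET time_zone = 'Asia/Shanghai';                 -- CST, UTC+08:00
SELECT FROM_UNIXTIME(0);                         -- returns 1970-01-01 08:00:00
SELECT UNIX_TIMESTAMP('1970-01-01 08:00:00');    -- returns 0
SET time_zone = '+00:00';
SELECT FROM_UNIXTIME(0);                         -- returns 1970-01-01 00:00:00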
Values of time types
When you import data of the DATE, DATEV2, DATETIME, or DATETIMEV2 type, ApsaraDB for SelectDB supports time zone conversion.
If the data contains a time zone, such as 2020-12-12 12:12:12+08:00, and the current time zone of the ApsaraDB for SelectDB cluster is +00:00, the imported data is converted to 2020-12-12 04:12:12.
If the data does not contain a time zone, such as 2020-12-12 12:12:12, the time is considered an absolute time and the imported data is not converted.
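The following sketch illustrates this conversion. The table name dt_demo and its single DATETIME column are hypothetical, and the results assume that the current time zone of the cluster is +00:00, as in the preceding description:
SET time_zone = '+00:00';
-- The value contains a time zone and is converted to the current time zone of the cluster.
INSERT INTO dt_demo VALUES ('2020-12-12 12:12:12+08:00');   -- stored as 2020-12-12 04:12:12
-- The value does not contain a time zone and is treated as an absolute time.
INSERT INTO dt_demo VALUES ('2020-12-12 12:12:12');         -- stored as 2020-12-12 12:12:12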
Daylight saving time
Daylight saving time is essentially a change in the actual time offset of a named time zone that takes effect on specific dates.
For example, the America/Los_Angeles time zone has a daylight saving time adjustment that lasts approximately from March to November each year. In the America/Los_Angeles time zone, the time offset changes from -08:00 to -07:00 when daylight saving time starts in March, and changes from -07:00 to -08:00 when daylight saving time ends in November.
If you do not want daylight saving time to apply, set the time_zone variable to -08:00 instead of America/Los_Angeles.
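The following minimal sketch shows the difference between the named time zone and the fixed offset described above. During daylight saving time, the two settings produce results that differ by one hour:
-- Named time zone: the offset is -07:00 while daylight saving time is in effect and -08:00 otherwise.
SET time_zone = 'America/Los_Angeles';
SELECT NOW();
-- Fixed offset: always -08:00, regardless of daylight saving time.
SET time_zone = '-08:00';
SELECT NOW();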
Data sources
The time zone data contains time zone names, time offsets, and daylight saving time changes. On the machine where a backend (BE) node is deployed, the time zone data can be found in the following data sources:
The directory returned by the TZDIR command.
The /usr/share/zoneinfo directory.
The zoneinfo directory generated in the BE deployment directory of ApsaraDB for SelectDB.
ApsaraDB for SelectDB searches the preceding data sources in sequence for the time zone data. If the time zone data is found, it is used. If no time zone data is found, the BE node of ApsaraDB for SelectDB fails to start.
Best practices
Time zone-sensitive data
ApsaraDB for SelectDB is compatible with data imported from different time zones. How time values are processed is affected by the following three factors:
The current time zone of the cluster, which is configured by the time_zone session variable.
The time zone configured by the timezone header during data import by using a method such as Stream Load or Broker Load.
The literal time zone of the data, such as +08:00 in 2023-12-12 08:00:00+08:00.
In ApsaraDB for SelectDB, data of time types such as DATETIME does not contain a time zone. Data of time types in an ApsaraDB for SelectDB cluster can be divided into two categories: absolute time and time in a specific time zone.
Absolute time: The time format of the data does not contain a time zone, and no time zone is specified when the data is imported. Example: 2023-12-12 08:00:00. The scenario associated with the data is independent of a time zone. When data of this category is imported, it is not affected by time zone settings. If you run a function such as UNIX_TIMESTAMP based on data of this category, no time zone information can be obtained. In addition, if you modify the time_zone session variable, data of this category is not affected.
Time in a specific time zone: The time format of the data contains a time zone, such as "2020-12-12 12:12:12+02:00". When data of this category is imported, it is affected by time zone settings, and ApsaraDB for SelectDB processes the data based on these settings.
If the data is imported by using the INSERT INTO statement, the data is affected by the time_zone session variable.
If the data is imported by using a method such as Stream Load, the data is affected by the timezone header of Stream Load. If the setting of the timezone header conflicts with that of the time_zone session variable, the setting of the timezone header takes precedence.
The specific time zone refers to the current time zone of the cluster, which is configured by the time_zone session variable. This variable is configured before the data is imported and is no longer modified. When data of this category is imported, ApsaraDB for SelectDB converts the time in the literal time zone of the data to the time in the time zone configured by the time_zone session variable.
When you handle time in a specific time zone, we recommend that you do not modify the time_zone session variable. The following sample code provides an example:
mysql> SELECT @@time_zone;
+----------------+
| @@time_zone |
+----------------+
| Asia/Hong_Kong |
+----------------+
1 row in set (0.12 sec)
--- The literal time zone of the data to be imported is +02:00.
mysql> INSERT INTO dtv23 VALUES('2020-12-12 12:12:12+02:00');
Query OK, 1 row affected (0.27 sec)
mysql> SELECT * FROM dtv23;
--- The imported data is converted to the time in the Asia/Hong_Kong time zone, which is the current time zone of the ApsaraDB for SelectDB cluster. This time zone is also used for subsequent imports and queries.
+-------------------------+
| k0 |
+-------------------------+
| 2020-12-12 18:12:12.000 |
+-------------------------+
1 row in set (0.19 sec)
mysql> SET time_zone = 'America/Los_Angeles';
Query OK, 0 rows affected (0.15 sec)
mysql> SELECT * FROM dtv23;
--- If you modify the time_zone session variable, the time value does not change accordingly but its meaning varies based on the configured time zone. This causes confusion in subsequent queries.
+-------------------------+
| k0 |
+-------------------------+
| 2020-12-12 18:12:12.000 |
+-------------------------+
1 row in set (0.18 sec)
mysql> INSERT INTO dtv23 VALUES('2020-12-12 12:12:12+02:00');
Query OK, 1 row affected (0.17 sec)
mysql> SELECT * FROM dtv23;
--- The query results are incorrect.
+-------------------------+
| k0 |
+-------------------------+
| 2020-12-12 02:12:12.000 |
| 2020-12-12 18:12:12.000 |
+-------------------------+
2 rows in set (0.19 sec)
In summary, take note of the following items for the best practices of time zone settings:
Before you import data to a cluster, confirm the current time zone of the cluster and configure the time_zone session variable based on your business requirements. Do not modify this variable after you import data.
If you use a data import method, make sure that the setting of the timezone header is the same as that of the time_zone session variable.
For absolute time, import the data without a time zone suffix. For time in a specific time zone, import the data with a specific time zone suffix. ApsaraDB for SelectDB converts the time in the specific time zone to the time in the time zone configured by the time_zone session variable.
Daylight saving time
The start time and end time of daylight saving time are based on the data source of the current time zone and may not exactly match the dates officially confirmed by a time zone location for the current year. The time zone data is maintained by the Internet Corporation for Assigned Names and Numbers (ICANN). If you need the daylight saving time start and end dates that are officially confirmed by a time zone location for the current year, make sure that the data source used by ApsaraDB for SelectDB contains the latest time zone data published by ICANN. For more information about how to download the latest time zone data, see References.
Appendix
Supported variables
BOOLEAN-type variables
This type of variable is used to enable or disable a feature. Valid values are true and false.
Variable | Description |
delete_without_partition | Specifies whether to automatically apply the DELETE operation to all partitions when you execute the DELETE statement without specifying a partition. Default value: false. Important If the DELETE operation is automatically applied to all partitions, a large number of tasks are triggered and the DELETE operation takes longer to complete. We recommend that you do not enable this feature. |
disable_colocate_join | Specifies whether to disable the Colocation Join feature. Default value: false. A value of false specifies that the feature is enabled. A value of true specifies that the feature is disabled. If this feature is disabled, the execution plan does not attempt to perform a Colocation Join operation. |
enable_bucket_shuffle_join | Specifies whether to enable the Bucket Shuffle Join feature. Default value: true. A value of true specifies that the feature is enabled. A value of false specifies that the feature is disabled. If this feature is disabled, the execution plan does not attempt to perform a Bucket Shuffle Join operation. |
disable_streaming_preaggregations | Specifies whether to disable streaming pre-aggregation. Default value: false. A value of false specifies that the feature is enabled. This variable cannot be modified. |
enable_insert_strict | Specifies whether to enable the strict mode when you import data by using the INSERT statement. A value of false specifies that the strict mode is disabled. |
enable_spilling | Specifies whether to enable sorting for large amounts of data. Default value: false. A value of false specifies that the feature is disabled. This feature is enabled if you set the After this feature is enabled, the temporary data of a query is stored in the This feature is used to sort large amounts of data by using limited memory. Important This feature is experimental. We recommend that you do not enable this feature in the production environment. |
forward_to_master | Specifies whether to forward some SHOW statements to the master frontend (FE) node for execution. Default value: A value of true specifies that the feature is enabled. An ApsaraDB for SelectDB instance has multiple FE nodes, one of which is the master FE node. You can connect to one of the FE nodes to perform full-featured operations. However, after you execute some SHOW statements, detailed information can be obtained only from the master FE node. For example, if you do not forward the If you forward the following statements to the master FE node, more detailed information can be returned:
|
enable_profile | Specifies whether to display the profile of a query. Default value: false. A value of false specifies that the feature is disabled. By default, the BE sends the profile of a query to the FE to display errors only if an error occurs in the query. This requires a specific amount of network overhead and reduces the performance of high-concurrency queries. Therefore, the BE does not send the profile of a successful query to the FE. If you want to analyze the profile of a query, you can set this variable to true before you perform the query. After the query is complete, you can view the profile on the web page of the currently connected FE node by using the following URL:
The web page displays the most recent 100 queries after the |
rewrite_count_distinct_to_bitmap_hll | Specifies whether to rewrite COUNT(DISTINCT) queries on columns of the BITMAP or HLL type as the corresponding bitmap or HLL aggregate functions. |
allow_partition_column_nullable | Specifies whether to allow partition key columns to be defined as NULL when you create a table. Default value: true. A value of true specifies that partition key columns can be defined as NULL. A value of false specifies that partition key columns must be defined as NOT NULL. |
enable_exchange_node_parallel_merge | Specifies whether the exchange node starts multiple threads to merge data in parallel during the sorting process of a query. This reduces additional CPU and memory consumption. Default value: false. In a query whose results are sorted, when an upstream node receives data from a downstream node, the data is sorted on the exchange node to ensure that the final data is ordered. However, if the exchange node starts a single thread to merge large amounts of data from multiple downstream nodes, a single point of merging bottleneck may occur on the exchange node. To resolve this issue, ApsaraDB for SelectDB optimizes the merging process. If an upstream node receives data from multiple downstream nodes, the exchange node starts multiple threads to merge data in parallel to speed up the sorting process. |
extract_wide_range_expr | Specifies whether to enable wide common factors. Default value: false. |
enable_fold_constant_by_be | Specifies whether to perform the calculation of constant folding on the BE. Default value: false. A value of false specifies that the calculation is performed on the FE. A value of true specifies that the calculation is performed on the BE by using remote procedure call (RPC) requests. |
disable_join_reorder | Specifies whether to disable all automatic Join Reorder algorithms. Default value: false. A value of false specifies that all automatic Join Reorder algorithms are enabled. A value of true specifies that all automatic Join Reorder algorithms are disabled. In this case, a JOIN operation is performed based on the order of tables in SQL statements. |
enable_infer_predicate | Specifies whether to perform predicate derivation. Default value: true. A value of true specifies that predicate derivation is performed. A value of false specifies that ApsaraDB for SelectDB does not perform predicate derivation, but uses the original predicate to perform related operations. |
trim_tailing_spaces_for_external_table_query | Specifies whether to trim the trailing spaces of fields when you query Hive external tables. Default value: false. |
enable_inverted_index_query | Specifies whether to enable inverted indexes. Default value: true. |
skip_missing_version | Specifies whether to ignore missing versions. Default value: false. In some extreme scenarios, all replicas of tablets in a table have missing versions. These tablets cannot be restored. As a result, the entire table cannot be queried. This variable can be used to control the behavior of a query. If you set this variable to true, the query ignores the visible version of a tablet recorded on the FE and uses the replica version instead. If the replicas on the BE have missing versions, the query skips these missing versions and returns only the data of existing versions. In addition, the query always tries to select the replica of the latest successful version among all surviving replicas on the BE to restore as much data as possible. We recommend that you set this variable to true only in emergency scenarios where you need to restore the data of a table for queries. Important This variable conflicts with the use_fix_replica variable. This variable takes effect only if the use_fix_replica variable is set to -1. |
enable_parquet_lazy_materialization | Specifies whether to enable the late materialization technology for Parquet Reader. Default value: true. |
enable_orc_lazy_materialization | Specifies whether to enable the late materialization technology for ORC Reader. Default value: true. |
enable_strong_consistency_read | Specifies whether to enable strong consistency for data reads. Default value: true. By default, ApsaraDB for SelectDB supports strong consistency within the same session. The changes to data within the same session are visible in real time. If you want to enable strong consistency for data reads between sessions, set this variable to true. |
truncate_char_or_varchar_columns | Specifies whether to truncate columns of the CHAR or VARCHAR type based on the maximum length in the table schema. Default value: false. If you manage data by using an external table, the maximum length of columns of the CHAR or VARCHAR type in the schema of the table may be inconsistent with that in the schema of the underlying Parquet or ORC file. If you set this variable to true, ApsaraDB for SelectDB truncates columns of the CHAR or VARCHAR type based on the maximum length in the table schema. |
jdbc_clickhouse_query_final | Specifies whether to add the final keyword when you use the Java Database Connectivity (JDBC) catalog feature to query ClickHouse data. Default value: false. This variable is used to remove duplicates together with the ReplacingMergeTree table engine of ClickHouse. |
enable_memtable_on_sink_node | Specifies whether to enable the memory table on the Data Sink node during data import. Default value: false. If you set this variable to true, ApsaraDB for SelectDB creates a memory table on the Data Sink node and sends segments to other BE nodes by using bRPC streaming. This reduces duplicate work among replicas, and saves time for data serialization and deserialization. |
enable_unique_key_partial_update | Specifies whether to enable the semantics of partial column update for the INSERT INTO statement. Valid values:
Important
|
describe_extend_variant_column | Specifies whether to display sub-columns for columns of the VARIANT type when you query a table. Default value: false. |
dry_run_query | Specifies whether to return only the number of rows in the results of a query. Default value: false. If you set this variable to true, ApsaraDB for SelectDB no longer returns the actual result set for a query, but returns only the number of rows in the results. If you import or insert data, the Data Sink node drops the data instead of writing the data. This variable can be used to avoid the amount of time required to transmit the result set when you test a large number of datasets. ApsaraDB for SelectDB spends time only in executing the underlying query. |
drop_table_if_ctas_failed | Specifies whether to delete the created table if the CREATE TABLE AS SELECT (CTAS) statement fails. |
show_user_default_role | Specifies whether to display the implicit role of each user in the results of the SHOW ROLES statement. |
group_by_and_having_use_alias_first | Specifies whether the GROUP BY and HAVING clauses preferentially use column aliases rather than column names. |
enable_file_cache | Specifies whether to enable the block file cache. Default value: false. This variable takes effect only if it is set to |
enable_auto_analyze | Specifies whether to enable the automatic statistics collection feature. Default value: true. |
enable_cost_based_join_reorder | Specifies whether to enable Join Reorder to automatically adjust the order of tables when you perform a JOIN operation. Default value: false. |
enable_fallback_to_original_planner | Specifies whether to automatically fall back to the old optimizer based on the scenario after the new optimizer is enabled. Default value: false. |
enable_odbc_transcation | Specifies whether to enable the transaction feature for writing data to a JDBC catalog. Default value: false. |
enable_parallel_outfile | Specifies whether to enable parallel scan for the SELECT INTO OUTFILE statement. Default value: false. |
enable_sql_cache | Specifies whether to enable the cache for SQL statements. Default value: false. |
enable_unicode_name_support | Specifies whether to support the table and column names that contain Unicode characters. Default value: false. If you set this variable to true, Chinese column names are supported. |
enable_vectorized_engine | Specifies whether to enable the vectorized engine to accelerate queries. Default value: true. |
experimental_enable_nereids_planner | Specifies whether to enable the new optimizer. This feature is experimental. Default value: true. |
experimental_enable_pipeline_engine | Specifies whether to enable the Pipeline execution engine. This feature is experimental. Default value: true. |
experimental_enable_single_replica_insert | Specifies whether to enable the single-replica import feature. Default value: false. If you set this variable to true, ApsaraDB for SelectDB selects a replica from multiple replicas as the primary replica and performs computing only on the primary replica. Other replicas are secondary replicas. When all data files are written to the primary replica, the node on which each secondary replica resides can pull the data files from the primary replica. This synchronizes data between replicas. |
faster_float_convert | Specifies whether to enable the faster floating-point number conversion algorithm. Take note that this algorithm affects the output format. Default value: false. |
enable_page_cache | Specifies whether to enable the page-level cache. Default value: true. |
strict_mode | Specifies whether to enable the strict mode for data import. Default value: false. |
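As a minimal sketch, the following statements show how BOOLEAN-type variables from the preceding table can be configured for the current session or globally:
-- Collect the profile of subsequent queries in the current session.
SET enable_profile = true;
-- Perform constant folding on the BE for the current and subsequent sessions.
SET GLOBAL enable_fold_constant_by_be = true;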
Configuration variables
The value and valid values of a configuration variable vary based on the configuration.
Variable | Description |
auto_broadcast_join_threshold | The threshold for the maximum bytes of a table to be broadcast to all nodes when you perform a JOIN operation. A value of -1 specifies that the broadcast feature is disabled. ApsaraDB for SelectDB provides the If you use If you use If the small table contains a small amount of data, ApsaraDB for SelectDB automatically tries to use Broadcast Join. You can also explicitly specify the implementation method of each join operator. ApsaraDB for SelectDB provides the Note We recommend that you do not use this variable to change the join method. If you want to specify the join method to be used, we recommend that you use a hint, such as join[shuffle]. |
batch_size | The number of rows in a single packet transmitted by each node during a query. By default, a packet contains 1,024 rows. Each time the source node generates 1,024 rows of data, the data is packaged and sent to the destination node. A larger number of rows in a single packet increase the throughput of queries in scenarios where a large amount of data is scanned. However, a larger number of rows in a single packet may increase the query latency in scenarios where a small amount of data is queried. This also increases the memory overhead of queries. We recommend that you set this variable to a value in the range of 1024 to 4096. |
default_order_by_limit | The default limit on the number of rows returned for a query that contains the ORDER BY clause. Default value: -1, which specifies that the number of returned rows is not limited. The upper limit is the maximum value of the LONG type. |
exec_mem_limit | The maximum memory used by a single query. By default, a single query can use up to 2 GB of memory. You can specify the maximum memory in units of bytes, KB, MB, GB, TB, or PB. If you do not specify a unit, bytes are used by default. This variable is used to limit the memory that can be used by an instance of a fragment in an execution plan. An execution plan can be split into multiple instances. A BE node may run one or more instances. Therefore, this variable does not accurately limit the memory usage of a query in a cluster or the memory usage of a query on a BE node. The actual limit is determined based on the generated execution plan. In most cases, a large amount of memory is consumed only on some blocking nodes, such as sorting nodes, aggregate nodes, and join nodes. However, on other nodes, such as scan nodes, data is streamed and does not occupy a large amount of memory. If a |
lower_case_table_names | The case sensitivity of table names. Valid values:
|
max_pushdown_conditions_per_column | The maximum number of conditions that can be pushed down to the storage engine for a single column in a query request. For more information, see the description of the BE configuration item |
max_scan_key_num | The maximum number of scan keys that a scan node can split in a query request. For more information, see the description of the BE configuration item |
parallel_exchange_instance_num | The number of exchange nodes used by an upstream node to receive data from a downstream node in an execution plan. Default value: -1, which specifies that the number of exchange nodes is equal to the number of instances that are run on the downstream node. If you set this variable to a value greater than 0 and less than the number of instances that are run on the downstream node, the number of exchange nodes is equal to the variable value. In a distributed execution plan, an upstream node uses one or more exchange nodes to receive data from the instances that are run on different downstream BE nodes. In most cases, the number of exchange nodes is equal to the number of instances that are run on downstream nodes. In some aggregate query scenarios, if a large amount of data is scanned at the underlying layer but the amount of aggregated data is small, you can try to set this variable to a smaller value. This reduces the resource overhead of aggregate queries, such as an aggregate query based on the Duplicate key model. |
analyze_timeout | The timeout period of the ANALYZE statement. Unit: seconds. Default value: 43200. |
auto_analyze_end_time | The end time of automatic statistics collection. |
auto_analyze_start_time | The start time of automatic statistics collection. |
auto_analyze_table_width_threshold | The maximum number of columns in a table to which automatic statistics collection applies. If the number of columns in a table exceeds this limit, the table does not participate in automatic statistics collection. Default value: 70. |
div_precision_increment | The number of digits by which you want to increase the scale of the result of division operations performed by using the / operator. Default value: 4. |
external_agg_bytes_threshold | The maximum memory used by the aggregate operation in a query. If the memory usage exceeds this limit, in-memory data is temporarily written to a disk. |
external_sort_bytes_threshold | The maximum memory used by the sort operation in a query. If the memory usage exceeds this limit, in-memory data is temporarily written to a disk. |
external_table_auto_analyze_interval_in_millis | The minimum interval at which automatic statistics collection is performed for external tables. During the specified interval, automatic statistics collection is performed only once for external tables. Unit: millisecond. Default value: 86400000. |
fragment_transmission_compression_codec | The compression format to be used during data transmission. Default value: lz4. |
huge_table_auto_analyze_interval_in_millis | The minimum interval at which automatic statistics collection is performed for large tables. During the specified interval, automatic statistics collection is performed only once for tables whose size is larger than five times the threshold specified by the |
huge_table_default_sample_rows | The number of rows to be sampled for large tables. Default value: 4194304. |
huge_table_lower_bound_size_in_bytes | The minimum size of the table to be sampled for automatic statistics collection. If the size of a table exceeds this threshold, the table is sampled during automatic statistics collection. Default value: 0. |
ignore_runtime_filter_ids | The runtime filters to be ignored. By default, this variable is empty. |
max_filter_ratio | The maximum ratio of data that can be filtered out. For example, data is filtered out because it does not conform to standards. Default value: 0. |
max_instance_num | The maximum number of Pipeline tasks that can be concurrently run for an SQL query if the |
parallel_pipeline_task_num | The number of Pipeline tasks that are concurrently run for an SQL query. Default value: 0. If you set this variable to 0, ApsaraDB for SelectDB automatically detects the number of CPU cores on each BE node and sets the number of concurrent Pipeline tasks to half the number of CPU cores, which cannot exceed the value of the |
runtime_filter_type | The type of the runtime filter to be used. The following runtime filters are available: Bloom filter, MinMax filter, IN predicate, IN or Bloom filter, and bitmap filter. By default, the IN or Bloom filter is used. In some cases, the performance is improved if you use the Bloom filter, MinMax filter, and IN predicate simultaneously. |
runtime_filter_mode | The pushdown policy of the runtime filter. The OFF, LOCAL, and GLOBAL policies are available. By default, the GLOBAL policy is used. |
runtime_filter_wait_time_ms | The duration that the scan node of the left table waits for each runtime filter. Unit: millisecond. Default value: 1000. |
runtime_filters_max_num | The maximum number of Bloom filters that can be used in each query. Default value: 10. |
runtime_bloom_filter_min_size | The minimum length of the Bloom filter. Unit: bytes. Default value: 1048576, which specifies 1 MB. |
runtime_bloom_filter_max_size | The maximum length of the Bloom filter. Unit: bytes. Default value: 16777216, which specifies 16 MB. |
runtime_bloom_filter_size | The default length of the Bloom filter. Unit: bytes. Default value: 2097152, which specifies 2 MB. |
runtime_filter_max_in_num | The maximum number of rows in the right table. If the number of rows in the right table exceeds this limit, no IN predicate can be generated. Default value: 1024. |
parallel_fragment_exec_instance_num | The number of instances that process scan ranges on each BE node. Default value: 1. An execution plan generates a set of scan ranges to specify the range of data to be scanned. The data is distributed across multiple BE nodes. A BE node may have one or more scan ranges. By default, a set of scan ranges on each BE node is processed by only one instance. If machine resources are sufficient, you can modify this variable to allow more instances to process a set of scan ranges on each BE node. This improves query efficiency. The number of instances that process scan ranges determines the number of other upstream nodes, such as aggregate nodes and join nodes. This also increases the concurrency of the execution plan. You can modify this variable to improve the efficiency of large queries. However, the increased concurrency consumes more machine resources, such as CPU, memory, and disk I/O. |
query_timeout | The timeout period of a query. This variable applies to all query statements in the current connection. For the INSERT statement, we recommend that you use the insert_timeout variable. |
insert_timeout | The timeout period of the INSERT statement. This variable applies only to the INSERT statement. We recommend that you use this variable if it takes an extended period of time to execute the INSERT statement. Unit: seconds. Default value: 14400, which specifies 4 hours. In earlier versions, the value of the |
send_batch_parallelism | The default number of concurrent tasks that send the data to be processed in batches for the |
sql_mode | The SQL mode, which helps accommodate different styles of SQL syntax and data validation strictness. For more information, see SQL Mode. |
sql_select_limit | The default number of rows returned by the SELECT statement, including the SELECT clause of the INSERT statement. By default, no limit is set. |
time_zone | The time zone of the current session. The default value equals the value of the system_time_zone variable. |
wait_timeout | The timeout period of an idle connection. If the idle connection does not interact with ApsaraDB for SelectDB within the timeout period, ApsaraDB for SelectDB closes the connection. Unit: seconds. Default value: 28800, which specifies 8 hours. |
default_rowset_type | The default storage format of the storage engine on compute nodes. Valid values: alpha and beta. Default value: beta. |
use_v2_rollup | Specifies whether to obtain data by using ROLLUP indexes in the Segment V2 storage format during queries. This variable is used for verification when the Segment V2 feature is used. In other cases, we recommend that you do not use this variable. |
prefer_join_method | The preferred join method between Broadcast Join and Shuffle Join if the cost of Broadcast Join equals the cost of Shuffle Join. Valid values: broadcast and shuffle. |
insert_visible_timeout_ms | The timeout period during which ApsaraDB for SelectDB waits for the transaction to commit and data becomes visible after the query and insert operations of the INSERT statement are complete. Default value: 10000. The minimum value is 1000. |
cpu_resource_limit | The resource overhead of a query. This feature is experimental. This variable limits the number of scan threads for a query on a single node. After the number of scan threads is limited, the underlying layer returns data at a lower speed. This way, the overall computing resource overhead of the query can be limited. If you set this variable to 2, a query can use up to two scan threads on a single node. This variable overrides the This variable is overridden by the Default value: -1, which specifies no limits. |
return_object_data_as_binary | Specifies whether to return data in the bitmap or HLL format in the results of the SELECT statement. After you execute the SELECT INTO OUTFILE statement, if the exported file is in the CSV format, data in the bitmap or HLL format is encoded in Base64. If the exported file is in the Parquet format, data in the bitmap or HLL format is stored as a byte array. |
block_encryption_mode | The block encryption mode. Default value: NULL. A value of Valid values: |
default_password_lifetime | The default expiration time of a password. Default value: 0, which specifies that the password never expires. Unit: day. This variable takes effect only if the PASSWORD_EXPIRE policy of a user is set to DEFAULT. Examples: |
password_history | The default number of historical passwords. Default value: 0, which specifies no limits. This variable takes effect only if the PASSWORD_HISTORY policy of a user is set to DEFAULT. Examples: |
validate_password_policy | The password strength verification policy. Default value: NONE or 0, which specifies no verification. Valid values: STRONG and 2. If you set this variable to STRONG or 2, the password that you set by executing the Special characters include: |
rewrite_or_to_in_predicate_threshold | The default number of OR operators for rewriting OR to IN. Default value: 2, which specifies that two OR operators can be rewritten as the IN predicate. |
file_cache_base_path | The storage path of the block file cache on the BE. Default value: random, which specifies that the storage path of BE configurations is randomly selected. |
topn_opt_limit_threshold | The threshold of the Limit clause for top N query optimization features. Default value: 1024. In this example, the |
use_fix_replica | Specifies whether to use a fixed replica for queries. The value starts from 0. If you set the Default value: -1, which specifies that this feature is disabled. |
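The following sketch shows how some configuration variables from the preceding table can be set for a session or, by using the SET_VAR hint described earlier, for a single statement. The table name t in the last statement is hypothetical:
-- Limit a single query to 8 GB of memory. The value is specified in bytes.
SET exec_mem_limit = 8589934592;
-- Time out queries in the current session after 600 seconds.
SET query_timeout = 600;
-- Increase the number of instances that process scan ranges for a single statement.
SELECT /*+ SET_VAR(parallel_fragment_exec_instance_num = 2) */ COUNT(*) FROM t;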
Compatible or reserved variables
This type of variable has no practical effect.
Variable | Description |
SQL_AUTO_IS_NULL | This variable is used to ensure the compatibility with the JDBC connection pool c3p0. It has no practical effect. |
auto_increment_increment | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. Although ApsaraDB for SelectDB supports auto-increment columns, this variable does not affect the behavior of auto-increment columns. This also applies to the |
autocommit | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
character_set_client | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
character_set_connection | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
character_set_results | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
character_set_server | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
collation_connection | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
collation_database | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
collation_server | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
have_query_cache | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
init_connect | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
interactive_timeout | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
language | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
max_allowed_packet | This variable is used to ensure the compatibility with the JDBC connection pool c3p0. It has no practical effect. |
net_buffer_length | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
net_read_timeout | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
net_write_timeout | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
query_cache_size | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
query_cache_type | This variable is used to ensure the compatibility with the JDBC connection pool c3p0. It has no practical effect. |
sql_safe_updates | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
tx_isolation | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
tx_read_only | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
transaction_read_only | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
transaction_isolation | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
version | This variable is used to ensure the compatibility with MySQL clients. It has no practical effect. |
performance_schema | This variable is used to ensure the compatibility with MySQL JDBC 8.0.16 and later. It has no practical effect. |
codegen_level | This variable is used to set the level of LLVM codegen. It does not take effect. |
license | This variable is used to display the license of ApsaraDB for SelectDB. It has no other effects. |
resource_group | This variable is reserved. |
system_time_zone | This variable is set to the system time zone when a cluster is initialized. This variable cannot be modified. |
version_comment | This variable is used to display the version of ApsaraDB for SelectDB. This variable cannot be modified. |
dump_nereids_memo | This variable is used for regression testing. |
group_concat_max_len | This variable is used to ensure the compatibility with some business intelligence (BI) tools. It has no practical effect. |
be_number_for_test | This variable has no practical effect. |
workload_group | This variable is reserved. |
Timeout settings
ApsaraDB for SelectDB allows you to configure timeout settings by using the query_timeout and insert_timeout variables, and the query_timeout and insert_timeout user properties.
Timeout settings take effect based on the following priorities in descending order: Session variables > User properties > Global variables > Default values. If timeout settings with a higher priority are not configured, those with a lower priority automatically take effect.
query_timeout is used to control the timeout period of all statements. insert_timeout is used to control the timeout period of the INSERT statement. When you execute the INSERT statement, the longer timeout period between those configured by query_timeout and insert_timeout is used.
The query_timeout and insert_timeout user properties can be configured only by the admin user for a specific user. These two user properties change the default timeout settings of the specified user and do not have quota semantics. The timeout settings configured in user properties take effect after the client reconnects.
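As a minimal sketch, the following statements show how the timeout settings can be configured at each level. The username example_user is hypothetical, and the SET PROPERTY statement for configuring user properties is an assumption based on the admin-only user properties described above:
-- Session variables: take effect only for the current session.
SET query_timeout = 300;
SET insert_timeout = 7200;
-- Global variables: take effect for the current and subsequent sessions. Only the admin user can configure them.
SET GLOBAL query_timeout = 300;
-- User properties (assumed syntax): configured by the admin user for a specific user and take effect after the client reconnects.
SET PROPERTY FOR 'example_user' 'query_timeout' = '300';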
References
For more information about time zone formats, see List of tz database time zones.
For more information about the Internet Assigned Numbers Authority (IANA) time zone database, see IANA Time Zone Database.
For more information about the ICANN time zone database, see The tz-announce Archives.