All Products
Search
Document Center

E-MapReduce:Configuration items of backend nodes

Last Updated:May 12, 2023

This topic describes the configuration items of backend nodes.

Background information

In most cases, the be.conf configuration file of a backend node resides in the conf/ directory where the backend node is deployed. In V0.14, another configuration file named be_custom.conf is also used. The be_custom.conf configuration file is used to specify the dynamically added and persistent configuration items for the running backend node. After the backend node is run, the configuration items in the be.conf and be_custom.conf configuration files are read by sequence. If a configuration item appears in both configuration files, the content in the be_custom.conf configuration file prevails.

View configuration items

To view the current configuration items of the backend node, see the following webpage: http://be_host:be_webserver_port/varz

Add configuration items

You can add the configuration items of the backend node in one of the following methods:

  • Statically add configuration items

    Add configuration items to the be.conf configuration file. The configuration items are read when the backend node is run. Default settings are used for configuration items that are excluded from the be.conf configuration file.

  • Dynamically add configuration items

    After the backend node is run, run the following command to dynamically add configuration items:

    curl -X POST http://{be_ip}:{be_http_port}/api/update_config?{key}={value}'

In V0.13 and earlier, configuration items that are dynamically added become invalid when the backend node is rerun. In V0.14 and later, you can run the following command to make the dynamically added configuration items persist. The persistent configuration items are stored in the be_custom.conf configuration file.

curl -X POST http://{be_ip}:{be_http_port}/api/update_config?{key}={value}&persist=true

Examples

  • Statically add the max_base_compaction_concurrency configuration item. After you add the max_base_compaction_concurrency=5 configuration item to the be.conf file, rerun the backend node to apply the configuration item.

  • Dynamically add the streaming_load_max_mb configuration item. After the backend node is run, run the following command to dynamically add the configuration item:

    streaming_load_max_mb:curl -X POST http://{be_ip}:{be_http_port}/api/update_config?streaming_load_max_mb=1024

    If the following information is returned, the configuration item is added:

    {
        "status": "OK",
        "msg": ""
    }

    The configuration item becomes invalid after the backend node is rerun. To make the configuration item persist, run the following command:

    curl -X POST http://{be_ip}:{be_http_port}/api/update_config?streaming_load_max_mb=1024\&persist=true

Configuration Items

alter_tablet_worker_count

  • Default value: 3.

  • Description: the number of threads for schema changes.

generate_compaction_tasks_min_interval_ms

  • Default value: 10.

  • Description: the minimum interval at which compaction tasks are generated. Unit: milliseconds.

enable_vectorized_compaction

  • Default value: true.

  • Description: specifies whether to enable vectorized compaction.

base_compaction_interval_seconds_since_last_operation

  • Default value: 86400.

  • Description: the interval between the time of the last base compaction and the current time. The configuration item is one of the conditions that trigger base compaction.

base_compaction_num_cumulative_deltas

  • Default value: 5.

  • Description: the limit of Cumulative files. The configuration item is one of the conditions that trigger base compaction. Base compaction is triggered after the limit is reached.

base_compaction_write_mbytes_per_sec

  • Default value: 5.

  • Description: the maximum disk write speed per second of a base compaction task. Unit: MB.

base_cumulative_delta_ratio

  • Default value: 0.3.

  • Description: the ratio of Cumulative files whose sizes are equal to or greater than the sizes of Base files. The configuration item is one of the conditions that trigger base compaction.

base_compaction_trace_threshold

  • Default value: 10.

  • Type: int32.

  • Description: the threshold of logging the trace information of base compaction. Unit: seconds.

A base compaction task is a time-consuming background task. To trace the runtime information of the task, you can set the configuration item to manage the logging of the trace information. The following code provides an example on the trace information in the log file:

W0610 11:26:33.804431 56452 storage_engine.cpp:552] Trace:
0610 11:23:03.727535 (+     0us) storage_engine.cpp:554] start to perform base compaction
0610 11:23:03.728961 (+  1426us) storage_engine.cpp:560] found best tablet 546859
0610 11:23:03.728963 (+     2us) base_compaction.cpp:40] got base compaction lock
0610 11:23:03.729029 (+    66us) base_compaction.cpp:44] rowsets picked
0610 11:24:51.784439 (+108055410us) compaction.cpp:46] got concurrency lock and start to do compaction
0610 11:24:51.784818 (+   379us) compaction.cpp:74] prepare finished
0610 11:26:33.359265 (+101574447us) compaction.cpp:87] merge rowsets finished
0610 11:26:33.484481 (+125216us) compaction.cpp:102] output rowset built
0610 11:26:33.484482 (+     1us) compaction.cpp:106] check correctness finished
0610 11:26:33.513197 (+ 28715us) compaction.cpp:110] modify rowsets finished
0610 11:26:33.513300 (+   103us) base_compaction.cpp:49] compaction finished
0610 11:26:33.513441 (+   141us) base_compaction.cpp:56] unused rowsets have been moved to GC queue
Metrics: {"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"input_rowsets_data_size":1256413170,"input_segments_num":44,"merge_rowsets_latency_us":101574444,"merged_rows":0,"output_row_num":3346807,"output_rowset_data_size":1228439659,"output_segments_num":6}

be_port

  • Default value: 9060.

  • Type: int32.

  • Description: the port of the thrift server on the backend node to receive requests from the frontend node.

be_service_threads

  • Default value: 64.

  • Type: int32.

  • Description: the number of execution threads of the thrift server service on the backend node, which represents the number of threads that can be used to execute requests from the frontend node.

brpc_max_body_size

Description: This configuration item is used to modify the max_body_size parameter of bRPC. Sometimes a query fails and the error message "body_size is too large" appears in the log file of the backend node. This may occur when the SQL mode is "multi distinct, no GROUP BY condition, and having more than 1 TB of data". This error indicates that the package size of bRPC exceeds the configured value. In this case, you can prevent this error by increasing the value of this configuration item.

brpc_socket_max_unwritten_bytes

Description: This configuration item is used to modify the socket_max_unwritten_bytes parameter of bRPC. Sometimes a query fails and the error message "The server is overcrowded" appears in the log file of the backend node, which indicates a large number of messages buffer at the sender side. This may occur when the SQL needs to send a large bitmap value. You can prevent this error by increasing the value of this configuration item.

transfer_large_data_by_brpc

  • Default value: true.

  • Type: bool.

  • Description: specifies whether to serialize the protoBuf request and nest the Tuple/Block data into the controller attachment before you send it through HTTP bRPC when the length of the Tuple/Block data is greater than 1.8 GB. When the length of the protoBuf request exceeds 2 GB, the following error message occurs: Bad request, error_text=[E1003]Fail to compress request. To prevent this error, in the previous version, the Tuple/Block data is sent through the default baidu_std bRPC after it is put in the attachment. However, when the size of the attachment exceeds 2 GB, the Tuple/Block data is truncated and no limit is placed on the length of the protoBuf request.

brpc_num_threads

Description: the number of threads for bRPC. Default value: -1, which means the number of threads equals the number of CPU cores. To ensure good queries-per-second (QPS) performance, you can set this configuration item to a greater value. For more information, see Incubator-brpc at GitHub.

brpc_port

  • Default value: 8060.

  • Type: int32.

  • Description: the port of bRPC on the backend node, which is used for communication between backend nodes.

buffer_pool_clean_pages_limit

  • Default value: 20.

  • Description: the number of pages to be cleaned and may be saved in the buffer pool.

buffer_pool_limit

  • Default value: 20%.

  • Type: string.

  • Description: the largest proportion of allocatable memories in the buffer pool.

The buffer pool is a new memory management structure of the backend node. The buffer pool uses buffer pages to manage memories and spills data to disk. The memories for all concurrent queries are allocated from the buffer pool. The current buffer pool works only on AggregationNode and ExchangeNode.

check_auto_compaction_interval_seconds

  • Default value: 5.

  • Type: int32.

  • Description: the interval at which the disenabling of automatic compaction is checked when automatic compaction is disabled.

check_consistency_worker_count

  • Default value: 1.

  • Description: the number of worker threads for calculating the checksum of the tablet.

chunk_reserved_bytes_limit

  • Default value: 20%.

  • Type: int32.

  • Description: the reserved bytes limit of Chunk Allocator, which is usually set as a percentage of mem_limit. Unit: byte. The value must be a positive multiple of 2. If the value is greater than the physical memory size, the physical memory size is set to this value. If you increase the value of this configuration item, the performance is increased but more memories that cannot be used by other modules are idled.

clear_transaction_task_worker_count

  • Default value: 1.

  • Description: the number of threads that are used to clean up transactions.

clone_worker_count

  • Default value: 3.

  • Type: int32.

  • Description: the number of threads that are used to run cloning tasks.

cluster_id

  • Default value: -1.

  • Type: int32.

  • Description: the ID of the cluster to which the backend node belongs.

The value of this configuration item is usually delivered from the frontend node to the backend node by the heartbeat. You do not need to set this configuration item. The configuration item can be set only after a Doris cluster is specified for the backend node. Make sure that the content in the cluster_id file under the data directory is changed to the value of this configuration item.

column_dictionary_key_ratio_threshold

  • Default value: 0.

  • Description: the threshold for the ratio of string values. If the ratio is less than the threshold, the dictionary compression algorithm is used.

column_dictionary_key_size_threshold

  • Default value: 0.

  • Description: the threshold for the size of the dictionary compression column. If the size is less than the threshold, the dictionary compression algorithm is used.

compaction_tablet_compaction_score_factor

  • Default value: 1.

  • Type: int32.

  • Description: the compaction score when you calculate the tablet score to find a tablet for compaction.

compaction_tablet_scan_frequency_factor

  • Default value: 0.

  • Type: int32.

  • Description: the tablet scan frequency when you calculate the tablet score to find a tablet for compaction.

This configuration item can be taken into consideration when you select a tablet for compaction and preferentially compact tablets that are frequently scanned within the latest time period. The tablet score can be calculated based on the following formula:

tablet_score = compaction_tablet_scan_frequency_factor tablet_scan_frequency + compaction_tablet_compaction_score_factor compaction_score

compaction_task_num_per_disk

  • Default value: 2.

  • Type: int32.

  • Description: the number of compaction tasks that are run in parallel for a hard disk drive (HDD).

compaction_task_num_per_fast_disk

  • Default value: 4.

  • Type: int32.

  • Description: the number of compaction tasks that are run in parallel for a fast solid state drive (SSD).

compress_rowbatches

  • Default value: true.

  • Type: bool.

  • Description: specifies whether to use the Snappy compression algorithm for data compression when RowBatch is serialized.

create_tablet_worker_count

  • Default value: 3.

  • Description: the number of worker threads used by the backend node to create a tablet.

cumulative_compaction_rounds_for_each_base_compaction_round

  • Default value: 9.

  • Type: int32.

  • Description: the number of rounds of cumulative compaction that are consecutively generated before each round of base compaction is generated.

disable_auto_compaction

  • Default value: false.

  • Type: bool.

  • Description: specifies whether to disable automatic compaction.

In most cases, automatic compaction needs to be disabled. If you want to manually run compaction tasks in the debugging or test environment, disable automatic compaction.

cumulative_compaction_budgeted_bytes

  • Default value: 104857600.

  • Description: one of the conditions that trigger base compaction. The Singleton file can be up to 100 MB in size.

cumulative_compaction_trace_threshold

  • Default value: 2.

  • Type: int32.

  • Description: the threshold for logging the trace information of cumulative compaction. Unit: seconds.

disable_compaction_trace_log

  • Default value: true.

  • Type: bool.

  • Description: specifies whether to disable the trace log of compaction.

If you set this configuration item to true, the cumulative_compaction_trace_threshold and base_compaction_trace_threshold configuration items do not take effect and the trace log of compaction is disabled.

cumulative_compaction_policy

  • Default value: size_based.

  • Type: string.

  • Description: the merge policy of cumulative compaction. Currently, the num_based and size_based merge policies are implemented. Valid values:

    • ordinary: the initial version of the cumulative compaction merge policy. After the cumulative compaction stage, the base compaction stage is directly entered.

    • size_base: an optimized version of the ordinary strategy. Versions are merged only if the disk volume of the rowset is of the same level of magnitude. After the versions are merged, the rowset that satisfies the conditions is promoted to the base compaction stage. If a large number of small-batch imports are performed, the write magnification of base compaction is reduced, read magnification and space magnification are balanced, and the data of file versions is reduced.

cumulative_size_based_promotion_size_mbytes

  • Default value: 1024.

  • Type: int64.

  • Description: If the size_based policy is implemented and the total disk size of the output rowset of cumulative compaction exceeds the value of this configuration item, the rowset is used for base compaction. Unit: MB.

In most cases, this configuration item is set to a value less than 2048 (2 GB). This prevents a version backlog caused by long-time cumulative compaction.

cumulative_size_based_promotion_ratio

  • Default value: 0.05.

  • Type: double.

  • Description: If the size_based policy is implemented and the total disk size of the output rowset of cumulative compaction exceeds the value of this configuration item, the rowset is used for base compaction.

In most cases, we recommend that you set this configuration item to a value from 0.02 to 0.1.

cumulative_size_based_promotion_min_size_mbytes

  • Default value: 64.

  • Type: int32.

  • Description: If the size_based policy is implemented and the total disk size of the output rowset of cumulative compaction is smaller than the value of this configuration item, the rowset is not used for base compaction and remains in the cumulative compaction stage. Unit: MB.

In most cases, we recommend that you set this configuration item to a value less than 512. If the value you set is too large, base compaction is not performed.

cumulative_size_based_compaction_lower_size_mbytes

  • Default value: 64.

  • Type: int64.

  • Description: If the size_based strategy is implemented, the cumulative compaction is merged, and the selected rowsets to be merged have a disk size larger than the value of this configuration item, the rowsets are divided and merged based on a level policy. If the disk size is smaller than the value of this configuration item, the rowsets are directly merged. Unit: MB.

In most cases, this configuration item is set to a value less than 128. If the value you set is too large, cumulative compaction writes are excessively amplified.

custom_config_dir

The directory to which the be_custom.conf file belongs. Default value: conf/. In some deployment environments, the conf/ directory may be overwritten due to system upgrades. In this case, the configuration items on persistence may be overwritten when the backend node is run. To resolve this problem, you can specify another directory for the be_custom.conf file to be stored.

default_num_rows_per_column_file_block

  • Default value: 1024.

  • Type: int32.

  • Description: the number of rows of data that are contained in a RowBlock.

default_rowset_type

  • Default value: BETA.

  • Type: string.

  • Description: the default storage format specified by the backend node. Valid values: ALPHA and BETA. This configuration item is set mainly for the following two purposes:

    • Specifying the storage format for the backend node when the storage_format parameter of the table is set to Default.

    • Specifying the storage format for the backend node when the backend node performs compaction.

delete_worker_count

  • Default value: 3.

  • Description: the number of threads that are used to run data deletion tasks.

disable_mem_pools

  • Default value: false.

  • Description: specifies whether to disable the memory cache pool. By default, the memory cache pool is disabled.

disable_storage_page_cache

  • Default value: false.

  • Type: bool.

  • Description: specifies whether to use page cache for index caching. This configuration item takes effect only if the default_rowset_type configuration item is set to BETA.

disk_stat_monitor_interval

  • Default value: 5.

  • Description: the interval between two consecutive checks for disk status. Unit: seconds.

doris_cgroups

Description: the cgroups assigned to Doris.

doris_max_pushdown_conjuncts_return_rate

  • Default value: 90.

  • Type: int32.

  • Description: When the backend node performs HashJoin, the backend node adopts a dynamic partitioning method to push the join condition to OlapScanner. If more than 32768 rows of data are scanned by OlapScanner, the backend node checks the filter condition. If the filter rate of the filter condition is lower than the value of this configuration item, Doris stops using the dynamic partition clipping condition for data filtering.

doris_max_scan_key_num

  • Default value: 1024.

  • Type: int.

  • Description: This configuration item is used to limit the maximum number of scan keys that a scan node can split in a query request. When a conditional query request reaches the scan node, the scan node splits the conditions related to the key column in the query condition into multiple scan key ranges. Then, these scan key ranges are assigned to multiple scanner threads for data scanning. A greater value indicates that more scanner threads can be used to increase the parallelism of the scanning operation. However, in high concurrency scenarios, too many threads may cause great scheduling overhead and system load, and may slow down the query response. An empirical value is 50.

In high concurrency scenarios, if the parallelism cannot be increased, you can reduce the value of this configuration item and observe the impact.

doris_scan_range_row_count

  • Default value: 524288.

  • Type: int32.

  • Description: When the backend node performs data scanning, the backend node splits the same scanning range into multiple ScanRanges. This configuration item specifies the scan data range of each ScanRange. You can set this configuration item to limit the time period during which a single OlapScanner occupies I/O threads.

doris_scanner_queue_size

  • Default value: 1024.

  • Type: int32.

  • Description: the length of the RowBatch buffer queue between TransferThread and OlapScanner. Doris asynchronously performs data scanning. The Rowbatch scanned by OlapScanner is placed in the scanner buffer queue and waits to be taken away by TransferThread in the upper layer.

doris_scanner_row_num

  • Default value: 16384.

  • Description: the maximum number of data rows returned by each scanning thread in a single execution.

doris_scanner_thread_pool_queue_size

  • Default value: 102400.

  • Type: int32.

  • Description: the queue length of the Scanner thread pool. In the scanning tasks of Doris, each Scanner is submitted to the thread pool as a thread task, waiting to be scheduled. After the number of submitted tasks exceeds the length of the thread pool queue, tasks that are subsequently submitted are blocked until an empty slot appears in the queue.

doris_scanner_thread_pool_thread_num

  • Default value: 48.

  • Type: int32.

  • Description: the number of threads in the Scanner thread pool. In the scanning tasks of Doris, each Scanner is submitted to the thread pool to be scheduled as a thread task. This configuration item determines the size of the Scanner thread pool.

download_low_speed_limit_kbps

  • Default value: 50.

  • Description: the minimum download speed. Unit: KB/s.

download_low_speed_time

  • Default value: 300.

  • Description: the limit of the download time. Unit: seconds.

download_worker_count

  • Default value: 1.

  • Description: the number of download threads.

drop_tablet_worker_count

  • Default value: 3.

  • Description: the number of threads that are used to delete tablets.

enable_metric_calculator

  • Default value: true.

  • Description: A value of true indicates that the metric calculator runs to collect the information about backend node-related indicators. A value of false indicates that the metric calculator does not run.

enable_partitioned_aggregation

  • Default value: true.

  • Type: bool.

  • Description: specifies whether the backend node implements the aggregation operation by PartitionAggregateNode. A value of false indicates that AggregateNode is executed to perform the aggregation. We recommend that you do not set this configuration item to false unless otherwise specified.

enable_prefetch

  • Default value: true.

  • Type: bool.

  • Description: specifies whether to perform a HashBuket prefetch when PartitionedHashTable is used for aggregation and join calculations. We recommend that you set this configuration item to true.

enable_quadratic_probing

  • Default value: true.

  • Type: bool.

  • Description: specifies whether to use the square detection method to resolve a Hash conflict when PartitionedHashTable is used. If the value is false, the linear detection method is used to resolve the Hash conflict. For information about square detection, see Quadratic probing at Wikipedia.

enable_system_metrics

  • Default value: true.

  • Description: specifies whether to turn on system indicators.

enable_token_check

  • Default value: true.

  • Description: specifies whether to be used for forward compatibility and removed later.

enable_stream_load_record

  • Default value: false.

  • Description: specifies whether to enable stream load record. By default, the feature is disabled.

es_http_timeout_ms

  • Default value: 5000.

  • Description: the timeout period for connecting to ES via HTTP. Unit: milliseconds.

es_scroll_keepalive

  • Default value: 5.

  • Description: the time period during which es scroll keeps alive. Unit: minutes.

etl_thread_pool_queue_size

  • Default value: 256.

  • Description: the size of the extract, transform, and load (ETL) thread pool.

exchg_node_buffer_size_bytes

  • Default value: 10485760.

  • Type: int32.

  • Description: the size of the Buffer queue of ExchangeNode. Unit: byte. After the size of data sent from the Sender side is larger than the Buffer size of ExchangeNode, data that is subsequently sent blocks until the Buffer frees up space for data writes.

file_descriptor_cache_capacity

  • Default value: 32768.

  • Description: the number of cached file handles.

cache_clean_interval

  • Default value: 1800.

  • Description: the interval at which file handle caches are cleaned. File handles that have not been used for a long time are cleaned. This configuration item also determines the interval at which Segment Cache is cleaned. Unit: seconds.

flush_thread_num_per_store

  • Default value: 2.

  • Description: the number of threads that are used to refresh the memory table in each store.

fragment_pool_queue_size

  • Default value: 2048.

  • Description: the upper limit of query requests that can be processed on a single node.

fragment_pool_thread_num_min

  • Default value: 64.

  • Description: the minimum number of threads in a fragment execution thread pool.

fragment_pool_thread_num_max

  • Default value: 256.

  • Description: the number of query threads. By default, a minimum of 64 threads are started. Subsequent query requests dynamically create threads, and a maximum of 256 threads are created.

heartbeat_service_port

  • Default value: 9050.

  • Type: int32.

  • Description: the heartbeat service port (thrift) on the backend node, which is used to receive heartbeats from the frontend node.

heartbeat_service_thread_count

  • Default value: 1.

  • Type: int32.

  • Description: the number of threads that execute the heartbeat service on the backend node. We recommend that you do not modify the default value.

ignore_broken_disk

  • Default value: false.

  • Description: When the backend node is run, the directories specified by the storage_root_path configuration item are checked.

    • Scenarios in which the value is true: If the directories do not exist or cannot be accessed because of disk damage, the directory is ignored. If other directories are available, the backend node can be normally run.

    • Scenarios in which the value is false: If the directories do not exist or cannot be accessed because of disk damage, the backend node is stopped.

ignore_load_tablet_failure

  • Default value: false.

  • Type: bool.

  • Description: specifies whether to ignore errors in the directories specified by the storage_root_path configuration item when the backend node is run.

When the backend node is run, a thread is started for each directory to load the metadata of the tablet header. By default, if the metadata fails to be loaded in a directory, the backend node is stopped and the following error message is returned to the be.INFO log:

load tablets from header failed, failed tablets size: xxx, path=xxx

The "failed tablets size: xxx" information indicates the number of tablets whose metadata fails to be loaded. The log also records the details of the tablets. Manual intervention is required to troubleshoot the error. You can use one of the following methods to resolve the error:

  • If the metadata cannot be repaired and other copies are normal, you can use the meta_tool tool to delete the tablets whose metadata cannot be loaded.

  • Set this configuration item to true. This way, the error is ignored and the backend node is normally run.

ignore_rowset_stale_unconsistent_delete

  • Default value: false.

  • Type: bool.

  • Description: specifies whether to delete the outdated merged rowset if the rowset cannot form a consistent version path.

The outdated merged rowset is deleted after half an hour. In abnormal scenarios, if the outdated merged rowset is deleted, the consistent path of the query cannot be constructed. If the value of this configuration item is false, the program check is strict and the program exits after it reports an error. If the value of this configuration item is true, the program ignores the error and normally runs. In most cases, if the error is ignored, queries are not affected, but the -230 error is reported when the merged version is issued in fe.

inc_rowset_expired_sec

  • Default value: 1800.

  • Description: the time period used to import activated data and store the engine in incremental cloning. Unit: seconds.

index_stream_cache_capacity

  • Default value: 10737418240.

  • Description: the capacity to cache statistics such as the information about BloomFilter, Min, and Max.

kafka_api_version_request

  • Default value: true.

  • Description: If the dependent Kafka version is lower than 0.10.0.0, set this configuration item to false.

kafka_broker_version_fallback

  • Default value: 0.10.0.

  • Description: If the dependent Kafka version is lower than 0.10.0.0 and the kafka_api_version_request configuration item is set to false, the value specified by this configuration item is applied. Valid values: 0.9.0.x and 0.8.x.y.

load_data_reserve_hours

  • Default value: 4.

  • Description: This configuration item is used for mini load. The mini load data file is deleted after the time period specified by this configuration item is passed. Unit: hours.

load_error_log_reserve_hours

  • Default value: 48.

  • Description: the time period after which the load error log is deleted. Unit: hours.

load_process_max_memory_limit_bytes

  • Default value: 107374182400, which represents 100 GB.

  • Description: the maximum sizes of memories occupied by all imported threads on a single node. We recommend that you set this configuration item to a large value to ensure the load performance when you upgrade Doris.

load_process_max_memory_limit_percent

  • Default value: 80.

  • Description: the maximum percentage of the memories occupied by all imported threads on a single node. We recommend that you set this configuration item to a large value to ensure the load performance when you upgrade Doris.

log_buffer_level

Description: the log flushing strategy. By default, the strategy of "keeping in memories" is applied.

madvise_huge_pages

  • Default value: false.

  • Description: specifies whether to use the huge pages of Linux memories. By default, the huge pages of Linux memories are not used.

make_snapshot_worker_count

  • Default value: 5.

  • Description: the number of threads that are used to make snapshots.

max_client_cache_size_per_host

  • Default value: 10.

  • Description: the maximum number of client caches in each host. The backend node supports multiple types of client caches. Currently, client caches of the same specification are used. If necessary, different types of client caches are used.

max_base_compaction_threads

  • Default value: 4.

  • Type: int32.

  • Description: the maximum number of threads in the thread pool of base compaction.

max_cumu_compaction_threads

  • Default value: 10.

  • Type: int32.

  • Description: the maximum number of threads in the thread pool of cumulative compaction.

max_consumer_num_per_group

  • Default value: 3.

  • Description: the maximum number of consumers in a data consumer group, which is used for routine load.

min_cumulative_compaction_num_singleton_deltas

  • Default value: 5.

  • Description: the minimum number of incremental files, which is used as a cumulative compaction strategy.

max_cumulative_compaction_num_singleton_deltas

  • Default value: 1000.

  • Description: the maximum number of incremental files, which is used as a cumulative compaction strategy.

max_download_speed_kbps

  • Default value: 50000.

  • Description: the maximum download speed. Unit: KB/s.

max_free_io_buffers

  • Default value: 128.

  • Description: The I/O buffers reserved by IoMgr can be 1024 bytes to 8 MB in size. The overall size of all I/O buffers reserved by IoMgr can be up to 2 GB.

max_garbage_sweep_interval

  • Default value: 3600, which represents an hour.

  • Description: the maximum interval at which disk garbage is cleaned.

max_memory_sink_batch_count

  • Default value: 20.

  • Description: the maximum number of batches for external scan caches. This configuration item is usually set together with the max_memory_cache_batch_count batch_size row configuration item whose default value is 1024. If the default settings of the two configuration items are applied, 20 batches of external scan caches are performed, and 1024 rows of data are cached in each batch.

max_percentage_of_error_disk

  • Default value: 0.

  • Type: int32.

  • Description: the percentage of damaged hard disks allowed in the storage engine. If the percentage exceeds the value of this configuration item, the backend node is stopped.

max_pushdown_conditions_per_column

  • Default value: 1024.

  • Type: int.

  • Description: the maximum number of conditions that can be pushed down to the storage engine for a single column in a query request. When the query plan is executed, the filter conditions on some columns can be pushed down to the storage engine. This way, the index information in the storage engine can be used to filter data. As a result, the amount of data that needs to be scanned in the query is reduced. Conditions available in the query include equivalent conditions and conditions in the IN predicate. In most cases, this configuration item takes effect only on queries that contain the IN predicate, such as WHERE colA IN (1,2,3,4,...). A greater value indicates that more conditions in the IN predicate can be pushed to the storage engine. However, too many conditions may cause an increase in random reads, and in some cases may reduce the query efficiency. This configuration item can be individually configured at the session level. For more information, see Variable.

In this example, the table structure is id INT, col2 INT, col3 varchar (32), .... The query statement includes ... WHERE id IN (v1, v2, v3, ...). If the number of conditions in the IN predicate exceeds the value of this configuration item, increase the value of this configuration item and observe whether the query response efficiency is improved.

max_runnings_transactions_per_txn_map

  • Default value: 100.

  • Description: the maximum number of txns for each txn_partition_map in the txn manager. This configuration item is used to prevent too many txns from being saved in the txn manager.

max_send_batch_parallelism_per_job

  • Default value: 5.

  • Type: int.

  • Description: the maximum concurrency of sending data from OlapTableSink for batch processing. The value of send_batch_parallelism must not exceed the value of this configuration item. Otherwise, the value of this configuration item is changed to the value of send_batch_parallelism.

max_tablet_num_per_shard

  • Default value: 1024.

  • Description: the number of tablets in each shard. This configuration item is used to plan the layout of the tablet and prevent a directory from containing an excessive number of table subdirectories.

max_tablet_version_num

  • Default value: 500.

  • Type: int.

  • Description: the maximum number of versions of a tablet. This configuration item is used to prevent a large number of version accumulation problems caused by frequent import or untimely compaction. If the number of versions exceeds the value of this configuration item, the import task is rejected.

mem_limit

  • Default value: 80%.

  • Type: string.

  • Description: the maximum percentage of the server's memories used by the backend node. This configuration item is used to prevent the backend node memories from occupying too many memories of the machine. The value must be 0 or a positive percentage. If the value exceeds 100%, 100% is adopted as the effective value.

memory_limitation_per_thread_for_schema_change

  • Default value: 2.

  • Description: the maximum memory size allowed for a schema change task. Unit: GB.

memory_maintenance_sleep_time_s

  • Default value: 10.

  • Description: the hibernation time period between memory iterations. Unit: seconds.

memory_max_alignment

  • Default value: 16.

  • Description: the maximum memory size for alignment.

read_size

  • Default value: 8388608.

  • Description: the size of the read data that is sent to os. A trade-off is made between the latency and the whole process to keep the disk busy without introducing seeks. If 8 MB of data is read, random I/Os and sequential I/Os have similar performance.

min_buffer_size

  • Default value: 1024.

  • Description: the minimum read buffer size. Unit: bytes.

min_compaction_failure_interval_sec

  • Default value: 5.

  • Type: int32.

  • Description: During cumulative compaction, when the selected tablet fails to be merged, the selected tablet waits for a time period before it may be selected again. The waiting time period is the value of this configuration item. Unit: seconds.

min_compaction_threads

  • Default value: 10.

  • Type: int32.

  • Description: the minimum number of threads in the compaction thread pool.

min_file_descriptor_number

  • Default value: 60000.

  • Description: the minimum number of file handles for the backend node.

min_garbage_sweep_interval

  • Default value: 180.

  • Description: the minimum interval at which disk garbage is cleaned. Unit: seconds.

mmap_buffers

  • Default value: false.

  • Description: specifies whether to use mmap to allocate memories. By default, mmap is not used.

num_cores

  • Default value: 0.

  • Type: int32.

  • Description: the number of CPU cores that the backend node can use. A value of 0 indicates that the backend node obtains the number of CPU cores of the machine from /proc/cpuinfo.

num_disks

  • Default value: 0.

  • Description: the number of disks on the machine. A value of 0 indicates that the system settings are applied.

num_threads_per_core

  • Default value: 3.

  • Description: the number of threads that each core runs. In most cases, the value is set to two or three times the number of cores. This keeps the core busy without causing excessive jitters.

num_threads_per_disk

  • Default value: 0.

  • Description: the maximum number of threads for each disk, which is also the maximum queue depth of each disk.

number_tablet_writer_threads

  • Default value: 16.

  • Description: the number of threads for tablet writes.

path_gc_check

  • Default value: true.

  • Description: specifies whether to enable the checks for recycle scan data threads. By default, the checks are enabled.

path_gc_check_interval_second

  • Default value: 86400.

  • Description: the interval between the checks for recycle scan data threads. Unit: seconds.

path_gc_check_step

  • Default value: 1000.

  • Description: the step size of the gc check.

path_gc_check_step_interval_ms

  • Default value: 10.

  • Description: the interval between two gc checks.

path_scan_interval_second

  • Default value: 86400.

  • Description: the interval between the scans that are performed before gc checks.

pending_data_expire_time_sec

  • Default value: 1800.

  • Description: the maximum duration of unvalidated data retained by the storage engine. Unit: seconds.

periodic_counter_update_period_ms

  • Default value: 500.

  • Description: the cycle based on which the rate calculator and the sampling calculator are updated. Unit: milliseconds.

plugin_path

  • Default value: ${DORIS_HOME}/plugin.

  • Description: the directory of the plug-in.

port

  • Default value: 20001.

  • Type: int32.

  • Description: the port used to test the backend node.

pprof_profile_dir

  • Default value: ${DORIS_HOME}/log.

  • Description: the directory in which the pprof profile is stored.

priority_networks

Description: This configuration item is used to declare a selection strategy for servers with many IP addresses. Make sure that at most one IP address should reside within the CIDR blocks in this list. The CIDR blocks must be separated by semicolons (;). For example, if the list contains 10.10.10.0/24 and no IP address resides within the CIDR block, an IP address is randomly selected.

priority_queue_remaining_tasks_increased_frequency

  • Default value: 512.

  • Description: the frequency of adjusting the priority for the tasks in BlockingPriorityQueue.

publish_version_worker_count

  • Default value: 8.

  • Description: the number of threads of effective versions.

pull_load_task_dir

  • Default value: ${DORIS_HOME}/var/pull_load.

  • Description: the directory used to pull the load task.

push_worker_count_high_priority

  • Default value: 3.

  • Description: the number of import threads that are used to process HIGH priority tasks.

push_worker_count_normal_priority

  • Default value: 3.

  • Description: the number of import threads that are used to process NORMAL priority tasks.

push_write_mbytes_per_sec

  • Default value: 10.

  • Type: int32.

  • Description: the data import speed. By default, the maximum speed is 10 MB per second. This configuration item applies to all import methods. Unit: MB.

query_scratch_dirs

  • Default value: ${DORIS_HOME}.

  • Type: string.

  • Description: the directory selected by the backend node to store temporary data when data is spilled to a disk, which is similar to the configuration of the storage directory. Multiple directories are separated by semicolons (;).

release_snapshot_worker_count

  • Default value: 5.

  • Description: the number of threads that are used to release snapshots.

report_disk_state_interval_seconds

  • Default value: 60.

  • Description: the interval at which the agent reports the disk status to the frontend node. Unit: seconds.

report_tablet_interval_seconds

  • Default value: 60.

  • Description: the interval at which the agent reports the olap table to the frontend node. Unit: seconds.

report_task_interval_seconds

  • Default value: 10.

  • Description: the interval at which the agent reports the task signature to the frontend node. Unit: seconds.

result_buffer_cancelled_interval_time

  • Default value: 300.

  • Description: the interval at which result buffers are canceled. Unit: seconds.

routine_load_thread_pool_size

  • Default value: 10.

  • Description: the thread pool size of the routine load task. The value of this configuration must be greater than the max_concurrent_task_num_per_be configuration item whose default value is 5.

row_nums_check

  • Default value: true.

  • Description: specifies whether to check the row numbers for the backend node, CE node, and schema changes. A value of true indicates that the check is enabled. A value of false indicates that the check is disabled.

row_step_for_compaction_merge_log

  • Default value: 0.

  • Type: int64.

  • Dynamically modified: supported.

  • Description: the number of merged data rows when a log is printed during compaction. If this configuration item is set to 0, no log is printed during compaction.

scan_context_gc_interval_min

  • Default value: 5.

  • Description: the scheduling cycle of the context gc thread. Unit: minutes.

send_batch_thread_pool_thread_num

  • Default value: 256.

  • Type: int32.

  • Description: the number of threads in the SendBatch thread pool. In the data sending task of a NodeChannel, the SendBatch operation of each NodeChannel is submitted to the thread pool as a thread task to wait for scheduling. This configuration item determines the size of the SendBatch thread pool.

send_batch_thread_pool_queue_size

  • Default value: 102400.

  • Type: int32.

  • Description: the queue length of the SendBatch thread pool. In the data sending task of a NodeChannel, the SendBatch operation of each NodeChannel is submitted as a thread task to the thread pool, waiting to be scheduled. After the number of submitted tasks exceeds the length of the thread pool queue, tasks that are subsequently submitted are blocked until an empty slot appears in the queue.

serialize_batch

  • Default value: false.

  • Description: specifies whether to serialize RowBatch for remote procedure calls (RPCs) between backend nodes. This configuration item is used to configure data transmission at the query level.

sleep_one_second

  • Default value: 1, which represents 1 second.

  • Type: int32.

  • Description: the hibernation time period of the threads of the backend node, which cannot be modified. This configuration item is a global variable.

small_file_dir

  • Default value: ${DORIS_HOME}/lib/small_file/.

  • Description: the directory to which files downloaded by SmallFileMgr belong.

snapshot_expire_time_sec

  • Default value: 172800, which represents 48 hours.

  • Description: the interval at which snapshots are cleaned.

status_report_interval

  • Default value: 5.

  • Description: the interval at which profile reports are configured. Unit: seconds.

storage_flood_stage_left_capacity_bytes

  • Default value: 1073741824, which represents 1 GB.

  • Description: the minimum memory size that is available in a data directory.

storage_flood_stage_usage_percent

  • Default value: 95.

  • Description: the maximum proportion of memories that are used in a data directory. The storage_flood_stage_usage_percent and storage_flood_stage_left_capacity_bytes configuration items limit the maximum disk capacity of a data directory. If the values specified by the two configuration items are reached, no more data can be written to the data directory.

storage_medium_migrate_count

  • Default value: 1.

  • Description: the number of threads for cloning.

storage_page_cache_limit

  • Default value: 20%.

  • Description: the cache for the storage page size.

storage_page_cache_shard_size

  • Default value: 16.

  • Description: the shard size of StoragePageCache. Valid values: 2^n (n = 0, 1, 2, ...). We recommend that you set this configuration item to a value close to the number of CPU cores of the backend node to reduce the lock contentions of StoragePageCache.

index_page_cache_percentage

  • Default value: 10.

  • Type: int32.

  • Description: the proportion of index page caches among all page caches. Valid values: 0 to 100.

storage_root_path

  • Default value: ${DORIS_HOME}.

  • Type: string.

  • Description: the directory in which the backend node data is stored. Multiple directories are separated by semicolons (;). You can specify the storage medium of each directory, such as a solid state drive (SSD) or a hard disk drive (HDD). You can add a capacity limit at the end of each directory and use commas (,) for separation. If you do not use an SSD and an HDD at the same time, you do not need to perform configurations as shown in Example 1 and Example 2, or modify the default storage medium of the frontend node. You need to only specify the storage directory.

    • Example 1: storage_root_path=/home/disk1/doris.HDD;/home/disk2/doris.SSD;/home/disk2/doris

    • If the storage medium is an SSD, add .SSD at the end of the storage directory. If the storage medium is an HDD, add .HDD at the end of the storage directory.

    • Example 2: storage_root_path=/home/disk1/doris,medium:hdd;/home/disk2/doris

    • No suffix needs to be added at the end of the storage directory regardless of the storage medium. You need to only set this configuration item.

      Note

      Meaning of medium:ssd:

      • /home/disk1/doris.HDD: The storage medium is an HDD.

      • /home/disk2/doris.SSD: The storage medium is an SSD.

      • /home/disk2/doris: The storage medium is an HDD.

      • /home/disk1/doris,medium:hdd: The storage medium is an HDD.

      • /home/disk2/doris,medium:ssd: The storage medium is an SSD.

storage_strict_check_incompatible_old_format

  • Default value: true.

  • Type: bool.

  • Dynamically modified: not supported.

  • Description: This configuration item is used to check whether a strict verification method is used when an incompatible format of the previous version is used.

If a strict verification method is used when the HDR format of the previous version is used, the program exists after it prints the fatal log. Otherwise, the program prints the warn log.

streaming_load_max_mb

  • Default value: 10240.

  • Type: int64.

  • Dynamically modified: supported.

  • Description: the maximum amount of CSV data allowed in a Stream load. Unit: MB.

Stream loads are generally suitable for loading a few GB of data, and are not suitable for loading large amounts of data.

streaming_load_json_max_mb

  • Default value: 100.

  • Type: int64.

  • Dynamically modified: supported.

  • Description: the maximum amount of JSON data allowed in a Stream load. Unit: MB.

Data of some specific formats, such as JSON data, cannot be split. The system must store all the data in memories before it parses data. This configuration item is used to limit the maximum amount of data that can be loaded in a Stream load.

streaming_load_rpc_max_alive_time_sec

  • Default value: 1200.

  • Description: the lifetime of TabletsChannel. If the channel does not receive data, the channel is deleted.

sync_tablet_meta

  • Default value: false.

  • Description: specifies whether the storage engine opens sync and keeps it to the disk.

sys_log_dir

  • Default value: ${DORIS_HOME}/log.

  • Type: string.

  • Description: the directory in which the log data of the backend node is stored.

sys_log_level

  • Default value: INFO.

  • Description: the log level. Valid log levels, from low to high: INFO, WARNING, ERROR, and FATAL.

sys_log_roll_mode

  • Default value: SIZE-MB-1024.

  • Description: the size of a split log. Each split log is 1 GB in size.

sys_log_roll_num

  • Default value: 10.

  • Description: the number of reserved log files.

sys_log_verbose_level

  • Default value: 10.

  • Description: the log display level, which is used to control the log output at the beginning of VLOG in the code.

sys_log_verbose_modules

Description: the log printing module. If olap is written, only logs under the olap module are printed.

tablet_map_shard_size

  • Default value: 1.

  • Description: the tablet_map_lock fragment size. Valid values: 2^n (n = 0, 1, 2, 3, 4). This configuration item is used to manage tablets.

tablet_meta_checkpoint_min_interval_secs

  • Default value: 600.

  • Description: the interval at which TabletMeta Checkpoint threads are polled. Unit: seconds.

tablet_meta_checkpoint_min_new_rowsets_num

  • Default value: 10.

  • Description: the minimum number of rowsets in TabletMeta Checkpoint.

tablet_scan_frequency_time_node_interval_second

  • Default value: 300.

  • Type: int64.

  • Description: the interval at which the value of the query_scan_count metric is recorded. To calculate the frequency at which the tablets are scanned, the value of the query_scan_count metric is recorded at a specific interval.

tablet_stat_cache_update_interval_second

  • Default value: 300.

  • Description: the interval at which the tablet status caches are updated. Unit: seconds.

tablet_rowset_stale_sweep_time_sec

  • Default value: 1800.

  • Type: int64.

  • Description: the expiration time of the merged rowset version. If the result of subtracting the time when the rowset is last created in the directory of the merged version from the time specified by now() is greater than the value of this configuration item, the current directory is cleaned and the merged rowsets are deleted. Unit: seconds.

If data writes are too frequent and the disk space is insufficient, you can set this configuration item to a relatively small value. If the time specified by this configuration item is less than 5 minutes, the frontend node cannot query the merged version and the -230 error is reported.

tablet_writer_open_rpc_timeout_sec

  • Default value: 60, which represents 60 seconds.

  • Description: The RPC timeout period of opening a tablet writer in the remote backend node. This configuration item determines the RPC timeout period for sending a Batch (1024 lines) during imports that consume little time. Since the RPC may involve writing multiple batches of memories, the RPC timeout may be caused by writing batches. To reduce timeout errors, such as the "send batch fail" error, you can modify this configuration item. If you increase the value of the write_buffer_size configuration item, the value of this configuration item needs to be increased at the same time.

tablet_writer_ignore_eovercrowded

  • Default value: false.

  • Type: bool.

  • Description: specifies whether to ignore the '[E1011]The server is overcrowded' bRPC error during data writes.

If the '[E1011]The server is overcrowded' error is returned, you can modify the brpc_socket_max_unwritten_bytes configuration item. Take note that the brpc_socket_max_unwritten_bytes configuration item cannot be dynamically modified. To temporarily prevent data write errors, you can set the tablet_writer_ignore_eovercrowded configuration item to true. This configuration item only affects data writes. Overcrowded problems are still checked in other RPC requests.

tc_free_memory_rate

  • Default value: 20.

  • Description: the proportion of available memories. Valid values: 0 to 100.

tc_max_total_thread_cache_bytes

  • Default value: 1073741824.

  • Type: int64.

  • Description: the total thread cache size in TCMalloc. The limit specified by this configuration item is not strictly implemented. The total thread cache size in actual operations may exceed the limit. For more information, see TCMalloc : Thread-Caching Malloc.

If the system is found to be in a high-stress scenario and a large number of threads are found in the TCMalloc lock competition phase through the thread stack of the backend node, such as a large number of SpinLock-related stacks, you can increase the value of this configuration item to improve the system performance.

tc_use_memory_min

  • Default value: 10737418240.

  • Description: the minimum memory size of TCmalloc. If the memory size of TCmalloc is smaller than the value of this configuration item, TCmalloc is not returned to the operating system.

thrift_client_retry_interval_ms

  • Default value: 1000.

  • Type: int64.

  • Description: the interval at which the thrift client of the backend node is retried when an avalanche occurs in the thrift server of the frontend node. Unit: milliseconds.

thrift_connect_timeout_seconds

  • Default value: 3.

  • Description: the default timeout period for the connection of the thrift client. Unit: seconds.

thrift_rpc_timeout_ms

  • Default value: 5000.

  • Description: the default timeout period of thrift. Unit: milliseconds.

thrift_server_type_of_fe

Description: the service model used by the thrift service of the frontend node. The value of this configuration item is not case-sensitive and must be the same as the value of the thrift_server_type configuration that you set for the frontend node. Type: string. Valid values: THREADED and THREAD_POOL.

  • A value of THREADED indicates that the model is a non-blocking I/O model.

  • A value of THREAD_POOL indicates that the model is a blocking I/O model.

total_permits_for_compaction_score

  • Default value: 10000.

  • Type: int64.

  • Dynamically modified: supported.

  • Description: The upper limit of permits held by all compaction tasks. This configuration item is used to limit the memory consumption for compaction.

trash_file_expire_time_sec

  • Default value: 259200.

  • Description: the interval at which the recycle bin is cleaned. If the disk space is insufficient, this configuration item may not be applied.

txn_commit_rpc_timeout_ms

  • Default value: 10000.

  • Description: the RPC timeout period of txn submission.

txn_map_shard_size

  • Default value: 128.

  • Description: the txn_map_lock fragment size. Vaid values: 2^n (n = 0, 1, 2, 3, 4). This configuration item is used to improve the performance of managing txns.

txn_shard_size

  • Default value: 1024.

  • Description: the txn_lock fragment size. Vaid values: 2^n (n = 0, 1, 2, 3, 4). This configuration item is used to improve the performance of managing txns.

unused_rowset_monitor_interval

  • Default value: 30.

  • Description: the interval at which expired rowsets are cleaned. Unit: seconds.

upload_worker_count

  • Default value: 1.

  • Description: the maximum number of threads that are used to upload files.

use_mmap_allocate_chunk

  • Default value: false.

  • Description: specifies whether to use mmap to allocate blocks. If you enable this feature, we recommend that you increase the value of vm.max_map_count whose default value is 65530. You can run the sysctl -w vm.max_map_count=262144 or echo 262144> /proc/sys/vm/max_map_count command to operate max_map_count as the root user. If you set this configuration item to true, you must set the chunk_reserved_bytes_limit configuration item to a relatively large number to ensure good performance.

user_function_dir

  • Default value: ${DORIS_HOME}/lib/udf.

  • Description: the directory of the udf function.

webserver_num_workers

  • Default value: 48.

  • Description: the default number of worker threads of the webserver.

webserver_port

  • Default value: 8040.

  • Type: int32.

  • Description: the service port of the HTTP server on the backend node

write_buffer_size

  • Default value: 104857600.

  • Description: the size of the buffers before flashing. Before data is imported, data is written to a memory block on the backend node, and written back to the disk only if the size of the memory block reaches the threshold. Default size: 100 MB. If the threshold is exceedingly small, a large number of small files may be generated on the backend node. To resolve this problem, you can increase the threshold. However, if the threshold is exceedingly large, an RPC timeout may occur. Make sure that you set the threshold to an appropriate value.

zone_map_row_num_threshold

  • Default value: 20.

  • Type: int32.

  • Description: If the number of rows in a page is less than the value of this configuration item, no zonemap is created. This configuration item is used to reduce data expansion.

aws_log_level

  • Default value: 3.

  • Type: int32.

  • Description: the log level of Amazon Web Services (AWS) SDK. Sample code:

    Off = 0,
       Fatal = 1,
       Error = 2,
       Warn = 3,
       Info = 4,
       Debug = 5,
       Trace = 6

enable_tcmalloc_hook

  • Default value: true.

  • Type: bool.

  • Description: specifies whether to create or delete Hook TCmalloc. Currently, thread local MemTracker is calculated in Hook.

mem_tracker_consume_min_size_bytes

  • Default value: 1048576.

  • Type: int32.

  • Description: the minimum length of TCMalloc Hook when MemTracker is consumed or released. Consume sizes smaller than the value of this configuration item are accumulated to prevent frequent calls for consuming or releasing MemTracker. If you reduce the value of this configuration item, the frequency at which MemTracker is consumed or released is increased. If you increase the value of this configuration item, the statistic of MemTracker may be inaccurate. Theoretically, the difference between the statistic of MemTracker and the accurate value of MemTracker can be calculated based on the following formula: Value of mem_tracker_consume_min_size_bytes × Number of threads on the backend node to which MemTracker belongs.

max_segment_num_per_rowset

  • Default value: 200.

  • Type: int32.

  • Description: the maximum number of segments in the newly generated rowset during data imports. If the value of this configuration item is exceeded, data import fails and the -238 error is returned. If the rowset has too many segments, a large number of memories may be occupied during compaction and out of memory (OOM) errors may occur.

remote_storage_read_buffer_mb

  • Default value: 16.

  • Type: int32.

  • Description: the size of the used caches when you read files from Hadoop Distributed File System (HDFS) and Object Storage Service (OSS). Unit: MB. If you increase the value of this configuration item, the number of calls to read remote data can be reduced, but the memory overhead is increased.

external_table_connect_timeout_sec

  • Default value: 5.

  • Type: int32.

  • Description: the timeout period when an external table is connected. Unit: seconds.

segment_cache_capacity

  • Default value: 1000000.

  • Type: int32.

  • Description: the maximum number of segments cached by Segment Cache.

The default value is currently only an empirical value, and may need to be modified based on actual scenarios. If you increase the value of this configuration item, more segments can be cached and I/Os can be reduced. If you reduce the value of this configuration item, fewer memories are used.

auto_refresh_brpc_channel

  • Default value: false.

  • Type: bool.

  • Description: When a bRPC connection is obtained, the availability of the connection through the hand_shake RPC is evaluated. If the connection is not available, another connection is made.

high_priority_flush_thread_num_per_store

  • Default value: 1.

  • Type: int32.

  • Description: the number of flush threads that are used for HIGH priority import tasks in each storage directory.

routine_load_consumer_pool_size

  • Default value: 10.

  • Type: int32.

  • Description: the number of caches for the data consumer used by the routine load.

load_task_high_priority_threshold_second

  • Default value: 120.

  • Type: int32.

  • Description: When the timeout period of an import task is less than the value of this configuration item, Doris considers the task to be of HIGH priority. HIGH priority tasks use a separate pool of flush threads.

min_load_rpc_timeout_ms

  • Default value: 20.

  • Type: int32.

  • Description: the minimum timeout period for each RPC in the load job.

doris_scan_range_max_mb

  • Default value: 1024.

  • Type: int32.

  • Description: the maximum amount of data read by each OlapScanner.

string_type_length_soft_limit_bytes

  • Default value: 1048576.

  • Type: int32.

  • Description: the maximum length of the string data. The limit specified by this configuration item is not strictly implemented. Unit: bytes.

enable_quick_compaction

  • Default value: false.

  • Type: bool.

  • Description: specifies whether to enable quick compaction. In scenarios in which small amounts of data are frequently imported, you can enable quick compaction to merge imported versions in time. This prevents the -235 error. Whether the amount of data is small is measured based on the number of data rows.

quick_compaction_max_rows

  • Default value: 1000.

  • Type: int32.

  • Description: If the number of imported data rows is less than the value of this configuration item, the amount of imported data is considered small and the data is selected for quick compaction.

quick_compaction_batch_size

  • Default value: 10.

  • Type: int32.

  • Description: If the times of data imports reach the value of this configuration item, quick compaction is triggered.

quick_compaction_min_rowsets

  • Default value: 10.

  • Type: int32.

  • Description: the minimum number of versions in the compaction. If the number of rowsets of the selected data exceeds the value of this configuration item, the real compaction is performed.