This document describes MaxCompute flags, runtime flags, and MaxFrame runtime parameters. It provides configuration examples and, for each parameter, describes its meaning, default value, valid values, common scenarios, and recommended settings.
MaxFrame parameter examples
MaxCompute SQL flags
In MaxFrame, all MaxCompute SQL-related flags are managed in the options.sql.settings dictionary.
from maxframe import options
options.sql.settings = {
    # Example: Set the maximum job runtime to 72 hours.
    "odps.sql.job.max.time.hours": 72,
    # Example: Specify a custom image for the job.
    "odps.session.image": "common",
    # Example: Set the concurrency to 50000 for all input tables.
    "odps.sql.split.dop": '{"*":50000}',
    # Example: Set the data processing batch size to 1024 rows.
    "odps.sql.executionengine.batch.rowcount": 1024,
}
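The value of odps.sql.split.dop in the example above is a JSON string, which is easy to mistype by hand. The following is a minimal sketch of building it with the standard json module. The build_split_dop helper is hypothetical and not part of MaxFrame, and the MaxFrame assignment is left as a comment so that the sketch runs without MaxFrame installed.

```python
import json


def build_split_dop(dop_by_table):
    """Serialize a {table: dop} mapping into the JSON string form used above.

    Hypothetical helper for illustration; "*" is the key the example above
    uses to address all input tables.
    """
    for table, dop in dop_by_table.items():
        if not isinstance(dop, int) or dop < 1:
            raise ValueError(f"invalid dop for {table!r}: {dop!r}")
    # Compact separators reproduce the form shown in the example: {"*":50000}
    return json.dumps(dop_by_table, separators=(",", ":"))


sql_settings = {
    "odps.sql.job.max.time.hours": 72,
    "odps.sql.split.dop": build_split_dop({"*": 50000}),
}
print(sql_settings["odps.sql.split.dop"])

# With MaxFrame installed, the dictionary would be applied as in the example above:
# from maxframe import options
# options.sql.settings = sql_settings
```

Generating the string from a Python dict keeps the quoting consistent and rejects non-positive values before the job is submitted.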
MaxFrame options
MaxFrame runtime parameters are configured directly using the options.xxx format. The following code provides an example:
from maxframe import options
# Example: Set the retention period for LogView links to 24 hours.
options.session.logview_hours = 24
# Example: Set the number of retries for the client when a retryable error occurs.
options.retry_times = 3
# Example: Enable MaxCompute Query Acceleration (MCQA), the built-in query optimization and acceleration feature of MaxCompute.
options.sql.enable_mcqa = True
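When option values come from a configuration file or a shared dictionary, they can be applied through their dotted names. The following is a minimal sketch under stated assumptions: apply_options is a hypothetical helper, and a SimpleNamespace object stands in for maxframe.options so that the code runs without MaxFrame installed. It relies only on attribute assignment, which is the access pattern the example above uses.

```python
from types import SimpleNamespace


def apply_options(options, overrides):
    """Set dotted option names, such as "session.logview_hours", on an options object."""
    for dotted_name, value in overrides.items():
        *parents, leaf = dotted_name.split(".")
        target = options
        for name in parents:
            # An unknown option group raises AttributeError here.
            target = getattr(target, name)
        setattr(target, leaf, value)


# Stand-in for maxframe.options (assumption, for illustration only).
options = SimpleNamespace(
    session=SimpleNamespace(logview_hours=None),
    sql=SimpleNamespace(enable_mcqa=None),
    retry_times=None,
)

apply_options(options, {
    "session.logview_hours": 24,
    "retry_times": 3,
    "sql.enable_mcqa": True,
})
print(options.session.logview_hours, options.retry_times, options.sql.enable_mcqa)
```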
MaxCompute flags
The following table describes the common flags in the options.sql.settings dictionary.
Parameter category | Parameter | Purpose | Value range and default value | Recommendation |
Concurrency and chunking |
|
| Range: 1 to 99999. Default: None. | When you process large tables or run large-scale tasks, explicitly set this parameter to achieve high concurrency. |
| If CMF information is unavailable, the system chunks tasks based on the input table size in MB. | Range: ≥ 1. Default: 256 MB. | Keep the default value. | |
Resource and memory |
| Allocates memory in MB to a single worker for the Mapper, Reducer, and Joiner stages, respectively. | Range: 1024 MB to 12288 MB. Default: 1024 MB. | Increase this value if you process large data volumes, encounter data hot spots, or experience out-of-memory (OOM) errors from complex joins. |
| Manually sets the number of concurrent instances for the Reduce and Join stages. | Maximum: 10000. Default: Dynamically calculated by the system. | If a job involves a large-scale shuffle, such as GROUP BY or | |
Shuffle and output safety |
| Enables backups for intermediate data output by Mappers and sets the number of replicas. | For long-running jobs with large-scale shuffles, set the number of replicas to 2:
This significantly improves fault tolerance and data read stability. | |
| Sets the maximum total size in MB for intermediate shuffle data that a single job can generate. | If you encounter an | ||
Compute stability and monitoring |
| These two parameters must be used together. They enable heartbeat monitoring for the underlying Fuxi scheduler and set the timeout period in seconds. This prevents the system from incorrectly identifying a long-running UDF as unresponsive and terminating it. | To modify | |
| The maximum number of times the system automatically retries a single worker (instance) after it fails due to a transient error, such as a machine breakdown. | Default: 3. Recommended maximum: 100. | To set a value higher than the default, contact technical support to add the parameter to the whitelist. | |
| Configures the reuse policy for underlying workers. Set it to | If a UDF has a risk of memory leaks or state pollution, disable reuse to ensure each task runs in a clean environment. This slightly increases the task startup overhead. | ||
Execution efficiency and optimization |
| Sets the size, in rows, of a batch, which is the basic unit for internal data processing in MaxCompute. | 1024 | This value balances memory and performance. If a single row contains a large amount of data and causes an OOM error, decrease this value. If the computation is simple, you can increase this value to improve throughput. |
| Enables the vectorized execution engine for expressions. This can significantly improve the performance of compute-intensive operations. | Enable this parameter when you use the rand() function or perform many arithmetic operations. | ||
| Use these two parameters together to disable HashJoin. | Set cbo.rule.filter.black to "hj". This is an expert option. Do not configure it unless you fully understand its impact on the execution plan. | ||
| Concurrently reads CMF information during the task split stage. | Enable this option if the split stage of a job takes too long. | ||
| Sets the memory size for the job's Master node. | When you run a shuffle job that involves very large tables, increase this value. For example, set it to 30000 MB. | ||
UDF and function safety |
| Sets the timeout, in seconds, for a UDF or function to process one data batch. | Range: 1 to 3600 seconds. Default: 1800 seconds. Setting it to 0 has no effect. | |
| Limits the maximum length in MB of logs output to stdout by print statements in a Python UDF. | Maximum: 100 MB. Default: 20 MB. | To modify this value, contact technical support to add it to the whitelist. | |
Resource and environment dependencies |
| Specifies the runtime environment for a job. The value must be the name of an existing custom image in the current tenant's MaxCompute project. | ||
| Locks a job to a specific major version of MaxCompute to ensure feature and behavior stability. | This is an expert option. Do not configure it unless you understand its impact. | ||
| Control the row group size of ORC files and the version of CMF metadata files, respectively. | These are expert options. Do not configure them unless you understand the underlying mechanisms. | ||
Other general parameters |
| Specifies whether to allow a full table scan on a partitioned table without a partition filter condition. | Enable this with caution to prevent unexpectedly high costs and long runtimes. | |
| Defines the maximum allowed storage size in bytes for a single field (column). | Default: 8388608 (8 MB). Maximum: 268435456 (256 MB). | Increase this value when you process fields that contain very large content, such as long text, HTML, or Base64-encoded data. | |
| Sets the maximum runtime in hours for the entire SQL job. | Maximum: 72 hours. Default: 24 hours. | ||
| Use these two parameters together to enable the partial commit feature. Even if a job fails because some data processing failed, the successful results are still committed. | This is suitable for extract, transform, and load (ETL) scenarios where partial success is acceptable. | ||
Table writes and CMF (fixed combination) | | This is a fixed combination of flags. It ensures that column-store statistics information (CMF) is generated quickly and correctly when you write data to a dynamic partitioned table. This is crucial for downstream jobs to precisely split data using odps.sql.split.dop. | ||
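Some of the limits in the table above can be checked on the client before a job is submitted. The following is a minimal sketch, not part of MaxFrame. The check_settings helper is hypothetical, it covers only the two flags whose names appear in this document, and it assumes that the "Range: 1 to 99999" row describes odps.sql.split.dop.

```python
import json

MAX_JOB_HOURS = 72       # "Maximum: 72 hours" in the table above
DOP_RANGE = (1, 99999)   # "Range: 1 to 99999"; assumed to describe odps.sql.split.dop


def check_settings(settings):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []

    hours = settings.get("odps.sql.job.max.time.hours")
    if hours is not None and hours > MAX_JOB_HOURS:
        problems.append(
            f"odps.sql.job.max.time.hours={hours} exceeds {MAX_JOB_HOURS}"
        )

    dop_json = settings.get("odps.sql.split.dop")
    if dop_json is not None:
        # The flag value is a JSON string such as '{"*":50000}'.
        for table, dop in json.loads(dop_json).items():
            if not DOP_RANGE[0] <= dop <= DOP_RANGE[1]:
                problems.append(
                    f"odps.sql.split.dop[{table}]={dop} is outside "
                    f"{DOP_RANGE[0]}..{DOP_RANGE[1]}"
                )
    return problems


good = {"odps.sql.job.max.time.hours": 72, "odps.sql.split.dop": '{"*":50000}'}
bad = {"odps.sql.job.max.time.hours": 96, "odps.sql.split.dop": '{"*":100000}'}
print(check_settings(good))
for problem in check_settings(bad):
    print(problem)
```

A check like this only catches values outside the documented ranges. It does not replace the whitelist and support prerequisites that apply to some flags.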
MaxFrame options
The following table describes the main built-in options for MaxFrame. You can configure these options directly using the options.xxx format.
Parameter name | Purpose | Type | Default value |
| Sets the local time zone. This affects the default behavior of date and time functions. | STR/None | None |
| Sets the retention period in hours for generated LogView links. | INT | 24 |
| Specifies whether to enable the built-in intelligent query optimization and acceleration feature of MaxCompute. | BOOL | TRUE |
| Specifies whether to automatically add comments to generated SQL statements for traceability. | BOOL | TRUE |
| Specifies whether to automatically configure a common public image when the system detects that the code uses libraries with extra dependencies. | BOOL | TRUE |
| Control the session lifecycle.
The value of | ||
| Sets the default lifecycle in days for temporary tables created using MaxFrame. | INT | 1 |
| Specifies whether to automatically clean up all temporary tables created in the current session when the session ends. | BOOL | FALSE |
| Sets the default resource configuration for functions registered with the | dict. Keys can include |
Many special flags have prerequisites, such as whitelist requests, custom image management, or dependencies on CMF statistics. Before you configure these advanced options, contact the MaxCompute technical support team to confirm that your configuration is correct and takes effect.