MaxCompute: MaxFrame flag configuration guide

Last Updated: Dec 10, 2025

This document describes the MaxCompute SQL flags, runtime flags, and MaxFrame runtime parameters that you can configure through MaxFrame. For each parameter, it describes the purpose, default value, valid values, common scenarios, and recommended settings, and it provides configuration examples.

MaxFrame parameter examples

MaxCompute SQL flags

In MaxFrame, all MaxCompute SQL-related flags are managed in the options.sql.settings dictionary.

from maxframe import options

options.sql.settings = {
  # Example: Set the maximum job runtime to 72 hours.
  "odps.sql.job.max.time.hours": 72,
  # Example: Specify a custom image for the job.
  "odps.session.image": "common",
  # Example: Set the concurrency to 50000 for all input tables.
  "odps.sql.split.dop": '{"*":50000}',
  # Example: Set the data processing batch size to 1024 rows.
  "odps.sql.executionengine.batch.rowcount": 1024,
}

MaxFrame options

MaxFrame runtime parameters are configured directly using the options.xxx format. The following code provides an example:

from maxframe import options

# Example: Set the retention period for LogView links to 24 hours.
options.session.logview_hours = 24

# Example: Set the number of retries for the client when a retryable error occurs.
options.retry_times = 3

# Example: Enable MaxCompute Query Acceleration (MCQA), the built-in query acceleration feature of MaxCompute.
options.sql.enable_mcqa = True

MaxCompute flags

The following table describes the common flags in the options.sql.settings dictionary.

Parameter category

Parameter

Purpose

Value range and default value

Recommendation

Concurrency and chunking

odps.sql.split.dop

  • Configures the degree of parallelism (DOP) for data reads on a per-table basis, using column-store statistics (CMF). This parameter takes priority over odps.stage.mapper.split.size.

  • Configure this parameter using a dictionary in the format of {table_name: value}. To target a specific table, use its fully qualified name: project.[schema.]table.

  • To specify the chunking for all tables, use an asterisk (*) for matching, such as {"*":50000}.

Range: 1 to 99999. Default: None.

When you process large tables or run large-scale tasks, explicitly enable this parameter to achieve high concurrency.
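The value format described above can be produced programmatically. The following is a minimal sketch, assuming that the flag value is passed as a JSON string, as in the '{"*":50000}' example earlier in this guide. The helper name build_split_dop and the table name are hypothetical and are not part of MaxFrame.

```python
import json

DOP_MIN, DOP_MAX = 1, 99999  # range stated in this guide


def build_split_dop(dop_by_table):
    """Serialize {table_name: dop} into a JSON string for odps.sql.split.dop."""
    for table, dop in dop_by_table.items():
        if isinstance(dop, bool) or not isinstance(dop, int):
            raise ValueError(f"DOP for {table!r} must be an integer, got {dop!r}")
        if not DOP_MIN <= dop <= DOP_MAX:
            raise ValueError(f"DOP for {table!r} must be in {DOP_MIN}..{DOP_MAX}, got {dop}")
    # Compact separators reproduce the '{"*":50000}' form used in this guide.
    return json.dumps(dop_by_table, separators=(",", ":"))


# "my_project.my_schema.big_table" is a hypothetical fully qualified table name.
settings = {
    "odps.sql.split.dop": build_split_dop({"my_project.my_schema.big_table": 20000}),
}

# With MaxFrame installed, the dictionary would be assigned as:
#   from maxframe import options
#   options.sql.settings = settings
```

Validating the range on the client side surfaces an out-of-range DOP before the job is submitted.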

odps.stage.mapper.split.size

If CMF information is unavailable, the system splits the input into chunks of this size, in MB, based on the size of the input table.

Range: ≥ 1. Default: 256 MB.

Keep the default value.

Resource and memory

odps.stage.mapper.mem / odps.stage.reducer.mem / odps.stage.joiner.mem

Allocates memory in MB to a single worker for the Mapper, Reducer, and Joiner stages, respectively.

Range: 1024 MB to 12288 MB. Default: 1024 MB.

Increase this value if you process large data volumes, encounter data hot spots, or experience out-of-memory (OOM) errors from complex joins.
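As an illustration of the range above, the following sketch builds the memory flags for the stages you want to tune and rejects values outside 1024 MB to 12288 MB. The function name stage_memory_settings is hypothetical, and the full flag names assume that the shortened names in this row expand to odps.stage.reducer.mem and odps.stage.joiner.mem.

```python
MEM_MIN_MB, MEM_MAX_MB = 1024, 12288  # range stated in this guide


def stage_memory_settings(mapper_mb=None, reducer_mb=None, joiner_mb=None):
    """Return memory flags for the stages that were given a value."""
    requested = {
        "odps.stage.mapper.mem": mapper_mb,
        "odps.stage.reducer.mem": reducer_mb,
        "odps.stage.joiner.mem": joiner_mb,
    }
    settings = {}
    for flag, mb in requested.items():
        if mb is None:
            continue  # leave this stage at the system default
        if not MEM_MIN_MB <= mb <= MEM_MAX_MB:
            raise ValueError(f"{flag} must be in {MEM_MIN_MB}..{MEM_MAX_MB} MB, got {mb}")
        settings[flag] = mb
    return settings


# Example: raise only the Joiner memory for a job with heavy joins.
settings = stage_memory_settings(joiner_mb=4096)
```

Only the stages that you pass are written to the dictionary, so the other stages keep the default of 1024 MB.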

odps.stage.reducer.num / odps.stage.joiner.num

Manually sets the number of concurrent instances for the Reduce and Join stages.

Maximum: 10000. Default: Dynamically calculated by the system.

If a job involves a large-scale shuffle, such as GROUP BY or JOIN, or has data skew, increase this value to distribute the computing load.

Shuffle and output safety

odps.sql.runtime.flag.fuxi_streamline_x_EnableNormalCheckpoint

&

fuxi_ShuffleService_client_CheckpointMaxCopy

Enables backups for intermediate data output by Mappers and sets the number of replicas.

For long-running jobs with large-scale shuffles, set the number of replicas to 2 ("fuxi_ShuffleService_client_CheckpointMaxCopy": 2). This significantly improves fault tolerance and data read stability.

odps.sql.sys.flag.fuxi_JobMaxInternalFolderSize

Sets the maximum total size in MB for intermediate shuffle data that a single job can generate.

If you encounter an Internal data size exceeds limit error when you perform a shuffle operation on a very large table, increase this value.

Compute stability and monitoring

odps.sql.runtime.flag.fuxi_EnableInstanceMonitor

&

fuxi_InstanceMonitorTimeout

These two parameters must be used together. They enable heartbeat monitoring for the underlying Fuxi scheduler and set the timeout period in seconds. This prevents the system from incorrectly identifying a long-running UDF as unresponsive and terminating it.

To modify fuxi_InstanceMonitorTimeout, contact technical support to add it to the whitelist.

odps.job.instance.retry.times

The maximum number of times the system automatically retries a single worker (instance) after it fails due to a transient error, such as a machine breakdown.

Default: 3. Recommended maximum: 100.

To set a value higher than the default, contact technical support to add the parameter to the whitelist.

odps.dag2.compound.config

Configures the reuse policy for underlying workers. Set it to fuxi.worker.reuse.policy:NO_REUSE to disable worker reuse.

If a UDF has a risk of memory leaks or state pollution, disable reuse to ensure each task runs in a clean environment. This slightly increases the task startup overhead.

Execution efficiency and optimization

odps.sql.executionengine.batch.rowcount

Sets the size, in rows, of a batch, which is the basic unit for internal data processing in MaxCompute.

Default: 1024.

This value balances memory and performance. If a single row contains a large amount of data and causes an OOM error, decrease this value. If the computation is simple, you can increase this value to improve throughput.

odps.sql.runtime.flag.executionengine_EnableVectorizedExpr

Enables the vectorized execution engine for expressions. This can significantly improve the performance of compute-intensive operations.

Enable this parameter when you use the rand() function or perform many arithmetic operations.

odps.optimizer.enable.conditional.mapjoin

&

odps.optimizer.cbo.rule.filter.black

Use these two parameters together to disable HashJoin.

Set odps.optimizer.cbo.rule.filter.black to "hj". This is an expert option. Do not configure it unless you fully understand its impact on the execution plan.

odps.sql.split.cluster.parallel_explore

Concurrently reads CMF information during the task split stage.

Enable this option if the split stage of a job takes too long.

odps.sql.jobmaster.memory

Sets the memory size for the job's Master node.

When you run a shuffle job that involves very large tables, increase this value. For example, set it to 30000 MB.

UDF and function safety

odps.sql.udf.timeout

&

odps.function.timeout

Controls the timeout period in seconds for a data batch to execute in a UDF or function.

Range: 1 to 3600s. Default: 1800s. Setting it to 0 has no effect.
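The following sketch checks a timeout against the range above before it is written to the settings dictionary. It assumes that both flags are set to the same value, which this guide does not state explicitly. The function name udf_timeout_settings is hypothetical.

```python
TIMEOUT_MIN_S, TIMEOUT_MAX_S = 1, 3600  # range stated in this guide


def udf_timeout_settings(seconds):
    """Return both timeout flags, rejecting values outside 1..3600 seconds."""
    if not TIMEOUT_MIN_S <= seconds <= TIMEOUT_MAX_S:
        # 0 is rejected as well, because this guide says it has no effect.
        raise ValueError(
            f"timeout must be in {TIMEOUT_MIN_S}..{TIMEOUT_MAX_S} seconds, got {seconds}"
        )
    return {
        "odps.sql.udf.timeout": seconds,
        "odps.function.timeout": seconds,  # assumption: same value for both flags
    }


# Example: allow each data batch up to one hour in a slow UDF.
settings = udf_timeout_settings(3600)
```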

odps.sql.runtime.flag.executionengine_PythonStdoutMaxsize

Limits the maximum size, in MB, of the log output that print statements in a Python UDF write to stdout.

Maximum: 100 MB. Default: 20 MB.

To modify this value, contact technical support to add it to the whitelist.

Resource and environment dependencies

odps.session.image

Specifies the runtime environment for a job. The value must be the name of an existing custom image in the current tenant's MaxCompute project.

odps.task.major.version

Locks a job to a specific major version of MaxCompute to ensure feature and behavior stability.

This is an expert option. Do not configure it unless you understand its impact.

odps.storage.orc.row.group.stride

&

odps.storage.meta.file.version

Control the row group size of ORC files and the version of CMF metadata files, respectively.

These are expert options. Do not configure them unless you understand the underlying mechanisms.

Other general parameters

odps.sql.allow.fullscan

Specifies whether to allow a full table scan on a partitioned table without a partition filter condition.

Enable this with caution to prevent unexpectedly high costs and long runtimes.

odps.sql.cfile2.field.maxsize

Defines the maximum allowed storage size in bytes for a single field (column).

Default: 8388608 (8 MB). Maximum: 268435456 (256 MB).

Increase this value when you process fields that contain very large content, such as long text, HTML, or Base64-encoded data.
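Because this flag is expressed in bytes while field sizes are usually discussed in MB, a small conversion helper can prevent arithmetic mistakes. The following is a minimal sketch; the function name field_maxsize_bytes is hypothetical.

```python
BYTES_PER_MB = 1024 * 1024

DEFAULT_FIELD_MAXSIZE = 8 * BYTES_PER_MB  # 8388608, the default in this guide
MAX_FIELD_MAXSIZE = 256 * BYTES_PER_MB    # 268435456, the maximum in this guide


def field_maxsize_bytes(megabytes):
    """Convert a size in MB to the byte value for odps.sql.cfile2.field.maxsize."""
    size = megabytes * BYTES_PER_MB
    if not 0 < size <= MAX_FIELD_MAXSIZE:
        raise ValueError(f"field size must be greater than 0 and at most 256 MB, got {megabytes} MB")
    return size


# Example: allow fields of up to 64 MB, such as large Base64-encoded payloads.
settings = {"odps.sql.cfile2.field.maxsize": field_maxsize_bytes(64)}
```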

odps.sql.job.max.time.hours

Sets the maximum runtime in hours for the entire SQL job.

Maximum: 72 hours. Default: 24 hours.

odps.sql.always.commit.result

&

odps.sql.runtime.flag.executionengine_EnableWorkerCommit

Use these two parameters together to enable the partial commit feature. Even if a job fails because some data processing failed, the successful results are still committed.

This is suitable for extract, transform, and load (ETL) scenarios where partial success is acceptable.

Table writes and CMF (fixed combination)

{
    "odps.task.merge.enabled": "false",
    "odps.sql.reshuffle.dynamicpt": "false",
    "odps.sql.enable.dynaparts.stats.collection": "true",
    "odps.optimizer.dynamic.partition.is.first.nth.value.split.enable": "false",
    "odps.sql.stats.collection.aggressive": "true",
}

This is a fixed combination of flags. It ensures that column-store statistics information (CMF) is generated quickly and correctly when you write data to a dynamic partitioned table.

This is crucial for downstream jobs to precisely split data using odps.sql.split.dop.
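Because the combination is fixed, it can be kept in one constant and merged into whatever other flags a job needs. The following is a minimal sketch; the names CMF_WRITE_FLAGS and with_cmf_write_flags are hypothetical and are not part of MaxFrame.

```python
# The fixed combination from this guide, kept in one place.
CMF_WRITE_FLAGS = {
    "odps.task.merge.enabled": "false",
    "odps.sql.reshuffle.dynamicpt": "false",
    "odps.sql.enable.dynaparts.stats.collection": "true",
    "odps.optimizer.dynamic.partition.is.first.nth.value.split.enable": "false",
    "odps.sql.stats.collection.aggressive": "true",
}


def with_cmf_write_flags(user_settings=None):
    """Return a new dict with the fixed combination applied on top of user flags."""
    merged = dict(user_settings or {})
    # The fixed combination wins on conflicts so that it is always applied as a whole.
    merged.update(CMF_WRITE_FLAGS)
    return merged


# Example: combine the fixed flags with a job-specific runtime limit.
settings = with_cmf_write_flags({"odps.sql.job.max.time.hours": 72})

# With MaxFrame installed, the dictionary would be assigned as:
#   from maxframe import options
#   options.sql.settings = settings
```

Letting the fixed combination override conflicting user values is a design choice of this sketch. It avoids a partially applied combination, which this guide describes as a single unit.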

MaxFrame options

The following table describes the main built-in options for MaxFrame. You can configure these options directly using the options.xxx format.

Parameter name

Purpose

Type

Default value

options.local_timezone

Sets the local time zone. This affects the default behavior of date and time functions.

STR/None

None

options.session.logview_hours

Sets the retention period in hours for generated LogView links.

INT

24

options.sql.enable_mcqa

Specifies whether to enable the built-in intelligent query optimization and acceleration feature of MaxCompute.

BOOL

TRUE

options.sql.generate_comments

Specifies whether to automatically add comments to generated SQL statements for traceability.

BOOL

TRUE

options.sql.auto_use_common_image

Specifies whether to automatically configure a common public image when the system detects that the code uses libraries with extra dependencies.

BOOL

TRUE

options.session.max_alive_seconds

options.session.max_idle_seconds

Control the session lifecycle.

  • max_alive_seconds is the maximum time to live for a session.

  • max_idle_seconds is the maximum idle time allowed for a session. If the idle time is exceeded, the session is revoked.

The value of max_idle_seconds must be less than or equal to the value of max_alive_seconds.
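The constraint between the two values can be checked before they are assigned. The following is a minimal sketch; the function name check_session_lifecycle is hypothetical, and the requirement that both values are positive is an assumption that this guide does not state.

```python
def check_session_lifecycle(max_alive_seconds, max_idle_seconds):
    """Validate the two session lifecycle values and return them as a dict."""
    if max_alive_seconds <= 0 or max_idle_seconds <= 0:
        # Assumption: non-positive durations are not meaningful.
        raise ValueError("both values must be positive")
    if max_idle_seconds > max_alive_seconds:
        raise ValueError(
            "max_idle_seconds must be less than or equal to max_alive_seconds"
        )
    return {
        "max_alive_seconds": max_alive_seconds,
        "max_idle_seconds": max_idle_seconds,
    }


# Example: a session that lives for at most 3 days and is revoked after 1 hour of idle time.
lifecycle = check_session_lifecycle(3 * 24 * 3600, 3600)

# With MaxFrame installed, the values would be assigned as:
#   from maxframe import options
#   options.session.max_alive_seconds = lifecycle["max_alive_seconds"]
#   options.session.max_idle_seconds = lifecycle["max_idle_seconds"]
```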

options.session.temp_table_lifecycle

Sets the default lifecycle in days for temporary tables created using MaxFrame.

INT

1

options.session.auto_purge_temp_tables

Specifies whether to automatically clean up all temporary tables created in the current session when the session ends.

BOOL

FALSE

options.function.default_running_options

Sets the default resource configuration for functions registered with the @remote decorator.

dict. Keys can include cpu, memory, and gpu.

Important

The use of many special flags is subject to prerequisites, such as whitelist requests, custom image management, and dependencies on CMF statistics information. Before you configure these advanced options, contact the MaxCompute technical support team to ensure your configurations are correct and effective.