All Products
Search
Document Center

MaxCompute:SET

Last Updated:Aug 26, 2024

You can execute the SET statement to configure MaxCompute system variables for the current session.

Syntax

SET <KEY>=<VALUE>

Parameters

  • KEY: the name of the property.

  • VALUE: the value of the property.

    The following table describes common properties at the session level.

    Property

    Description

    Valid values

    console.sql.result.instancetunnel

    Specifies whether to enable InstanceTunnel. For more information, see Usage notes.

    • True: enabled.

    • False: disabled.

    odps.stage.mapper.mem

    The memory size of each Map worker.

    Valid values: 256 to 12288.

    Default value: 1024. Unit: MB.

    odps.stage.reducer.mem

    The memory size of each Reduce worker.

    Valid values: 256 to 12288.

    Default value: 1024. Unit: MB.

    odps.stage.joiner.mem

    The memory size of each Join worker.

    Valid values: 256 to 12288.

    Default value: 1024. Unit: MB.

    odps.stage.mem

    The total memory size of all workers in a specified MaxCompute job. This property has a lower priority than the odps.stage.mapper.mem, odps.stage.reducer.mem, and odps.stage.joiner.mem properties.

    Valid values: 256 to 12288.

    No default value. Unit: MB.

    odps.stage.mapper.split.size

    The input data amount of each Map worker, which is the split size of the input file. You can use this property to indirectly control the number of workers at each Map stage.

    Note

    If the LIMIT keyword is used in SQL statements, the number of concurrent workers is limited to 1. This way, only a single worker is allowed to run at a time. In this case, when you configure this property, we recommend that you do not add the LIMIT keyword to SQL statements.

    Default value: 256. Unit: MB.

    odps.stage.reducer.num

    The number of workers at each Reduce stage.

    Note

    If the LIMIT keyword is used in SQL statements, the number of concurrent workers is limited to 1. This way, only a single worker is allowed to run at a time. In this case, when you configure this property, we recommend that you do not add the LIMIT keyword to SQL statements.

    Valid values: 1 to 10000.

    odps.stage.joiner.num

    The number of workers at each Join stage.

    Note

    If the LIMIT keyword is used in SQL statements, the number of concurrent workers is limited to 1. This way, only a single worker is allowed to run at a time. In this case, when you configure this property, we recommend that you do not add the LIMIT keyword to SQL statements.

    Valid values: 1 to 10000.

    odps.stage.num

    The total number of concurrent workers in a specified MaxCompute job. This parameter has a lower priority than the odps.stage.mapper.split.size, odps.stage.reducer.num, and odps.stage.joiner.num parameters.

    Note

    If the LIMIT keyword is used in SQL statements, the number of concurrent workers is limited to 1. This way, only a single worker is allowed to run at a time. In this case, when you configure this property, we recommend that you do not add the LIMIT keyword to SQL statements.

    Valid values: 1 to 10000.

    odps.sql.reshuffle.dynamicpt

    Specifies whether to enable dynamic partitioning. Dynamic partitioning prevents the generation of a large number of small files.

    • True: enabled.

    • False: disabled.

    Note

    If a small number of dynamic partitions are generated, we recommend that you set this property to False to prevent data skew issues.

    odps.sql.type.system.odps2

    Specifies whether to enable the MaxCompute V2.0 data type edition. For more information about the MaxCompute V2.0 data type edition, see MaxCompute V2.0 data type edition.

    • True: enabled.

    • False: disabled.

    odps.sql.hive.compatible

    Specifies whether to enable the Hive-compatible data type edition. MaxCompute supports Hive syntaxes, such as inputRecordReader, outputRecordReader, and Serde, only after the Hive-compatible data type edition is enabled. For more information about the Hive-compatible data type edition, see Hive-compatible data type edition.

    • True: enabled.

    • False: disabled.

    odps.sql.executionengine.coldata.deep.buffer.size.max

    The cache size requested to store a column that contains data of complex types when MaxCompute writes data to a table. This improves the write performance.

    You can configure this property if the output table contains excessive data of complex types or the size of a variable of a specified complex type in the output table is too large.

    • If the schemas of three columns in the output table are of complex data types, such as STRING, MAP, STRUCT, ARRAY, or BINARY, MaxCompute automatically reserves 192 MB (64 × 3) of memory for table write operations. The cache requested for each column is used to store data in the row that is specified by batch row count.

    • If the estimated size of the variables of complex types in the table is small, we recommend that you set this property to a small value. For example, if the size of each variable of a specific complex type does not exceed 1,024 bytes and the default value (1024) of batch row count is used, you can set the value of this property to 1048576 (1024 × 1024). Sample statement:

      set odps.sql.executionengine.coldata.deep.buffer.size.max=1048576;
    • If the estimated size of each variable of a complex type is between 7 MB and 8 MB, and batch row count is set to 32, you can adjust the value of this property to 256 MB (8 × 32).

    • If the output table contains variables of complex types or the small table involved in the MAPJOIN operation contains data of complex types, the memory used to run jobs may be adversely affected when you adjust the value of this property. Based on the preceding calculation method, a large value of this property may cause an out of memory (OOM) error.

    Unit: bytes. Default value: 67108864.

    odps.task.quota.preference.tag

    The quota group that is allocated to a specified job. You can configure the quota group on MaxCompute Management. For projects that use subscription resources, you can configure this property to specify a level-2 quota group that can be used by a job. A job is preferentially scheduled to a quota group whose quota tag is the same as the quota tag of the job. You must configure a quota tag for a job when you submit the job. If no quota group has the same quota tag, the job is scheduled to the quota group specified by the project to which the job belongs.

    You can execute the following statement to configure a quota tag:

    set odps.task.quota.preference.tag = tag_name;
    Note

    This property is available only for SQL jobs.

    tag_name specifies the tag of the quota group configured in MaxCompute Management. You can configure tag_name only for quota groups in the region where the project of your job belongs. The value of the tag_name parameter can contain letters, digits, and underscores (_).

    Note

    You can specify the odps.task.quota.preference.tag property only after you configure tags for quota groups in MaxCompute Management.

    odps.sql.udf.getjsonobj.new

    Specifies whether the GET_JSON_OBJECT function retains the original string when it returns a value.

    From April 20, 2021, the GET_JSON_OBJECT function automatically retains the original strings for the returned results in new MaxCompute projects. For MaxCompute projects created before April 20, 2021, the GET_JSON_OBJECT function automatically returns JSON-formatted reserved characters by using escape characters. This prevents the impact on existing jobs.

    For more information about the GET_JSON_OBJECT function, see GET_JSON_OBJECT.

    • True: enabled.

    • False: disabled.

    odps.sql.udf.jvm.memory

    The maximum memory size of the Java Virtual Machine (JVM) heap for a user-defined function (UDF).

    If a large amount of data is calculated and sorted in the memory by using a UDF, the OOM error may occur. In this case, you can increase the value of this property. However, you must optimize UDF code based on your business requirements to prevent OOM errors.

    Valid values: 256 to 12288.

    Default value: 1024. Unit: MB.

    odps.function.timeout

    The timeout period of a UDF.

    Valid values: 0 to 3600.

    Default value: 600. Unit: seconds.

    LabelSecurity

    Specifies whether to enable LabelSecurity. For more information about LabelSecurity, see Label-based access control.

    • True: enabled.

    • False: disabled.

    odps.sql.session.resources

    The resources that are referenced by user-defined types (UDTs). You can specify multiple resources and separate them with commas (,). For more information about how to reference resources, see Overview.

    The uploaded resources.

    odps.sql.udt.display.tostring

    Specifies whether to enable the java.util.Objects.toString(...) mechanism on the Wrap for all the columns whose outputs are UDTs.

    • True: enabled.

    • False: disabled.

    odps.sql.session.java.imports

    The Java packages that are referenced by UDTs. You can specify multiple Java packages and separate them with commas (,). For more information about how to reference Java packages, see Overview.

    The uploaded Java packages.

    CheckPermissionUsingACL

    Specifies whether to enable ACL-based access control. For more information about ACL-based access control, see MaxCompute permissions.

    • True: enabled.

    • False: disabled.

    CheckPermissionUsingPolicy

    Specifies whether to enable policy-based access control. For more information about policy-based access control, see Policy-based access control.

    • True: enabled.

    • False: disabled.

    ObjectCreatorHasAccessPermission

    Specifies whether creators of objects can perform operations on the objects created by themselves.

    • True: Creators of objects can perform operations on the objects that are created by themselves.

    • False: Creators of objects cannot perform operations on the objects created by themselves.

    ObjectCreatorHasGrantPermission

    Specifies whether creators of objects can authorize other users to perform operations on the objects.

    • True: Creators of objects can authorize other users to perform operations on the objects.

    • False: Creators of objects cannot authorize other users to perform operations on the objects.

    ProjectProtection

    Specifies whether to enable project data protection. For more information about the project data protection mechanism, see Project data protection.

    • True: enabled.

    • False: disabled.

    odps.output.field.formatter

    The dynamic data masking rule for SQL query results. For more information about dynamic data masking, see Dynamic data masking.

    The custom dynamic data masking rule.

    odps.isolation.session.enable

    Specifies whether to enable dynamic data masking for a single session. For more information about dynamic data masking, see Dynamic data masking.

    • True: enabled.

    • False: disabled.

    odps.sql.skewjoin

    Specifies whether to enable the SKEWJOIN feature to prevent long tails.

    • True: enabled.

    • False: disabled.

    odps.sql.skewinfo

    The key and value on which SKEWJOIN is executed. For more information, see Data skew tuning.

    -

    odps.sql.reducer.instances

    The number of hash buckets. For more information about hash buckets, see Table operations.

    Valid values: 0 to 4000.

    odps.sql.udf.ppr.deterministic

    Specifies whether to enable partition pruning for UDFs. For more information about partition pruning, see the "WHERE clause (where_condition)" section in SELECT syntax.

    • True: enabled.

    • False: disabled.

    odps.sql.udf.ppr.to.subquery

    Specifies whether to ignore the error that occurs during result backfilling when partition pruning is being performed. For more information about partition pruning, see the "WHERE clause (where_condition)" section in SELECT syntax.

    • True: enabled.

    • False: disabled.

    odps.optimizer.enable.range.partial.repartitioning

    Specifies whether to enable the Shuffle Remove feature for range-clustered tables.

    • True: enabled.

    • False: disabled.

    odps.optimizer.skew.join.topk.num

    The number of hot key values obtained by the optimizer when it performs an Aggregate operation. For more information, see SKEWJOIN HINT.

    -

    odps.optimizer.stat.collect.auto

    Specifies whether to enable the Freeride feature. After the Freeride feature is enabled, the column stats metrics of tables are automatically collected. For more information, see Optimizer.

    • True: enabled.

    • False: disabled.

    odps.optimizer.stat.collect.plan

    The collection plan that is used to collect specific column stats metrics of specific columns. For more information, see Optimizer.

    -

    odps.sql.executionengine.batch.rowcount

    The number of data rows that can be processed by the SQL engine at a time. Default value: 1024. If an OOM error occurs because of a large field in a row or if a timeout occurs due to poor UDF performance, you must set this property to a smaller value. We recommend that you do not set this property to an excessively small value because an excessively small value may adversely affect the performance of the SQL engine.

    Valid values: 1 to 1024.

    odps.sql.external.net.vpc

    Specifies whether to enable virtual private cloud (VPC) for external tables. For more information, see Hologres external tables.

    • True: enabled.

    • False: disabled.

    odps.sql.groupby.position.alias

    Specifies whether to process integer constants in the GROUP BY clause as column numbers in a SELECT statement.

    • True: enabled.

    • False: disabled.

    odps.sql.groupby.skewindata

    Specifies whether to enable the anti-skew feature for the GROUP BY clause.

    • True: enabled.

    • False: disabled.

    odps.sql.orderby.position.alias

    Specifies whether to process integer constants in the ORDER BY clause as column numbers in a SELECT statement.

    • True: enabled.

    • False: disabled.

    odps.sql.groupby.orderby.position.alias

    Specifies whether to process integer constants in the GROUP BY and ORDER BY clauses as column numbers in a SELECT statement. We recommend that you specify this property.

    Note

    If you set this property to True for projects in which jobs are running, the job parsing and running may be adversely affected. Therefore, you must make sure that existing jobs can still run properly based on the original logic when you set the property to True. Otherwise, specify this property for sessions.

    • True: enabled.

    • False: disabled.

    odps.sql.mapjoin.memory.max

    The size of small tables that are read to the memory when the MAPJOIN statement is executed. Unit: MB.

    Valid values: 0 to 8192.

    odps.sql.metering.value.max

    The upper limit on resources consumed by an SQL statement. For more information, see Consumption control.

    -

    odps.sql.python.version

    The Python version on which SQL statements are executed.

    • cp27

    • cp37

    odps.sql.select.output.format

    Specifies whether to display the table headers in the returned results of the MaxCompute client. For more information, see MaxCompute client (odpscmd).

    • {""needHeader"":false,""fieldDelim"":""""}: The table headers are not displayed.

    • {""needHeader"":true,""fieldDelim"":""""}: The table headers are displayed.

    odps.sql.timezone

    The time zone of the MaxCompute project that you accessed. For more information about time zones, see Time zone configuration operations.

    -

    odps.sql.unstructured.data.oss.use.https

    Specifies whether to enable the mechanism of obtaining data by using HTTPS at the underlying layer when you access external tables. For more information, see Create an OSS external table.

    • True: enabled.

    • False: disabled.

    odps.sql.unstructured.oss.commit.mode

    Specifies whether to enable the multipart upload feature of Object Storage Service (OSS) to write data to OSS external tables. For more information, see Write data to OSS.

    • True: enabled.

    • False: disabled.

    odps.sql.decimal.tostring.trimzero

    Specifies whether to remove the zero at the end of the decimal point when the data type is DECIMAL. The default value is True, which indicates to remove the zero at the end of the decimal point.

    • True: The zero at the end of the decimal point is removed.

    • False: The zero at the end of the decimal point is retained.

    odps.sql.unstructured.tablestore.put.row

    Specifies whether to call the PutRow operation to write data to a Tablestore external table.

    Note

    For more information about PutRow, see PutRow.

    • True: enabled.

    • False: disabled.

    odps.sql.unstructured.external.max.dop

    The maximum concurrency of workers that are accessed from an external tables.

    The value of this property must be a positive integer.

    odps.sql.executionengine.enable.string.to.date.full.format

    Specifies whether to convert a date string that contains the hour, minute, and second parts.

    • True: enabled.

    • False: enabled.

    Default value: False.

    odps.sql.executionengine.enable.rand.time.seed

    Specifies whether the RAND function uses the current system time or the current instance ID as the seed to generate random numbers.

    • True: The RAND function uses the current system time as the seed to generate random numbers. However, the RAND function no longer ensures idempotence and cannot be used as the shuffle key. As a result, rerun results may be different.

    • False: The RAND function uses the current instance ID as the seed to generate random numbers. The RAND function can ensure idempotence.

    Default value: False.

    odps.sql.type.json.enable

    Specifies whether to enable JSON data types. For more information about JSON data types, see Instructions for using JSON data types in MaxCompute.

    • True: enabled.

    • False: disabled.

    odps.ext.oss.orc.native

    Specifies whether to use the native ORC reader to read tables.

    • True: The native ORC reader is used.

    • False: The native ORC reader is not used.

    odps.sql.job.max.time.hours

    The maximum duration in which a single job runs.

    Note

    If you need to configure the property at the project level, you can click the link or search for the DingTalk group (ID: 11782920) to contact MaxCompute technical support.

    Valid values: 1 to 72. Unit: hours.

    Default value: 24.

    odps.task.wlm.quota

    The name of the quota that is used to run tasks in the current session. For more information about quotas, see Manage quotas in the new MaxCompute console.

    -

    Note

    If you need to query the MaxCompute system variables that are configured for the current session, you can view them in the SourceXML of LogView. For more information, see Use LogView 2.0 to view job running information.

Examples

-- Set the size of data read by each Mapper to 256 MB. 
SET odps.stage.mapper.split.size=256;

Related statements

SHOW: This statement is used to display the properties that you configured by using the SET statement.