This topic describes how to use the set command to set MaxCompute or user-defined system variables and how to clear the settings.
You can use the set command to configure operations in MaxCompute.
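A set statement takes the form `set <key>=<value>;` and applies to the statements submitted with it in the same job. A minimal sketch, using one of the flags described in this topic:

```sql
-- Allow a full scan of a partitioned table for this job.
set odps.sql.allow.fullscan=true;
```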
|Parameter|Description|Default value|
|---|---|---|
|odps.sql.allow.fullscan|Specifies whether to allow a full scan of a partitioned table.| |
|odps.stage.mapper.mem|Sets the memory size of each map worker.|1024 MB|
|odps.stage.reducer.mem|Sets the memory size of each reduce worker.|1024 MB|
|odps.stage.joiner.mem|Sets the memory size of each join worker.|1024 MB|
|odps.stage.mem|Sets the memory size of all workers in the specified MaxCompute jobs. This key has a lower priority than the preceding three keys.|No default value|
|odps.stage.mapper.split.size|Modifies the amount of input data for each map worker (the input file shard size) to control the number of workers at each map stage.|256 MB|
|odps.stage.reducer.num|Modifies the number of workers at each reduce stage.|No default value|
|odps.stage.joiner.num|Modifies the number of workers at each join stage.|No default value|
|odps.stage.num|Modifies the worker concurrency at all stages of the specified MaxCompute jobs. This key has a lower priority than the preceding three keys.|No default value|
|odps.sql.reshuffle.dynamicpt|Reshuffles dynamic partitions to avoid generating excessive small files. **Note:** If only a small number of dynamic partitions are generated, we recommend that you set this variable to False to avoid data skew.|True|
|odps.sql.type.system.odps2|Set this variable to True if the SQL statement contains new data types such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY.|False|
|odps.sql.hive.compatible|Specifies whether to enable the Hive compatibility mode. MaxCompute supports Hive syntaxes such as inputRecordReader, outputRecordReader, and SerDe only after the Hive compatibility mode is enabled.|False|
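Multiple set statements can precede a query, and each applies only to the job it is submitted with. A minimal sketch using flags from the preceding table (the values and the table name `sale_detail` are illustrative assumptions, not recommendations):

```sql
-- Raise each map worker's memory from the 1024 MB default to 2048 MB.
set odps.stage.mapper.mem=2048;
-- Enable the new data types, such as TINYINT, SMALLINT, and VARCHAR.
set odps.sql.type.system.odps2=true;
-- Submit the query in the same job as the set statements.
select * from sale_detail;
```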
- The preceding SQL statement adjusts the buffer size of a complex type column in a table that MaxCompute writes.
- Application scenarios
- An output table contains many complex data types.
- The size of a single complex variable contained in the output table exceeds the specified size.
- The default buffer size is 67,108,864 bytes (64 MB).
- If an output table has three columns of complex types, such as string, map, struct, array, and binary, MaxCompute reserves 64 MB for each column by default. The buffer requested for each column stores the data of the corresponding batch row count rows.
- We recommend that you set a proper value based on the estimated size of the complex variables in the table. For example, if the size of each complex variable does not exceed 1,024 bytes and the batch row count value is 1024 (the default value), you can set the flag to 1024 × 1024.
- If the value of each complex variable is between 7 MB and 8 MB and the batch row count value is 32, you can set this value to 8 MB × 32.
- If the output of a task has a complex type, or if the mapjoin table of a task has a complex type, adjusting this value affects the memory usage during task running. An excessively large value might cause an out-of-memory (OOM) error.
- Similarly, the set odps.stage.mapper.split.size command can be used to adjust the amount of data (in MB) that each map worker reads. The following is an example:
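A minimal sketch (the 512 MB value is illustrative):

```sql
-- Each map worker reads up to 512 MB of input, which reduces the number
-- of map workers compared with the 256 MB default.
set odps.stage.mapper.split.size=512;
```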
```sql
show flags; -- Display the parameters that are set by using the set command.
```