This topic describes the limits on the use of MaxCompute. We recommend that you familiarize yourself with these limits before you use MaxCompute.

Limits on data upload and download

Before you upload or download data in MaxCompute, take note of the following limits:

  • Limits on data uploads and downloads by using Tunnel commands:
    • You cannot use Tunnel commands to upload or download data of the ARRAY, MAP, or STRUCT type.
    • No limits are imposed on the upload speed. The upload speed depends on the network bandwidth and server performance.
    • The number of retries for a failed block is limited. If the limit is exceeded, the upload proceeds to the next block. After data is uploaded, you can execute the SELECT COUNT(*) FROM table_name statement to check whether any data is lost (see the sketch after this list).
    • By default, a project supports a maximum of 2,000 concurrent tunnel connections.
    • On the server, the lifecycle of a session is 24 hours. A session can be shared among processes and threads on the server, but you must make sure that each block ID is unique.
  • Limits on data uploads by using DataHub:
    • The size of each field cannot exceed its upper limit. For more information, see Data type editions.
      Note: The size of a string cannot exceed 8 MB.
    • During the upload, multiple data entries are packaged.
  • Limits on the TableTunnel SDK interface:
    • The value of a block ID must be greater than or equal to 0 and less than 20,000. The amount of data that you want to upload in a block cannot exceed 100 GB.
    • The lifecycle of a session is 24 hours. If you want to transfer large amounts of data, we recommend that you transfer your data in multiple sessions.
    • The lifecycle of an HTTP request that corresponds to a RecordWriter is 120 seconds. If no data flows over an HTTP connection within 120 seconds, the server closes the connection.
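
The following minimal PyODPS sketch pulls these points together: it uploads records in separate blocks with unique block IDs and then runs SELECT COUNT(*) to verify that no data was lost. The credentials, endpoint, the table name my_table, and its schema (name STRING, value BIGINT) are placeholders, not values from this topic.

    from odps import ODPS

    # Placeholder credentials, project, and endpoint; replace with your own values.
    o = ODPS('<access_id>', '<access_key>', project='my_project',
             endpoint='https://service.cn-hangzhou.maxcompute.aliyun.com/api')

    t = o.get_table('my_table')  # assumed schema: (name STRING, value BIGINT)

    # Each block ID must be unique within the upload session and in the range [0, 20000).
    with t.open_writer(blocks=[0, 1]) as writer:
        writer.write(0, [t.new_record(['a', 1])])  # data written to block 0
        writer.write(1, [t.new_record(['b', 2])])  # data written to block 1

    # Check whether any blocks were dropped after the retry limit was exceeded.
    with o.execute_sql('SELECT COUNT(*) FROM my_table').open_reader() as reader:
        for record in reader:
            print(record[0])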

For more information about data upload and download, see Data upload and download.

Limits on SQL

The following table describes the limits on the development of SQL jobs in MaxCompute.

Item Maximum value/Limit Category Description
Table name length 128 bytes Length limit A table or column name cannot contain special characters. It must start with a letter, and can contain only letters, digits, and underscores (_).
Comment length 1,024 bytes Length limit A comment is a valid string that can be up to 1,024 bytes in length.
Columns in a table 1,200 Quantity limit A table can contain a maximum of 1,200 columns.
Partitions in a table 60,000 Quantity limit A table can contain a maximum of 60,000 partitions.
Partition levels of a table 6 Quantity limit A table can contain a maximum of six levels of partitions.
Screen display 10,000 rows Quantity limit A SELECT statement can return a maximum of 10,000 rows.
INSERT targets 256 Quantity limit A MULTI-INSERT operation can insert data into a maximum of 256 tables at a time.
UNION ALL targets 256 Quantity limit A UNION ALL operation can merge up to 256 tables at a time.
MAPJOIN targets 128 Quantity limit A MAPJOIN operation can join up to 128 small tables at a time.
MAPJOIN memory 512 MB Size limit The memory size for all small tables on which the MAPJOIN operation is performed cannot exceed 512 MB.
Window functions 5 Quantity limit A SELECT statement can contain a maximum of five window functions.
ptinsubq 1,000 rows Quantity limit If a subquery contains a partition column, the subquery can return no more than 1,000 rows.
Length of an SQL statement 2 MB Length limit The maximum length of an SQL statement is 2 MB. This limit applies to scenarios in which you use the SDK to call SQL statements (see the example after this table).
Conditions of a WHERE clause 256 Quantity limit A WHERE clause can contain a maximum of 256 conditions.
Length of a column record 8 MB Length limit The maximum length of a column record in a table is 8 MB.
Parameters in an IN clause 1,024 Quantity limit This item specifies the recommended maximum number of parameters in an IN clause, for example, IN (1,2,3,...,1024). If an IN clause contains too many parameters, compilation performance is affected. We recommend that you use no more than 1,024 parameters. This is a recommendation, not a fixed upper limit.
jobconf.json 1 MB Size limit The maximum size of the jobconf.json file is 1 MB. If a table contains a large number of partitions, the size of the jobconf.json file may exceed 1 MB.
View Not writable Operation limit A view is not writable and does not support the INSERT operation.
Data type and position of a column Unmodifiable Operation limit The data type and position of a column are unmodifiable.
Java UDFs Not allowed to be abstract or static Operation limit Java UDFs cannot be abstract or static.
Partitions that can be queried 10,000 Quantity limit A maximum of 10,000 partitions can be queried.
SQL execution plans 1 MB Size limit The size of an execution plan generated from MaxCompute SQL statements cannot exceed 1 MB. Otherwise, the error message FAILED: ODPS-0010000:System internal error - The Size of Plan is too large appears.
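
As a concrete illustration of two items above, the following hypothetical PyODPS sketch submits an SQL statement through the SDK (so the statement text counts toward the 2 MB length limit) and uses a MAPJOIN hint, which is subject to the 128 small-table and 512 MB memory limits. The table names orders and dim_region and their columns are assumptions.

    from odps import ODPS

    # Placeholder credentials; replace with your own values.
    o = ODPS('<access_id>', '<access_key>', project='my_project', endpoint='<endpoint>')

    # The full statement text submitted through the SDK counts toward the 2 MB limit.
    sql = '''
    SELECT /*+ MAPJOIN(b) */ a.order_id, b.region_name
    FROM orders a
    JOIN dim_region b ON a.region_id = b.region_id
    '''

    with o.execute_sql(sql).open_reader() as reader:
        for record in reader:
            print(record['order_id'], record['region_name'])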

For more information about SQL, see SQL.

Limits on MapReduce

The following table describes the limits on the development of MapReduce jobs in MaxCompute.

Item Value range Classification Configuration item Default value Configurable Description
Memory occupied by an instance [256 MB,12 GB] Memory odps.stage.mapper(reducer).mem and odps.stage.mapper(reducer).jvm.mem 2,048 MB and 1,024 MB Yes The memory occupied by a single map or reduce instance. The memory consists of two parts: the framework memory, which is 2,048 MB by default, and Java Virtual Machine (JVM) heap memory, which is 1,024 MB by default.
Number of resources 256 Quantity - N/A No Each job can reference up to 256 resources. Each table or archive is considered as one resource.
Numbers of inputs and outputs 1,024 and 256 Quantity - N/A No The number of the inputs of a job cannot exceed 1,024, and the number of the outputs cannot exceed 256. A partition of a table is regarded as one input. The number of input tables cannot exceed 64.
Number of counters 64 Quantity - N/A No The number of custom counters in a job cannot exceed 64. The counter group name and counter name cannot contain number signs (#). The total length of the two names cannot exceed 100 characters.
Number of map instances [1,100000] Quantity odps.stage.mapper.num N/A Yes The number of map instances in a job is calculated by the framework based on the split size. If no input table is specified, you can set the odps.stage.mapper.num parameter to specify the number of map instances. The value ranges from 1 to 100,000.
Number of reduce instances [0,2000] Quantity odps.stage.reducer.num N/A Yes By default, the number of reduce instances in a job is 25% of the number of map instances. You can set the number to a value that ranges from 0 to 2,000. Reduce instances process much more data than map instances, which may result in long processing time in the reduce stage. A job can have 2,000 reduce instances at most.
Number of retries 3 Quantity - N/A No The maximum number of retries that are allowed for a map or reduce instance is 3. Exceptions that do not allow retries may cause jobs to fail.
Local debug mode A maximum of 100 instances Quantity - N/A No In local debug mode, the following limits apply:
  • The number of map instances is 2 by default and cannot exceed 100.
  • The number of reduce instances is 1 by default and cannot exceed 100.
  • The number of download records for one input is 100 by default and cannot exceed 10,000.
Number of times a resource is read repeatedly 64 Quantity - N/A No The number of times that a map or reduce instance repeatedly reads a resource cannot exceed 64.
Resource bytes 2 GB Length - N/A No The total bytes of resources that are referenced by a job cannot exceed 2 GB.
Split size Greater than or equal to 1 Length odps.stage.mapper.split.size 256 MB Yes The framework determines the number of map instances based on the split size (see the sketch after this table).
Length of a string in a column 8 MB Length - N/A No A string in a column cannot exceed 8 MB in length.
Worker timeout period [1,3600] Time odps.function.timeout 600 Yes The timeout period of a map or reduce worker when the worker does not read or write data, or stops sending heartbeats by using context.progress(). The default value is 600 seconds.
Field types supported by tables that are referenced by MapReduce BIGINT, DOUBLE, STRING, DATETIME, and BOOLEAN Data type - N/A No When a MapReduce task references a table, an error is returned if the table has field types that are not supported.
Object Storage Service (OSS) data read - Feature - N/A No MapReduce cannot read OSS data.
New data types in MaxCompute V2.0 - Feature - N/A No MapReduce does not support the new data types in MaxCompute V2.0.
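
To make the relationship between the split size and the number of map instances concrete, the following back-of-the-envelope sketch estimates how many map instances the framework would start for a given input size. It is only an approximation of the sizing rule described above, not the exact algorithm that MaxCompute uses.

    import math

    def estimate_map_instances(input_size_mb, split_size_mb=256):
        # Roughly one map instance per split of the input data.
        return max(1, math.ceil(input_size_mb / split_size_mb))

    # A 10 GB (10,240 MB) input with the default 256 MB split size
    # yields roughly 40 map instances; doubling the split size halves the count.
    print(estimate_map_instances(10240))       # 40
    print(estimate_map_instances(10240, 512))  # 20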

For more information about MapReduce, see Overview.

Limits on PyODPS

Before you use DataWorks to develop PyODPS jobs in MaxCompute, take note of the following limits:

  • Each PyODPS node can process a maximum of 50 MB of data and can occupy a maximum of 1 GB of memory. If either limit is exceeded, DataWorks terminates the PyODPS node. Do not write unnecessary Python data processing code in PyODPS tasks.
  • The efficiency of writing and debugging code in DataWorks is low. We recommend that you install an integrated development environment (IDE) on your machine to write code.
  • To avoid putting excessive pressure on the DataWorks gateway, DataWorks limits the CPU utilization and memory usage. If the system displays Got killed, the memory usage has exceeded the limit and the system has terminated the related processes. Therefore, we recommend that you do not perform operations on local data. These memory and CPU limits do not apply to SQL or DataFrame tasks (except to_pandas) that are initiated by PyODPS, because such tasks are executed in MaxCompute rather than on the gateway.
  • Functions may be limited in the following aspects due to the lack of packages such as matplotlib:
    • The use of the plot function of DataFrame is affected.
    • DataFrame user-defined functions (UDFs) can be used only after they are submitted to MaxCompute. As required by the Python sandbox, you can use only pure Python libraries and the NumPy library to run UDFs. Other third-party libraries such as pandas cannot be used.
    • For code other than UDFs, you can use the NumPy and pandas libraries that are pre-installed in DataWorks. Third-party packages that contain binary code are not supported.
  • For compatibility reasons, options.tunnel.use_instance_tunnel is set to False in DataWorks by default. If you want to enable InstanceTunnel globally, set this parameter to True (see the sketch after this list).
  • For implementation reasons, the Python atexit package is not supported. Use try-finally to implement the relevant logic instead (also shown in the sketch after this list).
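
The last two items can be handled with a few lines of code. The following sketch enables InstanceTunnel globally and places cleanup logic in try-finally instead of atexit. The temporary table name tmp_result is a placeholder, and in a DataWorks PyODPS node the o entry object is already provided for you.

    from odps import ODPS, options

    # Placeholder credentials; in a DataWorks PyODPS node, `o` is preconfigured.
    o = ODPS('<access_id>', '<access_key>', project='my_project', endpoint='<endpoint>')

    # Enable InstanceTunnel globally (set to False by default in DataWorks).
    options.tunnel.use_instance_tunnel = True

    # atexit is not supported, so put cleanup logic in try-finally instead.
    try:
        o.execute_sql('CREATE TABLE IF NOT EXISTS tmp_result (id BIGINT)')
        # ... write intermediate results to tmp_result ...
    finally:
        # Runs even if the statements above fail.
        o.delete_table('tmp_result', if_exists=True)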

For more information about PyODPS, see PyODPS.

Limits on Graph

Before you develop Graph jobs in MaxCompute, take note of the following limits:

  • Each job can reference up to 256 resources. Each table or archive is considered as one resource.
  • The total bytes of resources referenced by a job cannot exceed 512 MB.
  • The number of the inputs of a job cannot exceed 1,024, and that of the outputs of a job cannot exceed 256. The number of input tables cannot exceed 64.
  • Labels that are specified for multiple outputs cannot be null or empty strings. A label cannot exceed 256 characters in length and can contain only letters, digits, underscores (_), number signs (#), periods (.), and hyphens (-).
  • The number of custom counters in a job cannot exceed 64. The counter group name and counter name cannot contain number signs (#). The total length of the two names cannot exceed 100 characters.
  • The number of workers for a job is calculated by the framework. The maximum number of workers is 1,000. An exception is thrown if the number of workers exceeds this value.
  • A worker consumes 200 units of CPU resources by default. The valid range is 50 to 800 units.
  • A worker consumes 4,096 MB of memory by default. The valid range is 256 MB to 12 GB.
  • A worker can repeatedly read a resource up to 64 times.
  • The default value of split_size is 64 MB. You can set the value as needed. The value of split_size must be greater than 0 and less than or equal to the result of the 9223372036854775807>>20 operation (see the computation after this list).
  • GraphLoader, Vertex, and Aggregator in MaxCompute Graph are restricted by the Java sandbox when they are run in a cluster. However, the main program of Graph jobs is not restricted by the Java sandbox. For more information, see Java Sandbox.
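
For reference, the upper bound given by the 9223372036854775807>>20 expression evaluates as follows. The snippet is plain Python arithmetic, shown only to make the bound concrete.

    # 9223372036854775807 is the largest signed 64-bit integer (2**63 - 1).
    # Shifting it right by 20 bits divides it by 2**20.
    max_split_size = (2 ** 63 - 1) >> 20
    print(max_split_size)  # 8796093022207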

For more information about Graph, see Graph.

Other limits

The following table describes the maximum parallelism of jobs that you can submit in a MaxCompute project in different regions.

Region Maximum job parallelism for a MaxCompute project
China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), and China (Chengdu) 2,500
China (Hong Kong), Singapore (Singapore), Australia (Sydney), Malaysia (Kuala Lumpur), Indonesia (Jakarta), Japan (Tokyo), Germany (Frankfurt), US (Silicon Valley), US (Virginia), UK (London), India (Mumbai), and UAE (Dubai) 300

If you submit a job after the job parallelism in a MaxCompute project has reached the maximum, an error is returned. The following error message shows an example: Request rejected by flow control. You have exceeded the limit for the number of tasks you can run concurrently in this project. Please try later.