This topic describes the updates to MaxCompute in reverse chronological order.

June 29, 2020 (UTC+08:00): Users can choose data type editions for new projects

The data type editions available for new MaxCompute projects are being updated for the international website (alibabacloud.com) from June 29, 2020 to July 15, 2020. After the update, users must choose initial data type editions for new MaxCompute projects.

MaxCompute provides the following three data type editions: MaxCompute V1.0 data type edition, MaxCompute V2.0 data type edition, and Hive-compatible data type edition. These editions differ in their definitions and usage. MaxCompute provides three attributes that you can configure as required to enable an edition. For more information, see Data types.
Note This feature has no impact on the data type editions of existing MaxCompute projects. To modify the data type edition of an existing MaxCompute project, perform the operations described in Data types.
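The following commands provide a minimal sketch of the three attributes (the flag names and the value combination that enables the MaxCompute V2.0 data type edition are assumptions based on the Data types topic; verify them there before use):
setproject odps.sql.type.system.odps2=true;
setproject odps.sql.decimal.odps2=true;
setproject odps.sql.hive.compatible=false;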

March 15, 2020 (UTC+08:00): The storage price of MaxCompute is reduced

From March 15, 2020, the storage price of MaxCompute is reduced. For more information, see Storage pricing (pay-as-you-go). The price is reduced based on the following rules:
  • The number of pricing tiers is reduced from five to three to simplify storage billing.
  • The unit price in each tier is reduced to lower the storage price.
The following table lists the tiered pricing method used before March 15, 2020.
Volume of data stored | Tiered unit price | Fixed price
0 < Data volume ≤ 1 GB | N/A | USD 0.00 per day
1 GB < Data volume ≤ 100 GB | USD 0.0028 per GB per day | N/A
100 GB < Data volume ≤ 1 TB | USD 0.0014 per GB per day | N/A
1 TB < Data volume ≤ 10 TB | USD 0.0013 per GB per day | N/A
10 TB < Data volume ≤ 100 TB | USD 0.0011 per GB per day | N/A
Data volume > 100 TB | USD 0.0009 per GB per day | N/A
The following table lists the tiered pricing method used from March 15, 2020.
Volume of data stored | Tiered unit price | Fixed price
0 < Data volume ≤ 1 GB | N/A | USD 0.00 per day
1 GB < Data volume ≤ 10 TB | USD 0.0011 per GB per day | N/A
10 TB < Data volume ≤ 100 TB | USD 0.0009 per GB per day | N/A
Data volume > 100 TB | USD 0.0006 per GB per day | N/A
The storage billing method remains unchanged. For more information, see Storage pricing (pay-as-you-go).
  • In MaxCompute, you are charged daily for the volume of data stored, which includes both tables and resources.
  • MaxCompute records the volume of data stored in each project on an hourly basis and then calculates the average volume of data stored for that day. The storage fee equals the average volume of data stored multiplied by the unit price. Because the unit price decreases in higher tiers, consolidating more data in a single project lowers the average unit price and therefore the storage expense.
Assume that the daily average data volume of a project is 1 PB (1,024 × 1,024 = 1,048,576 GB). The daily fee charged by using the original tiered pricing method is as follows:
(100 - 1) GB × USD 0.0028 per GB per day 
+ (1024 - 100) GB × USD 0.0014 per GB per day 
+ (10240 - 1024) GB × USD 0.0013 per GB per day 
+ (102400 - 10240) GB × USD 0.0011 per GB per day 
+ (1024 × 1024 - 102400) GB × USD 0.0009 per GB per day
= USD 966.486 per day
The daily expense charged by using the new tiered pricing method is as follows:
(10240 - 1) GB × USD 0.0011 per GB per day
+ (102400 - 10240) GB × USD 0.0009 per GB per day 
+ (1024 × 1024 - 102400) GB × USD 0.0006 per GB per day
= USD 661.9125 per day
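As a quick cross-check, the new-method figure can be reproduced in MaxCompute SQL (a minimal sketch; it assumes that your project supports SELECT statements without a FROM clause):
SELECT (10240 - 1) * 0.0011
     + (102400 - 10240) * 0.0009
     + (1024 * 1024 - 102400) * 0.0006 AS daily_fee_usd;
-- Result: 661.9125 (floating-point output may vary slightly)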

February 24, 2020 (UTC+08:00): The compatibility of the SQL engine is upgraded

The following table describes the schedule for upgrading the SQL engine compatibility. If any change occurs, the new schedule prevails.
Batch | Site | Date
1 | Indonesia (Jakarta), UK (London), and India (Mumbai) | February 24, 2020
2 | UAE (Dubai), US (Virginia), China (Beijing), and China (Hong Kong) | February 26, 2020
3 | Malaysia (Kuala Lumpur), Japan (Tokyo), and Germany (Frankfurt) | March 2, 2020
4 | US (Silicon Valley), Singapore, and Australia (Sydney) | March 4, 2020
  • The execution rule of the GET_IDCARD_AGE function has changed.
    • Based on the previous execution rule of the GET_IDCARD_AGE function, if the difference between the current year and the year of birth is greater than 100, NULL is returned. However, after the new rule is applied, the difference between the current year and the year of birth is returned even if the difference exceeds 100. For example, the execution result of get_idcard_age('110101190001011009') is NULL before the change and 120 after the change.
    • If you want to retain the original behavior after the change, find the query statements that rely on the old rule and modify them by wrapping the GET_IDCARD_AGE call in an IF or CASE WHEN expression, as shown in the following table.
      Original query statement | Modified query statement
      GET_IDCARD_AGE(idcardno) | IF(GET_IDCARD_AGE(idcardno) > 100, NULL, GET_IDCARD_AGE(idcardno))
      GET_IDCARD_AGE(idcardno) | CASE WHEN GET_IDCARD_AGE(idcardno) > 100 THEN NULL ELSE GET_IDCARD_AGE(idcardno) END
  • The execution rule of the CONCAT_WS function has changed.
    • Before the change, if Hive compatibility was not enabled and the CONCAT_WS function was called with three or more parameters that included at least one parameter of the ARRAY type, the array items did not appear in the final result. For example, concat_ws(',', array('a'), array('b', 'c')) was expected to return "a,b,c" but actually returned ",,,".
    • After the change, parameters of the STRING and ARRAY types can coexist in the CONCAT_WS function even without Hive compatibility enabled. For example, concat_ws(',', array('a'), array('b', 'c')) returns "a,b,c", as shown below.
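    The following query shows the behavior change (a minimal sketch; the ARRAY constructor requires the MaxCompute V2.0 data type edition):
    select concat_ws(',', array('a'), array('b', 'c'));
    -- Before the change: ",,,"
    -- After the change: "a,b,c"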
  • The execution rule of the LIKE operator has changed in scenarios where the input value is an empty string and the pattern is %%.
    Before the change, if the input value for the LIKE character matching operation is an empty string and the pattern is %%, False is returned. After the change, True is returned.
    -- Create a table and insert an empty string into the table.
    create table if not exists table_test (a string) lifecycle 3;
    insert into table table_test values ('');
    
    select a like '%%' from table_test;
    
    -- The following result is returned before the change:
    +-------+
    | _c0   |
    +-------+
    | false |
    +-------+
    
    -- The following result is returned after the change:
    +-------+
    | _c0   |
    +-------+
    | true  |
    +-------+

December 25, 2019 (UTC+08:00): MaxCompute is compatible with open-source geospatial UDFs

MaxCompute is compatible with open-source geospatial UDFs, which are specifically implemented by ESRI for Apache Hive. You can register open-source geospatial UDFs with MaxCompute so that the functions can be called like open-source Hive UDFs. For more information, see Open source geospatial UDFs.
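The following commands provide a minimal registration sketch. The jar names and the ST_Point class come from the open-source ESRI project and are assumptions; adjust them to the resources that you actually upload.
-- Upload the open-source jars as MaxCompute resources (jar names are assumptions):
ADD JAR esri-geometry-api.jar;
ADD JAR spatial-sdk-hive.jar;
-- Register an open-source geospatial UDF so that it can be called in SQL:
CREATE FUNCTION st_point AS 'com.esri.hadoop.hive.ST_Point' USING 'esri-geometry-api.jar,spatial-sdk-hive.jar';
-- Hive-compatible behavior may need to be enabled before the function is called:
set odps.sql.hive.compatible=true;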

October 11, 2019 (UTC+08:00): New features of MaxCompute SQL are introduced

  • You can force the JOIN operations or set operations enclosed in parentheses () to run first.
    SELECT * FROM src JOIN (src2 JOIN src3 on xxx) ON yyy; 
    SELECT * FROM src UNION ALL (SELECT * FROM src2 UNION ALL SELECT * FROM src3);

    For more information, see JOIN and UNION, INTERSECT, and EXCEPT.

  • MaxCompute SQL supports the hive.orderby.position.alias and hive.groupby.position.alias flags.
    If the two flags are enabled, INTEGER constants in the ORDER BY and GROUP BY clauses of SELECT statements are interpreted as column numbers.
    The columns in the src table are key and value.
    set hive.orderby.position.alias=true;
    SELECT * FROM src ORDER BY 1;
    -- Equivalent to
    SELECT * FROM src ORDER BY key;

    For more information, see SELECT syntax.

  • MaxCompute supports the following built-in functions:
    • STRING JSON_TUPLE(STRING json, STRING key1, STRING key2, ...): converts a JSON string to a tuple based on a group of keys. The JSON_TUPLE() function supports multi-level nesting and can parse JSON data that contains nested arrays. With GET_JSON_OBJECT(), you must call the function once per key to parse the same JSON string multiple times. JSON_TUPLE() accepts multiple keys at a time and parses the JSON string only once, which is more efficient. For more information, see String functions.
    • INT EXTRACT(datepart FROM timestamp): extracts the part of a date specified by the datepart parameter, such as YEAR, MONTH, or DAY. The value of timestamp is a date value of the TIMESTAMP type. For more information, see Date functions.
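    The following queries sketch both functions (the JSON literal is hypothetical, and the TIMESTAMP literal assumes that the MaxCompute V2.0 data types are enabled):
    SELECT JSON_TUPLE('{"a":"1","b":{"c":"2"}}', 'a', 'b');
    -- Returns two STRING values: 1 and {"c":"2"}
    set odps.sql.type.system.odps2=true;
    SELECT EXTRACT(YEAR FROM TIMESTAMP '2019-10-11 01:02:03');
    -- Returns 2019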
  • MaxCompute allows you to specify default values for columns in a table.
    DEFAULT VALUE allows you to specify a default value for a column when you create a table. If you do not specify the column during an INSERT operation, the default value is inserted. The following code provides an example:
    CREATE TABLE t (a bigint default 10, b bigint);
    INSERT INTO TABLE t(b) SELECT 100; 
    -- Equivalent to
    INSERT INTO TABLE t(a, b) SELECT 10, 100;
  • MaxCompute supports a NATURAL JOIN.
    A NATURAL JOIN is a process where two tables are joined on the basis of their common columns. Common columns are columns that have the same name in both tables. MaxCompute supports OUTER NATURAL JOIN. You can use the USING clause so that the JOIN operation returns common columns only once. The following code provides an example:
    -- To join the src table that contains the key1, key2, a1, and a2 columns and the src2 table that contains the key1, key2, b1, and b2 columns, you can execute the following statement:
    SELECT * FROM src NATURAL JOIN src2;
    -- Both the src and src2 tables contain the key1 and key2 columns. In this case, the preceding statement is equivalent to the following statement:
    SELECT src.key1 as key1, src.key2 as key2, src.a1, src.a2, src2.b1, src2.b2 FROM src INNER JOIN src2 ON src.key1 = src2.key1 AND src.key2 = src2.key2;
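    The USING clause mentioned above can be sketched as follows; it also returns the common columns only once:
    SELECT * FROM src JOIN src2 USING (key1, key2);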
    For more information, see JOIN.
  • The LIMIT and OFFSET clauses are supported.
    The OFFSET clause can be used together with the ORDER BY and LIMIT clauses to skip the specified number of rows. For example, execute the following statement to sort the rows of the src table in ascending order by key and return the 11th to 20th rows. OFFSET 10 indicates that the first 10 rows are skipped, and LIMIT 10 indicates that a maximum of 10 rows are returned.
    SELECT * FROM src ORDER BY key LIMIT 10 OFFSET 10;

    For more information, see SELECT syntax.

  • MaxCompute supports built-in operators.
    • The IS DISTINCT FROM operator is supported. a is distinct from b is equivalent to not(a <=> b), and a is not distinct from b is equivalent to a <=> b.
    • The operator used to concatenate strings (||) is supported. For example, a || b || c is equivalent to concat(a, b, c).
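    The following queries illustrate both operators (a minimal sketch; it assumes that your project supports SELECT statements without a FROM clause):
    SELECT 1 IS DISTINCT FROM NULL;  -- Returns true.
    SELECT 'a' || 'b' || 'c';        -- Returns abc, the same as concat('a', 'b', 'c').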

    For more information, see Operators.

  • MaxCompute supports partition merging.

    In MaxCompute, a maximum of 60,000 partitions can be created in a table. If excessive partitions exist, you can archive data in the data warehouse and merge partitions to reduce the number of partitions. When you trigger partition merging, MaxCompute quickly merges multiple partitions in the same table into one partition, migrates their data to the merged partition, and drops the previous partitions. You can use the following statement to merge partitions. For more information, see Partition and column operations.

    ALTER TABLE <tableName> MERGE [IF EXISTS] PARTITION(<predicate>) [, PARTITION(<predicate2>) ...] OVERWRITE PARTITION(<fullPartitionSpec>);
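    For example, the following statement merges two day-level partitions into one month-level partition (the table and partition names are hypothetical):
    ALTER TABLE sale_detail MERGE IF EXISTS PARTITION(ds='20191025'), PARTITION(ds='20191026') OVERWRITE PARTITION(ds='201910');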
  • Multiple partitions can be added or dropped at a time.
    MaxCompute allows you to add or drop multiple partitions at a time by executing the following statements:
    ALTER TABLE t ADD [IF NOT EXISTS] PARTITION (p = '1') PARTITION (p = '2');
    ALTER TABLE t DROP [IF EXISTS]  PARTITION (p = '1'), PARTITION (p = '2');
    -- Partitions in the ADD clause are not separated by commas, whereas partitions in the DROP clause are separated by commas (,).

August 29, 2019 (UTC+08:00): The Outputer interface of custom storage handlers for external tables is upgraded in certain regions

On August 29, 2019, MaxCompute is upgraded. After the upgrade, jobs may fail if they use a custom storage handler for an external table and the Outputer interface obtains column data by column name instead of by array subscript.

Upgrade time: 14:00 to 23:00 on August 29, 2019 (UTC+08:00)

Regions: US (Silicon Valley) and Singapore

August 21, 2019 (UTC+08:00): The Outputer interface of custom storage handlers for external tables is upgraded in certain regions

On August 21, 2019, MaxCompute is upgraded. After the upgrade, jobs may fail if they use a custom storage handler for an external table and the Outputer interface obtains column data by column name instead of by array subscript.

Upgrade time: 14:00 to 23:00 on August 21, 2019 (UTC+08:00)

Regions: Japan (Tokyo), Germany (Frankfurt), China (Hong Kong), and Australia (Sydney)

Impact: In Outputer.output(Record record), the input record is generated by the operator that runs immediately before Outputer, so column names are not fixed.

For example, the column name generated by the some_function(column_a) expression is a temporary name. Therefore, we recommend that you use record.get(index) instead of record.get(columnName) to obtain the content of a column. To obtain the column names of a table in Outputer, call DataAttributes.getFullTableColumns().

If you have any questions, submit a ticket.

July 24, 2019 (UTC+08:00): MaxCompute Spark is released

Regions: China (Hangzhou), China (Beijing), China (Shenzhen), US (Silicon Valley), China (Hong Kong), Germany (Frankfurt), Singapore, and India (Mumbai)

March 26, 2019 (UTC+08:00): MaxCompute SQL is upgraded

  • The GROUPING SETS clause can be used with the CUBE and ROLLUP subclauses to aggregate and analyze data across multiple dimensions. For example, you can aggregate by column a, by column b, and then by columns a and b together (see the sketch after this list). For more information, see GROUPING SETS.
  • The INTERSECT, MINUS, and EXCEPT clauses are supported. For more information, see UNION, INTERSECT, and EXCEPT.
  • When MaxCompute reads files in the ORC or Parquet format in OSS by using external tables, it prunes the columns in files to reduce I/O, save resources, and lower overall computing costs.
  • The Java UDX framework is enhanced to support Writable parameters. For more information, see Java UDF.
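The following queries sketch the first two items (the table and column names are hypothetical):
-- Aggregate by a, by b, and by a and b together in a single statement:
SELECT a, b, COUNT(*) FROM t GROUP BY a, b GROUPING SETS ((a), (b), (a, b));
-- Return only the rows that appear in both t1 and t2:
SELECT * FROM t1 INTERSECT SELECT * FROM t2;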
SQL performance is also optimized:
  • DynamicDAG: a mechanism for dynamic optimization. DynamicDAG is triggered at runtime to delay decisions such as resource allocation and algorithm selection, which improves optimization quality and reduces the generation of low-performance execution plans.
  • ShuffleRemove optimization: optimization for shuffles. MaxCompute supports ShuffleRemove for right tables that have duplicate keys during the execution of the INNER JOIN clause.

March 1, 2019 (UTC+08:00): MaxCompute SQL jobs used to process data in external tables begin to incur fees

As of March 1, 2019, all MaxCompute SQL jobs used to process OSS and Tablestore data begin to incur fees.

Billing standard for SQL computing that involves external tables:
Fee of an SQL computing job = Input data volume × Unit price for SQL computing
The unit price for SQL computing that involves external tables is USD 0.0044 per GB. For example, a job that scans 100 GB of external-table data costs 100 GB × USD 0.0044 per GB = USD 0.44. Fees are charged the next day, and a bill is issued to your account. For more information, see Billing method. If you have any questions, submit a ticket.

16:00 to 20:00 on January 15, 2019 (UTC+08:00): The underlying structure of MaxCompute in the China (Hong Kong) region is optimized

The underlying metadatabase of MaxCompute in the China (Hong Kong) region is optimized from 16:00 to 20:00 on January 15, 2019 to improve performance and stability of MaxCompute. During optimization, users in the China (Hong Kong) region may encounter submission delays or task failures for about one minute. In severe cases, applications may be unavailable for up to 30 minutes. We recommend that you do not submit any tasks during this period. Users in other regions are not affected. If you have any questions, contact us through DingTalk or submit a ticket.

December 24, 2018 (UTC+08:00): MaxCompute supports the time zone configuration

The default time zone of MaxCompute projects is UTC+8. Time-related built-in functions and the fields of the DATETIME, TIMESTAMP, and DATE types are calculated based on UTC+8. As of December 24, 2018, users can configure time zones in MaxCompute by using one of the following methods:

  • At the session level: Submit the set odps.sql.timezone=<timezoneid>; statement together with a computing statement for execution. For example:
    set odps.sql.timezone=Asia/Tokyo;
    select getdate();
    -- The following result is returned:
    +------------+
    | _c0        |
    +------------+
    | 2018-10-30 23:49:50 |
    +------------+
  • At the project level: Execute the setProject odps.sql.timezone=<timezoneid>; statement as the project owner. Once the time zone of a project is set, it is used for all time computing, and the data of existing tasks is affected. Therefore, exercise caution when you perform this operation. We recommend that you only perform this operation on new projects.
Limits and precautions:
  • SQL built-in date functions, UDFs, UDTs, UDJs, and the SELECT TRANSFORM statement allow you to obtain the timezone attribute of a project and use it to set the time zone.
  • A time zone must be configured in the format such as Asia/Shanghai, which supports daylight saving time. Do not configure it in the GMT+9 format.
  • If the time zone of the SDK differs from that of the project, you must configure the GMT time zone when you convert data from the DATETIME type to the STRING type.
  • After the time zone is configured, there are differences between the real time and the output time of related SQL statements you execute in DataWorks. Between the years of 1900 and 1928, the time difference is 352 seconds. Before the year of 1900, the time difference is 9 seconds.
  • MaxCompute, SDK for Java, and the related client are upgraded to ensure that DATETIME data stored in MaxCompute is correct across time zones. The target versions of SDK for Java and the related client have the -oversea suffix. The upgrade may affect the display of DATETIME data earlier than January 1, 1928 in MaxCompute.
  • If the local time zone is not UTC+8 when you upgrade MaxCompute, we recommend that you upgrade SDK for Java and the related client. This ensures that the SQL-based computing result and data transferred by running Tunnel commands after January 1, 1900 are accurate and consistent. For DATETIME data that is earlier than January 1, 1900, the SQL-based computing result and data transferred by running Tunnel commands might differ up to 343 seconds. For DATETIME data that is earlier than January 1, 1928 and is uploaded before SDK for Java and the related client are upgraded, the time in the new version is 352 seconds earlier.
  • If you do not upgrade SDK for Java and the client to versions with the -oversea suffix, the SQL-based computing result may differ from data transferred by running Tunnel commands. For data earlier than January 1, 1900, the time difference is 9 seconds. For data between January 1, 1900 and January 1, 1928, the time difference is 352 seconds.
    Note Modifying the time zone configuration in SDK for Java or the client does not change the time zone configuration in DataWorks, so the two time zones may differ. You must evaluate how this may affect scheduled tasks in DataWorks. The time zone of DataWorks servers in the Japan (Tokyo) region is GMT+9, and that in the Singapore region is GMT+8.
  • If you use a third-party client that connects to MaxCompute by using Java Database Connectivity (JDBC), you must set the time zone on the client to ensure that the time of the client is consistent with that of the server.
  • MaxCompute MapReduce supports the time zone configuration.
  • MaxCompute Spark supports the time zone configuration.
    • If tasks are submitted to the MaxCompute computing cluster, the time zone of the project is automatically obtained.
    • If tasks are submitted from spark-shell, spark-sql, or pyspark in yarn-client mode, you must add spark.driver.extraJavaOptions -Duser.timezone=America/Los_Angeles to the spark-defaults.conf file of the driver, where user.timezone specifies the time zone that you want to use.
  • Machine Learning Platform for AI supports the time zone configuration.
  • MaxCompute Graph supports the time zone configuration.