This topic describes the data types supported by MaxCompute V2.0, basic data types and complex data types.

Basic data types

MaxCompute supports various data types. MaxCompute V2.0 supports more basic data types.

To use new data types, you must run set odps.sql.type.system.odps2=true; or setproject odps.sql.type.system.odps2=true;. Otherwise, the error message xxxx type is not enabled in current mode may be displayed. In addition, you need to pay attention to impacts on the original tasks. For more information, see the precautions for using new data types described in this section.
Note The parameter odps.sql.type.system.odps2 for setting new data types supports only lowercase letters.
Type New in MaxCompute V2.0 Constant Description
TINYINT Yes 1Y, -127Y The 8-bit signed integer type.

Value range: -128 to 127.

SMALLINT Yes 32767S, -100S The 16-bit signed integer type.

Value range: -32768 to 32767.

INT Yes 1000, -15645787 The 32-bit signed integer type.

Value range: (-2) 31 to 2 31 -1. See note 1 and note 2.

BIGINT No 100000000000L, -1L The 64-bit signed integer type.

Value range: (-2) 63 +1 to 2 63 -1.

FLOAT Yes N/A The 32-bit binary float type.
DOUBLE No 3.1415926 1E+7 The 64-bit binary float type.
DECIMAL No 3.5BD, 99999999999.9999999BD The decimal precision number type. Integer range: (-10) 36 +1 to 10 36 -1. Decimal place: accurate to 10 -18 . See note 5.
VARCHAR(n) Yes N/A The variable-length character type. n specifies a length ranging from 1 to 65535. See note 3.
STRING No "abc", 'bcd', "alibaba", 'inc' The string type. Maximum length: 8 MB. See note 4.
BINARY Yes N/A The binary data type. Maximum length: 8 MB.
DATETIME No DATETIME '2017-11-11 00:00:00' The datetime type.

Value range: 0000-01-01 to 9999-12-31, accurate to milliseconds. See note 6.

TIMESTAMP Yes TIMESTAMP '2017-11-11 00:00:00.123456789' The timestamp type is independent of time zones. Value range: 0000-01-01 to 9999-12-31 23.59:59.999999999, accurate to nanoseconds.
Note For some time zone-related functions, such as cast(<a timestamp> as string), timestamps that are independent of time zones must be displayed in accordance with the current time zone.
BOOLEAN No True, False The Boolean type. Valid values: True, False.
DATE Yes DATE'2017-11-11' The date type, in the format of yyyy-mm-dd.

Value range: 0000-01-01 to 9999-12-31.

CHAR[(n)] Yes N/A The fixed-length character type. n specifies the length.

Maximum value: 255. If the length does not reach the specified value, all extra spaces are filled in, but are not involved in the comparison.

All the preceding data types can be NULL.
Note
Precautions for new data types:
  • New data types are supported by:
    • MaxCompute SQL
    • MapReduce of the new version
    • SDK 0.27.2-public and later versions, client 0.27.0 and later versions
  • Enabling new data types
    By default, new data types cannot be used due to historical reasons. To use the new data types, run a flag command (the flag command supports only lowercase letters):
    • Session level: When a new data type, such as tinyint, smallint, int, float, varchar, timestamp, or binary, is used in an SQL statement or a MapReduce task, you must insert set odps.sql.type.system.odps2=true; before the SQL statement or the MapReduce task, and commit and run it along with the SQL statement or the MapReduce task. When you use MaxCompute Studio to commit SQL statements, the system automatically inserts this set statement. For more information about examples, see Create and view a table.
    • Project level: You can enable new data types at the project level. The following section describes major impacts caused by enabling new data types. Read the description and proceed with caution. The project owner can run the following command to configure a project:
      setproject odps.sql.type.system.odps2=true;

      For more information about setproject, see Other operations.

  • Major impacts caused by enabling new data types:
    • Note 1: changes in the semantics of the INT keyword. After you enable the new data types, the INT keyword in the SQL statement refers to the 32-bit integer type, for example, cast(a as INT) is to convert the a type to the 32-bit integer type. If new data types are not enabled, it is converted to the BIGINT type, which is a 64-bit value. For example, in an SQL statement, cast(a as INT) is equivalent to cast(a as BIGINT), or create table t(a INT) is equivalent to create table a (BIGINT). To be compatible with the original data types of MaxCompute, if you do not set odps.sql.type.system.odps2=true; in new data types, the preceding conversion is still performed but an error is prompted, indicating that INT is processed as BIGINT. To prevent any confusion that may result from this conversion, we recommend that you change all INT to BIGINT.
    • Changes to the semantics of an integer constant
      Use the integer constant 1 in SELECT 1 + a; as an example.
      • If new data types are not enabled, the integer is processed as the BIGINT type. (If the length of the constant exceeds the value range of BIGINT, for example, 1000000000000000000000000, it is processed as the DOUBLE type.)
      • If new data types are enabled, the integer constant is first interpreted as 1 of a 32-bit integer of the INT type. If the constant value exceeds the value range of the INT type but does not exceed the value range of the BIGINT type, it is interpreted as the BIGINT type. If the value of the constant exceeds the value range of the BIGINT type, it is interpreted as the DOUBLE type.
      • Possible compatibility issues: The INT type may cause inconsistencies in function prototypes during subsequent operations. For example, the actions of peripheral tools and subsequent operations might be altered due to the new-type tables generated after data is written to a disk.
    • Changes to the implicit type conversion rules
      • If new data types are enabled, some implicit type conversions may be disabled. For example, errors may occur or precision may be reduced when the data type is converted from STRING to BIGINT, from STRING to DATETIME, from DOUBLE to BIGINT, from DECIMAL to DOUBLE, or from DECIMAL to BIGINT. In this case, you can use the CAST function to convert the data type.
      • Implicit type conversion has major impacts on function invocations and INSERT statements. For example, the SQL statements can be executed when new data types are not enabled, but an error is reported when new data types are enabled.
    • Changes to the supported operations, built-in functions, and UDFs
      When new data types are not enabled, some operations and built-in functions that use new data types as input parameters and return values are ignored. These operations and functions can be normally executed only after you enable the new data types. The major impact is as follows:
      • Some built-in functions can only be used after new data types are enabled. Such functions include most functions that use the INT type as parameters and subsequently bear BIGINT overload, such as YEAR, QUARTER, MONTH, DAY, HOUR, MINUTE, SECOND, MILLISECOND, NANOSECOND, DAYOFMONTH, and WEEKOFYEAR. These functions can be implemented by using the DATEPART built-in function.
      • Changes to UDF resolution Assume that a UDF contains two overloads: BIGINT and INT. The BIGINT overload is performed based on the original data types because the INT type cannot be identified. However, the INT overload may be parsed based on the new data types.
    • Changes to the resolution of the BIGINT keyword

      A single integer constant, such as 123, is classified as the BIGINT type based on the original data types, whereas it is classified as the INT type based on the new data types. Possible compatibility issues: The INT type may cause inconsistencies in function prototypes during subsequent operations. For example, the actions of peripheral tools and subsequent operations might be altered due to new-type tables generated after data is written to a disk.

    • Changes to partition column types
      • If new data types are not enabled, the partition column type can only be STRING.
      • If new data types are enabled, the partition column type can be STRING, VARCHAR, CHAR, TINYINT, SMALLINT, INT, or BIGINT.
      • Special processing is used if new data types are not enabled: The partition fields in INSERT operations are processed as the STRING type. For example, enter: insert overwrite table t partition (pt = 045) select ... ;. Note that 045 is not enclosed in quotation marks. Therefore, it should be parsed as the integer type, that is, 045 is actually 45. However, if new data types are not enabled, 045 is still processed as 045. If new data types are enabled, the special processing is not performed, that is, 045 is 45.
    • Changes to the behavior of the LIMIT statement
      Use the SELECT * FROM t1 UNION ALL SELECT * FROM t2 limit 10; statement as an example.
      • If new data types are not enabled, this statement is SELECT * FROM t1 UNION ALL SELECT * FROM ( SELECT * FROM t2 limit 10) t2;.
      • If new data types are enabled, this statement is SELECT * FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2 ) t limit 10;.

      Actions of the UNION, INTERSECT, and EXCEPT, INTERSECT, EXCEPT, LIMIT, ORDER BY, DISTRIBUTE BY, SORT BY, and CLUSTER BY statements also change if new data types are enabled.

    • Changes to the resolution of the IN expression

      IN expressions, such as a in (1, 2, 3):

      • If new data types are not enabled, all the values in the parentheses () must be of the same type.
      • If new data type are enabled, all the values in the parentheses () are implicitly converted to the same type.
  • Note 2: If an INT constant exceeds the INT value range, it is converted into the BIGINT type. If an INT constant exceeds the BIGINT value range, it is converted into the DOUBLE type. In earlier MaxCompute versions, all INT types in SQL statements are converted to BIGINT, as shown in the following example:
    create table a_bigint_table(a int); -- The INT in this statement is actually processed as BIGINT.
    select cast(id as int) from mytable; -- The INT in this statement is actually processed as BIGINT.
  • Note 3: VARCHAR constants can be implicitly converted to STRING constants.
  • Note 4: STRING constants can be connected. For example,abcand xyz are parsed as abcxyz, and different parts can be written on different lines.
  • Note 5: When you insert a constant into the DECIMAL field, make sure that it is written in the same format as that in the constant definition, for example, 3.5BD in the following sample code.
    insert into test_tb(a) values (3.5BD)
  • Note 6: Time values in milliseconds are not displayed. Tunnel commands use -dfp to specify the time format, which can be accurate to milliseconds, for example, tunnel upload -dfp 'yyyy-MM-dd HH:mm:ss.SSS'. For more information about Tunnel commands, see Tunnel commands.

Complex data types

The following table lists the complex data types supported by MaxCompute V2.0.

Type Definition method Construction method
ARRAY
  • array<int>
  • array<struct<a:int, b:string>>
  • array(1, 2, 3)
  • array(array(1, 2), array(3, 4))
MAP
  • map<string, string>
  • map<smallint, array<string>>
  • map("k1", "v1", "k2", "v2")
  • map(1S, array('a', 'b'), 2S, array('x', 'y'))
STRUCT
  • struct<x:int, y:int>
  • struct<field1:bigint, field2:array<int>, field3:map<int, int>>
  • named_struct('x', 1, 'y', 2)
  • named_struct('field1', 100L, 'field2', array(1, 2), 'field3', map(1, 100, 2, 200))
Note Complex data types can be nested. For the related built-in functions, see ARRAY, MAP, and STRUCT.