This topic describes the definition of the Hive-compatible data type edition, the data types that it supports, and its differences from other data type editions. The Hive-compatible data type edition is one of the three data type editions of MaxCompute.

Description

If you want to use the Hive-compatible data type edition in your project, you must run the following code to enable the required data types:
setproject odps.sql.type.system.odps2=true; -- Enable MaxCompute V2.0 data types. 
setproject odps.sql.decimal.odps2=true; -- Enable the DECIMAL data type in MaxCompute V2.0. 
setproject odps.sql.hive.compatible=true; -- Enable Hive-compatible data types. 

Scenarios

The Hive-compatible data type edition is suitable for MaxCompute projects that are migrated from Hadoop and whose dependent components support the MaxCompute V2.0 data type edition.

Basic data types

The basic data types in the Hive-compatible data type edition are similar to those defined in the MaxCompute V2.0 data type edition. The difference between these two editions lies only in the DECIMAL type.
The following list describes each basic data type, its constant examples, and its description.

TINYINT
  • Constants: 1Y and -127Y
  • Description: The 8-bit signed integer type. Valid values: -128 to 127.

SMALLINT
  • Constants: 32767S and -100S
  • Description: The 16-bit signed integer type. Valid values: -32768 to 32767.

INT
  • Constants: 1000 and -15645787
  • Description: The 32-bit signed integer type. Valid values: -2^31 to 2^31 - 1.

BIGINT
  • Constants: 100000000000L and -1L
  • Description: The 64-bit signed integer type. Valid values: -2^63 + 1 to 2^63 - 1.

BINARY
  • Constants: unhex('FA34E10293CB42848573A4E39937F479')
  • Description: The binary data type. The maximum length is 8 MB.

FLOAT
  • Constants: cast(3.14159261E+7 as float)
  • Description: The 32-bit binary floating-point type.

DOUBLE
  • Constants: 3.14159261E+7
  • Description: The 64-bit binary floating-point type.

DECIMAL(precision,scale)
  • Constants: 3.5BD and 99999999999.9999999BD
  • Description: The exact numeric type based on the decimal system. precision indicates the number of digits in a value (valid values: 1 to 38). scale indicates the number of digits to the right of the decimal point (valid values: 0 to 18). If the two parameters are not specified, the default expression of this data type is decimal(10,0).

VARCHAR(n)
  • Constants: None
  • Description: The variable-length character type, in which n specifies the length. Valid values of n: 1 to 65535.

CHAR(n)
  • Constants: None
  • Description: The fixed-length character type, in which n specifies the length. The maximum value of n is 255. If a value is shorter than n, trailing spaces are automatically appended, but these spaces are not used in comparisons.

STRING
  • Constants: "abc", 'bcd', "alibaba", and 'inc'
  • Description: The STRING type. The maximum length is 8 MB.

DATE
  • Constants: DATE'2017-11-11'
  • Description: The DATE type, in the format yyyy-mm-dd. Valid values: 0000-01-01 to 9999-12-31.

DATETIME
  • Constants: DATETIME'2017-11-11 00:00:00'
  • Description: The DATETIME type. Valid values: 0000-01-01 00:00:00.000 to 9999-12-31 23:59:59.999. Values are accurate to the millisecond.

TIMESTAMP
  • Constants: TIMESTAMP'2017-11-11 00:00:00.123456789'
  • Description: The TIMESTAMP type, which is independent of time zones. Valid values: 0000-01-01 00:00:00.000000000 to 9999-12-31 23:59:59.999999999. Values are accurate to the nanosecond.
  • Note: For some time zone-related functions, such as cast(<a timestamp> as string), data of the TIMESTAMP type that is independent of time zones must be displayed based on the current time zone.

BOOLEAN
  • Constants: True and False
  • Description: The BOOLEAN type. Valid values: True and False.
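
To illustrate how these constant formats are used in SQL, the following is a minimal sketch. The table name data_type_demo and its columns are hypothetical, and the sketch assumes that the flags listed in the Description section are enabled.

-- A hypothetical table that uses several of the basic data types listed above.
create table if not exists data_type_demo (
    c_tinyint  tinyint,
    c_smallint smallint,
    c_int      int,
    c_bigint   bigint,
    c_double   double,
    c_decimal  decimal(10, 2),
    c_string   string,
    c_date     date,
    c_datetime datetime,
    c_boolean  boolean
);

-- Insert one row by using the constant formats from the preceding list.
insert into data_type_demo values (
    1Y,                            -- TINYINT constant
    32767S,                        -- SMALLINT constant
    1000,                          -- INT constant
    100000000000L,                 -- BIGINT constant
    3.14159261E+7,                 -- DOUBLE constant
    3.5BD,                         -- DECIMAL constant
    'abc',                         -- STRING constant
    DATE'2017-11-11',              -- DATE constant
    DATETIME'2017-11-11 00:00:00', -- DATETIME constant
    True                           -- BOOLEAN constant
);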

Take note of the following information about the data types:
  • All the preceding data types can contain NULL values.
  • The INT keyword in an SQL statement refers to the 32-bit integer type.
    -- Convert the value a into a 32-bit integer. 
    cast(a as INT)
  • By default, an integer constant is processed as the INT type. For example, the integer constant 1 in SELECT 1 + a; is processed as the INT type. If a constant exceeds the value range of the INT type but does not exceed the value range of the BIGINT type, the constant is processed as the BIGINT type. If the constant exceeds the value range of the BIGINT type, the constant is processed as the DOUBLE type.
  • Implicit conversions
    • Specific implicit conversions are disabled. For example, conversions from STRING to BIGINT, from STRING to DATETIME, from DOUBLE to BIGINT, from DECIMAL to DOUBLE, and from DECIMAL to BIGINT may reduce precision or cause errors. You can use the CAST function to convert the data types explicitly, as shown in the sketch after this list.
    • VARCHAR constants can be implicitly converted into STRING constants.
  • Tables, built-in functions, and user-defined functions (UDFs)
    • Built-in functions that require the MaxCompute V2.0 data type edition can be used.
    • The data types that are defined in UDFs are parsed and overloaded based on Hive-compatible data types.
    • The data type of a partition key column can be STRING, VARCHAR, CHAR, TINYINT, SMALLINT, INT, or BIGINT.
    • Specific functions do not support partition pruning in Hive-compatible mode. For more information, see Mappings between built-in functions of MaxCompute and built-in functions of Hive, MySQL, and Oracle.
  • STRING constants can be concatenated. For example, the constants abc and xyz can be combined into abcxyz.
  • If a constant is inserted into a field of the DECIMAL type, the expression of the constant must conform to the format in the constant definition. For example, 3.5BD is used in the following sample code.
    insert into test_tb(a) values (3.5BD)
  • By default, time values of the DATETIME type are not displayed with milliseconds. You can add the -dfp option to Tunnel commands to specify a time format that includes milliseconds, such as tunnel upload -dfp 'yyyy-MM-dd HH:mm:ss.SSS'. For more information about Tunnel commands, see Tunnel commands.
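
The following sketch illustrates the integer constant handling and the explicit CAST conversions that are described in the preceding list. The table name cast_demo and its columns are hypothetical.

-- A hypothetical table used only for this sketch.
create table if not exists cast_demo (a bigint, b double, c string);

-- The constant 1 is processed as INT. A constant that exceeds the INT range
-- but fits in the BIGINT range, such as 10000000000, is processed as BIGINT.
select 1 + a, 10000000000 + a from cast_demo;

-- Implicit conversions such as STRING to BIGINT or DOUBLE to BIGINT are disabled,
-- so use CAST to convert the data types explicitly.
select cast(c as bigint), cast(b as bigint) from cast_demo;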

Complex data types

The following list describes each complex data type, its definition examples, and its constructor examples.

ARRAY
  • Definitions: array<int> and array<struct<a:int, b:string>>
  • Constructors: array(1, 2, 3) and array(array(1, 2), array(3, 4))

MAP
  • Definitions: map<string, string> and map<smallint, array<string>>
  • Constructors: map("k1", "v1", "k2", "v2") and map(1S, array('a', 'b'), 2S, array('x', 'y'))

STRUCT
  • Definitions: struct<x:int, y:int> and struct<field1:bigint, field2:array<int>, field3:map<int, int>>
  • Constructors: named_struct('x', 1, 'y', 2) and named_struct('field1', 100L, 'field2', array(1, 2), 'field3', map(1, 100, 2, 200))
Note Complex data types of MaxCompute can be nested. For more information about the related built-in functions, see ARRAY, MAP, or STRUCT.
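
The following sketch shows how the preceding definitions and constructors can be combined, including a nested complex type. The table name complex_type_demo is hypothetical.

-- A hypothetical table that uses complex type definitions, including nesting.
create table if not exists complex_type_demo (
    arr        array<int>,
    mp         map<string, string>,
    st         struct<x:int, y:int>,
    arr_struct array<struct<a:int, b:string>>
);

-- Build values with the constructors listed above.
select array(1, 2, 3) as arr,
       map("k1", "v1", "k2", "v2") as mp,
       named_struct('x', 1, 'y', 2) as st;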

Differences between the Hive-compatible data type edition and other data type editions

  • The conversion rules for INSERT statements are different.
    • Hive-compatible data type edition: If a source data type can be explicitly converted into the data type of a table, MaxCompute automatically inserts a conversion function and runs it.
    • MaxCompute V1.0 data type edition and MaxCompute V2.0 data type edition: The source data type must be implicitly convertible into the data type of the destination table. Otherwise, an error is returned.
      -- The following operations succeed in Hive-compatible mode but fail in other modes: 
      create table t (a bigint); 
      insert into table t select 1.5; 
  • Function behavior is different.
    • The +, -, *, and / operators and the POW function
      • Hive-compatible data type edition: If the data exceeds the value range of a data type, the initial value is returned.
      • MaxCompute V1.0 data type edition and MaxCompute V2.0 data type edition: If the data exceeds the value range of a data type, an error is returned, or null is returned in specific cases.
    • >, >=, =, <, and <=
      • Hive-compatible data type edition: The values of the DOUBLE type are directly compared.
      • MaxCompute V1.0 data type edition and MaxCompute V2.0 data type edition: When values of the DOUBLE type are compared, they are considered equal if the first 15 digits after the decimal point are the same. The digits that follow are not compared.
    • Bitwise operators: &, |, and ^
      • Hive-compatible data type edition: A value of the same data type as the input parameter is returned.
      • MaxCompute V1.0 data type edition and MaxCompute V2.0 data type edition: A value of the BIGINT type is returned.
    • LENGTH, LENGTHB, FIND_IN_SET, INSTR, SIZE, HASH, and SIGN functions
      • Hive-compatible data type edition: A value of the INT type is returned.
      • MaxCompute V1.0 data type edition and MaxCompute V2.0 data type edition: A value of the BIGINT type is returned.
    • FLOOR and CEIL
      • Hive-compatible data type edition: If the input parameter is of the DECIMAL type, a value of the DECIMAL type is returned.
      • MaxCompute V1.0 data type edition and MaxCompute V2.0 data type edition: If the input parameter is of the DECIMAL type, a value of the BIGINT type is returned.
    • FROM_UNIXTIME
      • Hive-compatible data type edition: A value of the STRING type is returned.
      • MaxCompute V1.0 data type edition and MaxCompute V2.0 data type edition: A value of the DATETIME type is returned.
    • CONCAT_WS
      • Hive-compatible data type edition: If one of the input strings is NULL, that string is ignored, as shown in the sketch after this list.
      • MaxCompute V1.0 data type edition and MaxCompute V2.0 data type edition: If one of the input strings is NULL, NULL is returned.
    • FIND_IN_SET
      • Hive-compatible data type edition: An empty search string can be matched as an element of the comma-delimited list.
        -- Hive-compatible mode 
        find_in_set("", "")   -- 1 is returned. 
        find_in_set("", "a,") -- 2 is returned. 
      • MaxCompute V1.0 data type edition and MaxCompute V2.0 data type edition: An empty string is considered unmatched, and 0 is returned.
    • REGEXP_EXTRACT and REGEXP_REPLACE
      • Hive-compatible data type edition: The regular expression pattern complies with the specifications of Java regular expressions.
      • MaxCompute V1.0 data type edition and MaxCompute V2.0 data type edition: The regular expression pattern complies with MaxCompute specifications.
    • SUBSTR
      string substr(string <str>, bigint <start_position>[, bigint <length>])

      start_position: required. A value of the BIGINT type. Position counting starts from 1.

      • Hive-compatible data type edition: If start_position is set to 0, the return value is the same as that when this parameter is set to 1.
      • MaxCompute V1.0 and MaxCompute V2.0 data type editions: If start_position is set to 0, null is returned.
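
To make several of these behavior differences concrete, the following sketch shows illustrative queries. The literal values are chosen only for this example, and the results noted in the comments follow from the descriptions in the preceding list.

-- SUBSTR with start_position set to 0.
select substr('maxcompute', 0);
-- Hive-compatible data type edition: same result as substr('maxcompute', 1).
-- MaxCompute V1.0 and V2.0 data type editions: null is returned.

-- CONCAT_WS with a NULL input string.
select concat_ws(',', 'a', null, 'b');
-- Hive-compatible data type edition: the NULL input is ignored and 'a,b' is returned.
-- MaxCompute V1.0 and V2.0 data type editions: NULL is returned.

-- FLOOR with a DECIMAL input.
select floor(3.5BD);
-- Hive-compatible data type edition: a DECIMAL value is returned.
-- MaxCompute V1.0 and V2.0 data type editions: a BIGINT value is returned.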