This topic describes the Hive-compatible data type edition in terms of the definition, supported data types, and differences with other data type editions. The Hive-compatible data type edition is one of the three data type editions of MaxCompute.

Definition

If the Hive-compatible data type edition is used in your project, the data types are defined as follows:
setproject odps.sql.type.system.odps2=true;-- Enables MaxCompute V2.0 data types.
setproject odps.sql.decimal.odps2=true;-- Enables the DECIMAL type in MaxCompute V2.0.
setproject odps.sql.hive.compatible=true;-- Enables Hive-compatible data types.

Scenarios

The Hive-compatible data type edition is suitable for MaxCompute projects that are migrated from Hadoop and whose dependent components support the MaxCompute V2.0 data type edition.

Basic data types

The basic data types in the Hive-compatible data type edition are basically the same as those defined in the MaxCompute V2.0 data type edition. The difference between these two editions only lies in the DECIMAL type.
Data type Constant Description
TINYINT 1Y and -127Y The 8-bit signed integer type.

Valid values: -128 to 127.

SMALLINT 32767S and -100S The 16-bit signed integer type.

Valid values: -32768 to 32767.

INT 1000 and -15645787 The 32-bit signed integer type.

Valid values: -2 31 to 2 31 -1.

BIGINT 100000000000L and -1L The 64-bit signed integer type.

Valid values: -2 63 +1 to 2 63 -1.

BINARY None A binary number. The maximum length is 8 MB.
FLOAT None The 32-bit binary floating point type.
DOUBLE 3.1415926 1E+7 The 64-bit binary floating point type.
DECIMAL(precision,scale) 3.5BD and 99999999999.9999999BD The precise numeric type based on the decimal system.
  • precision: indicates the number of digits in a value. Valid values: 1 to 38.
  • scale: indicates the number of digits to the right of the decimal point in a value. Valid values: 0 to 38.

If these two parameters are not specified, the default expression of this data type is decimal(10,0).

VARCHAR(n) None The variable-length character type, in which n specifies the length.

Valid values: 1 to 65535.

CHAR(n) None The fixed-length character type, in which n specifies the length. The maximum value is 255. If the length does not reach the specified value, extra spaces are automatically filled but are not involved in the comparison.
STRING "abc", 'bcd', "alibaba", and 'inc' The string type. The maximum length is 8 MB.
DATE DATE'2017-11-11' The date type, in the format of yyyy-mm-dd.

Valid values: 0000-01-01 to 9999-12-31.

DATETIME DATETIME '2017-11-11 00:00:00' The date and time type.

Valid values: 0000-01-01 00:00:00.000 to 9999-12-31 23.59:59.999, accurate to the millisecond.

TIMESTAMP TIMESTAMP '2017-11-11 00:00:00.123456789' The timestamp type, independent of time zones.

Valid values: 0000-01-01 00:00:00.000000000 to 9999-12-31 23.59:59.999999999, accurate to the nanosecond.

Note For some time zone-related functions, such as cast(<a timestamp> as string), data of the TIMESTAMP type that is independent of time zones must be displayed based on the current time zone.
BOOLEAN True and False The BOOLEAN type.

Valid values: True and False.

The data types are described as follows:
  • All the preceding types can contain NULL values.
  • The INT keyword in an SQL statement refers to the 32-bit integer type.
    -- Convert a into a 32-bit integer.
    cast(a as INT)
  • By default, an integer constant is processed as the INT type. For example, integer constant 1 in SELECT 1 + a; is processed as the INT type. If a constant exceeds the value range of the INT type, the constant is processed as the DOUBLE type.
  • Implicit conversions
    • Some implicit type conversions are disabled. For example, if the data type is converted from STRING to BIGINT, from STRING to DATETIME, from DOUBLE to BIGINT, from DECIMAL to DOUBLE, or from DECIMAL to BIGINT, errors may occur, or precision may be reduced. You can use the CAST function to forcibly convert the data type.
    • VARCHAR constants can be implicitly converted to STRING constants.
  • Tables, built-in functions, and user-defined functions (UDFs)
    • Built-in functions that require the MaxCompute V2.0 data type edition can be executed.
    • The data types defined in UDFs are parsed and overloaded based on Hive-compatible data types.
    • The data type of a partitioning column can be STRING, VARCHAR, CHAR, TINYINT, SMALLINT, INT, or BIGINT.
    • Some functions do not support partition pruning in Hive mode. For more information, see Comparison of functions built in MaxCompute, MySQL, and Oracle.
  • STRING constants can be combined. For example, abc and xyz can be combined as abcxyz.
  • If a constant is inserted into a field of the DECIMAL type, the expression of the constant must conform to the format in the constant definition, for example, 3.5BD in the following sample code:
    insert into test_tb(a) values (3.5BD)
  • Time values of the DATETIME type do not include the millisecond component. You can add -dfp in Tunnel commands to display milliseconds in the time values, for example, tunnel upload -dfp 'yyyy-MM-dd HH:mm:ss.SSS'. For more information about Tunnel commands, see Tunnel commands.

Complex data types

Data type Definition Constructor
ARRAY
  • array<int>
  • array<struct<a:int, b:string>>
  • array(1, 2, 3)
  • array(array(1, 2), array(3, 4))
MAP
  • map<string, string>
  • map<smallint, array<string>>
  • map("k1", "v1", "k2", "v2")
  • map(1S, array('a', 'b'), 2S, array('x', 'y'))
STRUCT
  • struct<x:int, y:int>
  • struct<field1:bigint, field2:array<int>, field3:map<int, int>>
  • named_struct('x', 1, 'y', 2)
  • named_struct('field1', 100L, 'field2', array(1, 2), 'field3', map(1, 100, 2, 200))
Note Complex data types of MaxCompute can be nested. For more information, see ARRAY, MAP, and STRUCT.

Differences between the Hive-compatible data type edition and other data type editions

  • The conversion rules for INSERT statements are different.
    • Hive-compatible data type edition: If a source data type can be explicitly converted to the data type of a table, MaxCompute automatically inserts a conversion function and executes it.
    • MaxCompute V1.0 and MaxCompute V2.0 data type editions: A source data type must be implicitly converted to the data type of a table. Otherwise, an error is returned.
      -- The following operations succeed in Hive mode but fail in other modes:
      create table t (a bigint); 
      insert into table select 1.5; 
  • Function behavior is different.
    • +, -, *, /, and POW function
      • Hive-compatible data type edition: If the data exceeds the value range of a data type, the initial value is returned.
      • MaxCompute V1.0 and MaxCompute V2.0 data type editions: If the data exceeds the value range of a data type, an error or NULL is returned.
    • >, , =, <, and
      • Hive-compatible data type edition: The values of the DOUBLE type are directly compared.
      • MaxCompute V1.0 and MaxCompute V2.0 data type editions: If the values of the DOUBLE type are compared, they are considered the same if the first 15 digits to the right of the decimal point are the same. Other digits after the decimal point are not compared.
    • Bitwise operators: &, |, and ^
      • Hive-compatible data type edition: A value of the same type as the input parameter is returned.
      • MaxCompute V1.0 and MaxCompute V2.0 data type editions: A value of the BIGINT type is returned.
    • LENGTH, LENGTHB, FIND_IN_SET, INSTR, SIZE, HASH, and SIGN functions
      • Hive-compatible data type edition: A value of the INT type is returned.
      • MaxCompute V1.0 and MaxCompute V2.0 data type editions: A value of the BIGINT type is returned.
    • FLOOR and CEIL
      • Hive-compatible data type edition: If the input parameter is of the DECIMAL type, a value of the DECIMAL type is returned.
      • MaxCompute V1.0 and MaxCompute V2.0 data type editions: If the input parameter is of the DECIMAL type, a value of the BIGINT type is returned.
    • ROUND
      • Hive-compatible data type edition: If the input parameter is of the TINYINT, SMALLINT, INT, or BIGINT type, a value of the same type as the input parameter is returned.
      • MaxCompute V1.0 and MaxCompute V2.0 data type editions: If the input parameter is of the TINYINT, SMALLINT, INT, or BIGINT type, a value of the DATETIME type is returned.
    • FROM_UNIXTIME
      • Hive-compatible data type edition: A value of the STRING type is returned.
      • MaxCompute V1.0 and MaxCompute V2.0 data type editions: A value of the DATETIME type is returned.
    • CONCAT_WS
      • Hive-compatible data type edition: If a connected input string is NULL, the string is ignored.
      • MaxCompute V1.0 and MaxCompute V2.0 data type editions: If a connected input string is NULL, NULL is returned.
    • FIND_IN_SET
      • Hive-compatible data type edition: An empty string is considered matching the tail of the string.
        In Hive mode 
        find_in_set("","") returns 1.
        find_in_set("", "a,") returns 2.
      • MaxCompute V1.0 and MaxCompute V2.0 data type editions: An empty string is considered unmatched, and 0 is returned.
    • REGEXP_(EXTRACT/REPLACE)
      • Hive-compatible data type edition: The REGEXP schema complies with the specifications of Java regular expressions.
      • MaxCompute V1.0 and MaxCompute V2.0 data type editions: The REGEXP schema complies with MaxCompute specifications.