The Hive-compatible data type edition is designed for MaxCompute projects migrated from Hadoop whose dependent components already support the MaxCompute V2.0 data type edition. It provides data types and function behavior that closely match Hive, reducing migration friction. Understanding where this edition diverges from MaxCompute V1.0 and V2.0 helps you anticipate type-related issues before they reach production.
Enable the Hive-compatible data type edition
Run all three commands at the project level.
setproject odps.sql.type.system.odps2=true; -- Enable MaxCompute V2.0 data types.
setproject odps.sql.decimal.odps2=true; -- Enable the DECIMAL data type in MaxCompute V2.0.
setproject odps.sql.hive.compatible=true; -- Enable Hive-compatible data types.

Supported data types
Type overview
Integer types: TINYINT (8-bit), SMALLINT (16-bit), INT (32-bit), BIGINT (64-bit)
Floating-point types: FLOAT (32-bit), DOUBLE (64-bit), DECIMAL(precision,scale) (exact numeric)
String types: STRING, VARCHAR(n), CHAR(n)
Date and time types: DATE, DATETIME, TIMESTAMP
Other types: BOOLEAN, BINARY
Complex types: ARRAY, MAP, STRUCT (all nestable)
Basic data types
Integer types
| Data type | Constant syntax | Range |
|---|---|---|
| TINYINT | 1Y, -127Y | -128 to 127 (8-bit signed) |
| SMALLINT | 32767S, -100S | -32,768 to 32,767 (16-bit signed) |
| INT | 1000, -15645787 | -2<sup>31</sup> to 2<sup>31</sup> - 1 (32-bit signed) |
| BIGINT | 100000000000L, -1L | -2<sup>63</sup> + 1 to 2<sup>63</sup> - 1 (64-bit signed) |
Integer constants use type suffixes: Y for TINYINT, S for SMALLINT, L for BIGINT. Unsuffixed integer literals default to INT. If the value exceeds the INT range but fits in BIGINT, it is treated as BIGINT. If it exceeds the BIGINT range, it is treated as DOUBLE.
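The suffix rules above can be sketched in a single SELECT (the literal values and column aliases are purely illustrative):

```sql
SELECT 1Y           AS tiny_val,  -- TINYINT (Y suffix)
       32767S       AS small_val, -- SMALLINT (S suffix)
       1000         AS int_val,   -- INT (unsuffixed, within INT range)
       100000000000L AS big_val;  -- BIGINT (L suffix)
```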
Floating-point types
| Data type | Constant syntax | Description |
|---|---|---|
| FLOAT | cast(3.14159261E+7 as float) | 32-bit binary floating-point |
| DOUBLE | 3.14159261E+7 | 64-bit binary floating-point |
| DECIMAL(precision,scale) | 3.5BD, 99999999999.9999999BD | Exact numeric type. Default: decimal(10,0). precision: 1–38; scale: 0–18 |
DECIMAL constants use the BD suffix. When inserting into a DECIMAL column:
insert into test_tb(a) values (3.5BD);

To extend the DECIMAL scale range from 0–18 to 0–38:
set odps.sql.decimal2.extended.scale.enable=true;

String types
| Data type | Constant syntax | Description |
|---|---|---|
| STRING | "abc", 'bcd' | Max length: 8 MB |
| VARCHAR(n) | None | Variable-length. n: 1–65,535 |
| CHAR(n) | None | Fixed-length. Max n: 255. Padded with spaces if shorter; padding is ignored in comparisons |
String constants can be concatenated directly, without an operator. For example, 'abc' 'xyz' evaluates to 'abcxyz'.
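As a sketch of direct constant concatenation (the alias is illustrative):

```sql
-- Two adjacent string constants are combined into one value.
SELECT 'abc' 'xyz' AS s; -- evaluates to 'abcxyz'
```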
Date and time types
| Data type | Constant syntax | Range |
|---|---|---|
| DATE | DATE'2017-11-11' | 0000-01-01 to 9999-12-31 (format: yyyy-mm-dd) |
| DATETIME | DATETIME'2017-11-11 00:00:00' | 0000-01-01 00:00:00.000 to 9999-12-31 23:59:59.999 (millisecond precision) |
| TIMESTAMP | TIMESTAMP'2017-11-11 00:00:00.123456789' | 0000-01-01 00:00:00.000000000 to 9999-12-31 23:59:59.999999999 (nanosecond precision, timezone-independent) |
By default, DATETIME values are displayed and transferred without the millisecond component. To include milliseconds when uploading data with Tunnel, use the -dfp flag:
tunnel upload -dfp 'yyyy-MM-dd HH:mm:ss.SSS'

For more information, see Tunnel commands.
TIMESTAMP is timezone-independent, but functions such as cast(<a timestamp> as string) display values based on the current session time zone.
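A sketch of the session-time-zone effect on display (the time zone name is illustrative, and the rendered string depends on your session settings):

```sql
-- Session-level time zone setting (illustrative value).
set odps.sql.timezone=Asia/Shanghai;
-- The stored instant is timezone-independent; only the rendered
-- string produced by the cast follows the session time zone.
SELECT cast(TIMESTAMP'2017-11-11 00:00:00.123456789' as string);
```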
Other types
| Data type | Constant syntax | Description |
|---|---|---|
| BOOLEAN | True, False | Boolean values |
| BINARY | unhex('FA34E10293CB42848573A4E39937F479') | Binary data. Max length: 8 MB |
Partition key columns
The data type of a partition key column can be STRING, VARCHAR, CHAR, TINYINT, SMALLINT, INT, or BIGINT.
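For example (table and column names are illustrative), a table partitioned on an INT column and a STRING column:

```sql
-- Partition key columns may use string or integer types.
CREATE TABLE sale_detail (item STRING, amount DOUBLE)
PARTITIONED BY (sale_year INT, region STRING);
```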
NULL values
All basic data types support NULL values.
Implicit conversions
The following conversions are disabled because they can reduce precision or cause errors:
| Source type | Target type | Workaround |
|---|---|---|
| STRING | BIGINT | Use CAST(value AS BIGINT) |
| STRING | DATETIME | Use CAST(value AS DATETIME) |
| DOUBLE | BIGINT | Use CAST(value AS BIGINT) |
| DECIMAL | DOUBLE | Use CAST(value AS DOUBLE) |
| DECIMAL | BIGINT | Use CAST(value AS BIGINT) |
VARCHAR constants can be implicitly converted to STRING.
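The workarounds in the table above all rely on explicit CAST; a minimal sketch (the literal values are illustrative):

```sql
SELECT cast('12345' as BIGINT),                   -- STRING  -> BIGINT
       cast('2017-11-11 00:00:00' as DATETIME),   -- STRING  -> DATETIME
       cast(3.7 as BIGINT);                       -- DOUBLE  -> BIGINT
```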
Complex data types
MaxCompute supports three complex data types in the Hive-compatible edition. Complex types can be nested.
| Data type | Definition examples | Constructor examples |
|---|---|---|
| ARRAY | array<int> / array<struct<a:int, b:string>> | array(1, 2, 3) / array(array(1, 2), array(3, 4)) |
| MAP | map<string, string> / map<smallint, array<string>> | map("k1", "v1", "k2", "v2") / map(1S, array('a', 'b'), 2S, array('x', 'y')) |
| STRUCT | struct<x:int, y:int> / struct<field1:bigint, field2:array<int>, field3:map<int, int>> | named_struct('x', 1, 'y', 2) / named_struct('field1', 100L, 'field2', array(1, 2), 'field3', map(1, 100, 2, 200)) |
For built-in functions that work with complex types, see ARRAY, MAP, and STRUCT.
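A sketch combining the definition and constructor syntax above (table and column names are illustrative):

```sql
CREATE TABLE complex_tb (
  tags  ARRAY<STRING>,
  attrs MAP<STRING, STRING>,
  loc   STRUCT<x:INT, y:INT>
);

INSERT INTO complex_tb
VALUES (array('a', 'b'), map('k1', 'v1'), named_struct('x', 1, 'y', 2));

-- Element access: index for ARRAY, key for MAP, dot notation for STRUCT.
SELECT tags[0], attrs['k1'], loc.x FROM complex_tb;
```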
Differences from other editions
The Hive-compatible edition diverges from the MaxCompute V1.0 and V2.0 editions in INSERT rules, operator behavior, and function return types. Review these differences carefully when migrating from Hadoop or when running the same SQL across multiple editions.
INSERT type conversion rules
In Hive-compatible mode, if a source data type can be explicitly converted to the target column type, MaxCompute automatically inserts a conversion function and runs it. In V1.0 and V2.0 modes, the source type must be implicitly convertible; otherwise an error is returned.
The following succeeds in Hive-compatible mode but fails in V1.0 and V2.0 modes:
create table t (a bigint);
insert into table t select 1.5;

Arithmetic operator and overflow behavior
For the +, -, *, /, and POW operators:
| Edition | Overflow behavior |
|---|---|
| Hive-compatible | Returns the overflowed value as is, without reporting an error |
| V1.0 and V2.0 | Returns an error in most cases; NULL in others, depending on the operator and operand types |
DOUBLE comparison behavior
For the >, >=, =, <, and <= operators applied to DOUBLE values:
| Edition | Comparison behavior |
|---|---|
| Hive-compatible | Compares all digits directly |
| V1.0 and V2.0 | Compares only the first 15 digits to the right of the decimal point; remaining digits are ignored |
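Because exact DOUBLE equality is fragile in either edition, a tolerance-based comparison is a common workaround (the table, columns, and threshold below are illustrative):

```sql
-- Treat two DOUBLE columns as equal when they differ by less than a tolerance.
SELECT * FROM t WHERE abs(col_a - col_b) < 1e-9;
```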
Bitwise operator return types
For the &, |, and ^ operators:
| Edition | Return type |
|---|---|
| Hive-compatible | Same type as the input |
| V1.0 and V2.0 | Always BIGINT |
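A sketch of the return-type difference (the literal values are illustrative):

```sql
SELECT 1Y & 3Y; -- Hive-compatible: result is TINYINT; V1.0 and V2.0: result is BIGINT
```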
Function return type differences
| Function | Hive-compatible | V1.0 and V2.0 |
|---|---|---|
| LENGTH, LENGTHB, FIND_IN_SET, INSTR, SIZE, HASH, SIGN | INT | BIGINT |
| FLOOR, CEIL (when input is DECIMAL) | DECIMAL | BIGINT |
| FROM_UNIXTIME | STRING | DATETIME |
CONCAT_WS behavior with NULL inputs
| Edition | Behavior |
|---|---|
| Hive-compatible | NULL input strings are ignored; remaining strings are concatenated |
| V1.0 and V2.0 | If any input string is NULL, NULL is returned |
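A sketch of the behavioral difference (the argument values are illustrative):

```sql
SELECT concat_ws(',', 'a', NULL, 'b');
-- Hive-compatible: 'a,b' (the NULL input is skipped)
-- V1.0 and V2.0: NULL
```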
FIND_IN_SET behavior with empty strings
In Hive-compatible mode, an empty search string can match an empty element in the comma-separated list:
find_in_set("","") -- Returns 1
find_in_set("", "a,") -- Returns 2

In V1.0 and V2.0 modes, an empty string is treated as unmatched and 0 is returned.
REGEXP specification
| Edition | Specification |
|---|---|
| Hive-compatible | Java regular expression specifications |
| V1.0 and V2.0 | MaxCompute specifications |
SUBSTR start position behavior
string substr(string <str>, bigint <start_position>[, bigint <length>])

| Edition | Behavior when start_position = 0 |
|---|---|
| Hive-compatible | Same result as start_position = 1 |
| V1.0 and V2.0 | Returns NULL |
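A sketch of the start-position difference (the input string is illustrative):

```sql
SELECT substr('maxcompute', 0, 3);
-- Hive-compatible: 'max' (start_position 0 behaves like 1)
-- V1.0 and V2.0: NULL
```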
Partition pruning limitations
Specific functions do not support partition pruning in Hive-compatible mode. For the full list, see Mappings between built-in functions of MaxCompute and built-in functions of Hive, MySQL, and Oracle.
User-defined functions (UDFs)
In Hive-compatible mode, UDF data types are parsed and overloaded based on Hive-compatible data types. Built-in functions that require the MaxCompute V2.0 data type edition are also available.