The Hive-compatible data type edition is designed for MaxCompute projects migrated from Hadoop whose dependent components already support the MaxCompute V2.0 data type edition. It provides data types and function behavior that closely match Hive, reducing migration friction. Understanding where this edition diverges from MaxCompute V1.0 and V2.0 helps you anticipate type-related issues before they reach production.
Enable the Hive-compatible data type edition
Run all three commands at the project level.
setproject odps.sql.type.system.odps2=true; -- Enable MaxCompute V2.0 data types.
setproject odps.sql.decimal.odps2=true; -- Enable the DECIMAL data type in MaxCompute V2.0.
setproject odps.sql.hive.compatible=true; -- Enable Hive-compatible data types.

Supported data types
Type overview
Integer types: TINYINT (8-bit), SMALLINT (16-bit), INT (32-bit), BIGINT (64-bit)
Floating-point types: FLOAT (32-bit), DOUBLE (64-bit), DECIMAL(precision,scale) (exact numeric)
String types: STRING, VARCHAR(n), CHAR(n)
Date and time types: DATE, DATETIME, TIMESTAMP
Other types: BOOLEAN, BINARY
Complex types: ARRAY, MAP, STRUCT (all nestable)
Basic data types
Integer types
| Data type | Constant syntax | Range |
|---|---|---|
| TINYINT | 1Y, -127Y | -128 to 127 (8-bit signed) |
| SMALLINT | 32767S, -100S | -32,768 to 32,767 (16-bit signed) |
| INT | 1000, -15645787 | -2<sup>31</sup> to 2<sup>31</sup> - 1 (32-bit signed) |
| BIGINT | 100000000000L, -1L | -2<sup>63</sup> + 1 to 2<sup>63</sup> - 1 (64-bit signed) |
Integer constants use type suffixes: Y for TINYINT, S for SMALLINT, L for BIGINT. Unsuffixed integer literals default to INT. If the value exceeds the INT range but fits in BIGINT, it is treated as BIGINT. If it exceeds the BIGINT range, it is treated as DOUBLE.
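The suffix rules above can be sketched in a single SELECT (the literal values and column aliases are purely illustrative):

```sql
SELECT 1Y           AS tiny_val,  -- TINYINT (Y suffix)
       32767S       AS small_val, -- SMALLINT (S suffix)
       1000         AS int_val,   -- INT (unsuffixed, within INT range)
       100000000000L AS big_val;  -- BIGINT (L suffix)
```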
Floating-point types
| Data type | Constant syntax | Description |
|---|---|---|
| FLOAT | cast(3.14159261E+7 as float) | 32-bit binary floating-point |
| DOUBLE | 3.14159261E+7 | 64-bit binary floating-point |
| DECIMAL(precision,scale) | 3.5BD, 99999999999.9999999BD | Exact numeric type. Default: decimal(10,0). precision: 1–38; scale: 0–18 |
DECIMAL constants use the BD suffix. When inserting into a DECIMAL column:
insert into test_tb(a) values (3.5BD);

To extend the DECIMAL scale range from 0–18 to 0–38:
set odps.sql.decimal2.extended.scale.enable=true;

String types
| Data type | Constant syntax | Description |
|---|---|---|
| STRING | "abc", 'bcd' | Max length: 8 MB |
| VARCHAR(n) | None | Variable-length. n: 1–65,535 |
| CHAR(n) | None | Fixed-length. Max n: 255. Padded with spaces if shorter; padding is ignored in comparisons |
String constants can be concatenated directly, without an operator. For example, 'abc' 'xyz' evaluates to 'abcxyz'.
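As a sketch of direct constant concatenation (the alias is illustrative):

```sql
-- Two adjacent string constants are combined into one value.
SELECT 'abc' 'xyz' AS s; -- evaluates to 'abcxyz'
```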
Date and time types
| Data type | Constant syntax | Range |
|---|---|---|
| DATE | DATE'2017-11-11' | 0000-01-01 to 9999-12-31 (format: yyyy-mm-dd) |
| DATETIME | DATETIME'2017-11-11 00:00:00' | 0000-01-01 00:00:00.000 to 9999-12-31 23:59:59.999 (millisecond precision) |
| TIMESTAMP | TIMESTAMP'2017-11-11 00:00:00.123456789' | 0000-01-01 00:00:00.000000000 to 9999-12-31 23:59:59.999999999 (nanosecond precision, timezone-independent) |
By default, DATETIME values are displayed and transferred without the millisecond component. To include milliseconds when uploading data with Tunnel, use the -dfp flag:
tunnel upload -dfp 'yyyy-MM-dd HH:mm:ss.SSS'

For more information, see Tunnel commands.
TIMESTAMP is timezone-independent, but functions such as cast(<a timestamp> as string) display values based on the current session time zone.
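A sketch of the session-time-zone effect on display (the time zone name is illustrative, and the rendered string depends on your session settings):

```sql
-- Session-level time zone setting (illustrative value).
set odps.sql.timezone=Asia/Shanghai;
-- The stored instant is timezone-independent; only the rendered
-- string produced by the cast follows the session time zone.
SELECT cast(TIMESTAMP'2017-11-11 00:00:00.123456789' as string);
```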
Other types
| Data type | Constant syntax | Description |
|---|---|---|
| BOOLEAN | True, False | Boolean values |
| BINARY | unhex('FA34E10293CB42848573A4E39937F479') | Binary data. Max length: 8 MB |
Partition key columns
The data type of a partition key column can be STRING, VARCHAR, CHAR, TINYINT, SMALLINT, INT, or BIGINT.
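For example (table and column names are illustrative), a table partitioned on an INT column and a STRING column:

```sql
-- Partition key columns may use string or integer types.
CREATE TABLE sale_detail (item STRING, amount DOUBLE)
PARTITIONED BY (sale_year INT, region STRING);
```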
NULL values
All basic data types support NULL values.
Implicit conversions
The following conversions are disabled because they can reduce precision or cause errors:
| Source type | Target type | Workaround |
|---|---|---|
| STRING | BIGINT | Use CAST(value AS BIGINT) |
| STRING | DATETIME | Use CAST(value AS DATETIME) |
| DOUBLE | BIGINT | Use CAST(value AS BIGINT) |
| DECIMAL | DOUBLE | Use CAST(value AS DOUBLE) |
| DECIMAL | BIGINT | Use CAST(value AS BIGINT) |
VARCHAR constants can be implicitly converted to STRING.
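The workarounds in the table above all rely on explicit CAST; a minimal sketch (the literal values are illustrative):

```sql
SELECT cast('12345' as BIGINT),                   -- STRING  -> BIGINT
       cast('2017-11-11 00:00:00' as DATETIME),   -- STRING  -> DATETIME
       cast(3.7 as BIGINT);                       -- DOUBLE  -> BIGINT
```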
Complex data types
MaxCompute supports three complex data types in the Hive-compatible edition. Complex types can be nested.
| Data type | Definition examples | Constructor examples |
|---|---|---|
| ARRAY | array<int> / array<struct<a:int, b:string>> | array(1, 2, 3) / array(array(1, 2), array(3, 4)) |
| MAP | map<string, string> / map<smallint, array<string>> | map("k1", "v1", "k2", "v2") / map(1S, array('a', 'b'), 2S, array('x', 'y')) |
| STRUCT | struct<x:int, y:int> / struct<field1:bigint, field2:array<int>, field3:map<int, int>> | named_struct('x', 1, 'y', 2) / named_struct('field1', 100L, 'field2', array(1, 2), 'field3', map(1, 100, 2, 200)) |
For built-in functions that work with complex types, see ARRAY, MAP, and STRUCT.
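A sketch combining the definition and constructor syntax above (table and column names are illustrative):

```sql
CREATE TABLE complex_tb (
  tags  ARRAY<STRING>,
  attrs MAP<STRING, STRING>,
  loc   STRUCT<x:INT, y:INT>
);

INSERT INTO complex_tb
VALUES (array('a', 'b'), map('k1', 'v1'), named_struct('x', 1, 'y', 2));

-- Element access: index for ARRAY, key for MAP, dot notation for STRUCT.
SELECT tags[0], attrs['k1'], loc.x FROM complex_tb;
```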
Differences from other editions
The Hive-compatible edition diverges from the MaxCompute V1.0 and V2.0 editions in INSERT rules, operator behavior, and function return types. Review these differences carefully when migrating from Hadoop or when running the same SQL across multiple editions.
INSERT type conversion rules
In Hive-compatible mode, if a source data type can be explicitly converted to the target column type, MaxCompute automatically inserts a conversion function and runs it. In V1.0 and V2.0 modes, the source type must be implicitly convertible; otherwise an error is returned.
The following succeeds in Hive-compatible mode but fails in V1.0 and V2.0 modes:
create table t (a bigint);
insert into table t select 1.5;

Arithmetic operator and overflow behavior
For the +, -, *, /, and POW operators:
| Edition | Overflow behavior |
|---|---|
| Hive-compatible | Returns the overflowed value as is, without reporting an error |
| V1.0 and V2.0 | Returns an error in most cases; NULL in others, depending on the operator and operand types |
DOUBLE comparison behavior
For the >, >=, =, <, and <= operators applied to DOUBLE values:
| Edition | Comparison behavior |
|---|---|
| Hive-compatible | Compares all digits directly |
| V1.0 and V2.0 | Compares only the first 15 digits to the right of the decimal point; remaining digits are ignored |
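Because exact DOUBLE equality is fragile in either edition, a tolerance-based comparison is a common workaround (the table, columns, and threshold below are illustrative):

```sql
-- Treat two DOUBLE columns as equal when they differ by less than a tolerance.
SELECT * FROM t WHERE abs(col_a - col_b) < 1e-9;
```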
Bitwise operator return types
For the &, |, and ^ operators:
| Edition | Return type |
|---|---|
| Hive-compatible | Same type as the input |
| V1.0 and V2.0 | Always BIGINT |
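A sketch of the return-type difference (the literal values are illustrative):

```sql
SELECT 1Y & 3Y; -- Hive-compatible: result is TINYINT; V1.0 and V2.0: result is BIGINT
```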
Function return type differences
| Function | Hive-compatible | V1.0 and V2.0 |
|---|---|---|
| LENGTH, LENGTHB, FIND_IN_SET, INSTR, SIZE, HASH, SIGN | INT | BIGINT |
| FLOOR, CEIL (when input is DECIMAL) | DECIMAL | BIGINT |
| FROM_UNIXTIME | STRING | DATETIME |
CONCAT_WS behavior with NULL inputs
| Edition | Behavior |
|---|---|
| Hive-compatible | NULL input strings are ignored; remaining strings are concatenated |
| V1.0 and V2.0 | If any input string is NULL, NULL is returned |
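A sketch of the behavioral difference (the argument values are illustrative):

```sql
SELECT concat_ws(',', 'a', NULL, 'b');
-- Hive-compatible: 'a,b' (the NULL input is skipped)
-- V1.0 and V2.0: NULL
```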
FIND_IN_SET behavior with empty strings
In Hive-compatible mode, an empty search string can match an empty element in the comma-separated list:
find_in_set("","") -- Returns 1
find_in_set("", "a,") -- Returns 2

In V1.0 and V2.0 modes, an empty string is treated as unmatched and 0 is returned.
REGEXP specification
| Edition | Specification |
|---|---|
| Hive-compatible | Java regular expression specifications |
| V1.0 and V2.0 | MaxCompute specifications |
SUBSTR start position behavior
string substr(string <str>, bigint <start_position>[, bigint <length>])

| Edition | Behavior when start_position = 0 |
|---|---|
| Hive-compatible | Same result as start_position = 1 |
| V1.0 and V2.0 | Returns NULL |
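A sketch of the start-position difference (the input string is illustrative):

```sql
SELECT substr('maxcompute', 0, 3);
-- Hive-compatible: 'max' (start_position 0 behaves like 1)
-- V1.0 and V2.0: NULL
```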
Partition pruning limitations
Specific functions do not support partition pruning in Hive-compatible mode. For the full list, see Mappings between built-in functions of MaxCompute and built-in functions of Hive, MySQL, and Oracle.
User-defined functions (UDFs)
In Hive-compatible mode, UDF data types are parsed and overloaded based on Hive-compatible data types. Built-in functions that require the MaxCompute V2.0 data type edition are also available.