All Products
Search
Document Center

Dataphin:Data type

Last Updated:Jan 21, 2025

Dataphin currently utilizes MaxCompute 2.0 as its compute engine. This topic outlines the data types that MaxCompute 2.0 supports within Dataphin.

Scenarios

The MaxCompute V2.0 data type edition is suitable for scenarios in which your project does not contain data generated before April 2020 and depends on components that support the MaxCompute V2.0 data type edition.

Basic data types

Type

Constant example

Description

TINYINT

1Y and -127Y

An 8-bit signed integer type.

Valid values range from -128 to 127.

SMALLINT

32767S and -100S

A 16-bit signed integer type.

Valid values range from -32768 to 32767.

INT

1000 and -15645787

A 32-bit signed integer type.

Valid values range from -231 to 231-1.

BIGINT

100000000000L and -1L

A 64-bit signed integer type.

Valid values range from -263+1 to 263 -1.

BINARY

  • unhex('FA34E10293CB42848573A4E39937F479')

  • X'616263'

A binary number with a maximum length of 8 MB.

Note
  • In the format X'num [...]', num is a hexadecimal number, which can be 0~9 or A~F. For example, X'616263' represents abc because the ASCII encoding of a is 0x61, b is 0x62, and c is 0x63. X'616263' is equivalent to unhex('616263').

  • If the string length is odd, the system prepends a 0. For instance, X'616' is equivalent to X'0616'.

  • Use single quotes; double quotes are not valid. For example, X"616263" is not recognized as a BINARY constant.

FLOAT

3.14F and cast(3.14159261E+7 as float)

A 32-bit binary floating-point type.

Note

FLOAT data type calculations may result in precision loss due to computer storage and internal computing logic. For high precision requirements, consider converting FLOAT data to DECIMAL data.

DOUBLE

3.14D and 3.14159261E+7

A 64-bit binary floating-point type.

Note

DOUBLE data type calculations may result in precision loss due to computer storage and internal computing logic. For high precision requirements, consider converting DOUBLE data to DECIMAL data.

DECIMAL(precision,scale)

3.5BD and 99999999999.9999999BD

A precise numeric type based on the decimal system.

  • precision defines the maximum number of representable digits. Valid values: 1 <= precision <= 38.

  • scale defines the number of decimal places. Valid values: 0 <= scale <= 18.

The default is decimal(38,18) if precision and scale are not specified.

Note
  • DECIMAL types from different editions cannot coexist in the same table.

  • Enabling Hive compatible mode with the setproject odps.sql.hive.compatible=true; command will round the number of decimal places in the Decimal(precision, scale) type if it exceeds the scale during Tunnel Upload and SQL operations. Should the integer part surpass the limit, an error will be reported.

  • Setting odps.sql.decimal.tostring.trimzero to true removes trailing zeros after the decimal point, while setting it to false retains them. The default is true. This parameter only affects data read from tables and does not apply to static values.

VARCHAR(n)

None

A variable-length character type where n specifies the length.

Valid values: 1 to 65535.

CHAR(n)

None

The fixed-length character type is defined by n, which specifies its length, with a maximum of 255 characters. When the length is insufficient, spaces are added for padding; however, these spaces are not considered during comparisons.

STRING

"abc", 'bcd', "alibaba", and 'inc'

The string type with a maximum length of 8 MB.

DATE

DATE'2017-11-11'

The date type formatted as yyyy-mm-dd.

Valid values span from 0001-01-01 to 9999-12-31.

DATETIME

DATETIME'2017-11-11 00:00:00'

The DATETIME type.

Valid values range from 0001-01-01 00:00:00.000 to 9999-12-31 23:59:59.999, accurate to the millisecond.

TIMESTAMP

TIMESTAMP'2017-11-11 00:00:00.123456789'

The TIMESTAMP data type.

Valid values span from 0001-01-01 00:00:00.000000000 to 9999-12-31 23:59:59.999999999, accurate to the nanosecond.

Note

The timestamp is time zone agnostic. It stores a date offset from Epoch (UTC 1970-01-01 00:00:00) that is consistent across all time zones. TIMESTAMP data can be manipulated for time zone calculations using built-in functions, such as converting TIMESTAMP to STRING data based on the current time zone with cast(<a timestamp> as string).

TIMESTAMP_NTZ

TIMESTAMP_NTZ '2017-11-11 00:00:00.123456789'

A TIMESTAMP data type that does not consider time zones.

Valid values range from 0000-01-01 00:00:00.000000000 to 9999-12-31 23:59:59.999999999. For more details on the TIMESTAMP_NTZ data type, see the MaxCompute TIMESTAMP_NTZ data type documentation.

BOOLEAN

True and False

The BOOLEAN type.

Valid values: True and False.

Please consider the following points:

  • All the preceding data types support NULL values.

  • The INT keyword in an SQL statement refers to the 32-bit integer type.

    -- Convert a into a 32-bit integer. 
    CAST(a AS INT)
  • By default, an integer constant is processed as the INT type. For example, integer constant 1 in SELECT 1 + a; is processed as the INT type. If a constant exceeds the value range of the INT type but does not exceed the value range of the BIGINT type, the constant is processed as the BIGINT type. If the constant exceeds the value range of the BIGINT type, the constant is processed as the DOUBLE type.

  • Implicit conversions

    • Some implicit conversions are disabled. If the data type is converted from STRING to BIGINT, from STRING to DATETIME, from DOUBLE to BIGINT, from DECIMAL to DOUBLE, or from DECIMAL to BIGINT, precision may be reduced, or errors may occur. You can use the CAST function to force the data type conversions.

    • Constants of the VARCHAR type can be implicitly converted into the STRING type.

  • Tables, built-in functions, and user-defined functions (UDFs)

    • Built-in functions that require the MaxCompute V2.0 data type edition can be run.

    • The data types defined in UDFs are parsed and overloaded based on the MaxCompute V2.0 data type edition.

    • The data type of a partition key column can be STRING, VARCHAR, CHAR, TINYINT, SMALLINT, INT, or BIGINT.

  • Constants of the STRING type support concatenation. Two or more strings are automatically concatenated into one string. For example, if you execute the select 'abc' 'efg' 'ddt';, the return value is abcefgddt.

  • If a constant is inserted into a field of the DECIMAL type, the expression of the constant must conform to the format in the constant definition. Example: 3.5BD in the following sample code:

    INSERT INTO test_tb(a) VALUES (3.5BD)
  • DATETIME values do not include milliseconds. To specify the time format with millisecond precision, use the -dfp parameter in the Tunnel command, as in tunnel upload -dfp 'yyyy-MM-dd HH:mm:ss.SSS'. For more information, see Tunnel command documentation.

Complex data types

Data typeDefinitionConstructor
ARRAY
  • array<int>
  • array<struct<a:int, b:string>>
  • array(1, 2, 3)
  • array(array(1, 2), array(3, 4))
MAP
  • map<string, string>
  • map<smallint, array<string>>
  • map("k1", "v1", "k2", "v2")
  • map(1S, array('a', 'b'), 2S, array('x', 'y'))
STRUCT
  • struct<x:int, y:int>
  • struct<field1:bigint, field2:array<int>, field3:map<int, int>>
  • named_struct('x', 1, 'y', 2)
  • named_struct('field1', 100L, 'field2', array(1, 2), 'field3', map(1, 100, 2, 200))
Note

MaxCompute supports complex data types, including nested structures. For detailed information on built-in functions, refer to ARRAY, MAP, and STRUCT.