MaxCompute: Sequence

Last Updated: Mar 26, 2026

SequenceExpr represents a single column in a PyODPS DataFrame.

SequenceExpr objects cannot be created directly. Retrieve one from a DataFrame (collection object).

Prerequisites

Before you begin, ensure that you have:

Retrieve a column

Two syntaxes are supported (the examples on this page use a DataFrame named iris):

  • collection.column_name: works when the column name is a valid Python identifier.

  • collection[column_name]: works for any column name, including names stored in variables.

# Attribute access
print(iris.sepallength.head(5))

# Bracket access — use when the column name is in a variable
col = 'sepallength'
print(iris[col].head(5))

Both return the same result:

   sepallength
0          4.9
1          4.7
2          4.6
3          5.0
4          5.4
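
Whether attribute access is available for a given column name can be checked before writing the expression. The following is a minimal plain-Python sketch, not part of PyODPS: the helper name can_use_attribute_access is hypothetical, and it assumes the only requirements are that the name is a valid identifier and not a reserved keyword.

```python
import keyword


def can_use_attribute_access(column_name: str) -> bool:
    """Return True if `df.<column_name>` is syntactically valid Python.

    Hypothetical helper for illustration; PyODPS does not provide it.
    """
    return column_name.isidentifier() and not keyword.iskeyword(column_name)


# Names that fail the check can still be reached with bracket access.
for name in ["sepallength", "sepal length", "2nd_measure", "class"]:
    if can_use_attribute_access(name):
        syntax = "attribute or bracket access"
    else:
        syntax = "bracket access only"
    print(f"{name!r}: {syntax}")
```

Bracket access is the safer default when column names come from user input or from a table schema that you do not control.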

Column types

DataFrame has its own type system. When a DataFrame is initialized from a MaxCompute table, MaxCompute data types are automatically mapped to DataFrame types. This abstraction lets the same DataFrame code run across multiple execution backends: MaxCompute SQL, Pandas, MySQL, and PostgreSQL.

Type mappings

MaxCompute type              DataFrame type               Notes
BIGINT                       INT64
DOUBLE                       FLOAT64
STRING                       STRING
DATETIME                     DATETIME
BOOLEAN                      BOOLEAN
DECIMAL                      DECIMAL
ARRAY<VALUE_TYPE>            LIST<VALUE_TYPE>             Element type must be specified
MAP<KEY_TYPE, VALUE_TYPE>    DICT<KEY_TYPE, VALUE_TYPE>   Element types must be specified

To enable extended type support, set options.sql.use_odps2_extension=True:

MaxCompute type    DataFrame type
TINYINT            INT8
SMALLINT           INT16
INT                INT32
FLOAT              FLOAT32

Usage notes

  • LIST and DICT types: Always specify element types when working with LIST or DICT columns. Omitting them raises an error.

  • Unsupported types: TIMESTAMP and STRUCT (introduced in MaxCompute V2.0) are not supported in DataFrame.
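
The mapping rules in this section can be summarized as a small lookup. The sketch below is plain Python that restates the tables and usage notes; it does not call PyODPS. The function names are hypothetical, and raising an error for the extended types when the flag is off is an assumption made for this illustration, not a documented behavior.

```python
from typing import List

# Mappings copied from the tables in this section.
BASE_TYPES = {
    "BIGINT": "INT64",
    "DOUBLE": "FLOAT64",
    "STRING": "STRING",
    "DATETIME": "DATETIME",
    "BOOLEAN": "BOOLEAN",
    "DECIMAL": "DECIMAL",
}
# Listed as requiring options.sql.use_odps2_extension=True.
EXTENDED_TYPES = {
    "TINYINT": "INT8",
    "SMALLINT": "INT16",
    "INT": "INT32",
    "FLOAT": "FLOAT32",
}
UNSUPPORTED = {"TIMESTAMP", "STRUCT"}


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside angle brackets."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i].strip())
            start = i + 1
    parts.append(text[start:].strip())
    return parts


def to_dataframe_type(mc_type: str, use_odps2_extension: bool = False) -> str:
    """Translate a MaxCompute type name into a DataFrame type name (sketch)."""
    mc_type = mc_type.strip().upper()
    head, _, rest = mc_type.partition("<")
    if head in UNSUPPORTED:
        raise ValueError(f"{head} is not supported in DataFrame")
    if head in ("ARRAY", "MAP"):
        if not rest.endswith(">"):
            raise ValueError(f"{head} requires explicit element types")
        inner = split_top_level(rest[:-1])
        expected = 1 if head == "ARRAY" else 2
        if len(inner) != expected or not all(inner):
            raise ValueError(f"{head} expects {expected} element type(s)")
        mapped = ", ".join(to_dataframe_type(t, use_odps2_extension) for t in inner)
        return f"{'LIST' if head == 'ARRAY' else 'DICT'}<{mapped}>"
    if head in BASE_TYPES:
        return BASE_TYPES[head]
    if head in EXTENDED_TYPES:
        if not use_odps2_extension:
            # Assumption of this sketch: reject extended types without the flag.
            raise ValueError(f"{head} requires use_odps2_extension=True")
        return EXTENDED_TYPES[head]
    raise ValueError(f"Unknown MaxCompute type: {mc_type}")


print(to_dataframe_type("ARRAY<STRING>"))                  # LIST<STRING>
print(to_dataframe_type("MAP<STRING, ARRAY<BIGINT>>"))     # DICT<STRING, LIST<INT64>>
print(to_dataframe_type("INT", use_odps2_extension=True))  # INT32
```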

Check column type

Use sequence.dtype to inspect the data type of a column:

print(iris.sepallength.dtype)

Output:

FLOAT64

Convert column type

Use astype to cast a column to a different type. It returns a new sequence object with the converted type.

print(iris.sepallength.astype('int').head(5))

Output:

   sepallength
0            4
1            4
2            4
3            5
4            5
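
The sample output is consistent with dropping the fractional part rather than rounding to the nearest integer. The comparison below uses plain Python on the five values shown earlier to make the difference visible. It does not call PyODPS, and whether every execution backend truncates in the same way is an assumption to verify against your own data.

```python
import math

# First five sepallength values from the earlier output.
values = [4.9, 4.7, 4.6, 5.0, 5.4]

truncated = [math.trunc(v) for v in values]  # drop the fractional part
rounded = [round(v) for v in values]         # round to the nearest integer

print(truncated)  # [4, 4, 4, 5, 5]
print(rounded)    # [5, 5, 5, 5, 5]
```

If rounding is what you need, apply a rounding operation before the cast rather than relying on astype alone.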

Column names

Every sequence object must have a column name. DataFrame automatically generates names for common operations — for example, sepalwidth.max() produces a column named sepalwidth_max.

print(iris.groupby('name').sepalwidth.max().head(5))

Output:

   sepalwidth_max
0             4.4
1             3.4
2             3.8

For arithmetic between a sequence and a scalar, such as adding a number to a column, the result keeps the original column name. When two columns are involved in a calculation, no name can be inferred, and you must call rename explicitly.

Use rename to assign a name to any sequence:

print(iris.sepalwidth.rename('sepal_width').head(5))

Output:

   sepal_width
0          3.0
1          3.2
2          3.1
3          3.6
4          3.9
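
The naming rules described in this section can be modeled with a toy class. The Column class below is a hypothetical stand-in written in plain Python. It is not the PyODPS SequenceExpr implementation and only mirrors the three rules stated above: aggregations append the operation name, scalar arithmetic keeps the name, and two-column arithmetic leaves the name unset until rename is called.

```python
class Column:
    """Toy stand-in for a named column; not the PyODPS SequenceExpr class."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def __add__(self, other):
        if isinstance(other, Column):
            # Two columns: no name can be inferred for the result.
            summed = [a + b for a, b in zip(self.values, other.values)]
            return Column(None, summed)
        # Column plus scalar: the result keeps the original name.
        return Column(self.name, [v + other for v in self.values])

    def max(self):
        # Aggregation: the result is named "<column>_<operation>".
        return Column(f"{self.name}_max", [max(self.values)])

    def rename(self, new_name):
        # Return a new column with the same values under a new name.
        return Column(new_name, self.values)


sepallength = Column("sepallength", [4.9, 4.7, 4.6, 5.0, 5.4])
sepalwidth = Column("sepalwidth", [3.0, 3.2, 3.1, 3.6, 3.9])

print((sepallength + 5).name)                               # sepallength
print(sepalwidth.max().name)                                # sepalwidth_max
print((sepallength + sepalwidth).name)                      # None
print((sepallength + sepalwidth).rename("sum_sepal").name)  # sum_sepal
```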

Column calculations

Arithmetic operations on a sequence produce a new sequence. For numeric columns, all standard arithmetic operators are supported. For string columns, only string concatenation is supported.

# Add a scalar to a column
print((iris.sepallength + 5).head(5))

Output:

   sepallength
0          9.9
1          9.7
2          9.6
3         10.0
4         10.4

When an operation involves two columns, PyODPS cannot determine a result column name. Call rename to name the result:

print((iris.sepallength + iris.sepalwidth).rename('sum_sepal').head(5))

Output:

   sum_sepal
0        7.9
1        7.9
2        7.7
3        8.6
4        9.3

For more operations, see Column operations.