SequenceExpr represents a single column in a PyODPS DataFrame.
SequenceExpr objects cannot be created directly. Retrieve one from a DataFrame (collection object).
Prerequisites
Before you begin, ensure that you have:
- A sample table named pyodps_iris. See DataFrame data processing.
- A DataFrame object. See Create a DataFrame object.
Retrieve a column
Two syntaxes are supported:
- collection.column_name: works when the column name is a valid Python identifier.
- collection[column_name]: works for any column name, including names stored in variables.
# Attribute access
print(iris.sepallength.head(5))
# Bracket access — use when the column name is in a variable
col = 'sepallength'
print(iris[col].head(5))
Both return the same result:
sepallength
0 4.9
1 4.7
2 4.6
3 5.0
4 5.4
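Whether attribute access is available for a given column comes down to Python syntax rather than anything specific to DataFrame. As a minimal sketch, the check can be expressed with the standard library; `supports_attribute_access` is a hypothetical helper written for this illustration and is not part of PyODPS:

```python
import keyword


def supports_attribute_access(column_name):
    """Return True if `collection.<column_name>` is syntactically valid Python.

    Hypothetical helper for illustration only; not a PyODPS function.
    """
    # A name must be a valid identifier and must not be a reserved keyword.
    return column_name.isidentifier() and not keyword.iskeyword(column_name)


for name in ["sepallength", "sepal length", "2nd_measure", "class"]:
    if supports_attribute_access(name):
        style = "attribute or bracket access"
    else:
        style = "bracket access only"
    print(f"{name!r}: {style}")
```

Names that fail this check, such as ones containing spaces or matching a Python keyword, can still be retrieved with bracket access.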
Column types
DataFrame has its own type system. When a DataFrame is initialized from a MaxCompute table, MaxCompute data types are automatically mapped to DataFrame types. This abstraction lets the same DataFrame code run across multiple execution backends: MaxCompute SQL, Pandas, MySQL, and PostgreSQL.
Type mappings
| MaxCompute type | DataFrame type | Notes |
|---|---|---|
| BIGINT | INT64 | |
| DOUBLE | FLOAT64 | |
| STRING | STRING | |
| DATETIME | DATETIME | |
| BOOLEAN | BOOLEAN | |
| DECIMAL | DECIMAL | |
| ARRAY\<VALUE_TYPE\> | LIST\<VALUE_TYPE\> | Element type must be specified |
| MAP\<KEY_TYPE, VALUE_TYPE\> | DICT\<KEY_TYPE, VALUE_TYPE\> | Element types must be specified |
To enable extended type support, set options.sql.use_odps2_extension=True:
| MaxCompute type | DataFrame type |
|---|---|
| TINYINT | INT8 |
| SMALLINT | INT16 |
| INT | INT32 |
| FLOAT | FLOAT32 |
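The option named above is set on the global `options` object of the `odps` package. A minimal configuration sketch, assuming PyODPS is installed; setting the option before the DataFrame is created from the table is an assumption here, not something this section states:

```python
from odps import options

# Enable the extended type mappings (TINYINT, SMALLINT, INT, FLOAT).
# Assumption: set this before creating the DataFrame from the table.
options.sql.use_odps2_extension = True
```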
Usage notes
- LIST and DICT types: Always specify element types when working with LIST or DICT columns. Omitting them raises an error.
- Unsupported types: TIMESTAMP and STRUCT (introduced in MaxCompute V2.0) are not supported in DataFrame.
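The mapping tables and usage notes can be summarized as a small lookup. The sketch below is plain Python that mirrors the tables in this section; it is not how PyODPS implements the mapping, `to_dataframe_type` is a hypothetical name, and raising an error for an extended type when the extension flag is off is an assumption made for the illustration:

```python
BASE_TYPES = {
    "BIGINT": "INT64", "DOUBLE": "FLOAT64", "STRING": "STRING",
    "DATETIME": "DATETIME", "BOOLEAN": "BOOLEAN", "DECIMAL": "DECIMAL",
}
EXTENDED_TYPES = {
    "TINYINT": "INT8", "SMALLINT": "INT16", "INT": "INT32", "FLOAT": "FLOAT32",
}
UNSUPPORTED_TYPES = {"TIMESTAMP", "STRUCT"}


def split_key_value(inner):
    """Split 'K, V' at the first comma that is not nested inside <...>."""
    depth = 0
    for index, char in enumerate(inner):
        if char == "<":
            depth += 1
        elif char == ">":
            depth -= 1
        elif char == "," and depth == 0:
            return inner[:index].strip(), inner[index + 1:].strip()
    raise ValueError("MAP needs both a key type and a value type")


def to_dataframe_type(mc_type, use_odps2_extension=False):
    """Hypothetical helper: translate a MaxCompute type name per the tables."""
    name = mc_type.strip().upper()
    if name in ("ARRAY", "MAP"):
        raise ValueError("element types must be specified for " + name)
    if name.startswith("ARRAY<") and name.endswith(">"):
        element = to_dataframe_type(name[6:-1], use_odps2_extension)
        return "LIST<%s>" % element
    if name.startswith("MAP<") and name.endswith(">"):
        key, value = split_key_value(name[4:-1])
        return "DICT<%s, %s>" % (
            to_dataframe_type(key, use_odps2_extension),
            to_dataframe_type(value, use_odps2_extension),
        )
    if name.split("<")[0] in UNSUPPORTED_TYPES:
        raise ValueError(name + " is not supported in DataFrame")
    if name in BASE_TYPES:
        return BASE_TYPES[name]
    if name in EXTENDED_TYPES:
        if not use_odps2_extension:
            # Assumption: treated as an error in this sketch.
            raise ValueError(name + " requires use_odps2_extension=True")
        return EXTENDED_TYPES[name]
    raise ValueError("unknown type: " + name)


print(to_dataframe_type("BIGINT"))
print(to_dataframe_type("MAP<STRING, ARRAY<BIGINT>>"))
print(to_dataframe_type("INT", use_odps2_extension=True))
```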
Check column type
Use sequence.dtype to inspect the data type of a column:
print(iris.sepallength.dtype)
Output:
FLOAT64
Convert column type
Use astype to cast a column to a different type. It returns a new sequence object with the converted type.
print(iris.sepallength.astype('int').head(5))
Output:
sepallength
0 4
1 4
2 4
3 5
4 5
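The output above is consistent with the fractional part being dropped rather than rounded. Plain Python shows the same contrast on these five values; this is only an analogy, and the exact casting behavior of each execution backend is not covered here:

```python
# First five sepallength values from the earlier output
sepallength = [4.9, 4.7, 4.6, 5.0, 5.4]

truncated = [int(value) for value in sepallength]  # drops the fractional part
rounded = [round(value) for value in sepallength]  # nearest integer, for contrast

print(truncated)  # [4, 4, 4, 5, 5]
print(rounded)    # [5, 5, 5, 5, 5]
```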
Column names
Every sequence object must have a column name. DataFrame automatically generates names for common operations — for example, sepalwidth.max() produces a column named sepalwidth_max.
print(iris.groupby('name').sepalwidth.max().head(5))
Output:
sepalwidth_max
0 4.4
1 3.4
2 3.8
For some operations, such as adding a scalar to a sequence, the result keeps the original column name. When two columns are involved in a calculation, no name can be inferred, and you must call rename explicitly.
Use rename to assign a name to any sequence:
print(iris.sepalwidth.rename('sepal_width').head(5))
Output:
sepal_width
0 3.0
1 3.2
2 3.1
3 3.6
4 3.9
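The naming rules in this section can be modelled with a toy class. `ToySequence` is hypothetical and written only for this illustration; it is not a PyODPS class, and having `rename` return a new object is an assumption of the sketch:

```python
class ToySequence:
    """Toy stand-in for a column expression; not part of PyODPS."""

    def __init__(self, name, values):
        self.name = name  # None means no name could be inferred
        self.values = list(values)

    def max(self):
        # Aggregations derive a name from the source column: <column>_max
        return ToySequence(f"{self.name}_max", [max(self.values)])

    def __add__(self, other):
        if isinstance(other, ToySequence):
            # Two columns: no single name applies, so the result is unnamed
            summed = [a + b for a, b in zip(self.values, other.values)]
            return ToySequence(None, summed)
        # Column plus scalar: the original column name is kept
        return ToySequence(self.name, [v + other for v in self.values])

    def rename(self, new_name):
        # Assumed here: returns a new object and leaves the original untouched
        return ToySequence(new_name, self.values)


sepallength = ToySequence("sepallength", [4.9, 4.7, 4.6])
sepalwidth = ToySequence("sepalwidth", [3.0, 3.2, 3.1])

print(sepalwidth.max().name)                                # sepalwidth_max
print((sepallength + 5).name)                               # sepallength
print((sepallength + sepalwidth).name)                      # None
print((sepallength + sepalwidth).rename("sum_sepal").name)  # sum_sepal
```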
Column calculations
Arithmetic operations on a sequence produce a new sequence. For numeric columns, all standard arithmetic operators are supported. For string columns, only string concatenation is supported.
# Add a scalar to a column
print((iris.sepallength + 5).head(5))
Output:
sepallength
0 9.9
1 9.7
2 9.6
3 10.0
4 10.4
When an operation involves two columns, PyODPS cannot determine a result column name. Call rename to name the result:
print((iris.sepallength + iris.sepalwidth).rename('sum_sepal').head(5))
Output:
sum_sepal
0 7.9
1 7.9
2 7.7
3 8.6
4 9.3
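Both results are element-wise: a scalar is applied to every row, and two columns are combined row by row. As a quick check in plain Python (lists stand in for columns here, so no PyODPS calls are involved), the five rows shown in the earlier outputs reproduce the values above:

```python
# First five rows taken from the outputs earlier in this document
sepallength = [4.9, 4.7, 4.6, 5.0, 5.4]
sepalwidth = [3.0, 3.2, 3.1, 3.6, 3.9]

# Scalar addition: the same value is added to every row
plus_scalar = [round(value + 5, 1) for value in sepallength]

# Column addition: rows are paired up by position
plus_column = [round(a + b, 1) for a, b in zip(sepallength, sepalwidth)]

print(plus_scalar)  # [9.9, 9.7, 9.6, 10.0, 10.4]
print(plus_column)  # [7.9, 7.9, 7.7, 8.6, 9.3]
```

The `round` call only hides floating-point noise in the printed values.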
For more operations, see Column operations.