All Products
Search
Document Center

MaxCompute:Use a PyODPS node to query data based on specific criteria

Last Updated:Mar 26, 2026

PyODPS provides three ways to filter rows from a MaxCompute table inside a DataWorks PyODPS node: raw SQL via execute_sql(), DataFrame boolean indexing, and the query() string method. This tutorial walks through all three using the Iris dataset.

Prerequisites

Before you begin, ensure that you have:

Step 1: Prepare the dataset

  1. Download the Iris dataset, decompress the archive, and rename iris.data to iris.csv.

  2. Create a table named pyodps_iris and upload iris.csv. For details, see Create a table and upload data. Use the following DDL to create the table:

    CREATE TABLE if not exists pyodps_iris
    (
    sepallength  DOUBLE comment 'sepal length (cm)',
    sepalwidth   DOUBLE comment 'sepal width (cm)',
    petallength  DOUBLE comment 'petal length (cm)',
    petalwidth   DOUBLE comment 'petal width (cm)',
    name         STRING comment 'type'
    );

Step 2: Create a PyODPS node

  1. Log on to the DataWorks console.

  2. In the left navigation pane, click Workspace.

  3. In the Actions column, choose Shortcuts > Data Development.

  4. On the Data Development page, right-click the flow and select Create Node > MaxCompute > PyODPS 2.

  5. In the Create Node dialog box, enter a node name and click Confirm.

Step 3: Write the query code

Choose the method that fits your use case:

Method Best for
execute_sql() Existing SQL queries, full result sets, complex joins
DataFrame boolean indexing (iris[condition]) Chaining multiple filter steps in Python
query() Readable multi-condition filters written as a string

Enter the following code in the PyODPS node. All three methods filter rows from the same pyodps_iris table.

import sys
reload(sys)
# Change the default encoding of the system.
sys.setdefaultencoding('utf8')

iris = DataFrame(o.get_table('pyodps_iris'))

# Method 1: Run a SQL query directly.
# Equivalent SQL: SELECT * FROM pyodps_iris WHERE sepallength > 5
with o.execute_sql('select * from pyodps_iris WHERE sepallength > 5 ').open_reader() as reader4:
    print reader4.raw
    for record in reader4:
        print record["sepallength"],record["sepalwidth"],record["petallength"],record["petalwidth"],record["name"]

# Method 2: Filter using DataFrame boolean indexing.
# The expression iris.sepallength > 5 produces a boolean series;
# only rows where the value is True are returned.
print iris[iris.sepallength > 5].head(5)

# Method 3: Filter using the query() string method.
print iris.query("(sepallength < 5) and (petallength > 1.5)").head(5)
Methods 2 and 3 use the PyODPS DataFrame API, which translates Python expressions into SQL and submits them as MaxCompute jobs. Calling .head(n) triggers execution and returns the first n rows.

Step 4: Run the node and view results

  1. Click Run.

    Click the Run button

  2. View the output in the Log panel. Method 1 prints each matching record and shows the compiled SQL:

    Executing user script with PyODPS 0.8.0
    
    "sepallength","sepalwidth","petallength","petalwidth","name"
    5.1,3.5,1.4,0.2,"Iris-setosa"
    5.4,3.9,1.7,0.4,"Iris-setosa"
    ...
    
    Sql compiled:
    CREATE TABLE tmp_pyodps_xxxx LIFECYCLE 1 AS
    SELECT *
    FROM xxx.`pyodps_iris` t1
    WHERE t1.`sepallength` > 5

    Method 2 prints the top 5 rows as a DataFrame table:

       sepallength  sepalwidth  petallength  petalwidth         name
    0          5.1         3.5          1.4         0.2  Iris-setosa
    1          5.4         3.9          1.7         0.4  Iris-setosa
    2          5.4         3.7          1.5         0.2  Iris-setosa
    3          5.8         4.0          1.2         0.2  Iris-setosa
    4          5.7         4.4          1.5         0.4  Iris-setosa

    Method 3 prints the top 5 rows matching both conditions:

       sepallength  sepalwidth  petallength  petalwidth             name
    0          4.8         3.4          1.6         0.2      Iris-setosa
    1          4.8         3.4          1.9         0.2      Iris-setosa
    2          4.7         3.2          1.6         0.2      Iris-setosa
    3          4.8         3.1          1.6         0.2      Iris-setosa
    4          4.9         2.4          3.3         1.0  Iris-versicolor