
MaxCompute:Use a PyODPS node to read data from the level-1 partition of the specified table

Last Updated:Mar 26, 2026

This topic describes how to use a PyODPS 2 node in DataWorks to query data from a level-1 partition of a MaxCompute partitioned table. Three methods are covered: synchronous SQL, asynchronous SQL, and the PyODPS DataFrame API.

This example uses a DataWorks workspace in basic mode. Participate in Public Preview of Data Studio is disabled by default when you create a workspace; this example does not apply to workspaces enrolled in the Data Studio public preview.

Prerequisites

Before you begin, make sure that you have a DataWorks workspace that is associated with a MaxCompute compute engine.

Step 1: Prepare test data

  1. Create a partitioned table and a source table, and import data into the source table. For details, see Create tables and upload data. Use the following statements and sample data for this example.

    • Create the user_detail partitioned table:

      create table if not exists user_detail
      (
        userid    BIGINT comment 'user ID',
        job       STRING comment 'job type',
        education STRING comment 'education level'
      ) comment 'user information table'
      partitioned by (dt STRING comment 'date', region STRING comment 'region');
    • Create the user_detail_ods source table:

      create table if not exists user_detail_ods
      (
        userid    BIGINT comment 'user ID',
        job       STRING comment 'job type',
        education STRING comment 'education level',
        dt STRING comment 'date',
        region STRING comment 'region'
      );
    • Create a file named user_detail.txt with the following content and import it into user_detail_ods:

      0001,Internet,bachelor,20190715,beijing
      0002,education,junior college,20190716,beijing
      0003,finance,master,20190715,shandong
      0004,Internet,master,20190715,beijing
  2. Right-click the workflow and choose Create Node > MaxCompute > ODPS SQL.

  3. In the Create Node dialog box, specify Name and click Confirm.

  4. In the code editor on the configuration tab, enter the following SQL to insert data from user_detail_ods into the partitioned table:

    insert overwrite table user_detail partition (dt,region)
    select userid,job,education,dt,region from user_detail_ods;
  5. Click the Run icon in the toolbar to execute the statement and load the data into the user_detail partitioned table.

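The insert overwrite statement above uses dynamic partitioning: each row is routed to the partition named by its dt and region values. As a local illustration of that grouping (plain Python 3, using the sample rows from user_detail.txt; no MaxCompute connection involved):

```python
from collections import defaultdict

# Sample rows from user_detail.txt: userid, job, education, dt, region
rows = [
    ("0001", "Internet", "bachelor", "20190715", "beijing"),
    ("0002", "education", "junior college", "20190716", "beijing"),
    ("0003", "finance", "master", "20190715", "shandong"),
    ("0004", "Internet", "master", "20190715", "beijing"),
]

# Dynamic partitioning groups rows by the partition-key columns (dt, region).
partitions = defaultdict(list)
for userid, job, education, dt, region in rows:
    partitions[(dt, region)].append((userid, job, education))

for dt, region in sorted(partitions):
    print("dt=%s/region=%s -> %d row(s)" % (dt, region, len(partitions[(dt, region)])))
# Prints:
# dt=20190715/region=beijing -> 2 row(s)
# dt=20190715/region=shandong -> 1 row(s)
# dt=20190716/region=beijing -> 1 row(s)
```

With the four sample rows, the insert therefore creates three partitions.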
Step 2: Read data from the level-1 partition

Choose the method that fits your use case:

  • Synchronous mode (o.execute_sql().open_reader()): the query runs quickly and you want results immediately.
  • Asynchronous mode (o.run_sql() + wait_for_success() + open_reader()): the query is long-running and you want non-blocking submission.
  • PyODPS DataFrame (DataFrame(o.get_table().get_partition())): you need DataFrame operations after reading.
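The difference between the synchronous and asynchronous patterns can be sketched locally with standard-library threading; this is only an analogy, since the real o.execute_sql()/o.run_sql() calls need a MaxCompute connection. The fake_query function below is a hypothetical stand-in for a query:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_query(sql):
    """Hypothetical stand-in for a MaxCompute query: sleeps, then returns rows."""
    time.sleep(0.1)
    return [("0001", "Internet", "bachelor")]

# Synchronous pattern (like o.execute_sql): the call blocks until results exist.
rows = fake_query("select * from user_detail where dt='20190715'")
print("sync got %d row(s)" % len(rows))

# Asynchronous pattern (like o.run_sql): submission returns immediately;
# result() plays the role of wait_for_success() followed by open_reader().
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(fake_query, "select * from user_detail where dt='20190715'")
    print("submitted, doing other work...")  # runs while the query is still in flight
    rows = future.result()                   # blocks until the query completes
print("async got %d row(s)" % len(rows))
```

In both patterns the caller eventually blocks for results; the asynchronous form simply lets other work happen between submission and the wait.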
  1. Log on to the DataWorks console.

  2. In the left-side navigation pane, click Workspace.

  3. Find your workspace, then choose Shortcuts > Data Development in the Actions column.

  4. On the DataStudio page, right-click the workflow and choose Create Node > MaxCompute > PyODPS 2.

  5. In the Create Node dialog box, specify Name and click Confirm.

  6. In the code editor on the configuration tab, enter the following code:

    import sys
    reload(sys)
    # Set UTF-8 as the default encoding (PyODPS 2 nodes run Python 2).
    sys.setdefaultencoding('utf8')

    from odps.df import DataFrame

    # Method 1: Synchronous mode. execute_sql() blocks until the query completes;
    # use it for short-running queries.
    with o.execute_sql("select * from user_detail where dt='20190715'").open_reader() as reader:
        print reader.raw
        for record in reader:
            print record["userid"], record["job"], record["education"]

    # Method 2: Asynchronous mode. run_sql() returns immediately;
    # wait_for_success() blocks until the query completes. Use it for long-running queries.
    instance = o.run_sql("select * from user_detail where dt='20190715'")
    instance.wait_for_success()
    with instance.open_reader() as reader:
        for record in reader:
            print record["userid"], record["job"], record["education"]

    # Method 3: PyODPS DataFrame. Use it when you need DataFrame operations after reading.
    pt_df = DataFrame(o.get_table('user_detail').get_partition('dt=20190715'))
    print pt_df.head(10)

    All three methods filter on the level-1 partition key dt to read only the rows where dt='20190715'.

  7. Click the Run icon in the toolbar.

  8. View the output on the Run Log tab.
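Of the four sample rows loaded in Step 1, three have dt='20190715', so each of the three methods should return the same three rows. A quick local sanity check of that expectation (plain Python 3; no MaxCompute connection):

```python
# Sample rows from user_detail.txt: userid, job, education, dt, region
rows = [
    ("0001", "Internet", "bachelor", "20190715", "beijing"),
    ("0002", "education", "junior college", "20190716", "beijing"),
    ("0003", "finance", "master", "20190715", "shandong"),
    ("0004", "Internet", "master", "20190715", "beijing"),
]

# Apply the same level-1 partition filter as the queries above.
matched = [(u, j, e) for (u, j, e, dt, region) in rows if dt == "20190715"]
for userid, job, education in matched:
    print(userid, job, education)
# Three rows match: userids 0001, 0003, and 0004.
```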