This topic describes how to use a PyODPS node to read data from a partitioned table.
Prerequisites
The following operations are completed:
- MaxCompute is activated. For more information, see Activate MaxCompute.
- DataWorks is activated. For more information, see Activate DataWorks.
- A workflow is created in the DataWorks console. In this example, a workflow is created
for a DataWorks workspace in basic mode. For more information, see Create a workflow.
Procedure
- Prepare test data.
- Create a partitioned table and a source table, and import data to the source table.
For more information, see Create tables and import data.
In this example, the following table creation statements and source data are used.
- Execute the following statement to create a partitioned table named user_detail:
create table if not exists user_detail
(
    userid    BIGINT comment 'user ID',
    job       STRING comment 'job type',
    education STRING comment 'education level'
)
comment 'user information table'
partitioned by (dt STRING comment 'date', region STRING comment 'region');
- Execute the following statement to create a source table named user_detail_ods:
create table if not exists user_detail_ods
(
    userid    BIGINT comment 'user ID',
    job       STRING comment 'job type',
    education STRING comment 'education level',
    dt        STRING comment 'date',
    region    STRING comment 'region'
);
- Create a source data file named user_detail.txt, save the following data to the file, and then import the data to the user_detail_ods table. A PyODPS-based alternative for loading the test data is sketched after the sample data.
0001,Internet,bachelor,20190715,beijing
0002,education,junior college,20190716,beijing
0003,finance,master,20190715,shandong
0004,Internet,master,20190715,beijing
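If you prefer to load the test data programmatically rather than importing the file, you can write the same rows with PyODPS. The following is a minimal sketch and not part of the documented procedure; it assumes that the user_detail_ods table already exists and that the code runs in a PyODPS node, where the o entry point is provided by DataWorks.
# Sketch: write the four sample rows into the source table user_detail_ods.
rows = [
    [1, 'Internet', 'bachelor', '20190715', 'beijing'],
    [2, 'education', 'junior college', '20190716', 'beijing'],
    [3, 'finance', 'master', '20190715', 'shandong'],
    [4, 'Internet', 'master', '20190715', 'beijing'],
]
with o.get_table('user_detail_ods').open_writer() as writer:
    writer.write(rows)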
- Log on to the DataWorks console and click Workspaces in the left-side navigation pane.
On the Workspaces page, find the target workspace and click Data Analytics in the
Actions column. On the Data Development tab, right-click the target workflow in the
Business process section and create an ODPS SQL node.
- In the New node dialog box, set the Node name parameter and click Submit.
- On the configuration tab of the ODPS SQL node, enter the following code in the code
editor:
insert overwrite table user_detail partition (dt,region)
select userid,job,education,dt,region from user_detail_ods;
- Click the Run icon in the toolbar to insert the data from the user_detail_ods table into the user_detail table.
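If you want to run this step from a PyODPS node instead of an ODPS SQL node, the same statement can be submitted with o.execute_sql. This is a sketch under that assumption; the SQL is identical to the statement above.
# Sketch: submit the same dynamic-partition insert from a PyODPS node.
o.execute_sql(
    'insert overwrite table user_detail partition (dt, region) '
    'select userid, job, education, dt, region from user_detail_ods'
)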

- Use a PyODPS node to read data from a partitioned table.
- Log on to the DataWorks console.
- In the left-side navigation pane, click Workspaces.
- On the Workspaces page, find the target workspace and click Data Analytics in the Actions column.
- On the Data Development tab, right-click the target workflow in the Business process
section and create a PyODPS 2 node.
- In the New node dialog box, set the Node name parameter and click Submit.
- On the configuration tab of the PyODPS 2 node, enter the following code in the code
editor:
import sys
from odps import ODPS

reload(sys)
# Set UTF-8 as the default encoding format.
sys.setdefaultencoding('utf8')

# Print the value of the scheduling parameter dt.
print('dt=' + args['dt'])

# Obtain the partitioned table.
t = o.get_table('user_detail')

# Check whether the specified partition exists.
print t.exist_partition('dt=20190715,region=beijing')

# View all partitions in the partitioned table.
for partition in t.partitions:
    print partition.name

# You can use one of the following methods to query data in the partitioned table:
# Method 1
with t.open_reader(partition='dt=20190715,region=beijing') as reader1:
    count = reader1.count
    print("Query data in the partitioned table by using Method 1:")
    for record in reader1:
        print record[0], record[1], record[2]

# Method 2
print("Query data in the partitioned table by using Method 2:")
reader2 = t.open_reader(partition='dt=20190715,region=beijing')
for record in reader2:
    print record["userid"], record["job"], record["education"]

# Method 3
print("Query data in the partitioned table by using Method 3:")
for record in o.read_table('user_detail', partition='dt=20190715,region=beijing'):
    print record["userid"], record["job"], record["education"]
- Click Advanced run (run with parameters) in the toolbar.

- In the Parameters dialog box, set the parameters and click Confirm.
Parameter description:
- Scheduling Resource Group: Set this parameter to Common scheduler resource group.
- dt: Set this parameter to dt=20190715. The value 20190715 is obtained in the node code by using args['dt'].
- View the running result of the PyODPS 2 node on the Run Log tab.
