This tutorial shows how to sort DataFrame data using the PyODPS sort() method in a DataWorks PyODPS 2 node. Each call to sort() compiles to an ORDER BY clause in MaxCompute SQL.
Prerequisites
Before you begin, ensure that you have:
Activated MaxCompute and DataWorks. See Activate MaxCompute and DataWorks and Purchase guide
Created a workflow in a DataWorks workspace. This tutorial uses a workspace in basic mode. See Create a workflow
Step 1: Create a table and import data
Download the iris dataset from iris and rename
iris.datatoiris.csv.Create a table named
pyodps_irisand uploadiris.csvto the table. See Create tables and upload data. Use the following DDL statement to create the table:CREATE TABLE if not exists pyodps_iris ( sepallength DOUBLE comment 'sepal length (cm)', sepalwidth DOUBLE comment 'sepal width (cm)', petallength DOUBLE comment 'petal length (cm)', petalwidth DOUBLE comment 'petal width (cm)', name STRING comment 'type' );
Step 2: Create a PyODPS 2 node
Log on to the DataWorks console.
In the left-side navigation pane, click Workspace.
Find your workspace and choose Shortcuts > Data Development in the Actions column.
On the DataStudio page, right-click your workflow and choose Create Node > MaxCompute > PyODPS 2.
In the Create Node dialog box, enter a name and click Confirm.
Step 3: Write and run the sorting code
On the configuration tab of the PyODPS 2 node, enter the following code in the code editor, then click the Run icon in the toolbar.
The sort() method accepts a column name string or a column reference object. All examples below call .head(5) to return only the first five rows.
Sort by a single column in ascending order
Ascending order is the default. Pass the column name as a string.
from odps.df import DataFrame
iris = DataFrame(o.get_table('pyodps_iris'))
print iris.sort('sepalwidth').head(5)Expected output:
Sql compiled:
CREATE TABLE tmp_pyodps_d1b06785_dc18_4288_ad34_de860de1be08 LIFECYCLE 1 AS
SELECT *
FROM WB_BestPractice_dev.`pyodps_iris` t1
ORDER BY sepalwidth
LIMIT 10000
Instance ID: 20191010061554817gwml0lim
sepallength sepalwidth petallength petalwidth name
0 5.0 2.0 3.5 1.0 Iris-versicolor
1 6.0 2.2 5.0 1.5 Iris-virginica
2 6.2 2.2 4.5 1.5 Iris-versicolor
3 6.0 2.2 4.0 1.0 Iris-versicolor
4 5.5 2.3 4.0 1.3 Iris-versicolorSort by a single column in descending order
Two syntax options are available. Choose based on how you reference the column:
Use
ascending=Falsewhen the column name is a string.Use a hyphen prefix (
-iris.column) when you have a column reference object.
Option 1: `ascending=False`
print iris.sort('sepalwidth', ascending=False).head(5)Expected output:
Sql compiled:
CREATE TABLE tmp_pyodps_3cb90bb2_fb95_43fb_ae84_f2b5a27d72dc LIFECYCLE 1 AS
SELECT *
FROM WB_BestPractice_dev.`pyodps_iris` t1
ORDER BY sepalwidth DESC
LIMIT 10000
Instance ID: 20191010061601287gs086792
sepallength sepalwidth petallength petalwidth name
0 5.7 4.4 1.5 0.4 Iris-setosa
1 5.5 4.2 1.4 0.2 Iris-setosa
2 5.2 4.1 1.5 0.1 Iris-setosa
3 5.8 4.0 1.2 0.2 Iris-setosa
4 5.4 3.9 1.3 0.4 Iris-setosaOption 2: Hyphen prefix on a column reference
print iris.sort(-iris.sepalwidth).head(5)Expected output:
Sql compiled:
CREATE TABLE tmp_pyodps_97b080bb_e014_48e8_a310_4b45fcd6a2ed LIFECYCLE 1 AS
SELECT *
FROM WB_BestPractice_dev.`pyodps_iris` t1
ORDER BY sepalwidth DESC
LIMIT 10000
Instance ID: 20191010061606927g6emz192
sepallength sepalwidth petallength petalwidth name
0 5.7 4.4 1.5 0.4 Iris-setosa
1 5.5 4.2 1.4 0.2 Iris-setosa
2 5.2 4.1 1.5 0.1 Iris-setosa
3 5.8 4.0 1.2 0.2 Iris-setosa
4 5.4 3.9 1.3 0.4 Iris-setosaBoth options produce the same SQL (ORDER BY sepalwidth DESC) and the same results.
Sort by multiple columns in the same direction
Pass a list of column name strings. All columns sort in ascending order by default.
print iris.sort(['sepalwidth', 'petallength']).head(5)Expected output:
Sql compiled:
CREATE TABLE tmp_pyodps_6fe37b6e_6705_4052_b733_211eb9bd16ac LIFECYCLE 1 AS
SELECT *
FROM WB_BestPractice_dev.`pyodps_iris` t1
ORDER BY sepalwidth, petallength
LIMIT 10000
Instance ID: 20191010061611714gn586792
sepallength sepalwidth petallength petalwidth name
0 5.0 2.0 3.5 1.0 Iris-versicolor
1 6.0 2.2 4.0 1.0 Iris-versicolor
2 6.2 2.2 4.5 1.5 Iris-versicolor
3 6.0 2.2 5.0 1.5 Iris-virginica
4 4.5 2.3 1.3 0.3 Iris-setosaSort by multiple columns in different directions
Two syntax options are available:
Use
ascendingwith a list of boolean values — one value per column.Use a hyphen prefix on column references for descending columns within the list.
Option 1: Boolean list
print iris.sort(['sepalwidth', 'petallength'], ascending=[True, False]).head(5)Expected output:
Sql compiled:
CREATE TABLE tmp_pyodps_a52c805c_94a1_4a75_a6af_4fc9ed06ae68 LIFECYCLE 1 AS
SELECT *
FROM WB_BestPractice_dev.`pyodps_iris` t1
ORDER BY sepalwidth, petallength DESC
LIMIT 10000
Instance ID: 20191010061616553gw3m9592
sepallength sepalwidth petallength petalwidth name
0 5.0 2.0 3.5 1.0 Iris-versicolor
1 6.0 2.2 5.0 1.5 Iris-virginica
2 6.2 2.2 4.5 1.5 Iris-versicolor
3 6.0 2.2 4.0 1.0 Iris-versicolor
4 6.3 2.3 4.4 1.3 Iris-versicolorOption 2: Hyphen prefix on column references
print iris.sort(['sepalwidth', -iris.petallength]).head(5)Expected output:
Sql compiled:
CREATE TABLE tmp_pyodps_aac5538e_9b40_4078_b3c6_852b99c663c1 LIFECYCLE 1 AS
SELECT *
FROM WB_BestPractice_dev.`pyodps_iris` t1
ORDER BY sepalwidth, petallength DESC
LIMIT 10000
Instance ID: 20191010061621329gvmkc292
sepallength sepalwidth petallength petalwidth name
0 5.0 2.0 3.5 1.0 Iris-versicolor
1 6.0 2.2 5.0 1.5 Iris-virginica
2 6.2 2.2 4.5 1.5 Iris-versicolor
3 6.0 2.2 4.0 1.0 Iris-versicolor
4 6.3 2.3 4.4 1.3 Iris-versicolorBoth options produce identical SQL and results. The boolean list is easier to read when column names are long; the hyphen prefix is more concise when you already have column reference objects.
View the run log
After the node runs, view the output on the Run Log tab.

Each sort operation generates a separate SQL statement and instance ID.