MaxCompute: Getting started

Last Updated: Jun 24, 2026

You can run PyODPS as a node in data development platforms like DataWorks. These platforms manage the PyODPS runtime and scheduling, so you do not need to create a MaxCompute entry object manually. The PyODPS DataFrame API allows for pandas-style data processing. This topic uses DataWorks to demonstrate how to get started with PyODPS in your projects.

Prerequisites

Procedure

  1. Create a PyODPS node.

    This topic uses a DataWorks PyODPS 3 node as an example. For details, see Develop a PyODPS 3 task.

    • PyODPS 3 nodes run on Python 3.7.

    • A PyODPS node can process a maximum of 50 MB of local data and use up to 1 GB of memory at runtime. If a task exceeds these limits, the system terminates it. Therefore, avoid writing Python code that processes large amounts of data in a PyODPS task.

    • Writing and debugging code in DataWorks is less efficient than in a local IDE. We recommend that you develop your code locally in IntelliJ IDEA.
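
Because a task that exceeds the limits above is terminated, it can help to check the size of local data before loading it in the node. The following is a minimal sketch and not part of PyODPS: the helper name `fits_local_limit` and the constant `LOCAL_DATA_LIMIT_BYTES` are hypothetical, and the 50 MB figure is assumed to mean 50 * 1024 * 1024 bytes.

```python
import os
import tempfile

# Limit stated for PyODPS nodes in this topic.
# Assumption: 50 MB is interpreted as 50 * 1024 * 1024 bytes.
LOCAL_DATA_LIMIT_BYTES = 50 * 1024 * 1024


def fits_local_limit(path, limit=LOCAL_DATA_LIMIT_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit


# Usage: check a small temporary file before reading it.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"111,aaa\n222,bbb\n")
    sample_path = f.name

print(fits_local_limit(sample_path))
os.remove(sample_path)
```

Data that does not pass such a check is usually better processed inside MaxCompute, for example through SQL, than in the Python process of the node.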

    1. Create a workflow.

      Go to the DataStudio page, right-click Business Flow, and then select Create Workflow.

    2. Create a PyODPS node.

      Right-click the new workflow, choose Create Node > MaxCompute > PyODPS 3, enter a name for the node, and then click Submit.

  2. Edit the PyODPS node.

    1. Write the code.

      Enter the test code in the PyODPS node editor. The following example demonstrates using PyODPS APIs for table operations. For more information about table operations and SQL operations, see Tables and SQL.

      from odps import ODPS
      # In DataWorks PyODPS nodes, a global variable (`o` or `odps`) is available by default as the MaxCompute entry. You can use it directly without defining it manually.
      table = o.create_table('my_new_table', 'num bigint, id string', if_not_exists=True)
      # Insert data into the non-partitioned table my_new_table.
      records = [[111, 'aaa'],
                 [222, 'bbb'],
                 [333, 'ccc'],
                 [444, '中文']]
      o.write_table(table, records)
      # Read data from the non-partitioned table my_new_table.
      for record in o.read_table(table):
          print(record[0], record[1])
      # Read data from the table by executing an SQL statement.
      result = o.execute_sql('select * from my_new_table;', hints={'odps.sql.allow.fullscan': 'true'})
      # Read the SQL execution results.
      with result.open_reader() as reader:
          for record in reader:
              print(record[0], record[1])
      # Drop the table to release resources.
      table.drop()
    2. Run the code.

      After editing the code, click the Run icon. When the run completes, you can view the results on the Runtime Log tab. The following log output indicates success.

      2023-07-21 15:06:41 INFO ========================================================
      Executing user script with PyODPS 0.11.2.3
      Tunnel session created: <TableUploadSession xxx                                          >
      Tunnel session created: <TableDownloadSession xxx                                        >
      111 aaa
      222 bbb
      333 ccc
      444 中文
      Tunnel session created: <InstanceDownloadSession id=xxx          project_name=xxx                              >
      111 aaa
      222 bbb
      333 ccc
      444 中文
      2023-07-21 15:06:49 INFO ========================================================
      2023-07-21 15:06:49 INFO Exit code of the Shell command 0
      2023-07-21 15:06:49 INFO --- Invocation of Shell command completed ---
      2023-07-21 15:06:49 INFO Shell run successfully!
      2023-07-21 15:06:49 INFO Current task status: FINISH
      2023-07-21 15:06:49 INFO Cost time is: 7.507s
      /home/admin/alisatasknode/taskinfo/xxx              xxx        .log-END-EOF
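
When many runs need to be checked, the success markers in the log above can be detected programmatically. The following is a minimal sketch that uses only the Python standard library. The function name `run_succeeded` is hypothetical, and the two marker strings are copied from the sample log, so they are assumptions that may not hold for other DataWorks versions.

```python
import re


def run_succeeded(log_text):
    """Return True if the log reports exit code 0 and a FINISH task status."""
    exit_match = re.search(r"Exit code of the Shell command (-?\d+)", log_text)
    finished = "Current task status: FINISH" in log_text
    return exit_match is not None and int(exit_match.group(1)) == 0 and finished


# Excerpt of the sample log shown above.
sample_log = """\
2023-07-21 15:06:49 INFO Exit code of the Shell command 0
2023-07-21 15:06:49 INFO Shell run successfully!
2023-07-21 15:06:49 INFO Current task status: FINISH
"""
print(run_succeeded(sample_log))
```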