MaxCompute:Use PyODPS in a local environment

Last Updated: Dec 02, 2025

PyODPS is a Python SDK from Alibaba Cloud that allows developers to use MaxCompute for big data processing and analysis. This topic describes how to use PyODPS in a local environment to perform table operations, load data, and run SQL queries.

Prerequisites

  • PyODPS is installed locally, and the environment variables are set.

  • The examples in this document demonstrate basic operations using the pyodps_iris table. To prepare this table, see Usage examples for instructions on how to download the dataset, create the pyodps_iris table, and write data to it.

Procedure

  1. Open a Python editor and create a Python file.

    Note

    If no Python editor is installed locally, you can create an empty text file and save it with a .py extension.

  2. Develop the PyODPS task code.

    After you create the file, you can refer to the following sections for simple examples that demonstrate the main features of PyODPS.

    For more information about using PyODPS, see Overview of basic operations and DataFrame (Not recommended). You can also refer to the Use a PyODPS node for Jieba Chinese word segmentation topic for a simple end-to-end operation example.

  3. Save the Python file and run it locally.

Initialize the ODPS entry point

Important

When you use PyODPS in a local environment, you must manually define the ODPS entry point. The following code provides an example.

import os
from odps import ODPS

o = ODPS(
    # (Recommended) Make sure that the environment variables are set.
    # Make sure that the ALIBABA_CLOUD_ACCESS_KEY_ID environment variable is set to your AccessKey ID.
    access_id=os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    # Make sure that the ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variable is set to your AccessKey secret.
    secret_access_key=os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    
    # (Not recommended) If the environment variables are not set, use the following format instead.
    # access_id='your-aliyun-access-key-id',
    # secret_access_key='your-aliyun-access-key-secret',
    
    project='your-default-project',
    endpoint='your-end-point',
)

Parameters:

  • your-default-project: Enter the name of your MaxCompute project. Log on to the MaxCompute console to obtain the list of projects.

  • your-end-point: Enter the endpoint for your region and network environment. For example, the public endpoint in the China (Hangzhou) region is https://service.cn-hangzhou.maxcompute.aliyun.com/api.

Important

Make sure to select the correct network type. Otherwise, the connection will fail.

After you complete these configurations, you can use PyODPS in your local environment to perform basic operations on ODPS objects. These operations include list, get, exist, create, and delete. For more information about using PyODPS, see Overview of basic operations and DataFrame (Not recommended).
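The basic operations listed above can be sketched as follows. This is a minimal example, assuming the pyodps_iris table from the prerequisites exists; the table name my_tmp_table is a hypothetical placeholder.

```python
import os
from odps import ODPS

o = ODPS(
    access_id=os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    secret_access_key=os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    project='your-default-project',
    endpoint='your-end-point',
)

# list: iterate over the tables in the current project.
for table in o.list_tables():
    print(table.name)

# exist: check whether a table exists before you operate on it.
print(o.exist_table('pyodps_iris'))

# get: load a table object and inspect its schema.
t = o.get_table('pyodps_iris')
print(t.table_schema)

# create: create a table if it does not already exist.
o.create_table('my_tmp_table', 'num bigint, id string', if_not_exists=True)

# delete: drop the table when you no longer need it.
o.delete_table('my_tmp_table', if_exists=True)
```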

Execute SQL

  1. Execute SQL commands in a PyODPS node.

    In a PyODPS node, you can execute SQL commands in the traditional mode or in MaxCompute Query Acceleration (MCQA) mode. Both modes support Data Definition Language (DDL) and Data Manipulation Language (DML) statements. MCQA writes job results to a temporary cache; when you run the same query again, MaxCompute returns the cached results to accelerate execution. For more information about the billing rules, see Compute costs (pay-as-you-go). Select an execution mode based on your requirements.

    Note

    Methods such as execute_sql() and run_sql() on the entry point object cannot run every type of SQL statement.

    Statements other than DDL and DML must be called through dedicated methods. For example, use the create_table method instead of a `CREATE TABLE` statement, and use the run_xflow or execute_xflow method to run XFlow commands.

    Use the traditional mode to execute SQL commands

    You can use execute_sql() or run_sql() to execute SQL commands. The following code provides an example:

    Use the create_table method to create a new table.

    o.create_table('my_t', 'num bigint, id string', if_not_exists=True)

    Use the execute_sql method to execute an SQL query.

    result = o.execute_sql('SELECT * FROM pyodps_iris LIMIT 3')
    with result.open_reader() as reader:
        for record in reader:
            print(record)
    
  2. Read the SQL execution result in a PyODPS node.

    Use open_reader() to read the result of an SQL command. For more information, see Read SQL execution results.

For more information about SQL-related operations in PyODPS nodes, see SQL.
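The difference between the two methods is that execute_sql() blocks until the job completes, while run_sql() submits the job asynchronously and returns an instance object immediately. A minimal sketch of the asynchronous pattern, assuming the entry point o and the pyodps_iris table from the examples above:

```python
# Submit the query asynchronously; run_sql returns an instance object
# without waiting for the job to finish.
instance = o.run_sql('SELECT * FROM pyodps_iris LIMIT 3')

# Print the Logview URL so you can inspect the job's progress.
print(instance.get_logview_address())

# Block until the job completes successfully.
instance.wait_for_success()

# Read the result the same way as with execute_sql.
with instance.open_reader() as reader:
    for record in reader:
        print(record)
```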

DataFrame

PyODPS provides a DataFrame API that you can use for data processing. For more examples of DataFrame operations, see DataFrame.

  • Execution

    To execute a DataFrame, you must explicitly call an immediate execution method, such as execute or persist. The following code provides an example.

    # Call an immediate execution method to process each record and print all data from the pyodps_iris table where iris.sepalwidth is less than 3.
    from odps.df import DataFrame
    iris = DataFrame(o.get_table('pyodps_iris'))
    for record in iris[iris.sepalwidth < 3].execute():
        print(record)
    
  • Print detailed information

    By default, a PyODPS node that runs in a local environment does not print detailed runtime information, such as the Logview URL. To print this information, manually set the options.verbose option.

    from odps import options
    options.verbose = True
    

Set runtime parameters (hints)

To set runtime parameters for a task, use the hints parameter, which is a dictionary.

o.execute_sql('SELECT * FROM pyodps_iris', hints={'odps.sql.mapper.split.size': 16})

You can also configure global SQL settings. After the configuration is complete, the related runtime parameters are added to each run.

from odps import options
options.sql.settings = {'odps.sql.mapper.split.size': 16}
# Hints are added based on the global configuration.
o.execute_sql('SELECT * FROM pyodps_iris')

Complete example

  1. Create a file named test-pyodps-local.py on your local machine.

  2. Write the sample code.

    import os
    from odps import ODPS
    
    o = ODPS(
        # (Recommended) Make sure that the environment variables are set.
        # Make sure that the ALIBABA_CLOUD_ACCESS_KEY_ID environment variable is set to your AccessKey ID.
        access_id=os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
        # Make sure that the ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variable is set to your AccessKey secret.
        secret_access_key=os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
        
        project='your-default-project',
        endpoint='your-end-point',
    )
    
    # Create a non-partitioned table named my_new_table by directly specifying the field names and types.
    table = o.create_table('my_new_table', 'num bigint, id string', if_not_exists=True)
    
    # Insert data into the non-partitioned table my_new_table.
    records = [[111, 'aaa'],
               [222, 'bbb'],
               [333, 'ccc'],
               [444, 'Chinese']]
    o.write_table(table, records)
    
    # Read data from the non-partitioned table my_new_table.
    for record in o.read_table(table):
        print(record[0], record[1])
    
    # Read data from the table by running an SQL statement.
    result = o.execute_sql('SELECT * FROM pyodps_iris LIMIT 3;', hints={'odps.sql.allow.fullscan': 'true'})
    
    # Drop the table to release resources.
    table.drop()
    
    print('Read data from the pyodps_iris table using open_reader:')
    
    # Read the SQL execution result.
    with result.open_reader() as reader:
        for record in reader:
            print(record[0], record[1])
    
  3. Run the Python code.

    python test-pyodps-local.py

    The output is:

    111 aaa
    222 bbb
    333 ccc
    444 Chinese
    Read data from the pyodps_iris table using open_reader:
    4.9 3.0
    4.7 3.2
    4.6 3.1