
MaxCompute:Use PyODPS in a local environment

Last Updated: Mar 26, 2026

PyODPS is a Python SDK that lets you use MaxCompute for big data processing and analysis directly from your local machine. This topic shows you how to connect to MaxCompute, run SQL queries, use the DataFrame API, and set runtime parameters.

This topic covers local Python environments only. If you are using PyODPS in a DataWorks node, see the DataWorks documentation instead.

Prerequisites

Before you begin, ensure that you have:

- A MaxCompute project.
- An Alibaba Cloud AccessKey pair (AccessKey ID and AccessKey secret).
- Python and the PyODPS package installed. To install PyODPS, run pip install pyodps.
- A table named pyodps_iris in your project. The examples in this topic query this table.

Get started

  1. Open a Python editor and create a new .py file.

    If no Python editor is installed, create a file with a .py extension using any text editor.
  2. Write your PyODPS code. The following sections cover the main features with runnable examples.

  3. Save the file and run it with python <your-file>.py.

Initialize the ODPS entry point

The ODPS class is the entry point for all PyODPS operations. Initialize it with your credentials and project details before calling any other API.

Important

Set the ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables before running your code. Hardcoding credentials in source files is a security risk. If you cannot set environment variables, you can specify the AccessKey ID and AccessKey secret explicitly in your code (not recommended).
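Because os.getenv() returns None for an unset variable, a missing credential only surfaces later as a confusing authentication error. The following sketch is an illustrative fail-fast check (a hypothetical helper, not part of the PyODPS API) that makes the failure explicit up front:

```python
import os

# Both variables must be set for PyODPS to authenticate
REQUIRED_VARS = ('ALIBABA_CLOUD_ACCESS_KEY_ID',
                 'ALIBABA_CLOUD_ACCESS_KEY_SECRET')

def require_credentials():
    """Return (access_id, secret_access_key), raising early if either
    environment variable is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
    if missing:
        raise RuntimeError(
            'Missing environment variables: ' + ', '.join(missing))
    return tuple(os.getenv(name) for name in REQUIRED_VARS)
```

You can call require_credentials() before constructing the ODPS object so that a misconfigured environment fails immediately with a clear message.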

import os
from odps import ODPS

o = ODPS(
    # Read credentials from environment variables
    access_id=os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    secret_access_key=os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),

    # (Not recommended) To specify credentials explicitly instead:
    # access_id='your-aliyun-access-key-id',
    # secret_access_key='your-aliyun-access-key-secret',

    project='<your-default-project>',
    endpoint='<your-endpoint>',
)

Replace the following placeholders:

| Placeholder | Description | Example |
| --- | --- | --- |
| <your-default-project> | The name of your MaxCompute project. To get a list of projects, log on to the MaxCompute console. | my_project |
| <your-endpoint> | The endpoint for your region and network type. | https://service.cn-hangzhou.maxcompute.aliyun.com/api |
Important

Select the correct network type for your endpoint. An incorrect network type causes connection failures.
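Public-network endpoints follow a regular pattern, illustrated by the example above. The sketch below builds a public endpoint from a region ID; the pattern is an assumption based on that single example, and VPC or classic-network endpoints use different hostnames, so verify the URL for your region and network type against the endpoint reference:

```python
def public_endpoint(region_id):
    """Build the public-network MaxCompute endpoint for a region.

    Assumes the pattern from the example above (e.g. cn-hangzhou);
    VPC and classic-network endpoints differ -- check the endpoint
    reference before using this in production.
    """
    return f'https://service.{region_id}.maxcompute.aliyun.com/api'

# Example: the endpoint for the China (Hangzhou) region
print(public_endpoint('cn-hangzhou'))
```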

After initialization, the o object supports list, get, exist, create, and delete operations on MaxCompute resources. For a full reference, see Overview of basic operations.

Execute SQL

MaxCompute supports two SQL execution modes: traditional mode and MaxCompute Query Acceleration (MCQA). Both support Data Definition Language (DDL) and Data Manipulation Language (DML) statements. MCQA caches job results and returns cached results on repeated runs to speed up execution. For billing details, see Compute costs (pay-as-you-go).

execute_sql() and run_sql() do not support all SQL statement types. For non-DDL and non-DML statements, use the corresponding method directly — for example, use create_table() for CREATE TABLE statements, and use run_xflow() or execute_xflow() for PAI commands.

Use the traditional mode to execute SQL commands

Use execute_sql() to run a SQL statement and open_reader() to read the results:

# Create a table using the dedicated method
o.create_table('my_t', 'num bigint, id string', if_not_exists=True)

# Run a SELECT query
result = o.execute_sql('SELECT * FROM pyodps_iris LIMIT 3')
with result.open_reader() as reader:
    for record in reader:
        print(record)

For more SQL operations, see SQL.

DataFrame

PyODPS provides a DataFrame API for data processing. Unlike pandas, a PyODPS DataFrame is lazy — it only runs when you call an immediate execution method such as execute() or persist().

from odps.df import DataFrame

# Load the pyodps_iris table as a DataFrame
iris = DataFrame(o.get_table('pyodps_iris'))

# Filter rows and print results — .execute() triggers the actual computation
for record in iris[iris.sepalwidth < 3].execute():
    print(record)

By default, PyODPS does not print execution details such as the Logview URL. To enable verbose output:

from odps import options
options.verbose = True

For more DataFrame examples, see DataFrame.

Set runtime parameters (hints)

Pass a hints dictionary to any SQL call to set per-query runtime parameters:

o.execute_sql('SELECT * FROM pyodps_iris', hints={'odps.sql.mapper.split.size': 16})

To apply hints to every SQL call in the session, configure them globally:

from odps import options

# All subsequent execute_sql() calls include these hints automatically
options.sql.settings = {'odps.sql.mapper.split.size': 16}
o.execute_sql('SELECT * FROM pyodps_iris')
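When global settings and per-call hints are both present, per-call hints are expected to take precedence for conflicting keys. Conceptually this is a plain dictionary merge; the sketch below models that precedence rule with ordinary dicts (an illustration, not the PyODPS internals):

```python
def effective_hints(global_settings, call_hints):
    """Model of hint precedence: start from the global settings and
    let per-call hints override keys that appear in both."""
    merged = dict(global_settings or {})
    merged.update(call_hints or {})
    return merged

# Example: the global split size is kept, the per-call flag is added
print(effective_hints(
    {'odps.sql.mapper.split.size': 16},
    {'odps.sql.allow.fullscan': 'true'},
))
```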

Complete example

The following example covers the full workflow: connect to MaxCompute, create a table, write data, read it back, query the pyodps_iris table, and clean up.

Warning

This example calls table.drop() at the end, which permanently deletes the table my_new_table. Do not pass create_table() the name of an existing table that you want to keep.

  1. Create a file named test-pyodps-local.py.

  2. Add the following code:

    import os
    from odps import ODPS
    
    o = ODPS(
        access_id=os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
        secret_access_key=os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
        project='<your-default-project>',
        endpoint='<your-endpoint>',
    )
    
    # Create a non-partitioned table with two columns
    table = o.create_table('my_new_table', 'num bigint, id string', if_not_exists=True)
    
    # Write four records to the table
    records = [[111, 'aaa'],
               [222, 'bbb'],
               [333, 'ccc'],
               [444, 'Chinese']]
    o.write_table(table, records)
    
    # Read back the records
    for record in o.read_table(table):
        print(record[0], record[1])
    
    # Query the pyodps_iris table using SQL
    result = o.execute_sql('SELECT * FROM pyodps_iris LIMIT 3;', hints={'odps.sql.allow.fullscan': 'true'})
    
    # Drop the table to release resources
    table.drop()
    
    print('Read data from the pyodps_iris table using open_reader:')
    
    # Print query results
    with result.open_reader() as reader:
        for record in reader:
            print(record[0], record[1])
  3. Run the script:

    python test-pyodps-local.py

    The expected output is:

    111 aaa
    222 bbb
    333 ccc
    444 Chinese
    Read data from the pyodps_iris table using open_reader:
    4.9 3.0
    4.7 3.2
    4.6 3.1

What's next