Platform for AI: Use PyODPS to read data from and write data to MaxCompute tables

Last Updated: Jan 19, 2026

In Data Science Workshop (DSW) of Platform for AI (PAI), you can use PyODPS, the MaxCompute SDK for Python, to read data from MaxCompute tables.

Prerequisites

Before you perform the operations that are described in this topic, make sure that the following requirements are met:

  • MaxCompute is activated. For more information, see Activate MaxCompute and DataWorks.

  • The account that you use has permissions on the MaxCompute project. If you use an Alibaba Cloud account, no authorization is required. If you use a RAM user, perform the following steps to grant the permissions:

    Steps

    1. Log on to the MaxCompute console with the Alibaba Cloud account. Select the desired region in the upper left corner.

    2. In the left-side navigation pane, choose Workspace > Projects.

    3. On the Projects page, click Manage in the Actions column.

    4. On the Role Permissions tab, find role_project_dev. Click Manage Members and add your RAM user.

    For more information about permission management in MaxCompute, see Manage user permissions in the MaxCompute console.

  • Python 3.6 or later is installed.
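
The Python version requirement can be checked from within the interpreter that you plan to use. The following is a minimal sketch; the helper name python_is_supported and the constant MIN_PYTHON are illustrative and are not part of PyODPS.

```python
import sys

# Minimum version taken from the prerequisites in this topic.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


# Report the result for the interpreter that runs this snippet.
print("Python", sys.version.split()[0], "supported:", python_is_supported())
```

Run the snippet with the same interpreter that you use in DSW, because a machine can have several Python versions installed.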

Procedure

You can use PyODPS to read data from MaxCompute or Machine Learning Designer. For more information, see PyODPS documentation.

  1. Install PyODPS.

    In the DSW terminal, run the following command:

    pip install pyodps
  2. Run the following command to verify the installation. The installation is successful if the command prints no output and no error message.

    # For Windows, use python -c "from odps import ODPS"
    python3 -c "from odps import ODPS"
  3. If the Python version that you want to use is not the system default, run pip through that interpreter so that packages are installed for the required version:

    /home/tops/bin/python3.7 -m pip install "setuptools>=3.0"
    # /home/tops/bin/python3.7 is the path of the installed Python.
    # The quotation marks prevent the shell from treating > as an output redirection.
  4. Execute SQL statements to read data from MaxCompute tables.

    import numpy as np
    import pandas as pd
    import os
    
    from odps import ODPS
    from odps.df import DataFrame
    # Establish a connection. 
    o = ODPS(
        os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
        os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
        project='your-default-project',
        endpoint='your-end-point',
    )
    
    # Read data from MaxCompute tables. 
    sql = '''
    SELECT  
        *
    FROM
        your-default-project.<table>
    LIMIT 100
    ;
    '''
    # Run the statement and wait for it to finish.
    query_job = o.execute_sql(sql)
    # Open a reader on the result. tunnel=True reads the result through the tunnel service.
    result = query_job.open_reader(tunnel=True)
    # n_process sets the number of worker processes that read the data.
    # Configure it based on the server configuration. A value greater than 1 reads data in parallel.
    df = result.to_pandas(n_process=1)

    Parameters in the preceding code:

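    • ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET: The environment variables that store the AccessKey ID and AccessKey Secret of your Alibaba Cloud account. We recommend that you use environment variables instead of hard-coding the AccessKey pair to prevent leakage.

    • project: The name of your MaxCompute project. Replace your-default-project with the actual project name.

    • endpoint: The endpoint of MaxCompute in your region. Replace your-end-point with the actual endpoint.

    • <table>: The name of the table that you want to query.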

    For information about how to use PyODPS to perform other operations, such as writing data to MaxCompute tables, see Tables.
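
As a complement to the read example, the following sketch shows one way to fail early when the AccessKey environment variables are not set and then write a few rows to a table. It assumes the create_table and write_table methods of the ODPS entry object. The helper names load_credentials, write_rows, and main, the table name pyodps_write_demo, and the column definition are hypothetical and only illustrate the flow. See Tables for the authoritative description of write operations.

```python
import os

# Environment variable names used in the read example in this topic.
CREDENTIAL_VARS = ("ALIBABA_CLOUD_ACCESS_KEY_ID", "ALIBABA_CLOUD_ACCESS_KEY_SECRET")


def load_credentials(env=os.environ):
    """Return (AccessKey ID, AccessKey Secret), or raise if either is unset."""
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)


def write_rows(o, table_name, rows, columns="id bigint, name string"):
    """Create the table if it does not exist, then append the rows.

    `o` is an ODPS entry object. The column definition is a hypothetical example.
    """
    o.create_table(table_name, columns, if_not_exists=True)
    o.write_table(table_name, rows)
    return len(rows)


def main():
    # Imported here so that the helpers above can be tested without PyODPS installed.
    from odps import ODPS

    access_id, access_key = load_credentials()
    o = ODPS(
        access_id,
        access_key,
        project="your-default-project",  # placeholder, as in the read example
        endpoint="your-end-point",       # placeholder, as in the read example
    )
    # pyodps_write_demo is a hypothetical table name.
    write_rows(o, "pyodps_write_demo", [[1, "alice"], [2, "bob"]])
```

To try the sketch in DSW, replace the placeholders and call main(). The account needs permissions to create tables and write data in the project.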

References

DSW provides the SQL File feature to help you quickly query data from MaxCompute data sources by using SQL statements. For more information, see Use SQL files to query MaxCompute tables.