This topic shows you how to use PyODPS to read data from and write data to MaxCompute tables.

PyODPS

You can use PyODPS to read data from and write data to MaxCompute or Machine Learning Studio. PyODPS is the SDK for Python that Alibaba Cloud provides for MaxCompute. For more information, see the PyODPS documentation.

  1. Install PyODPS.
    In the Terminal interface of Data Science Workshop (DSW), run the following command:
    pip install pyodps
  2. Run the following command to check whether PyODPS is installed. If the command returns no error, the installation succeeded:
    python -c "from odps import ODPS"
  3. Execute SQL statements to read data from MaxCompute tables. The following sample code provides an example:
    import numpy as np
    import pandas as pd
    
    from odps import ODPS
    from odps.df import DataFrame
    # Establish a connection. 
    o = ODPS(
        '<your_AccessKey_ID>', 
        '<your_AccessKey_Secret>', 
        '<your_MaxCompute_project>',
        endpoint='<your_Project_Endpoint>')
        
    # Read data from the source MaxCompute table. 
    sql = '''
    SELECT  
        *
    FROM
        project.table
    LIMIT 100
    ;
    '''
    query_job = o.execute_sql(sql)
    result = query_job.open_reader(tunnel=True) 
    # n_process sets the number of worker processes that download the result.
    # A value greater than 1 reads the data in parallel. Choose it based on the server configuration.
    df = result.to_pandas(n_process=1)
    Replace the placeholders in the sample code with the values described in the following table.
    Parameter                    Description
    <your_AccessKey_ID>          The AccessKey ID of your Alibaba Cloud account.
    <your_AccessKey_Secret>      The AccessKey secret of your Alibaba Cloud account.
    <your_MaxCompute_project>    The name of the MaxCompute project.
    <your_Project_Endpoint>      The endpoint of the MaxCompute project. For more information, see Configure endpoints. For example, if your MaxCompute project is deployed in the China (Hangzhou) region, the endpoint is http://service.cn-hangzhou.maxcompute.aliyun.com/api.
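The sample code passes the four connection parameters as string literals. One possible alternative is to read them from environment variables so that the AccessKey pair does not end up in a notebook or script. The following is a minimal sketch under that assumption. The environment variable names and the helper functions load_odps_config and connect are hypothetical and are not part of PyODPS. Only the ODPS constructor and its access_id, secret_access_key, project, and endpoint arguments come from the SDK.

```python
import os

# Hypothetical environment variable names (an assumption of this sketch).
# Map each ODPS constructor argument to the variable that holds its value.
ENV_KEYS = {
    "access_id": "ALIBABA_CLOUD_ACCESS_KEY_ID",
    "secret_access_key": "ALIBABA_CLOUD_ACCESS_KEY_SECRET",
    "project": "MAXCOMPUTE_PROJECT",
    "endpoint": "MAXCOMPUTE_ENDPOINT",
}


def load_odps_config(env=None):
    """Collect the four connection parameters from environment variables."""
    env = os.environ if env is None else env
    # Fail early with a readable message if any variable is unset or empty.
    missing = [name for name in ENV_KEYS.values() if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {param: env[name] for param, name in ENV_KEYS.items()}


def connect(env=None):
    """Create the ODPS entry object from the collected parameters."""
    # Imported here so that load_odps_config can be used without PyODPS installed.
    from odps import ODPS

    return ODPS(**load_odps_config(env))
```

With the four variables exported in the DSW terminal, o = connect() could stand in for the explicit ODPS(...) call in the sample code.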
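The steps above cover reading data only. For writing, PyODPS provides a write_table method on the ODPS entry object. The following is a rough sketch of how it might be wrapped, assuming that the destination table already exists and that each record lists its values in the column order of the table. The helper name write_rows is hypothetical. The partition and create_partition arguments are passed through to write_table only when a partition is specified.

```python
def write_rows(o, table_name, rows, partition=None):
    """Upload rows to an existing MaxCompute table and return the row count.

    o is an ODPS entry object, such as the one created in the connection step.
    rows is an iterable of records, each a sequence of column values.
    """
    # Normalize every record to a list, the record form used with write_table.
    rows = [list(row) for row in rows]
    if not rows:
        return 0  # Nothing to upload.
    if partition is None:
        o.write_table(table_name, rows)
    else:
        # Create the partition if it does not exist yet.
        o.write_table(table_name, rows, partition=partition, create_partition=True)
    return len(rows)
```

For example, write_rows(o, 'my_table', [[1, 'a'], [2, 'b']]) would upload two records, where my_table is a placeholder for a two-column table in your project.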