This topic describes how to use PyODPS to read data from MaxCompute tables.

PyODPS

PyODPS is the MaxCompute SDK for Python, provided by Alibaba Cloud. You can use it to read data from MaxCompute or Machine Learning Studio.

  1. Install PyODPS.
    In the Terminal interface of Data Science Workshop (DSW), run the following command:
    pip install pyodps
  2. Run the following command to check whether PyODPS is installed. If the command exits without an ImportError, the installation succeeded.
    python -c "from odps import ODPS"
  3. Execute SQL statements to read data from MaxCompute tables.
    import numpy as np
    import pandas as pd
    
    from odps import ODPS
    from odps.df import DataFrame
    # Establish a connection. 
    o = ODPS(
        '<your_AccessKey_ID>', 
        '<your_AccessKey_Secret>', 
        '<your_MaxCompute_project>',
        endpoint='<your_Project_Endpoint>')
        
    # Read data from MaxCompute tables. 
    sql = '''
    SELECT  
        *
    FROM
        project.table
    LIMIT 100
    ;
    '''
    # Run the query and wait for the job to finish.
    query_job = o.execute_sql(sql)
    # Open a reader over the query result through the tunnel service.
    result = query_job.open_reader(tunnel=True)
    # Set n_process based on the server configuration. A value greater than 1
    # reads the data with multiple processes in parallel to accelerate reading.
    df = result.to_pandas(n_process=1)
    Replace the placeholders in the preceding code with your own values, as described in the following table.
    Parameter                    Description
    <your_AccessKey_ID>          The AccessKey ID of your Alibaba Cloud account.
    <your_AccessKey_Secret>      The AccessKey secret of your Alibaba Cloud account.
    <your_MaxCompute_project>    The name of the MaxCompute project.
    <your_Project_Endpoint>      The endpoint of the MaxCompute project. For more information, see Endpoints.
                                 For example, if your MaxCompute project is deployed in the China (Hangzhou)
                                 region, the endpoint is http://service.cn-hangzhou.maxcompute.aliyun.com/api.
    For more information about how to use PyODPS to perform other operations such as data writes on MaxCompute tables, see Tables.
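
The steps above can be wrapped in a few small helpers so that the AccessKey pair is not hard-coded in a notebook. The following is a minimal sketch, not an official pattern: the environment variable names (ODPS_ACCESS_ID, ODPS_ACCESS_KEY, ODPS_PROJECT, ODPS_ENDPOINT) and the function names load_odps_config, connect, and read_sql are assumptions made for illustration. Only the ODPS constructor, execute_sql, open_reader, and to_pandas calls are taken from the example in step 3.

```python
import os

# Hypothetical environment variable names, one per row of the parameter table.
# They are an assumption for this sketch; use whatever names your setup defines.
ENV_KEYS = {
    "access_id": "ODPS_ACCESS_ID",
    "secret_access_key": "ODPS_ACCESS_KEY",
    "project": "ODPS_PROJECT",
    "endpoint": "ODPS_ENDPOINT",
}


def load_odps_config(env=None):
    """Collect the four connection settings and fail early if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_KEYS.values() if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[name] for arg, name in ENV_KEYS.items()}


def connect(env=None):
    """Create the ODPS entry object from the environment (requires pyodps)."""
    # Imported lazily so that load_odps_config can be used without pyodps.
    from odps import ODPS

    cfg = load_odps_config(env)
    return ODPS(
        cfg["access_id"],
        cfg["secret_access_key"],
        cfg["project"],
        endpoint=cfg["endpoint"],
    )


def read_sql(o, sql, n_process=1):
    """Run a query and return its result as a pandas DataFrame."""
    query_job = o.execute_sql(sql)                # same call as in step 3
    result = query_job.open_reader(tunnel=True)   # read the result through the tunnel
    return result.to_pandas(n_process=n_process)  # n_process > 1 reads in parallel
```

With the four variables exported in the DSW terminal, a call such as df = read_sql(connect(), 'SELECT * FROM project.table LIMIT 100') would replace the inline connection and query code. Keeping the credentials in the environment also makes it easier to share the notebook without exposing the AccessKey secret.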