This topic describes how to use PyODPS to read data from MaxCompute tables.
PyODPS is the MaxCompute SDK for Python provided by Alibaba Cloud. You can use it to read data from MaxCompute or Machine Learning Studio.
- Install PyODPS. In the Terminal interface of Data Science Workshop (DSW), run the following command:
pip install pyodps
- Run the following command to check whether PyODPS is installed. If the command returns no error, the installation succeeded:
python -c "from odps import ODPS"
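The same check can also be done from inside a Python session or notebook cell. The following is a minimal sketch that uses only the standard library. The helper name `is_installed` is made up for this illustration and is not part of PyODPS.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# PyODPS is imported under the top-level package name "odps".
print("PyODPS installed:", is_installed("odps"))
```

If the printed value is False, run the pip command from the first step again in the same Python environment.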
- Execute SQL statements to read data from MaxCompute tables.
Set the parameters that are described in the following table based on your requirements.
import numpy as np
import pandas as pd
from odps import ODPS
from odps.df import DataFrame

# Establish a connection.
o = ODPS(
    '<your_AccessKey_ID>',
    '<your_AccessKey_Secret>',
    '<your_MaxCompute_project>',
    endpoint='<your_Project_Endpoint>')

# Read data from MaxCompute tables.
sql = '''
SELECT
    *
FROM
    project.table
LIMIT 100
;
'''
query_job = o.execute_sql(sql)
result = query_job.open_reader(tunnel=True)
# You can set the n_process parameter based on the server configuration.
# If n_process is set to a value greater than 1, multiple processes are used to accelerate data reading.
df = result.to_pandas(n_process=1)
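Hardcoding the AccessKey pair in a notebook makes it easy to leak. One possible alternative, sketched here under stated assumptions, is to read the four connection parameters from environment variables. The variable names (`ODPS_ACCESS_ID` and so on) and the helpers `load_odps_config` and `connect` are hypothetical and chosen only for this example. The `ODPS(...)` call takes the same arguments as the sample above.

```python
import os

# Hypothetical environment variable names (an assumption of this sketch);
# use whatever names your environment actually defines.
ENV_KEYS = {
    "access_id": "ODPS_ACCESS_ID",
    "access_key": "ODPS_ACCESS_KEY",
    "project": "ODPS_PROJECT",
    "endpoint": "ODPS_ENDPOINT",
}


def load_odps_config(env=None):
    """Collect the four connection parameters, failing early if any is unset."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_KEYS.values() if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in ENV_KEYS.items()}


def connect(env=None):
    """Build an ODPS entry object with the same arguments as the sample above."""
    # Imported lazily so that load_odps_config works even without PyODPS.
    from odps import ODPS

    cfg = load_odps_config(env)
    return ODPS(
        cfg["access_id"],
        cfg["access_key"],
        cfg["project"],
        endpoint=cfg["endpoint"],
    )
```

After the variables are set in the DSW terminal, `o = connect()` can replace the hardcoded `ODPS(...)` call, and the rest of the sample stays unchanged.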
Parameter | Description
--- | ---
<your_AccessKey_ID> | The AccessKey ID of your Alibaba Cloud account.
<your_AccessKey_Secret> | The AccessKey secret of your Alibaba Cloud account.
<your_MaxCompute_project> | The name of the MaxCompute project.
<your_Project_Endpoint> | The endpoint of the MaxCompute project. For more information, see Endpoints. For example, if your MaxCompute project is deployed in the China (Hangzhou) region, the endpoint is