This topic shows you how to use PyODPS to read data from and write data to MaxCompute tables.
PyODPS is the SDK for Python provided by Alibaba Cloud. You can use it to read data from and write data to MaxCompute or Machine Learning Studio. For more information, see the PyODPS documentation.
- Install PyODPS.
In the Terminal interface of Data Science Workshop (DSW), run the following command:
pip install pyodps
- Run the following command to check whether PyODPS is installed. If the command exits without an error, the installation succeeded:
python -c "from odps import ODPS"
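The same check can be done from inside a notebook or script. The following is a minimal sketch, not part of the official procedure: `is_installed` is a hypothetical helper name, and it only tests whether the top-level `odps` module can be located, without importing it.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located on the import path."""
    return importlib.util.find_spec(module_name) is not None


# PyODPS is imported under the module name "odps".
print("PyODPS installed:", is_installed("odps"))
```

If the check prints `False`, rerun the `pip install pyodps` command in the same Python environment that the notebook kernel uses.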
- Execute SQL statements to read data from MaxCompute tables. The following sample code provides an example:
Replace the placeholders in the sample code with your own values. The parameters are described in the table that follows the code.
import numpy as np
import pandas as pd
from odps import ODPS
from odps.df import DataFrame

# Establish a connection.
o = ODPS(
    '<your_AccessKey_ID>',
    '<your_AccessKey_Secret>',
    '<your_MaxCompute_project>',
    endpoint='<your_Project_Endpoint>')

# Read data from the source MaxCompute table.
sql = '''
SELECT
    *
FROM
    project.table
LIMIT 100
;
'''
query_job = o.execute_sql(sql)
result = query_job.open_reader(tunnel=True)

# You can set the n_process parameter based on the server configuration.
# If n_process is set to a value greater than 1, multiple processes are used to accelerate data reading.
df = result.to_pandas(n_process=1)
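The sample code passes the AccessKey pair as string literals. As a hedged alternative, the sketch below reads the four connection values from environment variables so that they stay out of the notebook. The variable names (`ODPS_ACCESS_ID` and so on) and the helpers `load_odps_settings` and `connect` are assumptions made for illustration, not names defined by PyODPS. Only the keyword arguments `access_id`, `secret_access_key`, `project`, and `endpoint` belong to the `ODPS` constructor.

```python
import os

# Hypothetical environment variable names; adjust them to your own setup.
ENV_KEYS = {
    "access_id": "ODPS_ACCESS_ID",
    "secret_access_key": "ODPS_ACCESS_KEY",
    "project": "ODPS_PROJECT",
    "endpoint": "ODPS_ENDPOINT",
}


def load_odps_settings(environ=os.environ):
    """Collect the ODPS constructor arguments from environment variables."""
    missing = [name for name in ENV_KEYS.values() if not environ.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {arg: environ[name] for arg, name in ENV_KEYS.items()}


def connect(environ=os.environ):
    """Create an ODPS entry object from the collected settings."""
    # Imported lazily so that load_odps_settings works without PyODPS installed.
    from odps import ODPS
    return ODPS(**load_odps_settings(environ))
```

With the variables exported in the DSW terminal, `o = connect()` would stand in for the explicit `ODPS(...)` call in the sample code.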
Parameter | Description
<your_AccessKey_ID> | The AccessKey ID of your Alibaba Cloud account.
<your_AccessKey_Secret> | The AccessKey secret of your Alibaba Cloud account.
<your_MaxCompute_project> | The name of the MaxCompute project.
<your_Project_Endpoint> | The endpoint of the MaxCompute project. For more information, see Configure endpoints. For example, if your MaxCompute project is deployed in the China (Hangzhou) region, the endpoint is