Install PyODPS in DSW instances to read data from and write data to MaxCompute tables using SQL.
Prerequisites
Ensure the following requirements are met:
MaxCompute is activated. For more information, see Activate MaxCompute.
Your account has the required permissions on the MaxCompute project. Alibaba Cloud accounts need no extra authorization. If you use a RAM user, grant it the required permissions first.
Python 3.6 or later is installed before you install PyODPS.
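To confirm that the interpreter in the DSW instance meets this requirement, a small check such as the following can help. It is a sketch only: `python_is_supported` is a hypothetical helper, not part of PyODPS. It also prints `sys.executable` so you can see which interpreter is active when several Python versions are installed.

```python
import sys

# Minimum version stated in the prerequisites.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if (major, minor) of the given version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    # Show which interpreter is running this check and its version.
    print(sys.executable, sys.version.split()[0])
    print("supported" if python_is_supported() else "upgrade to Python 3.6 or later")
```

Run it with the same interpreter you plan to install PyODPS into, for example `python3 check_python.py` (the file name is arbitrary).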
Procedure
Use PyODPS to interact with data in MaxCompute or Machine Learning Designer. For more information, see the PyODPS documentation.
Install PyODPS.
In the DSW terminal, run the following command:
```shell
pip install pyodps
```
Run the following command to verify the installation. If the command produces no output and no error, the installation is successful.
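```shell
# On Windows, run: python -c "from odps import ODPS"
python3 -c "from odps import ODPS"
```
If the instance has several Python versions and you want to use a non-default one, run pip through that specific interpreter. For example, the following command installs setuptools 3.0 or later for the interpreter at /home/tops/bin/python3.7: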
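```shell
# /home/tops/bin/python3.7 is an example installation path. Replace it with your own.
# The requirement is quoted so that the shell does not treat ">" as output redirection.
/home/tops/bin/python3.7 -m pip install "setuptools>=3.0"
```
Use SQL to read data from MaxCompute.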
```python
import os

from odps import ODPS

# Establish a connection. The AccessKey pair is read from environment variables.
o = ODPS(
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    project='your-default-project',
    endpoint='your-end-point',
)

# Read up to 100 rows from a MaxCompute table.
sql = '''
SELECT * FROM your-default-project.<table>
LIMIT 100;
'''
instance = o.execute_sql(sql)  # Blocks until the query finishes.

# Read the query result through the Tunnel service.
with instance.open_reader(tunnel=True) as reader:
    # Set n_process > 1 to read with multiple processes for faster downloads.
    df = reader.to_pandas(n_process=1)  # Requires pandas to be installed.
```
Configuration parameters:
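ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET: the environment variables that hold your AccessKey ID and AccessKey secret. Reading the AccessKey pair from environment variables, instead of hard-coding it in the script, helps prevent credential leakage.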
Obtain an AccessKey pair. See Create an AccessKey pair.
Configure environment variables. See Configure environment variables.
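your-default-project and your-end-point: Replace them with the name of your default MaxCompute project and its endpoint. For regional endpoints, see Endpoints. Also replace <table> in the SQL statement with the name of the table to query.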
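If either environment variable is missing, `os.getenv` returns None and the error only surfaces later, when the first request fails. A minimal sketch of a fail-fast loader follows; `load_access_key` is a hypothetical helper, not part of PyODPS, and it only reads the two variable names used in the example.

```python
import os
from typing import Tuple

# Environment variable names used in the connection example.
ACCESS_KEY_ID_VAR = "ALIBABA_CLOUD_ACCESS_KEY_ID"
ACCESS_KEY_SECRET_VAR = "ALIBABA_CLOUD_ACCESS_KEY_SECRET"


def load_access_key() -> Tuple[str, str]:
    """Return (AccessKey ID, AccessKey secret) from the environment.

    Raises RuntimeError naming every variable that is unset or empty,
    so ODPS(...) never receives None as a credential.
    """
    missing = [name for name in (ACCESS_KEY_ID_VAR, ACCESS_KEY_SECRET_VAR)
               if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Set these environment variables first: " + ", ".join(missing))
    return os.environ[ACCESS_KEY_ID_VAR], os.environ[ACCESS_KEY_SECRET_VAR]


# Hypothetical usage with PyODPS:
# from odps import ODPS
# access_id, secret = load_access_key()
# o = ODPS(access_id, secret, project="your-default-project", endpoint="your-end-point")
```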
For other PyODPS operations on MaxCompute tables, such as writing data, see Tables.
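As a rough sketch of preparing data for a write: PyODPS `write_table` accepts records as lists of column values, in table column order. The hypothetical helper `to_records` below (not part of PyODPS) converts dict rows into such lists. The table name `my_table` and its columns in the usage comment are placeholders; check the Tables documentation for partitions and other write options.

```python
from typing import Any, Dict, List, Sequence


def to_records(rows: Sequence[Dict[str, Any]], columns: Sequence[str]) -> List[List[Any]]:
    """Convert dict rows into lists ordered by `columns`.

    Keys missing from a row become None (NULL). Keys that are not in
    `columns` raise KeyError, so column-name typos are caught before upload.
    """
    allowed = set(columns)
    records = []
    for row in rows:
        unknown = set(row) - allowed
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        records.append([row.get(col) for col in columns])
    return records


# Hypothetical usage, assuming an ODPS entry object `o` and an existing
# table my_table(id BIGINT, name STRING):
# rows = [{"id": 1, "name": "a"}, {"id": 2}]
# o.write_table("my_table", to_records(rows, ["id", "name"]))
```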