This topic describes how to read data from and write data to MaxCompute tables by using PyODPS or dswmagic.

PyODPS

You can use PyODPS to read data from and write data to MaxCompute or Machine Learning Studio. PyODPS is the MaxCompute SDK for Python provided by Alibaba Cloud. For more information, see the PyODPS documentation.

  1. Install PyODPS.
    In the Terminal of Data Science Workshop (DSW), run the following command:
    pip install --user pyodps
  2. Read data from a MaxCompute table. The following example reads the first 10 rows of a table in a MaxCompute project.
    from odps import ODPS
    from odps.df import DataFrame
    # Create the MaxCompute entry object.
    o = ODPS('<your_AccessKey_ID>', '<your_AccessKey_Secret>',
             project='<your_MaxCompute_project>',
             endpoint='http://service-all.ext.odps.aliyun-inc.com/api')
    # Wrap the table in a PyODPS DataFrame and print the first 10 rows.
    users = DataFrame(o.get_table('<your_table_name>'))
    print(users.head(10))
    Set the following parameters based on your requirements.
    Parameter | Description
    <your_AccessKey_ID> | The AccessKey ID of your Alibaba Cloud account.
    <your_AccessKey_Secret> | The AccessKey secret of your Alibaba Cloud account.
    <your_MaxCompute_project> | The name of the MaxCompute project.
    endpoint | The MaxCompute endpoint. Use http://service-all.ext.odps.aliyun-inc.com/api for GPU M40 instances deployed in China (Shanghai) and subscription P100 instances deployed in China (Beijing). For other DSW instances, use http://service.cn.maxcompute.aliyun.com/api.
    <your_table_name> | The name of the MaxCompute table.
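
The example in step 2 only reads data. As a minimal sketch of the write path, the following helper creates a table if it does not exist and appends a few rows by using the PyODPS create_table and write_table methods. The helper name write_rows, the table name, and the column schema are assumptions made for illustration and do not come from the original example. o is the ODPS entry object created in step 2.

```python
def write_rows(o, table_name, schema, rows):
    """Create the table if needed and append rows to it.

    o          -- an odps.ODPS entry object, created as shown in step 2
    table_name -- hypothetical name of the target table
    schema     -- column definitions, for example 'id bigint, name string'
    rows       -- list of rows; each row is a list of values in column order
    """
    # if_not_exists=True makes this call a no-op when the table already exists.
    o.create_table(table_name, schema, if_not_exists=True)
    # write_table appends the records to the non-partitioned table.
    o.write_table(table_name, rows)
    return len(rows)

# Usage, assuming `o` is the entry object from step 2 (names are hypothetical):
# write_rows(o, 'dsw_demo_users', 'id bigint, name string',
#            [[1, 'alice'], [2, 'bob']])
```

The helper takes the entry object as an argument so that the same connection settings, including the endpoint, are reused for reads and writes.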

dswmagic

dswmagic is a built-in Notebook command in DSW. After you load the command, you can write SQL statements to read data from MaxCompute tables.

  1. Create a .ipynb file.
    1. In the top navigation bar of the Data Science Workshop page, choose File > New > Notebook.
    2. In the Select Kernel dialog box, select a kernel version and click SELECT.
  2. Load dswmagic.
    1. Enter the following load command:
      %load_ext dswmagic
    2. At the top of the page, click the Run icon to run the command.
  3. Set Cell to sql.
    Add a cell to the .ipynb file and select sql from the Cell list. You can then write SQL statements in the SQL editor.
  4. Configure the data source and endpoint.
    1. Click the Add icon on the right side of New DataSource.
    2. In the Config DataSource dialog box, set the following parameters.
      Parameter | Description
      AccessKey ID | The AccessKey ID of your Alibaba Cloud account.
      AccessKey Secret | The AccessKey secret of your Alibaba Cloud account.
      ProjectName | The name of the project where the MaxCompute table is stored.
      Endpoint | The MaxCompute endpoint:
      • P100 instances deployed in China (Beijing) and M40 instances deployed in China (Shanghai): http://service-all.ext.odps.aliyun-inc.com/api
      • Instances deployed in other regions: http://service.cn.maxcompute.aliyun.com/api
    3. Click Submit.
    4. From the New DataSource list, select the configured data source.
  5. Write and run SQL statements.
    1. Write SQL statements in the sql cell. For example:
      SELECT * FROM <your_project>.<your_table> LIMIT 100;
      Replace <your_project> with the name of the MaxCompute project and <your_table> with the name of a table in that project.
    2. At the top of the page, click the Run icon to run the SQL statements.
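
For comparison, the query from step 5 can also be run from a regular Python cell through PyODPS instead of a sql cell. The following is a minimal sketch, assuming an ODPS entry object o created as described in the PyODPS section. build_query and run_query are hypothetical helper names, and the identifier check is an added safeguard rather than something dswmagic requires.

```python
import re

# Assumption: project and table names consist of letters, digits, and underscores.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def build_query(project, table, limit=100):
    """Build the SELECT statement from step 5 for a given project and table."""
    for name in (project, table):
        if not _IDENTIFIER.match(name):
            raise ValueError("unexpected identifier: %r" % name)
    return "SELECT * FROM %s.%s LIMIT %d;" % (project, table, int(limit))

def run_query(o, sql):
    """Run the statement with PyODPS and return the result records as a list."""
    # execute_sql blocks until the MaxCompute job finishes.
    instance = o.execute_sql(sql)
    # open_reader yields one record per result row.
    with instance.open_reader() as reader:
        return [record for record in reader]

# Usage, assuming `o` is an ODPS entry object (names are placeholders):
# rows = run_query(o, build_query('my_project', 'my_table', limit=100))
```

Keeping the statement builder separate from the execution step makes it possible to inspect the generated SQL before it is sent to MaxCompute.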