Alibaba Cloud continues to improve Data Science Workshop (DSW) for algorithm development and model training, and has added features for big data development. As a result, DSW provides an all-in-one, AI-aided, interactive development environment. DSW allows you to streamline machine learning workflows, including data ingestion, data exploration and analytics, algorithm development, model training, and model deployment.

Features

You can write SQL statements in the DSW development environment. The built-in SQL editor supports syntax highlighting, input prompts, and automatic completion of SQL statements. After you configure the data source, you can run SQL statements to read data from MaxCompute tables in all projects. Then, you can visualize the data in charts.

Use dswmagic for big data development

dswmagic is a built-in Notebook command in DSW. After you load the command, you can write SQL statements to read data from MaxCompute tables. The command also allows you to use other features of DSW for big data development.

  1. Create a .ipynb file.
    1. In the top navigation pane on the Data Science Workshop page, choose File > New > Notebook.
    2. In the Select Kernel dialog box, select a kernel version and click SELECT.
  2. Load dswmagic.
    1. Enter the load command:
      %load_ext dswmagic
    2. At the top of the page, click the Run icon to run the command.
  3. Set Cell to sql.
    Add a cell to the .ipynb file and select sql from the Cell list. Then, you can use the SQL editor to write SQL statements in the cell.
  4. Configure the data source and endpoint.
    1. Click the Add icon on the right side of New DataSource.
    2. In the Config DataSource dialog box, set the following parameters:
      • AccessKey ID: the AccessKey ID of your Alibaba Cloud account.
      • AccessKey Secret: the AccessKey secret of your Alibaba Cloud account.
      • ProjectName: the name of the project where the MaxCompute table is stored.
      • Endpoint: the MaxCompute endpoint. The endpoint depends on the region and the resource type, as listed in the following table.
        Region                 Resource type                   Endpoint
        China (Beijing)        P100 GPU                        http://service-all.ext.odps.aliyun-inc.com/api
        China (Beijing)        Resources other than P100 GPUs  http://service.cn.maxcompute.aliyun.com/api
        China (Shanghai)       M40 GPU                         http://service-all.ext.odps.aliyun-inc.com/api
        China (Shanghai)       Resources other than M40 GPUs   http://service.cn.maxcompute.aliyun.com/api
        China (Shenzhen)       No limits                       http://service.cn.maxcompute.aliyun.com/api
        China (Hangzhou)       No limits                       http://service.cn.maxcompute.aliyun.com/api
        Singapore (Singapore)  No limits                       http://service.ap-southeast-1.maxcompute.aliyun.com/api
        India (Mumbai)         No limits                       http://service.ap-south-1.maxcompute.aliyun.com/api
    3. Click Submit.
    4. From the New DataSource list, select the configured data source.
  5. Write and run SQL statements.
    1. The following example shows how to write SQL statements:
       SELECT * FROM <your_project>.<your_table> LIMIT 100;
      <your_project> is the name of the MaxCompute project, and <your_table> is the name of a table in that project. Replace both placeholders with your own values.
    2. At the top of the page, click the Run icon to run the SQL statements.
      The built-in SQL editor can run more than one SQL statement at a time. Separate the statements with semicolons (;). The results are displayed in rows and can also be shown as Excel tables, histograms, pie charts, curve charts, line charts, and scatter charts. Click the Set icon to set the X axis and Y axis of a chart. Click the Edit icon to edit a chart by using WebExcel.
      The generated results are saved in the df0 variable. df0.values uses the standard pandas DataFrame format, so the output data can be edited by using WebExcel and displayed in charts.
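
The endpoint table in step 4 can be read as a simple lookup from region and resource type to an endpoint. The following Python sketch is illustrative only. The function pick_endpoint and the dictionary names are hypothetical helpers and are not part of DSW or dswmagic. Only the regions, resource types, and endpoint URLs are taken from the table.

```python
# Hypothetical helper for choosing the Endpoint value of a DSW data source.
# The names below are illustrative; only the URLs and regions come from the table.

CN_ENDPOINT = "http://service.cn.maxcompute.aliyun.com/api"
SPECIAL_GPU_ENDPOINT = "http://service-all.ext.odps.aliyun-inc.com/api"

# Regions in which one specific GPU type uses a different endpoint.
SPECIAL_GPU_BY_REGION = {
    "China (Beijing)": "P100",
    "China (Shanghai)": "M40",
}

# Endpoint used by all other resources in each region.
REGION_ENDPOINTS = {
    "China (Beijing)": CN_ENDPOINT,
    "China (Shanghai)": CN_ENDPOINT,
    "China (Shenzhen)": CN_ENDPOINT,
    "China (Hangzhou)": CN_ENDPOINT,
    "Singapore (Singapore)": "http://service.ap-southeast-1.maxcompute.aliyun.com/api",
    "India (Mumbai)": "http://service.ap-south-1.maxcompute.aliyun.com/api",
}


def pick_endpoint(region, gpu_type=None):
    """Return the endpoint listed in the table for a region and optional GPU type."""
    if region not in REGION_ENDPOINTS:
        raise ValueError(f"Region not listed in the table: {region!r}")
    # A GPU type only changes the endpoint where the table lists it explicitly.
    if gpu_type is not None and SPECIAL_GPU_BY_REGION.get(region) == gpu_type:
        return SPECIAL_GPU_ENDPOINT
    return REGION_ENDPOINTS[region]


if __name__ == "__main__":
    print(pick_endpoint("China (Beijing)", "P100"))
    print(pick_endpoint("China (Hangzhou)"))
    print(pick_endpoint("Singapore (Singapore)"))
```

You can enter the returned value in the Endpoint field of the Config DataSource dialog box. A region that is not in the table raises an error instead of returning a guessed endpoint.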