
AnalyticDB:Use Notebook to develop Spark SQL jobs

Last Updated: Mar 28, 2026

Notebook is an interactive development environment built into Data Management (DMS) that lets you write, run, and iterate on Spark SQL jobs directly in your browser — without leaving the AnalyticDB for MySQL ecosystem. Use it to explore data, build pipelines, and visualize results in a single interface connected to your cluster.

Note

Notebook is available only in the China (Hangzhou) region.

Prerequisites

Before you begin, ensure that all prerequisites for this feature are met.

Set up the environment

Step 1: Create a workspace

  1. Log on to the DMS console V5.0.

  2. In the upper-left corner, hover over the menu icon and choose All Features > Data+AI > Notebook.

    Note

    In normal mode, choose Data+AI > Notebook in the top navigation bar.

  3. Click Create Workspace. In the dialog box, set the Workspace Name and Region, then click OK.

  4. In the Actions column of the workspace, click Go to Workspace.

Step 2: (Optional) Add workspace members

If multiple users share the workspace, add members and assign roles before proceeding.

Step 3: Configure code storage

  1. In the workspace, click Storage Management.

  2. In the Code Storage section, configure the OSS path where notebook code will be stored.

Step 4: Add a compute resource

  1. In the workspace, click Resource Configuration.

  2. Click Add Resource and configure the following parameters.

| Parameter | Required | Description |
| --- | --- | --- |
| Resource Name | Yes | A custom name for this resource. |
| Resource Introduction | Yes | A description of the resource's purpose. |
| Image | Yes | Only Spark3.5+Python3.9 is supported. |
| AnalyticDB Instance | Yes | The ID of the AnalyticDB for MySQL cluster. If the cluster is not listed, register it with DMS first. |
| AnalyticDB Resource Group | Yes | The name of the job resource group. |
| Executor Spec | Yes | The resource specification for Spark executors. This example uses the default medium specification. For all available specifications, see Spark application configuration parameters. |
| Max Executors / Min Executors | Yes | The number of Spark executors. After you select Spark3.5+Python3.9, Min Executors is automatically set to 2 and Max Executors to 8. |
| Notebook Spec | Yes | The Notebook instance specification. This example uses General_Tiny_v1 (1 core, 4 GB). |
| VPC ID | Yes | The virtual private cloud (VPC) of the AnalyticDB for MySQL cluster, required for Notebook-to-cluster communication. |
| Zone ID | Yes | The zone of the AnalyticDB for MySQL cluster. |
| vSwitch ID | Yes | The vSwitch connected to the AnalyticDB for MySQL cluster. |
| Security Group ID | Yes | A security group that allows Notebook to communicate with the cluster. |
| Release Resource | Yes | The idle period after which the resource is automatically released. |
| Dependent Jars | No | The OSS path of the JAR package. Required only when submitting Python jobs that use a JAR package. |
| SparkConf | No | Spark configuration parameters in key: value format. For AnalyticDB-specific parameters, see Spark application configuration parameters. |
Important

If you change the VPC and vSwitch of the AnalyticDB for MySQL cluster, update VPC ID and vSwitch ID in this resource to match. Otherwise, job submissions will fail.

  3. Click Save.

  4. In the Actions column of the resource, click Start.
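The SparkConf field accepts one key: value pair per line. A minimal illustrative sketch, assuming the default medium specification from this example (the resourceSpec values are placeholders; see Spark application configuration parameters for the authoritative list):

```
spark.sql.shuffle.partitions: 200
spark.driver.resourceSpec: medium
spark.executor.resourceSpec: medium
```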

Step 5: Register an OSS instance

This step connects your OSS bucket to the workspace so that Notebook can read and write data lake files.

  1. Hover over the menu icon and choose All Features > Data Assets > Instances.

  2. Click +New. In the Add Instance dialog box, configure the following parameters.

| Section | Parameter | Value |
| --- | --- | --- |
| Data Source | Data source | On the Alibaba Cloud tab, select OSS. |
| Basic Information | File and log storage | Automatically set to OSS. |
| | Instance region | The region of the AnalyticDB for MySQL cluster. |
| | Connection method | Automatically set to Connection String Address. |
| | Connection string address | oss-cn-hangzhou.aliyuncs.com |
| | Bucket | The name of your OSS bucket. |
| | Access mode | This example uses Security Hosting - Manual. |
| | AccessKey ID | The AccessKey ID of an Alibaba Cloud account or RAM user with OSS access. See Accounts and permissions. |
| | AccessKey Secret | The AccessKey secret of the same account. See Accounts and permissions. |
| Advanced Information | (Optional) | See Register an Alibaba Cloud database instance for details. |
  3. Click Test Connection in the lower-left corner.

    Note

    If the connection test fails, check the instance information based on the error message.

  4. After the Successful connection message appears, click Submit.

  5. Go back to the workspace and open the Data tab.

  6. On the Data Lake Data tab, click Add OSS and select the bucket you registered.
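Once the bucket is registered, notebook code refers to it by oss:// URI. A small sketch of how the paths used later in this topic compose (testBucketName is the placeholder bucket name from the example, not a real bucket):

```python
# Build the oss:// URIs that later steps use for the database and table locations.
# "testBucketName" is a placeholder bucket name, not a real bucket.
bucket = "testBucketName"

db_location = f"oss://{bucket}/db_delta/"       # database LOCATION
table_location = f"{db_location}sample_data/"   # where the Delta table's files land

print(db_location)     # oss://testBucketName/db_delta/
print(table_location)  # oss://testBucketName/db_delta/sample_data/
```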

Develop a Spark SQL job

Step 6: Create a notebook

In the workspace, click the new file icon and select Notebook.

Step 7: Write and run Spark SQL

Each notebook cell runs either Python code or SQL. Switch the cell type using the language selector in the cell toolbar.

Note

For a full reference of Notebook buttons and keyboard shortcuts, see the Use Notebook to query and analyze data topic.

  1. In a Code cell, install the Delta Lake Python package:

    pip install delta-spark
  2. Switch the cell type to SQL and create a database with an OSS storage path:

    Note

    The db_delta database and the sample_data external table created in the next step automatically appear in the AnalyticDB for MySQL cluster. You can query the sample_data table directly from the AnalyticDB for MySQL console.

    CREATE DATABASE db_delta
    LOCATION 'oss://testBucketName/db_delta/';    -- Specify the storage path for data in the db_delta database.

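    To confirm that the database landed at the intended OSS path, a follow-up SQL cell such as the following can be run (DESCRIBE DATABASE is standard Spark SQL):

```sql
-- List databases, then show db_delta's properties, including its OSS location.
SHOW DATABASES;
DESCRIBE DATABASE EXTENDED db_delta;
```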

  3. Switch the cell type to Code and run the following script to create the sample_data external table and load sample data. The data is stored in the OSS path specified by the db_delta database location.

    # -*- coding: utf-8 -*-
    
    import pyspark
    from delta import *
    from pyspark.sql.types import *
    from pyspark.sql.functions import *
    
    
    print("Starting Delta table creation")
    
    data = [
        ("Robert", "Baratheon", "Baratheon", "Storms End", 48),
        ("Eddard", "Stark", "Stark", "Winterfell", 46),
        ("Jamie", "Lannister", "Lannister", "Casterly Rock", 29),
        ("Robert", "Baratheon", "Baratheon", "Storms End", 48),
        ("Eddard", "Stark", "Stark", "Winterfell", 46),
        ("Jamie", "Lannister", "Lannister", "Casterly Rock", 29),
        ("Robert", "Baratheon", "Baratheon", "Storms End", 48),
        ("Eddard", "Stark", "Stark", "Winterfell", 46),
        ("Jamie", "Lannister", "Lannister", "Casterly Rock", 29)
    ]
    
    schema = StructType([
        StructField("firstname", StringType(), True),
        StructField("lastname", StringType(), True),
        StructField("house", StringType(), True),
        StructField("location", StringType(), True),
        StructField("age", IntegerType(), True)
    ])
    
    sample_dataframe = spark.createDataFrame(data=data, schema=schema)
    
    sample_dataframe.write.format('delta').mode("overwrite").option('mergeSchema','true').saveAsTable("db_delta.sample_data")
  4. Switch the cell type to SQL and query the table to verify the data was loaded:

    SELECT * FROM db_delta.sample_data;
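The nine inserted rows cover only three distinct people, so the query above returns each person three times. A pure-Python sketch (no Spark needed) of what a SELECT DISTINCT over the same rows would leave:

```python
# The same sample rows loaded into db_delta.sample_data: three people, triplicated.
people = [
    ("Robert", "Baratheon", "Baratheon", "Storms End", 48),
    ("Eddard", "Stark", "Stark", "Winterfell", 46),
    ("Jamie", "Lannister", "Lannister", "Casterly Rock", 29),
] * 3

# set() keeps one copy of each unique row, mirroring SELECT DISTINCT semantics.
distinct_people = sorted(set(people))

print(len(people), len(distinct_people))  # 9 3
```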

Analyze data from the AnalyticDB for MySQL console

The sample_data external table is automatically visible in the AnalyticDB for MySQL cluster. To run Spark SQL queries against it from the console:

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters, select an edition tab, find the cluster, and click the cluster ID.

  2. In the left-side navigation pane, choose Job Development > SQL Development. Select the Spark engine and an interactive resource group.

  3. Query the table:

    SELECT * FROM db_delta.sample_data LIMIT 1000;
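Beyond a bare SELECT, the same editor can run aggregations over the external table. An illustrative follow-up query grouping the sample rows by house:

```sql
-- Member count and average age per house in the sample data (illustrative).
SELECT house,
       COUNT(*) AS members,
       AVG(age) AS avg_age
FROM db_delta.sample_data
GROUP BY house
ORDER BY house;
```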

What's next

  • Notebook: provides an overview of Notebook features and usage.