DataWorks: Query and analyze data with Notebooks

Last Updated: Nov 24, 2025

DataWorks Notebooks support multiple cell types and provide an interactive, modular environment for data processing, analysis, visualization, and model building.

Features

In DataWorks, you can use Notebook nodes to build an interactive, modular, and reusable analysis environment.

  • Multi-engine development: DataWorks Notebooks support SQL development and analysis for multiple big data engines.

  • Interactive analysis:

    • Interactive SQL queries: You can build widgets in Python to select or set parameter values, and then reference those values in SQL so that a query responds to the input chosen on the Python side.

    • Write SQL query results to a DataFrame: You can store SQL query results directly in a Pandas DataFrame or MaxFrame DataFrame object. These objects can be passed as variables to subsequent cells.

    • Generate visualizations: You can read the DataFrame variable in a Python cell and plot charts from the data, so SQL query results flow directly into Python visualizations.

  • Integrated big data and AI development: You can use libraries such as Pandas in DataWorks Notebooks to clean and prepare data so that it meets the input requirements of algorithm models. You can then use the cleaned data to develop, train, and evaluate models in the same Notebook.

  • Intelligent code generation: DataWorks Notebooks include a built-in programming assistant. You can use DataWorks Copilot to generate SQL and Python code and improve development efficiency.
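As a rough illustration of the data cleaning and preparation step mentioned above, the following sketch uses Pandas to remove duplicate rows, drop rows with a missing name, and coerce prices to numbers. The sample rows, the column names, and the `clean_products` helper are hypothetical and exist only to make the example runnable; they are not part of DataWorks.

```python
import pandas as pd

# Hypothetical raw data with typical quality problems:
# a duplicate row, a missing name, and a non-numeric price.
raw = pd.DataFrame({
    "product_id": [1, 2, 2, 3, 4],
    "product_name": ["DataWorks", "MaxCompute", "MaxCompute", None, "Hologres"],
    "price": ["10.50", "20.00", "20.00", "5.00", "n/a"],
})


def clean_products(df):
    """Return a cleaned copy of df (illustrative helper, not a DataWorks API)."""
    cleaned = df.drop_duplicates()
    cleaned = cleaned.dropna(subset=["product_name"])
    # Values that cannot be parsed as numbers become NaN and are dropped next.
    cleaned = cleaned.assign(price=pd.to_numeric(cleaned["price"], errors="coerce"))
    cleaned = cleaned.dropna(subset=["price"])
    return cleaned.reset_index(drop=True)


cleaned = clean_products(raw)
print(cleaned)
```

The function returns a new DataFrame rather than modifying its input, so the raw data stays available for comparison in later cells.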

Supported cell types

  • SQL cell:

    • Supported SQL types: MaxCompute SQL, Hologres SQL, EMR Spark SQL, StarRocks SQL, Flink SQL Batch, and Flink SQL Streaming.

    • Supported computing resources: MaxCompute, Hologres, EMR Serverless Spark, EMR Serverless StarRocks, and Fully Managed Flink.

  • Python cell.

  • Markdown cell.

Create a Notebook

  1. Log on to DataWorks DataAnalysis. Switch to the destination region and click DataAnalysis.

    1. If you see Go To The New DataAnalysis in the navigation bar, click it to switch to the new DataAnalysis page.

    2. If you see Back To The Old DataAnalysis in the navigation bar, you are already on the new DataAnalysis page.

  2. Hover over Personal Directory > My Files, click the icon that appears on the right, and then select New Notebook File.

    You can also click New Folder to create a custom folder structure for your Notebook files.

Prepare the Notebook environment

1. Create a personal development environment instance

Notebooks run on personal development environment instances. Before you start, create and switch to a target instance. You can use the personal development environment instance to install dependencies for Notebook node development, such as third-party Python libraries.

2. Select a personal development environment

At the top of the DataAnalysis page, select the personal development environment instance that the Notebook will run on.

3. (Optional) Switch the Python kernel

You can click the icon in the upper-right corner of the Notebook node to check the Python kernel version used by the current Python cell and switch to another version if needed.
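To confirm from inside a Python cell which interpreter version the selected kernel runs, and whether a dependency is installed in the instance, a minimal sketch that uses only the Python standard library might look like the following. The `describe_kernel` helper and the package list are illustrative names, not DataWorks APIs.

```python
import importlib.util
import sys


def describe_kernel(required_packages):
    """Return the interpreter version and a map of package name -> installed flag.

    Illustrative helper; pass top-level import names such as "pandas".
    """
    version = ".".join(str(part) for part in sys.version_info[:3])
    availability = {
        name: importlib.util.find_spec(name) is not None
        for name in required_packages
    }
    return version, availability


version, availability = describe_kernel(["json", "pandas"])
print(f"Python {version}")
for name, installed in availability.items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```

If a package is reported as missing, install it in the personal development environment instance before you run cells that import it.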

Develop a Notebook

1. Add a cell

In the Notebook node toolbar, you can click the SQL, Python, or Markdown button to quickly create the corresponding cell.

You can hover over the top or bottom edge of a cell and click the add button that appears to insert a new cell above or below the current one.

To reorder cells, you can hover over the blue line in front of a cell and drag it to a new position.

2. (Optional) Switch the cell type

In a cell, you can use the Cell type button in the lower-right corner to switch between cell types. For more information, see Supported cell types.

  • You can change a MaxCompute SQL cell to a Hologres SQL cell or another SQL cell type.

  • You can change a SQL cell to a Python or Markdown cell, or change a Python or Markdown cell to a SQL cell.

Note

When you switch the cell type, the content is retained. You must manually adjust the code to match the new cell type.

3. Develop cell code

You can edit SQL, Python, and Markdown code in the corresponding cells. You can use DataWorks Copilot Ask to assist with programming. You can access the intelligent assistant in the following ways:

  • From the cell toolbar: Click the icon in the upper-right corner of the cell to open the Copilot in-editor chat box and obtain programming assistance.

  • From the context menu: Right-click a cell and select Copilot > Editor Inline Chat to obtain programming assistance.

  • Using keyboard shortcuts:

    • macOS: Press Command+I to open the intelligent assistant chat box.

    • Windows: Press Ctrl+I to open the intelligent assistant chat box.

Python development

By default, Python cells use the kernel of the personal development environment instance to run code. To access a specific computing resource service, you can also establish a connection to the computing resource using a built-in magic command.

SQL development

When you develop in a SQL cell, ensure that the SQL syntax matches the selected cell type, which corresponds to the computing resource type. You can click the icon in the lower-right corner of the SQL cell to specify an attached computing resource. When you run the cell, the SQL statement runs on the specified computing resource.

Markdown development

You can use Markdown syntax to write and format text.

Run a Notebook

After you finish developing the cells in a Notebook, you can run all cells or run a single cell.

  • Run all cells: After you finish editing the Notebook, you can click the icon at the top to run all cells in the Notebook node.

  • Run a single cell: After you finish editing a cell, you can click the icon to the left of the cell to run it.

SQL cells

You can write different types of SQL scripts in a cell. After you run a SQL script, the results are displayed below the cell.

  • Scenario 1: If the SQL script does not contain a SELECT statement, only the run log is displayed by default.

    CREATE TABLE IF NOT EXISTS product (
        product_id BIGINT,
        product_name STRING,
        product_type STRING,
        price DECIMAL(10, 2)
    )
    LIFECYCLE 30; -- This sets the data lifecycle to 30 days. After 30 days, the data is automatically deleted. This setting is optional.

  • Scenario 2: If the SQL script contains a SELECT statement, the run log is displayed. The results can be viewed as a table or a visualization. The system also automatically generates a DataFrame variable from the query results.

    SELECT
        product_id,
        product_name,
        product_type,
        price
    FROM product;

    • Generate a DataFrame data object:

      The SQL cell automatically generates a return variable. You can click the df_* variable name in the lower-left corner of the SQL cell to rename the generated DataFrame variable.

    • View the SQL query table:

      After the SQL query runs, the results are displayed as a table in the log area by default.

    • View the SQL query visualization

      After the SQL query runs, you can click the icon on the left side of the log area to view a visualization of the table data generated by the query.
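Once a SQL cell has produced its DataFrame variable, a later Python cell can work with it like any other Pandas object. The sketch below assumes that the variable was renamed to `df_product` (a hypothetical name) and that it is a Pandas DataFrame with the columns selected in Scenario 2. A small stand-in with placeholder rows is built here so that the snippet runs on its own; in a real Notebook you would skip the stand-in and use the generated variable directly.

```python
import pandas as pd

# Stand-in for the DataFrame that the SQL cell would generate.
# The rows are placeholders, and price is assumed to arrive as a numeric type.
df_product = pd.DataFrame({
    "product_id": [1, 2, 3],
    "product_name": ["A", "B", "C"],
    "product_type": ["compute", "compute", "storage"],
    "price": [10.0, 30.0, 5.0],
})

# Average price per product type, highest first.
avg_price_by_type = (
    df_product.groupby("product_type")["price"]
    .mean()
    .sort_values(ascending=False)
)
print(avg_price_by_type)
```

If the engine returns DECIMAL columns as non-numeric objects, converting the column with `pd.to_numeric` before aggregating may be necessary.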

Python cells

You can write Python scripts in a cell. After you run a Python script, the results are displayed below the cell.

  • Scenario 1: Display only text output.

    print("Hello World")
  • Scenario 2: Display the contents of a Pandas DataFrame.

    import pandas as pd
    
    # Define product data, including details: product name, region, and login frequency.
    product_data = {
        'Product_Name': ['DataWorks', 'RDS MySQL', 'EMR Spark', 'MaxCompute'],
        'Product_Region': ['East China 2 (Shanghai)', 'North China 2 (Beijing)', 'South China 1 (Shenzhen)', 'Hong Kong'],
        'Login_Frequency': [33, 22, 11, 44]
    }
    
    # Create a DataFrame from the given data.
    df_products = pd.DataFrame(product_data)
    
    # Print the DataFrame to display the product information.
    print(df_products)

  • Scenario 3: Plot a chart.

    import matplotlib.pyplot as plt
    
    # Data
    categories = ['DataWorks', 'RDS MySQL', 'MaxCompute', 'EMR Spark', 'Hologres']
    values = [23, 45, 56, 78, 30]
    
    # Create a bar chart
    plt.figure(figsize=(10, 6))
    plt.bar(categories, values, color=['blue', 'green', 'red', 'purple', 'orange'])
    
    # Add a title and labels
    plt.title('Example Bar Chart')
    plt.xlabel('category')
    plt.ylabel('value')
    
    # Display the graph
    plt.show()
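Scenarios 2 and 3 can also be combined. The sketch below plots a chart directly from a DataFrame, which is the same pattern you would use with a DataFrame variable generated by a SQL cell. It reuses the sample data from Scenario 2; the `plot_login_frequency` helper is a hypothetical name introduced for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample data taken from Scenario 2.
df_products = pd.DataFrame({
    "Product_Name": ["DataWorks", "RDS MySQL", "EMR Spark", "MaxCompute"],
    "Login_Frequency": [33, 22, 11, 44],
})


def plot_login_frequency(df):
    """Draw a bar chart of login frequency per product, highest first."""
    ordered = df.sort_values("Login_Frequency", ascending=False)
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.bar(ordered["Product_Name"], ordered["Login_Frequency"])
    ax.set_title("Login frequency by product")
    ax.set_xlabel("Product")
    ax.set_ylabel("Login frequency")
    return fig, ax


fig, ax = plot_login_frequency(df_products)
plt.show()
```

Returning the figure and axes keeps the chart available for further adjustments, such as changing labels, in the same cell.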

Markdown cells

After you finish writing, you can click the icon to display the formatted Markdown text. For example, the following line renders as a top-level heading:

    # DataWorks Notebook

Note

In a Markdown cell that displays formatted text, you can click the icon to continue editing the cell.

References

For more information, see Scenarios and practices.