DataWorks Notebooks support multiple cell types and provide an interactive, modular environment for data processing, analysis, visualization, and model building.
Features
In DataWorks, you can use Notebook nodes to build an interactive, modular, and reusable analysis environment.
Multi-engine development: DataWorks Notebooks support SQL development and analysis for multiple big data engines.
Interactive analysis:
Interactive SQL queries: You can write widgets in Python to select or set parameter values. You can then reference these parameters and values in SQL to perform interactive queries between Python and SQL.
Write SQL query results to a DataFrame: You can store SQL query results directly in a Pandas DataFrame or MaxFrame DataFrame object. These objects can be passed as variables to subsequent cells.
Generate visualizations: You can read the DataFrame variable in a Python cell to plot charts based on the data. This enables efficient interaction between Python and SQL.
Integrated big data and AI development: You can use libraries such as Pandas in DataWorks Notebooks for data cleaning and preparation. This ensures your data meets the input requirements for algorithm models. You can then use the cleaned data to develop, train, and evaluate models, seamlessly integrating big data and AI.
Intelligent code generation: DataWorks Notebooks include a built-in programming assistant. You can use DataWorks Copilot to generate SQL and Python code and improve development efficiency.
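The parameter-then-query flow described under Interactive analysis can be sketched outside DataWorks as follows. This is only an illustration: Python's built-in sqlite3 module stands in for a Notebook SQL cell, and the table contents, the selected_type variable, and the query_products helper are hypothetical. In a DataWorks Notebook, the value would come from a widget and the SQL would run in a SQL cell on the attached computing resource.

```python
import sqlite3

# Stand-in for a big data engine: an in-memory SQLite database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (product_id INTEGER, product_name TEXT, "
    "product_type TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?, ?)",
    [
        (1, "Keyboard", "hardware", 49.90),
        (2, "Mouse", "hardware", 19.90),
        (3, "Editor license", "software", 99.00),
    ],
)

# Stand-in for a value picked with a widget in a Python cell (hypothetical).
selected_type = "hardware"


def query_products(connection, product_type):
    """Return rows whose product_type matches the selected parameter."""
    cursor = connection.execute(
        "SELECT product_id, product_name, price FROM product "
        "WHERE product_type = ? ORDER BY product_id",
        (product_type,),
    )
    return cursor.fetchall()


# The result is an ordinary Python object that later code can reuse.
rows = query_products(conn, selected_type)
for row in rows:
    print(row)
```

Changing selected_type and running the query again returns a different result set, which is the kind of interaction the widgets are meant to provide.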
Supported cell types
SQL cell:
Supported cell types: MaxCompute SQL, Hologres SQL, EMR SPARK SQL, StarRocks SQL, Flink SQL Batch, and Flink SQL Streaming.
Supported computing resources: MaxCompute, Hologres, EMR Serverless Spark, EMR Serverless StarRocks, and Fully Managed Flink.
Python cell.
Markdown cell.
Create a Notebook
Log on to DataWorks DataAnalysis. Switch to the destination region and click DataAnalysis.
If you see Go To The New DataAnalysis in the navigation bar, click it to switch to the new DataAnalysis page.
If you see Back To The Old DataAnalysis in the navigation bar, you are already on the new DataAnalysis page.
Hover over . Click the icon on the right.
You can also click New Folder to create a custom folder structure for your Notebook files.
Prepare the Notebook environment
1. Create a personal development environment instance
Notebooks run on personal development environment instances. Before you start, create and switch to a target instance. You can use the personal development environment instance to install dependencies for Notebook node development, such as third-party Python libraries.
2. Select a personal development environment
At the top of the DataAnalysis page, select the personal development environment instance that the Notebook will run on.
3. (Optional) Switch the Python kernel
You can click the icon in the upper-right corner of the Notebook node to confirm the Python kernel version for the current Python cell and switch to another version if needed.
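The environment can also be checked from inside a Python cell. The following sketch uses only the Python standard library; the installed_version helper and the package names in the loop are examples chosen for illustration, not part of DataWorks.

```python
import platform
import sys
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Interpreter that backs the current kernel.
print("Python version:", platform.python_version())
print("Interpreter path:", sys.executable)

# Example package names only; replace them with the libraries your Notebook needs.
for name in ("pandas", "matplotlib"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

Running this after switching the kernel or the instance is a quick way to confirm that the expected interpreter and dependencies are in place.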
Develop a Notebook
1. Add a cell
In the Notebook node toolbar, you can click the SQL, Python, or Markdown button to quickly create the corresponding cell.
You can hover over the top or bottom edge of a cell and click the add button that appears to insert a new cell above or below the current one.
To reorder cells, you can hover over the blue line in front of a cell and drag it to a new position.
2. (Optional) Switch the cell type
In a cell, you can use the Cell type button in the lower-right corner to switch between cell types. For more information, see Supported cell types.
You can change a MaxCompute SQL cell to a Hologres SQL cell or another SQL cell type.
You can change a SQL cell to a Python or Markdown cell, or change a Python or Markdown cell to a SQL cell.
When you switch the cell type, the content is retained. You must manually adjust the code to match the new cell type.
3. Develop cell code
You can edit SQL, Python, and Markdown code in the corresponding cells. You can use DataWorks Copilot Ask to assist with programming. You can access the intelligent assistant in the following ways:
From the cell toolbar: Click the icon in the upper-right corner of the cell to open the Copilot in-editor chat box and obtain programming assistance.
From the context menu: Right-click a cell and select Editor to obtain programming assistance.
Using keyboard shortcuts:
macOS: Press Command+I to open the intelligent assistant chat box.
Windows: Press Ctrl+I to open the intelligent assistant chat box.
Python development
By default, Python cells use the kernel of the personal development environment instance to run code. To access a specific computing resource service, you can also establish a connection to the computing resource using a built-in magic command.
SQL development
When you develop in a SQL cell, ensure that the SQL syntax matches the selected cell type, which corresponds to the computing resource type. You can click the icon in the lower-right corner of the SQL cell to specify an attached computing resource. When you run the cell, the SQL statement runs on the specified computing resource.
Markdown development
You can use Markdown syntax to write and format text.
Run a Notebook
After you finish developing the cells in a Notebook, you can run all cells or run a single cell.
Run all cells: After you finish editing the Notebook, you can click the icon at the top to run all cells in the Notebook node.
Run a single cell: After you finish editing a cell, you can click the icon to the left of the cell to run it.
SQL cells
You can write different types of SQL scripts in a cell. After you run a SQL script, the results are displayed below the cell.
Scenario 1: If the SQL script does not contain a SELECT statement, only the run log is displayed by default.
    CREATE TABLE IF NOT EXISTS product (
        product_id BIGINT,
        product_name STRING,
        product_type STRING,
        price DECIMAL(10, 2)
    )
    LIFECYCLE 30; -- This sets the data lifecycle to 30 days. After 30 days, the data is automatically deleted. This setting is optional.

Scenario 2: If the SQL script contains a SELECT statement, the run log is displayed. The results can be viewed as a table or a visualization. The system also automatically generates a DataFrame variable from the query results.

    SELECT
        product_id,
        product_name,
        product_type,
        price
    FROM product;

Generate a DataFrame data object:
The SQL cell automatically generates a return variable. You can click the df_* variable name in the lower-left corner of the SQL cell to rename the generated DataFrame variable.
View the SQL query table:
After the SQL query runs, the results are displayed as a table in the log area by default.

View the SQL query visualization:
After the SQL query runs, you can click the icon on the left side of the log area to view a visualization of the table data generated by the query.
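Once a SQL cell has produced its DataFrame variable, a later Python cell can work with it like any other DataFrame. The sketch below assumes that the variable was renamed to df_product (a hypothetical name) and that it holds a Pandas DataFrame. The first statement only builds a stand-in with made-up rows so that the example runs outside a Notebook; in a Notebook, that variable would already exist after the SQL cell runs.

```python
import pandas as pd

# Stand-in for the variable a SQL cell would generate (hypothetical name and data).
df_product = pd.DataFrame(
    {
        "product_id": [1, 2, 3],
        "product_name": ["Keyboard", "Mouse", "Editor license"],
        "product_type": ["hardware", "hardware", "software"],
        "price": [50.0, 20.0, 99.0],
    }
)

# Average price per product type, computed in a follow-up Python cell.
avg_price = df_product.groupby("product_type")["price"].mean()
print(avg_price)

# Keep only the rows above a price threshold for later cells.
expensive = df_product[df_product["price"] > 30.0]
print(expensive[["product_name", "price"]])
```

If the SQL cell returns a MaxFrame DataFrame instead, the available operations may differ, so treat the calls above as Pandas-specific.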
Python cells
You can write Python scripts in a cell. After you run a Python script, the results are displayed below the cell.
Scenario 1: Display only text output.
    print("Hello World")

Scenario 2: Display the contents of a Pandas DataFrame.

    import pandas as pd

    # Define product data, including details: product name, region, and login frequency.
    product_data = {
        'Product_Name': ['DataWorks', 'RDS MySQL', 'EMR Spark', 'MaxCompute'],
        'Product_Region': ['East China 2 (Shanghai)', 'North China 2 (Beijing)', 'South China 1 (Shenzhen)', 'Hong Kong'],
        'Login_Frequency': [33, 22, 11, 44]
    }

    # Create a DataFrame from the given data.
    df_products = pd.DataFrame(product_data)

    # Print the DataFrame to display the product information.
    print(df_products)
Scenario 3: Plot a chart.
    import matplotlib.pyplot as plt

    # Data
    categories = ['DataWorks', 'RDS MySQL', 'MaxCompute', 'EMR Spark', 'Hologres']
    values = [23, 45, 56, 78, 30]

    # Create a bar chart
    plt.figure(figsize=(10, 6))
    plt.bar(categories, values, color=['blue', 'green', 'red', 'purple', 'orange'])

    # Add a title and labels
    plt.title('Example Bar Chart')
    plt.xlabel('category')
    plt.ylabel('value')

    # Display the graph
    plt.show()
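Scenario 3 plots hard-coded lists. To chart a DataFrame instead, for example one generated by a SQL cell, the columns can be passed to Matplotlib directly. This is a minimal sketch that assumes a Pandas DataFrame shaped like df_products from Scenario 2; the axis labels and title are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in DataFrame mirroring the df_products example from Scenario 2.
df_products = pd.DataFrame(
    {
        "Product_Name": ["DataWorks", "RDS MySQL", "EMR Spark", "MaxCompute"],
        "Login_Frequency": [33, 22, 11, 44],
    }
)

# Draw one bar per row, reading both axes from DataFrame columns.
fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(df_products["Product_Name"], df_products["Login_Frequency"])
ax.set_title("Login frequency by product")
ax.set_xlabel("Product")
ax.set_ylabel("Login frequency")

# Display the chart below the cell.
plt.show()
```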
Markdown cells
After you finish writing, you can click the icon to display the formatted Markdown text.

    # DataWorks Notebook

In a Markdown cell that is displaying formatted text, you can click the icon to continue editing the cell.
References
For more information, see Scenarios and practices.