You can use SQL statements to quickly query and analyze data from sources such as MaxCompute, EMR Hive, and Hologres. This topic describes how to query data sources using SQL statements.
DataWorks recommends that you try the new version of DataAnalysis to access the latest features and enjoy a better user experience.
Supported data sources
SQL query supports data sources such as MaxCompute, Hologres, EMR, CDH, StarRocks, ClickHouse, SelectDB, Doris, AnalyticDB for MySQL 3.0, AnalyticDB for PostgreSQL, Tablestore, MySQL, PostgreSQL, Oracle, and SQL Server.
Only data sources that are added to a workspace are supported.
Data source permissions
Data source scope
You can query data only from data sources within workspaces for which you have permissions. You can contact an administrator to add you to a workspace as a Data Analyst, Model Designer, Developer, O&M, Workspace Administrator, or Project Owner.
Data source access permissions
Data source access supports the following two identity modes.
Access identity mode | Description | Supported data sources | Authorization |
Executor Identity | The identity of the Alibaba Cloud account that is currently logged on to DataWorks. | MaxCompute and Hologres. | Ask the administrator of the specified MaxCompute project or Hologres instance to grant you member access permissions. |
Data Source Default Access Identity | The access identity configured when the data source was created. | All data sources. | If your current account is not the default access identity for the data source, ask a user with Workspace Administrator permissions to grant the permission to your current Alibaba Cloud account. |
If whitelist-based access control is enabled for the MaxCompute project, you must add the DataAnalysis whitelist to the IP address whitelist of the MaxCompute project.
Entry point
Log on to DataWorks DataAnalysis, switch to the destination region, and then click Enter DataAnalysis.
If you see Go To New DataAnalysis in the navigation bar, you are on the legacy DataAnalysis page.
(Not recommended) If you see Return To Legacy DataAnalysis in the navigation bar, you can click it to switch back to the legacy DataAnalysis page.
Step 1: Add a folder
You can add system-recommended folders and tables from authorized data sources to your folder list. After a folder is added, you can quickly view its tables and table schemas, and generate SQL query statements.
On the SQL Analysis page, click the
icon to the right of the search box above the folder list to add a folder.Data Map - Metadata: Table metadata collected in Data Map. Each data source or computing resource can be added as a dataset.
Data Map - Data Album: Table management from the business perspective: Data albums in Data Map that group tables by subject. Each data album can be added as a dataset.
My Favorites: Tables that you have added to your favorites in Data Map.
My MaxCompute Tables: All MaxCompute tables for which the currently logged-on account is the owner.
Public Tabels: Public datasets provided by MaxCompute that you can use to quickly generate test data.
NoteYou can add up to 12 datasets. You can remove datasets that you no longer need.
Step 2: Create an SQL query
Query based on the data catalog
After you add data folders, the corresponding datasets appear in the folder tree on the left.
In the folder tree on the left, click to open an added dataset, such as My MaxCompute Tables.
Right-click the name of the table you want to analyze and select Generate SQL Statement. A temporary file that contains a recommended SQL statement for the table is generated.
Modify the SQL statement as needed and click Save to save the temporary file to My Files.
Query based on a data source
In the folder tree on the left, move your mouse pointer over My Files and click the
icon on the right to create a file.Write an SQL query statement in the new file and save it to My Files.
NoteWhen you edit the SQL statement, DataWorks automatically completes the names of MaxCompute tables for which you have permissions.
Query based on a shared SQL file
In the folder tree on the left, click Other People's Files From to view SQL files shared by other users. Click an SQL file and click Copy The SQL on the details page that appears on the right.
Query based on a public dataset
After you add a public dataset to the folder list, click the dataset. On the details page that appears on the right, you can select a DPI engine from the top bar and click Generate SQL Statement. You can use public datasets for testing.
Step 3: Configure the query engine and run the query
Click the
icon in the upper-right corner of the SQL details page to configure the SQL query engine.Configuration item
Description
Workspace
The workspace where the execution engine is located.
ImportantMake sure you have access permissions to the workspace. If you do not, contact a workspace administrator to add you as a workspace member.
Data Source Type
The type and name of the execution engine.
ImportantIf no project is specified in the SQL statement, the execution engine is the default data source.
Data Source Name
Access Identity Mode
Select an access mode for the SQL query:
Executor Identity: Only supports MaxCompute and Hologres engines. This mode is recommended if you are a member of the MaxCompute project or Hologres instance and have Select permissions.
Data Source Default Access Identity: If your current account is not the same as the default access identity set when the data source was created, grant this identity to your current account.
After you write the SQL statement, you can either click Run All or select a part of the statement and click Run Selected.
Before a MaxCompute SQL statement is run, an estimated cost is displayed. You can also click in the toolbar above the SQL file to obtain an estimate.
After the SQL statement is executed, you can view the Run Log, Run Result, and the corresponding SQL content on the query result page.
You can click the button in the upper-right corner of the query results to switch the page layout between a side-by-side layout and a top-and-bottom layout.
Step 4: Visualize query results
In the toolbar on the left of the query results, you can click the button to automatically generate a visual chart from the results.
You can click the Copilot button above the chart to try the DataWorks Copilot Ask feature.
Step 5: Export and share
To export data to a local file and then import it into another data source, you can use an offline sync task in Data Integration for more efficient and stable data migration and synchronization.
You can export SQL query results in the following formats:
When you need to download large amounts of data from MaxCompute, we recommend that you click theicon in the lower-left corner of the SQL Query menu to change the SQL execution mode to Run And Generate Temporary Table
Local file: You can download query results to a local device in CSV, TXT, or XLS format. The key points are described below:
Item
Description
Download limits
Only MaxCompute and EMR engines are supported. For more information, see Number of data rows that can be downloaded.
If the data protection mechanism is enabled for the MaxCompute project (data download is prohibited), downloading data through DataAnalysis will fail.
ImportantIf this option is not available, see Why are my query results or download options limited? for troubleshooting.
Download Scope
You can choose to download Only Data Displayed In The Table or All Data.
Data Displayed In The Table Only: Downloads only the data on the current page, up to a default maximum of
10000records.All Data: Exports all queried result data within the download limit.
Download method
Supports Download After Approval and Download Without Approval.
Download After approval: You can set Fraud Detection rules to detect risks in data download operations. When downloading data, you must submit a download approval request to ensure data compliance and security.
NoteOnly DataWorks Enterprise Edition supports setting and enabling Fraud Detection rules.
Download without approval: This is the default method. No permission request is required during the download process.
Object Storage Service (OSS): You can export query results in a specified format, such as CSV or Parquet, to an Alibaba Cloud Object Storage Service (OSS) space. This is suitable for archiving large data volumes or integrating with other cloud products.
The first time you use this feature, you must grant permissions to DataWorks to access your OSS resources. In the File path drop-down list, click the one-click authorization link in the prompt and follow the on-screen instructions to complete the RAM authorization.
Configuration item
Description
File path
Click the folder icon on the right to select the OSS Bucket and folder where you want to store the result file.
File name
The system automatically generates a file name. You can also modify it manually.
Text type
Select the export file format. Supported formats are
csv,text,orc, andparquet.Separator
Specifies the delimiter between columns. The default is a comma (
,).Encoding format
Select the encoding format for the file, such as
UTF-8orGBK.CU
Configure the number of computing units (CUs) for this export task. The default value is 1 CU.
Resource group
Select the Serverless resource group to run this export task. If you do not select one, the data integration resource group set in DataAnalysis > System Administration is used by default.
After the configuration is complete, click OK to start the export task. You can view the export progress, run log, and configuration details on the task running page. After the task succeeds, go to the OSS console to download the exported object file to your local device.
MaxCompute table: You do not need to download data to a local device and then upload it to a MaxCompute table. You can set the table lifecycle as needed.
This option appears only when you query content from a MaxCompute engine.
Workbook: You can save the results to a workbook for further analysis of the query data. You can also share the latest analysis results from the workbook with others.
DingTalk Sheet: You can export results to a DingTalk Sheet .
More operations
Manage SQL file versions
On the SQL file editing page, you can also click in the top toolbar to view the differences between automatically saved code and manually saved code, and select the version you want to save.
Search code
Above the folder tree on the left, click
and enter a keyword to search for code. This feature is available only in DataWorks Standard Edition and later.
View run history
Above the folder tree on the left, click
to view the historical execution records of SQL queries.
FAQ
How do I grant the default access identity permission for a data source?
Go to Security Center.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, click Go to Security Center.
In the navigation pane on the left, click to go to the DataAnalysis permission control page.
Switch the workspace, find the destination data source, and click the Authorize button on the right to grant permissions.

Why does an SQL query execution fail?
If you encounter the This node can only run on exclusive resource groups error during execution, go to and configure the Schedule Resource Group and Data Integration Resource Group for the corresponding engine.
Why are my query results or download options limited?
Only a portion of the SQL query results can be displayed. Follow the steps below to adjust the display limit to the maximum. You can also manage download capabilities. For more information, see Data query and analysis control.
Go to Security Center.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, click Go to Security Center.
In the navigation pane on the left, click to go to the DataAnalysis permission control page.
Switch to the Query Result Control tab and adjust Maximum value of single display record, Maximum value of single copy record, Maximum value of single download record, and Allow Downloads.
