Run Cross-Source SQL Queries for Instant Insights - DataWorks

Use SQL statements to quickly query and analyze data from sources such as MaxCompute, EMR Hive, and Hologres. This topic describes how to run SQL queries against data sources.

Important

This document applies to the new version of DataAnalysis. For the previous version, see SQL query (Old). You can switch between the new and old versions of DataAnalysis using the navigation bar.

Supported data sources

SQL query supports the following data source types: MaxCompute, Hologres, EMR, CDH, ADB for PostgreSQL, ADB for MySQL, ClickHouse, StarRocks, MySQL, PostgreSQL, Oracle, SQL Server, Doris, and SelectDB.

Note

Only MaxCompute supports both direct connections and data source connections. All other data source types support connections only to data sources that have been added to your workspace.

Data source permissions

Data source scope

Data source connection mode: You can select data only from data sources within a workspace where you have appropriate permissions. A workspace administrator must first add you to the workspace as a Data Analyst, Developer, O&M Engineer, or Workspace Administrator.
Direct connection mode: You can select only MaxCompute projects that your current Alibaba Cloud account has joined. For more information about MaxCompute project permissions, see MaxCompute users and permissions.

Data source access permissions

You can access data sources using one of two identity modes.

Access identity mode	Description	Supported data sources	Authorization
Executor identity	The identity of the Alibaba Cloud account currently logged on to DataWorks.	MaxCompute and Hologres.	Ask the administrator of the specified MaxCompute project or Hologres instance to grant you member access permissions.
Default access identity of the data source	The access identity configured when the data source was created.	All data sources supported by this feature.	If your current account is not the default access identity for the data source, ask a user with Workspace Administrator permissions to grant permissions to your current Alibaba Cloud account.

Important

If an IP address whitelist is enabled for the MaxCompute project, add the DataAnalysis whitelist to the MaxCompute project's IP address whitelist.

Feature access

Log on to the DataWorks DataAnalysis console, switch to the destination region, and click Go to DataAnalysis.

If you see Go to New DataAnalysis in the navigation bar, click it to switch to the new version.
If you see Back to Old DataAnalysis in the navigation bar, you are already on the new version page.

Create an SQL query

Hover over Personal Folder > My Files and click > New SQL File on the right.
You can also click New Folder to create a custom folder structure for your SQL query files.
On the SQL editor page, write your SQL query statement.
Important
You can also generate SQL statements in the following ways:
- On the Data Catalog page, after you add a data catalog, find the target table, right-click it, and select Generate Query SQL.
- Copy an SQL query shared with you from the Shared Files folder.
- After editing the SQL, click the Format button at the top to format the code.
- SQL query supports Copilot. Click the Copilot icon () in the upper-right corner of the navigation bar to use features such as code generation and code correction to help you complete your code. Copilot also supports automatic code completion in the editor to improve your writing efficiency.
Example SQL
The following SQL statement queries the public GitHub event data in MaxCompute:
```
-- Enable session-level schema syntax.
SET odps.namespace.schema=true; 
-- Query 100 rows from the dwd_github_events_odps table.
SELECT * FROM bigdata_public_dataset.github_events.dwd_github_events_odps WHERE ds='${dt}' LIMIT 100;
```
Note
When you query a MaxCompute project that has schema syntax enabled, add SET odps.namespace.schema=true; before the query statement. This enables session-level schema syntax and prevents query failures.
By default, queries run on the data source specified in the current Run Settings. To query other MaxCompute projects that you have joined, you must explicitly specify the project name in the query statement. For example, if the computing resource in the run settings is MaxCompute A, and the query statement specifies MaxCompute B (SELECT * FROM B.schema_name.table_name WHERE ****), the system uses A as the execution engine to query data in B.
After you write the SQL query code, click Run Settings on the right. Configure the Data Source, Script Parameters, and other settings for the SQL query.
- Type: Select the target data source type for the SQL query as needed.
- Computing Resource: Configure the target data source for the SQL query. Only MaxCompute queries support direct connections or data source connections. Other data source types support querying data only from data sources within workspaces where you have permissions. For more information about permissions, see Data source permissions.
- Script Parameters: If your SQL query contains parameter variables, assign values to them here at runtime.
  Important
  If parsing fails, go to Management > Settings in the lower-left corner. Search for parsing and enable the Enable dataworks Isp code parameter function configuration item.

In the toolbar at the top of the SQL Editor page, click Run. Click the drop-down icon to the right of the Run button to switch the run mode. Select a mode based on your scenario.

Run mode	User scenario	Trigger condition	Applicable engine
Query mode (LIMIT 10000)	Quickly preview data and validate query logic. Suitable for preliminary data exploration where you only need to see a small sample of results.	Query result displays ≤ 10,000 rows and ≤ 10 MB	No limit
Demand mode (full data)	Get the complete result set for analysis or export. The system automatically triggers this mode when you need to process and view all data.	Query result displays > 10,000 rows or > 10 MB	No limit
Temporary table mode	Reuse results in multi-step complex queries. Use the output of one query as the input for the next to improve development and debugging efficiency.	Query result displays ≤ 10,000 rows and ≤ 10 MB, and is automatically written to a temporary table	MaxCompute only

After the SQL execution completes, you can view the Operational Log, Results, and the corresponding SQL content on the query results page.
Click the button in the upper-right corner of the query results to switch the page layout between side-by-side and top-and-bottom views.

Visualize query results

In the toolbar on the left of the query results, click the button to automatically generate a visual chart from the results.

Note

Click the Copilot button above the chart to use the DataWorks Copilot intelligent chart and insight generation feature.
Click the Edit Chart button to adjust the chart style.

Export and share

Important

To export data to a local file and then import it into another data source, use an offline sync task in Data Integration. This method provides more efficient and stable data migration and synchronization.

On the right side of the SQL query results, click Export. The following export options are supported:

Local File: Downloads the query results to your local computer as a CSV file. Key points are as follows:

Item	Description
Download limits	Only the MaxCompute and EMR engines are supported. For more information, see Data download limits. If the data protection mechanism is enabled for the MaxCompute project (which prohibits data downloads), downloading data through DataAnalysis will fail.
Download scope	You can choose to download Only data displayed in the table or All data. Only data displayed in the table: Downloads only the data displayed on the current page, with a default maximum of `10000` rows. All data: Exports all query results within the export limit.
Download method	Supports Download with Approval and Download without Approval. Download with Approval: Allows you to identify risks in data download operations by setting Fraud Detection rules. When downloading data, you must submit a download approval request to ensure data compliance and security. Note Only DataWorks Enterprise Edition supports setting and enabling Fraud Detection rules. Download without Approval: This is the default method. No permission request is required during the download process.

Object Storage Service (OSS): Exports the query results to an Alibaba Cloud Object Storage Service (OSS) bucket in a specified format, such as CSV or Parquet. This option is suitable for archiving large data volumes or integrating with other cloud products.

The first time you use this feature, grant DataWorks permission to access your OSS resources. In the File Path drop-down list, click the one-click authorization link in the prompt and follow the on-screen instructions to complete the RAM authorization.

Configuration Item	Description
File path	Click the folder button on the right to select the OSS Bucket and directory where you want to store the result file.
File name	The system automatically generates a file name. You can also modify it manually.
Text Type	Select the file format for export. Supported formats include `csv`, `text`, `orc`, and `parquet`.
Separator	Specify the delimiter between columns. The default is a comma (`,`).
Encoding format	Select the encoding format for the file, such as `UTF-8` or `GBK`.
CU	Configure the number of computing units (CUs) for this export task. The default value is 1 CU.
Resource group	Select the Serverless resource group to run this export task. If you do not select one, the data integration resource group set in DataAnalysis > System Administration is used by default.

After completing the configuration, click OK to start the export task. You can view the export progress, operational log, and configuration details on the task running page. After the task succeeds, go to the OSS console to download the exported object file to your local computer.

DingTalk Sheet: You can export results to a DingTalk Sheet.
Workbook/Workbook and Share: You can save the results to a workbook for further analysis of query data. You can also share the latest analysis results from the workbook with others.