Query data of data sources by executing SQL statements - DataWorks

To query and analyze data in an efficient manner, you can execute SQL statements. For example, you can execute SQL statements to query data of data sources such as MaxCompute, E-MapReduce (EMR) Hive, and Hologres. This topic describes how to execute SQL statements to query data of data sources.

Data source types that support queries based on SQL statements

The following types of data sources support queries based on SQL statements: MaxCompute, Hologres, EMR Hive, EMR Spark SQL, EMR Impala, EMR Presto, EMR Trino, CDH Hive, StarRocks, ClickHouse, AnalyticDB for MySQL 3.0, AnalyticDB for PostgreSQL, and MySQL.

Prerequisites

A compute engine is associated with your workspace. For more information, see Create and manage workspaces.
A data source is added to your workspace. Data sources are classified into data sources that are automatically generated when you associate compute engines with a workspace and data sources that are added to DataWorks on the Data Source page. For more information, see Add and manage data sources.
You are granted query permissions on specific data sources in the DataAnalysis service.
Your account is added as a member of the workspace, and the Data Analyst, Model Developer, Development, O&M, Workspace Administrator, or Project Owner role is assigned to the member. For more information, see Add workspace members and assign roles to them.
Note
You can use the SQL query feature to query only the data of the data sources in a workspace on which you have query permissions. Before you can query data by using the feature, you must contact Workspace Manager to add your account to the workspace as a member and assign one of the preceding roles to your account.

Precautions

Data source permissions
For a workspace in standard mode, you can be authorized to use only the data sources in the development environment. For a workspace in basic mode, you can be authorized by the administrator to use any data sources.
Note
- For information about the data sources on which the administrator can grant permissions to your account, see supported data source types.
- For information about how to request permissions on data sources, see Security Center.
- For information about workspace modes, see Differences between workspaces in basic mode and workspaces in standard mode.
Data source query
When you use the SQL query feature to query data of the MaxCompute data source that is automatically generated when you associate a MaxCompute compute engine with your workspace, your logon account is automatically used to access data of the MaxCompute data source. If you want to access data of tables in the production environment, you must specify the name of the project to which the tables belong in SQL commands. If you do not have permissions to query the data of the tables in the production environment, go to Security Center to request the permissions.
Whitelist-based access control for data sources
If whitelist-based access control is enabled for a MaxCompute project, errors may occur when you use features provided by DataAnalysis, such as fee calculation, data download, dimension table usage, and data upload, to perform operations on tables in the MaxCompute project. To ensure that DataAnalysis can access the MaxCompute project, you must add the IP address information about DataAnalysis in a specific region to the IP address whitelist of the MaxCompute project in advance. For more information, see Appendix: IP address whitelist for DataAnalysis.
The upgraded SQL query feature is available. If the new GUI of the SQL query feature displayed to you is inconsistent with the GUI described in this topic or specific functionalities are missing, you can refer to the FAQ about the upgraded SQL query feature section in this topic.

Features

Feature	Description
Add a directory	You can add the recommended directories or add tables in the data sources on which you have query permissions as a directory. After you add tables as a directory, you can view the SQL files or data tables in the directory. You can also view the schema of a table and perform simple operations to generate SQL statements for the table.
Create an SQL query file	You can create an SQL query file, write SQL query statements, and then commit the SQL query statements to a specific data source for execution.
Execute SQL query statements	After you write SQL query statements, you can execute the SQL query statements to query the data of the data source.
Perform operations on the query results	After the SQL query statements are executed, you can view the run logs, running results, and SQL statements that correspond to the query results in the Result section. You can also view the query results in line charts, column charts, bar charts, pie charts, and cross tables.
Manage SQL query files	You can manage SQL query files in a centralized manner.

Go to the SQL Query page

You can use one of the following methods to go to the SQL Query page:

On the homepage of DataAnalysis, click the SQL Query card in the Shortcuts section to go to the SQL Query page.
In the left-side navigation pane of the DataAnalysis page, click SQL Query to go to the SQL Query page.

Add a directory

By default, the SQL Query page displays only the My Files and Other People's Files directories. You can click the 数据集 icon to the right of the search box to add directories based on your business requirements. After you add a directory, you can view SQL files or data tables in the directory. You can also view the schema of a table and perform simple operations to generate SQL statements for the table.

You can add different types of directories on the following tabs:

Recommended Directory: This tab displays the recommended directories that you can add, such as My Favorites, My MaxCompute tables, and Public Tables.
DataMap - Metadata: You can add a compute engine project that is associated with a workspace as a directory. For more information, see Create and manage workspaces and Add and manage data sources.
DataMap - Data Album: On this tab, you can add the data albums that you manage, create, and follow as directories. For more information, see Table management from the business perspective: Data albums.

Note

Others People's Files: stores the SQL query files that are saved by other members. The files can be viewed by all members of the current workspace.
You can add up to eight directories.

Create an SQL query file

You can create SQL query files based on the workspace mode and the compute engine type.

Create an SQL query file.
You can create an SQL query file by using one of the following methods:
- Manually create an SQL query file.
  In the left-side navigation pane of the SQL Query page, move the pointer over the icon to the right of My Files and select Create File.
- The first time you go to the SQL Query page or if no SQL query file is displayed on the page, click Create SQL Query in the right-side area to create an SQL query file.
- If the welcome page or an existing SQL query file is opened, click the plus icon to the right of the name of the existing file to create a new SQL query file.
Select a data source for the SQL query file.
Select the desired workspace, the compute engine type, and the data source whose data you want to query.
Note
- You can select only the data sources on which you have query permissions. If no data source of a specific compute engine type is available, contact Workspace Administrator to add your account to the workspace as a member and assign the Data Analyst, Model Developer, Development, O&M, or Workspace Administrator role to your account.
- You can modify the resource group that is used to run tasks on the System Management page.

Execute SQL query statements

In the SQL editor, write the code that you want to use to query the data of the data source. Then, click the Run icon in the toolbar. You can view the query results in the Result section.

The SQL query feature allows you to perform the following operations:

Execute SQL statements by clicking the Run icon marked with 1 or the Run icon marked with 2 in the preceding figure. After the SQL statements are executed, you can view the execution results. You can execute SQL statements by using one of the following methods:
- Method 1: Select multiple SQL statements and click the icon to execute all the selected statements.
- Method 2: Click the Run icon to the left of an SQL statement to execute the statement.
View the code version by clicking the icon marked with 3 in the preceding figure and choosing More > Version. Then, you can view the differences between the automatically saved code and the manually saved code, and select one to save.
Estimate costs. When you execute a single SQL statement, the Estimated Costs dialog box appears and displays the estimated costs. You can also click the icon marked with 4 in the preceding figure, and select Estimate Costs from the More drop-down list to view the estimated costs before you execute all SQL statements that you write.
Note
You can estimate the costs of executing SQL statements only if the compute engine type that you select is MaxCompute.
Format code by clicking the icon marked with 5 in the preceding figure. You can click the Format icon to standardize the SQL statements that you write.

Perform operations on the query results

After the SQL statements are executed, you can perform the operations that are described in the following table on the query results.

Operation	Screenshot	Description
View the query results		You can click Export to export the query results by using one of the following methods: Local File: The query results are downloaded to your on-premises machine. You can perform the download operation without approval. Note Only DataWorks advanced editions allow you to export query results to your on-premises machine. The maximum number of SQL query result records that can be downloaded to your on-premises machine varies based on the DataWorks advanced edition. DataWorks Standard Edition: 200,000. DataWorks Professional Edition: 2,000,000. DataWorks Enterprise Edition: 5,000,000. The file size cannot exceed 1 GB. The tenant administrator, tenant security administrator, and RAM users who are assigned the Workspace Administrator role can go to the Data query and analysis control tab of Security Center to specify the upper limit for the number of SQL query result records that can be viewed, the number of SQL query result records that can be downloaded, and whether to allow users to download data. Only the MaxCompute and EMR compute engines allow you to export query results to your on-premises machine. If the project data protection feature is enabled for a MaxCompute project, the query results will fail to be downloaded. For more information about the project data protection feature, see Project data protection. MaxCompute table: The query results are saved as an online MaxCompute table. You do not need to download the query results to your on-premises machine and then upload the results. You can specify the lifecycle of a table based on your business requirements. Spreadsheet: You can synchronize the query results to a workbook and perform more operations on the query results. For more information, see Analyze data. Spreadsheet and share: You can synchronize the query results to a workbook and share the results to specific users. For more information, see Share a workbook.
Visualize the query results		You can view the query results in different types of charts, such as line charts, column charts, bar charts, and pie charts. You can also save the query results as data cards to build your own visual knowledge base. You can select a data card to quickly create data reports and create personalized products with ease based on your business requirements.
View the SQL statements that correspond to the query result		You can view the SQL statements that correspond to the query result. You can also copy and paste the SQL statements to the code editor.

Manage SQL query files

On the SQL Query page, you can perform the following operations on SQL query files. SQL查询

Item	Description
Search for a table	In the area that is marked with 1, you can enter the table name in the search box and click the icon to search for a table.
Search code	In the area that is marked with 1, you can click the icon and enter a keyword to search for code.
View the running history of SQL statements	In the area that is marked with 1, you can click the icon to view the running history of SQL statements.
Add a directory	In the area that is marked with 1, you can click the icon to add a directory. You can add the recommended directories or add tables in the data sources on which you have query permissions as a directory. After you add tables as a directory, you can view the SQL files or data tables in the directory. You can also view the schema of a table and perform simple operations to generate SQL statements for the table.
My Files	In the area that is marked with 2, the My Files directory stores the SQL query files that you saved. You can right-click a file and select Share to share the file with a specific workspace. : indicates that the file is being shared. All members of the workspace to which the file is shared can view the file in the Other People's Files directory.
Other People's Files	In the area that is marked with 2, the Other People's Files directory stores the SQL query files that are saved by other members. The files can be viewed by all members of the current workspace.
My MaxCompute tables	In the area that is marked with 2, you can click the icon to add the My MaxCompute tables directory. The My MaxCompute tables directory stores the MaxCompute tables within your account. Tables in the development and production environments are included.
My Favorites	In the area that is marked with 2, you can click the icon to add the My Favorites directory. The My Favorites directory stores the tables that are added to favorites. To add a table to your favorites, right-click the table and select Add to Favorites.
Public Tables	In the area that is marked with 2, you can click the icon to add the Public Tables directory. The Public Tables directory stores the tables that are provided by DataAnalysis. You can use the tables to experience the SQL query feature without the need to apply for required permissions.

Custom settings

You can click the icon in the lower-left corner of the SQL Query page to go to the Settings tab and customize the settings:

Theme: In this section, you can specify the theme for the SQL query.
Settings for Code Editor: In this section, you can specify the code style and code hints, and specify whether to allow the system to enter the prompted suggestions on code writing when you press Enter.
Settings for SQL-related Operations: In this section, you can specify the default mode in which each SQL statement is executed.

FAQ about the upgraded SQL query feature

How do I add a data source on the SQL Query page?

For information about how to add a data source, refer to the following animation. The upgraded SQL query feature allows you to add a data source for each SQL file.

Why is the number of available data sources reduced?

The upgraded SQL query feature supports only the data sources on which the tenant administrator or tenant security administrator can grant permissions to your account in Security Center. If you do not have query permissions on specific data sources, you can contact the administrator to grant your account the query permissions on the data sources in Security Center.
Default authorization logic of data sources:
- After you associate a MaxCompute or Hologres compute engine with a workspace in standard mode on the Workspace page, the system assigns the Data Analyst role to your account. Then, you can use the related MaxCompute or Hologres data source in the development environment on the SQL Query page.
- To use data sources in a workspace in standard mode that is not associated with a MaxCompute or Hologres compute engine or data sources in a workspace in basic mode, you must be manually authorized by the tenant administrator or tenant security administrator in Security Center.

Why do I have a large number of SQL files in the My Files directory?

The upgraded SQL query feature allows you to manage all SQL files that you created for different workspaces and data sources in a centralized manner. The SQL files that you created for different workspaces and data sources by using the original SQL query feature are displayed in the My Files directory.

Why is the All Tables directory that is provided by the original SQL query feature unavailable?

The upgraded SQL query feature allows you to add tables in the data sources on which you have query permissions as a directory. If a directory contains a large number of tables, you can search for a table by using specific conditions. For frequently used tables, you can add the tables to My Favorites and view them in My Favorites.

The System Management page no longer provides the parameters that can be configured to view and download SQL query results or the parameter that can be used to specify the maximum number of SQL query result records that can be downloaded. How do I configure the settings that are related to SQL query results?

To configure the settings for downloading, replicating, and viewing SQL query result records, perform the following operations: Go to Security Center. In the top navigation bar, click the Security policy tab. In the left-side navigation pane, click Data query and analysis control. On the Data query and analysis control tab, click Query result control. Then, find the desired policy and click Edit in the Operation column.