The data catalog is your unified metadata workbench in DataAnalysis. It integrates metadata from MaxCompute, Hologres, and Data Lake Formation (DLF), letting you create tables, manage views, and generate query SQL without leaving DataWorks.
Access the data catalog
Log on to DataWorks DataAnalysis, switch to the target region, and click Enter Data Analysis.
- If you see Go To New DataAnalysis in the navigation bar, click it to switch to the new DataAnalysis page.
- If you see Return To Legacy DataAnalysis in the navigation bar, you are already on the new DataAnalysis page.
Add a data catalog
The steps differ depending on whether you are a new or existing DataAnalysis user.
Add a data catalog as a new user
- In the data catalog, find the data source type to add. Click the icon next to the data source type name to open the Add Data Catalog page.
- Find the instance or data source to add. In the Operation column, click Add.
To remove an instance or data source you no longer need, use the same Operation column.
Add a data catalog as an existing user
Click the icon in the upper-right corner of the catalog, then choose a catalog type:
| Catalog type | What it adds |
|---|---|
| DataMap - Metadata | Table metadata collected in Data Map. Each data source or computing resource is added as one dataset. |
| DataMap - Data Album | Data albums from Data Map, which group tables by subject. Each data album is added as one dataset. |
| My Favorites | Tables you have favorited in the data catalog. |
| My MaxCompute tables | All MaxCompute tables owned by the current logon account. |
| Public Tables | Public datasets provided by DataWorks, for use with EMR Spark SQL, MaxCompute, and Hologres. |
You can add up to 12 datasets. If you have reached this limit, remove an existing dataset before adding a new one.
Manage a data catalog
The following applies only to the data catalog for new users.
Use the data catalog to manage data objects including tables, views, external tables, resources, and functions. Instructions vary by engine:
| Engine | Reference |
|---|---|
| MaxCompute | Manage a MaxCompute data catalog |
| Hologres | Manage a Hologres data catalog |
| DLF | Manage a DLF Catalog data catalog |
Generate query SQL
You can quickly generate a query SQL statement based on a data table, and then configure and execute the statement. For more information, see Create an SQL query.
- In the data catalog, find the data source to query. Click the icon next to the data source, then find the table in the Table list.
- Right-click the table and select Generate SQL Statement. The generated SQL script opens in the SQL editing page.
- Adjust the SQL script as needed.
- Click Run Configuration in the right-side panel. Set parameters such as Computing Resource and Script Parameters, then run the query.
Appendix
MaxCompute authentication and authorization
If you use a Resource Access Management (RAM) user or a RAM role to view MaxCompute data in the data catalog, grant the required MaxCompute permissions first. If the three-layer model (project.schema.table) is enabled for the MaxCompute data source or project, also grant permission to view schema metadata.
If a MaxCompute project contains multiple schemas, grant metadata permissions for all schemas to display the complete schema list on the project details page in the data catalog.
Grant permissions to a RAM user:
GRANT DESCRIBE ON SCHEMA <Schema_Name> TO USER RAM$<Alibaba_Cloud_Account_Name>:<RAM_User_Name>;
Grant permissions to a RAM role:
GRANT DESCRIBE ON SCHEMA <Schema_Name> TO USER `RAM$<Alibaba_Cloud_Account_Name>:role/<RAM_Role_Name>`;
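When a project contains several schemas, the statement has to be repeated once per schema. The following Python sketch generates those statements from a list of schema names so that you can paste them into a MaxCompute SQL session. It is an illustration, not a DataWorks or MaxCompute tool: the function name `describe_grants` and the sample values (`sales`, `ops`, `owner_account`, `analyst`, `reader`) are hypothetical, and the output only follows the two statement templates shown above.

```python
def describe_grants(schemas, account, principal, is_role=False):
    """Build one GRANT DESCRIBE statement per schema.

    schemas   -- iterable of schema names (hypothetical sample input)
    account   -- Alibaba Cloud account name that owns the RAM identity
    principal -- RAM user name, or RAM role name when is_role is True
    """
    if is_role:
        # RAM roles use the backtick-quoted form shown in the role template.
        grantee = f"`RAM${account}:role/{principal}`"
    else:
        # RAM users use the unquoted form shown in the user template.
        grantee = f"RAM${account}:{principal}"
    return [
        f"GRANT DESCRIBE ON SCHEMA {schema} TO USER {grantee};"
        for schema in schemas
    ]


if __name__ == "__main__":
    # Hypothetical schema and identity names, for illustration only.
    for statement in describe_grants(["sales", "ops"], "owner_account", "analyst"):
        print(statement)
```

Pass `is_role=True` to produce the RAM role form instead. Review the generated statements before running them, because the sketch does not check that the schemas or identities exist.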