To use instances such as MaxCompute and Hologres for Data Studio in DataWorks, you must first attach them as computing resources. This topic describes how to create and manage computing resources to establish a foundation for task development and scheduling.
Relationship between computing resources and data sources
DataWorks supports attaching and using various computing resources. After you attach a computing resource, you can perform complex data processing and develop scheduled tasks directly in DataWorks Data Studio. When you attach most types of computing resources to DataWorks, a data source with the same name is automatically created. You can use this data source in the Data Integration module to perform operations such as data synchronization. The differences between a computing resource and a data source are as follows:
A computing resource is an instance of a compute engine that is used to execute data processing and analysis tasks.
A data source is used to connect to different data storage services to store and manage data.
Supported computing resources
In DataWorks, you can attach the following computing resources for Data Studio.
Category | Computing resource type | Instructions for attaching computing resources | Data Studio (new version) | DataStudio (legacy version) |
Offline computing | ||||
Real-time query | ||||
Real-time computing | ||||
Multimodal search | ||||
Cluster management | ||||
When you attach a MaxCompute, AnalyticDB for MySQL, AnalyticDB for PostgreSQL, AnalyticDB for Spark, ClickHouse, Hologres, Lindorm, EMR Serverless StarRocks, or OpenSearch computing resource, a data source with the same name is created in the current workspace.
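The rule above can be expressed as a simple lookup. The following Python sketch is illustrative only: the set members come from the list above, but the name `creates_data_source` and the string labels are hypothetical and are not part of any DataWorks API. Types that the list does not name are treated as not creating a data source, which is an assumption.

```python
# Computing resource types that, per the list above, get a data source
# with the same name when they are attached to a workspace.
AUTO_DATA_SOURCE_TYPES = frozenset({
    "MaxCompute",
    "AnalyticDB for MySQL",
    "AnalyticDB for PostgreSQL",
    "AnalyticDB for Spark",
    "ClickHouse",
    "Hologres",
    "Lindorm",
    "EMR Serverless StarRocks",
    "OpenSearch",
})


def creates_data_source(resource_type: str) -> bool:
    """Return True if attaching this type also creates a same-named data source.

    Hypothetical helper for illustration; types not named in the list
    are assumed to return False.
    """
    return resource_type.strip() in AUTO_DATA_SOURCE_TYPES
```

For example, `creates_data_source("Hologres")` evaluates to `True`, while a type that is not in the list, such as `"CDH"`, evaluates to `False`.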
Permissions
Only workspace members with the O&M or Administrator role and members with the AliyunDataWorksFullAccess or AdministratorAccess access policy can create computing resources. For more information, see Control permissions on modules in a workspace and Grant permissions to a RAM user.
In addition to the preceding permissions, other access controls may apply when you create certain computing resources. Grant permissions as prompted on the interface.
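As a rough illustration of the permission rule, the following Python sketch checks whether a user falls into one of the two groups described above. It reads the rule as "either group qualifies", which is an assumption about the wording. The function name and the way roles and policies are passed in are hypothetical. Passing this check is not sufficient on its own, because other access controls may apply for certain computing resources.

```python
from typing import Iterable

# Workspace roles and access policies named in the text above.
ALLOWED_ROLES = frozenset({"O&M", "Administrator"})
ALLOWED_POLICIES = frozenset({"AliyunDataWorksFullAccess", "AdministratorAccess"})


def may_create_computing_resource(roles: Iterable[str], policies: Iterable[str]) -> bool:
    """Return True if any role or any policy is in the allowed sets.

    Assumption: holding either a listed workspace role or a listed
    access policy is enough. Resource-specific controls are not modeled.
    """
    has_role = any(role in ALLOWED_ROLES for role in roles)
    has_policy = any(policy in ALLOWED_POLICIES for policy in policies)
    return has_role or has_policy
```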
Attach a computing resource
You can attach a computing resource from different entry points depending on whether your workspace is in the public preview of Data Studio.
Attach a computing resource in a workspace in public preview
Log on to the DataWorks console. Switch to the destination region. In the navigation pane on the left, choose . Find the workspace and click Go To Management Center.
In the navigation pane on the left, click Computing Resources to go to the Computing Resources page. Follow the instructions in the corresponding document based on the type of computing resource that you want to attach.
Data Studio (new version): Attach a MaxCompute computing resource
Data Studio (new version): Attach an AnalyticDB for MySQL (V3.0) computing resource
Data Studio (new version): Attach an AnalyticDB for PostgreSQL computing resource
Data Studio (new version): Attach a ClickHouse computing resource
Data Studio (new version): Attach a Hologres computing resource
Attach a computing resource in a workspace not in public preview
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
In the navigation pane on the left, click the icon to go to the Computing Resources page. Follow the instructions in the corresponding document based on the type of computing resource that you want to attach.
Resource Management: Click Create Computing Resource in the upper-right corner to create a computing resource.
DataStudio (legacy version): Attach a MaxCompute computing resource
DataStudio (legacy version): Attach an AnalyticDB for MySQL (V3.0) computing resource
DataStudio (legacy version): Attach an AnalyticDB for PostgreSQL computing resource
DataStudio (legacy version): Attach a ClickHouse computing resource
DataStudio (legacy version): Attach a Hologres computing resource
Cluster Management: Click Create Cluster in the upper-right corner of the Computing Resources page to create a compute engine cluster.
Cluster management | Supported cluster versions/types | References for attaching clusters |
Attach a CDH/CDP cluster
DataWorks provides CDH 5.16.2, CDH 6.1.1, CDH 6.2.1, CDH 6.3.2, and CDP 7.1.7. You can select one of these versions. The component versions for these cluster versions are fixed. For more information, see Cluster connection information. If these cluster versions do not meet your business requirements, you can select Custom Version.
DataStudio (legacy version): Attach a CDH computing resource
Attach an EMR cluster
Supported EMR cluster types: DataLake cluster (new data lake, EMR on ECS), Custom cluster (EMR on ECS), Hadoop cluster (old data lake, EMR on ECS), Spark cluster (EMR on ACK), and EMR Serverless Spark cluster.
Important: You can use the following EMR versions of Hadoop clusters (old data lake) in DataWorks:
EMR-3.38.2, EMR-3.38.3, EMR-4.9.0, EMR-5.6.0, EMR-3.26.3, EMR-3.27.2, EMR-3.29.0, EMR-3.32.0, EMR-3.35.0, EMR-4.3.0, EMR-4.4.1, EMR-4.5.0, EMR-4.5.1, EMR-4.6.0, EMR-4.8.0, EMR-5.2.1, and EMR-5.4.3
Hadoop clusters (old data lake) are no longer recommended. Migrate to DataLake clusters as soon as possible. For more information, see Migrate a Hadoop cluster to a DataLake cluster.
DataStudio (legacy version): Attach an EMR computing resource
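The version lists above can be checked up front before you attach a cluster. The following Python sketch is illustrative only: the version strings are copied from the lists above, while the function names and the idea of a pre-check are hypothetical and do not correspond to a DataWorks API.

```python
# Preset CDH/CDP versions listed above; anything else falls back to Custom Version.
PRESET_CDH_CDP_VERSIONS = frozenset({
    "CDH 5.16.2", "CDH 6.1.1", "CDH 6.2.1", "CDH 6.3.2", "CDP 7.1.7",
})

# EMR versions of Hadoop clusters (old data lake) listed above.
HADOOP_EMR_VERSIONS = frozenset({
    "EMR-3.38.2", "EMR-3.38.3", "EMR-4.9.0", "EMR-5.6.0", "EMR-3.26.3",
    "EMR-3.27.2", "EMR-3.29.0", "EMR-3.32.0", "EMR-3.35.0", "EMR-4.3.0",
    "EMR-4.4.1", "EMR-4.5.0", "EMR-4.5.1", "EMR-4.6.0", "EMR-4.8.0",
    "EMR-5.2.1", "EMR-5.4.3",
})


def cdh_version_option(version: str) -> str:
    """Return the preset version to select, or 'Custom Version' if none matches."""
    return version if version in PRESET_CDH_CDP_VERSIONS else "Custom Version"


def is_listed_hadoop_emr_version(version: str) -> bool:
    """Return True if the Hadoop cluster (old data lake) version is in the list above."""
    return version in HADOOP_EMR_VERSIONS
```

For example, `cdh_version_option("CDH 6.3.2")` returns the preset name unchanged, and any version that is not in the preset list maps to `"Custom Version"`.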
Detach a computing resource
Use caution when you detach a computing resource. This operation also deletes the associated data source that has the same name. This may affect tasks that reference this computing resource or data source in multiple modules, such as Data Integration, Operation Center, DataAnalysis, DataService Studio API, and Data Quality. To ensure that your business runs as expected, before you detach the resource, read the prompts on the interface carefully and migrate all tasks from the computing resource to another one.
You can detach a computing resource as needed. On the Computing Resources page, find the computing resource that you want to detach and click Detach in the Actions column.
Appendix: Task execution environments
In a standard mode workspace, a computing resource instance has two environment configurations: development and production. You can specify a different database or instance for each environment. The system automatically maps and accesses the appropriate computing resource based on the runtime environment. This isolates development and testing activities from production scheduling. For example, when you execute an offline sync task, the development environment automatically accesses the preconfigured development database, and the production schedule accesses the production database.
A basic mode workspace has only one environment and cannot isolate development from production. For more information, see Comparison of basic mode and standard mode.
If you upgrade a basic mode workspace to standard mode, the original computing resource is split into two separate computing resources: one for the development environment and one for the production environment. Workspaces in the public preview of Data Studio do not support upgrades. For more information, see Upgrade a workspace mode.
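The environment mapping described in this appendix can be sketched as a small resolver. The following Python example is a simplified model, not DataWorks behavior as implemented: the function name `resolve_target`, the mode and environment labels, and the database names are all hypothetical.

```python
from typing import Mapping


def resolve_target(mode: str, environment: str, targets: Mapping[str, str]) -> str:
    """Pick the database or instance a task accesses.

    mode: "basic" or "standard" (hypothetical labels).
    environment: "development" or "production"; ignored in basic mode.
    targets: in standard mode, maps each environment to its target;
             in basic mode, holds exactly one entry.
    """
    if mode == "basic":
        # Basic mode has a single environment, so there is nothing to isolate.
        (only_target,) = targets.values()
        return only_target
    if mode == "standard":
        if environment not in ("development", "production"):
            raise ValueError(f"unknown environment: {environment!r}")
        return targets[environment]
    raise ValueError(f"unknown workspace mode: {mode!r}")


# Standard mode: development runs and production schedules use different targets.
standard_targets = {"development": "sales_dev", "production": "sales_prod"}
dev_db = resolve_target("standard", "development", standard_targets)
prod_db = resolve_target("standard", "production", standard_targets)
```

In this model, upgrading a workspace from basic mode to standard mode corresponds to replacing the single-entry mapping with one entry per environment.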