DataWorks: Plan and configure resource groups

Last Updated: Feb 06, 2024

If you use DataService Studio provided by DataWorks to call an API, the API call consumes resources in resource groups. You must ensure the network connectivity and high performance of the resource groups. Otherwise, various issues may occur. For example, a resource group may fail to access a data source, and API call exceptions or throttling on frequent API calls may occur due to insufficient CPU or memory resources. This topic describes the precautions for planning resource groups and suggestions for using different types of resource groups.

Basic concepts

A resource group provides the computing resources that are required to initiate API calls in DataService Studio. In most cases, a resource group consists of one or more servers that provide CPU, memory, and network resources. An API call is processed in the following way: A user initiates an API call. API Gateway receives the request and forwards it to a server of DataService Studio. The server then forwards the request to the destination data source to query data.

Types

Resource groups are classified into shared resource groups and exclusive resource groups.

Shared resource groups

Shared resource groups are shared by all users of DataWorks. Users may compete for resources during peak hours. For more information about shared resource groups, see Use a shared resource group.

Exclusive resource groups

Exclusive resource groups can be used only by the users who purchase the exclusive resource groups. If highly concurrent and frequent API calls are initiated in DataService Studio, we recommend that you use exclusive resource groups. For more information about exclusive resource groups, see Exclusive resource groups for DataService Studio. For more information about how to use exclusive resource groups for DataService Studio, see Create and use an exclusive resource group for DataService Studio.

Note

Exclusive resource groups for DataService Studio are available only in the China (Shanghai) region.

Key to resource planning: connectivity and performance

When you use resource groups, take note of the connectivity and performance of the resource groups.

  • Connectivity

    After an API call is initiated, the API call is first sent to a server of DataService Studio and then to the destination data source for data query. Before you use DataService Studio, make sure that the resource group that is used to process the API call can access the destination data source and the network on which the data source resides. Otherwise, the API call fails.

  • Performance

    API call nodes consume the CPU, memory, and network resources of the servers on which the nodes run. Insufficient resources may lead to various issues. For example, an exception may occur during an API call, throttling may be imposed on frequent API calls, or query results may be returned slowly. Before you initiate API calls, make sure that you have sufficient resources. We recommend that you use exclusive resource groups to run API call nodes. This way, the nodes do not compete for resources in the public resource pool. For information about the performance metrics of exclusive resource groups, see Billing of exclusive resource groups for DataService Studio (subscription).
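
Because throttled API calls fail, a caller may want to pace its own requests so that they stay below the QPS threshold of the resource group. The following Python code is a minimal sketch of a client-side token bucket. The class name QpsLimiter and its methods are hypothetical and are not part of DataWorks or DataService Studio. The value 200 in the usage example is the shared resource group limit that is described later in this topic.

```python
import time


class QpsLimiter:
    """Hypothetical client-side token bucket: at most max_qps calls per second on average."""

    def __init__(self, max_qps, clock=time.monotonic):
        if max_qps <= 0:
            raise ValueError("max_qps must be positive")
        self.max_qps = float(max_qps)
        self.clock = clock              # injectable so that the limiter can be tested
        self.tokens = float(max_qps)    # start with a full bucket
        self.last = clock()

    def try_acquire(self):
        """Return True if a call may be sent now, or False if the caller should wait."""
        now = self.clock()
        # Refill in proportion to the elapsed time, capped at one second's worth of calls.
        self.tokens = min(self.max_qps, self.tokens + (now - self.last) * self.max_qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

For example, you can create the limiter by using QpsLimiter(200) and send a request only when try_acquire() returns True. This sketch only reduces the chance of triggering throttling on the client side. It does not change the threshold that is enforced for the resource group.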

Differences between resource groups and recommendations

The two types of resource groups are suitable for different scenarios. The following table describes the differences between the two types of resource groups based on resource ownership, network connectivity, billing methods, and performance. Select a resource group based on your business requirements when you create an API.

| Item | Exclusive resource group | Shared resource group |
| --- | --- | --- |
| Ownership of resources | The resources are maintained by DataWorks and exclusively used by each tenant. | The resources are maintained by DataWorks and shared among all tenants. |
| Network connectivity | Can connect to data sources that are deployed on the Internet, in Alibaba Cloud virtual private clouds (VPCs), and in data centers. For a data source that is deployed in a VPC, you can connect an exclusive resource group to the data source by using the instance ID or connection string. | Can connect to data sources that are deployed on the Internet, in Alibaba Cloud VPCs, and on the classic network. For a data source that is deployed in a VPC, you can connect a shared resource group to the data source by using only the instance ID. Note: You cannot connect a shared resource group to a data source that is deployed on the classic network in the China South 1 Finance region. |
| Billing method | Subscription based on the resource group specifications | Tiered pricing based on the number of calls and call duration |
| Supported data sources | ClickHouse, Hologres, ApsaraDB RDS, MySQL, PostgreSQL, SQL Server, Oracle, Tablestore, AnalyticDB for MySQL 2.0, AnalyticDB for MySQL 3.0, AnalyticDB for PostgreSQL, MongoDB, DRDS, StarRocks, and Doris (More data sources will be supported in the future.) | Hologres, ApsaraDB RDS, MySQL, PostgreSQL, SQL Server, Oracle, Tablestore, AnalyticDB for MySQL 2.0, AnalyticDB for MySQL 3.0, AnalyticDB for PostgreSQL, MongoDB, and DRDS |
| Maximum QPS (see Note 1) | The queries per second (QPS) thresholds vary based on the specifications of the exclusive resource groups. The minimum QPS is 500. You can select resource groups of different specifications based on your QPS requirement. One exclusive resource group can be associated with multiple workspaces and multiple APIs. If the number of API calls exceeds the QPS threshold of an exclusive resource group of specific specifications, throttling is triggered and the API calls fail. | A maximum of 200 QPS is supported for each tenant in each region. To increase the QPS threshold, you can use exclusive resource groups. If the number of API calls exceeds 200 QPS, throttling is triggered and the API calls fail. |
| Timeout | 30s | 10s |
| Reliability | High | Low |
| Security | High | High |
| Scenarios | Used for highly concurrent and frequent online API calls in which complex query statements are used and large amounts of data need to be returned | Used for less concurrent or frequent API calls |
| Recommendation rating | ★★★★★ | ★★★ |

Note
  • Note 1: The maximum queries per second (QPS) of an exclusive resource group depends on the actual business scenario. You can estimate the QPS threshold based on the following information:

    • Whether the API is generated in script mode.

    • Whether the pagination feature is enabled for the API so that the returned results are displayed on multiple pages.

    • The average runtime of the SQL statements configured for an API call is 100 milliseconds in the data source.

    • The average size of the data returned by a single API call is 3,000 bytes.

    If your business scenario is different from the preceding scenario, join the DataWorks DingTalk group to obtain the specifications that suit your business scenario.

We recommend that you use exclusive resource groups to run API call nodes based on the preceding comparison results.
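
The hard limits in the preceding comparison can be turned into a quick check. The following Python code is a minimal sketch that reports when a workload cannot run on a shared resource group. The function name recommend_resource_group and its parameters are hypothetical. The constants are taken from the comparison in this topic and may change over time.

```python
# Limits of shared resource groups, as listed in the comparison in this topic.
SHARED_MAX_QPS = 200        # per tenant per region
SHARED_TIMEOUT_S = 10       # timeout of a shared resource group, in seconds
# Data sources listed only for exclusive resource groups.
EXCLUSIVE_ONLY_SOURCES = {"ClickHouse", "StarRocks", "Doris"}


def recommend_resource_group(peak_qps, slowest_query_s, data_source):
    """Return ("exclusive" or "shared", reasons) based only on the hard limits above."""
    reasons = []
    if data_source in EXCLUSIVE_ONLY_SOURCES:
        reasons.append(f"{data_source} is listed only for exclusive resource groups")
    if peak_qps > SHARED_MAX_QPS:
        reasons.append(f"peak QPS {peak_qps} exceeds the shared limit of {SHARED_MAX_QPS}")
    if slowest_query_s > SHARED_TIMEOUT_S:
        reasons.append(f"a {slowest_query_s}s query exceeds the shared timeout of {SHARED_TIMEOUT_S}s")
    return ("exclusive" if reasons else "shared", reasons)
```

A result of "shared" means only that none of these limits is exceeded. The sketch does not evaluate reliability or network connectivity, for which exclusive resource groups are still recommended.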

Instructions on resource group configuration

If you use a shared resource group, add the CIDR block of the vSwitch to which the shared resource group is bound in the specified region to a whitelist of the data source. For more information, see Configure network connectivity between the shared resource group for DataService Studio and a data source.

If you use an exclusive resource group, select a network connectivity solution based on the network on which the data source resides and configure a whitelist for the data source. For more information, see Configure network connectivity between an exclusive resource group for DataService Studio and a data source.
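
Before you test an API, it can help to confirm that the CIDR block of the resource group is actually covered by the whitelist of the data source. The following Python code is a minimal sketch that uses the standard ipaddress module. The function name is_whitelisted is hypothetical, the CIDR blocks in the usage example are placeholders, and the sketch assumes that the whitelist entries are plain IP addresses or CIDR blocks.

```python
import ipaddress


def is_whitelisted(resource_group_cidr, whitelist_entries):
    """Return True if one whitelist entry covers the whole CIDR block of the resource group."""
    target = ipaddress.ip_network(resource_group_cidr, strict=False)
    for entry in whitelist_entries:
        # A single IP address such as "10.0.0.1" is parsed as a /32 (or /128) network.
        allowed = ipaddress.ip_network(entry, strict=False)
        # subnet_of() raises TypeError for mixed IPv4/IPv6, so compare versions first.
        if target.version == allowed.version and target.subnet_of(allowed):
            return True
    return False
```

For example, is_whitelisted("192.168.1.0/24", ["10.0.0.0/8", "192.168.0.0/16"]) returns True because the second entry covers the whole block. The check is performed only on the values that you pass in. It does not query the data source or the resource group.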