This topic describes how to choose a purchase plan in business scenarios where nodes are run at a scheduled time and data output is required in a timely manner.

Note
  • Shared resource groups are default resource groups.
  • The peak hours during which DataWorks tenants run nodes are from 00:00 to 09:00 each day. If you use default resource groups during peak hours, you share resources with other tenants.
  • When tenants share resources, some tenants may preempt the resources. If your nodes must be completed on time, use exclusive resource groups to run them. DataWorks does not charge additional fees for node instances that are run on exclusive resource groups. For more information, see Exclusive resource mode.

Scenario 1: Run nodes at a scheduled time every day

  • Description

    After an enterprise migrates its data warehouse to the cloud, a basic scheduling system is required to schedule hundreds of nodes, and costs must be kept under control.

  • Analysis

    When using big data computing engines such as Alibaba Cloud MaxCompute and Flink, most enterprises require a stable and robust scheduling system to schedule and run their data production nodes (code) based on node dependencies and scheduled times. Developing such a system in-house incurs high labor and maintenance costs.

  • Purchase plan

    Required: DataWorks (pay-as-you-go). For more information, see Pay-as-you-go.

    After you purchase DataWorks (pay-as-you-go), you can use the features of DataWorks Basic Edition for free. This way, you can use not only the basic node scheduling features but also the basic features of all DataWorks services to complete the end-to-end data development process at a low cost. For more information about the features of DataWorks services, see Feature comparison among DataWorks editions.

Scenario 2: Run a specific number of instances concurrently on a daily basis

  • Description

    Due to business needs, a report must be ready for viewing at 09:00 every morning.

  • Analysis

    In business scenarios with strict requirements on the timeliness of data output, a descendant node must run at a specified time after its ancestor node finishes.

  • Purchase plan
    • Required: DataWorks (pay-as-you-go) and DataWorks exclusive resources for scheduling (subscription).
    • Optional: DataWorks advanced editions. You can purchase Standard Edition, Professional Edition, Enterprise Edition, or Ultimate Edition as needed.

Scenario 3: Run a specific number of instances concurrently on a daily basis, and transmit data concurrently through multiple threads

  • Description

    Due to business needs, a report must be ready for viewing at 09:00 every morning. The report mainly covers Content Delivery Network (CDN) access logs and client device types. The raw data is stored in Relational Database Service (RDS) databases managed by O&M engineers, and the daily data increment is about 30 GB. Therefore, data synchronization is required.

  • Analysis

    Scenario 3 builds on Scenario 2 by adding a timeliness requirement for a large number of sync nodes. Therefore, in addition to ensuring that the sync nodes run as scheduled, you must also deploy fixed computing and network resources to support concurrent data transmission through multiple threads.

  • Purchase plan
    • Required: DataWorks (pay-as-you-go), DataWorks exclusive resources for scheduling, and DataWorks exclusive resources for Data Integration.

      Assume that 1,500 computing nodes and 600 data integration nodes are run every day, and different types of nodes are run in different periods. You are billed as follows:

      Computing nodes

      • Business volume to support: 1,500 instances
      • Normal running duration: 30 minutes per instance
      • Expected running period: 03:00 to 08:00, 5 hours in total
      • Billing:

        Number of instances to run simultaneously: (1,500 × 30)/(5 × 60) = 150

        Required exclusive resources for scheduling: 5 × 8c16g (calculated based on Billing standards of exclusive resources for scheduling)

      Data integration nodes

      • Business volume to support: 600 instances and two concurrent threads per instance, 1,200 threads in total
      • Normal running duration: 30 minutes per instance
      • Expected running period: 00:30 to 03:00, 2.5 hours in total
      • Billing:

        Number of instances to run simultaneously: (600 × 30)/(2.5 × 60) = 120

        Required exclusive resources for scheduling: 4 × 8c16g (calculated based on Billing standards of exclusive resources for scheduling)

        Number of threads to run simultaneously: (1,200 × 30)/(2.5 × 60) = 240

        Required exclusive resources for Data Integration: 4 × 32c64g (calculated based on Billing standards of exclusive resources for Data Integration)

      Note The preceding results are calculated based on the business volume and the expected running period. We recommend that you adjust the purchase quantity based on your actual business volume.
    • Optional: DataWorks advanced editions (subscription). You can purchase Standard Edition, Professional Edition, Enterprise Edition, or Ultimate Edition as needed.
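The sizing arithmetic above can be sketched as a small script. The per-machine capacities used here (30 concurrent instances per 8c16g scheduling machine and 60 concurrent threads per 32c64g Data Integration machine) are assumptions inferred from the worked numbers in this topic, not official billing specifications; verify them against the current billing standards before you purchase.

```python
# Sketch of the concurrency-sizing formula used above:
# concurrent units = (count x minutes per run) / (window in minutes),
# then machines = concurrent units / assumed capacity per machine.
import math

def concurrent_units(count: int, minutes_per_run: float, window_hours: float) -> int:
    """Average number of units that must run at the same time in the window."""
    return math.ceil(count * minutes_per_run / (window_hours * 60))

def machines_needed(concurrent: int, capacity_per_machine: int) -> int:
    """Machines required, rounded up to whole units."""
    return math.ceil(concurrent / capacity_per_machine)

# Computing nodes: 1,500 instances, 30 minutes each, 03:00-08:00 (5 hours).
compute_concurrent = concurrent_units(1500, 30, 5)          # 150
compute_machines = machines_needed(compute_concurrent, 30)  # 5 x 8c16g (assumed capacity)

# Data integration: 600 instances with 2 threads each (1,200 threads),
# 30 minutes each, 00:30-03:00 (2.5 hours).
di_concurrent = concurrent_units(600, 30, 2.5)      # 120
di_sched_machines = machines_needed(di_concurrent, 30)  # 4 x 8c16g (assumed capacity)
di_threads = concurrent_units(1200, 30, 2.5)        # 240
di_machines = machines_needed(di_threads, 60)       # 4 x 32c64g (assumed capacity)
```

As the Note above advises, treat these results as a starting point and adjust the purchase quantity to your actual business volume.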

Purchase description

  • A node that runs in Operation Center of DataWorks requires computing resources for scheduling. If the node is a data integration node, you must also add scheduling resources for data transmission. Therefore, you can purchase both exclusive resources for scheduling and exclusive resources for Data Integration to ensure that your nodes run properly.
  • DataWorks exclusive resources for Data Integration guarantee that a sufficient number of concurrent threads for data integration nodes can start at the same time, but they do not guarantee the synchronization rate.
  • DataWorks (pay-as-you-go) uses shared resource groups for scheduling. Therefore, it cannot guarantee that all nodes run as scheduled during peak hours. For more information, see Pay-as-you-go.
  • DataWorks Standard Edition and higher support the intelligent monitoring feature. After you configure monitoring rules, you can globally monitor large workflows and ensure that all nodes are completed on time. For more information, see Overview.