When a data pipeline spans multiple systems, you need a single scheduler to coordinate all jobs -- including DataWorks tasks -- on one timeline. SchedulerX integrates with DataWorks so that you can periodically schedule DataWorks jobs and orchestrate them alongside other SchedulerX jobs in a unified workflow. For example, you can trigger a DataWorks job only after an upstream SchedulerX task completes.
Prerequisites
DataWorks Enterprise Edition activated in the DataWorks console
An AccessKey pair (AccessKey ID and AccessKey secret) for an Alibaba Cloud account that has permissions to operate DataWorks Enterprise Edition
Connect SchedulerX to DataWorks
Connect SchedulerX to DataWorks through one of the following methods:
| Method | When to use |
|---|---|
| SDK | You already run a Spring Boot application with the SchedulerX SDK |
| Agent | You need a standalone process, or your workload runs as a script or HTTP job |
SDK
Upgrade the SchedulerX SDK client to V1.3.4 or later, then add the following properties to your Spring Boot startup configuration:
# AccessKey pair for DataWorks access
spring.schedulerx2.aliyunAccessKey=<your-access-key-id>
spring.schedulerx2.aliyunSecretKey=<your-access-key-secret>

| Placeholder | Description |
|---|---|
| <your-access-key-id> | AccessKey ID of your Alibaba Cloud account |
| <your-access-key-secret> | AccessKey secret of your Alibaba Cloud account |
Agent
Deploy the SchedulerX agent by following the instructions in Use the SchedulerX agent to connect an application to SchedulerX (Script or HTTP jobs). You can deploy the agent from an installation package or a Docker image.
Installation package
Download the agent installation package.
Add the following AccessKey pair to the agent.properties file:

# AccessKey pair for DataWorks access
aliyunAccessKey=<your-access-key-id>
aliyunSecretKey=<your-access-key-secret>

| Placeholder | Description |
|---|---|
| <your-access-key-id> | AccessKey ID of your Alibaba Cloud account |
| <your-access-key-secret> | AccessKey secret of your Alibaba Cloud account |
Docker image
Select a Docker image based on your network type and CPU architecture.
| Network type | x86_64 | arm64 |
|---|---|---|
| Internet | registry.cn-hangzhou.aliyuncs.com/schedulerx/agent:1.10.13-dataworks-amd64 | registry.cn-hangzhou.aliyuncs.com/schedulerx/agent:1.10.13-dataworks-arm64 |
| VPC in China (Hangzhou) | registry-vpc.cn-hangzhou.aliyuncs.com/schedulerx/agent:1.10.13-dataworks-amd64 | registry-vpc.cn-hangzhou.aliyuncs.com/schedulerx/agent:1.10.13-dataworks-arm64 |

Set the ALIYUN_ACCESS_KEY and ALIYUN_SECRET_KEY environment variables when you start the Docker container.
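As a sketch, assuming the Internet x86_64 image from the table above, the container start command might look like the following. The AccessKey values are placeholders, and any other agent settings your deployment needs (such as the group ID described in the agent connection topic) are omitted here:

```shell
# Start the SchedulerX agent container, passing the AccessKey pair
# required for DataWorks access as environment variables.
docker run -d \
  -e ALIYUN_ACCESS_KEY=<your-access-key-id> \
  -e ALIYUN_SECRET_KEY=<your-access-key-secret> \
  registry.cn-hangzhou.aliyuncs.com/schedulerx/agent:1.10.13-dataworks-amd64
```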
Create and schedule a DataWorks job
Creating a scheduled DataWorks job requires work in two consoles: prepare the workflow in DataWorks, then build the orchestration in SchedulerX.
Step 1: Prepare the workflow in DataWorks
Create a manually triggered workflow. See Create a manually triggered workflow.
Create a node without configuring dependencies. See Create nodes and configure node dependencies.
Commit the workflow. See Commit a workflow.
Step 2: Build the orchestration in SchedulerX
Create a workflow and add DataWorks nodes to it. See Create a workflow.
Define dependencies between jobs by dragging the connection points from one job to another.
Configure scheduled triggering for the workflow. See Cron.
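For illustration, a Quartz-style cron expression that triggers the workflow every day at 02:00 would look like the following; confirm the exact field format and supported syntax in the Cron topic:

```
0 0 2 * * ?
```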
Verify the workflow
After the workflow triggers, open the workflow instance list to check the progress of the workflow and the status of each node. To view execution details for a specific node, right-click the node.
What to do next
Monitor job execution: In the SchedulerX job instance list, view execution details, terminate a job, or rerun a job.
Track DataWorks node instances: In DataWorks Operation Center, query information about the DataWorks node instances scheduled through SchedulerX.