Use the Serverless Kyuubi node in DataWorks to write and schedule SQL tasks that run on EMR Serverless Spark computing resources.
Prerequisites
Before you begin, ensure that you have:
An EMR Serverless Spark computing resource attached to your workspace (this node type supports only EMR Serverless Spark computing resources). Ensure that network connectivity is available between the resource group and the computing resource.
A Serverless resource group (required to run this node type)
(Optional) If you are a Resource Access Management (RAM) user, confirm that you have been added to the workspace and assigned the Developer or Workspace Administrator role. The Workspace Administrator role carries extensive permissions, so grant it with caution. For instructions, see Add members to a workspace.
If you use an Alibaba Cloud account, the RAM user step does not apply.
Create a node
For instructions, see Create a node.
Develop the node
Write the task SQL in the editor. To pass dynamic values at runtime, define variables using the ${variable_name} format, then assign values in the Scheduling Parameters section of the Scheduling Configuration panel. For more information, see Sources and expressions of scheduling parameters.
SHOW TABLES;
SELECT * FROM kyuubi040702 WHERE age >= '${a}'; -- Use scheduling parameters to substitute this value at runtime.
The maximum size of a single SQL statement is 130 KB.
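For example, the variable a in the sample SQL can take a constant or a scheduling expression. The following is a minimal sketch, assuming the standard DataWorks variable=value assignment format, where $[yyyymmdd-1] resolves to the previous day's date in yyyymmdd form:
a=18
a=$[yyyymmdd-1]
With a=18, the sample query is executed at runtime as SELECT * FROM kyuubi040702 WHERE age >= '18';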
Debug the node
In the Debug Configuration section, configure the following parameters.
Computing resource: Select an attached EMR Serverless Spark computing resource. If no resources appear, select Create Computing Resource from the drop-down list to attach one.
Resource group: Select a resource group attached to the workspace.
Script parameters: Assign values to the variables defined in your SQL. Configure the Parameter Name and Parameter Value for each variable; the task substitutes the actual value at runtime. For more information, see Sources and expressions of scheduling parameters.
ServerlessSpark Node Parameters: Native Spark properties. Use the format spark.eventLog.enabled : false, as shown in the sketch after this list. For available properties, see Open source Spark properties and List of custom Spark Conf parameters. DataWorks also supports workspace-level global Spark parameters, and you can control whether they take precedence over node-level parameters. See Set global Spark parameters.
In the toolbar, click Run to execute the task.
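For example, ServerlessSpark Node Parameters might look like the following. This is a minimal sketch: the keys are standard open source Spark properties, and the values are illustrative only, not recommended defaults for this node.
spark.eventLog.enabled : false
spark.driver.memory : 4g
spark.executor.memory : 4g
spark.executor.cores : 2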
Before you publish the node, make sure that the ServerlessSpark Node Parameters in Debug Configuration are consistent with those in Scheduling Configuration.
What's next
Schedule the node: Configure the Scheduling Policy and related properties in the Scheduling pane to run the node on a recurring schedule.
Publish the node: Click the publish icon in the toolbar to publish the node. Periodic scheduling takes effect only after the node is published to the production environment.
Node O&M: After publishing, monitor auto-triggered task runs in Operation Center. See Get started with Operation Center.