The ADB Spark SQL node in DataWorks lets you develop AnalyticDB for MySQL Spark SQL tasks, schedule them periodically, and integrate them with other jobs. This topic explains how to develop a task with an ADB Spark SQL node.
Background
AnalyticDB for MySQL Serverless Spark is a managed big data analytics and compute service built on Apache Spark. It supports submitting Spark SQL statements directly in the AnalyticDB for MySQL console, simplifying data analysis with Spark for data developers. After you bind AnalyticDB for MySQL Serverless Spark as a compute resource in DataWorks, you can use an ADB Spark SQL node to develop and run Spark SQL tasks. For more information, see Overview.
Prerequisites
AnalyticDB for MySQL prerequisites:
You have created an AnalyticDB for MySQL Basic Edition cluster in the same region as your workspace. For more information, see Create a cluster.
You have created an interactive resource group in the AnalyticDB for MySQL cluster with its engine type set to Spark to run Spark SQL tasks in DataWorks. For more information, see Create an interactive resource group.
If you plan to use Object Storage Service (OSS) in an ADB Spark SQL node, ensure that the OSS bucket and the cluster are in the same region.
DataWorks prerequisites:
You have a workspace with a bound resource group. The resource group is bound to the same Virtual Private Cloud (VPC) as the AnalyticDB for MySQL cluster, and the IP address of the resource group is added to the whitelist of the cluster. For more information, see Configure a whitelist.
You have added the AnalyticDB for MySQL cluster to DataWorks as an AnalyticDB for Spark compute resource and tested its connectivity through the resource group. For more information, see Associate a computing resource.
You have created an ADB Spark SQL node. For more information, see Create a node for a scheduled workflow.
Step 1: Develop the ADB Spark SQL node
Create an external database.
Develop SQL code in the ADB Spark SQL node editor. The following example creates an external database. To create a managed table, see Use Spark SQL to create an internal table.
CREATE DATABASE IF NOT EXISTS `adb_spark_db` LOCATION 'oss://dw-1127/db_home';
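The statement above creates an external database whose files live in the specified OSS path. For comparison, in standard Spark SQL semantics a managed (internal) table omits the LOCATION clause, so its data is stored under the database's own storage path. The following is only a minimal sketch with an illustrative table name; for the authoritative steps in AnalyticDB for MySQL, follow the Use Spark SQL to create an internal table topic referenced above.
-- Minimal sketch of a managed (internal) table: no LOCATION clause,
-- so the table data is stored under the adb_spark_db storage path.
-- The table name tb_order_internal is only an example.
CREATE TABLE IF NOT EXISTS adb_spark_db.tb_order_internal (id int, name string, age int) USING parquet;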
Develop the task.
Develop the task code in the SQL editor. You can define variables in the ${variable_name} format and assign values to them in the scheduling parameters section on the Properties tab. This enables dynamic parameter passing for scheduled jobs. For more information about scheduling parameters, see Sources and expressions of scheduling parameters. The following is an example:
CREATE TABLE IF NOT EXISTS adb_spark_db.tb_order_${var} (id int, name string, age int) USING parquet LOCATION 'oss://dw-1127/db_home/tb1' TBLPROPERTIES ('parquet.compress'='SNAPPY');
CREATE TABLE IF NOT EXISTS adb_spark_db.tb_order_result_${var} (id int, name string, age int) USING parquet LOCATION 'oss://dw-1127/db_home/tb2' TBLPROPERTIES ('parquet.compress'='SNAPPY');
INSERT INTO adb_spark_db.tb_order_result_${var} SELECT * FROM adb_spark_db.tb_order_${var};
Note: In this example, you can assign the ${var} variable the value $[yyyymmdd] to process incremental data by day.
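For example, if ${var} is assigned $[yyyymmdd] and a scheduled instance runs with a data timestamp of January 1, 2025, the variable resolves to 20250101 before execution. The snippet below is only a hypothetical illustration of that substitution, not output captured from DataWorks:
-- Resolved form of the INSERT statement for the 20250101 data timestamp (illustrative).
INSERT INTO adb_spark_db.tb_order_result_20250101
SELECT * FROM adb_spark_db.tb_order_20250101;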
Step 2: Debug the ADB Spark SQL node
Configure the debug properties for the ADB Spark SQL node.
On the Properties tab in the right pane, configure parameters such as Compute Engine, ADB Compute Resource Group, Resource Group, and Compute CUs. The parameters are described as follows.
Compute resource parameters:
Compute Engine: Select the AnalyticDB for Spark compute resource that you bound.
ADB Compute Resource Group: Select the interactive resource group that you created in the AnalyticDB for MySQL cluster. For more information, see Create and manage a resource group. Note: The engine of the interactive resource group must be Spark.
Resource group parameters:
Resource Group: Select the resource group that passed the connectivity test when you bound the AnalyticDB for Spark compute resource.
Compute CUs: The node uses the default CU value. You can leave this parameter unchanged.
Debug and run the ADB Spark SQL node.
To run the task, click Save and then Run.
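After the run succeeds, you can optionally verify the result directly in the node editor. The following check assumes the example tables from Step 1 and a ${var} value assigned on the Properties tab; adjust the names to match your own tables.
-- Sanity check: count the rows copied into the result table (assumes the Step 1 example).
SELECT COUNT(*) FROM adb_spark_db.tb_order_result_${var};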
Step 3: Schedule the ADB Spark SQL node
Configure the scheduling properties for the ADB Spark SQL node.
To run the task periodically, configure the following parameters in the Scheduling Policy section on the Properties tab. For more information about other parameters, see Node scheduling configuration.
Compute Engine: Select the AnalyticDB for Spark compute resource that you bound.
ADB Compute Resource Group: Select the interactive resource group that you created in the AnalyticDB for MySQL cluster. For more information, see Create and manage a resource group. Note: The engine of the interactive resource group must be Spark.
Resource Group: Select the resource group that passed the connectivity test when you bound the AnalyticDB for Spark compute resource.
Compute CUs: The node uses the default CU value. You can leave this parameter unchanged.
Deploy the ADB Spark SQL node.
After configuring the node, deploy it. For more information, see Node and workflow deployment.
Next steps
After the task is deployed, you can view its status in Operation Center. For more information, see Getting started with Operation Center.