All Products
Search
Document Center

AnalyticDB:XIHE BSP SQL development

Last Updated:May 07, 2024

AnalyticDB for MySQL Data Lakehouse Edition (V3.0) allows you to submit XIHE bulk synchronous parallel (BSP) SQL jobs by using the SQL editor or Java Database Connectivity (JDBC). This topic describes the applicable scenarios, submission methods, and configuration parameters of XIHE BSP SQL development, and provides answers to some frequently asked questions.

Prerequisites

  • A job resource group is created in the AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster. For more information, see Create a resource group.

  • A database account is created for the AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster.

Scenarios

XIHE BSP SQL jobs are executed by using the XIHE BSP engine. XIHE BSP SQL development is suitable for extract-transform-load (ETL) jobs, large queries, and burst low-priority queries. For information about the XIHE BSP engine, see the "Compute engines" section of the Functions and features topic.

ETL jobs

The following figure shows a typical ETL process.

image

In most cases, operations such as data cleansing and data transforming that are performed during an ETL process from a data source to the Application Data Service (ADS) layer involve large amounts of data and requires an extended period of time to complete. For ETL jobs, response speeds are not a major concern, but the system must provide capabilities like automatic retry to ensure reliability. The XIHE BSP engine is an ideal choice due to its high throughput, reliability, and cost-efficiency.

After an ETL job is complete, data applications can query data from the ADS layer, which may require a second-level or millisecond-level response time. In this case, the faster XIHE massively parallel processing (MPP) engine is more suitable.

Large queries

In XIHE MPP mode, large queries may encounter out-of-memory (OOM) errors or query exceptions. Scaling up resources can resolve these issues but is not cost-effective. In this case, you can use the XIHE BSP engine to perform large queries in a job resource group. It provides higher reliability by caching intermediate results of queries in disks, and enables better cost-efficiency with on-demand resource requests and billing.

Burst low-priority queries

Low-priority queries do not need to be returned at a fast speed. However, when low-priority queries are abruptly submitted in large numbers, the system may not have enough resources for the queries. As a result, the execution of other queries may be affected. To resolve these issues, you can use the XIHE BSP engine to perform these queries in a job resource group.

Limits

  • You cannot write Hudi tables in XIHE BSP mode.

  • You cannot read and write Delta tables in XIHE BSP mode.

Develop a XIHE BSP job

You can use one of the following methods to develop a XIHE BSP job:

Use the SQL editor to submit a XIHE BSP job

On the SQL Development page, you can select a job resource group and the XIHE engine to submit a XIHE BSP job.

Procedure

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition (V3.0) tab, find the cluster that you want to manage and click the cluster ID.

  2. In the left-side navigation pane, choose Job Development > SQL Development.

  3. On the SQLConsole tab, select a job resource group and the XIHE engine.

  4. Enter an SQL statement and click Execute.

  5. After the SQL statement is executed, view the execution result on the Execution Results tab.

    On the Execution Records tab, you can click Result in the Actions column corresponding to an SQL statement to download its execution result.

Use JDBC or a MySQL client to synchronously submit a XIHE BSP job

You can add a hint and specify a job resource group to synchronously submit a XIHE BSP job.

Syntax

/*+ resource_group=<resource_group_name>*/ <SQL Statement>;
  • resource_group_name: the name of the job resource group.

  • SQL Statement: the SQL statement. You must add a hint to each SQL statement that you want to execute.

Example

/*+ resource_group=bsptest*/SELECT count(*) from test_db.ods_hudi;

Use JDBC or a MySQL client to asynchronously submit a XIHE BSP job

You can add a hint, specify a job resource group, and set the submission type to asynchronous to asynchronously submit a XIHE BSP job.

Syntax

/*+ resource_group=<resource_group_name>, query_submission_type=async*/ <SQL Statement>;
  • resource_group_name: the name of the job resource group.

  • query_submission_type=async: sets the submission type to asynchronous.

  • SQL Statement: the SQL statement. You must add a hint to each SQL statement that you want to execute.

Example

/*+ resource_group=bsptest, query_submission_type=async*/SELECT count(*) from test_db.ods_hudi;

After you submit an asynchronous job, a job ID is returned. After the job is complete, you can execute the SHOW job result WHERE job='Job_id'; statement to query the execution result of the SQL statement. For information about how to query the status of an asynchronous job, see the "Query the status of an asynchronous job" section of the Asynchronously submit an import job topic.

Configure a XIHE BSP job

You can configure parameters for XIHE BSP jobs, such as the amount of resources, timeout period, and priority.

Configuration methods

The BSP job configuration can take effect for a single job, all jobs within a single job resource group, or all jobs within a cluster.

Effective for a single job

To make the BSP job configuration effective for a single job, add the /*+ resource_group=<resource_group_name>,<config_name>*/ hint to an SQL statement.

In the hint, resource_group_name specifies the name of the resource group. For more information about config_name, see the Parameter column in the "Configuration parameters" section of this topic.

Example: Configure up to 20 AnalyticDB compute units (ACUs) of available resources for a job that is run in the job resource group bsptest.

/*+ resource_group=bsptest,elastic_job_max_acu=20*/SELECT count(*) from test_db.ods_hudi;

Effective within a job resource group

To make the BSP job configuration effective for all jobs within a job resource group, execute the SET adb_config <resource_group_name>.<config_name> statement.

In the hint, resource_group_name specifies the name of the resource group. For more information about config_name, see the Parameter column in the "Configuration parameters" section of this topic.

Example: Configure up to 20 ACUs of available resources for each job that is run in the job resource group bsptest.

SET adb_config bsptest.elastic_job_max_acu=20;

Check whether the configuration takes effect

To check whether the configuration takes effect for all jobs within the resource group, execute the SHOW ADB_CONFIG KEY=<resource_group_name>.<config_name> statement.

Effective within a cluster

To make the BSP job configuration effective for all jobs within a cluster, execute the SET adb_config <config_name> statement. For more information about config_name, see the Parameter column in the "Configuration parameters" section of this topic.

Example: Configure up to 20 ACUs of available resources for each job that is run within a cluster.

SET adb_config elastic_job_max_acu=20;

Check whether the configuration takes effect

To check whether the configuration takes effect for all jobs within the cluster, execute the SHOW ADB_CONFIG KEY=<config_name> statement.

Configuration parameters

The following table describes the parameters that you can configure for XIHE BSP jobs.

Category

Parameter

Description

Default value

Resources

elastic_job_max_acu

The maximum number of ACUs that can be used per XIHE BSP job for AppMaster nodes and compute nodes.

The value of this parameter cannot exceed the number of ACUs that are specified for the maximum computing resources of the resource group.

Note

AppMaster nodes are responsible for parsing, scheduling, and executing queries.

9

Timeout period

batch_query_timeout

The timeout period of the BSP job. Unit: milliseconds. If the BSP job is executed for a period of time that is longer than the value of this parameter, the job is automatically canceled.

7200000

Priority

query_priority

The priority of the BSP job.

Valid values: HIGH, NORMAL, LOW, and LOWEST.

For information about priority queues, see Priority queues of job resource groups.

NORMAL

FAQ

How do I check the status of a BSP job?

  • If you submit a BSP job by using the SQL editor, you can choose Job Development > SQL Development and click the Execution Records tab to view the status of the job.

  • If you submit a BSP job without using the SQL editor, you can execute the following statement to query the status of the job from the information_schema.kepler_meta_elastic_job_list table:

    SELECT status FROM information_schema.kepler_meta_elastic_job_list WHERE process_id='<job_id>';
    Note

    The information_schema.kepler_meta_elastic_job_list table stores up to 1,000 BSP jobs that are submitted within the last 30 days. You can perform further statistics and analysis such as GROUP BY aggregation on the table. The following sample statement shows how to query the number of BSP jobs in each state:

    SELECT status,count(*) FROM information_schema.kepler_meta_elastic_job_list GROUP BY status;

How do I determine whether to submit a BSP job synchronously or asynchronously?

The only difference between synchronous submission and asynchronous submission is whether the client needs to wait for the query execution to complete.

Asynchronous submission has the following limits:

  • A result set can contain up to 10,000 rows of data.

  • The system can retain up to 1,000 result sets, including the download URLs of CSV files, for up to 30 days.

We recommend that you submit BSP jobs asynchronously for queries that consume large amounts of computing power, require an extended period of time to complete, but return small result sets. Examples: INSERT INTO SELECT, INSERT OVERWRITE SELECT, and CREATE TABLE AS SELECT.