All Products
Search
Document Center

AnalyticDB:Read data from Lindorm using Spark SQL

Last Updated:Mar 30, 2026

AnalyticDB for MySQL allows you to use Spark SQL to read data from Lindorm Hive tables and wide tables by routing data through OSS external tables.

Prerequisites

Before you begin, make sure you have:

How it works

Both access paths follow the same three-stage flow:

  1. Configure Spark session settings — enable Elastic Network Interface (ENI), set network parameters, and configure the Lindorm catalog.

  2. Create an OSS external table as an intermediate destination.

  3. Insert Lindorm data into the external table, then query it with SELECT.

The two paths differ in how they connect to Lindorm:

Hive table

Wide table

Catalog class

org.apache.spark.sql.hive.V2HiveExternalCatalog

com.alibaba.lindorm.ldspark.datasources.v2.LindormCatalog

Connection method

Hive Metastore URI + Hadoop Distributed File System (HDFS) NameNode addresses

VPC endpoint for HBase Java API

Extra setup

Get hdfs-site configuration from Lindorm technical support

Download and upload a JAR package to OSS

Preparations

Before running either access path, collect the following from the Lindorm console:

  1. Log in to the Lindorm console and go to Instance Details. Note the vSwitch ID of your Lindorm instance.

  2. Go to Access Control > Security Groups tab. Note the security group ID of your Lindorm instance. To add a security group, see Add a security group.

Read data from a Hive table

Before you start

Contact Lindorm technical support (DingTalk ID: s0s3eg3) to get the hdfs-site configuration for your Lindorm instance. Provide your instance ID. The configuration contains the Hive Metastore URI and HDFS NameNode addresses used in the SQL below.

Steps

  1. Log in to the AnalyticDB for MySQL console. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition tab, click your cluster ID.

  2. In the left-side navigation pane, choose Job Development > SQL Development.

  3. On the SQLConsole tab, select the Spark engine and your job resource group.

  4. Enter and run the following SQL. It sets up the Lindorm Hive catalog over ENI, creates an OSS external table, copies Hive table data into it, and reads the result. Replace the placeholders with values from your preparations and the hdfs-site configuration:

    SET spark.adb.connectors = oss,external_hive;
    SET spark.adb.eni.enabled=true;
    SET spark.adb.eni.vswitchId=vsw-bp14pj8h0k5p0kwu3****;
    SET spark.adb.eni.securityGroupId=sg-bp11m93k021tp4ca****;
    SET spark.sql.catalog.lindorm_catalog = org.apache.spark.sql.hive.V2HiveExternalCatalog;
    SET spark.adb.eni.adbHostAlias.enabled=true;
    SET spark.sql.catalogImplementation = hive;
    
    -- HDFS-related parameters from hdfs-site configuration
    SET spark.sql.catalog.lindorm_catalog.spark.hadoop.hive.metastore.uris=thrift://ld-bp1ttz8833x9c****-proxy-ldps-hms.lindorm.aliyuncs.com:9083;
    SET spark.hadoop.dfs.nameservices=ld-bp1ttz8833x9c****;
    SET spark.hadoop.dfs.client.failover.proxy.provider.ld-bp1ttz8833x9c****=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider;
    SET spark.hadoop.dfs.ha.automatic-failover.enabled=true;
    SET spark.hadoop.dfs.ha.namenodes.ld-bp1ttz8833x9c****=nn1,nn2;
    SET spark.hadoop.dfs.namenode.rpc-address.ld-bp1ttz8833x9c****.nn1=ld-bp1ttz8833x9c-master1-001.lindorm.rds.aliyuncs.com:8020;
    SET spark.hadoop.dfs.namenode.rpc-address.ld-bp1ttz8833x9c****.nn2=ld-bp1ttz8833x9c****-master2-001.lindorm.rds.aliyuncs.com:8020;
    SET spark.hadoop.dfs.namenode.http-address.ld-bp1ttz8833x9c****.nn1=ld-bp1ttz8833x9c****-master1-001.lindorm.rds.aliyuncs.com:50071;
    SET spark.hadoop.dfs.namenode.http-address.ld-bp1ttz8833x9c****.nn2=ld-bp1ttz8833x9c****-master2-001.lindorm.rds.aliyuncs.com:50071;
    
    -- Create an external database and table in OSS
    CREATE DATABASE external_oss_db location 'oss://testBucketName/warehouse/db';
    CREATE TABLE external_oss_db.test(id int, name string) using parquet location 'oss://testBucketName/warehouse/db/test';
    
    -- Copy data from the Lindorm Hive table into the OSS external table
    INSERT INTO external_oss_db.test SELECT * FROM lindorm_catalog.spark_test.test;
    
    -- Query the data
    SELECT * FROM external_oss_db.test;

    The following table describes the key parameters.

    Parameter

    Required

    Description

    spark.adb.connectors

    Yes

    Connectors to enable. Set to oss,external_hive for this use case. Valid values: oss, hudi, delta, adb, odps, external_hive, jindo.

    spark.adb.eni.enabled

    Yes

    Enables ENI for connecting to external data sources. Set to true.

    spark.adb.eni.vswitchId

    Yes

    The vSwitch ID noted during preparations.

    spark.adb.eni.securityGroupId

    Yes

    The security group ID noted during preparations.

    spark.sql.catalog.lindorm_catalog

    Yes

    Configures the Hive catalog. Set to org.apache.spark.sql.hive.V2HiveExternalCatalog. You can replace lindorm_catalog with a custom catalog name.

    spark.adb.eni.adbHostAlias.enabled

    Yes

    Writes domain name resolution mappings required by AnalyticDB for MySQL. If you use an ENI to read data from or write data to EMR Hive, you must set this parameter to true.

    spark.sql.catalogImplementation

    Yes

    Set to hive.

    HDFS-related parameters

    Yes

    From the hdfs-site configuration. Format: SET spark.hadoop.<name>=<value>.

    location

    Yes

    OSS path of the external table. Format: oss://<bucket-name>/<database>/<table>.

  5. Click Execute.

  6. After the Spark job completes, go to Spark JAR Development. On the Applications tab, click Logs in the Actions column to view the table data. For details, see Spark editor.

Read data from a wide table

Before you start

  1. Get the VPC endpoint of LindormTable for access by using the HBase Java API. See the View the endpoints of LindormTable section.

  2. Download the required JAR package, then extract and upload it to your OSS bucket. See the Upload an object section.

Steps

  1. Log in to the AnalyticDB for MySQL console. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition tab, click your cluster ID.

  2. In the left-side navigation pane, choose Job Development > SQL Development.

  3. On the SQLConsole tab, select the Spark engine and your job resource group.

  4. Enter and run the following SQL. It configures the LindormTable catalog over ENI using the HBase Java API endpoint, loads the JAR package, creates an OSS external table, copies wide table data into it, and reads the result. Replace the placeholders with your actual values:

    SET spark.adb.eni.enabled=true;
    SET spark.adb.eni.vswitchId=vsw-bp14pj8h0k5p0kwu3****;
    SET spark.adb.eni.securityGroupId=sg-bp11m93k021tp4ca****;
    SET spark.sql.catalog.lindorm_table = com.alibaba.lindorm.ldspark.datasources.v2.LindormCatalog;
    SET spark.sql.catalog.lindorm_table.url=ld-bp1ttz8833x9c****-proxy-lindorm.lindorm.rds.aliyuncs.com:30020;
    SET spark.sql.catalog.lindorm_table.username=test;
    SET spark.sql.catalog.lindorm_table.password=password;
    add jar "oss://<bucketname>/hbase_jars/*";
    
    -- Create an external database and table in OSS
    CREATE DATABASE IF NOT exists external_oss_db location 'oss://testBucketName/warehouse/db';
    CREATE TABLE IF NOT exists external_oss_db.lindorm_table_2(id string, name string) using parquet location 'oss://testBucketName/warehouse/db/lindorm_table_2';
    
    -- Copy data from the Lindorm wide table into the OSS external table
    INSERT INTO external_oss_db.lindorm_table_2 SELECT * FROM lindorm_table.spark_test.tb;
    
    -- Query the data
    SELECT * FROM external_oss_db.lindorm_table_2;

    The following table describes the key parameters.

    Parameter

    Required

    Description

    spark.adb.eni.enabled

    Yes

    Enables ENI for connecting to external data sources. Set to true.

    spark.adb.eni.vswitchId

    Yes

    The vSwitch ID noted during preparations.

    spark.adb.eni.securityGroupId

    Yes

    The security group ID noted during preparations.

    spark.sql.catalog.lindorm_table

    Yes

    Configures the LindormTable catalog. Set to com.alibaba.lindorm.ldspark.datasources.v2.LindormCatalog. You can replace lindorm_table with a custom catalog name.

    spark.sql.catalog.lindorm_table.url

    Yes

    The VPC endpoint for HBase Java API access obtained in step 1 of "Before you start".

    spark.sql.catalog.lindorm_table.username

    Yes

    The LindormTable username.

    spark.sql.catalog.lindorm_table.password

    Yes

    The LindormTable password. To reset the password, go to the LindormTable cluster management system.

    jar

    Yes

    The OSS path of the JAR package uploaded in step 2 of "Before you start". Format: oss://<bucket-name>/hbase_jars/*.

    location

    Yes

    OSS path of the external table. Format: oss://<bucket-name>/<database>/<table>.

  5. Click Execute.

  6. After the Spark job completes, go to Spark JAR Development. On the Applications tab, click Logs in the Actions column to view the table data. For details, see Spark editor.

AnalyticDB for MySQL supports two methods to access data in Lindorm wide tables: using Java Database Connectivity (JDBC) and using the Spark Datasource V2 interface. The differences between the two methods are as follows:

  • The JDBC method supports accessing only the default database in a Lindorm wide table.

  • The Spark Datasource V2 interface method supports accessing any database in a wide table.

  • The Lindorm team reports no significant difference in efficiency between the two methods because the Spark Datasource V2 interface also concatenates SQL statements at the underlying layer.

Access wide table data using JDBC

  1. Obtain the virtual private cloud (VPC) endpoint for LindormTable SQL. For more information, see View endpoints.

  2. Download the Lindorm JDBC Driver dependency and upload the JAR package to OSS. For more information, see Upload a file.

    Note

    The version of lindorm-all-client must be 2.1.6.

  3. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.

  4. In the navigation pane on the left, click Job Development > SQL Development.

  5. In the SQLConsole window, select the Spark engine and the job resource group.

  6. In the SQLConsole window, enter the following job content:

    set spark.adb.eni.enabled=true;
    set spark.adb.eni.vswitchId=<vsw-bp14pj8h0k5p0kwu3****>;
    set spark.adb.eni.securityGroupId=<sg-bp11m93k021tp4ca****>;
    
    add jar oss://<bucketname>/hbase_jars/lindorm-all-client-2.1.6.jar;
    
    CREATE TEMPORARY VIEW tb
    USING org.apache.spark.sql.jdbc
    OPTIONS (
      driver 'com.aliyun.lindorm.table.client.Driver',
      url 'jdbc:lindorm:table:url=http://ld-bp1ttz8833x9c****-proxy-lindorm.lindorm.rds.aliyuncs.com:30060/default', -- private network address
      dbtable 'tb',
    -- The database username obtained from the Lindorm console
      user 'test',
    -- The database password obtained from the Lindorm console
      password 'password',
      fetchsize 10000
    );
    
    -- Create an external OSS database and table
    create database if not exists external_oss_db location 'oss://<bucketname>/warehouse/db';
    create table if not exists external_oss_db.lindorm_table_1(id string, name string) using parquet location 'oss://<bucketname>/warehouse/db/lindorm_table_1/';
    
    -- Insert data from the Lindorm wide table into the external OSS table
    insert into external_oss_db.lindorm_table_1 select * from tb;
    
    -- Read data from the external OSS table
    select * from external_oss_db.lindorm_table_1;

    Parameter description:

    Parameter

    Description

    jar

    The OSS path where the JAR package that you downloaded in Step 2 is stored.

    driver

    The Lindorm JDBC Driver dependency. The value is fixed at com.aliyun.lindorm.table.client.Driver.

    url

    The VPC endpoint for LindormTable SQL that you obtained in Step 1. The format is

    VPC endpoint for LindormTable SQL/database name.

    For example: jdbc:lindorm:table:url=http://ld-bp1ttz8833x9c****-proxy-lindorm.lindorm.rds.aliyuncs.com:30060/default

    dbtable

    The name of the Lindorm wide table database. In this example, the value is tb.

    user

    If you forget the user password, you can change it in the cluster management system of LindormTable.

    password

    fetchsize

    The number of data rows to read per connection. You can adjust this value as needed.

    Important

    You must configure the fetchsize parameter when you submit a Spark job. Otherwise, data cannot be queried.

  7. Click Execute Now.

  8. After a Spark job runs successfully, you can view the table data in the Log section on the Spark Jar Development page and the Application List tab. For more information, see Spark development editor.