All Products
Search
Document Center

Lindorm:Data access

Last Updated:Jul 28, 2025

This topic describes how to use an extract, transform, and load (ETL) resource group to access data in LindormTable and column store. The process involves configuring the environment, switching data sources, and performing data operations.

Prerequisites

  • A Java environment is installed. Java Development Kit (JDK) 1.8 or later is required.

  • Lindorm LindormTable is activated.

  • The Lindorm compute engine is activated.

  • The IP address of the client is added to the Lindorm whitelist.

  • An ETL resource group is enabled.

Usage notes

If your application is deployed on an ECS instance, ensure that the Lindorm instance and the ECS instance meet the following requirements to access the Lindorm instance over a virtual private cloud (VPC). This ensures network connectivity.

  • The Lindorm instance and the ECS instance are in the same region. We recommend that you deploy them in the same zone to reduce network latency.

  • The ECS instance and the Lindorm instance are in the same VPC.

Step 1: Connect to an ETL resource group

  1. Download the Spark deployment package.

  2. Decompress the Spark deployment package.

  3. Set the SPARK_HOME environment variable to the path of the decompressed package.

    export SPARK_HOME=/path/to/spark/;
  4. Configure the $SPARK_HOME/conf/beeline.conf file.

    • endpoint: The JDBC endpoint of the Lindorm compute engine. For more information, see View endpoints.

    • user: The username for LindormTable.

    • password: The password for the specified username.

    • shareResource: Specifies whether to share computing resources among multiple sessions of the same user. The default value is true.

    • compute-group: Specifies the name of the ETL resource group for the compute engine. If you do not configure this parameter, the default value is default.

      Note

      You can add additional job parameters to the beeline.conf file in the key=value format. For more information, see Job configuration.

  5. Go to the $SPARK_HOME/bin folder and run the ./beeline command to start an interactive session, where you can enter SQL statements to perform write or query operations.

Step 2: Access data

An ETL resource group provides access to two types of catalogs for different data sources: lindorm_columnar for column store data sources and lindorm_table for wide table data sources. By default, the ETL resource group is connected to the lindorm_columnar catalog. This catalog determines the data source for the SQL operations that you execute directly.

You can run the following SQL statements to switch between the column store and wide table data sources:

  • Switch to the column store data source: USE lindorm_columnar;.

  • Switch to the wide table data source: USE lindorm_table;.

Access column store data

Column store data is stored in an open data lake format that is compatible with the Apache Iceberg specification. The data is physically stored in the underlying distributed file system of a Lindorm instance. This storage architecture allows shared access from multiple types of resource groups. The online analytical processing (OLAP) resource group and the ETL resource group in the same Lindorm instance can directly access the columnar data.

  1. Create and use a database.

    -- Create a database.
    CREATE DATABASE etldemo;
    
    -- Use the database.
    USE etldemo;
  2. Create a data table and write data to the table.

    -- Create a table.
    CREATE TABLE test (id INT, name STRING)
    
    -- Insert data.
    INSERT INTO test VALUES (0, 'Jay'), (1, 'Edison');
  3. Query data.

    • Example 1:

      SELECT id, name FROM test WHERE id != 0;

      The following result is returned:

      +------+--------+
      | id   | name   |
      +------+--------+
      |    1 | Edison |
      +------+--------+
    • Example 2:

      SELECT count(DISTINCT name) FROM test;

      The following result is returned:

      +----------------------+
      | count(DISTINCT name) |
      +----------------------+
      |                    2 |
      +----------------------+
    • Example 3:

      SELECT * FROM
       (SELECT id, name FROM test WHERE id != 0) t0
       JOIN
       (SELECT id, name FROM test WHERE id != 2) t1
       ON t0.id=t1.id;
      +------+--------+------+--------+
      | id   | name   | id   | name   |
      +------+--------+------+--------+
      |    1 | Edison |    1 | Edison |
      +------+--------+------+--------+
  4. Write the query results to a table.

    INSERT INTO test SELECT * FROM test;
  5. Delete the table.

    DROP TABLE test PURGE;
  6. Delete the database.

    DROP DATABASE etldemo;

Access wide table data

You can connect an ETL resource group to LindormTable to perform real-time analytics on wide table data. This process uses independent computing resources and provides enhanced query capabilities.

Data in LindormTable is stored in the lindorm_table catalog. To access this data, you must run the USE lindorm_table; statement to switch to this catalog.

If you already have a wide table, proceed to the next step. Otherwise, connect to and use LindormTable and run the following statements to create a test database and a test table.

-- Create a database.
CREATE DATABASE test; 

-- Use the database.
USE test; 

-- Create a data table and insert two data records.
CREATE TABLE tb (id varchar, name varchar, address varchar,  PRIMARY KEY(id, name)) ; 
UPSERT INTO tb (id, name, address) values ('001', 'Jack',  'hz'); 
UPSERT INTO tb (id, name, address) values ('002', 'Edison',  'bj'); 

In the lindorm-beeline interactive session, you can run the following query statements to access wide table data.

  1. Switch the data source and use the database.

    -- Explicitly switch the data source.
    USE lindorm_table;
    
    -- Use the test database.
    USE test;
  2. Query wide table data.

    Example 1:

    SELECT * FROM tb LIMIT 5;

    The following result is returned:

    +------+--------+---------+
    | id   | name   | address |
    +------+--------+---------+
    | 001  | Jack   | hz      |
    | 002  | Edison | bj      |
    +------+--------+---------+

    Example 2:

    SELECT count(*) FROM tb;

    The following result is returned:

    +-----------+
    | count(1)  |
    +-----------+
    | 2         |
    +-----------+

References

You can also use Spark SQL or Java Database Connectivity (JDBC) to access Lindorm column store and wide table data for data analytics in JAR and Python jobs. For more information, see the following topics: