Use Java Database Connectivity (JDBC) to connect your application to Lindorm Distributed Processing System (LDPS) and run Spark SQL queries, analytics, and data generation workloads.
Prerequisites
Before you begin, make sure you have:
A Lindorm instance with LindormTable activated. See Create an instance.
LDPS activated for the instance. See Activate LDPS and modify the configurations.
JDK 1.8 or later installed in a Java IDE.
Get the JDBC endpoint
The JDBC endpoint follows this format:
jdbc:hive2://<host>:10009/;?token=<your-token>

To look up the endpoint for your instance, see View endpoints.
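For example, the code samples later in this topic use a placeholder endpoint of this shape; replace the host and token with the values for your own instance:

jdbc:hive2://123.234.XX.XX:10009/;?token=bisdfjis-f7dc-fdsa-9qwe-dasdfhhv8****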
Connect with Beeline
Use Beeline, the interactive CLI client bundled in the Spark release package, to run SQL statements directly against LDPS without writing any application code.
Download the Spark release package and decompress it.
Set the SPARK_HOME environment variable to the decompressed directory:

export SPARK_HOME=/path/to/spark/

Configure $SPARK_HOME/conf/beeline.conf with the following parameters (a sample file follows these steps):

Parameter      Description
endpoint       JDBC endpoint of LDPS.
user           Username for Lindorm wide tables.
password       Password for Lindorm wide tables.
shareResource  Whether multiple interactive sessions share Spark resources. Default: true.

Start Beeline:

$SPARK_HOME/bin/beeline

In the interactive session, execute SQL statements against your LDPS data sources.
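For reference, a minimal beeline.conf sketch, assuming a key=value line format and the placeholder endpoint used elsewhere in this topic (replace every value with your own):

endpoint=jdbc:hive2://123.234.XX.XX:10009/;?token=bisdfjis-f7dc-fdsa-9qwe-dasdfhhv8****
user=****
password=****
shareResource=true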
LDPS supports multiple data source types. For details, see Precautions.
Example: Create and query a table with Hive Metastore
After you activate Hive Metastore, run the following statements to create a table, insert data, and query it. For setup instructions, see Use Hive Metastore to manage metadata in Lindorm.
CREATE TABLE test (id INT, name STRING);
INSERT INTO test VALUES (0, 'Jay'), (1, 'Edison');
SELECT id, name FROM test;
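If the statements succeed, the SELECT returns both inserted rows. In Beeline the result is rendered as a table similar to:

+-----+----------+
| id  |   name   |
+-----+----------+
| 0   | Jay      |
| 1   | Edison   |
+-----+----------+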
Connect with Java
All Java examples use the org.apache.hive.jdbc.HiveDriver driver and the DriverManager.getConnection() API.
Add the JDBC driver dependency to your project. For Maven, add the following to pom.xml:

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>2.3.8</version>
</dependency>

Connect to LDPS and run a query:
import java.sql.*;

public class App {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Replace with your LDPS JDBC endpoint
        String endpoint = "jdbc:hive2://123.234.XX.XX:10009/;?token=bisdfjis-f7dc-fdsa-9qwe-dasdfhhv8****";
        String user = "";
        String password = "";
        Connection con = DriverManager.getConnection(endpoint, user, password);
        Statement stmt = con.createStatement();
        // Execute a query and print results
        ResultSet res = stmt.executeQuery("SELECT * FROM test");
        while (res.next()) {
            System.out.println(res.getString(1));
        }
    }
}

(Optional) Pass Spark job parameters by appending them to the endpoint URL with semicolons:
String endpoint = "jdbc:hive2://123.234.XX.XX:10009/;?token=bisdfjis-f7dc-fdsa-9qwe-dasdfhhv8****"
    + ";spark.dynamicAllocation.minExecutors=3"
    + ";spark.sql.adaptive.enabled=false";
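The basic example above does not close its JDBC resources. A slightly more robust sketch using standard JDBC try-with-resources (plain java.sql usage, nothing Lindorm-specific beyond the endpoint; the class name AppWithCleanup is illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class AppWithCleanup {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Replace with your LDPS JDBC endpoint, username, and password.
        String endpoint = "jdbc:hive2://123.234.XX.XX:10009/;?token=bisdfjis-f7dc-fdsa-9qwe-dasdfhhv8****";
        // try-with-resources closes the connection, statement, and result set
        // even if the query throws.
        try (Connection con = DriverManager.getConnection(endpoint, "", "");
             Statement stmt = con.createStatement();
             ResultSet res = stmt.executeQuery("SELECT id, name FROM test")) {
            while (res.next()) {
                System.out.println(res.getInt(1) + "\t" + res.getString(2));
            }
        }
    }
}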
Connect with Python
All Python examples use the JayDeBeApi library, which bridges the Python DB-API 2.0 interface to the Hive JDBC driver.
Download the Spark release package and decompress it.
Set the environment variables:
export SPARK_HOME=/path/to/dir/
export CLASSPATH=$CLASSPATH:$SPARK_HOME/jars/*

Install JayDeBeApi:

pip install JayDeBeApi

Connect to LDPS and run a query:
import jaydebeapi

driver = 'org.apache.hive.jdbc.HiveDriver'
endpoint = 'jdbc:hive2://123.234.XX.XX:10009/;?token=bisdfjis-f7dc-fdsa-9qwe-dasdfhhv8****'
# Path to the Hive JDBC driver JAR in the decompressed Spark package
jar_path = '/path/to/sparkhome/jars/hive-jdbc-****.jar'
user = '****'
password = '****'

conn = jaydebeapi.connect(driver, endpoint, [user, password], [jar_path])
cursor = conn.cursor()
cursor.execute("select 1")
results = cursor.fetchall()
cursor.close()
conn.close()

(Optional) Pass Spark job parameters by appending them to the endpoint string:
endpoint = (
    "jdbc:hive2://123.234.XX.XX:10009/;?token=bisdfjis-f7dc-fdsa-9qwe-dasdfhhv8****"
    ";spark.dynamicAllocation.minExecutors=3"
    ";spark.sql.adaptive.enabled=false"
)
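Beyond the "select 1" connectivity check, the same DB-API 2.0 cursor can run real queries. A minimal sketch, assuming the test table from the Hive Metastore example exists, using try/finally so the cursor and connection are always closed:

import jaydebeapi

driver = 'org.apache.hive.jdbc.HiveDriver'
endpoint = 'jdbc:hive2://123.234.XX.XX:10009/;?token=bisdfjis-f7dc-fdsa-9qwe-dasdfhhv8****'
jar_path = '/path/to/sparkhome/jars/hive-jdbc-****.jar'

conn = jaydebeapi.connect(driver, endpoint, ['****', '****'], [jar_path])
try:
    cursor = conn.cursor()
    try:
        cursor.execute("SELECT id, name FROM test")
        # cursor.description is standard DB-API 2.0: one entry per result column.
        print([col[0] for col in cursor.description])
        for row in cursor.fetchall():
            print(row)
    finally:
        cursor.close()
finally:
    conn.close()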
What's next
Precautions — supported data sources and known limitations for LDPS
Use Hive Metastore to manage metadata in Lindorm — manage table metadata for Beeline and JDBC queries
Activate LDPS and modify the configurations — tune LDPS cluster settings