Quick BI: Add a User-Created Spark SQL Data Source

Last Updated: Jan 25, 2024

Quick BI can connect to a Spark SQL database over the Internet or over an Alibaba Cloud virtual private cloud (VPC). This topic describes how to add a self-managed Spark SQL database as a data source.

Prerequisites

  • Your network meets the following requirements:

    • If Quick BI connects to the Spark SQL database (version 3.0 or later) over the Internet, add the IP address of Quick BI to the whitelist of the database. For more information, see Add security group rules.

    • If Quick BI connects to the Spark SQL database (version 3.0 or later) over an internal network, use one of the following methods to connect the data source to Quick BI:

      • If the Spark SQL database is deployed on an Elastic Compute Service (ECS) instance, you can connect Quick BI to the database over a virtual private cloud (VPC).

      • You can deploy a jump server and access the database over an SSH tunnel, as shown in the sketch after this list.

  • The username and password that are used to log on to the self-managed Spark SQL database (version 3.0 or later) are obtained.
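
If you plan to use the jump server approach, it can help to confirm the network path before you configure Quick BI. The following Python sketch opens an SSH tunnel from a machine that you control, through the jump server, to the port on which the Spark SQL service listens. It assumes that the sshtunnel package is installed, and every host name, port, and credential in it is a placeholder rather than a value from this topic. Quick BI establishes its own tunnel when you configure the SSH parameters in the procedure below, so this check is optional.

```python
# A minimal sketch, assuming the "sshtunnel" package is installed
# (pip install sshtunnel). All host names, ports, and credentials
# below are placeholders, not values from this topic.
from sshtunnel import SSHTunnelForwarder

tunnel = SSHTunnelForwarder(
    ("jump-server.example.com", 22),           # jump server address and SSH port
    ssh_username="tunnel_user",                # SSH username on the jump server
    ssh_password="tunnel_password",            # SSH password on the jump server
    remote_bind_address=("10.0.0.12", 10000),  # Spark SQL host and port on the internal network
    local_bind_address=("127.0.0.1", 10000),   # local end of the tunnel
)

tunnel.start()
print("Tunnel is up, local port:", tunnel.local_bind_port)
# Point a SQL client at 127.0.0.1:10000 here to confirm that the
# jump server can actually reach the database.
tunnel.stop()
```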

Limits

The Spark SQL database that you want to add must be of version 3.0 or later, and the underlying Hive Metastore must be of version 2.0 or later.
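
If you are not sure which Spark SQL version is running, one way to check is to query the engine through any HiveServer2-compatible client. The following Python sketch uses the PyHive package against the Spark Thrift Server; the host, port, and credentials are placeholders. The version() built-in function is available in Spark SQL 3.0 and later, so an error on this query also suggests an older release.

```python
# A minimal sketch, assuming the "pyhive" package is installed
# (pip install 'pyhive[hive]'). Host, port, and credentials are
# placeholders, not values from this topic.
from pyhive import hive

conn = hive.connect(
    host="spark-sql.example.com",  # address of the Spark Thrift Server
    port=10000,                    # Thrift server port, often 10000
    username="bi_user",
    password="bi_password",
    auth="LDAP",                   # password-based logon commonly requires LDAP mode
)
cursor = conn.cursor()

# version() is a Spark SQL built-in available in 3.0 and later.
cursor.execute("SELECT version()")
print(cursor.fetchone())

cursor.close()
conn.close()
```

The Hive Metastore version is usually recorded in the VERSION table of the metastore's backing database; if you cannot query that database yourself, ask the cluster administrator to confirm it.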

Procedure

  1. Log on to the Quick BI console.

  2. Perform the following steps to add a user-created Spark SQL data source.

    1. Go to the Create Data Source page.

    2. Click Show More.

    3. Select the Spark SQL data source.

  3. In the Configure Connection dialog box, configure the following parameters based on your business scenario.

    • Display Name: the name of the data source. The name is displayed in the data source list. The name cannot contain special characters and cannot start or end with a space.

    • Database Address: the IP address or URL of the server on which the Spark SQL database is deployed.

    • Port: the port number of the database.

    • Database: the name of the database that you specified when you deployed the Spark SQL database.

    • Username and Password: the username and password that are used to log on to the Spark SQL database. Make sure that the username has the create, insert, update, and delete permissions on the tables in the database. A minimal permission check is sketched after this procedure.

    • VPC Data Source: if the Spark SQL database is deployed on an ECS instance and the network type is Alibaba Cloud VPC, select VPC Data Source and configure the following parameters:

      • AccessKey ID: the AccessKey ID of the account that is used to purchase the instance. For more information, see Obtain an AccessKey pair.

      • AccessKey Secret: the AccessKey secret of the account that is used to purchase the instance. For more information, see Obtain an AccessKey pair.

      • Instance ID: the ID of the ECS instance.

      • Region: the region in which the ECS instance is deployed.

    • SSH: if you want to access the database over an SSH tunnel through a jump server, select SSH and configure the following parameters. To obtain the jump server information, contact O&M personnel or system administrators.

      • SSH Host: the IP address of the jump server.

      • SSH Username: the username that is used to log on to the jump server.

      • SSH Password: the password that is used to log on to the jump server.

      • SSH Port Number: the port number that is used to connect to the jump server. Default value: 22.

      For more information, see Connect to a Linux instance by using a password.

      Note: Only Quick BI Enterprise Standard allows you to access user-created data sources over SSH tunnels.

    • Initialize SQL statements: the SQL statements that are executed to initialize each connection to the data source. Only SET statements are allowed. Separate multiple statements with semicolons (;) and do not use line breaks. For example: SET spark.sql.shuffle.partitions=200;SET spark.sql.session.timeZone=UTC.

  4. Click Test Connection to verify that the data source can be connected.

  5. After the data source passes the connectivity test, click OK.

    The data source that you added is displayed in the data source list.
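
In addition to the connectivity test in Quick BI, you can confirm from a client of your own that the account has the permissions described in the Username and Password parameter and that your initialization SET statements are accepted. The following Python sketch again uses the PyHive package; the connection details and the quickbi_probe table name are placeholders, not values from this topic.

```python
# A minimal sketch, assuming the "pyhive" package is installed.
# Connection details and the table name (quickbi_probe) are placeholders.
from pyhive import hive

conn = hive.connect(
    host="spark-sql.example.com",
    port=10000,
    username="bi_user",
    password="bi_password",
    database="bi_db",              # the database entered in the procedure above
    auth="LDAP",
)
cursor = conn.cursor()

# A SET statement of the kind the Initialize SQL statements field accepts.
cursor.execute("SET spark.sql.shuffle.partitions=200")

# Exercise create, insert, and read access with a throwaway table.
cursor.execute("CREATE TABLE quickbi_probe (id INT, val STRING)")
cursor.execute("INSERT INTO quickbi_probe VALUES (1, 'ok')")
cursor.execute("SELECT * FROM quickbi_probe")
print(cursor.fetchall())

# UPDATE and DELETE additionally depend on the table format in use
# (for example Delta Lake or Iceberg), so they are not exercised here.
cursor.execute("DROP TABLE quickbi_probe")

cursor.close()
conn.close()
```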

What to do next

After you add a data source, you can create a dataset and analyze data.