Quick BI: Add a User-Created Spark SQL Data Source

Last Updated: Jan 25, 2024

Quick BI can connect to a Spark SQL database over the Internet or over an Alibaba Cloud virtual private cloud (VPC). This topic describes how to add a self-managed Spark SQL database as a data source.

Prerequisites

  • Your network meets the following requirements:

    • If Quick BI connects to the Spark SQL database (version 3.0 or later) over the Internet, add the IP address of Quick BI to the whitelist of the database. For more information, see Add security group rules.

    • If Quick BI connects to the Spark SQL database (version 3.0 or later) over an internal network, use one of the following methods to connect the data source to Quick BI:

      • If the Spark SQL database is deployed on an Elastic Compute Service (ECS) instance, you can connect Quick BI to the database over a virtual private cloud (VPC).

      • You can deploy a jump server and access the database over an SSH tunnel, as shown in the sketch after this list.

  • The username and password that are used to log on to the self-managed Spark SQL database (version 3.0 or later) are obtained.
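
If you plan to use the jump server approach, it can help to confirm the network path before you configure Quick BI. The following Python sketch opens an SSH tunnel from a machine that you control, through the jump server, to the port on which the Spark SQL service listens. It assumes that the sshtunnel package is installed, and every host name, port, and credential in it is a placeholder rather than a value from this topic. Quick BI establishes its own tunnel when you configure the SSH parameters in the procedure below, so this check is optional.

```python
# A minimal sketch, assuming the "sshtunnel" package is installed
# (pip install sshtunnel). All host names, ports, and credentials
# below are placeholders, not values from this topic.
from sshtunnel import SSHTunnelForwarder

tunnel = SSHTunnelForwarder(
    ("jump-server.example.com", 22),           # jump server address and SSH port
    ssh_username="tunnel_user",                # SSH username on the jump server
    ssh_password="tunnel_password",            # SSH password on the jump server
    remote_bind_address=("10.0.0.12", 10000),  # Spark SQL host and port on the internal network
    local_bind_address=("127.0.0.1", 10000),   # local end of the tunnel
)

tunnel.start()
print("Tunnel is up, local port:", tunnel.local_bind_port)
# Point a SQL client at 127.0.0.1:10000 here to confirm that the
# jump server can actually reach the database.
tunnel.stop()
```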

Limits

The Spark SQL database that you want to add must be of version 3.0 or later, and the underlying Hive Metastore must be of version 2.0 or later.
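
If you are not sure which Spark SQL version is running, one way to check is to query the engine through any HiveServer2-compatible client. The following Python sketch uses the PyHive package against the Spark Thrift Server; the host, port, and credentials are placeholders. The version() built-in function is available in Spark SQL 3.0 and later, so an error on this query also suggests an older release.

```python
# A minimal sketch, assuming the "pyhive" package is installed
# (pip install 'pyhive[hive]'). Host, port, and credentials are
# placeholders, not values from this topic.
from pyhive import hive

conn = hive.connect(
    host="spark-sql.example.com",  # address of the Spark Thrift Server
    port=10000,                    # Thrift server port, often 10000
    username="bi_user",
    password="bi_password",
    auth="LDAP",                   # password-based logon commonly requires LDAP mode
)
cursor = conn.cursor()

# version() is a Spark SQL built-in available in 3.0 and later.
cursor.execute("SELECT version()")
print(cursor.fetchone())

cursor.close()
conn.close()
```

The Hive Metastore version is usually recorded in the VERSION table of the metastore's backing database; if you cannot query that database yourself, ask the cluster administrator to confirm it.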

Procedure

  1. Log on to the Quick BI console.

  2. Perform the following steps to add a user-created Spark SQL data source.

    1. Go to the Create Data Source page.

    2. Click Show More.

    3. Select the Spark SQL data source.

  3. In the Configure Connection dialog box, configure the following parameters based on your business scenario.

    • Display Name: the name of the data source. The name is displayed in the data source list. The name cannot contain special characters and cannot start or end with a space.

    • Database Address: the IP address or URL of the server on which the Spark SQL database is deployed.

    • Port: the port number of the database.

    • Database: the name of the database that you specified when you deployed the Spark SQL database.

    • Username and Password: the username and password that are used to log on to the Spark SQL database. Make sure that the username has the create, insert, update, and delete permissions on the tables in the database. A minimal permission check is sketched after this procedure.

    • VPC Data Source: if the Spark SQL database is deployed on an ECS instance and the network type is Alibaba Cloud VPC, select VPC Data Source and configure the following parameters:

      • AccessKey ID: the AccessKey ID of the account that is used to purchase the instance. For more information, see Obtain an AccessKey pair.

      • AccessKey Secret: the AccessKey secret of the account that is used to purchase the instance. For more information, see Obtain an AccessKey pair.

      • Instance ID: the ID of the ECS instance.

      • Region: the region in which the ECS instance is deployed.

    • SSH: if you want to access the database over an SSH tunnel through a jump server, select SSH and configure the following parameters. To obtain the jump server information, contact O&M personnel or system administrators.

      • SSH Host: the IP address of the jump server.

      • SSH Username: the username that is used to log on to the jump server.

      • SSH Password: the password that is used to log on to the jump server.

      • SSH Port Number: the port number that is used to connect to the jump server. Default value: 22.

      For more information, see Connect to a Linux instance by using a password.

      Note: Only Quick BI Enterprise Standard allows you to access user-created data sources over SSH tunnels.

    • Initialize SQL statements: the SQL statements that are executed to initialize each connection to the data source. Only SET statements are allowed. Separate multiple statements with semicolons (;) and do not use line breaks. For example: SET spark.sql.shuffle.partitions=200;SET spark.sql.session.timeZone=UTC.

  4. Click Test Connection to verify that the data source can be connected.

  5. After the data source passes the connectivity test, click OK.

    The data source that you added is displayed in the data source list.
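
In addition to the connectivity test in Quick BI, you can confirm from a client of your own that the account has the permissions described in the Username and Password parameter and that your initialization SET statements are accepted. The following Python sketch again uses the PyHive package; the connection details and the quickbi_probe table name are placeholders, not values from this topic.

```python
# A minimal sketch, assuming the "pyhive" package is installed.
# Connection details and the table name (quickbi_probe) are placeholders.
from pyhive import hive

conn = hive.connect(
    host="spark-sql.example.com",
    port=10000,
    username="bi_user",
    password="bi_password",
    database="bi_db",              # the database entered in the procedure above
    auth="LDAP",
)
cursor = conn.cursor()

# A SET statement of the kind the Initialize SQL statements field accepts.
cursor.execute("SET spark.sql.shuffle.partitions=200")

# Exercise create, insert, and read access with a throwaway table.
cursor.execute("CREATE TABLE quickbi_probe (id INT, val STRING)")
cursor.execute("INSERT INTO quickbi_probe VALUES (1, 'ok')")
cursor.execute("SELECT * FROM quickbi_probe")
print(cursor.fetchall())

# UPDATE and DELETE additionally depend on the table format in use
# (for example Delta Lake or Iceberg), so they are not exercised here.
cursor.execute("DROP TABLE quickbi_probe")

cursor.close()
conn.close()
```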

What to do next

After you add a data source, you can create a dataset and analyze data.