Superset is a lightweight business intelligence (BI) tool. You can connect Superset to multiple data sources, analyze and visualize the data, and define charts and dashboards. You can also use Superset to import or export dashboards and manage the permissions of users and roles. This topic describes how to use Superset. An E-MapReduce (EMR) V3.34.0 cluster is used in the example.
Background information
Superset is deeply integrated into EMR Druid clusters and supports various relational databases. EMR Druid supports SQL. You can use the native query language of Apache Druid or SQL to access EMR Druid from Superset.
Prerequisites
An EMR Hadoop or Druid cluster is created, and Superset was selected from the optional services when the cluster was created. For more information, see Create a cluster.
Limits
By default, Superset is installed on the emr-header-1 node of a cluster. Superset cannot be deployed in high-availability (HA) mode.
You cannot use Knox to access the web UI of Superset.
Before you use Superset, you must make sure that your computer can access the emr-header-1 node of the cluster. For more information, see Create an SSH tunnel to access web UIs of open source components.
Access EMR Druid from Superset
Log on to the web UI of Superset.
Create an SSH tunnel to log on to the web UI of Superset. For more information, see Create an SSH tunnel to access web UIs of open source components.
The default username and password are both admin. Change the password after you log on to the web UI.
Note: The English web UI appears after you log on for the first time.
Add an EMR Druid cluster.
Choose .
Click the icon.
In the Add Druid Cluster dialog box, configure the parameters.
Parameter: Description
Broker Host: Enter emr-header-1. This value is fixed.
Broker Port: Enter the port number of the open source broker prefixed with 1. For example, if the port number of the open source broker is 8082, set this parameter to 18082.
Cluster Name: Enter the name of the Druid cluster that you created in the EMR console.
Click Save.
Add a data source.
Choose .
Click the icon.
In the Add Druid Datasource dialog box, configure the parameters.
Parameter: Description
Datasource Name: Enter a custom name for the data source.
Cluster: The name of the EMR Druid cluster that you added.
Click Save.
After you save the configuration, you can click the icon to specify dimension columns and metric columns.
View information about the added EMR Druid cluster.
After the data source is added, you can click the data source name to view information about the added EMR Druid cluster.
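The port rule for the Broker Port parameter can be sketched as a small shell helper. This is only an illustration: the function names emr_broker_port and broker_status_url are hypothetical, and /status is the standard Apache Druid process status path. Whether that path is reachable on the EMR broker port of your cluster is an assumption to verify.

```shell
#!/bin/sh
# Hypothetical helper: derive the EMR Druid broker port by prefixing
# the open source broker port with 1 (for example, 8082 becomes 18082).
emr_broker_port() {
  printf '1%s\n' "$1"
}

# Hypothetical helper: build the URL of the Druid status endpoint.
# Assumes the broker runs on emr-header-1, as in the Broker Host parameter.
broker_status_url() {
  printf 'http://emr-header-1:%s/status\n' "$(emr_broker_port "$1")"
}

emr_broker_port 8082      # prints 18082
broker_status_url 8082    # prints http://emr-header-1:18082/status
```

From a machine that can reach emr-header-1, running curl "$(broker_status_url 8082)" is one way to check that the broker responds before you add the cluster in Superset.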
Access a Hive database from Superset
Superset uses SQLAlchemy to support a variety of databases and SQL dialects, such as MySQL, Oracle, PostgreSQL, and Microsoft SQL Server. Superset also supports big data query engines, such as Hive, Presto, and Druid. This section describes how to access a Hive database from Superset. Hive is installed in EMR Hadoop clusters by default. For more information about how to access other types of databases from Superset, see SQLAlchemy.
Log on to the web UI of Superset.
Create an SSH tunnel to log on to the web UI of Superset. For more information, see Create an SSH tunnel to access web UIs of open source components.
The default username and password are both admin. Change the password after you log on to the web UI.
Add a Hive database.
Choose .
Click the icon.
In the Add Database dialog box, configure the parameters.
Parameter: Description
Database: The name of the database that you want to add.
SQLAlchemy URI: Enter hive://emr-header-1:10000/.
Click Save.
Add a table.
Choose .
Click the icon.
In the Import a table definition dialog box, configure the parameters.
Parameter: Description
Database: The name of the database that you added.
Table Name: The name of a table that is stored in the database that you added. In this example, the test table is added.
Click Save.
Query data from the added database.
Choose .
Select the added database Hive JDBC Server.
Select the default schema.
Run a HiveQL statement to query data from the database.
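The SQLAlchemy URI used above follows the pattern hive://host:port/database. The sketch below builds that URI and the matching JDBC URL for HiveServer2. The function names are hypothetical, and the host and port are the values from this example.

```shell
#!/bin/sh
# Hypothetical helper: build a SQLAlchemy URI for HiveServer2.
# Arguments: host, port, optional database name.
hive_sqlalchemy_uri() {
  printf 'hive://%s:%s/%s\n' "$1" "$2" "${3:-}"
}

# Hypothetical helper: build the matching JDBC URL for Beeline.
hive_jdbc_url() {
  printf 'jdbc:hive2://%s:%s/%s\n' "$1" "$2" "${3:-}"
}

hive_sqlalchemy_uri emr-header-1 10000   # prints hive://emr-header-1:10000/
hive_jdbc_url emr-header-1 10000         # prints jdbc:hive2://emr-header-1:10000/
```

On a node where the Hive client is installed, a command such as beeline -u "$(hive_jdbc_url emr-header-1 10000)" -e 'SELECT * FROM test LIMIT 10;' can confirm that HiveServer2 accepts connections and that the test table from this example is readable, before you run the same statement in Superset.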
FAQ
Problem description: The "invalid login" message appears the first time the admin user logs on to the web UI of Superset in an EMR cluster whose minor version is earlier than V4.6 or V3.33.
Solution
Log on to the master node of your EMR cluster in SSH mode. For more information, see Log on to a cluster.
Important: You must perform the following steps as the root user.
Run the following command to activate the Superset environment so that the Superset command-line interface (CLI) is available:
source /usr/lib/superset-current/bin/activate
Run the following command to create an administrator:
superset fab create-admin
Enter the username and password and confirm the password as prompted.
Username [admin]:
User first name [admin]:
User last name [user]:
Email [admin@fab.org]:
Password:
Repeat for confirmation:
Recognized Database Authentications.
Admin User admin created.
Initialize the user that you created.
Run the following command to initialize the database:
superset db upgrade
Run the following command to initialize Superset:
superset init
After you perform the preceding operations, you must create an SSH tunnel that is used to access the web UIs of open source components. Then, you can log on to the web UI of Superset as the user that you created. For more information about how to create an SSH tunnel, see Create an SSH tunnel to access web UIs of open source components.
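The manual steps in this FAQ can also be sketched as one non-interactive script. This is a minimal sketch, not an official procedure: the run and reset_superset_admin functions and the DRY_RUN variable are hypothetical, and the --username, --firstname, --lastname, --email, and --password options are those of the Flask-AppBuilder create-admin command. The default values mirror the prompts shown above. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: recreate the Superset admin user without interactive prompts.
# DRY_RUN=1 (default) only prints the commands. Set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 is the new admin password (hypothetical parameter).
reset_superset_admin() {
  if [ "$DRY_RUN" != "1" ]; then
    # Activate the Superset environment, as in the manual steps.
    . /usr/lib/superset-current/bin/activate
  fi
  run superset fab create-admin --username admin --firstname admin \
    --lastname user --email admin@fab.org --password "$1"
  run superset db upgrade
  run superset init
}

reset_superset_admin 'ChangeMe'   # dry run: prints the three commands
```

To execute the commands on the master node as the root user, run the script with DRY_RUN=0 and replace ChangeMe with a strong password. Note that the dry run prints the password in plain text.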