Superset is a lightweight business intelligence (BI) tool. Connect it to E-MapReduce (EMR) Druid or Hive data sources to run queries, build charts, and publish dashboards — all from a browser. An EMR V3.34.0 cluster is used in the examples.
Prerequisites
Before you begin, make sure that you have:
-
An EMR Hadoop or Druid cluster with Superset selected as an optional service at creation time. For instructions, see Create a cluster.
-
Network access from your computer to the emr-header-1 node of the cluster. Superset's web UI is only accessible through an SSH tunnel to that node. For setup instructions, see Create an SSH tunnel to access web UIs of open source components.
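The tunnel in the second prerequisite can be sketched as a local port forward. This is a minimal sketch, not the documented procedure: the helper name build_tunnel_cmd, the root login user, the key path, and the port value 18088 are all assumptions that do not come from this page. Check the linked SSH tunnel topic for the values that apply to your cluster.

```shell
#!/usr/bin/env bash
# Sketch: build (but do not run) an ssh command that forwards a local port
# to the same port on emr-header-1 through the master node.
# Assumptions, not from this page: login user "root", default key path,
# and 18088 as a placeholder for the Superset web UI port.
build_tunnel_cmd() {
  local master_ip="$1"                    # public IP address of the master node
  local port="${2:-18088}"                # placeholder port; verify on your cluster
  local key="${3:-$HOME/.ssh/id_rsa}"     # private key used for the SSH login
  # -N: run no remote command; -L: forward local port to emr-header-1:port
  printf 'ssh -i %s -N -L %s:emr-header-1:%s root@%s\n' \
    "$key" "$port" "$port" "$master_ip"
}

# Usage: print the command, review it, then run it yourself:
#   build_tunnel_cmd 203.0.113.10
# While the tunnel is open, browse to http://localhost:<port>.
```

Printing the command rather than running it keeps the sketch free of side effects and lets you adjust the user, key, and port before you connect.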
Limitations
-
Superset runs on the emr-header-1 node only and cannot be deployed in high-availability (HA) mode.
-
Knox cannot be used to access the Superset web UI.
Access EMR Druid from Superset
Superset is deeply integrated with EMR Druid. You can query Druid using either SQL or Druid's native query language.
Step 1: Log in to Superset
Open the Superset web UI through your SSH tunnel. The default username and password are both admin. Change the password immediately after your first login.
The web UI displays in English on first login.
Step 2: Add an EMR Druid cluster
-
Choose Sources > Druid Clusters.
-
Click the add icon.
-
In the Add Druid Cluster dialog box, configure the following parameters.
| Parameter | Description |
| --- | --- |
| Broker Host | Enter emr-header-1. This value is fixed. |
| Broker Port | Enter 1 followed by the open-source Broker port number. For example, if the Broker port is 8082, enter 18082. |
| Cluster Name | Enter the name of the Druid cluster you created in the EMR console. |
-
Click Save.
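The Broker Port rule above (prefix the open-source port with 1) can be sketched as a small helper. The function names are hypothetical. The second helper assumes that the Druid Broker exposes its standard /status endpoint on the prefixed port and that you run the check from a host that can resolve emr-header-1, such as the master node.

```shell
#!/usr/bin/env bash
# Sketch: derive the EMR Broker port from the open-source port (8082 -> 18082).
emr_broker_port() {
  case "$1" in
    # Reject empty or non-numeric input instead of producing a bogus port.
    ''|*[!0-9]*) echo "port must be numeric: $1" >&2; return 1 ;;
  esac
  printf '1%s\n' "$1"
}

# Hypothetical helper: URL of the Broker status endpoint on emr-header-1.
druid_broker_status_url() {
  local port
  port="$(emr_broker_port "$1")" || return 1
  printf 'http://emr-header-1:%s/status\n' "$port"
}

# Example check, run on the cluster before filling in the dialog box:
#   curl -s "$(druid_broker_status_url 8082)"
```

A response from the status URL suggests that the host and port you are about to enter are reachable. It does not validate the Cluster Name value.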
Step 3: Add a data source
-
Choose Sources > Druid Datasources.
-
Click the add icon.
-
In the Add Druid Datasource dialog box, configure the following parameters.
| Parameter | Description |
| --- | --- |
| Datasource Name | Enter a name for the datasource. |
| Cluster | Select the EMR Druid cluster you added in the previous step. |
-
Click Save. After saving, click the edit icon to specify dimension columns and metric columns.
Step 4: Verify the connection
Click the datasource name to view the details of the EMR Druid cluster.
Access a Hive database from Superset
Superset uses SQLAlchemy to connect to relational databases and big data query engines, including MySQL, Oracle, PostgreSQL, Microsoft SQL Server, Hive, Presto, and Druid. Hive is installed by default on EMR Hadoop clusters.
For other supported database types, see the SQLAlchemy dialect documentation.
Step 1: Log in to Superset
Open the Superset web UI through your SSH tunnel. The default username and password are both admin.
Step 2: Add a Hive database
-
Choose Sources > Databases.
-
Click the add icon.
-
In the Add Database dialog box, configure the following parameters.
| Parameter | Description |
| --- | --- |
| Database | Enter a name for the database connection. |
| SQLAlchemy URI | Enter hive://emr-header-1:10000/. |
-
Click Save.
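The SQLAlchemy URI entered above follows the pattern hive://host:port/database. The sketch below composes it from its parts. The helper name hive_sqlalchemy_uri is hypothetical, and the optional database suffix is an assumption about how you might pin a default schema. The page itself only uses the form with an empty database part.

```shell
#!/usr/bin/env bash
# Sketch: compose a SQLAlchemy URI for HiveServer2 on the EMR master node.
# Defaults mirror the value used on this page: hive://emr-header-1:10000/
hive_sqlalchemy_uri() {
  local host="${1:-emr-header-1}"
  local port="${2:-10000}"
  local database="${3:-}"      # optional; empty keeps the trailing slash only
  printf 'hive://%s:%s/%s\n' "$host" "$port" "$database"
}

# Usage:
#   hive_sqlalchemy_uri                              # URI used on this page
#   hive_sqlalchemy_uri emr-header-1 10000 default   # with a database name
```

If saving the connection fails, confirming that port 10000 on emr-header-1 is reachable from the master node is a reasonable first check.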
Step 3: Add a table
-
Choose Sources > Tables.
-
Click the add icon.
-
In the Import a table definition dialog box, configure the following parameters.
| Parameter | Description |
| --- | --- |
| Database | Select the database you added in the previous step. |
| Table Name | Enter the name of a table in that database. This example uses a table named test. |
-
Click Save.
Step 4: Query data
-
Choose SQL Lab > SQL Editor.
-
Select the database you added in Step 2 (Hive JDBC Server in this example).
-
Select default as the schema.
-
Run your Hive query.
FAQ
The admin user sees "invalid login" on first login.
This happens on EMR clusters with a minor version earlier than V4.6 or V3.33. To fix this, run the following commands on the master node as the root user:
-
Log in to the master node via SSH. For instructions, see Log on to a cluster.
-
Activate the Superset environment:
    source /usr/lib/superset-current/bin/activate
-
Create an admin account:
    superset fab create-admin
Enter the username, first name, last name, email, password, and confirmation as prompted. The defaults are:
    Username [admin]:
    User first name [admin]:
    User last name [user]:
    Email [admin@fab.org]:
    Password:
    Repeat for confirmation:
-
Initialize the database:
    superset db upgrade
-
Initialize Superset:
    superset init
After these steps, create an SSH tunnel and log in with the account you just created.
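The FAQ steps above can be collected into one script. This is a sketch under stated assumptions, not an official tool: the function names and the DRY_RUN switch are hypothetical additions, and only the four commands themselves come from this page. With DRY_RUN=1 the script prints each command instead of running it, so you can review the sequence first.

```shell
#!/usr/bin/env bash
# Sketch: recreate the Superset admin account. Run as root on the master node.
# DRY_RUN=1 (hypothetical switch for this sketch) prints commands only.
SUPERSET_ACTIVATE="/usr/lib/superset-current/bin/activate"

run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"       # show the command without executing it
  else
    "$@"                       # execute the command in the current shell
  fi
}

reset_superset_admin() {
  # Activate the Superset environment so the superset command is on PATH.
  run_step . "$SUPERSET_ACTIVATE" || return 1
  # Prompts interactively for username, names, email, and password.
  run_step superset fab create-admin || return 1
  # Initialize the database, then initialize Superset.
  run_step superset db upgrade || return 1
  run_step superset init
}

# Usage:
#   DRY_RUN=1; reset_superset_admin    # preview the commands
#   DRY_RUN=0; reset_superset_admin    # run them
```

Each step stops the sequence on failure, so a failed activation does not lead to superset commands running in the wrong environment.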