Superset is a lightweight business intelligence (BI) tool. You can connect Superset to multiple data sources and use it to analyze, visualize, and define charts and dashboards. You can also use Superset to import or export dashboards and manage the permissions of users and roles. This topic uses an E-MapReduce (EMR) V3.34.0 cluster as an example to describe how to use Superset.

Prerequisites

An EMR Hadoop or Druid cluster is created, and Superset is selected from the optional services during cluster creation.

For more information, see Create a cluster.

Background information

Superset is integrated with EMR Druid clusters and also supports a variety of relational databases. Because EMR Druid supports SQL, you can access EMR Druid from Superset by using either the native query language of Apache Druid or SQL.
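
You can also send SQL to Druid directly, which helps when a chart in Superset returns unexpected results. The following minimal Python sketch uses only the standard library to build an HTTP POST request for the SQL endpoint of the Druid broker (/druid/v2/sql/). It assumes that the broker is reachable at emr-header-1 on port 18082, as in the Broker Port example in this topic, and that the broker does not require authentication. The wikipedia data source name is hypothetical.

```python
import json
import urllib.request

# Assumptions: the broker host and port follow the example in this topic
# (emr-header-1:18082) and the broker accepts unauthenticated requests.
BROKER_HOST = "emr-header-1"
BROKER_PORT = 18082


def build_druid_sql_request(sql: str, host: str = BROKER_HOST,
                            port: int = BROKER_PORT) -> urllib.request.Request:
    """Build, but do not send, a POST request for the Druid SQL endpoint."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/druid/v2/sql/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "wikipedia" is a hypothetical data source name.
request = build_druid_sql_request("SELECT COUNT(*) AS cnt FROM wikipedia")

# To send it from a machine that can reach the broker, for example through the SSH tunnel:
# with urllib.request.urlopen(request, timeout=10) as response:
#     rows = json.load(response)  # by default, a JSON array of row objects
```

Building the request separately from sending it keeps the sketch testable without a cluster.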

Limits

  • By default, Superset is installed on the emr-header-1 node of a cluster. Superset cannot be deployed in high-availability (HA) mode.
  • You cannot use Knox to access the web UI of Superset.
  • Before you use Superset, you must make sure that your computer can access the emr-header-1 node of the cluster. For more information, see Create an SSH tunnel to access web UIs of open source components.
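
After you create the SSH tunnel, you can confirm that the forwarded port accepts connections before you open the web UI in a browser. The following Python sketch only tests TCP reachability with the standard socket module. The local port 8088 is a hypothetical value; replace it with the local port that your tunnel forwards.

```python
import socket

# Hypothetical value: the local port that your SSH tunnel forwards to Superset.
LOCAL_PORT = 8088


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection raises OSError if the connection is refused,
        # times out, or the host name cannot be resolved.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    status = "reachable" if can_reach("127.0.0.1", LOCAL_PORT) else "not reachable"
    print(f"127.0.0.1:{LOCAL_PORT} is {status}")
```

A True result only shows that something is listening on the port. It does not confirm that the service behind the tunnel is Superset.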

Access EMR Druid from Superset

  1. Log on to the web UI of Superset.
    Create an SSH tunnel to log on to the web UI of Superset. For more information, see Create an SSH tunnel to access web UIs of open source components.
    The default username and password are both admin. Change the username and password after you log on to the web UI.
    Note: The web UI is displayed in English after you log on for the first time.
  2. Add an EMR Druid cluster.
    1. Choose Sources > Druid Clusters.
    2. Click the Add icon.
    3. In the Add Druid Cluster dialog box, configure the parameters described in the following table.
      Parameter     Description
      Broker Host   Enter emr-header-1.
      Broker Port   Enter the port number of the open source Druid broker prefixed with the digit 1. For example, if the port number of the open source broker is 8082, enter 18082.
      Cluster Name  Enter the name of the Druid cluster that you created in the EMR console.
    4. Click Save.
  3. Add a data source.
    1. Choose Sources > Druid Datasources.
    2. Click the Add icon.
    3. In the Add Druid Datasource dialog box, configure the parameters described in the following table.
      Parameter        Description
      Datasource Name  Enter a name for the data source.
      Cluster          Select the EMR Druid cluster that you added.
    4. Click Save.
      After you save the configuration, you can click the Edit icon to specify dimension columns and metric columns.
  4. View information about the added EMR Druid cluster.
    After the data source is added, you can click the data source name to view information about the added EMR Druid cluster.
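
The Broker Port rule in step 2 amounts to prefixing the digit 1 to the port number of the open source broker. The following Python sketch encodes that rule as described in this topic and rejects results outside the valid TCP port range. The function name is hypothetical.

```python
def emr_broker_port(open_source_port: int) -> int:
    """Prefix the digit 1 to an open source Druid broker port, for example 8082 -> 18082."""
    if not 1 <= open_source_port <= 65535:
        raise ValueError(f"invalid port: {open_source_port}")
    port = int(f"1{open_source_port}")
    # Prefixing a digit pushes five-digit ports past the TCP maximum of 65535.
    if port > 65535:
        raise ValueError(f"1{open_source_port} exceeds the maximum TCP port 65535")
    return port


print(emr_broker_port(8082))  # 18082
```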

Access a Hive database from Superset

Superset uses SQLAlchemy to connect to a variety of databases, such as MySQL, Oracle, PostgreSQL, and Microsoft SQL Server. Superset also supports big data query engines, such as Hive, Presto, and Druid. This section uses Hive, which is installed in EMR Hadoop clusters by default, as an example. For more information about how to access other types of databases from Superset, see the SQLAlchemy documentation.
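
Superset identifies each SQLAlchemy connection by a URI of the form dialect[+driver]://username:password@host:port/database. The following standard-library Python sketch splits such a URI into its parts, which can help you check a value before you enter it in the SQLAlchemy URI field. The helper name and the MySQL example credentials are hypothetical. If SQLAlchemy is installed, sqlalchemy.engine.make_url is the authoritative parser.

```python
from urllib.parse import urlsplit


def describe_sqlalchemy_uri(uri: str) -> dict:
    """Split a SQLAlchemy-style URI into dialect, driver, host, port, and database."""
    parts = urlsplit(uri)
    # The scheme holds the dialect and an optional driver, separated by "+".
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "username": parts.username,  # the password is deliberately not returned
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/") or None,
    }


print(describe_sqlalchemy_uri("hive://emr-header-1:10000/"))
print(describe_sqlalchemy_uri("mysql+pymysql://scott:tiger@db.example.com:3306/sales"))
```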

  1. Log on to the web UI of Superset.
    Create an SSH tunnel to log on to the web UI of Superset. For more information, see Create an SSH tunnel to access web UIs of open source components.

    The default username and password are both admin. Change the username and password after you log on to the web UI.

  2. Add a Hive database.
    1. Choose Sources > Databases.
    2. Click the Add icon.
    3. In the Add Database dialog box, configure the parameters described in the following table.
      Parameter       Description
      Database        Enter a display name for the database, for example, Hive JDBC Server.
      SQLAlchemy URI  Enter hive://emr-header-1:10000/. Port 10000 is the default HiveServer2 port.
    4. Click Save.
  3. Add a table.
    1. Choose Sources > Tables.
    2. Click the Add icon.
    3. In the Import a table definition dialog box, configure the parameters described in the following table.
      Parameter   Description
      Database    Select the database that you added.
      Table Name  Enter the name of a table that is stored in the database that you added. In this example, the test table is added.

    4. Click Save.
  4. Query data from the added database.
    1. Choose SQL Lab > SQL Editor.
    2. Select the database that you added. In this example, Hive JDBC Server is selected.
    3. Select the default schema.
    4. Run a Hive SQL statement to query data from the database.
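
You can also run the same query outside SQL Lab to check whether a problem lies in Superset or in HiveServer2. The following Python sketch assumes that SQLAlchemy and PyHive, which registers the hive:// dialect, are installed on a machine that can reach emr-header-1:10000. The helper names are hypothetical, and test is the example table added earlier. The table name is validated before it is interpolated into the statement because identifiers cannot be passed as bound parameters.

```python
import re

# Plain or database-qualified identifiers, for example "test" or "default.test".
_TABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def preview_query(table: str, limit: int = 10) -> str:
    """Build a SELECT statement that previews the first rows of a Hive table."""
    if not _TABLE_NAME.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM {table} LIMIT {limit}"


def run_hive_query(sql: str, uri: str = "hive://emr-header-1:10000/"):
    """Run sql through SQLAlchemy, as Superset does. Requires SQLAlchemy and PyHive."""
    # Imported lazily so that preview_query works without these packages.
    from sqlalchemy import create_engine, text

    engine = create_engine(uri)
    try:
        with engine.connect() as conn:
            return conn.execute(text(sql)).fetchall()
    finally:
        engine.dispose()


print(preview_query("test"))
# On a machine that can reach HiveServer2:
# rows = run_hive_query(preview_query("test"))
```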