E-MapReduce: Superset (available only for existing users)

Last Updated: Jul 13, 2023

Superset is a lightweight business intelligence (BI) tool. You can connect Superset to multiple data sources, analyze and visualize data, and build charts and dashboards. You can also import and export dashboards and manage the permissions of users and roles. This topic describes how to use Superset. An E-MapReduce (EMR) V3.34.0 cluster is used in the example.

Background information

Superset is deeply integrated with EMR Druid clusters and also supports various relational databases. Because EMR Druid supports SQL, you can use either the native query language of Apache Druid or SQL to access EMR Druid from Superset.

Prerequisites

An EMR Hadoop or Druid cluster is created, and Superset was selected from the optional services when the cluster was created. For more information, see Create a cluster.

Limits

  • By default, Superset is installed on the emr-header-1 node of a cluster. Superset cannot be deployed in high-availability (HA) mode.

  • You cannot use Knox to access the web UI of Superset.

  • Before you use Superset, you must make sure that your computer can access the emr-header-1 node of the cluster. For more information, see Create an SSH tunnel to access web UIs of open source components.
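As an illustration of the last limit, one common way to reach a port on emr-header-1 from your computer is SSH local port forwarding. The following is a minimal sketch, not the procedure from the linked topic. The helper name superset_tunnel_cmd is hypothetical, and the key file, public IP address, and port are placeholders that you must replace with the values of your cluster. The sketch assumes that you log on as the root user, and it only prints the command so that you can review it before you run it.

```shell
# Hypothetical helper: print an ssh command that forwards a local port
# to the same port on emr-header-1 through the master node.
# Arguments (all placeholders supplied by the caller):
#   $1 = path to the private key file
#   $2 = public IP address of the master node
#   $3 = port of the web UI on emr-header-1
superset_tunnel_cmd() {
  local key_file="$1" master_public_ip="$2" web_ui_port="$3"
  # -N: do not run a remote command; -L: local port forwarding.
  printf 'ssh -i %s -N -L %s:emr-header-1:%s root@%s\n' \
    "$key_file" "$web_ui_port" "$web_ui_port" "$master_public_ip"
}
```

For example, `superset_tunnel_cmd /path/to/key.pem 203.0.113.10 8088` prints a command that forwards local port 8088. The port shown here is a placeholder, so check which port the Superset web UI uses in your cluster.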

Access EMR Druid from Superset

  1. Log on to the web UI of Superset.

    Create an SSH tunnel to log on to the web UI of Superset. For more information, see Create an SSH tunnel to access web UIs of open source components.

    The default username and password are both admin. Change the password after you log on to the web UI.

    Note

    The English web UI appears after you log on for the first time.

  2. Add an EMR Druid cluster.

    1. Choose Sources > Druid Clusters.

    2. Click the Add icon.

    3. In the Add Druid Cluster dialog box, configure the parameters.

      | Parameter | Description |
      | --- | --- |
      | Broker Host | Enter emr-header-1. This value is fixed. |
      | Broker Port | Prefix the port number of the open source broker with 1. For example, if the port number of the open source broker is 8082, set this parameter to 18082. |
      | Cluster Name | Enter the name of the Druid cluster that you created in the EMR console. |

    4. Click Save.

  3. Add a data source.

    1. Choose Sources > Druid Datasources.

    2. Click the Add icon.

    3. In the Add Druid Datasource dialog box, configure the parameters.

      | Parameter | Description |
      | --- | --- |
      | Datasource Name | Enter a custom name for the data source. |
      | Cluster | The name of the EMR Druid cluster that you added. |

    4. Click Save.

      After you save the configuration, you can click the Edit icon to specify dimension columns and metric columns.

  4. View information about the added EMR Druid cluster.

    After the data source is added, you can click the data source name to view information about the added EMR Druid cluster.
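Before you add the cluster in Superset, it can help to confirm that the broker answers on the derived port. The following is a minimal sketch that derives the EMR broker port from the open source port and sends a statement to the Druid SQL HTTP endpoint (/druid/v2/sql). The function names are hypothetical. The sketch assumes that SQL is enabled on the broker, that curl is installed, and that the machine on which you run it can reach emr-header-1.

```shell
# Derive the EMR broker port: prefix the open source port with 1 (8082 -> 18082).
druid_broker_port() {
  printf '1%s\n' "$1"
}

# Build the URL of the Druid SQL endpoint on the broker.
#   $1 = broker host, $2 = port number of the open source broker
druid_sql_url() {
  local host="$1" open_source_port="$2"
  printf 'http://%s:%s/druid/v2/sql\n' "$host" "$(druid_broker_port "$open_source_port")"
}

# Send one SQL statement to the broker and print the JSON response.
# Note: the statement is embedded in JSON as-is, so it must not contain
# double quotes or backslashes.
druid_sql_query() {
  local url="$1" sql="$2"
  curl -sS -X POST -H 'Content-Type: application/json' \
    -d "{\"query\": \"${sql}\"}" "$url"
}
```

For example, `druid_sql_query "$(druid_sql_url emr-header-1 8082)" "SELECT COUNT(*) FROM my_datasource"` queries a data source named my_datasource, which is a placeholder for one of your own data sources.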

Access a Hive database from Superset

SQLAlchemy is integrated into Superset, which allows Superset to connect to a variety of databases and their SQL dialects, such as MySQL, Oracle, PostgreSQL, and Microsoft SQL Server. Superset also supports big data query engines, such as Hive, Presto, and Druid. This section describes how to access a Hive database from Superset. Hive is installed in EMR Hadoop clusters by default. For more information about how to access other types of databases from Superset, see SQLAlchemy.

  1. Log on to the web UI of Superset.

    Create an SSH tunnel to log on to the web UI of Superset. For more information, see Create an SSH tunnel to access web UIs of open source components.

    The default username and password are both admin. Change the password after you log on to the web UI.

  2. Add a Hive database.

    1. Choose Sources > Databases.

    2. Click the Add icon.

    3. In the Add Database dialog box, configure the parameters.

      | Parameter | Description |
      | --- | --- |
      | Database | The name of the database that you want to add. |
      | SQLAlchemy URI | Enter hive://emr-header-1:10000/. |

    4. Click Save.

  3. Add a table.

    1. Choose Sources > Tables.

    2. Click the Add icon.

    3. In the Import a table definition dialog box, configure the parameters.

      | Parameter | Description |
      | --- | --- |
      | Database | The name of the database that you added. |
      | Table Name | The name of a table that is stored in the database that you added. In this example, the test table is added. |

    4. Click Save.

  4. Query data from the added database.

    1. Choose SQL Lab > SQL Editor.

    2. Select the database that you added. In this example, Hive JDBC Server is selected.

    3. Select the default schema.

    4. Run a HiveQL statement to query data from the database.
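If a query in SQL Lab fails, you can check the same HiveServer2 endpoint outside Superset. The following is a minimal sketch that converts the SQLAlchemy URI from the preceding steps into a HiveServer2 JDBC URL and runs one statement with Beeline. The function names are hypothetical. The sketch assumes that the beeline client is available on the node where you run it and that HiveServer2 does not require additional authentication.

```shell
# Convert the SQLAlchemy URI used in Superset into a HiveServer2 JDBC URL
# by replacing the hive:// prefix with jdbc:hive2://.
hive_jdbc_url() {
  printf 'jdbc:hive2://%s\n' "${1#hive://}"
}

# Run one HiveQL statement through Beeline.
#   $1 = SQLAlchemy URI, $2 = HiveQL statement
hive_query() {
  local sqlalchemy_uri="$1" statement="$2"
  # -u: JDBC URL to connect to; -e: statement to execute.
  beeline -u "$(hive_jdbc_url "$sqlalchemy_uri")" -e "$statement"
}
```

For example, `hive_query 'hive://emr-header-1:10000/' 'SELECT * FROM test LIMIT 10'` reads a few rows from the test table that is used in this example.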

FAQ

  • Problem description: The first time the admin user logs on to the web UI of Superset in an EMR cluster of a minor version earlier than V3.33 or V4.6, the "invalid login" message appears.

  • Solution

    1. Log on to the master node of your EMR cluster in SSH mode. For more information, see Log on to a cluster.

      Important

      You must perform the following steps as the root user.

    2. Run the following command to activate the Superset environment so that the Superset command-line interface (CLI) is available:

      source /usr/lib/superset-current/bin/activate

    3. Run the following command to create an administrator:

      superset fab create-admin

      Enter the username and password and confirm the password as prompted.

      Username [admin]:
      User first name [admin]:
      User last name [user]:
      Email [admin@fab.org]:
      Password:
      Repeat for confirmation:
      Recognized Database Authentications.
      Admin User admin created.

    4. Initialize the user that you created.

      1. Run the following command to initialize the database:

        superset db upgrade

      2. Run the following command to initialize Superset:

        superset init

        After you perform the preceding operations, you must create an SSH tunnel that is used to access the web UIs of open source components. Then, you can log on to the web UI of Superset as the user that you created. For more information about how to create an SSH tunnel, see Create an SSH tunnel to access web UIs of open source components.
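The commands in the solution can also be chained so that a failed step stops the sequence. The following is a minimal sketch under that assumption. The helper names run and recreate_superset_admin and the DRY_RUN variable are hypothetical, and the commands and their order are taken from the preceding steps. When DRY_RUN is set to 1, the commands are only printed, which lets you review them first.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Re-create the Superset administrator in the same order as the steps above.
# Each step runs only if the previous one succeeded.
# create-admin still prompts for the username and password interactively.
recreate_superset_admin() {
  run . /usr/lib/superset-current/bin/activate &&
    run superset fab create-admin &&
    run superset db upgrade &&
    run superset init
}
```

Run `DRY_RUN=1 recreate_superset_admin` to list the four commands, and run `recreate_superset_admin` as the root user on the master node to execute them.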