You can connect Superset to a MaxCompute project and use Superset to explore and visualize data in the MaxCompute project. This topic describes how to use PyODPS to connect Superset to a MaxCompute project and visualize MaxCompute data on the Superset UI.

Background information

Apache Superset is a modern data exploration and visualization platform. It is a fast, lightweight, and intuitive platform that supports various types of charts, which range from simple line charts to highly detailed geospatial charts. For more information about Superset, see Superset.

Prerequisites

Make sure that the following conditions are met:
  • A MaxCompute project is created.

    For more information about how to create a MaxCompute project, see Create a MaxCompute project.

  • The AccessKey ID and AccessKey secret that are used to access the MaxCompute project are obtained.

    You can obtain the AccessKey ID and AccessKey secret on the AccessKey Pair page.

  • Superset is installed.

    You can install Superset by following instructions in the official Superset documentation. For more information, see Install Superset. You can also use Docker to install Superset and perform additional steps based on the official Superset documentation. For more information, see Adding New Database Drivers in Docker.

    In this topic, Superset 1.1.0 is used.

  • PyODPS 0.10.7 or later is installed.

    We recommend that you use Python 3 to install PyODPS. For more information, see Install PyODPS.

    In this topic, PyODPS 0.10.7 is used.
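
The PyODPS version requirement above can be checked from the Python environment that runs Superset. The following is a minimal sketch, not an official tool. It assumes that PyODPS was installed from the pyodps distribution on PyPI and that Python 3.8 or later is used. The helper names parse_version and meets_minimum are hypothetical and exist only for this illustration.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum PyODPS version named in the prerequisites.
MINIMUM = (0, 10, 7)


def parse_version(text):
    """Turn a version string such as '0.10.7' into a tuple such as (0, 10, 7).

    Only the leading digits of the first three dot-separated parts are used,
    so a suffix such as 'rc1' is ignored.
    """
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed):
    """Return True if the installed version is at least MINIMUM."""
    return parse_version(installed) >= MINIMUM


if __name__ == "__main__":
    try:
        # Assumption: the distribution name is "pyodps".
        installed = version("pyodps")
    except PackageNotFoundError:
        print("PyODPS is not installed in this environment.")
    else:
        status = "OK" if meets_minimum(installed) else "too old"
        print(f"PyODPS {installed}: {status}")
```

Comparing tuples of integers instead of raw strings avoids ranking 0.9.x above 0.10.x.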

Step 1: Connect Superset to MaxCompute

  1. Start Superset.
    For more information, see Start Superset.
  2. In the top navigation bar, choose Data > Databases. Then, click +DATABASE in the upper-right corner.
  3. In the Add Database dialog box, configure the following parameters.
    DATABASE NAME: The name of the data source that you want to add.
    SQLALCHEMY URI: The SQLAlchemy connection string that is used to connect Superset to the MaxCompute project. You must configure this parameter in the following format:

      odps://<accesskey_id>:<accesskey_secret>@<MaxCompute_project_name>/?endpoint=<MaxCompute_endpoint>

    The placeholders are described as follows:
    • <accesskey_id>: required. The AccessKey ID that is used to access the MaxCompute project.

      You can obtain the AccessKey ID on the AccessKey Pair page.

    • <accesskey_secret>: required. The AccessKey secret that corresponds to the AccessKey ID.

      You can obtain the AccessKey secret on the AccessKey Pair page.

    • <MaxCompute_project_name>: required. The name of the MaxCompute project to which you want to connect Superset.

      This parameter specifies the name of your MaxCompute project instead of the DataWorks workspace to which the MaxCompute project corresponds. You can log on to the MaxCompute console, select the region where your MaxCompute project resides in the top navigation bar, and then view the name of the MaxCompute project on the Project management tab.

    • <MaxCompute_endpoint>: required. The endpoint of MaxCompute. Configure this parameter based on the region where the MaxCompute project resides.

      For more information about the endpoints of MaxCompute in different regions, see Endpoints.

    If you want to enable the MaxCompute Query Acceleration (MCQA) feature, append the following parameters to the end of the SQLAlchemy connection string.

    interactive_mode: Set this parameter to true to enable the MCQA feature.
    reuse_odps: Optional. Set this parameter to true to forcibly reuse connections. We recommend that you set this parameter to true. By default, Superset creates a new connection for each SQL request. Forcibly reusing connections avoids this repeated connection creation.
    fallback_policy: Optional. The fallback policy that is used when query acceleration fails. Specify one or more values in the <policy1>,<policy2>... format. Valid values:
    • unsupported: Fall back to the offline mode if the MCQA feature is not supported.
    • upgrading: Fall back to the offline mode if MaxCompute is being upgraded.
    • noresource: Fall back to the offline mode if resources are insufficient.
    • timeout: Fall back to the offline mode if a connection timeout error occurs.
    • generic: Fall back to the offline mode if an unknown error occurs.
    • default: Fall back to the offline mode if the fallback condition for unsupported, upgrading, noresource, or timeout is met. If you do not specify fallback_policy in the SQLAlchemy connection string, this value is used as the fallback policy.
    • all: Fall back to the offline mode regardless of the reason why query acceleration fails.

    For example, to enable the MCQA feature and the forcible reuse of connections, and to fall back to the offline mode if the fallback condition for unsupported, upgrading, or noresource is met, configure the SQLAlchemy connection string in the following format:

      odps://<accesskey_id>:<accesskey_secret>@<MaxCompute_project_name>/?endpoint=<MaxCompute_endpoint>&interactive_mode=true&reuse_odps=true&fallback_policy=unsupported,upgrading,noresource

  4. Click TEST CONNECTION. If the message "Connection looks good!" appears in the lower-right corner of the page, the connectivity test is successful. Then, click ADD to add the MaxCompute project to Superset.
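
The connection string format described in this step can also be assembled in a short script before it is pasted into the SQLALCHEMY URI field. The following is a minimal sketch under stated assumptions, not part of Superset or PyODPS. The function name build_odps_uri and its arguments are hypothetical, the values are inserted verbatim in the same way as the format in this topic, and the endpoint in the usage example is a placeholder rather than a real MaxCompute endpoint.

```python
# Valid fallback_policy values, as listed in this topic.
VALID_FALLBACK_POLICIES = {
    "unsupported", "upgrading", "noresource",
    "timeout", "generic", "default", "all",
}


def build_odps_uri(access_id, access_secret, project, endpoint,
                   interactive_mode=False, reuse_odps=False,
                   fallback_policy=()):
    """Assemble a connection string in the odps:// format used in this topic.

    Hypothetical helper: values are concatenated verbatim, without
    URL-encoding, to mirror the documented format.
    """
    uri = f"odps://{access_id}:{access_secret}@{project}/?endpoint={endpoint}"
    if interactive_mode:
        # Enables the MCQA feature.
        uri += "&interactive_mode=true"
    if reuse_odps:
        # Forcibly reuses connections instead of creating one per request.
        uri += "&reuse_odps=true"
    if fallback_policy:
        unknown = [p for p in fallback_policy if p not in VALID_FALLBACK_POLICIES]
        if unknown:
            raise ValueError(f"Unknown fallback policy: {', '.join(unknown)}")
        uri += "&fallback_policy=" + ",".join(fallback_policy)
    return uri


if __name__ == "__main__":
    # Placeholder values only; replace them with your own settings.
    print(build_odps_uri(
        "<accesskey_id>",
        "<accesskey_secret>",
        "<MaxCompute_project_name>",
        "<MaxCompute_endpoint>",
        interactive_mode=True,
        reuse_odps=True,
        fallback_policy=["unsupported", "upgrading", "noresource"],
    ))
```

Validating the fallback policy names before the string is built catches a misspelled value early, instead of at connection time.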

Step 2: Use Superset to query and visualize data

After you configure the MaxCompute data source, you can add datasets to query and visualize data in MaxCompute tables. You can perform the following operations. For more information, see Superset.
  • View all tables in the MaxCompute project

    In the top navigation bar of the Superset web UI, choose Data > Datasets. Then, click +DATASET in the upper-right corner of the page. In the Add dataset dialog box, set DATASOURCE to the name of the data source you added in Step 1 and SCHEMA to the name of your MaxCompute project. Then, all tables in the MaxCompute project are displayed in the TABLE drop-down list.

  • View the schema of a table

    In the top navigation bar of the Superset web UI, choose Data > Datasets. Then, click +DATASET in the upper-right corner of the page. In the Add dataset dialog box, set DATASOURCE to the name of the data source that you added in Step 1, set SCHEMA to the name of your MaxCompute project, set TABLE to the name of the table whose schema you want to view, and then click ADD. Then, the schema of the table is displayed in the Columns section of the page.

  • View data in a table

    In the top navigation bar of the Superset web UI, choose SQL Lab > SQL Editor. Then, enter and run an SQL statement in the SQL editor to view data in a table.

  • Visualize data

    On the Datasets page, click a table, select a chart type, and then configure properties to visualize data in the table based on your business requirements.
