You can connect Superset to a MaxCompute project and use Superset to explore and visualize data in the MaxCompute project. This topic describes how to use PyODPS to connect Superset to a MaxCompute project and visualize MaxCompute data on the Superset UI.
Background information
Apache Superset is a modern data exploration and visualization platform. It is a fast, lightweight, and intuitive platform that supports various types of charts, which range from simple line charts to highly detailed geospatial charts. For more information about Superset, see Superset.
Prerequisites
Make sure that the following conditions are met:
A MaxCompute project is created.
For more information about how to create a MaxCompute project, see Creates a MaxCompute project.
The AccessKey ID and AccessKey secret that are used to access the MaxCompute project are obtained.
You can obtain the AccessKey ID and AccessKey secret on the AccessKey Pair page.
Superset is installed.
You can install Superset by following instructions in the official Superset documentation. For more information, see Superset Documentation. You can also use Docker to install Superset and perform additional steps based on the official Superset documentation. For more information, see Adding New Database Drivers in Docker.
In this topic, Superset 1.1.0 is used.
PyODPS 0.10.7 or later is installed.
We recommend that you use Python 3 to install PyODPS. For more information, see Install PyODPS.
In this topic, PyODPS 0.10.7 is used.
Step 1: Connect Superset to MaxCompute
Start Superset.
For more information, see Superset Documentation.
In the top navigation bar, choose . Then, click +DATABASE in the upper-right corner.
In the Add Database dialog box, configure the parameters as described in the following table.
Parameter
Description
DATABASE NAME
The name of the data source that you want to add.
SQLALCHEMY URI
The SQLAlchemy connection string that is used to connect Superset to the MaxCompute project. You must configure this parameter in the format of
odps://<accesskey_id>:<accesskey_secret>@<MaxCompute_project_name>/?endpoint=<MaxCompute_endpoint>
, where:<accesskey_id>: required. The AccessKey ID that is used to access the MaxCompute project.
You can obtain the AccessKey ID on the AccessKey Pair page.
<accesskey_secret>: required. The AccessKey secret that corresponds to the AccessKey ID.
You can obtain the AccessKey secret on the AccessKey Pair page.
<MaxCompute_project_name>: required. The name of the MaxCompute project.
This parameter specifies the name of your MaxCompute project instead of the DataWorks workspace to which the MaxCompute project corresponds. You can log on to the MaxCompute console, select the region where your MaxCompute project resides in the top navigation bar, and then view the name of the MaxCompute project on the Project management tab.
<MaxCompute_endpoint>: required. The endpoint of MaxCompute. Configure this parameter based on the region where the MaxCompute project resides.
For more information about the endpoints of MaxCompute in different regions, see Endpoints.
If you want to enable the MaxCompute Query Acceleration (MCQA) feature, add the parameters that are described in the following table to the end of the SQLAlchemy connection string.
Parameter
Value
Description
interactive_mode
true
Specifies whether to enable the MCQA feature.
reuse_odps
true
Optional. This parameter specifies whether to forcibly reuse connections. We recommend that you set this parameter to true. By default, Superset forcibly creates a connection for each SQL request. The forcible reuse of connections simplifies the creation process.
fallback_policy
unsupported
upgrading
noresource
timeout
generic
default
all
Optional. This parameter specifies the fallback policy that is used when query acceleration fails. You must configure this parameter in the
<policy1>,<policy2>...
format. Valid values:unsupported: Fall back to the offline mode if the MCQA feature is not supported.
upgrading: Fall back to the offline mode if MaxCompute is in upgrade.
noresource: Fall back to the offline mode if resources are insufficient.
timeout: Fall back to the offline mode if a connection timeout error occurs.
generic: Fall back to the offline mode if an unknown error occurs.
default: Fall back to the offline node if the fallback condition for unsupported, upgrading, noresource, or timeout is met. If you do not specify fallback_policy in the SQLAlchemy connection string, this value is used as the fallback policy.
all: Fall back to the offline mode regardless of the reason why query acceleration fails.
For example, if you want to enable the MCQA feature and the forcible reuse of connections and also want to fall back to the offline mode if the fallback condition for unsupported, upgrading, or noresource is met, configure the SQLAlchemy connection string in the following format:
odps://<accesskey_id>:<accesskey_secret>@<MaxCompute_project_name>/?endpoint=<MaxCompute_endpoint>&interactive_mode=true&reuse_odps=true&fallback_policy=unsupported,upgrading,noresource
.Click TEST CONNECTION. If Connection looks good! appears in the lower-right corner of the page, the connectivity test is successful. In this case, click ADD to add the MaxCompute project to Superset.
Step 2: Use Superset to query and visualize data
After you configure the MaxCompute data source, you can add datasets to query and visualize data in MaxCompute tables. You can perform the following operations. For more information, see Superset.
View all tables in the MaxCompute project
In the top navigation bar of the Superset web UI, choose
. Then, click +DATASET in the upper-right corner of the page. In the Add dataset dialog box, set DATASOURCE to the name of the data source you added in Step 1 and SCHEMA to the name of your MaxCompute project. Then, all tables in the MaxCompute project are displayed in the TABLE drop-down list.View the schema of a table
In the top navigation bar of the Superset web UI, choose
. Then, click +DATASET in the upper-right corner of the page. In the Add dataset dialog box, set DATASOURCE to the name of the data source that you added in Step 1, set SCHEMA to the name of your MaxCompute project, set TABLE to the name of the table whose schema you want to view, and then click ADD. Then, the schema of the table is displayed in the Columns section of the page.View data in a table
Enter and run an SQL script in the SQL editor to view data in a table.
Visualize data
On the Datasets page, click a table, select a chart type, and then configure properties to visualize data in the table based on your business requirements.