Create a cluster in single-tenant multi-engine mode - Dataphin

Permissions

Users with the super administrator, system administrator, or a custom global role that includes the Cluster Management - Manage permission can create and manage clusters. These users can also specify which users can reference the cluster when creating a compute source and assign cluster administrators to the cluster.
Cluster administrators can manage their assigned clusters.
Users with a global role that includes the Compute Source Management - Create permission can reference authorized clusters when creating a compute source.

Create a cluster

In the top navigation bar of the Dataphin homepage, choose Plan > Cluster Management.
On the Cluster Management page, click Create Cluster.

On the Create Cluster page, configure the following parameters.

Basic information

Parameter	Description
Cluster name	Enter a name for the cluster. The name can contain Chinese characters, English letters, digits, spaces, and the following special characters: `-_.@~()`. The name cannot exceed 128 characters.
Engine type	Select one of the following engines: MaxCompute, AnalyticDB for PostgreSQL, Aliyun EMR 3.x, Aliyun EMR 5.x, CDH 5.x, CDH 6.x, Cloudera Data Platform 7.x, Huawei FusionInsight 8.x, AsiaInfo DP 5.3, StarRocks, Databricks, Amazon EMR, SelectDB, Doris, GaussDB (DWS), Transwarp TDH 6.x, Transwarp TDH 9.3.x, Transwarp ArgoDB, Lindorm (compute engine), Hologres, OushuDB, Aliyun EMR Serverless Spark.
Cluster administrator	Select one or more tenant members as cluster administrators. Cluster administrators can edit the cluster, view its version history, and delete it.
Description (Optional)	Enter a brief description for the cluster. The description cannot exceed 128 characters.
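
The naming rules above can be checked before a form is submitted. The following is a minimal sketch, not Dataphin's own validation: the function names are hypothetical, "Chinese characters" is approximated by the CJK Unified Ideographs block, and an empty name is assumed to be invalid.

```python
import re

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00 to U+9FFF); Dataphin's real check may differ.
_CLUSTER_NAME_RE = re.compile(r"[\u4e00-\u9fffA-Za-z0-9 \-_.@~()]{1,128}")
MAX_DESCRIPTION_LENGTH = 128


def is_valid_cluster_name(name: str) -> bool:
    """Return True if the name uses only allowed characters and is 1-128 long."""
    return _CLUSTER_NAME_RE.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """The description is optional, so an empty string is accepted."""
    return len(description) <= MAX_DESCRIPTION_LENGTH
```

For example, `is_valid_cluster_name("prod-cluster_01 (test)")` is accepted, while a name containing `/` or longer than 128 characters is rejected.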

Cluster security control

Authorized users: Specify which users can reference this cluster configuration when creating a compute source. You can select Roles with "Create Compute Source" permission or Specified users.
- Roles with "Create Compute Source" permission: Selected by default.
- Specified users: You can select one or more individual accounts and user groups.
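
One way to picture the two options is as a small authorization check. The sketch below is illustrative only: the class, field, and function names are hypothetical, and it assumes that a specified user still needs the Compute Source Management - Create permission described under Permissions.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class ClusterAuthorization:
    # True models the default option: roles with the
    # "Create Compute Source" permission.
    all_creators: bool = True
    # Used only when all_creators is False (the "Specified users" option).
    users: Set[str] = field(default_factory=set)
    groups: Set[str] = field(default_factory=set)


def can_reference_cluster(
    auth: ClusterAuthorization,
    user: str,
    user_groups: Iterable[str],
    has_create_permission: bool,
) -> bool:
    """Decide whether a user may reference the cluster in a compute source."""
    # Assumption: creating a compute source always requires the permission.
    if not has_create_permission:
        return False
    if auth.all_creators:
        return True
    return user in auth.users or bool(auth.groups & set(user_groups))
```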

Cluster configuration

MaxCompute

Parameter	Description
Endpoint	Enter the endpoint of the compute engine, for example, `http://service.odps.aliyun.com/api`.
AccessKey ID, AccessKey Secret	Enter the AccessKey ID and AccessKey Secret of an account with access to the MaxCompute project data. You can obtain the AccessKey ID and AccessKey Secret from the User Information Management page. Important: For a stable connection between your Dataphin and MaxCompute projects, use the AccessKey of a MaxCompute project administrator. To ensure proper metadata collection, avoid changing the AccessKey of the MaxCompute project.

Hadoop

Hadoop includes the CDH 5.x, CDH 6.x, Cloudera Data Platform 7.x, Aliyun EMR 3.x, Aliyun EMR 5.x, AsiaInfo DP 5.3, and Huawei FusionInsight 8.x engines.

In single-tenant multi-engine mode, the configuration for Hadoop clusters, HDFS compute engines, Hive metadata, Spark JAR services, Spark SQL services, and Impala tasks is the same as in single-engine mode. For more information, see Hadoop cluster configuration.

AnalyticDB for PostgreSQL

In single-tenant multi-engine mode, the cluster configuration for AnalyticDB for PostgreSQL is the same as in single-engine mode. For more information, see AnalyticDB for PostgreSQL cluster configuration.

Transwarp TDH 6.x and 9.3.x

In single-tenant multi-engine mode, the configuration for Transwarp TDH 6.x and Transwarp TDH 9.3.x clusters, HDFS information, Inceptor, and Inceptor metadata connection is the same as in single-engine mode. For more information, see Transwarp TDH cluster configuration.

Transwarp ArgoDB

In single-tenant multi-engine mode, the configuration for Transwarp ArgoDB clusters, HDFS information, ArgoDB, and ArgoDB metadata connection is the same as in single-engine mode. For more information, see Transwarp ArgoDB cluster configuration.

SelectDB, Doris, and StarRocks

In single-tenant multi-engine mode, the cluster configurations for SelectDB, Doris, and StarRocks are the same as in single-engine mode. For more information, see SelectDB and Doris cluster configuration and StarRocks cluster configuration.

Databricks

In single-tenant multi-engine mode, the cluster configuration for Databricks is the same as in single-engine mode. For more information, see Databricks cluster configuration.

Amazon EMR

In single-tenant multi-engine mode, the cluster configuration for Amazon EMR is the same as in single-engine mode. For more information, see Amazon EMR cluster configuration.

Lindorm (compute engine)

Parameter	Description
core-site.xml, hdfs-site.xml, hive-site.xml (Optional)	Upload the core-site.xml, hdfs-site.xml, and hive-site.xml configuration files for Lindorm (compute engine). For more information about these files, see Connect to and use an instance.
JDBC URL	Configure the JDBC URL for Lindorm (compute engine). To obtain the URL, see View connection string.
Username, Password	The username and password to access the Lindorm instance.

GaussDB (DWS)

Parameter	Description
Version	Only version 9.1.0 is currently supported.
JDBC URL	Enter the JDBC connection string, for example, `jdbc:postgresql://{host}:{port}/{database name}`.
Username, Password	Enter the username and password for the GaussDB (DWS) compute engine database.
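
As a rough illustration of the connection string format, the sketch below assembles a URL from its parts, assuming the usual PostgreSQL-style `host:port` separator. The function name, host, port, and database values are placeholders, not values from a real deployment.

```python
def build_gaussdb_jdbc_url(host: str, port: int, database: str) -> str:
    """Assemble a PostgreSQL-style JDBC URL from host, port, and database."""
    if not host or not database:
        raise ValueError("host and database must be non-empty")
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return f"jdbc:postgresql://{host}:{port}/{database}"


# Placeholder values for illustration only.
url = build_gaussdb_jdbc_url("dws.example.com", 8000, "analytics")
```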

Hologres

Parameter	Description
JDBC URL	The connection string for the Hologres compute source. The format is `jdbc:postgresql://host:port/dbname`.
Username, Password	Enter the username and password for connecting to the compute source. If you use an Alibaba Cloud RAM account, enter its AccessKey ID and AccessKey Secret. If you use a database-native account, enter the ID and password for that account.
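
Before entering a Hologres connection string, it can help to confirm that it matches the `jdbc:postgresql://host:port/dbname` format. The parser below is a hypothetical helper written for this page. It is stricter than a real JDBC driver: it rejects URLs that carry query parameters, which the service itself may accept.

```python
import re
from typing import NamedTuple


class JdbcTarget(NamedTuple):
    host: str
    port: int
    dbname: str


# Matches exactly jdbc:postgresql://host:port/dbname with no extra parts.
_HOLOGRES_URL_RE = re.compile(
    r"jdbc:postgresql://(?P<host>[^:/?#]+):(?P<port>\d+)/(?P<dbname>[^/?#]+)"
)


def parse_hologres_jdbc_url(url: str) -> JdbcTarget:
    """Split a Hologres JDBC URL into host, port, and database name."""
    match = _HOLOGRES_URL_RE.fullmatch(url)
    if match is None:
        raise ValueError(f"not in jdbc:postgresql://host:port/dbname form: {url!r}")
    return JdbcTarget(match["host"], int(match["port"]), match["dbname"])
```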

OushuDB

Parameter	Description
Version	Only version 6.4.0 is currently supported.
JDBC URL	Enter the JDBC URL in the format `jdbc:oushudb://{host}:{port}/`.
Default execution user, Password	Enter the authentication username and password. To ensure tasks run properly, make sure the user has the required data permissions.

Aliyun EMR Serverless Spark

Parameter	Description
Endpoint	Enter the endpoint for the Aliyun EMR Serverless Spark OpenAPI (SDK).
AccessKey ID, AccessKey Secret	Enter the AccessKey ID and AccessKey Secret.
Workspace	Select a workspace that the RAM account associated with the AccessKey has joined (ListWorkspaces).

Other configurations

MaxCompute

Parameter	Description
Default storage format for external tables	The default storage format for new external tables. You can select one of the following formats: parquet, avro, rcfile, orc, textfile, sequencefile.
MCQA acceleration for ad hoc queries	Enables MCQA acceleration for ad hoc queries in MaxCompute engine projects.
LogView URL in logs	Specifies the display format for LogView URLs in logs. You can select Display in plaintext or Hide when execution statements contain global variables for username and password.
Default lifecycle	The default lifecycle for physical and logical tables. You can enter a value from 1 to 36,500 days, or select 7, 14, 30, or 360 days.
Enable custom parameters	Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation — for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.
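
The lifecycle limits above amount to a simple range check. The following sketch uses hypothetical names and only mirrors the numbers stated in the table. It is not how Dataphin itself validates the field.

```python
# Preset choices and bounds as listed in the table above.
PRESET_LIFECYCLE_DAYS = (7, 14, 30, 360)
MIN_LIFECYCLE_DAYS = 1
MAX_LIFECYCLE_DAYS = 36500


def validate_lifecycle_days(days: int) -> int:
    """Return days unchanged if it is a whole number within the allowed range."""
    if isinstance(days, bool) or not isinstance(days, int):
        raise TypeError("lifecycle must be a whole number of days")
    if not MIN_LIFECYCLE_DAYS <= days <= MAX_LIFECYCLE_DAYS:
        raise ValueError("lifecycle must be between 1 and 36,500 days")
    return days
```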

Hadoop

Hadoop includes the CDH 5.x, CDH 6.x, Cloudera Data Platform 7.x, Aliyun EMR 3.x, Aliyun EMR 5.x, AsiaInfo DP 5.3, and Huawei FusionInsight 8.x engines.

Parameter	Description
Default storage format	The default storage format for new tables created in Table Management. You can select one of the following formats: Engine Default (can be specified in CREATE TABLE statements), hudi, delta (Delta Lake), paimon, iceberg, kudu, parquet, avro, rcfile, orc, textfile, sequencefile. Note: You can select the hudi, delta (Delta Lake), paimon, or iceberg formats only after you enable Spark SQL Service Configuration. You can select the kudu format only after you enable Impala Task Configuration.
Default compute engine for standard modeling	You can select Hive, Spark, or Impala. Note: You can select Spark only after you enable Spark SQL Service Configuration. You can select Impala only after you enable Impala Task Configuration.
Enable custom parameters	Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation — for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.
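
The dependencies between these options and the Spark SQL and Impala settings can be summarized in a few lines. This is a sketch under stated assumptions: the function names and the ordering of the returned lists are illustrative, and `delta` stands for the delta (Delta Lake) option.

```python
from typing import List

# Formats that are always selectable for Hadoop engines.
BASE_FORMATS = ["parquet", "avro", "rcfile", "orc", "textfile", "sequencefile"]
# Formats that require Spark SQL Service Configuration to be enabled.
SPARK_SQL_FORMATS = ["hudi", "delta", "paimon", "iceberg"]
# Formats that require Impala Task Configuration to be enabled.
IMPALA_FORMATS = ["kudu"]


def selectable_storage_formats(spark_sql_enabled: bool, impala_enabled: bool) -> List[str]:
    """List the default storage formats that can currently be selected."""
    formats = ["Engine Default"]
    if spark_sql_enabled:
        formats += SPARK_SQL_FORMATS
    if impala_enabled:
        formats += IMPALA_FORMATS
    return formats + BASE_FORMATS


def selectable_modeling_engines(spark_sql_enabled: bool, impala_enabled: bool) -> List[str]:
    """List the compute engines available for standard modeling."""
    engines = ["Hive"]
    if spark_sql_enabled:
        engines.append("Spark")
    if impala_enabled:
        engines.append("Impala")
    return engines
```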

AnalyticDB for PostgreSQL and OushuDB

Enable custom parameters: Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation — for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.

Transwarp TDH 6.x and 9.3.x, Lindorm (compute engine), and Aliyun EMR Serverless Spark

Parameter	Description
Default storage format	The default storage format for new tables created in Table Management. You can select one of the following formats: Engine Default (can be specified in CREATE TABLE statements), parquet, avro, rcfile, orc, textfile, sequencefile.
Enable custom parameters	Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation, for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.

Transwarp ArgoDB, SelectDB, StarRocks, and Doris

Databricks

Parameter	Description
Default storage format	The default storage format for new tables created in Table Management. You can select one of the following formats: Engine Default (can be specified in CREATE TABLE statements), parquet, avro, orc, binaryfile, csv, json, text.
Enable custom parameters

Amazon EMR

Parameter	Description
Default storage format	The default storage format for new tables created in Table Management. You can select one of the following formats: Engine Default (can be specified in CREATE TABLE statements), hudi, delta (Delta Lake), paimon, iceberg, parquet, avro, rcfile, orc, textfile, sequencefile. Note: You can select the hudi, delta (Delta Lake), paimon, or iceberg formats only after you enable Spark SQL Service Configuration.
Default compute engine for standard modeling	You can select Hive or Spark. Note: You can select Spark only after you enable Spark SQL Service Configuration.
Enable custom parameters	Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation — for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.

Click Test connection. The system automatically tests the connection to each configured service.

If the test passes, you can save the configuration. If it fails, a Connection Test Failed dialog box lists the failed services with error details.
After the connection test succeeds, click Save to create the cluster.