Permissions
-
Users with the super administrator role, the system administrator role, or a custom global role that includes the Cluster Management - Manage permission can create and manage clusters. These users can also assign cluster administrators to a cluster and specify which users can reference the cluster when creating a compute source.
-
Cluster administrators can manage their assigned clusters.
-
Users with a global role that includes the Compute Source Management - Create permission can reference authorized clusters when creating a compute source.
Create a cluster
-
In the top navigation bar of the Dataphin homepage, choose Plan > Cluster Management.
-
On the Cluster Management page, click Create Cluster.
-
On the Create Cluster page, configure the following parameters.
-
Basic information
Parameter
Description
Cluster name
Enter a name for the cluster. The name can contain Chinese characters, English letters, digits, spaces, and the special characters -_.@~() and cannot exceed 128 characters.
Engine type
Select one of the following engines:
-
MaxCompute
-
AnalyticDB for PostgreSQL
-
Aliyun EMR 3.x
-
Aliyun EMR 5.x
-
CDH 5.x
-
CDH 6.x
-
Cloudera Data Platform 7.x
-
Huawei FusionInsight 8.x
-
AsiaInfo DP 5.3
-
StarRocks
-
Databricks
-
Amazon EMR
-
SelectDB
-
Doris
-
GaussDB(DWS)
-
Transwarp TDH 6.x
-
Transwarp TDH 9.3.x
-
Transwarp ArgoDB
-
Lindorm (compute engine)
-
Hologres
-
OushuDB
-
Aliyun EMR Serverless Spark
Cluster administrator
Select one or more tenant members as cluster administrators. Cluster administrators can edit the cluster, view its version history, and delete it.
Description (Optional)
Enter a brief description for the cluster. The description cannot exceed 128 characters.
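The naming rule for Cluster name and the length limit for Description can be checked before you fill in the form. The following is a minimal sketch, not Dataphin's own validation logic. It assumes that "Chinese characters" means the CJK Unified Ideographs block (U+4E00 to U+9FFF), that "English letters" means ASCII letters, and that an empty name is rejected. The function names are hypothetical.

```python
import re

# Assumption: "Chinese characters" = U+4E00-U+9FFF, "English letters" = A-Z / a-z.
# Allowed special characters from the table: - _ . @ ~ ( ) plus the space.
_CLUSTER_NAME_RE = re.compile(r"[\u4e00-\u9fffA-Za-z0-9 _.@~()\-]{1,128}")

MAX_DESCRIPTION_LENGTH = 128


def is_valid_cluster_name(name: str) -> bool:
    """Return True if the name uses only allowed characters and is 1-128 long."""
    return _CLUSTER_NAME_RE.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """The description is optional, so an empty string is accepted."""
    return len(description) <= MAX_DESCRIPTION_LENGTH
```

For example, `is_valid_cluster_name("prod cluster_01 (hz)")` passes, while a name containing `/` or longer than 128 characters does not.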
-
Cluster security control
Authorized users: Specify which users can reference this cluster configuration when creating a compute source. You can select Roles with "Create Compute Source" permission or Specified users.
-
Roles with "Create Compute Source" permission: Selected by default.
-
Specified users: You can select one or more individual accounts and user groups.
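The two Authorized users options can be read as a simple access rule. The sketch below models that rule under stated assumptions: the class and field names are hypothetical, and it assumes that a specified user still needs the Compute Source Management - Create permission described in the Permissions section. It illustrates the documented behavior and is not Dataphin's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    groups: set[str] = field(default_factory=set)
    # True if a global role grants "Compute Source Management - Create".
    can_create_compute_source: bool = False


@dataclass
class ClusterAuthorization:
    # True  -> Roles with "Create Compute Source" permission (the default)
    # False -> Specified users (individual accounts and user groups)
    all_creators: bool = True
    users: set[str] = field(default_factory=set)
    groups: set[str] = field(default_factory=set)


def can_reference_cluster(user: User, auth: ClusterAuthorization) -> bool:
    """Decide whether a user may reference the cluster in a new compute source."""
    if not user.can_create_compute_source:
        return False
    if auth.all_creators:
        return True
    # Specified users: match by account name or by any shared user group.
    return user.name in auth.users or bool(user.groups & auth.groups)
```

With the default setting, every user who can create compute sources may reference the cluster. With Specified users, only the listed accounts and the members of the listed groups may do so.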
-
Cluster configuration
MaxCompute
Parameter
Description
Endpoint
Enter the endpoint of the compute engine, for example, http://service.odps.aliyun.com/api.
AccessKey ID, AccessKey Secret
Enter the AccessKey ID and AccessKey Secret of an account with access to the MaxCompute project data.
You can obtain the AccessKey ID and AccessKey Secret from the User Information Management page.
Important
-
For a stable connection between your Dataphin and MaxCompute projects, use the AccessKey of a MaxCompute project administrator.
-
To ensure proper metadata collection, avoid changing the AccessKey of the MaxCompute project.
Hadoop
Hadoop includes the CDH 5.x, CDH 6.x, Cloudera Data Platform 7.x, Aliyun EMR 3.x, Aliyun EMR 5.x, AsiaInfo DP 5.3, and Huawei FusionInsight 8.x engines.
In single-tenant multi-engine mode, the configuration for Hadoop clusters, HDFS compute engines, Hive metadata, Spark JAR services, Spark SQL services, and Impala tasks is the same as in single-engine mode. For more information, see Hadoop cluster configuration.
AnalyticDB for PostgreSQL
In single-tenant multi-engine mode, the cluster configuration for AnalyticDB for PostgreSQL is the same as in single-engine mode. For more information, see AnalyticDB for PostgreSQL cluster configuration.
Transwarp TDH 6.x and 9.3.x
In single-tenant multi-engine mode, the configuration for Transwarp TDH 6.x and Transwarp TDH 9.3.x clusters, HDFS information, Inceptor, and Inceptor metadata connection is the same as in single-engine mode. For more information, see Transwarp TDH cluster configuration.
Transwarp ArgoDB
In single-tenant multi-engine mode, the configuration for Transwarp ArgoDB clusters, HDFS information, ArgoDB, and ArgoDB metadata connection is the same as in single-engine mode. For more information, see Transwarp ArgoDB cluster configuration.
SelectDB, Doris, and StarRocks
In single-tenant multi-engine mode, the cluster configurations for SelectDB, Doris, and StarRocks are the same as in single-engine mode. For more information, see SelectDB and Doris cluster configuration and StarRocks cluster configuration.
Databricks
In single-tenant multi-engine mode, the cluster configuration for Databricks is the same as in single-engine mode. For more information, see Databricks cluster configuration.
Amazon EMR
In single-tenant multi-engine mode, the cluster configuration for Amazon EMR is the same as in single-engine mode. For more information, see Amazon EMR cluster configuration.
Lindorm (compute engine)
Parameter
Description
core-site.xml, hdfs-site.xml, hive-site.xml (Optional)
Upload the core-site.xml, hdfs-site.xml, and hive-site.xml configuration files for Lindorm (compute engine). For more information about these files, see Connect to and use an instance.
JDBC URL
Configure the JDBC URL for Lindorm (compute engine). To obtain the URL, see View connection string.
Username, Password
Enter the username and password used to access the Lindorm instance.
GaussDB (DWS)
Parameter
Description
Version
Only version 9.1.0 is currently supported.
JDBC URL
Enter the JDBC connection string, for example, jdbc:postgresql://{host}:{port}/{database name}.
Username, Password
Enter the username and password for the GaussDB (DWS) compute engine database.
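PostgreSQL-style JDBC URLs take the form jdbc:postgresql://host:port/database, with a colon between the host and the port. The following sketch assembles such a URL from its parts so that the separators cannot be mistyped. The function name and the example host, port, and database are hypothetical.

```python
def build_postgres_jdbc_url(host: str, port: int, database: str) -> str:
    """Assemble a jdbc:postgresql:// URL from host, port, and database name."""
    if not host or not database:
        raise ValueError("host and database must be non-empty")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    # A colon separates host and port; a slash precedes the database name.
    return f"jdbc:postgresql://{host}:{port}/{database}"


# Hypothetical values, for illustration only.
url = build_postgres_jdbc_url("dws.example.internal", 8000, "analytics")
```

The resulting string can be pasted into the JDBC URL field.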
Hologres
Parameter
Description
JDBC URL
The connection string for the Hologres compute source. The format is jdbc:postgresql://host:port/dbname.
Username, Password
Enter the username and password for connecting to the compute source.
If you use an Alibaba Cloud RAM account, enter its AccessKey ID and AccessKey Secret. If you use a database-native account, enter the ID and password for that account.
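Before you run the connection test, you can check that a connection string matches the jdbc:postgresql://host:port/dbname format. The following is a minimal sketch with a hypothetical function name. It deliberately accepts only the bare host:port/dbname form and rejects URLs that carry extra query parameters, which is a simplification.

```python
import re

_JDBC_PG_RE = re.compile(
    r"jdbc:postgresql://(?P<host>[^:/\s]+):(?P<port>\d{1,5})/(?P<dbname>[^/?\s]+)"
)


def parse_postgres_jdbc_url(url: str) -> dict:
    """Split a jdbc:postgresql://host:port/dbname URL into its three parts."""
    match = _JDBC_PG_RE.fullmatch(url)
    if match is None:
        raise ValueError(f"not in jdbc:postgresql://host:port/dbname form: {url!r}")
    port = int(match["port"])
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return {"host": match["host"], "port": port, "dbname": match["dbname"]}
```

A common typo such as a semicolon between host and port fails the match and raises `ValueError`.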
OushuDB
Parameter
Description
Version
Only version 6.4.0 is currently supported.
JDBC URL
Enter the JDBC URL in the format jdbc:oushudb://{host}:{port}/.
Default execution user, Password
Enter the authentication username and password. To ensure tasks run properly, make sure the user has the required data permissions.
Aliyun EMR Serverless Spark
Parameter
Description
Endpoint
Enter the endpoint for the Aliyun EMR Serverless Spark OpenAPI (SDK).
AccessKey ID, AccessKey Secret
Enter the AccessKey ID and AccessKey Secret.
Workspace
Select a workspace that the RAM account associated with the AccessKey has joined. The selectable workspaces correspond to the result of the ListWorkspaces operation.
-
Other configurations
MaxCompute
Parameter
Description
Default storage format for external tables
The default storage format for new external tables. You can select one of the following formats:
-
parquet
-
avro
-
rcfile
-
orc
-
textfile
-
sequencefile
MCQA acceleration for ad hoc queries
Enables MCQA acceleration for ad hoc queries in MaxCompute engine projects.
LogView URL in logs
Specifies how LogView URLs are displayed in logs. You can select "Display in plaintext" or "Hide when execution statements contain global variables for username and password".
Default lifecycle
The default lifecycle for physical and logical tables. You can enter a value from 1 to 36,500 days, or select 7, 14, 30, or 360 days.
Enable custom parameters
Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation — for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.
Hadoop
Hadoop includes the CDH 5.x, CDH 6.x, Cloudera Data Platform 7.x, Aliyun EMR 3.x, Aliyun EMR 5.x, AsiaInfo DP 5.3, and Huawei FusionInsight 8.x engines.
Parameter
Description
Default storage format
The default storage format for new tables created in Table Management. You can select one of the following formats:
-
Engine Default (can be specified in CREATE TABLE statements)
-
hudi
-
delta (Delta Lake)
-
paimon
-
iceberg
-
kudu
-
parquet
-
avro
-
rcfile
-
orc
-
textfile
-
sequencefile
Note: You can select the hudi, delta (Delta Lake), paimon, or iceberg formats only after you enable Spark SQL Service Configuration. You can select the kudu format only after you enable Impala Task Configuration.
Default compute engine for standard modeling
You can select Hive, Spark, or Impala.
Note: You can select Spark only after you enable Spark SQL Service Configuration. You can select Impala only after you enable Impala Task Configuration.
Enable custom parameters
Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation — for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.
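The dependencies between the Hadoop options in the table above can be summarized in code. The following sketch derives the selectable storage formats and modeling engines from two flags that stand for Spark SQL Service Configuration and Impala Task Configuration. The function and constant names are hypothetical, and the lists simply mirror the table; this is not how Dataphin itself computes the options.

```python
from __future__ import annotations

ALWAYS_AVAILABLE = ["parquet", "avro", "rcfile", "orc", "textfile", "sequencefile"]
REQUIRES_SPARK_SQL = ["hudi", "delta", "paimon", "iceberg"]
REQUIRES_IMPALA = ["kudu"]


def selectable_storage_formats(spark_sql_enabled: bool, impala_enabled: bool) -> list[str]:
    """List the default storage formats that can be selected for a Hadoop cluster."""
    formats = ["Engine Default"]
    if spark_sql_enabled:
        formats += REQUIRES_SPARK_SQL
    if impala_enabled:
        formats += REQUIRES_IMPALA
    return formats + ALWAYS_AVAILABLE


def selectable_modeling_engines(spark_sql_enabled: bool, impala_enabled: bool) -> list[str]:
    """List the default compute engines that can be selected for standard modeling."""
    engines = ["Hive"]
    if spark_sql_enabled:
        engines.append("Spark")
    if impala_enabled:
        engines.append("Impala")
    return engines
```

With both services disabled, only Engine Default and the six file formats remain selectable, and Hive is the only modeling engine.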
AnalyticDB for PostgreSQL and OushuDB
Enable custom parameters: Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation — for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.
Transwarp TDH 6.x and 9.3.x, Lindorm (compute engine), and Aliyun EMR Serverless Spark
Parameter
Description
Default storage format
The default storage format for new tables created in Table Management. You can select one of the following formats:
-
Engine Default (can be specified in CREATE TABLE statements)
-
parquet
-
avro
-
rcfile
-
orc
-
textfile
-
sequencefile
Enable custom parameters
Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation — for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.
Transwarp ArgoDB, SelectDB, StarRocks, and Doris
Enable custom parameters: Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation — for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.
Databricks
Parameter
Description
Default storage format
The default storage format for new tables created in Table Management. You can select one of the following formats:
-
Engine Default (can be specified in CREATE TABLE statements)
-
parquet
-
avro
-
orc
-
binaryfile
-
csv
-
json
-
text
Enable custom parameters
Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation — for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.
Amazon EMR
Parameter
Description
Default storage format
The default storage format for new tables created in Table Management. You can select one of the following formats:
-
Engine Default (can be specified in CREATE TABLE statements)
-
hudi
-
delta (Delta Lake)
-
paimon
-
iceberg
-
parquet
-
avro
-
rcfile
-
orc
-
textfile
-
sequencefile
Note: You can select the hudi, delta (Delta Lake), paimon, or iceberg formats only after you enable Spark SQL Service Configuration.
Default compute engine for standard modeling
You can select Hive or Spark.
Note: You can select Spark only after you enable Spark SQL Service Configuration.
Enable custom parameters
Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation — for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.
-
Click Test connection. The system automatically tests the connection to each configured service.
If the test passes, you can save the configuration. If it fails, a Connection Test Failed dialog box lists the failed services with error details.
-
After the connection test succeeds, click Save to create the cluster.