Dataphin: Create a cluster in single-tenant multi-engine mode

Last Updated:Jun 23, 2026

Permissions

  • Users with the super administrator role, the system administrator role, or a custom global role that includes the Cluster Management - Manage permission can create and manage clusters. These users can also assign cluster administrators to a cluster and specify which users can reference the cluster when creating a compute source.

  • Cluster administrators can manage their assigned clusters.

  • Users with a global role that includes the Compute Source Management - Create permission can reference authorized clusters when creating a compute source.

Create a cluster

  1. In the top navigation bar of the Dataphin homepage, choose Plan > Cluster Management.

  2. On the Cluster Management page, click Create Cluster.

  3. On the Create Cluster page, configure the following parameters.

    • Basic information

      Parameter

      Description

      Cluster name

      Enter a name for the cluster. The name can contain Chinese characters, English letters, digits, spaces, and the following special characters: -_.@~(). The name cannot exceed 128 characters.

      Engine type

      Select one of the following engines:

      • MaxCompute

      • AnalyticDB for PostgreSQL

      • Aliyun EMR 3.x

      • Aliyun EMR 5.x

      • CDH 5.x

      • CDH 6.x

      • Cloudera Data Platform 7.x

      • Huawei FusionInsight 8.x

      • AsiaInfo DP 5.3

      • StarRocks

      • Databricks

      • Amazon EMR

      • SelectDB

      • Doris

      • GaussDB(DWS)

      • Transwarp TDH 6.x

      • Transwarp TDH 9.3.x

      • Transwarp ArgoDB

      • Lindorm (compute engine)

      • Hologres

      • OushuDB

      • Aliyun EMR Serverless Spark

      Cluster administrator

      Select one or more tenant members as cluster administrators. Cluster administrators can edit the cluster, view its version history, and delete it.

      Description (Optional)

      Enter a brief description for the cluster. The description cannot exceed 128 characters.
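The naming and length rules in the basic information table can be checked before you submit the form. The following is a minimal sketch, not part of Dataphin: the function names are hypothetical, and "Chinese characters" is assumed to mean the CJK Unified Ideographs range, which may be narrower or wider than what the console actually accepts.

```python
import re

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00 to U+9FFF). The allowed special characters
# are the ones listed in the table: - _ . @ ~ ( )
_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9 \-_.@~()]+")
MAX_LENGTH = 128  # stated limit for both the name and the description


def validate_cluster_name(name: str) -> list[str]:
    """Return a list of problems; an empty list means the name looks valid."""
    problems = []
    if not name:
        problems.append("name is empty")
    elif _NAME_PATTERN.fullmatch(name) is None:
        problems.append("name contains unsupported characters")
    if len(name) > MAX_LENGTH:
        problems.append(f"name exceeds {MAX_LENGTH} characters")
    return problems


def validate_description(description: str) -> list[str]:
    """The description is optional, so only the length is checked."""
    if len(description) > MAX_LENGTH:
        return [f"description exceeds {MAX_LENGTH} characters"]
    return []


if __name__ == "__main__":
    for candidate in ["prod-cluster_01 (EMR)", "bad/name", "x" * 129]:
        print(repr(candidate[:24]), validate_cluster_name(candidate))
```

Returning a list of problems instead of raising on the first one lets a caller report every issue at once.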

    • Cluster security control

      Authorized users: Specify which users can reference this cluster configuration when creating a compute source. You can select Roles with "Create Compute Source" permission or Specified users.

      • Roles with "Create Compute Source" permission: Selected by default.

      • Specified users: You can select one or more individual accounts and user groups.

    • Cluster configuration

      MaxCompute

      Parameter

      Description

      Endpoint

      Enter the endpoint of the compute engine, for example, http://service.odps.aliyun.com/api.

      AccessKey ID, AccessKey Secret

      Enter the AccessKey ID and AccessKey Secret of an account with access to the MaxCompute project data.

      You can obtain the AccessKey ID and AccessKey Secret from the User Information Management page.

      Important
      • For a stable connection between your Dataphin and MaxCompute projects, use the AccessKey of a MaxCompute project administrator.

      • To ensure proper metadata collection, avoid changing the AccessKey of the MaxCompute project.

      Hadoop

      Hadoop includes the CDH 5.x, CDH 6.x, Cloudera Data Platform 7.x, Aliyun EMR 3.x, Aliyun EMR 5.x, AsiaInfo DP 5.3, and Huawei FusionInsight 8.x engines.

      In single-tenant multi-engine mode, the configuration for Hadoop clusters, HDFS compute engines, Hive metadata, Spark JAR services, Spark SQL services, and Impala tasks is the same as in single-engine mode. For more information, see Hadoop cluster configuration.

      AnalyticDB for PostgreSQL

      In single-tenant multi-engine mode, the cluster configuration for AnalyticDB for PostgreSQL is the same as in single-engine mode. For more information, see AnalyticDB for PostgreSQL cluster configuration.

      Transwarp TDH 6.x and 9.3.x

      In single-tenant multi-engine mode, the configuration for Transwarp TDH 6.x and Transwarp TDH 9.3.x clusters, HDFS information, Inceptor, and Inceptor metadata connection is the same as in single-engine mode. For more information, see Transwarp TDH cluster configuration.

      Transwarp ArgoDB

      In single-tenant multi-engine mode, the configuration for Transwarp ArgoDB clusters, HDFS information, ArgoDB, and ArgoDB metadata connection is the same as in single-engine mode. For more information, see Transwarp ArgoDB cluster configuration.

      SelectDB, Doris, and StarRocks

      In single-tenant multi-engine mode, the cluster configurations for SelectDB, Doris, and StarRocks are the same as in single-engine mode. For more information, see SelectDB and Doris cluster configuration and StarRocks cluster configuration.

      Databricks

      In single-tenant multi-engine mode, the cluster configuration for Databricks is the same as in single-engine mode. For more information, see Databricks cluster configuration.

      Amazon EMR

      In single-tenant multi-engine mode, the cluster configuration for Amazon EMR is the same as in single-engine mode. For more information, see Amazon EMR cluster configuration.

      Lindorm (compute engine)

      Parameter

      Description

      core-site.xml, hdfs-site.xml, hive-site.xml (Optional)

      Upload the core-site.xml, hdfs-site.xml, and hive-site.xml configuration files for Lindorm (compute engine). The hive-site.xml file is optional. For more information about these files, see Connect to and use an instance.

      JDBC URL

      Configure the JDBC URL for Lindorm (compute engine). To obtain the URL, see View connection string.

      Username, Password

      Enter the username and password used to access the Lindorm instance.

      GaussDB (DWS)

      Parameter

      Description

      Version

      Only version 9.1.0 is currently supported.

      JDBC URL

      Enter the JDBC connection string, for example, jdbc:postgresql://{host}:{port}/{database name}.

      Username, Password

      Enter the username and password for the GaussDB (DWS) compute engine database.

      Hologres

      Parameter

      Description

      JDBC URL

      The connection string for the Hologres compute source. The format is jdbc:postgresql://host:port/dbname.

      Username, Password

      Enter the username and password for connecting to the compute source.

      If you use an Alibaba Cloud RAM account, enter its AccessKey ID and AccessKey Secret. If you use a database-native account, enter the username and password for that account.

      OushuDB

      Parameter

      Description

      Version

      Only version 6.4.0 is currently supported.

      JDBC URL

      Enter the JDBC URL in the format jdbc:oushudb://{host}:{port}/.

      Default execution user, Password

      Enter the authentication username and password. To ensure tasks run properly, make sure the user has the required data permissions.
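A malformed JDBC URL is an easy mistake to catch before running the connection test. The sketch below checks a URL against the formats quoted for GaussDB (DWS), Hologres, and OushuDB. It is an illustration only: the function name and the example hosts are hypothetical, the host and port are assumed to be separated by a colon for all three engines, and the check is deliberately strict, so URLs with extra query parameters are rejected.

```python
import re

# Assumption: host and port are separated by a colon for all three engines.
_POSTGRES_STYLE = re.compile(
    r"jdbc:postgresql://(?P<host>[^:/]+):(?P<port>\d+)/(?P<database>[^/?]+)"
)
_OUSHUDB_STYLE = re.compile(r"jdbc:oushudb://(?P<host>[^:/]+):(?P<port>\d+)/")

JDBC_PATTERNS = {
    "GaussDB (DWS)": _POSTGRES_STYLE,  # jdbc:postgresql://{host}:{port}/{database name}
    "Hologres": _POSTGRES_STYLE,       # jdbc:postgresql://host:port/dbname
    "OushuDB": _OUSHUDB_STYLE,         # jdbc:oushudb://{host}:{port}/
}


def parse_jdbc_url(engine: str, url: str) -> dict:
    """Split a JDBC URL into its parts, or raise ValueError if it is malformed."""
    pattern = JDBC_PATTERNS[engine]
    match = pattern.fullmatch(url)
    if match is None:
        raise ValueError(f"{url!r} does not match the {engine} JDBC URL format")
    parts = match.groupdict()
    parts["port"] = int(parts["port"])
    return parts


if __name__ == "__main__":
    print(parse_jdbc_url("Hologres", "jdbc:postgresql://holo.example.com:80/demo"))
    print(parse_jdbc_url("OushuDB", "jdbc:oushudb://10.0.0.5:5432/"))
```

A check like this only validates the shape of the string. Whether the host is reachable and the credentials are accepted is still determined by the connection test.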

      Aliyun EMR Serverless Spark

      Parameter

      Description

      Endpoint

      Enter the endpoint for the Aliyun EMR Serverless Spark OpenAPI (SDK).

      AccessKey ID, AccessKey Secret

      Enter the AccessKey ID and AccessKey Secret.

      Workspace

      Select a workspace that the RAM account associated with the AccessKey has joined (ListWorkspaces).

    • Other configurations

      MaxCompute

      Parameter

      Description

      Default storage format for external tables

      The default storage format for new external tables. You can select one of the following formats:

      • parquet

      • avro

      • rcfile

      • orc

      • textfile

      • sequencefile

      MCQA acceleration for ad hoc queries

      Enables MCQA acceleration for ad hoc queries in MaxCompute engine projects.

      LogView URL in logs

      Specifies how LogView URLs are displayed in logs. You can select "Display in plaintext" or "Hide when execution statements contain global variables for username and password".

      Default lifecycle

      The default lifecycle for physical and logical tables. You can enter a value from 1 to 36,500 days, or select 7, 14, 30, or 360 days.

      Enable custom parameters

      Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation — for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.
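The value constraints in the MaxCompute table above can be expressed as a small check. This is a sketch under stated assumptions, not a Dataphin API: the function and constant names are hypothetical, and only the external table format list and the lifecycle range come from the table.

```python
# Values taken from the table above.
MAXCOMPUTE_EXTERNAL_FORMATS = (
    "parquet", "avro", "rcfile", "orc", "textfile", "sequencefile",
)
LIFECYCLE_PRESETS = (7, 14, 30, 360)  # shortcuts offered next to the free-form input
LIFECYCLE_MIN_DAYS, LIFECYCLE_MAX_DAYS = 1, 36500


def check_maxcompute_options(storage_format: str, lifecycle_days: int) -> list[str]:
    """Return a list of problems; an empty list means both values are acceptable."""
    problems = []
    if storage_format not in MAXCOMPUTE_EXTERNAL_FORMATS:
        problems.append(f"unsupported external table format: {storage_format}")
    if not LIFECYCLE_MIN_DAYS <= lifecycle_days <= LIFECYCLE_MAX_DAYS:
        problems.append(
            f"lifecycle must be between {LIFECYCLE_MIN_DAYS} and "
            f"{LIFECYCLE_MAX_DAYS} days, got {lifecycle_days}"
        )
    return problems


if __name__ == "__main__":
    print(check_maxcompute_options("orc", 30))
    print(check_maxcompute_options("kudu", 0))
```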

      Hadoop

      Hadoop includes the CDH 5.x, CDH 6.x, Cloudera Data Platform 7.x, Aliyun EMR 3.x, Aliyun EMR 5.x, AsiaInfo DP 5.3, and Huawei FusionInsight 8.x engines.

      Parameter

      Description

      Default storage format

      The default storage format for new tables created in Table Management. You can select one of the following formats:

      • Engine Default (can be specified in CREATE TABLE statements)

      • hudi

      • delta (Delta Lake)

      • paimon

      • iceberg

      • kudu

      • parquet

      • avro

      • rcfile

      • orc

      • textfile

      • sequencefile

      Note

      You can select the hudi, delta (Delta Lake), paimon, or iceberg formats only after you enable Spark SQL Service Configuration. You can select the kudu format only after you enable Impala Task Configuration.

      Default compute engine for standard modeling

      You can select Hive, Spark, or Impala.

      Note

      You can select Spark only after you enable Spark SQL Service Configuration. You can select Impala only after you enable Impala Task Configuration.

      Enable custom parameters

      Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation — for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.
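The two notes in the Hadoop table describe how the selectable options depend on which services are enabled. The following sketch models that dependency; the function names and boolean flags are hypothetical stand-ins for the Spark SQL Service Configuration and Impala Task Configuration switches, and the ordering of the returned lists is illustrative only.

```python
ALWAYS_AVAILABLE = ["parquet", "avro", "rcfile", "orc", "textfile", "sequencefile"]
NEEDS_SPARK_SQL = ["hudi", "delta", "paimon", "iceberg"]  # lake formats
NEEDS_IMPALA = ["kudu"]


def selectable_storage_formats(spark_sql_enabled: bool, impala_enabled: bool) -> list[str]:
    """List the default storage formats that can be selected for a Hadoop cluster."""
    formats = ["Engine Default"]
    if spark_sql_enabled:
        formats += NEEDS_SPARK_SQL
    if impala_enabled:
        formats += NEEDS_IMPALA
    return formats + ALWAYS_AVAILABLE


def selectable_modeling_engines(spark_sql_enabled: bool, impala_enabled: bool) -> list[str]:
    """List the default compute engines available for standard modeling."""
    engines = ["Hive"]
    if spark_sql_enabled:
        engines.append("Spark")
    if impala_enabled:
        engines.append("Impala")
    return engines


if __name__ == "__main__":
    print(selectable_storage_formats(spark_sql_enabled=True, impala_enabled=False))
    print(selectable_modeling_engines(spark_sql_enabled=False, impala_enabled=True))
```

With both services disabled, only Engine Default and the six file formats remain selectable, and Hive is the only modeling engine.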

      AnalyticDB for PostgreSQL and OushuDB

      Enable custom parameters: Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation — for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.

      Transwarp TDH 6.x and 9.3.x, Lindorm (compute engine), and Aliyun EMR Serverless Spark

      Parameter

      Description

      Default storage format

      The default storage format for new tables created in Table Management. You can select one of the following formats:

      • Engine Default (can be specified in CREATE TABLE statements)

      • parquet

      • avro

      • rcfile

      • orc

      • textfile

      • sequencefile

      Enable custom parameters

      Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation — for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.

      Transwarp ArgoDB, SelectDB, StarRocks, and Doris

      Enable custom parameters: Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation — for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.

      Databricks

      Parameter

      Description

      Default storage format

      The default storage format for new tables created in Table Management. You can select one of the following formats:

      • Engine Default (can be specified in CREATE TABLE statements)

      • parquet

      • avro

      • orc

      • binaryfile

      • csv

      • json

      • text

      Enable custom parameters

      Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation — for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.

      Amazon EMR

      Parameter

      Description

      Default storage format

      The default storage format for new tables created in Table Management. You can select one of the following formats:

      • Engine Default (can be specified in CREATE TABLE statements)

      • hudi

      • delta (Delta Lake)

      • paimon

      • iceberg

      • parquet

      • avro

      • rcfile

      • orc

      • textfile

      • sequencefile

      Note

      You can select the hudi, delta (Delta Lake), paimon, or iceberg formats only after you enable Spark SQL Service Configuration.

      Default compute engine for standard modeling

      You can select Hive or Spark.

      Note

      You can select Spark only after you enable Spark SQL Service Configuration.

      Enable custom parameters

      Applies custom parameters globally to the compute engine's code generation rules, controlling runtime behavior and resource allocation — for example, default memory, priority, and MapJoin settings. The custom parameter configuration must be compatible with the engine type.

  4. Click Test connection. The system automatically tests the connection to each configured service.

    If the test passes, you can save the configuration. If it fails, a Connection Test Failed dialog box lists the failed services with error details.

  5. After the connection test succeeds, click Save to create the cluster.