Elastic High Performance Computing: Create a cluster by using a template

Last Updated: Oct 31, 2024

You can create an Elastic High Performance Computing (E-HPC) cluster based on a cluster template in the E-HPC console. Default templates contain the parameters that are required to create a cluster, such as the parameters related to zone, deployment mode, and image. You can save common parameter settings as a template and then use the template to create clusters in an efficient manner.

Prerequisites

  • A service-linked role for E-HPC is created. The first time you log on to the E-HPC console, you are prompted to create a service-linked role for E-HPC.

  • A virtual private cloud (VPC) and a vSwitch are created. For more information, see Create and manage a VPC and Create and manage a vSwitch.

  • File Storage NAS (NAS) is activated. A NAS file system and a mount target are created. For more information, see Create a file system and Manage mount targets.

Background information

A cluster provides computing resources and storage resources. You can submit jobs, debug jobs, store results, and view results in the cluster. Before you create and use an E-HPC cluster, take note of the following information:

  • You can create up to three clusters in a region. To create more clusters, submit a ticket.

  • You are charged E-HPC service fees and other resource fees when you create a cluster. For more information, see Billable items.

  • Manage the nodes in the cluster in the E-HPC console. Do not use the Elastic Compute Service (ECS) console to manage them.

Procedure

  1. Go to the template configuration page.

    1. Log on to the E-HPC console.

    2. In the upper-left corner of the top navigation bar, select a region.

    3. In the top navigation bar, click Cluster.

    4. On the Cluster page, move the pointer over the drop-down icon to the right of Create Cluster and click Cluster Template.

  2. In the Basic Configurations section, enter the cluster name and logon password.

  3. In the template configuration section, configure the template parameters.

    The template configuration section provides the following icons:

    • Open local template icon: imports a cluster template from your on-premises machine.

    • Save template icon: saves the cluster template to your on-premises machine.

    • Select template icon: selects a template provided by E-HPC.

    • View network icon: shows the network and storage configurations. The network configurations include the VPC ID and vSwitch ID. The storage configurations include the NAS file system ID and mount target.

    • Query instance types icon: queries the supported instance types.

    This topic uses the following template as an example. Modify the parameters based on your business requirements.

    Note

    If you select a Batch Serverless template to create a serverless cluster, see Create a serverless cluster.

    [Global]
    zoneId=cn-shenzhen-a
    ecsChargeType=PostPaid
    
    [Node]
    deployMode=Standard
    ecsOrderComputeInstanceType=ecs.c6.large
    ecsOrderComputeCount=1
    ecsOrderManagerInstanceType=ecs.c6.large
    ecsOrderLoginInstanceType=ecs.c6.large
    systemDiskSize=40
    
    [Image]
    osTag=CentOS_7.6_64
    imageOwnerAlias=system
    
    [Network]
    vpcId=vpc-wz9lq2oynq8tia5h8****
    vSwitchId=vsw-wz992iw34x8on06he****
    securityGroupId=sg-bp13n61xsydodfyg****
    
    [Storage]
    volumeId=2bfe3480a3
    volumeMountpoint=2bfe348***-cs***.cn-shenzhen.nas.aliyuncs.com
    
    [Scheduler]
    schedulerType=pbs
    
    [Account]
    accountType=nis
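
Before you import a template, you may want to check it locally. The following is a minimal sketch, assuming that the template can be read as an INI-style file by Python's standard configparser module (the console's own parser may behave differently). The REQUIRED mapping and the helper names load_template and missing_required are illustrative and are not part of E-HPC.

```python
import configparser

# A shortened copy of the example template from this topic.
TEMPLATE_TEXT = """
[Global]
zoneId=cn-shenzhen-a
ecsChargeType=PostPaid

[Node]
deployMode=Standard
ecsOrderComputeInstanceType=ecs.c6.large
ecsOrderComputeCount=1
systemDiskSize=40

[Image]
osTag=CentOS_7.6_64
imageOwnerAlias=system

[Network]
vpcId=vpc-wz9lq2oynq8tia5h8****
vSwitchId=vsw-wz992iw34x8on06he****
"""

# Parameters that this topic marks as required ("Yes"). Illustrative only.
REQUIRED = {
    "Global": ["zoneId", "ecsChargeType"],
    "Node": ["deployMode", "systemDiskSize"],
    "Image": ["osTag", "imageOwnerAlias"],
    "Network": ["vpcId", "vSwitchId"],
}


def load_template(text):
    """Parse template text while keeping the camelCase parameter names."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # configparser lowercases keys unless told not to
    parser.read_string(text)
    return parser


def missing_required(parser):
    """Return 'Section.parameter' names that are required but absent."""
    missing = []
    for section, keys in REQUIRED.items():
        for key in keys:
            if not parser.has_option(section, key):
                missing.append(f"{section}.{key}")
    return missing


template = load_template(TEMPLATE_TEXT)
print(missing_required(template))                # []
print(template["Node"]["ecsOrderComputeCount"])  # 1
```

Setting optionxform to str matters here: without it, configparser would lowercase every parameter name and lookups such as zoneId would not match the names used in the template.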

    The following tables describe the parameters in the template.

    [Global] settings

    | Parameter | Required | Example | Description |
    |---|---|---|---|
    | zoneId | Yes | cn-hangzhou-i | The zone to which the cluster belongs. Only the logon node, management nodes, and compute nodes that are created with the cluster belong to this zone. You can select other zones during manual or automatic scale-outs. |
    | ecsChargeType | Yes | PostPaid | The billing method of the nodes in the cluster. Valid values: PostPaid (pay-as-you-go) and PrePaid (subscription). For more information, see Instance types. |
    | period | No | 1 | If you set ecsChargeType to PrePaid, the subscription duration of the node is determined by this parameter and the periodUnit parameter. Valid values: 1 to 3 when periodUnit is set to Year; 1 to 9 when periodUnit is set to Month; 1 when periodUnit is set to Hour; 1 to 4 when periodUnit is set to Week. |
    | periodUnit | No | Year | The unit of the subscription duration of the node if you set ecsChargeType to PrePaid. Valid values: Year, Month, Hour, and Week. |
    | computeSpotStrategy | No | NoSpot | The bidding policy of the compute node. Valid values: NoSpot (the compute node is created as a pay-as-you-go instance); SpotWithPriceLimit (the compute node is created as a preemptible instance that has a user-defined maximum hourly price); SpotAsPriceGo (the compute node is created as a preemptible instance for which the market price at the time of purchase is used as the bid price). |
    | computeSpotPriceLimit | No | 0.034 | The maximum hourly price of the node when you set computeSpotStrategy to SpotWithPriceLimit. This value is accurate to up to three decimal places. |
    | clusterVersion | No | 2.0 | The version of the cluster. By default, the latest version is used. |
    | clientVersion | No | 1.0.64 | The version of the E-HPC client. By default, the latest version is used. We recommend that you use the default value. |
    | isHybridCluster | No | false | Specifies whether the cluster is a hybrid cloud cluster. Valid values: true and false. Default value: false. |
    | location | No | PublicCloud | The type of the cluster. Valid values: PublicCloud (public cloud cluster), ProxyOnline (hybrid cloud cluster), and OnPremise (Deadline-integrated rendering cluster). Default value: PublicCloud. |
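
The period ranges and the three-decimal limit on the spot price are easy to get wrong when you edit a template by hand. The following sketch checks a dictionary of [Global] parameters against the ranges listed above. The function name check_global is illustrative, and the sketch assumes that both period and periodUnit are set whenever ecsChargeType is PrePaid, which this topic does not state explicitly.

```python
from decimal import Decimal, InvalidOperation

# Valid period values per periodUnit, as listed in the [Global] settings.
PERIOD_RANGES = {
    "Year": range(1, 4),    # 1 to 3
    "Month": range(1, 10),  # 1 to 9
    "Hour": range(1, 2),    # 1 only
    "Week": range(1, 5),    # 1 to 4
}


def check_global(settings):
    """Return a list of problems found in a dict of [Global] parameters."""
    problems = []
    charge_type = settings.get("ecsChargeType")
    if charge_type not in ("PostPaid", "PrePaid"):
        problems.append("ecsChargeType must be PostPaid or PrePaid")

    if charge_type == "PrePaid":
        # Assumption: period and periodUnit are both expected for PrePaid.
        try:
            unit = settings.get("periodUnit")
            period_ok = int(settings.get("period")) in PERIOD_RANGES[unit]
        except (KeyError, TypeError, ValueError):
            period_ok = False
        if not period_ok:
            problems.append("period is out of range for periodUnit")

    limit = settings.get("computeSpotPriceLimit")
    if settings.get("computeSpotStrategy") == "SpotWithPriceLimit" and limit is not None:
        try:
            # An exponent below -3 means more than three decimal places.
            too_precise = Decimal(str(limit)).as_tuple().exponent < -3
        except (InvalidOperation, TypeError):
            too_precise = True
        if too_precise:
            problems.append("computeSpotPriceLimit allows at most three decimal places")
    return problems


print(check_global({"ecsChargeType": "PrePaid", "periodUnit": "Month", "period": "10"}))
```

Values are accepted as strings because that is how they appear in a template file.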

    [Node] settings

    | Parameter | Required | Example | Description |
    |---|---|---|---|
    | deployMode | Yes | Standard | The mode in which the E-HPC cluster is deployed. Valid values: Standard (two management nodes, one logon node, and multiple compute nodes are deployed; the domain account service and the scheduling service are each deployed on a separate management node); Simple (one management node, one logon node, and multiple compute nodes are deployed; the domain account service and scheduling service are deployed on the management node); Tiny (one management node and multiple compute nodes are deployed; the domain account service, scheduling service, and logon service are deployed on the management node). Default value: Standard. |
    | systemDiskSize | Yes | 40 | The size of the system disk. Unit: GiB. Valid values: 40 to 500. Default value: 40. |
    | systemDiskType | No | cloud_essd | The category of the system disk. Valid values: cloud_essd (enhanced SSD (ESSD)), cloud_ssd (SSD), and cloud_efficiency (ultra disk). Default value: cloud_essd. |
    | systemDiskLevel | No | PL0 | The ESSD performance level (PL) when you set systemDiskType to cloud_essd. Valid values: PL0, PL1, PL2, and PL3. For more information about the performance data of each level of disks, see ESSDs. |
    | ecsOrderLoginInstanceType | No | ecs.c7.xlarge | The instance type of the logon node. If you do not specify this parameter, the system automatically uses the value of ecsOrderManagerInstanceType as the value of this parameter. |
    | ecsOrderManagerInstanceType | No | ecs.c7.xlarge | The instance type of the management node. If you set location to PublicCloud or ProxyOnline, this parameter is required. |
    | ecsOrderComputeInstanceType | No | ecs.c7.xlarge | The instance type of the compute nodes. If you do not specify this parameter, the system automatically uses the value of ecsOrderManagerInstanceType as the value of this parameter. |
    | ecsOrderLoginCount | No | 1 | The number of logon nodes. Set the value to 1. |
    | ecsOrderComputeCount | No | 2 | The number of compute nodes in the cluster. Valid values: 1 to 99. |
    | computeEnableHt | No | true | Specifies whether to enable hyper-threading for the compute nodes. Valid values: true and false. Default value: true. |
    | ramRoleName | No | AliyunECSInstanceForEHPCRole | The Resource Access Management (RAM) role that you want to assign to the node. |
    | ramNodeTypes | No | [manager, login, compute] | The types of the nodes to which you want to bind the RAM role. Valid values when deployMode is set to Standard: scheduler, account, login, and compute. Valid values when deployMode is set to Simple: manager, login, and compute. Valid values when deployMode is set to Tiny: manager and compute. You can specify multiple values. Separate multiple values with commas (,). |
    | localNodesCfg | No | [{\"Role\":\"AccountManager\",\"IpAddress\":\"172.16.XX.XX\",\"AccountType\":\"custom\",\"HostName\":\"proxymgr\"},{\"Role\":\"ResourceManager\",\"IpAddress\":\"172.16.XX.XX\",\"SchedulerType\":\"custom\",\"HostName\":\"manager\"}] | The on-premises scheduler or domain account service to which the hybrid cloud cluster connects when you configure the on-premises management node of the cluster. The value is a JSON array of objects that is passed as a string. Each object can contain the following fields: Role (the role of the on-premises scheduler or domain account service; valid values: AccountManager and ResourceManager), IpAddress (the IP address of the server), HostName (the hostname of the server), AccountType (the type of the domain account service; valid values: nis and ldap), and SchedulerType (the type of the scheduler; valid values: pbs, slurm, and custom). |
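
Because the valid values of ramNodeTypes depend on deployMode, a small lookup can catch mismatches before you submit the template. The following sketch encodes the three cases described above. The helper names are illustrative, and the parsing of the bracketed list follows the example value shown in this topic rather than a documented grammar.

```python
# Node types that can be bound to the RAM role, per deployMode ([Node] settings).
RAM_NODE_TYPES = {
    "Standard": {"scheduler", "account", "login", "compute"},
    "Simple": {"manager", "login", "compute"},
    "Tiny": {"manager", "compute"},
}


def parse_node_types(value):
    """Turn '[manager, login, compute]' or 'manager,login' into a list of names."""
    inner = value.strip("[] ")  # drop surrounding brackets and spaces
    return [item.strip() for item in inner.split(",") if item.strip()]


def invalid_ram_node_types(deploy_mode, value):
    """Return the node types in value that are not valid for deploy_mode."""
    allowed = RAM_NODE_TYPES[deploy_mode]
    return [name for name in parse_node_types(value) if name not in allowed]


print(invalid_ram_node_types("Simple", "[manager, login, compute]"))    # []
print(invalid_ram_node_types("Standard", "[manager, login, compute]"))  # ['manager']
```

The second call shows that a value that is valid for the Simple mode can be rejected in the Standard mode, where the management roles are named scheduler and account.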

    [Image] settings

    | Parameter | Required | Example | Description |
    |---|---|---|---|
    | osTag | Yes | CentOS_7.2_64 | The tag of the OS image. You can call the ListImages API operation to query the image tags supported by E-HPC. |
    | imageOwnerAlias | Yes | system | The type of the image. Valid values: system (public image), self (custom image), and others (shared image). |
    | imageId | No | m-m5egogbgwjj2n1****** | The image ID. This parameter is required if you set imageOwnerAlias to self or others. |

    [Network] settings

    | Parameter | Required | Example | Description |
    |---|---|---|---|
    | vpcId | Yes | vpc-b3f3edefefeep0760yju**** | The ID of the VPC to which the cluster belongs. |
    | vSwitchId | Yes | vsw-bp1lfcjbfb099rrjn**** | The vSwitch ID. |
    | securityGroupName | No | ehpc-SecurityGroup | The name of the security group when a new security group is created. If you do not specify securityGroupId, a new security group is created. |
    | securityGroupId | No | sg-bp13n61xsydodfyg**** | The ID of the security group when an existing security group is used. |
    | withoutElasticIp | No | false | Specifies whether to assign an elastic IP address (EIP) to the logon node. Valid values: true and false. Default value: false. Note: If you set deployMode to Tiny and no logon node is available, the EIP is assigned to the management node. |
    | sccClusterId | No | hpc-m5e2qpb2cxfnet****** | The remote direct memory access (RDMA) network ID of the E-HPC cluster whose compute node instance type is Super Computing Cluster (SCC). The value of this parameter is the SCC cluster ID. You can obtain the SCC cluster ID on the Super Computing Clusters page in the ECS console. This parameter takes effect only if the instance type of the compute nodes is SCC. If the compute nodes use SCC instances and you do not configure this parameter, E-HPC automatically creates an RDMA network ID. |

    [Storage] settings

    | Parameter | Required | Example | Description |
    |---|---|---|---|
    | volumeType | No | nas | The type of the additional shared storage. Only NAS is supported. Valid value: nas. |
    | volumeProtocol | No | NFS | The type of the protocol that is used by the NAS file system. Valid values: NFS and SMB. Default value: NFS. |
    | volumeId | No | 008b64**** | The ID of the NAS file system. |
    | remoteDirectory | No | / | The remote directory of the NAS file system. |
    | volumeMountOption | No | -t nfs -o vers=4 | The mount parameters when you manually mount an NFS file system by using the mount command. |
    | volumeMountpoint | No | 008b64****-s****.cn-hangzhou.nas.aliyuncs.com | The mount target of the NAS file system. |
    | storageConfigByDirectory | No | 0 | Specifies whether to mount different file systems to the /home directory and the /opt directory. Valid values: 1 (yes) and 0 (no). Default value: 0. |
    | homeVolumeId | No | 008b64**** | The ID of the NAS file system that you want to mount to the /home directory. This parameter takes effect only when storageConfigByDirectory is set to 1. |
    | homeVolumeMountpoint | No | 008b64****-s****.cn-hangzhou.nas.aliyuncs.com | The mount target of the NAS file system to which the /home directory is mounted. This parameter takes effect only when storageConfigByDirectory is set to 1. |
    | homeRemoteDirectory | No | / | The remote directory of the NAS file system to which the /home directory is mounted. This parameter takes effect only when storageConfigByDirectory is set to 1. |
    | optVolumeId | No | 00da34**** | The ID of the NAS file system to which the /opt directory is mounted. This parameter takes effect only when storageConfigByDirectory is set to 1. |
    | optVolumeMountpoint | No | 00da34****-a****.cn-hangzhou.nas.aliyuncs.com | The mount target of the NAS file system to which the /opt directory is mounted. This parameter takes effect only when storageConfigByDirectory is set to 1. |
    | optRemoteDirectory | No | / | The remote directory of the NAS file system to which the /opt directory is mounted. This parameter takes effect only when storageConfigByDirectory is set to 1. |
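
The home and opt parameters take effect only when storageConfigByDirectory is set to 1, so it can help to generate the [Storage] section instead of writing it by hand. The following sketch renders the section from tuples of (file system ID, mount target, remote directory). The function name storage_section is illustrative. This topic does not state whether volumeId and volumeMountpoint are still needed when separate file systems are used, so the sketch keeps them as an assumption.

```python
def storage_section(shared, home=None, opt=None):
    """Render a [Storage] section as template text.

    Each argument is a (file_system_id, mount_target, remote_directory) tuple.
    storageConfigByDirectory is set to 1 only when both home and opt are given.
    """
    fs_id, target, directory = shared
    lines = [
        "[Storage]",
        f"volumeId={fs_id}",
        f"volumeMountpoint={target}",
        f"remoteDirectory={directory}",
    ]
    split = home is not None and opt is not None
    lines.append(f"storageConfigByDirectory={1 if split else 0}")
    if split:
        # Emit the home* and opt* parameters only when they take effect.
        for prefix, (fs_id, target, directory) in (("home", home), ("opt", opt)):
            lines.append(f"{prefix}VolumeId={fs_id}")
            lines.append(f"{prefix}VolumeMountpoint={target}")
            lines.append(f"{prefix}RemoteDirectory={directory}")
    return "\n".join(lines)


shared = ("008b64****", "008b64****-s****.cn-hangzhou.nas.aliyuncs.com", "/")
opt = ("00da34****", "00da34****-a****.cn-hangzhou.nas.aliyuncs.com", "/")
print(storage_section(shared, home=shared, opt=opt))
```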

    [Scheduler] settings

    | Parameter | Required | Example | Description |
    |---|---|---|---|
    | schedulerType | No | pbs | The scheduler type. If you do not configure the isHybridCluster parameter or you set isHybridCluster to false, this parameter is required. Valid values: slurm, pbs, pbs19, slurm22, and opengridengine. |
    | jobQueue | No | cpuworkq | The name of the queue. If you do not configure this parameter, the system automatically creates a queue when you create a cluster and adds the compute nodes to the queue. |

    [Account] settings

    | Parameter | Required | Example | Description |
    |---|---|---|---|
    | accountType | No | nis | The service type of the domain account. Valid values: nis and ldap. Default value: nis. |
    | domain | No | example.com | The domain name of the on-premises cluster. This parameter takes effect only if you set accountType to ldap. |
    | openldapParam | No | "{\"LdapServerIp\": \"19.16.XX.XX\", \"BaseDn\":\" example.com\" }" | The parameters of the on-premises OpenLDAP server when you create a hybrid cloud cluster. The value is a JSON string that contains the following fields: LdapServerIp (the IP address of the OpenLDAP server) and BaseDn (the OpenLDAP domain name). |
    | winAdParam | No | "{ \"AdUser\": \"Administrator\", \"AdUserPasswd\": \"pwd***\", \"AdDc\": \"example.com\", \"AdIp\": \"12.13.XX.XX\" }" | The parameters that are used to connect to the AD server. The value is a JSON string that contains the following fields: AdUser (the administrator user of the AD server), AdUserPasswd (the password of the administrator user of the AD server), AdDc (the AD domain name), and AdIp (the IP address of the AD server). |
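
Hand-writing the escaped quotes in openldapParam and winAdParam is error-prone. The following sketch builds both values with Python's json module. The function names are illustrative. The example values above are shown wrapped in quotes with escaped inner quotes. Whether the template needs that extra layer is not stated in this topic, so the quoted helper is provided separately as an assumption.

```python
import json


def openldap_param(server_ip, base_dn):
    """Return the openldapParam fields as plain JSON text."""
    return json.dumps({"LdapServerIp": server_ip, "BaseDn": base_dn})


def win_ad_param(user, password, domain, ip):
    """Return the winAdParam fields as plain JSON text."""
    return json.dumps(
        {"AdUser": user, "AdUserPasswd": password, "AdDc": domain, "AdIp": ip}
    )


def quoted(json_text):
    """Wrap JSON text in quotes and escape the inner quotes.

    Assumption: this mirrors the escaped form of the example values.
    """
    return json.dumps(json_text)


plain = openldap_param("19.16.XX.XX", "example.com")
print(plain)
print(quoted(plain))
```

Building the value from a dictionary also avoids stray whitespace inside field values, which is easy to introduce when you type the string manually.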

    [Application] settings

    | Parameter | Required | Example | Description |
    |---|---|---|---|
    | postScriptUrl | No | http://xxx.xxxx.com/post_exec.sh | The URL from which the post-installation script is downloaded. |
    | postScriptArgs | No | -v -p xxx | The arguments that are passed to the post-installation script. |
    | remoteVisEnable | No | false | Specifies whether to enable the Virtual Network Computing (VNC) feature. Valid values: true and false. Default value: false. |
    | plugin | No | {"pluginMod": "oss","pluginLocalPath": "/opt/plugin","pluginOssPath":"https://bucket.oss-cn-hangzhou.aliyuncs.com/plugin/plugin.tgz"} | The scheduler plug-in mode. This parameter takes effect only if you set schedulerType to custom. The value is a JSON string that contains the following fields: pluginMod (the mode of the plug-in. In the oss mode, the plug-in is downloaded from OSS and decompressed to the local path that is specified by the pluginLocalPath parameter. In the image mode, the plug-in is stored by default in a pre-defined local path, which is specified by the pluginLocalPath parameter); pluginLocalPath (the local path where the plug-in is stored. We recommend that you select a shared directory in the oss mode and a non-shared directory in the image mode); pluginOssPath (the remote path where the plug-in is stored in OSS. This parameter takes effect only when the pluginMod parameter is set to oss). |
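
The plugin value combines a mode with paths that apply only in some modes. The following sketch parses the JSON string and reports likely mistakes. The function name check_plugin is illustrative. Treating a missing pluginOssPath as an error in the oss mode is an assumption based on the description above, not a documented rule.

```python
import json


def check_plugin(value):
    """Return a list of likely problems in a plugin JSON string."""
    cfg = json.loads(value)
    problems = []
    mode = cfg.get("pluginMod")
    if mode not in ("oss", "image"):
        problems.append("pluginMod must be oss or image")
    if mode == "oss" and not cfg.get("pluginOssPath"):
        # Assumption: the oss mode needs a path to download the plug-in from.
        problems.append("pluginOssPath is expected when pluginMod is oss")
    if mode == "image" and "pluginOssPath" in cfg:
        problems.append("pluginOssPath takes effect only when pluginMod is oss")
    return problems


example = (
    '{"pluginMod": "oss","pluginLocalPath": "/opt/plugin",'
    '"pluginOssPath":"https://bucket.oss-cn-hangzhou.aliyuncs.com/plugin/plugin.tgz"}'
)
print(check_plugin(example))  # []
```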

  4. In the upper-right corner of the page, read and select Alibaba Cloud International Website Product Terms of Service, and then click OK.

Result

After you create the cluster, you can check the status of the cluster on the Cluster page. If the cluster and all nodes in the cluster are in the Running state, the cluster is created.