Elastic High Performance Computing: Nodes and queues overview

Last Updated: Apr 23, 2025

This topic introduces concepts related to nodes and queues in an Elastic High Performance Computing (E-HPC) cluster.

Compute node

Compute nodes are a key component of an E-HPC cluster. Compute nodes execute computing jobs and process data. Each compute node is an Elastic Compute Service (ECS) instance. With abundant computing power and storage capacity, the instances are suitable for running various workloads, such as parallel computing, large-scale data processing, and deep learning training. You can combine multiple compute nodes into a computing cluster and use the cluster to run very large computing tasks quickly and efficiently.

Queue

Queues are resource allocation units that are used to manage and schedule jobs in E-HPC clusters. A queue can be thought of as a pool of compute nodes. You can define job priorities, limits, and scheduling policies for a queue based on your business requirements. This helps balance loads and improve user experience. In addition, you can categorize queues by job type, user group, or resource requirement to manage and utilize cluster resources more efficiently.
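The idea of a queue as a pool of compute nodes with its own policy settings can be sketched in a few lines of Python. This is only an illustrative model, not the E-HPC API: the class names ComputeNode and Queue, the fields priority and max_running_jobs, and the node names are all hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComputeNode:
    """Hypothetical stand-in for one ECS-backed compute node."""
    name: str
    cores: int
    memory_gib: int


@dataclass
class Queue:
    """Hypothetical queue: a named pool of nodes plus simple policy settings."""
    name: str
    priority: int = 0            # assumed convention: higher value is served first
    max_running_jobs: int = 10   # illustrative per-queue job limit
    nodes: List[ComputeNode] = field(default_factory=list)

    def total_cores(self) -> int:
        """Sum of cores across all nodes in this queue."""
        return sum(node.cores for node in self.nodes)

    def total_memory_gib(self) -> int:
        """Sum of memory across all nodes in this queue."""
        return sum(node.memory_gib for node in self.nodes)


# Two queues categorized by resource requirement.
general = Queue(name="general", priority=1)
general.nodes.append(ComputeNode("compute001", cores=16, memory_gib=64))

highmem = Queue(name="highmem", priority=5, max_running_jobs=4)
highmem.nodes.append(ComputeNode("compute002", cores=32, memory_gib=512))
highmem.nodes.append(ComputeNode("compute003", cores=32, memory_gib=512))

print(highmem.name, highmem.total_cores(), highmem.total_memory_gib())
```

Grouping nodes this way keeps the policy (priority, limits) attached to the pool rather than to individual nodes, which is the role the queue plays in the description above.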

Relationship between nodes and queues

Both nodes and queues are key components of clusters. Using nodes and queues together properly makes resource management and job scheduling more efficient.

  • Nodes as queue elements

    Each compute node in an E-HPC cluster can be regarded as an element of a queue and executes the jobs assigned to it. The state of a node, such as idle, busy, or maintenance, affects its position and priority in the queue.

  • Queues as node managers

    The queue system receives, sorts, and assigns jobs to nodes. It schedules tasks based on their priority, resource requirements, and node availability. In addition, it monitors node health to ensure the optimal execution of jobs.

  • Dynamic resource allocation

    The queue system dynamically assigns jobs to nodes that have the required resources. For example, it can assign memory-intensive jobs to memory-rich nodes to optimize resource utilization and improve cluster performance.

  • Load balancing

    The queue system uses an intelligent scheduling algorithm to balance loads within a cluster. This helps ensure that no node is overloaded while others sit idle, which prevents cluster performance from deteriorating and improves cluster efficiency.
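The behaviors in the list above (priority ordering, matching jobs to nodes with the required resources, skipping unavailable nodes, and spreading load) can be illustrated with a toy scheduler in Python. This is a simplified sketch under stated assumptions, not the algorithm that E-HPC or its underlying scheduler actually uses: the Node and Job classes, the state strings, and the tie-breaking rule are hypothetical, and the sketch assumes a busy node can accept more jobs while it still has free resources.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    free_cores: int
    free_memory_gib: int
    state: str = "idle"  # assumed states: "idle", "busy", "maintenance"
    running: List[str] = field(default_factory=list)


@dataclass
class Job:
    name: str
    priority: int  # assumed convention: higher value is scheduled first
    cores: int
    memory_gib: int


def pick_node(job: Job, nodes: List[Node]) -> Optional[Node]:
    """Return the least-loaded node that can fit the job, or None."""
    eligible = [
        n for n in nodes
        if n.state != "maintenance"
        and n.free_cores >= job.cores
        and n.free_memory_gib >= job.memory_gib
    ]
    if not eligible:
        return None
    # Load balancing: fewest running jobs first, then most free cores.
    return min(eligible, key=lambda n: (len(n.running), -n.free_cores))


def schedule(jobs: List[Job], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """Assign jobs in priority order; unplaced jobs map to None (pending)."""
    placement: Dict[str, Optional[str]] = {}
    # sorted() is stable, so equal-priority jobs keep their submission order.
    for job in sorted(jobs, key=lambda j: -j.priority):
        node = pick_node(job, nodes)
        if node is None:
            placement[job.name] = None
            continue
        node.free_cores -= job.cores
        node.free_memory_gib -= job.memory_gib
        node.running.append(job.name)
        node.state = "busy"
        placement[job.name] = node.name
    return placement


nodes = [
    Node("compute001", free_cores=16, free_memory_gib=64),
    Node("compute002", free_cores=16, free_memory_gib=512),   # memory-rich node
    Node("compute003", free_cores=16, free_memory_gib=64, state="maintenance"),
]
jobs = [
    Job("render", priority=1, cores=4, memory_gib=16),
    Job("genome", priority=5, cores=8, memory_gib=256),       # memory-intensive job
    Job("cfd", priority=3, cores=4, memory_gib=16),
]

placement = schedule(jobs, nodes)
print(placement)
```

In this example the memory-intensive job can only fit on the memory-rich node, the node under maintenance receives nothing, and the remaining jobs go to whichever eligible node has the fewest running jobs. A production scheduler would also handle job completion, preemption, and fairness between users, all of which are omitted here.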