GPU-accelerated instances are a type of Elastic Compute Service (ECS) instance and are managed in the same way as standard ECS instances. Common operations include making remote connections, changing operating systems, expanding cloud disks, upgrading or downgrading instance configurations, and using snapshots or images.
Purchase and use an ECS instance
Follow these steps to understand the complete lifecycle of an instance, from selection and purchase to use, operations and maintenance (O&M), and release.
Understand instance families:
Before you purchase an ECS instance, understand the features, available types, and scenarios for different ECS instance families. This helps you select the instance family that is right for your business. For more information, see Instance families.
Understand billing methods:
Different billing methods are suitable for different business scenarios. For example, the subscription billing method is typically used for services that run 24/7. The pay-as-you-go billing method is suitable for applications or services with traffic bursts. For more information, see Overview of billing methods.
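To make the trade-off concrete, the sketch below compares one month of pay-as-you-go usage with a one-month subscription. It is a simplified model under stated assumptions: the prices are hypothetical, `HOURS_PER_MONTH` is an approximation, and real bills also depend on region, discounts, and other billable items.

```python
# Hypothetical prices for illustration only; real prices depend on region,
# instance type, discounts, and other factors.
HOURS_PER_MONTH = 730  # approximate average number of hours in a month


def break_even_hours(hourly_price: float, monthly_price: float) -> float:
    """Monthly usage (in hours) above which the subscription costs less."""
    return monthly_price / hourly_price


def cheaper_option(hourly_price: float, monthly_price: float, hours_used: float) -> str:
    """Compare one month of pay-as-you-go usage with a one-month subscription."""
    pay_as_you_go_cost = hourly_price * hours_used
    return "pay-as-you-go" if pay_as_you_go_cost < monthly_price else "subscription"


# A service that runs 24/7 versus one that only runs during a 120-hour traffic burst.
print(break_even_hours(1.0, 500.0))                 # 500.0
print(cheaper_option(1.0, 500.0, HOURS_PER_MONTH))  # subscription
print(cheaper_option(1.0, 500.0, 120))              # pay-as-you-go
```

With these made-up prices, a service that runs around the clock exceeds the break-even point, while a short traffic burst stays well below it.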
Purchase an ECS instance:
Quickly purchase a subscription instance: You can purchase an ECS instance in minutes with minimal configuration. However, only specific instance types and images are supported, and most configurations cannot be customized.
Customize your instance purchase: You can flexibly select configurations such as the image type, instance type, storage, bandwidth, and security group based on your business scenario.
For more information about how to purchase an instance, see Create an instance.
Connect to an instance remotely:
You can use several connection methods to log on to an ECS instance for remote O&M. For more information, see Connect to an ECS instance.
If you did not set a logon password when you created the ECS instance, or if you forgot the password, you must reset it. For more information, see Reset the logon password of an instance.
Use ECS to deploy common environments, websites, and applications:
If you want to upload or download files, see Upload or download files.
If you want to deploy a basic environment, see Build an environment.
If you want to deploy common website services, see Build a website.
If you want to deploy common application services, such as databases or code hosting platforms, see Build an application.
Manage the state of an ECS instance:
Start an instance: If an instance is stopped, it cannot provide services. You must start the instance before you can use it.
Stop an instance: Before you perform certain operations, you must stop the instance. These operations include changing the operating system, modifying the private IP address, or changing the instance type of a pay-as-you-go instance.
Restart an instance: Restarting is a common maintenance operation for an ECS instance, for example after a system update or to apply configuration changes.
Release an instance:
When you no longer need an ECS resource, release the instance promptly to avoid further charges. For more information, see Release an instance.
Change instance configurations
If the current configuration of your ECS instance does not meet your business needs, you can change the instance type (vCPUs and memory), public bandwidth configuration, disk size, or operating system.
Change the instance type
Upgrade or downgrade the instance type: You can upgrade or downgrade the instance type of a subscription instance.
Change the instance type across zones: Migrate an ECS instance to another zone in the same region. You can also change the instance type (vCPUs and memory) within the same instance family.
Change the bandwidth configuration
Modify the bandwidth of a subscription instance that has a static public IP address: If you use a static public IP address, you can change its bandwidth billing method and bandwidth value.
Temporarily upgrade the static public bandwidth of a subscription instance for a continuous period: If you have a subscription instance and require high traffic for a specific period, you can temporarily upgrade the instance's static public bandwidth. The bandwidth automatically reverts to its original value after the period ends. This helps avoid unnecessary long-term costs.
Change the bandwidth of an EIP: If you use an elastic IP address (EIP), you can adjust the EIP's peak bandwidth and billing method.
Expand a disk
You can expand the capacity of an existing disk to meet greater data storage needs. For more information, see Disk expansion guide.
Change the instance operating system
Change the operating system (replace the system disk): This operation replaces the system disk and its image. The old system disk is released and all its data is cleared. Before you perform this operation, create a snapshot of the system disk to back up its data.
Migrate the operating system: When an operating system reaches the end of its support lifecycle, or when third-party support or the underlying open source project changes, you can migrate the ECS instance to a new operating system. This operation retains the data on the instance's system disk.
Manage data
You can use block storage to store operating system and business data for your instances. You can also use snapshots to back up data periodically and improve data reliability.
Block storage
Elastic Block Storage is a block device product that Alibaba Cloud provides for ECS. It includes three types: disks, local disks, and elastic ephemeral disks. You can format and create a file system on a block storage device attached to an ECS instance, just as you would with a physical hard drive. For more information, see Block storage overview. The following are common operations for block storage:
Create and use a disk
A disk can be attached to an ECS instance as a system disk (to store operating system data) or a data disk (to store business data). You can create and use disks to provide persistent storage for your ECS instances. For more information, see Create and use a disk.
Reinitialize a disk
You can reinitialize a disk to clear its data and restore it to its initial state. For more information, see Re-initialize a disk.
Expand a disk
You can expand the capacity of an existing disk to meet greater data storage needs and prevent issues such as data loss due to insufficient storage space. For more information, see Disk expansion guide.
Snapshots
A snapshot is a full copy of a disk's data at a specific point in time. It is an important disaster recovery tool. You can use snapshots to periodically back up business data on your disks to mitigate the risk of data loss from accidental operations, attacks, or viruses. For more information, see Overview.
Create a manual snapshot
Before you perform major operations such as rolling back a disk, modifying critical system files, or changing the operating system, create a snapshot of the disk (system disk or data disk). If the operation causes unexpected problems or data loss, you can use the snapshot to recover data and ensure business continuity.
For more information about how to manually create a snapshot for a single disk, see Create a snapshot.
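The "snapshot first, then operate" practice can be sketched as a guard that takes a snapshot before a risky change and rolls back if the change fails. This is a minimal sketch against a hypothetical in-memory `FakeDisk` class, not the Alibaba Cloud SDK; in a real workflow the snapshot and rollback steps would be the console or API operations described in the linked topics.

```python
from contextlib import contextmanager


class FakeDisk:
    """Hypothetical in-memory stand-in for a disk; not an Alibaba Cloud SDK class."""

    def __init__(self, files):
        self.files = dict(files)
        self.snapshots = []

    def create_snapshot(self) -> int:
        # Store a point-in-time copy of the data; the list index serves as the ID.
        self.snapshots.append(dict(self.files))
        return len(self.snapshots) - 1

    def rollback(self, snapshot_id: int) -> None:
        # Restore the data exactly as it was when the snapshot was taken.
        self.files = dict(self.snapshots[snapshot_id])


@contextmanager
def snapshot_guard(disk):
    """Snapshot the disk first, and roll back if the wrapped operation fails."""
    snapshot_id = disk.create_snapshot()
    try:
        yield snapshot_id
    except Exception:
        disk.rollback(snapshot_id)
        raise


disk = FakeDisk({"/etc/app.conf": "v1"})
try:
    with snapshot_guard(disk):
        disk.files["/etc/app.conf"] = "v2"  # risky change to a critical file
        raise RuntimeError("service failed to start after the change")
except RuntimeError:
    pass
print(disk.files["/etc/app.conf"])  # v1
```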
Create an automatic snapshot
You can create an automatic snapshot policy and apply it to your disks. After the policy is applied, Alibaba Cloud automatically creates snapshots for the disks at the specified time points. For more information, see Create policy and Apply an automatic snapshot policy to a disk.
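As an illustration of how a schedule of time points, repeat days, and a retention period determines when snapshots are created and when they expire, the sketch below computes upcoming snapshot times. It is a hypothetical model loosely based on those policy settings; the function names and parameters are assumptions, not the API of the snapshot service.

```python
from datetime import datetime, timedelta


def next_snapshot_times(now, time_points, repeat_weekdays, count):
    """Return the next `count` times at which the policy would take a snapshot.

    time_points: hours of the day (0-23) at which snapshots are taken.
    repeat_weekdays: ISO weekdays (1 = Monday ... 7 = Sunday) on which the policy runs.
    """
    if not time_points or not repeat_weekdays:
        raise ValueError("the policy needs at least one time point and one weekday")
    results = []
    day = now.replace(hour=0, minute=0, second=0, microsecond=0)
    while len(results) < count:
        if day.isoweekday() in repeat_weekdays:
            for hour in sorted(time_points):
                candidate = day + timedelta(hours=hour)
                if candidate > now and len(results) < count:
                    results.append(candidate)
        day += timedelta(days=1)
    return results


def expired_snapshots(snapshot_times, now, retention_days):
    """In this model, snapshots older than the retention period are due for deletion."""
    return [t for t in snapshot_times if now - t > timedelta(days=retention_days)]


now = datetime(2024, 1, 1, 10, 0)  # a Monday
for t in next_snapshot_times(now, time_points=[2, 14], repeat_weekdays={1, 3}, count=3):
    print(t)
```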
Manage networks
Building a scalable private network in the cloud and implementing strict access control are important parts of network security.
Build a VPC network
A virtual private cloud (VPC) is a custom private network that you create on Alibaba Cloud. You can customize the IP address range, subnets, route tables, and network security policies. Different VPCs are logically isolated at Layer 2. A VPC gives you better control over resource access and improves data security and flexibility. You can learn about the components of a VPC, and then plan, create, and manage your VPCs. For more information, see Virtual private cloud (VPC).
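When you plan a VPC, the IP address range of the VPC is divided into smaller, non-overlapping subnets. The sketch below uses Python's standard `ipaddress` module to carve one equally sized subnet per zone out of a VPC CIDR block. The zone names and prefix lengths are hypothetical, and the sketch ignores any platform-specific limits on CIDR sizes or reserved addresses.

```python
import ipaddress


def plan_subnets(vpc_cidr, new_prefix, zones):
    """Carve one equally sized subnet per zone out of a VPC CIDR block."""
    candidates = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = {}
    for zone in zones:
        try:
            plan[zone] = str(next(candidates))
        except StopIteration:
            raise ValueError(f"{vpc_cidr} has no room for another /{new_prefix} subnet") from None
    return plan


print(plan_subnets("192.168.0.0/16", 24, ["zone-a", "zone-b"]))
# {'zone-a': '192.168.0.0/24', 'zone-b': '192.168.1.0/24'}
```

Because the subnets come from a single generator, they never overlap, which keeps routing within the VPC unambiguous.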
Enable Internet access
After you enable Internet access for an instance, the ECS instance can communicate with the Internet. You can enable Internet access for an instance by assigning a static public IP address or an elastic IP address (EIP). For more information, see Enable Internet access.
Access resources within a VPC
Unlike public network access, private network access is isolated from the Internet. It is suitable for internal communication that requires high security and fast transmission. You can use private IP addresses or private endpoints for private network access. For more information, see Access resources within a VPC.
Improve network performance
You can use elastic Remote Direct Memory Access (eRDMA) to improve network performance. eRDMA is an elastic RDMA network developed in-house by Alibaba Cloud. It retains the advantages of traditional RDMA network interface cards while bringing RDMA technology to VPC networks, so your workloads can benefit from the ultra-low latency of RDMA in a cloud network.
Improve IP address management efficiency
You can use prefix lists to manage IP addresses more efficiently. A prefix list is a collection of network prefixes (CIDR blocks). You can reference a prefix list when you configure network rules for other resources. You can add frequently used CIDR blocks to a prefix list. This saves you from repeatedly adding multiple rules for different CIDR blocks when you configure network rules and improves O&M efficiency.
Prefix lists can be referenced when you configure security group rules. For more information, see Use prefix lists and port lists to efficiently manage security group rules.
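The efficiency gain comes from indirection: rules reference the list instead of repeating each CIDR block, so a single change to the list applies everywhere it is referenced. The sketch below shows this with a hypothetical in-memory `PrefixList` class built on Python's `ipaddress` module; it is not the ECS API.

```python
import ipaddress


class PrefixList:
    """Hypothetical in-memory model of a prefix list: a named set of CIDR blocks."""

    def __init__(self, name, cidrs):
        self.name = name
        self.entries = [ipaddress.ip_network(c) for c in cidrs]

    def add(self, cidr):
        self.entries.append(ipaddress.ip_network(cidr))

    def contains(self, ip):
        address = ipaddress.ip_address(ip)
        return any(address in network for network in self.entries)


office = PrefixList("office-networks", ["203.0.113.0/24", "198.51.100.0/25"])
print(office.contains("198.51.100.200"))  # False: outside 198.51.100.0/25
office.add("198.51.100.128/25")           # one change to the list...
print(office.contains("198.51.100.200"))  # True: ...is seen by every rule that references it
```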
Use multiple IP addresses for multiple applications
You can use elastic network interfaces (ENIs) to use multiple IP addresses for multiple applications. An ENI is a virtual network interface that provides network interfaces and IP addresses for ECS instances in a VPC. You can attach one or more ENIs to each ECS instance. ENIs support multiple IP addresses. This lets a single instance provide services to the public or access external resources through multiple IP addresses. For more information, see Create and use an ENI.
Security protection
You can use the following features to improve the security of your instances from different dimensions. For more information about how to improve security, see ECS security.
Security groups
A security group is a virtual firewall that controls inbound and outbound traffic for ECS instances based on the configured security group rules. It prevents unauthorized access and malicious intrusions. The components and common operations for security group rules are as follows:
Rule components: A security group rule consists of an authorization object, a port range, a protocol type, an authorization policy (allow or deny), and a priority. For more information, see Security group rules.
Create and use a security group: You can create a security group and associate it with an ECS instance to control the instance's inbound and outbound traffic. This lets you isolate the instance from other resources or allow it to communicate with them as needed. For more information, see Create a security group and Associate a security group with an instance (primary ENI).
Manage security group rules: You can add, modify, or delete rules for a security group. These changes automatically apply to all ECS instances in the security group. For more information, see Manage security group rules.
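To show how the rule components above interact, the sketch below evaluates inbound traffic against a list of rules. It is a simplified, hypothetical model that assumes the following: a lower priority value means a higher priority, a deny rule wins over an allow rule with the same priority, the authorization object is a CIDR block, and traffic that matches no rule is denied. Check the Security group rules topic for the authoritative behavior.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    policy: str      # "accept" or "drop"
    priority: int    # assumed: lower value = higher priority
    protocol: str    # for example "tcp" or "udp"
    port_from: int
    port_to: int
    source: str      # authorization object, modeled here as a CIDR block

    def matches(self, protocol, port, src_ip):
        return (
            self.protocol == protocol
            and self.port_from <= port <= self.port_to
            and ipaddress.ip_address(src_ip) in ipaddress.ip_network(self.source)
        )


def evaluate_inbound(rules, protocol, port, src_ip):
    matching = [r for r in rules if r.matches(protocol, port, src_ip)]
    if not matching:
        return "drop"  # assumption: unmatched inbound traffic is denied
    best = min(r.priority for r in matching)
    top = [r for r in matching if r.priority == best]
    # Assumption: at equal priority, a deny rule takes precedence.
    return "drop" if any(r.policy == "drop" for r in top) else "accept"


rules = [
    Rule("accept", 1, "tcp", 22, 22, "203.0.113.0/24"),   # SSH from an office range
    Rule("drop", 1, "tcp", 22, 22, "203.0.113.66/32"),    # except one address
    Rule("accept", 100, "tcp", 80, 80, "0.0.0.0/0"),      # HTTP from anywhere
]
print(evaluate_inbound(rules, "tcp", 22, "203.0.113.66"))  # drop
```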
Security group use cases
We provide several use cases for security groups in common scenarios. These cases show how to configure security group rules to meet your network traffic management needs. For more information, see Security group application guide and use cases.
Key pairs
A key pair is a logon credential used to connect to an instance over the Secure Shell Protocol (SSH). It is more secure than password-based logon and helps defend against brute-force attacks. You can associate a key pair with an instance to enable passwordless logon. For more information, see Manually associate a key pair to enable passwordless SSH logon.
Deployment and elasticity
Resource scaling
You can create ECS instances quickly and automatically to handle sudden bursts of Internet traffic.
Launch templates
A launch template is a tool for quickly creating instances. It can store custom configuration information for creating ECS instances. Each template can have multiple versions, and each version can have different parameters. You can use a specific version of a template to quickly create an instance.
For more information about how to create a launch template and use it to create an ECS instance, see Create launch template and Use a launch template to create an instance.
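The relationship between a template and its versions can be pictured as follows: each version stores a complete parameter set, and a new version is derived from an existing one with a few parameters changed. This is a hypothetical in-memory model; the class, the parameter names (`instance_type`, `image_id`, `disk_gb`), and their values are illustrative only and do not mirror the launch template API.

```python
class LaunchTemplate:
    """Hypothetical model: each version stores a full parameter set."""

    def __init__(self, name, base_params):
        self.name = name
        self.versions = {1: dict(base_params)}
        self.default_version = 1

    def create_version(self, from_version, **overrides):
        # Copy an existing version, then apply only the changed parameters.
        params = dict(self.versions[from_version])
        params.update(overrides)
        new_id = max(self.versions) + 1
        self.versions[new_id] = params
        return new_id

    def params_for(self, version=None):
        # Without an explicit version, fall back to the default version.
        chosen = self.default_version if version is None else version
        return dict(self.versions[chosen])


template = LaunchTemplate("web", {"instance_type": "type-small", "image_id": "image-v1", "disk_gb": 40})
v2 = template.create_version(1, image_id="image-v2")
print(template.params_for(v2))
```

Earlier versions stay unchanged, so you can still create instances from a known-good configuration after adding a new version.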
Scaling groups
To automatically increase or decrease the number of ECS instances when your business demand fluctuates, you can configure a scaling group to automatically adjust your computing capacity (the number of instances). You can create a scaling group based on an existing ECS instance. For more information, see Create a scaling group based on an ECS instance.
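The capacity adjustment can be illustrated with a simplified target-tracking calculation: scale the instance count in proportion to how far a metric (for example, average CPU utilization) is from its target, then clamp the result to the group's minimum and maximum size. This is a generic sketch of the idea, not the exact algorithm that Auto Scaling uses; the function and its parameters are hypothetical.

```python
import math


def desired_capacity(current, metric_value, target_value, min_size, max_size):
    """Simplified target tracking: move the metric toward the target, within group limits."""
    if current == 0:
        return min_size
    desired = math.ceil(current * metric_value / target_value)
    return max(min_size, min(max_size, desired))


print(desired_capacity(4, metric_value=90, target_value=60, min_size=2, max_size=10))  # 6
print(desired_capacity(4, metric_value=20, target_value=60, min_size=2, max_size=10))  # 2
```

Rounding up favors having slightly more capacity than needed, and the clamp keeps a traffic spike from scaling the group beyond its configured maximum.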
Achieve high availability with deployment sets
A deployment set is a placement policy for ECS instances on physical servers. A suitable policy can help you avoid single points of failure and reduce network communication latency. You can select a deployment policy as needed for high availability, network latency, and deployment scale. Then, create a deployment set based on the policy and create or add ECS instances to the deployment set. For more information, see Deployment sets.
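The reasoning behind a high-availability placement policy can be shown in a few lines: if no two instances share a physical server, the failure of one server affects at most one instance. The sketch below is a hypothetical model; the host names are invented, and it does not reflect how ECS actually selects servers or any per-set limits.

```python
def spread_placement(instances, hosts):
    """Place each instance on a distinct physical host (high-availability spread)."""
    if len(instances) > len(hosts):
        raise ValueError("not enough distinct hosts to keep instances apart")
    return dict(zip(instances, hosts))


def single_host_failure_impact(placement, failed_host):
    """Instances that would go down if one physical host failed."""
    return [instance for instance, host in placement.items() if host == failed_host]


placement = spread_placement(["i-1", "i-2", "i-3"], ["host-a", "host-b", "host-c", "host-d"])
print(single_host_failure_impact(placement, "host-b"))  # ['i-2']
```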
IaC tools
You can use infrastructure as code (IaC) tools to create and manage ECS resources more simply.
Resource Orchestration Service (ROS): An automated deployment service provided by Alibaba Cloud that simplifies cloud computing resource management and adopts the IaC design philosophy. You only need to define the required cloud resources in a ROS template. The ROS orchestration engine then automatically creates and configures all resources based on the template, achieving automated deployment and O&M.
You can use the ROS console or call API operations to create a stack from a template, which lets you quickly create and manage resources. For more information, see Create a stack or API overview.
Terraform
Terraform is an open source IaC tool. It lets developers define and manage infrastructure configurations using a declarative language. It provides a simple way to create, modify, or delete ECS resources, reducing the complexity and errors of manual operations and improving the manageability and maintainability of the infrastructure.
You can install and configure Terraform, and then use it to manage ECS instances. For more information, see Terraform reference.
O&M and monitoring
Set alerts for an instance
You can enable one-click alerts or set custom alert rules for your ECS instances. This helps you promptly detect instance anomalies and handle potential risks. For more information, see Set alert rules for an ECS instance.
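A common form of custom alert rule fires only when a metric stays above a threshold for several consecutive monitoring periods, which avoids alerts on brief spikes. The sketch below models such a rule; the threshold, the number of periods, and the function name are hypothetical and do not represent CloudMonitor's configuration format.

```python
def should_alert(samples, threshold, consecutive_periods):
    """Return True if the most recent `consecutive_periods` samples all exceed `threshold`."""
    if len(samples) < consecutive_periods:
        return False
    return all(value > threshold for value in samples[-consecutive_periods:])


cpu_utilization = [50, 85, 90, 95]  # percent, one sample per monitoring period
print(should_alert(cpu_utilization, threshold=80, consecutive_periods=3))  # True
```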
System events
System events are defined by Alibaba Cloud to record and notify you of information about your cloud resources. This helps you understand risks and anomalies and achieve automated O&M. For more information, see Overview of ECS system events.
Automated O&M tools
Cloud Assistant is a native automated O&M tool designed for ECS. It lets you run commands (such as Shell, PowerShell, and Bat) in batches without a password, logon, or jump server. You can use it to perform tasks such as running automated O&M scripts, polling processes, installing or uninstalling software, starting or stopping services, and installing patches or security updates.
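Running one command on many instances at once is essentially a concurrent fan-out in which each instance's result is recorded separately, so that one failure does not hide the others. The sketch below shows this pattern with Python's `concurrent.futures`. The `runner` callable is a hypothetical stand-in for whatever actually delivers the command; it is not the Cloud Assistant API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_batch(instance_ids, command, runner, max_workers=10):
    """Run `command` on every instance concurrently; return {instance_id: (ok, output)}."""
    def run_one(instance_id):
        try:
            return instance_id, (True, runner(instance_id, command))
        except Exception as exc:
            # Record the failure instead of aborting the whole batch.
            return instance_id, (False, str(exc))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, instance_ids))


def fake_runner(instance_id, command):
    # Hypothetical stand-in for sending a command to one instance.
    if instance_id == "i-3":
        raise TimeoutError("instance did not respond")
    return f"{instance_id}: ran {command!r}"


results = run_in_batch(["i-1", "i-2", "i-3"], "systemctl restart nginx", fake_runner)
for instance_id, (ok, output) in results.items():
    print(instance_id, ok, output)
```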
CloudOps Orchestration Service (OOS)
OOS is an automated O&M service provided by Alibaba Cloud that can automatically manage and execute tasks. You can use templates to define the tasks to be executed, the execution order, and the inputs and outputs. Then, you can run the templates to automate the tasks.
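The idea of a template that defines tasks, their order, and their inputs and outputs can be sketched as a small executor that runs tasks in sequence and passes each task's outputs to the tasks after it. This is a hypothetical model meant only to illustrate the concept; it is not OOS template syntax, and the task names and actions are invented.

```python
def run_template(tasks, inputs):
    """Run (name, action) pairs in order; each action reads and extends a shared context."""
    context = dict(inputs)
    for _name, action in tasks:
        context.update(action(context))
    return context


log = []


def stop_instance(ctx):
    log.append(f"stop {ctx['instance_id']}")
    return {"status": "Stopped"}


def create_snapshot(ctx):
    snapshot_id = f"snap-of-{ctx['instance_id']}"
    log.append(f"snapshot {snapshot_id}")
    # Reads an output of the previous task to show data flowing between steps.
    return {"snapshot_id": snapshot_id, "snapshot_taken_while": ctx["status"]}


def start_instance(ctx):
    log.append(f"start {ctx['instance_id']}")
    return {"status": "Running"}


result = run_template(
    [("stopInstance", stop_instance), ("createSnapshot", create_snapshot), ("startInstance", start_instance)],
    {"instance_id": "i-example"},
)
print(result)
```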
Migration services
Now that you know about the capabilities of ECS, it is time to migrate your on-premises services to the cloud.
Migrate to the cloud
You can migrate on-premises physical machines, on-premises virtual machines, or cloud servers from other cloud service providers to Alibaba Cloud. You can migrate by importing custom images or using Server Migration Center, a migration platform provided by Alibaba Cloud. For more information, see Migrate to the cloud.
Migrate within the cloud
You may need to migrate an Alibaba Cloud ECS instance from one account or region to another because of insufficient resource inventory, cost optimization, disaster recovery, or disk scale-in. You can also migrate a Simple Application Server instance or a Dedicated Host (DDH) to an ECS instance. Select a migration method that suits your scenario. For more information, see Migrate within the cloud.
Development
You can programmatically integrate ECS capabilities into your business systems. These capabilities include, but are not limited to, creating, modifying, and performing O&M on instances. This simplifies operations and helps you control costs. For more information, see Integration overview.
ECS OpenAPI: The API operations that ECS provides for managing its resources programmatically.
Integration methods: ECS supports resource management through methods such as software development kits (SDKs) and the command-line interface (CLI).