Application-specific integrated circuit (ASIC) chips are designed for specific algorithms, so their computing power and efficiency are tailored to those workloads. ASIC chips offer the following benefits: small size, low power consumption, high reliability, strong confidentiality, high computing performance, and high computing efficiency. This topic describes how to create a managed ASIC-accelerated cluster in the Container Service for Kubernetes (ACK) console and how to use ASIC devices.

Background information

ACK performs the following operations when a cluster is created:

  • Creates ECS instances, configures a public key to enable SSH logon from master nodes to other nodes, and then configures the ACK cluster through CloudInit.
  • Creates a security group that allows access to the VPC over Internet Control Message Protocol (ICMP).
  • If you do not specify an existing VPC, ACK creates a VPC and a vSwitch and creates SNAT entries for the vSwitch.
  • Adds route entries to the VPC.
  • Creates a NAT gateway and EIPs.
  • Creates a Resource Access Management (RAM) user and an AccessKey pair. Grants the following permissions to the RAM user: permissions to query, create, and delete ECS instances, permissions to add and delete disks, and full permissions on SLB, CloudMonitor, VPC, Log Service, and Apsara File Storage NAS (NAS). The ACK cluster automatically creates SLB instances, disks, and VPC route entries based on your configuration.
  • Creates an internal-facing SLB instance and opens port 6443.
  • Creates an Internet-facing SLB instance and opens ports 6443 and 8443. Port 22 is also opened if you enable SSH logon over the Internet when you create the cluster. Otherwise, port 22 is not exposed.

Limits

  • The Kubernetes version of the cluster is 1.20.11 or later.
  • ACK clusters support only virtual private clouds (VPCs).
  • By default, each account has specific quotas on cloud resources that can be created. You cannot create clusters if the quota is reached. Make sure that you have sufficient resource quotas before you create a cluster.

    To request a quota increase, submit a ticket.

    • For more information, see Quota limits.
      Notice By default, you can add up to 48 route entries to a VPC. This means that you can deploy up to 48 nodes in an ACK cluster that uses Flannel. This limit does not apply to ACK clusters that use Terway. To deploy more nodes in a cluster, submit a ticket.
    • By default, you can create at most 100 security groups with each account.
    • By default, you can create at most 60 pay-as-you-go Server Load Balancer (SLB) instances with each account.
    • By default, you can create at most 20 elastic IP addresses (EIPs) with each account.
  • Limits on Elastic Compute Service (ECS) instances:

    The pay-as-you-go and subscription billing methods are supported.

    Note After an ECS instance is created, you can change its billing method from pay-as-you-go to subscription in the ECS console. For more information, see Change the billing method of an ECS instance from pay-as-you-go to subscription.
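The default quotas listed in this section can be turned into a quick pre-flight check before you create a cluster. The following Python sketch is illustrative only: the DEFAULT_QUOTAS dictionary and the check_quotas helper are hypothetical names, the values are the defaults quoted in this topic, and the actual quotas of your account may differ. It also assumes that each node of a Flannel cluster consumes one VPC route entry, which is what the 48-node limit above implies.

```python
# Default per-account quotas quoted in this topic. Real quotas may have been
# raised through a ticket, so treat these values as assumptions.
DEFAULT_QUOTAS = {
    "vpc_route_entries": 48,   # also caps the node count of a Flannel cluster
    "security_groups": 100,
    "payg_slb_instances": 60,
    "eips": 20,
}


def check_quotas(planned, in_use, quotas=DEFAULT_QUOTAS):
    """Return (resource, total, limit) for every quota the plan would exceed."""
    exceeded = []
    for resource, limit in quotas.items():
        total = in_use.get(resource, 0) + planned.get(resource, 0)
        if total > limit:
            exceeded.append((resource, total, limit))
    return exceeded


# Hypothetical plan: a 50-node Flannel cluster, assuming one route entry per node.
planned = {"vpc_route_entries": 50, "security_groups": 1,
           "payg_slb_instances": 2, "eips": 1}
in_use = {"security_groups": 10, "eips": 19}

for resource, total, limit in check_quotas(planned, in_use):
    print(f"{resource}: {total} needed in total, quota is {limit}")
```

In this hypothetical plan only the route entry quota is exceeded, so you would either switch to Terway or request a quota increase before you create the cluster.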

Procedure

  1. Log on to the ACK console.
  2. In the left-side navigation pane of the ACK console, click Clusters.
  3. In the upper-right corner of the Clusters page, click Cluster Template.
  4. In the Select Cluster Template dialog box, find Heterogeneous Computing Cluster in the Managed Clusters section and click Create.
  5. On the Managed Kubernetes tab, configure the cluster.
    1. Configure basic settings of the cluster.
      Parameter Description
      Cluster Name Enter a name for the cluster.
      Note The name must be 1 to 63 characters in length, and can contain digits, letters, and hyphens (-).
      Cluster Specification Select the required edition. By default, Professional is selected.
      Region Select a region to deploy the cluster.
      Billing Method
      The pay-as-you-go and subscription billing methods are supported. If you select the subscription billing method, you must set the following parameters:
      • Duration: You can select 1, 2, 3, or 6 months. If you require a longer duration, you can select 1 year, 2 years, or 3 years.
      • Auto Renewal: Specify whether to enable auto-renewal.
      Note If you set Billing Method to Subscription, only Elastic Compute Service (ECS) instances and Server Load Balancer (SLB) instances are billed on a subscription basis. Other cloud resources are billed on a pay-as-you-go basis. For more information about the billing rules of Alibaba Cloud resources, see Billing of cloud services.
      Resource Group Move the pointer over All Resources at the top of the page and select the resource group to which the cluster belongs. The name of the selected resource group is displayed.
      Kubernetes Version Select a Kubernetes version. As stated in the Limits section, the version must be 1.20.11 or later.
      Container Runtime

      The containerd, Docker, and Sandboxed-Container runtimes are supported. For more information, see Comparison of Docker, containerd, and Sandboxed-Container.

      VPC Select a VPC to deploy the cluster. Standard VPCs and shared VPCs are supported.
      • Shared VPC: The owner of a VPC (resource owner) can share the vSwitches in the VPC with other accounts in the same organization.
      • Standard VPC: The owner of a VPC (resource owner) cannot share the vSwitches in the VPC with other accounts.
      Note ACK clusters support only VPCs. You can select a VPC from the drop-down list. If no VPC is available, click Create VPC to create one. For more information, see Create and manage a VPC.
      vSwitch Select vSwitches.

      You can select up to three vSwitches that are deployed in different zones. If no vSwitch is available, click Create vSwitch to create one. For more information, see Work with vSwitches.

      Network Plug-in Select a network plug-in for the cluster. Flannel and Terway are available. For more information, see Work with Terway.
      • Flannel: a simple and stable open source Container Network Interface (CNI) plug-in. Flannel provides only basic features and does not support standard Kubernetes network policies.
      • Terway: a network plug-in that is developed by ACK. Terway allows you to assign elastic network interfaces (ENIs) of Alibaba Cloud to containers. It also allows you to customize Kubernetes network policies to regulate how containers communicate with each other and implement bandwidth throttling on individual containers.
        Note
        • The number of pods that can be deployed on a node depends on the number of ENIs that are attached to the node and the maximum number of secondary IP addresses provided by these ENIs.
        • If you select a shared VPC for a cluster, you must select Terway as the network plug-in.
      IP Addresses per Node
      If you select Flannel as the network plug-in, you must set IP Addresses per Node.
      Note
      • IP Addresses per Node specifies the maximum number of IP addresses that can be assigned to each node. We recommend that you use the default value.
      • After you select the VPC and specify the number of IP addresses per node, recommended values are automatically generated for Pod CIDR Block and Service CIDR. The system also provides the maximum number of nodes that can be deployed in the cluster and the maximum number of pods that can be deployed on each node. You can modify the values based on your business requirements.
      Pod CIDR Block If you select Flannel as the network plug-in, you must set Pod CIDR Block.

      The CIDR block specified by Pod CIDR Block cannot overlap with that of the VPC or those of the existing clusters in the VPC. The CIDR block cannot be modified after the cluster is created. The Service CIDR block cannot overlap with the pod CIDR block. For more information about how to plan CIDR blocks for an ACK cluster, see Plan CIDR blocks for an ACK cluster.

      Terway Mode If you set Network Plug-in to Terway, the Terway Mode parameter is available.
      When you set Terway Mode, you can select or clear Assign One ENI to Each Pod.
      • If you select Assign One ENI to Each Pod, an ENI is assigned to each pod.
      • If you clear Assign One ENI to Each Pod, an ENI is shared among multiple pods. A secondary IP address of the ENI is assigned to each pod.
      Note To use the Assign One ENI to Each Pod feature, submit a ticket to be added to the whitelist.
      Service CIDR Set Service CIDR. The CIDR block specified by Service CIDR cannot overlap with that of the VPC or those of the existing clusters in the VPC. The CIDR block cannot be modified after the cluster is created. The Service CIDR block cannot overlap with the pod CIDR block. For more information about how to plan CIDR blocks for an ACK cluster, see Plan CIDR blocks for an ACK cluster.
      Configure SNAT Specify whether to configure SNAT rules for the VPC.
      • If the specified VPC has a NAT gateway, ACK uses this NAT gateway.
      • If the VPC does not have a NAT gateway, the system automatically creates one. If you do not want the system to create a NAT gateway, clear Configure SNAT for VPC. In this case, you must manually create a NAT gateway and configure SNAT rules to enable Internet access for the VPC. Otherwise, resources in the VPC cannot access the Internet and the cluster cannot be created.
      Access to API Server
      By default, an internal-facing SLB instance is created for the Kubernetes API server of the cluster. You can modify the specification of the SLB instance. For more information, see Instance types and specifications.
      Notice If you delete the SLB instance, you cannot access the Kubernetes API server of the cluster.
      Select or clear Expose API Server with EIP. The ACK API server provides multiple HTTP-based RESTful APIs, which can be used to create, delete, modify, query, and monitor resources, such as pods and Services.
      • If you select this check box, an elastic IP address (EIP) is created and associated with an SLB instance. Port 6443 used by the API server is opened on master nodes. You can connect to and manage the cluster by using kubeconfig files over the Internet.
      • If you clear this check box, no EIP is created. You can connect to and manage the cluster by using kubeconfig files only from within the VPC.
      RDS Whitelist Configure the whitelist of the ApsaraDB RDS instance. Add the IP addresses of nodes in the cluster to a whitelist of the ApsaraDB RDS instance.
      Security Groups

      You can select Create Basic Security Group, Create Advanced Security Group, or Select Existing Security Group. For more information about security groups, see Overview.

      Deletion Protection

      Specify whether to enable deletion protection for the cluster. Deletion protection prevents the cluster from being accidentally deleted in the console or by calling the API.
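The non-overlap rules for the VPC, pod, and Service CIDR blocks can be checked before you fill in the form. The following Python sketch uses the standard ipaddress module. The find_overlaps helper and the address ranges are hypothetical and are not recommended values.

```python
import ipaddress


def find_overlaps(blocks):
    """Return every pair of named CIDR blocks that overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in blocks.items()}
    names = list(parsed)
    overlaps = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if parsed[first].overlaps(parsed[second]):
                overlaps.append((first, second))
    return overlaps


# Hypothetical plan; the addresses are examples only.
plan = {
    "vpc": "192.168.0.0/16",
    "pod": "172.20.0.0/16",
    "service": "172.21.0.0/20",
}
print(find_overlaps(plan))      # no pair overlaps in this plan

# Moving the Service block inside the pod block violates the rule.
bad_plan = dict(plan, service="172.20.128.0/20")
print(find_overlaps(bad_plan))  # reports the (pod, service) pair
```

To also cover the existing clusters in the VPC, add their pod and Service CIDR blocks to the dictionary under their own names.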

    2. Configure advanced settings of the cluster.
      Parameter Description
      Time Zone

      Select a time zone for the cluster. By default, the time zone of your browser is selected.

      Kube-proxy Mode
      iptables and IPVS are supported.
      • iptables is a mature and stable kube-proxy mode. It uses iptables rules to conduct service discovery and load balancing. The performance of this mode is restricted by the size of the Kubernetes cluster. This mode is suitable for Kubernetes clusters that manage a small number of Services.
      • IPVS is a high-performance kube-proxy mode. It uses Linux Virtual Server (LVS) to conduct service discovery and load balancing. This mode is suitable for clusters that manage a large number of Services. We recommend that you use this mode in scenarios where high-performance load balancing is required.
      Labels
      Add labels to cluster nodes. Enter a key and a value, and then click Add.
      Note
      • Key is required. Value is optional.
      • Keys are not case-sensitive. A key must be 1 to 64 characters in length and cannot start with aliyun, http://, or https://.
      • Values are not case-sensitive. A value cannot exceed 128 characters in length and cannot contain http:// or https://. A value can be empty.
      • The keys of labels that are added to the same resource must be unique. If you add a label with a used key, the label overwrites the label that uses the same key.
      • If you add more than 20 labels to a resource, all labels become invalid. You must remove excess labels for the remaining labels to take effect.
      Cluster Domain
      Set the domain name of the cluster.
      Note The default domain name is cluster.local. You can enter a custom domain name. A domain name consists of two parts. Each part must be 1 to 63 characters in length and can contain only letters and digits. You cannot leave these parts empty.
      Custom Certificate SANs

      You can enter custom subject alternative names (SANs) for the API server certificate of the cluster to accept requests from specified IP addresses or domain names.

      For more information, see Customize the SAN of the API server certificate when you create an ACK cluster.

      Service Account Token Volume Projection

      Service account token volume projection reduces security risks when pods use service accounts to access the API server. This feature enables kubelet to request and store the token on behalf of the pod. This feature also allows you to configure token properties, such as the audience and validity duration. For more information, see Enable service account token volume projection.

      Secret Encryption If you select Select Key, you can use a key that is created in the Key Management Service (KMS) console to encrypt Kubernetes Secrets. For more information, see Use KMS to encrypt Kubernetes Secrets.
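The label rules in the Labels row can be expressed as a small validator. The following Python sketch is one possible reading of those rules, not the check that the console runs. The validate_labels helper is a hypothetical name, and the sketch assumes that key uniqueness is also case-insensitive. It reports duplicate keys as problems, whereas the console overwrites the earlier label.

```python
FORBIDDEN_KEY_PREFIXES = ("aliyun", "http://", "https://")
FORBIDDEN_VALUE_PARTS = ("http://", "https://")
MAX_LABELS = 20


def validate_labels(labels):
    """Return a list of problems; an empty list means the labels look valid.

    `labels` is a list of (key, value) pairs so that duplicate keys are visible.
    """
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(f"{len(labels)} labels exceed the limit of {MAX_LABELS}")
    seen = set()
    for key, value in labels:
        lowered = key.lower()  # keys are not case-sensitive
        if not 1 <= len(key) <= 64:
            problems.append(f"key {key!r}: length must be 1 to 64")
        if lowered.startswith(FORBIDDEN_KEY_PREFIXES):
            problems.append(f"key {key!r}: forbidden prefix")
        if lowered in seen:
            problems.append(f"key {key!r}: duplicate key")
        seen.add(lowered)
        if len(value) > 128:
            problems.append(f"value of {key!r}: longer than 128 characters")
        if any(part in value.lower() for part in FORBIDDEN_VALUE_PARTS):
            problems.append(f"value of {key!r}: must not contain http:// or https://")
    return problems


print(validate_labels([("team", "media"), ("workload", "")]))
for problem in validate_labels([("Aliyun-env", "x"), ("team", "a"),
                                ("TEAM", "http://example.com")]):
    print(problem)
```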
  6. Click Next:Node Pool Configurations to configure worker nodes.
    1. Set worker instances.
      • If you select Create Instance, you must set the parameters as described in the following table.
        Parameter Description
        Node Pool Name
        The name of the node pool.
        Note The name must be 1 to 63 characters in length, and can contain digits, letters, and hyphens (-).
        Instance Type Select Heterogeneous Computing GPU/FPGA/NPU and Heterogeneous Service Type. In this example, the ecs.video-trans.26xhevc instance type is selected. For more information about instance types, see Instance family.
        Note If no instance type is available, you can change vSwitches on the Cluster Configurations wizard page.
        Selected Types

        The selected instance types are displayed.

        Quantity

        The number of ECS instances that you want to add to the cluster.

        System Disk
        Enhanced SSDs, standard SSDs, and ultra disks are supported.
        Note
        • You can select Enable Backup to back up disk data.
        • If you select enhanced SSD as the system disk type, you can set a custom performance level for the system disk.

          You can select higher performance levels for enhanced SSDs with larger storage capacities. For example, you can select performance level 2 for an enhanced SSD with a storage capacity of more than 460 GiB. You can select performance level 3 for an enhanced SSD with a storage capacity of more than 1,260 GiB. For more information, see Capacity and PLs.

        Mount Data Disk

        Enhanced SSDs, standard SSDs, and ultra disks are supported. You can select Encrypt Disk and Enable Backup when you mount data disks.

        Operating System
        ACK supports the following node operating systems:
        • Alibaba Cloud Linux 2.x. This is the default operating system.
          If you select Alibaba Cloud Linux 2.x, you can configure security reinforcement for the operating system:
          • Disable: disables security reinforcement for Alibaba Cloud Linux 2.x.
          • CIS Reinforcement: enables security reinforcement for Alibaba Cloud Linux 2.x. For more information about Center for Internet Security (CIS) reinforcement, see CIS reinforcement.
        • CentOS 7.x
          Note CentOS 8.x and later are not supported.
        Logon Type
        • Key pair logon.

          Set Logon Type to Key Pair. If no key pair is available, click create a key pair to create one in the ECS console. For more information, see Create an SSH key pair. After the key pair is created, set it as the credential that is used to log on to the cluster.

        • Password logon.
          • Password: Enter the password that is used to log on to the nodes.
          • Confirm Password: Enter the password again.
          Note The password must be 8 to 30 characters in length, and must contain at least three of the following character types: uppercase letters, lowercase letters, digits, and special characters. The password cannot contain underscores (_).
        Key Pair
      • If you select Add Existing Instance, you must select ECS instances that are deployed in the specified region. Then, set the Operating System, Logon Type, and Key Pair parameters based on the preceding description.
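The password rule for password logon can be checked locally before you submit the form. The following Python sketch is an approximation. The is_valid_password helper is a hypothetical name, and because this topic does not list the accepted special characters, the sketch assumes that they are the ASCII punctuation characters other than the underscore.

```python
import string

# Assumption: "special characters" means ASCII punctuation. The underscore is
# removed from the set because the rule forbids it.
SPECIAL_CHARACTERS = set(string.punctuation) - {"_"}


def is_valid_password(password):
    """Check the rule: 8 to 30 characters, 3 of 4 character types, no underscore."""
    if not 8 <= len(password) <= 30:
        return False
    if "_" in password:
        return False
    types_present = [
        any(c in string.ascii_uppercase for c in password),
        any(c in string.ascii_lowercase for c in password),
        any(c in string.digits for c in password),
        any(c in SPECIAL_CHARACTERS for c in password),
    ]
    return sum(types_present) >= 3


# Hypothetical sample values, for illustration only.
for candidate in ("Asic-Node1", "asicnode1", "Asic_Node1"):
    print(candidate, is_valid_password(candidate))
```

The second sample fails because it contains only two character types, and the third fails because it contains an underscore.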
    2. Configure advanced settings.
      Parameter Description
      Node Protection
      Specify whether to enable node protection.
      Note By default, this check box is selected. Node protection prevents nodes from being accidentally deleted in the console or by calling the API.
      User Data

      For more information, see Overview of ECS instance user data.

      Custom Image
      • You can select a custom ECS image. After you select a custom image, all nodes in the cluster are deployed by using this image. For more information about how to create a custom image, see Create a Kubernetes cluster by using a custom image.
      • You can select a shared ECS image. After you select a shared image, all nodes in the cluster are deployed by using this image. For more information about shared images, see Share a custom image.
      Note
      • Only custom images based on CentOS 7.x and Alibaba Cloud Linux 2.x are supported.
      • To use this feature, submit a ticket to apply to be added to a whitelist.
      Custom Node Name

      Specify whether to use a custom node name.

      A custom node name consists of a prefix, an IP substring, and a suffix.
      • The prefix and suffix can contain multiple parts that are separated by periods (.). Each part can contain lowercase letters, digits, and hyphens (-), and must start and end with a lowercase letter or digit.
      • The IP substring length specifies the number of digits to take from the end of the node IP address. The IP substring length ranges from 5 to 12.

      For example, if the node IP address is 192.1xx.x.xx, the prefix is aliyun.com, the IP substring length is 5, and the suffix is test, the node name will be aliyun.com00055test.

      CPU Policy
      Set the CPU policy.
      • none: This policy indicates that the default CPU affinity is used. This is the default policy.
      • static: This policy allows pods with specific resource characteristics on the node to be granted enhanced CPU affinity and exclusivity.
      Taints

      Add taints to all worker nodes in the cluster.
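The custom node name rule can be reproduced in a few lines. The following Python sketch is a guess at the naming scheme, not ACK's implementation. It assumes that each octet of the IP address is zero-padded to three digits before the trailing digits are taken, which is consistent with the allowed length of 5 to 12 and with the aliyun.com00055test example above. The custom_node_name helper and the IP address 192.168.0.55 are hypothetical.

```python
import ipaddress


def custom_node_name(prefix, ip, ip_substring_length, suffix):
    """Build a node name from a prefix, the tail of the node IP, and a suffix."""
    if not 5 <= ip_substring_length <= 12:
        raise ValueError("IP substring length must be between 5 and 12")
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    # Assumption: each octet is zero-padded to three digits, giving 12 digits.
    digits = "".join(octet.zfill(3) for octet in octets)
    return f"{prefix}{digits[-ip_substring_length:]}{suffix}"


# Hypothetical node IP address; prints aliyun.com00055test
print(custom_node_name("aliyun.com", "192.168.0.55", 5, "test"))
```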

  7. Click Next:Component Configurations to configure components.
    Parameter Description
    Ingress Specify whether to install an Ingress controller. By default, Nginx Ingress is selected. For more information, see Advanced NGINX Ingress configurations.
    Note If you want to select Create Ingress Dashboard, you must first enable Log Service.
    Service Discovery

    Specify whether to install NodeLocal DNSCache. By default, NodeLocal DNSCache is installed.

    NodeLocal DNSCache runs a Domain Name System (DNS) caching agent to improve the performance and stability of DNS resolution. For more information about NodeLocal DNSCache, see Configure NodeLocal DNSCache.

    Volume Plug-in Select a volume plug-in. FlexVolume and CSI are supported. ACK clusters can be automatically bound to Alibaba Cloud disks, Apsara File Storage NAS (NAS) file systems, and Object Storage Service (OSS) buckets that are mounted to pods. For more information, see Storage management-FlexVolume and Storage management-CSI.
    Monitoring Agents

    Specify whether to install the CloudMonitor agent. By default, Install CloudMonitor Agent on ECS Instance and Enable Prometheus Monitoring are selected. After the CloudMonitor agent is installed on ECS nodes, you can view monitoring data about the nodes in the CloudMonitor console.

    Alerts

    Select Use Default Alert Rule Template to enable the alerting feature and use the default alert rules. For more information, see Alert management.

    Log Service

    Specify whether to enable Log Service. You can select an existing Log Service project or create one. By default, Enable Log Service is selected. When you create an application, you can enable Log Service in a few steps. For more information, see Collect log data from containers by using Log Service.

    After you select Enable Log Service, you can specify whether to select Create Ingress Dashboard and Install node-problem-detector and Create Event Center.

    Log Collection for Control Plane Components

    If you select Enable, the logs of control plane components are collected to the specified Log Service project that belongs to the current account. For more information, see Collect the logs of control plane components in a managed Kubernetes cluster.

    Workflow Engine
    Specify whether to enable Alibaba Cloud Genomics Service (AGS).
    Note To use this feature, submit a ticket.
    • If you select this check box, the system automatically installs the AGS workflow plug-in when the system creates the cluster.
    • If you clear this check box, you must manually install the AGS workflow plug-in. For more information, see Introduction to AGS CLI.
  8. Click Next:Confirm Order.
  9. Select Terms of Service and click Create Cluster.
    Note It takes about 10 minutes to create an ACK cluster that contains multiple nodes.
  10. After the cluster is created, you can view the ASIC devices that are attached to the nodes.
    1. On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
    2. In the left-side navigation pane of the details page, choose Nodes > Nodes.
    3. On the Nodes page, find the node that you created and choose More > Details in the Actions column. On the details page of the node, you can view the ASIC device that is attached to the node.

Use ASIC devices

NETINT provides base images on Docker Hub. You can use these images to build container images. For more information, see netint/ni_xcoder_release.

The following section describes how to submit a job that requests NETINT ASIC devices.

  1. Use kubectl to connect to the ACK managed cluster. For more information, see Connect to ACK clusters by using kubectl.
  2. Run the following command to view the total number of NETINT ASIC devices that are attached to a specified node of the cluster.
    The extended resource name of NETINT ASIC devices is netint.ca/ASIC.
    kubectl get nodes <NODE_NAME> -o yaml
    Expected output:
    ......   #Information about other resources is not shown. 
    status:
      allocatable:
        cpu: 101900m
        ephemeral-storage: "114022843818"
        hugepages-1Gi: "0"
        hugepages-2Mi: "0"
        memory: 185176524Ki
        # The node contains 12 NETINT ASIC devices.
        netint.ca/ASIC: "12"
        pods: "64"
      capacity:
        cpu: "104"
        ephemeral-storage: 123722704Ki
        hugepages-1Gi: "0"
        hugepages-2Mi: "0"
        memory: 196499916Ki
        netint.ca/ASIC: "12"
        pods: "64"
    The output indicates that the node contains 12 NETINT ASIC devices.
  3. Submit a job that requests NETINT ASIC devices.
    1. Create a file named test-asic.yaml by using the following YAML template.
      apiVersion: batch/v1
      kind: Job
      metadata:
        name: test-asic
      spec:
        parallelism: 1
        template:
          metadata:
            labels:
              app: test-asic
          spec:
            containers:
            - name: test-asic
              image: registry.cn-beijing.aliyuncs.com/ai-samples/asic_258:asic
              command:
              - sleep
              - "500"
              resources:
                limits:
                  netint.ca/ASIC: 2 #Request 2 NETINT ASIC devices. 
            restartPolicy: Never
    2. Run the following command to submit the job:
      kubectl create -f test-asic.yaml
  4. Run the following command to check whether the pod is in the Running state:
    kubectl get po -l app=test-asic
    Expected output:
    NAME              READY   STATUS    RESTARTS   AGE
    test-asic-zt6ck   1/1     Running   0          19s
    The output indicates that the status of the pod is Running.
  5. Run the following command to log on to the pod:
    kubectl exec -ti test-asic-zt6ck -- bash
  6. Run the following command in the pod to initialize the NETINT ASIC devices:
    ni_rsrc_mon
    Expected output:
    NI resource not init'd, continue ..
    Reading device file: nvme11
    Reading device file: nvme9
    Compatible minimum FW ver: 258, FW API flavors: 1E, minimum API ver: 9
    1. /dev/nvme9 /dev/nvme9n1 num_hw: 2
    2. /dev/nvme11 /dev/nvme11n1 num_hw: 2
    Creating shm_name: NI_SHM_CODERS  lck_name: /dev/shm/NI_LCK_CODERS
    0. nvme9
    decoder h/w id 0 create
    Creating shm_name: NI_shm_d0 , lck_name /dev/shm/NI_lck_d0
    ni_rsrc_get_one_device_info written out.
    decoder h/w id 0 update
    encoder h/w id 1 create
    Creating shm_name: NI_shm_e0 , lck_name /dev/shm/NI_lck_e0
    ni_rsrc_get_one_device_info written out.
    encoder h/w id 1 update
    1. nvme11
    decoder h/w id 0 create
    Creating shm_name: NI_shm_d1 , lck_name /dev/shm/NI_lck_d1
    ni_rsrc_get_one_device_info written out.
    decoder h/w id 0 update
    encoder h/w id 1 create
    Creating shm_name: NI_shm_e1 , lck_name /dev/shm/NI_lck_e1
    ni_rsrc_get_one_device_info written out.
    encoder h/w id 1 update
    **************************************************
    2 devices retrieved from current pool at start up
    Wed Dec  1 06:46:07 2021 up 00:00:00 v258R1E09
    Num decodes: 2
    BEST INDEX LOAD MODEL_LOAD MEM  INST DEVICE         NAMESPACE
    L    0     0    0          0    0    /dev/nvme9     /dev/nvme9n1
         1     0    0          0    0    /dev/nvme11    /dev/nvme11n1
    Num encodes: 2
    BEST INDEX LOAD MODEL_LOAD MEM  INST DEVICE         NAMESPACE
    L    0     0    0          0    0    /dev/nvme9     /dev/nvme9n1
         1     0    0          0    0    /dev/nvme11    /dev/nvme11n1
    **************************************************
  7. Run the following command to check whether the NETINT ASIC devices are ready for use:
    bash run_ffmpeg.sh
    Expected output:
    Choose an option:
    1) check pci device          6) test 265 decoder
    2) check nvme list           7) test 264 encoder
    3) rsrc_init                 8) test 265 encoder
    4) ni_rsrc_mon               9) test 264->265 transcoder
    5) test 264 decoder         10) Quit
    #? 6
    
    You chose 6  which is test 265 decoder
    Input #0, hevc, from '../libxcoder/test/akiyo_352x288p25.265':
      Duration: N/A, bitrate: N/A
        Stream #0:0: Video: hevc (Main), yuv420p(tv), 352x288, 25 fps, 25 tbr, 1200k tbn, 25 tbc
    Stream mapping:
      Stream #0:0 -> #0:0 (hevc (h265_ni_dec) -> rawvideo (native))
    Output #0, rawvideo, to 'output_6.yuv':
      Metadata:
        encoder         : Lavf58.29.100
        Stream #0:0: Video: rawvideo (I420 / 0x30323449), yuv420p, 352x288, q=2-31, 30412 kb/s, 25 fps, 25 tbn, 25 tbc
        Metadata:
          encoder         : Lavc58.54.100 rawvideo
    frame=  300 fps=0.0 q=-0.0 Lsize=   44550kB time=00:00:11.88 bitrate=30720.0kbits/s speed=44.4x
    video:44550kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%
    Decoder HW[0] INST[127]-average usage:27%
    Complete! output_6.yuv has been generated.
    PASS: output_6.yuv matches checksum.
    If the test result is PASS, the NETINT ASIC devices are ready for use.
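Step 2 reads the number of ASIC devices from the node description by eye. If you want to script that check, the following Python sketch shows one way to do it. It assumes that you fetch the node with kubectl get node <NODE_NAME> -o json and pass the parsed result to the hypothetical allocatable_asics and can_schedule helpers. The JSON string in the sketch is a trimmed stand-in for real output.

```python
import json

ASIC_RESOURCE = "netint.ca/ASIC"  # extended resource name given in step 2


def allocatable_asics(node):
    """Return the number of allocatable NETINT ASIC devices on a node object."""
    allocatable = node.get("status", {}).get("allocatable", {})
    # Kubernetes serializes resource quantities as strings, for example "12".
    return int(allocatable.get(ASIC_RESOURCE, "0"))


def can_schedule(node, requested):
    """Return True if the node advertises at least `requested` ASIC devices."""
    return requested <= allocatable_asics(node)


# Trimmed stand-in for the output of: kubectl get node <NODE_NAME> -o json
node_json = '{"status": {"allocatable": {"cpu": "101900m", "netint.ca/ASIC": "12"}}}'
node = json.loads(node_json)

print(allocatable_asics(node))  # 12
print(can_schedule(node, 2))    # True
```

The allocatable value does not subtract devices that running pods have already requested, so treat a positive result as an upper bound rather than a guarantee that the job in step 3 can be scheduled.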