Introduction: On September 20, 2022, the "Alibaba Cloud EDA Cloud Solution" program was officially launched. Three experts from Alibaba Cloud explained, from several perspectives, how Alibaba Cloud helps chip design get onto the "cloud highway". The first speaker was Shao Qi, a solution architect for Alibaba Cloud's elastic computing products, who gave a talk titled "Flexibility, Security, and High Performance: Alibaba Cloud EDA Cloud Solution". The following is the content of his talk, edited for reading:

01 Industry overview

1. The computing power challenge of typical chip design

The computing power challenge of typical chip design is mainly reflected in three aspects:

◾ Fast growth: Advances in chip manufacturing processes have substantially increased the computing power required, and the computing resources consumed by EDA simulation tools have grown significantly as well;
◾ Inaccurate estimation: Commercial software cannot predict computing power requirements, resulting in budget deviations;
◾ Urgent schedules: The tape-out date cannot slip, yet development progress is hard to control and the project is affected by many factors.

The following are typical computing challenges that a chip may face during development:

At the start of chip design, the simulation budget is set at 10,000 cores: 3,000 for the front end and 7,000 for the back end. Once the front end enters the run stage, its demand turns out to reach 7,000 cores, exceeding expectations, so budget originally reserved for the back end begins to be diverted. When the back-end tasks enter the run stage, computing power is insufficient and R&D engineers wait for simulation results. The project falls behind plan, and the project team begins to increase the procurement budget. As the back-end tasks continue, some bugs may force front-end verification tasks to be re-executed, straining computing power further. Once tape-out is complete, most of the computing power is released and the equipment sits idle.

As shown in the figure above, computing power demand has peaks and troughs throughout the chip development process, so chip simulation workloads have a clear need for resource elasticity.
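
The arithmetic behind the scenario above can be sketched in a few lines of Python. The core counts (10,000 total, 3,000 planned and 7,000 actual for the front end, 7,000 planned for the back end) come from the text. The function names are illustrative, and the sketch assumes the front-end and back-end phases overlap in time, which the text implies but does not quantify.

```python
# Figures taken from the scenario described above.
TOTAL_BUDGET_CORES = 10_000
PLANNED_FRONT_END = 3_000
PLANNED_BACK_END = 7_000
ACTUAL_FRONT_END = 7_000  # front-end demand observed once jobs started running


def front_end_overrun(planned: int, actual: int) -> int:
    """Cores the front end consumes beyond its planned share."""
    return max(actual - planned, 0)


def back_end_shortfall(total: int, front_end_in_use: int, back_end_need: int) -> int:
    """Cores the back end is missing while the front end is still running.

    Assumption (not stated as a number in the text): both phases overlap,
    so the back end only gets what the front end leaves free.
    """
    available = max(total - front_end_in_use, 0)
    return max(back_end_need - available, 0)


overrun = front_end_overrun(PLANNED_FRONT_END, ACTUAL_FRONT_END)
shortfall = back_end_shortfall(TOTAL_BUDGET_CORES, ACTUAL_FRONT_END, PLANNED_BACK_END)
print(f"front-end overrun: {overrun} cores")
print(f"back-end shortfall during overlap: {shortfall} cores")
```

Under these assumptions the back end is short by exactly the amount the front end overran, which is the gap that either delays the project or has to be covered by additional capacity.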

2. Analysis of pain points in the integrated circuit industry

(1) Time
◾ EDA verification takes a lot of time, and insufficient resources will cause the verification work to fail to converge;
◾ The hardware equipment procurement cycle is long, and the deployment and construction take a lot of time;
(2) Cost
◾ The task has obvious peak characteristics, and the cost of holding a large amount of hardware for a long time is high;
◾ Project cost is hard to measure accurately, especially the cost attributable to the use of IT resources;
◾ Startups need to spend more money on license and IP procurement;
(3) Security
◾ Architecture designs are mainly captured in Word documents, which are easy to leak;
◾ Data delivery is complex and high-volume, with many authorization and audit steps, which leaves gaps in management and control;
(4) Multi-site collaboration
◾ Multi-region office collaboration;
◾ Home office environment and security controls.

02 EDA cloud solution

1. The value of EDA on the cloud
(1) Increase productivity (accelerate TTM)
◾ On-demand capacity expansion and elastic scaling: Elastic resources on the cloud absorb peak workloads, so that production is not affected by resource saturation;
◾ Minute-level resource delivery: Resource requests do not wait on lengthy procurement, project approval, installation, or deployment processes; resources are available immediately, with no waiting for scheduling;
(2) Reduce the difficulty of IT operation and maintenance
◾ Free from basic operation and maintenance: IT operations teams no longer need to worry about physical facilities and low-level maintenance, and can focus on business support;
◾ Automated resource delivery: Automated delivery tools provide end-to-end resource delivery for business teams;
◾ Centralized management and control of resources: centralized operation and maintenance and management and control tools, realize resource monitoring, and support the splitting of bills by project;
(3) Cost optimization (improving ROI)
◾ Supply exactly matching demand: linear expansion/reduction of resources to avoid unnecessary waste of resources;
◾ Savings on supporting costs: Save the investment in supporting costs such as computer room, air conditioning, hardware maintenance, etc.;
◾ Capital cost optimization: flexible payment mode, saving capital occupation cost;
(4) Improve user experience
◾ Team collaboration: optimize the efficiency of collaborative work in multiple places and reduce resource crowding between different teams and tasks;
◾ Transparent, seamless user experience: Provide business teams with readily available resources through a unified development environment.
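
To make the "supply exactly matching demand" point concrete, the following minimal sketch compares a fixed fleet sized for peak demand with capacity that follows demand. The monthly demand profile is entirely hypothetical and chosen only for illustration; it is not data from the talk.

```python
# Hypothetical monthly core demand over one project (illustrative only).
DEMAND_BY_MONTH = [3_000, 7_000, 10_000, 10_000, 4_000, 1_000]


def fixed_fleet_usage(demand_by_month: list[int], fixed_capacity: int) -> tuple[float, int]:
    """Return (utilization, idle core-months) for a fleet of fixed size."""
    used = sum(min(d, fixed_capacity) for d in demand_by_month)
    capacity = fixed_capacity * len(demand_by_month)
    return used / capacity, capacity - used


peak = max(DEMAND_BY_MONTH)
utilization, idle_core_months = fixed_fleet_usage(DEMAND_BY_MONTH, peak)

# With elastic capacity, provisioned core-months track demand directly.
elastic_core_months = sum(DEMAND_BY_MONTH)

print(f"fixed fleet sized for peak: {peak} cores")
print(f"utilization of fixed fleet: {utilization:.1%}")
print(f"idle core-months (fixed):   {idle_core_months}")
print(f"core-months used (elastic): {elastic_core_months}")
```

Whether elastic capacity is cheaper in practice depends on unit prices and payment modes, which this sketch deliberately leaves out.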

2. Cloud architecture for the full EDA design flow

Alibaba Cloud security products are used to build a unified security control domain:

◾ The front-end design department (the domestic design branch in the upper right of the figure) uses Alibaba Cloud Wuying Cloud Desktop to meet the security requirements of R&D office work. Wuying is deployed in a separate security domain and VPC so that front-end tasks are isolated from other departments, with the goal that data never lands on local devices and cannot be taken away;
◾ The back-end simulation and verification cluster (the green part on the left of the figure) uses Alibaba Cloud E-HPC to manage compute and high-performance distributed storage, providing a high-performance supercomputing environment;
◾ Alibaba Cloud Express Connect links to the off-cloud IDC resources (bottom left of the figure), forming a hybrid cloud in which on-cloud and off-cloud data are easily connected, the data center extends into the cloud, and elastic computing power is available on the cloud;
◾ Alibaba Cloud's enterprise network product CEN helps integrated circuit companies interconnect their branches around the world at high speed, creating an enterprise private network for data exchange and office collaboration.

3. Build a comprehensive security environment from the infrastructure side to the data side

Alibaba Cloud provides comprehensive security protection, with corresponding security products from the network side to the data side, including bastion hosts, data auditing, and hardware encryption machines, to help customers build solid defenses.

The storage products used by EDA on the cloud, CPFS and OSS, both support encryption of stored data and can provide the highest level of data encryption protection, giving customers a safe and stable cloud data storage space.

(1) Network side
◾ High-defense IP, for scrubbing DDoS attack traffic;
◾ Cloud firewall for intrusion prevention and traffic control;
(2) Client side
◾ SASE, for terminal security control and data loss prevention (DLP);
(3) Host side
◾ Cloud Security Center, for server host intrusion prevention, baseline check, and patch management;
(4) Account
◾ IDP, which integrates AD domain accounts and unifies account management to improve ease of use across different systems;
(5) Audit
◾ ActionTrail;
◾ Database audit;
◾ Bastion host, for screen recording of virtual machine operation and maintenance sessions and for log retention;
(6) Data security
◾ Data disk encryption: hardware encryption machines provide hardware-based encryption and decryption of data on the cloud, for encrypted storage and use of sensitive data;
◾ KMS, for key lifecycle management and transparent encryption and decryption on the cloud;
◾ Data Security Center, for data classification and data protection for OSS and databases.

4. Quickly build an E-HPC cluster on the cloud
◾ E-HPC provides a complete supercomputing PaaS offering, and clusters can be built quickly through a graphical interface;
◾ E-HPC can provide login nodes, management nodes, and visualization nodes, enabling domain control and post-job visualization;
◾ E-HPC natively supports a range of open-source schedulers on the cloud, and also provides interfaces for commercial schedulers where needed, so that users can keep their on-premises usage habits;
◾ E-HPC can automatically build clusters across availability zones, pooling the resources of multiple data centers and improving elastic scheduling capability.

5. E-HPC-based elastic scaling to automatically match peak business demands
E-HPC can be linked with the scheduler to achieve automatic elastic expansion and contraction based on the load and scheduler policy.
◾ Elastically added compute nodes automatically mount shared storage and join the domain, and can automatically accept jobs dispatched by the scheduler, improving efficiency;
◾ After the job is completed, E-HPC can release on-demand resources according to pre-configured resource rules, save usage costs, and provide customers with an on-demand elastic scaling environment.
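
The scale-out and scale-in behaviour described above can be sketched as a simple decision function. This is not the E-HPC API or its actual policy; `ClusterState`, `plan_scaling`, and the default values for `cores_per_node` and `max_elastic_nodes` are hypothetical names and numbers used only to show the shape of a load-driven rule.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterState:
    pending_cores: int   # cores requested by jobs waiting in the scheduler queue
    idle_nodes: int      # elastic nodes currently running no job
    elastic_nodes: int   # elastic nodes currently in the cluster


def plan_scaling(state: ClusterState,
                 cores_per_node: int = 64,
                 max_elastic_nodes: int = 100) -> int:
    """Return nodes to add (> 0), nodes to release (< 0), or 0 for no change."""
    if state.pending_cores > 0:
        # Idle nodes absorb queued work first; only the remainder needs new nodes.
        uncovered = state.pending_cores - state.idle_nodes * cores_per_node
        if uncovered <= 0:
            return 0
        needed = math.ceil(uncovered / cores_per_node)
        headroom = max_elastic_nodes - state.elastic_nodes
        return max(min(needed, headroom), 0)
    # Queue is empty: release every idle elastic node to stop paying for it.
    return -state.idle_nodes


print(plan_scaling(ClusterState(pending_cores=1000, idle_nodes=2, elastic_nodes=10)))
print(plan_scaling(ClusterState(pending_cores=0, idle_nodes=5, elastic_nodes=5)))
```

A real policy would usually add a cool-down period before releasing nodes so that short gaps between jobs do not cause repeated expansion and release.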

6. Build multiple types of hybrid cloud architectures

Based on the strong compatibility of E-HPC, it can provide a variety of hybrid cloud architectures.

Option 1: Off-cloud management and control, supplemented by on-cloud elastic expansion

Most semiconductor companies already have on-premises data centers and equipment, with management and control fully deployed and debugged. Alibaba Cloud can therefore provide an EDA hybrid cloud solution in which management and control stay off the cloud and the cloud supplies elastic resources. This solution is compatible with existing usage habits, has been deployed in the production environments of multiple industry customers, and is the best practice for EDA hybrid cloud.
◾ Application scenarios: On-premises infrastructure is primary, and the cloud is used to meet unexpected business demand;
◾ Cluster management: Off-cloud management is primary. When the off-cloud queue load reaches a threshold, on-cloud resources are called in. The proxy manager on the cloud synchronizes information about the expanded resources to the off-cloud manager, and an off-cloud script writes it to the local domain controller;
◾ Security boundary: There is no egress on the cloud side, and on-premises security is primary;
◾ License deployment: The license server is deployed on-premises and authorizes use by nodes on the cloud.
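
The cluster management flow of Option 1 can be sketched as a small in-memory simulation. Everything here is a hypothetical stand-in: `OffCloudManager`, `CloudProxyManager`, the 90% threshold, the 64-core node size, and the node names are assumptions for illustration and do not represent E-HPC or any scheduler's real interface.

```python
from dataclasses import dataclass, field


@dataclass
class OffCloudManager:
    """Stand-in for the on-premises cluster manager and its domain controller."""
    local_cores: int
    queue_threshold_pct: int = 90          # hypothetical burst threshold
    cloud_nodes: list[str] = field(default_factory=list)
    domain_controller: set[str] = field(default_factory=set)

    def threshold_cores(self) -> int:
        return self.local_cores * self.queue_threshold_pct // 100

    def register(self, nodes: list[str]) -> None:
        # Mirrors the sync step: node info arrives from the cloud proxy,
        # and a local script writes it to the local domain controller.
        self.cloud_nodes.extend(nodes)
        self.domain_controller.update(nodes)


class CloudProxyManager:
    """Stand-in for the proxy manager that expands resources on the cloud."""

    def __init__(self, cores_per_node: int = 64) -> None:
        self.cores_per_node = cores_per_node
        self._created = 0

    def expand(self, extra_cores: int) -> list[str]:
        count = -(-extra_cores // self.cores_per_node)  # ceiling division
        nodes = [f"cloud-node-{self._created + i}" for i in range(count)]
        self._created += count
        return nodes


def burst_if_needed(manager: OffCloudManager,
                    proxy: CloudProxyManager,
                    requested_cores: int) -> list[str]:
    """Expand onto the cloud only for demand above the local threshold."""
    extra = requested_cores - manager.threshold_cores()
    if extra <= 0:
        return []
    nodes = proxy.expand(extra)
    manager.register(nodes)
    return nodes


manager = OffCloudManager(local_cores=1000)
proxy = CloudProxyManager()
print(burst_if_needed(manager, proxy, requested_cores=800))
print(burst_if_needed(manager, proxy, requested_cores=1200))
```

Keeping the registration step on the off-cloud side matches the stated security boundary: the cloud only reports node information, and the on-premises manager decides what enters the local domain.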

Option 2: On-cloud management and control, including off-cloud resources

For semiconductor companies that are largely cloud-native, the hybrid cloud architecture of Option 2 is recommended: management and control run primarily on the cloud, and off-cloud resources are managed through the E-HPC management and control platform so that existing equipment is not wasted.
◾ Application scenario: The local computer room will no longer be expanded, and subsequent construction will be based on the cloud;
◾ Shared storage: Mainly on the cloud;
◾ Security boundary: Cloud security controls are primary, and there is no on-premises egress;
◾ Scheduler deployment: The scheduler is deployed on the cloud, and hybrid scheduling across on-cloud and off-cloud resources is achieved by installing an agent;
◾ License deployment: The license server is deployed on-premises and authorizes use by nodes on the cloud.

03 EDA cloud recommended products

1. High-performance computing products on the cloud - introduction to elastic computing instances

Alibaba Cloud uses industry-leading hardware, adapted and optimized for the latest generations. It also fully supports the "one cloud, multiple chips" capability, offering a range of CPUs from Intel and AMD as well as ARM-based CPUs.

◾ For front-end workloads, Alibaba Cloud provides a variety of high-frequency, large-memory instance specifications. These are based on the X-Dragon architecture and offer high reliability and high performance;
◾ For back-end workloads, Alibaba Cloud provides bare metal products with ultra-large memory. For scenarios that need more than 2 TB of memory in particular, it provides instances based on persistent memory. These instances can significantly increase the memory capacity of a single machine, reduce procurement costs, and improve the concurrency of back-end jobs on each server.

2. High-performance storage products on the cloud - Introduction to CPFS

◾ Alibaba Cloud File Storage CPFS is a large-scale parallel file system designed for high-performance computing, with a fully parallel architecture, millions of IOPS and OPS, and Tbps-level throughput;
◾ It supports Fileset, on which a variety of enterprise-level functions are available, including snapshots, quotas, data flow, lifecycle management, and QoS;
◾ It supports enterprise-level functions such as ACLs, file auditing, and encryption;
◾ It supports the data flow function, which makes CPFS a high-performance accelerator for OSS data: applications can easily access massive, low-cost data in OSS through the high-performance file interface of CPFS.

3. Multi-site security R&D environment - introduction of Wuying products

Alibaba Cloud Wuying products provide secure cloud desktop access for scenarios such as multi-site, home, and business-travel R&D.
(1) Ensure code security
◾ Code never lands on local devices;
◾ Operation log;
◾ Screen recording audit;
◾ Virus vulnerability scanning;
(2) Improve development efficiency
◾ Quickly build a development environment;
◾ Pre-installed/distributed development tools;
◾ Provide an efficient management console;
(3) Safe and efficient flow of data
◾ Data moves between desktop, development, and production entirely within the cloud, which is more efficient and secure;
◾ Multi-location and multi-country office environment integration;
◾ Safe and reliable home office.

No matter where users are, Wuying products can provide a unified R&D environment and meet unified security control requirements, giving the EDA industry the best combination of R&D environment, data, resources, and security. That concludes my talk. Thank you all.

