EDA on the Cloud: A Case Study of a Leading IC Design Enterprise

Introduction: On September 20, 2022, the program "Alibaba Cloud EDA Cloud Solution" was officially launched. Three experts from Alibaba Cloud explained, from multiple perspectives, how Alibaba Cloud helps chip design get onto the "cloud highway". Wang Xuwen, a solutions architect for the biomedical and integrated circuit industries at the Shanghai branch of Alibaba Cloud Intelligence, gave a talk entitled "A Case Study of EDA on the Cloud at a Leading IC Design Enterprise". The following is a summary of his talk:

1、 Customer Project Background

1. Customer introduction

In recent years, with the maturity and commercialization of new technologies such as 5G, the Internet of Things and artificial intelligence, and the accelerated development of smart cities, autonomous driving, intelligent manufacturing and other fields, the semiconductor industry has experienced a wave of rapid growth.

The customer in this project is a world-famous semiconductor design company, one of the few enterprises in the world that has fully mastered 2G/3G/4G/5G, Wi-Fi, Bluetooth, TV FM, satellite communication and other full-scenario communication technologies. Its products include mobile communication central processors, baseband chips, AI chips, RF front-end chips, RF chips and other communication, computing and control chips, covering hundreds of countries worldwide.

2. Project Background

Moore's Law continues to drive chip process technology forward. The number of transistors per unit area doubles every 18-24 months, which means that the computing power required for chip R&D and design keeps increasing as well. At the same time, IC design enterprises care a great deal about R&D efficiency. A chip that is taped out one day earlier starts earning revenue one day earlier, whereas a project that cannot reach the market on time may miss its best window and lose the market opportunity.
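
To put the doubling rate into numbers, the short sketch below computes the growth in transistor density at both ends of the 18-24 month range. The 36-month horizon is an arbitrary assumption chosen for illustration. The source says only that computing demand increases with density, so the sketch makes no claim about how demand scales.

```python
def transistor_growth(months_elapsed: float, doubling_period_months: float) -> float:
    """Growth factor in transistor density after `months_elapsed` months."""
    if doubling_period_months <= 0:
        raise ValueError("doubling period must be positive")
    return 2 ** (months_elapsed / doubling_period_months)


# Assumed horizon of 36 months, evaluated at both ends of the 18-24 month range.
fast = transistor_growth(36, 18)  # doubling every 18 months
slow = transistor_growth(36, 24)  # doubling every 24 months
print(f"Density grows {slow:.2f}x to {fast:.2f}x over 36 months")
```

Under these assumptions a design started three years later targets roughly three to four times as many transistors per unit area.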

The customer had realized that the traditional offline deployment of computing power could no longer keep up with rapid business growth, but remained very cautious about the decision to move design and R&D to the cloud.

3. Main challenges and concerns of the customer

a. Challenges of traditional offline deployment:

① Insufficient computing power and elasticity

The customer had built a large offline data center with thousands of servers, but it still could not meet the needs of the R&D department. In particular, once R&D moves from front-end logic design to back-end physical design, the demand for computing power doubles, and if bugs are found and tasks have to be rerun, jobs queue up badly. Meanwhile, the customer's offline data center had no room left to expand because of limits on machine room space and power quotas;

② Long lead time, affecting R&D progress

Before moving to the cloud, the IT department had to go through project approval, procurement, bidding, delivery and deployment to purchase machines offline, which took three to six months. The pandemic made the supply chain even less predictable and offline procurement of computing power even harder to control;

③ Heavy operation and maintenance workload, putting the IT department under pressure

With thousands of offline servers to look after, the IT department had to devote a great deal of manpower to basic operation and maintenance, from power, air conditioning and security to hardware maintenance, leaving front-line O&M staff overwhelmed;

④ Lack of effective control over the use of computing power

To save computing and storage resources, the IT department had to impose quotas, monitor and report utilization, and press R&D staff to release resources promptly, a process that was unpleasant for both sides.

b. The customer's concerns about moving design and R&D to the cloud fall into four areas:

① Data security

Developing a chip requires a large capital investment, so the customer is extremely sensitive to data security. R&D on the cloud means that data leaves its original physical boundary. Whether data security remains under the customer's control is the bottom-line issue;

② Whether performance meets requirements

After long-term optimization, the customer was getting the most out of its offline computing power. Lacking familiarity with cloud computing, the customer worried that virtualization, resource overselling and similar factors might prevent cloud resources from matching offline performance, reducing the efficiency of R&D jobs;

③ Business experience

Whether the workflows and usage habits that R&D personnel have established on offline clusters can be migrated seamlessly to the cloud is key to the successful rollout of the cloud project;

④ Return on investment (ROI)

The customer's purchasing department made a preliminary estimate that buying computing power on the cloud, once the costs of dedicated lines and security were included, had no price advantage over buying machines offline.

2、 Alibaba Cloud EDA Cloud Solution

1. Using the advantages of public cloud to solve the customer's business challenges

First of all, compared with the customer's original offline deployment model, Alibaba Cloud's public cloud solution directly addresses the problems of insufficient computing power, long delivery cycles, heavy operation and maintenance workload, and lack of control over usage.

a. In terms of computing power supply

• Elastic computing power supply: drawing on Alibaba Cloud's extensive cloud resources and supply chain coordination, the solution gives the customer a guaranteed supply of computing power;

• No machine room space limits and ample resources: Alibaba Cloud has up to 12 zones in Shanghai providing elastic computing and storage resources, so the customer no longer needs to worry about running out of offline machine room space;

b. In terms of lead time

• Minute-level delivery: Alibaba Cloud can deliver resources within minutes;

• On-demand, ready-to-use capacity expansion: at the peak of design work, the customer can expand capacity on demand and use it immediately, avoiding the lengthy chain of equipment procurement, delivery, installation and deployment;

c. In terms of operation and maintenance management

• O&M-free infrastructure: cloud services free the IT department from inefficient infrastructure operation and maintenance;

• Unified console management: O&M personnel can manage and schedule all resources from a single console;

• Automatic deployment: the runtime environment, application software, scheduler agent and other components can be deployed automatically with one click;

d. In terms of operation control

• Integrated resource monitoring: Alibaba Cloud provides an integrated resource monitoring platform;

• Usage quota management and monitoring: 24x7 continuous usage monitoring for computing, storage and other resources;

• Multidimensional performance analysis: helps O&M personnel carry out fine-grained resource control.

2. Dispelling the customer's doubts about the cloud through quantitative analysis and POC measurements

In response to the customer's concerns about the cloud, Alibaba Cloud dispelled the doubts of the various stakeholders inside the customer through POC testing, technical discussions, demonstrations and analysis:

a. In terms of data security

• Data security commitment: Alibaba Cloud formally commits, through security and privacy terms written into the project contract, that it will not touch customer data;

• Encryption-at-rest scheme: at the technical level, customers can encrypt data written to disk on the cloud with their own keys, ensuring that ownership of the data stays firmly in the customer's hands;

• Security operation audit: Alibaba Cloud provides the ability to audit security operations. Customers can submit a ticket to audit Alibaba Cloud's operation and maintenance logs for the relevant cloud resources;

b. In terms of performance

• Benchmarking of machine specifications and performance parameters: cloud instances were matched against the specifications and performance parameters of the customer's offline machines, using high-frequency, large-memory bare metal servers with local disks;

• Third-party stress testing: a third-party stress testing tool was used to test the performance of the cloud instances. The measurements show that the computing and storage performance provided by Alibaba Cloud is fully on par with offline, and some test items are even better than offline. With encryption at rest enabled, the performance loss of cloud instances is generally less than 10%;
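
A comparison of this kind comes down to a relative slowdown between two runs of the same job. The sketch below shows that calculation. The runtimes are hypothetical placeholders, not measurements from the POC, and the function name is an assumption made for illustration.

```python
def relative_overhead(baseline_seconds: float, measured_seconds: float) -> float:
    """Fractional slowdown of a measured run relative to a baseline (0.08 means 8% slower)."""
    if baseline_seconds <= 0:
        raise ValueError("baseline runtime must be positive")
    return (measured_seconds - baseline_seconds) / baseline_seconds


# Hypothetical wall-clock runtimes in seconds for one job; NOT data from the POC.
runs = {
    "offline": 1000.0,
    "cloud": 980.0,
    "cloud_encrypted": 1050.0,
}

cloud_vs_offline = relative_overhead(runs["offline"], runs["cloud"])
encryption_cost = relative_overhead(runs["cloud"], runs["cloud_encrypted"])

print(f"cloud vs offline: {cloud_vs_offline:+.1%}")
print(f"encryption overhead on cloud: {encryption_cost:+.1%}")
```

A negative value means the measured run was faster than the baseline. The encryption overhead is computed against the unencrypted cloud run, so it isolates the cost of encryption from the cloud-versus-offline difference.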

c. In terms of business experience

• Use the same job scheduler as offline;

• Adopt the same scheduling policy as offline;

• Match the usage habits of R&D personnel as closely as possible;

d. In terms of return on investment

• A breakdown of the total cost of ownership (TCO) of R&D on the cloud shows that Alibaba Cloud's elastic computing services eliminate the hidden costs of building, powering and operating an offline machine room, reduce the risk cost of machine failures, avoid the waste of idle machines during off-peak periods, and reduce the opportunity cost of insufficient computing power during peak periods;

• For the company's finance department, purchasing cloud services on demand also converts capital expenditure (CAPEX) into operating expenditure (OPEX), improving the company's cash flow.
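
The structure of such a TCO comparison can be sketched as follows. All class names, fields and figures are hypothetical and in arbitrary currency units, chosen only to show which cost terms enter each side. They are not the customer's numbers, and which side comes out cheaper depends entirely on the inputs.

```python
from dataclasses import dataclass


@dataclass
class OfflineCosts:
    hardware_capex: float     # one-off purchase of servers sized for peak load
    facility_per_year: float  # machine room, power, cooling
    ops_per_year: float       # O&M staff and spare parts
    years: int                # evaluation period

    def total(self) -> float:
        yearly = self.facility_per_year + self.ops_per_year
        return self.hardware_capex + self.years * yearly


@dataclass
class CloudCosts:
    hourly_rate: float          # price per server-hour
    peak_servers: int           # servers rented during peak periods
    peak_hours_per_year: float
    base_servers: int           # servers rented the rest of the year
    base_hours_per_year: float
    network_per_year: float     # dedicated lines and security
    years: int                  # evaluation period

    def total(self) -> float:
        server_hours = (self.peak_servers * self.peak_hours_per_year
                        + self.base_servers * self.base_hours_per_year)
        return self.years * (self.hourly_rate * server_hours + self.network_per_year)


# Hypothetical inputs: the offline cluster must be sized for the peak all year,
# while the cloud cluster shrinks to a base size outside peak periods.
offline = OfflineCosts(hardware_capex=1_000_000, facility_per_year=150_000,
                       ops_per_year=100_000, years=3)
cloud = CloudCosts(hourly_rate=1.5, peak_servers=100, peak_hours_per_year=2000,
                   base_servers=20, base_hours_per_year=6760,
                   network_per_year=50_000, years=3)

print(f"offline TCO: {offline.total():,.0f}")
print(f"cloud TCO:   {cloud.total():,.0f}")
```

The model deliberately leaves out the risk and opportunity costs mentioned above, since those are harder to quantify. A fuller analysis would add them as separate terms.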

3. Alibaba Cloud EDA Cloud Solution

Based on EDA business characteristics and requirements, Alibaba Cloud has tailored the following solutions for customers: (see the figure below)

The left side of the figure shows the customer's offline data center, where the customer has deployed a high-performance computing cluster with NetApp storage and uses IBM LSF, a scheduler widely used in the industry, for job scheduling. The right side shows the EDA zone in Alibaba Cloud's East China 2 public cloud region. The two sides are interconnected through two 10 Gbit/s high-speed channels.
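
As a rough sizing aid, the sketch below estimates how long it takes to move a design dataset between the offline data center and the cloud. It assumes that each of the two channels carries 10 Gbit/s. The 10 TB dataset size and the 80% link efficiency are also assumptions, picked only for illustration.

```python
def transfer_hours(data_terabytes: float, link_gbps: float,
                   links: int = 1, efficiency: float = 1.0) -> float:
    """Hours needed to move `data_terabytes` (decimal TB) over `links` parallel links."""
    if link_gbps <= 0 or links <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed, link count and efficiency must be positive")
    bits_to_send = data_terabytes * 1e12 * 8
    bits_per_second = link_gbps * 1e9 * links * efficiency
    return bits_to_send / bits_per_second / 3600


# Assumed: 10 TB of data, two 10 Gbit/s links, 80% usable bandwidth.
hours = transfer_hours(10, 10, links=2, efficiency=0.8)
print(f"about {hours:.2f} hours")
```

With both links in use the estimate is a little under an hour and a half. If one link is held in reserve for redundancy, the time doubles.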

a. Data center location:

The cloud data center nearest to the customer was selected, so that data transmission latency is kept at the millisecond level;

b. Compute nodes:

Elastic bare metal servers with high clock frequency and large memory are provided on demand. Bare metal physically guarantees that the customer has exclusive use of each machine's resources. Overclocking and Remax are turned off at the BIOS level, and Alibaba Cloud's in-house MOC card technology avoids virtualization overhead, so that each bare metal server delivers its full computing performance;

c. Storage:

The parallel file system CPFS is used, which offers high performance, high scalability and high reliability. A single cluster can scale to at most 9620 storage nodes and supports up to 2.5 TB/s of throughput;

d. Cluster management:

Alibaba Cloud E-HPC is used to manage the elastic bare metal compute nodes in a unified way. E-HPC's scheduler plug-in supports automatic deployment of LSF agents and connects them seamlessly to E-HPC, providing node management, job management, automatic scaling and related capabilities.
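
The idea behind queue-driven automatic scaling can be illustrated with a small decision function. This is a generic sketch, not the E-HPC or LSF implementation, and it calls neither product's API. The function name, parameters and sizing rule are all hypothetical.

```python
import math


def nodes_to_add(*, pending_jobs: int, slots_per_job: int, slots_per_node: int,
                 idle_slots: int, current_nodes: int, max_nodes: int) -> int:
    """Number of nodes to add so that pending jobs fit, capped by the cluster limit."""
    if slots_per_node <= 0:
        raise ValueError("slots_per_node must be positive")
    # Slots still missing after idle capacity on existing nodes is used up.
    missing_slots = max(0, pending_jobs * slots_per_job - idle_slots)
    wanted_nodes = math.ceil(missing_slots / slots_per_node)
    headroom = max(0, max_nodes - current_nodes)
    return min(wanted_nodes, headroom)


# Hypothetical queue state: 50 pending jobs of 4 slots each, 64-slot nodes.
extra = nodes_to_add(pending_jobs=50, slots_per_job=4, slots_per_node=64,
                     idle_slots=8, current_nodes=2, max_nodes=10)
print(f"add {extra} node(s)")
```

A real scaling policy would also decide when to release idle nodes and would damp rapid changes. The sketch covers only the scale-out decision.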

3、 Features and advantages of Alibaba Cloud EDA cloud solution

Compared with the customer's offline physical machine deployment and with the solutions offered by other vendors, Alibaba Cloud's EDA cloud solution has the following advantages:

1. Extreme performance

Alibaba Cloud's high-frequency, large-memory bare metal servers, built specifically for the chip design industry, showed excellent performance in actual measurements and fully meet the demanding computing requirements of EDA software;

2. Visual architecture

The solution introduces CADT, Alibaba Cloud's industry-first product for quickly building cloud architectures, to help the customer create and manage its cloud architecture. CADT presents the architecture as a deployable architecture diagram, clearly shows how the underlying product components are deployed in relation to each other, and reduces the time the customer spends on solution design and evaluation;

3. Elastic deployment

With E-HPC's flexible deployment, elastic resources and unified operation and maintenance, managing the computing cluster on the cloud becomes simpler and more efficient;

4. Security and compliance

The cloud environment acts as an extension of the offline data center and has no public network egress, which shuts out attacks from the public network. The Cloud Security Center identifies, analyzes and alerts on security threats in real time, and disk encryption serves as the last line of defense for data protection.
