Best Practices for Launching S-Class New Games on the Cloud

1. Background

1. Rising expectations for game experience and quality place higher demands on cloud computing

Today's game market offers a rich variety of game types, including RPG (such as Paladin), e-sports (such as CSGO and DOTA2), and MMORPG (such as Yuanshen and Magic Tower, the latter an open-world game with a light science fiction theme). There are also many other genres, such as card games and SLG games.

To attract more players, game companies work hard to improve game experience and quality. As a result, their demand for underlying resources has become both larger and more specific, and the game industry has reached a consensus on migrating to the cloud.

Cloud computing helps game companies upgrade their underlying infrastructure, reduce costs, and increase efficiency. Elastic computing plays a particularly important role in this.

2. The demand of game services for elastic computing

A newly launched game has to provide several types of services, typically platform services, operation BI services, game services, and so on. Of these, the "game service" has the greatest impact on the player experience and is the core part of a new game.

In most game projects, from the perspective of elastic computing, there are three main forms of game services:

a. Monolithic Architecture
The monolithic architecture was very common in the early days and is characterized by simplicity and ease of construction. Game companies only need to focus on the business itself and hand over all underlying capabilities to the cloud vendor. Many companies still use this architecture today;

b. Monolith plus separated non-core services
This is currently a relatively common architecture for game servers. It keeps the characteristics of the monolithic architecture but splits out non-core services, and servers are then launched in groups;

c. Service Oriented Architecture
Adopting this architecture requires an established technology stack. Modularizing game services can improve development efficiency, but the approach is not well suited to scenarios such as e-sports and battle servers. In addition, because the architecture is complex, the probability of performance problems such as latency is high, so many game companies are unwilling to take the risk.

In e-sports, MMORPG, and similar scenarios, the first two architectures are more commonly used. Game companies prefer to focus on the game business itself and hand over the underlying capabilities to cloud vendors.

2. Case study: Elastic computing helps Perfect World launch its new games flexibly

1. Project Background

Perfect World is a world-leading cultural and entertainment group. Its main business covers games, e-sports, film and television, and other sectors, the most important of which is games.

Representative Perfect World games

In 2021, Perfect World released a number of new games on Alibaba Cloud, including the ARPG "Relics of God of War", the light science fiction open-world MMORPG "Magic Tower", and the "Perfect World Competitive Platform", an online battle platform produced by Perfect E-sports. Perfect World also holds the exclusive right to operate CSGO and DOTA2 in mainland China and hosts e-sports events every year, attracting e-sports fans from all over the world.

The picture below shows the overall structure of a new MMORPG S-class game launched by Perfect World last year:

The core services involved in the game framework are platform services, operation BI services, game services, leaderboards, etc.; in addition, disaster recovery and backup for various services have also been built.

2. Project challenges

To deliver top performance and low latency while carrying thousands of players on a single server, the game team decided to deploy each game server on a super-large cloud server with 1 TB of memory. This poses a major challenge for the cloud server (ECS):

• A single server needs to provide enough configuration and performance to host thousands of players, supporting the high resource requirements of each player;
• With large-memory instances, very high stability is required so that the business runs without interruption or loss;
• Multiple availability zones must supply tens of thousands of cores during the server opening period, and elastic capacity must cover server opening and merging as the game moves between peak, off-peak, and stable periods.

The game server needs all of the above to achieve the desired effect. To this end, Alibaba Cloud provided targeted solutions.

3. Solutions

a. Advantage 1: Extreme performance, seventh-generation Ice Lake instance

To meet the game server's high CPU performance requirements, and after multiple rounds of selection tests, the team decided to adopt the seventh-generation instance based on the Shenlong architecture, released in 2021.

The Shenlong architecture has been running stably for four years. It ranked first in function and performance tests across multiple cloud vendors, with a total score far ahead of competing vendors.

• Processor: third-generation Intel® Xeon® Scalable processor (Ice Lake), with a base frequency of 2.7 GHz, an all-core turbo frequency of 3.5 GHz, and a processor-to-memory ratio of 1:8;
• Hyper-threading can be enabled or disabled; ESSD cloud disks are supported;
• Supports IPv6, a packet forwarding rate of up to 24 million PPS, and 50 Gbit/s of network bandwidth;
• Supports vTPM: relying on TPM/TCM chips, it provides trusted measurement of the boot chain from the server to the instance, delivering a very high level of security;
• Applicable scenarios:
--High-performance game scenarios, high-performance databases, in-memory databases, etc.;
--Scenarios with heavy network packet traffic, such as video bullet comments, telecom service forwarding, etc.;
--Data analysis and mining, distributed in-memory caches;
--Secure and trusted computing scenarios;
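As a rough illustration of what the 1:8 processor-to-memory ratio means for the 1T game server discussed in this case, the sketch below derives a vCPU count from a memory size. It assumes that the ratio means 1 vCPU per 8 GiB and that "1T" means 1024 GiB; neither is stated explicitly in the text, and the function name is hypothetical.

```python
def vcpus_for_memory(memory_gib: int, gib_per_vcpu: int = 8) -> int:
    """Return the vCPU count implied by a memory size and a 1:N vCPU-to-memory ratio."""
    if memory_gib <= 0 or gib_per_vcpu <= 0:
        raise ValueError("memory size and ratio must be positive")
    if memory_gib % gib_per_vcpu != 0:
        raise ValueError("memory size is not a multiple of the ratio")
    return memory_gib // gib_per_vcpu


# Assumption: "1T" of memory is 1024 GiB and the instance follows the 1:8 ratio.
print(vcpus_for_memory(1024))  # 128
```

Under these assumptions, a single 1T game server would carry 128 vCPUs, which gives a sense of how quickly a fleet of such servers reaches tens of thousands of cores.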

During performance testing, the CPU was stress-tested with benchmarks such as SPECint. At the business level, multiple robot generators accessed a single game server concurrently to simulate real player scenarios. The test results fully met the customer's needs and were far ahead of several other cloud vendors.

b. Advantage 2: Stability guarantee, the industry's first lossless migration of a 1 TB game server

Stability relies mainly on Alibaba Cloud's stability system, which provides the following capabilities:

• Run daily scans based on an AI model to detect potential software and hardware failures, resolve them promptly, and reschedule workloads where necessary;
• Spread resources widely across a large resource pool, separating core and non-core services, to minimize the business loss caused by a single point of failure;
• Prevent business interruption through lossless migration. The industry's first lossless migration capability for 1 TB ultra-large memory was used in this project;

Advanced Migration Features

• Flexible migration mode: the matching hot migration mode is selected according to the load type of the VM and the load on the NC;
• Shenlong instance hot migration: Shenlong virtual machines provide the industry's first hot migration capability for passthrough devices;
• Hot migration stability: services remain stable during the migration of super-large VMs.

Key Hot Migration Results

• ECS achieves a world-leading stability SLA (99.975%) through fault prediction and hot migration;
• Rate of failures avoided without users noticing: 95%+.
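To put the 99.975% SLA figure into perspective, the following minimal sketch converts an availability percentage into the downtime it allows. The 30-day measurement period is an assumption for illustration; the text does not say how the SLA is measured, and the function name is hypothetical.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Return the downtime (in minutes) permitted by an availability SLA over a period."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


# Assumption: the SLA is evaluated over a 30-day month.
print(round(allowed_downtime_minutes(99.975), 1))  # 10.8
```

Under that assumption, 99.975% corresponds to roughly 10.8 minutes of permitted unavailability per instance per month.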

c. Advantage 3: resource planning, multi-region, stress testing guarantee, batch server deployment

In addition to providing the above capabilities, a series of systematic work has also been carried out throughout the project:

• Initial capacity estimation/buffer estimation:
Estimate the resource usage of each module from the designed player capacity of a single gameServer and the expected total number of users, and size the reserved buffer resources based on historical experience;

• Instance selection:
Determine the appropriate instance specifications according to the characteristics of each service, and recommend 7th-generation or 7th-generation high-frequency instances with ESSD, taking into account specification planning (upgrades), available resource supply, and the customer's required delivery dates. For example, for e-sports games we recommend the 7th-generation high-frequency instance, which meets the requirements for single-thread performance and fast response. For MMORPG games, we recommend the 7th-generation instance to meet high-configuration and high-performance requirements;

• Regional and availability zone planning:
Determine suitable availability zones in the target region, make sure each zone has sufficient rack capacity, and prefer availability zones with a longer life cycle so that the S-class game can expand smoothly throughout its lifetime;

• Stress testing guarantee:
A professional team supports multiple rounds of stress testing before launch, covering verification and load testing from the functional, performance, and stability perspectives. Alibaba Cloud can provide stress testing tools, or customers can use their own stress testing methods for pre-launch assurance;

• Server opening and merging guarantee:
Once the specifications and resources are settled, define a plan for opening servers automatically in batches. The plan uses the elasticity of the cloud and follows the peak and off-peak periods of the game business so that servers can be opened or merged smoothly. The ROS+OOS solution is recommended.
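The planning steps above (capacity and buffer estimation, availability zone planning, and batch server opening) can be sketched as a small calculation. This is only an illustration under stated assumptions: the player counts, buffer percentage, vCPUs per server, zone names, and batch size are hypothetical values, and the functions are not part of ROS, OOS, or any Alibaba Cloud API.

```python
from dataclasses import dataclass


@dataclass
class CapacityPlan:
    base_servers: int    # servers needed for the expected player count
    buffer_servers: int  # extra servers reserved as a buffer
    total_vcpus: int     # vCPUs across base and buffer servers

    @property
    def total_servers(self) -> int:
        return self.base_servers + self.buffer_servers


def estimate_capacity(expected_players: int, players_per_server: int,
                      buffer_percent: int, vcpus_per_server: int) -> CapacityPlan:
    """Estimate game servers and vCPUs, rounding both counts up."""
    if players_per_server <= 0:
        raise ValueError("players_per_server must be positive")
    base = (expected_players + players_per_server - 1) // players_per_server
    buffer = (base * buffer_percent + 99) // 100  # integer ceiling of base * percent
    return CapacityPlan(base, buffer, (base + buffer) * vcpus_per_server)


def spread_across_zones(total_servers: int, zones: list[str]) -> dict[str, int]:
    """Split servers evenly; zone counts differ by at most one."""
    share, extra = divmod(total_servers, len(zones))
    return {zone: share + (1 if i < extra else 0) for i, zone in enumerate(zones)}


def opening_batches(total_servers: int, batch_size: int) -> list[int]:
    """Return the number of servers to open in each batch."""
    return [min(batch_size, total_servers - start)
            for start in range(0, total_servers, batch_size)]


# Hypothetical inputs: 1,000,000 expected players, 5,000 players per gameServer,
# a 20% buffer, and 128 vCPUs per server.
plan = estimate_capacity(1_000_000, 5_000, 20, 128)
print(plan.total_servers, plan.total_vcpus)   # 240 30720
print(spread_across_zones(plan.total_servers, ["zone-a", "zone-b", "zone-c"]))
print(opening_batches(plan.total_servers, 50))  # [50, 50, 50, 50, 40]
```

With these illustrative inputs the plan lands at tens of thousands of vCPUs, the same order of magnitude described in this case. In practice the buffer percentage would come from historical experience, and the batch schedule would follow the game's peak and off-peak periods.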

4. Customer value

With these solution capabilities and end-to-end support, the customer gained the following business value:

a. Top performance experience
Seventh-generation instances based on the Shenlong architecture deliver a high-performance, low-latency experience;

b. No business interruption
Very high stability, with lossless migration for large-memory specifications;

c. Carrying millions of players
A supply of tens of thousands of cores across multiple availability zones ensures that servers can be launched quickly.

