Session-based games refer to a type of game where players are gathered in a specific game scenario within a limited time frame. In general, a session is equivalent to a match, and after the match ends, the game relationship between players also ends, and the session terminates as well. Therefore, in the industry, session-based games are commonly understood as "room-based games," where a room hosts the corresponding game session. These types of games often have the following characteristics:
It should have the following capabilities:
Session-based games require direct network connectivity, and there are usually the following considerations:
In the era of traditional game operations, game server business and infrastructure were tightly coupled. Often, when developing server-side programs, an additional port allocation manager was required to avoid port conflicts between different rooms on the same machine. In an ideal state, after cloud-native transformation, game developers no longer need to worry about port allocation for game rooms. Room servers can be horizontally scaled, so each room server should have its own public network access address. However, the native service load balancing model in Kubernetes cannot meet this requirement.
OpenKruiseGame (OKG) provides various networking models to automate the management of public network addresses (EIP + port) for room servers. From then on, game developers do not need to worry about network configuration of the infrastructure, and game operators only need to provide simple parameters to efficiently deploy and automate the management of room server networks. In the OKG mode, each room server corresponds to a pod. For session-based games, the currently available networking models include Kubernetes-HostPort, AlibabaCloud-NATGW, AlibabaCloud-SLB, and AlibabaCloud-EIP. Each networking model has its own characteristics and is suitable for different scenarios.
This model uses the host node's EIP + port as the public network address. This mode is suitable for scenarios where pods are densely deployed on nodes. When the number of small room servers is high, and each node hosts multiple pods, this mode fully utilizes the bandwidth of the EIP on that node, thus maximizing the cost savings of EIP resources.
This model maps different ports of the same SLB to different room servers, allowing each room server to have an independent public network address. This mode has the simplest configuration and requires very few EIPs. However, note that SLB is limited by the maximum number of backend instances. Each SLB can associate a maximum of 200 room servers. When the number of room servers under GameServerSet is about to exceed 200, additional SLB instances need to be added to meet scaling requirements.
This model automates the management of Dnat mapping rules associated with room servers. Users need to install the ack-extend-network-controller component and configure NAT gateway-related parameters. The NATGW model has greater scalability and flexibility compared to the SLB model, but it is also more complex to configure.
This model assigns a separate EIP for each room server pod. This mode consumes more EIPs and is suitable for scenarios where game servers already have a port manager. The exposed port range of the container is the same as the accessed port range, and there is no mapping behavior.
For detailed explanations and examples, refer to the OKG Networking Model documentation.
Once the game rooms have their own independent public network access addresses, the remaining question is how to provide this address to the player's client. Generally, session-based games have a matching service role, and the matching service commonly perceives the network address of room servers in two ways:
1) Active retrieval;
2) Room server self-registration and reporting.
The matching service actively retrieves the available room servers in the current cluster and obtains their corresponding public network addresses. It selects the appropriate room server and returns it to the client. In this case, the matching service needs to call the Kubernetes API to retrieve the NetworkStatus of the GameServer object. For GameServer CRUD operations, you can refer to the e2e test cases in the official Kruise-Game repository.
Alternatively, the room server business can report its network information. In this case, the network information is passed down to the container using the Kubernetes Downward API mechanism and can be accessed by the business logic within the container. Once the business logic obtains the corresponding address, it can be reported. For specific examples, refer to the documentation.
The process of game matching in session-based games can be roughly divided into two stages:
1) Players finding teammates/opponents to form a match;
2) Finding a suitable room server for the match and returning the network address to the players.
In the open-source community, there are game matching frameworks like Open Match, where users only need to implement the matching logic according to the framework's standards. OKG provides the kruise-game-open-match-director component based on Open Match, which mainly helps implement the second stage of the matching process mentioned above - finding a game server for the match and returning the address. This way, users only need to focus on the matching logic in the first stage. For a guide on developing matching services using Open Match and OKG, you can watch the Cloud-Native Game series courses. You can also join the Cloud-Native Game community group (DingTalk ID: 44862615) to participate in discussions or ask questions.
Of course, if there is a custom matching framework/system, you can also integrate it with OKG through simple secondary development. As mentioned in the section on obtaining network information, there are two ways for the matching service to interact with the room servers: actively retrieving the status and network information of the room servers or having the room servers self-register and report. However, for the second method, it is important to confirm the status of the corresponding room server before assigning the address to the player's client. This is because the retrieval and assignment of network addresses are asynchronous, and there may be situations where the room server is not available during the process. As for the first method, the recommended approach is to use the Kubernetes Informer mechanism to listen to the GameServer objects. When there is a need to assign a room server for a match, you can retrieve the currently available GameServer and return the corresponding network information. In this case, the retrieval and assignment of network information are synchronous. For the specific implementation, you can refer to the allocator code in kruise-game-open-match-director.
In the previous section on game matchmaking, we mentioned that the matchmaking service needs to retrieve the status of the game server when assigning the server address to each player's client to ensure that the assigned server is available. So, how should we define the availability of a game server? According to the design principles of OKG, an available game server should be = Infrastructure Runtime State Ready + Infrastructure Network Ready + Business State Ready.
What does "Business State Ready" mean? This involves the definition of the server state based on the game's requirements. We recommend that the game server business state includes at least the following states:
None (No abnormal or special state exists, indicating availability, and this is the default state after the server initialization) Allocated (Already assigned, indicating that players are currently or about to play the game) WaitToBeDeleted (About to be deleted, waiting for OKG to recycle the pod) These three states can be recorded in the GameServer Spec using OpsState. OKG provides two ways to mark the server state:
The simplest state transition model is shown in the figure:
Of course, if you want to start and stop pods less frequently, you can also change OpsState to None after the game. The overall state transformation model is shown in the figure:
After completing the room server state flow design, we will find that some states are determined by the room server business, and these states also need to be exposed to the Kubernetes level so that they can be linked to the automatic scaler, matching system, etc. Therefore, a mechanism is needed to mark the business status on the Kubernetes object, that is, GameServer, and this is the Customized Service Quality function.
Customized service quality automatically marks the room server status on the GameServer through the results of executing the detection script and the detection result corresponding status set by the user.
The following is a status detection script named waitToBeDeleted.sh, which detects whether the value of the GS_STATE environment variable in the container is WaitToBeDeleted.
#!/bin/bash
if [ -z "$GS_STATE" ]; then
exit 1
elif [ "$GS_STATE" = "WaitToBeDeleted" ]; then
echo "$GS_STATE"
else
exit 1
fi
The corresponding GameServerSet yaml should be as follows
...
spec:
...
serviceQualities:
- name: waitToBeDeleted
containerName: battle
permanent: false
exec:
command: ["bash", "./waitToBeDeleted.sh"]
serviceQualityAction:
- state: true
opsState: WaitToBeDeleted
Of course, there can be multiple custom service qualities. For example, when the room server needs to reveal its None status, the script named none.sh is as follows:
#!/bin/bash
if [ -z "$GS_STATE" ]; then
exit 1
elif [ "$GS_STATE" = "None" ]; then
echo "$GS_STATE"
else
exit 1
fi
The corresponding GameServerSet yaml should be as follows:
...
spec:
...
serviceQualities:
- name: waitToBeDeleted
containerName: battle
permanent: false
exec:
command: ["bash", "./waitToBeDeleted.sh"]
serviceQualityAction:
- state: true
opsState: WaitToBeDeleted
- name: none
containerName: battle
permanent: false
exec:
command: ["bash", "./none.sh"]
serviceQualityAction:
- state: true
opsState: None
So far, we have found that the room service business program only needs to set the corresponding GS_STATE environment variable value at the appropriate time node. for example:
In the previous section, we designed three room server states: None / Allocated / WaitToBeDeleted. In this section, we will perform corresponding elastic scaling configuration based on the above room server status.
The ideal state for elastic scaling of conversational games is that during peak business periods, the number of room servers is sufficient to allow players to access the game in seconds; during low business periods, the number of room servers is reduced to save resource costs. OKG provides an automatic scaler that can sense the status of the room server and automatically adjust the replicas value of GameServerSet to achieve the ideal effect of scaling according to the game business status.
In the state management section, we also mentioned that the GameServer whose opsState is WaitToBeDeleted will be automatically recycled by OKG. In this way, as long as the business decides that it will no longer provide services, it can set WaitToBeDeleted through custom service quality. For specific configuration of the scaling strategy, please refer to https://openkruise.io/kruisegame/user-manuals/gameservers-scale/
OKG's core strategy for providing automatic expansion is to ensure that there is an available and sufficient number of room servers. This number is equivalent to buffer and is determined by the user. In OKG, this parameter is called minAvailable.
When the current number of game servers with opsState of None is less than the set minAvailable value, OKG will automatically expand new game servers so that the number of game servers with opsState of None meets the set minimum number.
Kubernetes elastic scaling covers two levels, application layer elasticity and resource layer elasticity. Among them, OKG provides the flexibility of the room server application layer and automatically adjusts the number of pods corresponding to the room server. Only adjusting the number of pods cannot save resource costs. The number of nodes needs to be automatically adjusted. This is how Kubernetes cluster-autoscaler achieves resource layer elasticity. The core principle of cluster-autoscaler is:
For game scenarios, the best practices for automatic scaling are as follows:
Implementing Security Protection Capability in Cloud-Native Gateway
495 posts | 48 followers
FollowAlibaba Container Service - July 16, 2024
Alibaba Container Service - July 5, 2024
Alibaba Cloud Native Community - February 1, 2024
Alibaba Cloud Native Community - November 15, 2023
Alibaba Container Service - July 5, 2024
Alibaba Cloud Community - August 16, 2024
495 posts | 48 followers
FollowWhen demand is unpredictable or testing is required for new features, the ability to spin capacity up or down is made easy with Alibaba Cloud gaming solutions.
Learn MoreMulti-source metrics are aggregated to monitor the status of your business and services in real time.
Learn MoreAlibaba Cloud’s world-leading database technologies solve all data problems for game companies, bringing you matured and customized architectures with high scalability, reliability, and agility.
Learn MoreAccelerate and secure the development, deployment, and management of containerized applications cost-effectively.
Learn MoreMore Posts by Alibaba Cloud Native Community