
Play among clouds (I): elastic expansion

Posted time: Jan 3, 2017 10:56 AM
From the hype around cloud computing concepts a few years ago to the vigorous growth of public and private clouds today, more and more users have moved to the cloud and run their business on it. Many of them have already experienced the flexibility and cost advantages that cloud computing brings. I have learned a lot along the way, from my first vague notion of the cloud, to my first hands-on contact with it, to using it extensively. Cloud computing technologies and platforms keep improving and stabilizing, so I want to write down some of my opinions for reference and discuss how to use cloud computing to improve the development and operation quality of games.

Because it is a closed environment and lacks mature, standardized solutions, the private cloud does not suit most companies in the gaming industry, apart from a few high-end client game developers or operators. Even setting aside cost and technical thresholds, private clouds impose many restrictions on a gaming industry that is varied and fast-changing in the internet era. The various clouds in fact share similar underlying technologies, but domestic policy constraints on data centers and networks add to the restrictions of private clouds. This series will explore the features and some of the technical architecture of cloud computing, especially public cloud platforms, from multiple perspectives ranging from development to O&M. You are welcome to discuss and share your experiences with cloud technology in the gaming field.

Play among clouds (I): elastic expansion
When people talk about cloud computing, the two advantages mentioned most often are: 1. cost; 2. flexibility, or elastic expansion.
Cost involves many factors, so we will leave it for now and talk about the flexibility of the cloud (elastic expansion). So what is elastic expansion? It is a little like tap water. We turn on the tap, let it run until we have as much water as we need, and pay for the amount used. In the past, by contrast, we had to count the people first and then plan how many wells to dig. For game developers and operators, cloud computing is like tap water. The demand for network, computing, and storage resources changes frequently across the phases of a game’s lifecycle. In the age of client games, these changes, even the curve of user volume, could be estimated accurately from experience. But what about the mobile internet age? Games are now developed quickly, user volumes grow explosively, and lifecycles are short.

For example, the Crazy Guess Figure game last year went viral as soon as it was launched, gaining more than 300,000 users every single day. Nobody could have predicted such explosive growth. Development cycles are now very short, especially for web-based and mobile games: as little as several weeks, or a few months at most. Given the habits of domestic players, playability, and other factors, many games last 1 to 2 years. The surge in user volume also usually comes in the period just after launch. The IT system behind a game therefore typically goes through a surge, then a stable period, and then a gradual decline.

Imagine the traditional practice before cloud computing emerged. We had to predict the user volume and estimate the number of concurrent online users and the IT resources they would consume. When advertising for the game rolled out and the user count quickly climbed to 1 million, there was no problem, because we had deployed the game servers two months earlier. But in the mobile internet age, it is very hard to estimate the number of players. Apart from the playability of the game itself, many external factors influence the user count, such as fashion trends, national policy, or public events. For example, I believe no one expected Flappy Bird to become so popular while it was still in development. Luckily it was just a standalone game. For a web game, the traditional practice of deploying servers in advance faces a dilemma: either the user count falls short of expectations and IT resources sit idle, or the user volume surges and the servers hit full load within a short time (one or two weeks). Even if the operator reacts quickly and immediately starts procurement, places orders, and deploys and debugs servers, that takes at least two weeks, and those two weeks are vital to the fate of the game. Any failure or access delay during that time loses users, and lost users will not come back.

What will happen with cloud computing?
In the development stage, you only need to purchase a few low-configuration cloud hosts, just enough for the work. In the alpha and beta test stages, you can add servers as needed based on cloud host loads. After the game is officially launched and promoted, you can adjust resources or add servers at any time as users and server loads increase. In the later half of the game’s lifecycle, when the user volume drops, you can gradually shut down cloud servers so that those still in service stay busy. You can even add cloud servers or bandwidth only during peak playing hours, such as 7 p.m. to midnight, or during city wars that generate heavy traffic, and release those resources when traffic is light.
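The peak-hour idea can be sketched as a small capacity schedule. This is only an illustration in Python: the function name `desired_servers` and the server counts are made-up assumptions, and only the 7 p.m. to midnight window and the in-game event case come from the text.

```python
from datetime import time

PEAK_START = time(19, 0)  # 7 p.m.; the peak window runs until midnight


def desired_servers(now, base=2, peak_extra=3,
                    event_running=False, event_extra=2):
    """Return how many cloud servers to keep running at a given time of day.

    All counts are hypothetical placeholders, not recommendations.
    """
    count = base
    if now >= PEAK_START:        # 19:00:00 through 23:59:59
        count += peak_extra
    if event_running:            # e.g. a city war with heavy traffic
        count += event_extra
    return count
```

A scheduler could call this every few minutes and start or release cloud hosts until the running count matches the returned value.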

Isn’t this a nice idea? It is. But while the hardware, or the virtual IT resources, can be expanded elastically, how can we elastically expand the game software itself? How can the software scale along with the number of servers, and do so online without affecting the running system at all?
Developing a distributed game software architecture in-house is one way, but it is technically demanding and requires a large investment to ensure stability and performance.
Game applications share the same logic: regardless of zone or server, only the data varies. Currently, the most common practice is to use load balancing devices to achieve elastic expansion. What is the role of load balancing? From the outside, the load balancer presents a single IP address or access entry for external requests, while behind it sit multiple internal servers. A load balancing device serves two purposes: 1. it polls the loads of the back-end servers and distributes requests sensibly so that server loads stay even; 2. it ensures business continuity: when one or more servers fail, requests are forwarded only to healthy servers. This also avoids a single point of failure among game servers and helps with disaster recovery.
So how is load balancing achieved? You can use open-source software to set up load balancing yourself. If you prefer a cloud service, consider ELB from Amazon abroad, or SLB from Alibaba Cloud in China. Search online to find out more if you are interested.
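To make the two roles of a load balancer concrete, here is a minimal Python sketch, not how ELB, SLB, or any particular product works. The `Backend` and `LoadBalancer` classes and the least-loaded selection rule are assumptions chosen for illustration; a real balancer would measure load and health itself.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """One game server behind the balancer (hypothetical model)."""
    name: str
    load: float = 0.0      # current load, 0.0 to 1.0
    healthy: bool = True   # result of the latest health check


class LoadBalancer:
    """Single entry point that forwards each request to one backend."""

    def __init__(self, backends):
        self.backends = list(backends)

    def add_backend(self, backend):
        # Expansion: a new server simply joins the pool.
        self.backends.append(backend)

    def pick(self):
        # Skip failed servers, then choose the least-loaded healthy one.
        healthy = [b for b in self.backends if b.healthy]
        if not healthy:
            raise RuntimeError("no healthy backend available")
        return min(healthy, key=lambda b: b.load)


lb = LoadBalancer([Backend("game-1", load=0.7), Backend("game-2", load=0.4)])
first_choice = lb.pick().name        # least-loaded server
lb.backends[1].healthy = False       # simulate a failure of game-2
after_failure = lb.pick().name       # traffic moves to the remaining server
lb.add_backend(Backend("game-3", load=0.1))
after_expansion = lb.pick().name     # the new server starts taking requests
```

Clients only ever talk to the balancer, so servers can fail or join the pool without the access entry changing.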
With load balancing, we have the foundation for elastic expansion. The game servers behind the load balancer are equal peers. When traffic increases, you just install and configure a new game server and deploy it behind the load balancer, and the system expands seamlessly. Some may ask: is it really that easy to install and configure a new game server? Not with traditional physical servers, where you need at least an installation disc to set up the machine. But with a cloud host, an image is all you need. An image is like an .iso file on a PC or a .vmdk file for a virtual machine. You can capture the original game server’s system, software, and configuration as an image. When you add a new cloud host, it is created directly from the image, with no installation or configuration steps. The new game server is ready in just a few minutes. If you perform the operation through the API, ten minutes is enough, startup time included. Isn’t that fast? As cloud platforms improve, elastic expansion will itself become a service. That is, you will only need to set a policy, such as adding a new server when the load of all servers exceeds 90% and releasing a cloud host when the server load falls below 60%. The whole system will become very flexible, and you will no longer need to worry about surges in user traffic.
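The policy described here can be written down as a small decision function. The following Python sketch uses the 90% and 60% thresholds from the text; the function name, the `min_servers` floor, and the extra check that the remaining servers would not be pushed over the scale-out threshold are my own assumptions, not features of any specific cloud platform.

```python
def scaling_decision(loads, scale_out_at=0.90, scale_in_at=0.60, min_servers=1):
    """Return +1 to add a server, -1 to release one, 0 to do nothing.

    loads: current load of each running server, 0.0 to 1.0.
    """
    if not loads:
        return 1  # nothing is running yet, so start one server
    if all(load > scale_out_at for load in loads):
        return 1  # every server is above 90%: add a server
    if len(loads) > max(min_servers, 1) and all(
            load < scale_in_at for load in loads):
        # Assumed safeguard: release a host only if the remaining servers
        # would still sit below the scale-out threshold afterwards.
        projected = sum(loads) / (len(loads) - 1)
        if projected < scale_out_at:
            return -1
    return 0
```

Without the safeguard, two servers at 59% would shrink to one server at roughly 118%, which would immediately trigger another expansion, so the thresholds need to be chosen together.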

Some may also ask: placing game servers behind the load balancer gives computing elasticity, but what about data elasticity? How can I elastically expand my data volume or access capacity when a bottleneck emerges? On the subject of data, we should first talk about storage. You have probably heard of cloud storage many times. Cloud storage is the easier part, and storage-layer virtualization was achieved long ago. Many cloud storage providers claim to offer unlimited space, though storage speed is of course also an important factor to consider. So what about databases, and how can we avoid the bottleneck of a single database server? Those familiar with MySQL will know about splitting data across databases and tables (sharding). When a database has a large data volume and heavy traffic, the load is spread across multiple database instances according to some rule, such as by user ID. There are many descriptions of this on the internet. Currently, you have to plan this splitting on your own. It is said that some cloud platforms are preparing to launch distributed database services: the application layer sees a single database, while the underlying layer actually consists of multiple database instances, and the cloud system handles the splitting and load distribution automatically. If such a service becomes available, elasticity at the data layer will become very easy. The database will scale automatically with the game’s data volume or traffic, and everything will become very flexible and intelligent. That would be a thrilling feature. Let’s wait and see which cloud platform launches such a distributed database service first.
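As a rough illustration of splitting by user ID, here is a minimal Python sketch using a modulo rule. The shard names, the `player_N` table naming, and the table count are hypothetical, and real deployments may use other rules such as ranges or hashing.

```python
# Hypothetical shard list; in practice these would be connection settings.
SHARDS = ["game_db_0", "game_db_1", "game_db_2", "game_db_3"]


def shard_for_user(user_id, shards=SHARDS):
    """Map a numeric user ID to one database instance by modulo."""
    return shards[user_id % len(shards)]


def table_for_user(user_id, tables_per_db=8, shard_count=len(SHARDS)):
    """Pick a table inside the chosen database (table-level splitting)."""
    return f"player_{(user_id // shard_count) % tables_per_db}"
```

With a modulo rule, changing the number of instances remaps most users to a different instance, which is one reason the splitting has to be planned carefully in advance when you manage it yourself.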

Next, let’s turn to bandwidth (to be discussed further later). Many game operators are troubled by bandwidth problems. For many games, bandwidth is the top priority. On a cloud, elastic resizing of bandwidth is of course indispensable. The cloud platform can adjust bandwidth immediately through the API, which is much more convenient than expanding the bandwidth of IDC-hosted or self-built servers. You gain full control over your bandwidth on the cloud platform and can even automate it.

You probably know the game Dazhangmen, whose user volume and revenue hold a comfortable lead in the industry. It is fair to say that its success owes much to its use of the cloud platform. Thanks to the elasticity of cloud computing, the game could cope with surging user volume while keeping the system stable. Another example is game update servers. Update servers are needed only when the game releases an update package, and at that moment they receive heavy traffic. If only one update server is in service, the load from client updates may be too much for it, hurting the user experience. Instead, you can temporarily deploy several update servers, five for example, on the cloud platform when releasing the package, and shut them down one by one as users finish updating. The elastic expansion feature of the cloud platform makes this easy.

Therefore, the cloud platform is instantly available just like tap water, used on demand and paid for as you go, throughout the game’s development, promotion, stable, and decline periods. It can also significantly improve efficiency and resource utilization in every phase of game development and operation.