Cloud Enterprise Network (CEN) is a networking service for building enterprise private networks. It provides an intent-based global cloud network that interconnects data centers and multiple regions worldwide, such as Beijing and Hangzhou. It can also connect various cloud services, such as OSS and RDS. Providing a variety of private network connections is the basic capability of CEN.
Compared with CEN 1.0, CEN 2.0 continues to expand its connection capabilities and supports attaching a VPC to multiple CEN instances. Cloud and cross-domain multicast will be launched soon to support nearby forwarding. In terms of scale, CEN 2.0 supports ultra-large-scale networking, with a maximum of 1,000 VPC attachments in a single region and 5,000 routes worldwide, which is 100 times the original networking scale.
CEN supports dynamic route propagation to simplify network management and maintenance. CEN also supports route aggregation and static routing to reduce the number of routes that need to be managed and to help you maintain networks in a fine-grained manner. By supporting multiple route tables, associated forwarding correlations, and service chaining, CEN lets you easily integrate security services (such as firewalls) into your private networks to enhance network security. In addition, flow logs, traffic marking, ledger, and other capabilities improve the manageability of the network.
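To make route aggregation concrete, here is a minimal sketch that merges contiguous prefixes into a single summary route using Python's standard `ipaddress` module. The prefixes and the helper name `aggregate_routes` are made up for illustration; this is not how CEN implements aggregation internally.

```python
import ipaddress

def aggregate_routes(cidrs):
    """Collapse adjacent or overlapping prefixes into the fewest exact covering prefixes."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    # collapse_addresses only merges prefixes that combine exactly,
    # so the result never covers address space that was not advertised.
    return [str(net) for net in ipaddress.collapse_addresses(networks)]

# Hypothetical routes learned from VPC attachments.
vpc_routes = [
    "10.0.0.0/24",
    "10.0.1.0/24",
    "10.0.2.0/24",
    "10.0.3.0/24",
    "192.168.5.0/24",
]

print(aggregate_routes(vpc_routes))  # ['10.0.0.0/22', '192.168.5.0/24']
```

Four /24 routes become one /22, so only two routes need to be managed instead of five.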
The preceding figure shows the technical architecture behind CEN 2.0. CEN 2.0 is a networking service developed on top of Alibaba Cloud Luoshen Cloud Network Technology.
The underlying layer consists of infrastructure resources, such as data center networks, WANs, Internet, and Express Connect circuits.
The second layer is an architecture that integrates software and hardware, including servers, Mesh of Clusters (MoC), Field-Programmable Gate Array (FPGA), and programmable switch chips. This layer consists of high-performance network gateways and virtual machines that virtualize computing resources and networks.
The third layer is an elastic and open virtual network element (NE) platform named CyberStar. This platform provides environments, disaster recovery capabilities, and elastic scheduling for applications.
The top layer consists of various NEs. The NEs in this layer can focus on business logic without worrying about the underlying layers. In addition to transit routers (TR), there are many other NEs, such as NAT Gateway and ALB.
The CEN SDN control plane on the right side functions as the brain of the intent-based network of CEN 2.0. CEN 1.0 used Software-Defined Networking (SDN) as the brain to provide various features. Apsara Network Intelligence is used to analyze networks and collect metrics and insights into the network status.
The CEN SDN controller functions as a brain that translates user intents and configurations into resources and connectivity configurations that can be used to connect private networks. The CEN SDN controller can also receive events and trigger scheduling activities to optimize the underlying services. The CEN SDN controller provides the following benefits:
The controller used in CEN 2.0 stores most of the status data in RAM in a layered or distributed manner to ensure the reliability of data and increase the efficiency of data retrieval. This significantly improves the performance of the controller used in CEN 2.0.
Transit routers are the key component used to forward data in CEN 2.0. Transit routers run on the CyberStar platform and are visible to tenants. CyberStar is a network functions virtualization (NFV) platform provided in Luoshen 3.0. CyberStar manages Elastic Compute Service (ECS) clusters on demand to run workloads.
ECS instances that run workloads are deployed in VPCs. All VPCs use the ENI-bonding technology to redirect user traffic to the ECS cluster connected to the transit router. Adopting the ENI-bonding technology preserves the features of VPC and Elastic Network Interface (ENI) for the VPC attachments between transit routers and tenants. For example, ENI-bonding can be used with subnet routing 2.0 to implement segmentation or service chaining.
The middle layer in the preceding figure is an ECS resource pool. Each ENI-bonding is associated with multiple ECS instances. User traffic from each VPC is redirected to the associated ECS instances. This horizontally scales the processing capability. An ECS cluster can be automatically scaled out to handle unexpected traffic spikes.
The ECS clusters are deployed in different zones. User traffic is preferably routed to the ECS cluster in the local zone. This way, user traffic can be processed on ECS instances in the local zone to reduce network latency. In addition, multiple zones are used to implement disaster recovery and horizontally scale the computing capacity.
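The zone-local preference and horizontal spreading described above can be sketched roughly as follows. This is an illustrative model only: the `Instance` type, the `pick_instance` function, and the use of a SHA-256 flow hash are assumptions for the example, not the actual ENI-bonding data path.

```python
import hashlib
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Instance:
    name: str
    zone: str
    healthy: bool = True

def pick_instance(flow_key: str, source_zone: str, pool: List[Instance]) -> Instance:
    """Pick a forwarding instance: prefer the local zone, spread flows by hash."""
    healthy = [inst for inst in pool if inst.healthy]
    if not healthy:
        raise RuntimeError("no healthy instances available")
    # Prefer instances in the same zone as the traffic source to keep latency low;
    # fall back to any healthy instance if the local zone has none.
    local = [inst for inst in healthy if inst.zone == source_zone]
    candidates = local or healthy
    # A stable hash keeps every packet of a flow on the same instance.
    digest = hashlib.sha256(flow_key.encode()).digest()
    return candidates[int.from_bytes(digest[:8], "big") % len(candidates)]

pool = [
    Instance("ecs-a1", "zone-a"),
    Instance("ecs-a2", "zone-a"),
    Instance("ecs-b1", "zone-b"),
]
chosen = pick_instance("10.0.0.5:443->10.1.0.9:51000", "zone-a", pool)
print(chosen.name, chosen.zone)
```

Adding instances to a zone's pool increases the number of candidates, which is how the processing capability scales horizontally in this model.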
The following methods are used to implement disaster recovery when ECS instances are down. One of the methods is using the scaling capability of ENI-bonding to isolate unhealthy ECS instances. When a small number of ECS instances in the cluster are down, the system can isolate the unhealthy ECS instances and create the same number of ECS instances for replacement. This ensures business continuity. When a large number of ECS instances are down, implementing disaster recovery based on a single cluster is difficult. The system redirects user traffic in VPC attachments to the ECS clusters specified by tenants in other zones to resolve this issue. This minimizes the impact of service interruptions.
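The two recovery paths above (replace a few failed instances in place, or fail over to another zone when too many are down) can be expressed as a small decision function. This is a sketch under stated assumptions: the `failover_ratio` threshold of 0.5, the `RecoveryPlan` type, and the function name are hypothetical, and the source does not say how the system decides that "a large number" of instances are down.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RecoveryPlan:
    action: str                        # "none", "replace", or "failover"
    replace_count: int = 0             # instances to recreate in the same cluster
    target_zone: Optional[str] = None  # zone that takes over the traffic

def plan_recovery(total: int, down: int, backup_zones: List[str],
                  failover_ratio: float = 0.5) -> RecoveryPlan:
    """Choose a recovery action for one ECS cluster (threshold is an assumption)."""
    if total <= 0 or not 0 <= down <= total:
        raise ValueError("invalid instance counts")
    if down == 0:
        return RecoveryPlan("none")
    # Small failure: isolate the unhealthy instances and create the same number.
    if down / total < failover_ratio or not backup_zones:
        return RecoveryPlan("replace", replace_count=down)
    # Large failure: redirect the VPC attachment to a tenant-specified zone.
    return RecoveryPlan("failover", target_zone=backup_zones[0])

print(plan_recovery(total=10, down=1, backup_zones=["zone-b"]))
print(plan_recovery(total=10, down=8, backup_zones=["zone-b"]))
```

Note that the failover branch is only reachable when the tenant has specified more than one zone for the VPC attachment.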
CEN 2.0 lets you specify a single zone when you create a VPC attachment. However, we recommend specifying multiple zones for disaster recovery.
Another method is sandboxing. When a tenant's traffic spikes adversely affect the services of other tenants, that tenant's workloads can be isolated in a sandbox cluster.
VPC attachments use the ENI-bonding technology to redirect traffic through cloud-native connections. Before adopting the ENI-bonding technology, we used only ENIs. This approach has the following disadvantages:
ECS clusters cannot be horizontally scaled. In addition, ECS clusters lack disaster recovery capabilities or require a long period to complete failovers.
Due to the limited device virtualization capability, you can create only 16 to 32 ENIs for each ECS instance.
The operating system does not treat adding a device as a time-critical task, so adding or deleting an ENI involves many steps: the PCI bus is scanned periodically, the operating system responds, the device type is identified from the device ID, the matching driver is located and loaded, and the device is initialized and allocated memory. Only then can service processing be handed over to the NE. This usually takes minutes, which cannot meet the requirements of fast, elastic scaling of NFV NEs.
The underlying layer of the CyberStar platform relies on Alibaba Cloud's ENI-bonding technology to resolve the preceding issues. The technology allows you to bind an ENI to multiple ECS instances and add the ENI as a subinterface to the virtual network interface controller (NIC) on each ECS instance. The ENI-bonding technology enables a single ECS instance to support more than 1,000 ENIs and shortens the duration of associating or disassociating an ENI from seconds to subseconds. In the event of a failure, it performs health checks and converges on the forwarding plane in real time, so traffic can be switched within or between clusters in seconds. The ENI-bonding technology also provides shuffle sharding to significantly reduce the blast radius of an outage.
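Shuffle sharding, in general, gives each tenant a pseudo-random subset of the shared pool so that two tenants rarely depend on exactly the same set of instances. The sketch below shows the generic technique; the function name, the instance names, and the shard size are hypothetical, and the source does not describe how ENI-bonding chooses shards.

```python
import hashlib
import random
from typing import List

def shuffle_shard(tenant_id: str, instances: List[str], shard_size: int) -> List[str]:
    """Deterministically assign a tenant a pseudo-random subset of instances."""
    # Seed a private RNG from the tenant ID so the same tenant always
    # gets the same shard, without touching the global random state.
    seed = int.from_bytes(hashlib.sha256(tenant_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return sorted(rng.sample(instances, shard_size))

pool = [f"ecs-{i}" for i in range(8)]
for tenant in ("eni-tenant-a", "eni-tenant-b", "eni-tenant-c"):
    print(tenant, shuffle_shard(tenant, pool, shard_size=2))
```

With 8 instances and a shard size of 2 there are 28 possible shards, so a tenant that overloads or crashes its own two instances fully overlaps with only a small fraction of other tenants. Larger pools shrink that fraction further.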
Numerous traffic scheduling solutions are provided for traditional networking. Network engineers must configure routes, policy-based routes, and MAC or ARP proxies to deploy network services (such as firewalls and WAN acceleration). The engineers must centrally provision these mandatory resources at the data egress of the network.
Traffic in cloud networks is controlled by SDN. Therefore, only a few solutions are available to address the traffic scheduling issue. Most of these solutions are incompatible with the traditional networking architectures unless you modify or redesign the architectures.
The preceding figure shows how workloads are deployed using CEN 2.0. The user's network consists of the following components:
The first component is Internet access, which hosts Internet-facing services, such as NAT, SLB, and EIP. The figure shows two zones, indicating multi-zone disaster recovery.
The component in the lower-right corner of the figure consists of applications deployed on the cloud. Applications or tenants that belong to different organizations are isolated using VPCs.
The component in the lower-left corner of the figure consists of services deployed at the data ingress of the network, such as VPN Gateway, Express Connect, and Smart Access Gateway (SAG).
User traffic must pass through security services before it can reach applications. These security services filter east-west traffic between private networks and north-south traffic between private networks and the Internet.
In CEN 2.0, transit routers support associated forwarding correlations and multiple route tables. CEN 2.0 combines these features with subnet routing provided by VPCs to allow you to schedule traffic to your desired network services.
The two route tables used by the transit router in the preceding figure are the key components. The green route table, for trusted traffic, is used only to route scrubbed traffic to different NEs. The route table for untrusted traffic routes all user traffic to the firewall first. After the firewall scrubs the user traffic, it routes the traffic back to the transit router, which then uses the trusted route table to route the traffic to the NEs.
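The trusted and untrusted route tables can be modeled as a small simulation. Everything named here (the attachment names, the prefixes, and the `ROUTE_TABLES` and `ASSOCIATIONS` dictionaries) is invented for illustration; it mimics the behavior described above rather than any actual CEN API.

```python
import ipaddress

FIREWALL = "firewall-vpc"

# Untrusted table: send everything to the firewall first.
# Trusted table: route scrubbed traffic to its real destination.
ROUTE_TABLES = {
    "untrusted": {"0.0.0.0/0": FIREWALL},
    "trusted": {
        "10.1.0.0/16": "app-vpc-a",
        "10.2.0.0/16": "app-vpc-b",
        "0.0.0.0/0": "internet-vpc",
    },
}

# Associated forwarding: which route table serves traffic arriving from each attachment.
ASSOCIATIONS = {
    "app-vpc-a": "untrusted",
    "app-vpc-b": "untrusted",
    "internet-vpc": "untrusted",
    FIREWALL: "trusted",
}

def lookup(table, dst_ip):
    """Longest-prefix match of dst_ip against one route table."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [(ipaddress.ip_network(prefix), hop)
               for prefix, hop in table.items()
               if addr in ipaddress.ip_network(prefix)]
    if not matches:
        raise LookupError(f"no route to {dst_ip}")
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def forwarding_path(source_attachment, dst_ip, max_hops=4):
    """Follow a packet through the transit router until it leaves the firewall loop."""
    hops, attachment = [], source_attachment
    for _ in range(max_hops):
        next_hop = lookup(ROUTE_TABLES[ASSOCIATIONS[attachment]], dst_ip)
        hops.append(next_hop)
        if next_hop != FIREWALL:
            return hops
        attachment = next_hop  # scrubbed traffic re-enters from the firewall
    raise RuntimeError("routing loop detected")

print(forwarding_path("app-vpc-a", "10.2.0.5"))  # ['firewall-vpc', 'app-vpc-b']
```

Traffic from an application VPC hits the untrusted table and is sent to the firewall. When it returns from the firewall attachment, it is looked up in the trusted table and delivered to the destination.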
Our solution is open to third-party NEs and other service providers and offers two forwarding modes: transparent and proxy. We are the first cloud vendor in China to provide such a solution.
CEN 2.0 is developed based on the architecture of Luoshen 3.0.
Luoshen 3.0 will continue to help enterprises manage and analyze large-scale, high-performance, and complex networks and make informed decisions. Luoshen 3.0 is an application-oriented, ecosystem-based networking platform. It will continue to use cloud-native technologies to help enterprises and institutions expand their networks from the cloud to the edge and connect the digital world.
Wen Shuguang (Fengzhe) is an Alibaba Cloud Networking Senior Technical Expert, currently responsible for the design and development of CEN transit router products. He has long been engaged in virtual, software-defined, and high-performance networks. He has a wide range of interests and research on operating systems, distributed systems, and applications in the cloud era.