Dpath Traffic Isolation Solution for New Retail

Alibaba Dpath, a traffic isolation solution, was developed to support new retail scenarios during the Double 11 Shopping Festival.

By Alibaba Middleware Aliware Team

During this year's Double Eleven Shopping Festival, technical preparations had to change from previous years to support New Retail. The business division required that new retail traffic enter independent servers and be isolated from common traffic. Although conceptually straightforward, this places higher demands on the stability of new retail systems.

We have proposed a solution known as Dpath (dedicated path) to cope with these new requirements. The general idea of Dpath can be described as follows:

  1. We can choose some apps on the link as needed, and specify some servers from the public cluster as dedicated servers for special traffic. Then we can provide special support for the special traffic.
  2. Common traffic does not enter dedicated servers, but the special traffic can use common servers as needed. If an app on the link, app_x, does not have a dedicated server, then the special traffic and common traffic share all servers of app_x (public cluster). If app_x has dedicated servers, but these servers are unreachable for some reason, then the special traffic can decide whether or not to use the public cluster based on the configured failover policy.
  3. The dedicated servers specified for each application on the entire link form a dedicated channel for special traffic, similar to a bus lane.
  4. The existing routing function of our RPC framework applies only to a single call. Implementing full-link routing on top of single-call routing would be cumbersome, so we proposed the Dpath solution for traffic isolation on the entire link.

Working Mechanism of the Solution

We'll introduce how Dpath works in three steps:

  1. Select dedicated servers
  2. Identify special traffic
  3. Direct traffic from the link to the corresponding servers

Dedicated Machine Selection

Simply put, the information we need is the machines, the apps, and the relation between them in a dedicated environment. This information is stored in the configuration center in JSON format. A sample is as follows:

{
    "enable": true,
    "envRules": [
        {
            "envName": "newRetail",
            "failoverPolicy": 0,
            "envAppRules": [
                {
                    "appName": "app1",
                    "ips": [ ],
                    "machineGroups": [
                        "app1_newRetail_host"
                    ]
                },
                {
                    "appName": "app2",
                    "ips": [ ],
                    "machineGroups": [
                        "app2_newRetail_host"
                    ]
                },
                {
                    "appName": "app3",
                    "ips": [ ],
                    "machineGroups": [
                        "app3_newRetail_host"
                    ]
                },
                {
                    "appName": "newRetailEntryApp",
                    "ips": [ ],
                    "machineGroups": [
                        "newRetailEntryApp_host"
                    ]
                }
            ]
        }
    ]
}

The above configuration information describes a dedicated environment called newRetail, which has four apps, app1, app2, app3, and newRetailEntryApp, and their corresponding machines.

The Dpath toolkit subscribes to the configuration, and each middleware can use the Dpath toolkit to get the required information.
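
As a rough illustration of what a middleware might do with this configuration, the sketch below parses the JSON into a lookup structure and answers "which dedicated environment does this machine belong to?". The Dpath toolkit's real API is not shown in this article, so every class and function name here (AppRule, EnvRule, parse_dpath_config, env_of_machine) is hypothetical, and the sample is shortened to two apps.

```python
import json
from dataclasses import dataclass, field

# Shortened copy of the configuration format shown above.
SAMPLE = """
{"enable": true,
 "envRules": [{"envName": "newRetail", "failoverPolicy": 0,
   "envAppRules": [
     {"appName": "app1", "ips": [], "machineGroups": ["app1_newRetail_host"]},
     {"appName": "newRetailEntryApp", "ips": [],
      "machineGroups": ["newRetailEntryApp_host"]}]}]}
"""


@dataclass
class AppRule:
    app_name: str
    ips: list = field(default_factory=list)
    machine_groups: list = field(default_factory=list)


@dataclass
class EnvRule:
    env_name: str
    failover_policy: int
    apps: dict = field(default_factory=dict)  # app name -> AppRule


def parse_dpath_config(raw: str) -> dict:
    """Return {env_name: EnvRule}; an empty dict when "enable" is false."""
    doc = json.loads(raw)
    if not doc.get("enable", False):
        return {}
    envs = {}
    for env in doc.get("envRules", []):
        rule = EnvRule(env["envName"], env.get("failoverPolicy", 0))
        for app in env.get("envAppRules", []):
            rule.apps[app["appName"]] = AppRule(
                app["appName"],
                list(app.get("ips", [])),
                list(app.get("machineGroups", [])),
            )
        envs[rule.env_name] = rule
    return envs


def env_of_machine(envs: dict, app_name: str, ip: str, machine_group: str):
    """Name of the dedicated environment this machine belongs to, or None."""
    for env in envs.values():
        app = env.apps.get(app_name)
        if app and (ip in app.ips or machine_group in app.machine_groups):
            return env.env_name
    return None


envs = parse_dpath_config(SAMPLE)
print(env_of_machine(envs, "app1", "10.0.0.1", "app1_newRetail_host"))
```

A machine is matched either by IP or by machine group, mirroring the two fields ("ips" and "machineGroups") that each app rule carries in the sample.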

Traffic Identification

Dpath uses the dpath_env attribute carried by the trace module (the full-link tracing facility, which can pass data along the link) to identify the dedicated environment of the current traffic. How a request is mapped to a dedicated environment is decided by the business division's own logic. The identification can be done in any of the following three ways:

Method 1: Nginx

Nginx identifies traffic based on HTTP request information: it maps the request to a dpath_env value in accordance with business rules, and the Nginx module adds the resulting environment information to the context of the trace module.

Method 2: Entry App

If the environment information retrieved by the middleware is null, the environment of the current machine is used by default. So once the entry app has been determined, simply add the entire entry app to the dedicated environment. All new retail apps currently use this model.

Method 3: Business Code

The business code can set the dpath_env in the context of the trace module as needed, and change the environment of the traffic at any time.

Traffic Direction

Dpath only defines the relationship between machines, environments, and the traffic, and does not specify how to direct traffic. Traffic direction is implemented by the middleware.

Here we take RPC as an example to describe how to direct traffic to the corresponding servers based on the Dpath rules.

To make it easier to understand, we will ignore the other routing logic of RPC and look at the processing of a single call in the simplest case. Without Dpath, the RPC client gets the list of servers that provide the service from the registration center, and then calls one of them at random. As shown in the following picture:

[Figure 1]

After we incorporate the Dpath function, a dpath_env layer is inserted into the mapping between the service name and the servers. The RPC client first selects the addresses of the environment named in the request context, and then calls one of them at random. As shown in the following picture:

[Figure 2]
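
The client-side selection just described, together with the failover rule from the general idea of Dpath, might look roughly like the sketch below. It is an illustration under assumptions rather than the RPC framework's code: the function names are hypothetical, the article does not say what the failoverPolicy values mean (0 is assumed here to mean "fall back to the public cluster"), and a dedicated server is treated as unreachable when it is missing from the address list returned by the registration center.

```python
import random

# Assumed meanings; the article does not define the failoverPolicy values.
FAIL_TO_PUBLIC = 0
FAIL_FAST = 1


class NoDedicatedServerError(Exception):
    """Dedicated servers are configured but none is currently registered."""


def select_candidates(all_addresses, dedicated, env, failover_policy=FAIL_TO_PUBLIC):
    """Filter the registry's address list for one call.

    all_addresses: addresses the registration center returned for the service
    dedicated:     {env_name: set of addresses dedicated to that environment}
    env:           dpath_env of the current request, or None for common traffic
    """
    dedicated_all = set().union(*dedicated.values())
    # The public cluster is everything not reserved for some environment.
    public = [a for a in all_addresses if a not in dedicated_all]

    if env is None or not dedicated.get(env):
        # Common traffic, or the app has no dedicated servers for this env:
        # use the public cluster only.
        return public

    mine = [a for a in all_addresses if a in dedicated[env]]
    if mine:
        return mine
    if failover_policy == FAIL_TO_PUBLIC:
        return public
    raise NoDedicatedServerError(env)


def choose_server(all_addresses, dedicated, env, failover_policy=FAIL_TO_PUBLIC):
    """Pick one candidate at random, as in the simplest RPC case."""
    return random.choice(select_candidates(all_addresses, dedicated, env, failover_policy))


addresses = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
dedicated = {"newRetail": {"10.0.0.3"}}
print(select_candidates(addresses, dedicated, None))
print(select_candidates(addresses, dedicated, "newRetail"))
```

Excluding every environment's dedicated servers from the public list is what keeps common traffic, and traffic of other environments, out of the dedicated path.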

On the entire link, dedicated servers of all apps in a dedicated environment are connected to form a dedicated path for special traffic.

[Figure 3]

As shown in the picture above:

  1. Traffic from newRetailEntryApp that enters the newRetail environment uses dedicated servers.
  2. Traffic of newRetail apps that do not have dedicated servers will use the public cluster.

In addition to RPC, other middleware products such as message-oriented middleware (MOM) achieve similar isolation in their own ways. The details are not described here. A brief diagram of how RPC and messaging support Dpath is shown below:

[Figure 4]

Wild Traffic Isolation

As described above, the RPC isolation logic takes effect on the client side. If some clients are not upgraded (it is difficult to get all clients upgraded quickly and uniformly), their traffic, which is unaware of Dpath, may be directed to the dedicated servers. Such non-conforming traffic sent by non-upgraded clients is called wild traffic.

To address the wild traffic issue, the developers of the registration center provided a namespace feature on top of its publish-subscribe function. Services of dedicated servers are published to the "DPath" namespace, and services of common servers are published to the "default" namespace. Upgraded clients subscribe to the data of both namespaces, which is equivalent to getting the full address list. The registration center ensures that non-upgraded clients see only the data in the default namespace, so no wild traffic reaches the dedicated servers.
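
A toy in-memory model can make the namespace idea concrete. It is a sketch under assumptions, not the registration center's real interface: the Registry class and its publish and subscribe methods are hypothetical, and only the two namespace names come from the text.

```python
DEFAULT_NS = "default"
DPATH_NS = "DPath"


class Registry:
    """Toy registration center: namespace -> service -> set of addresses."""

    def __init__(self):
        self._data = {}

    def publish(self, service, address, dedicated=False):
        # Dedicated servers publish to the DPath namespace, all others to default.
        ns = DPATH_NS if dedicated else DEFAULT_NS
        self._data.setdefault(ns, {}).setdefault(service, set()).add(address)

    def subscribe(self, service, namespaces=(DEFAULT_NS,)):
        # Clients that do not know about namespaces see only the default one.
        result = set()
        for ns in namespaces:
            result |= self._data.get(ns, {}).get(service, set())
        return result


registry = Registry()
registry.publish("orderService", "10.0.0.1")
registry.publish("orderService", "10.0.0.3", dedicated=True)

# Non-upgraded client: dedicated servers are invisible, so no wild traffic.
print(sorted(registry.subscribe("orderService")))
# Upgraded client: full address list, then filtered by dpath_env on the client.
print(sorted(registry.subscribe("orderService", (DEFAULT_NS, DPATH_NS))))
```

The point of the design is that the safe behavior is the default: a client has to opt in to see dedicated servers, so clients that were never upgraded cannot reach them by accident.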

Summary

Dpath is a universal traffic isolation solution that supports scenarios that need to isolate or direct traffic, such as normal isolation, A/B testing, and blue-green release.

So far, the business division mainly conducts normal traffic isolation on the entire link according to service attributes. This has been used on several new retail scenario lines, and has been successfully implemented during the Double Eleven Shopping Festival.

Some benefits of service traffic isolation are as follows:

  1. It allows the business division to provide support based on service attributes, such as custom configuration and more comprehensive monitoring.
  2. Important services are not affected by other traffic. For example, important services will not be overloaded or throttled because of a sudden burst of other traffic.
  3. Small clusters support single-service scenarios, the release and roll-back of which are very fast. This allows faster response to problems of a specific service.