How difficult is full link tracking?

The Value of Full Link Tracking

The value of link tracking lies in "correlation": end users, backend applications, and cloud components (databases, message queues, and so on) together form the topology that a trace passes through. The wider the coverage of this topology, the more value link tracking can deliver. Full link tracking is the best practice that covers all associated IT systems and fully records user behavior, call paths, and state across systems.

Complete full link tracking can bring three core values to the business: end-to-end problem diagnosis, system dependency sorting, and custom tag transmission.

End-to-end problem diagnosis: a VIP customer fails to place an order, or an internal test user's request times out. Many end-user experience issues trace back to anomalies in backend applications or cloud components. Full link tracing is the most effective way to solve end-to-end problems, bar none.

Intersystem dependency sorting: when new business is launched, old business is retired, or a data center is relocated or an architecture upgraded, the dependencies between IT systems are too complex to sort out manually. Topology discovery based on full link tracking makes decisions in these scenarios more agile and reliable.

Custom tag transmission: full link stress testing, user-level canary (grayscale) release, order traceability, and traffic isolation. Tiered processing and data association based on custom tags have given rise to a thriving full link ecosystem. However, once a link is broken or a tag is lost, the result can be unpredictable logic failures.

Challenges and Solutions for Full Link Tracking

The value of full link tracking is proportional to its coverage, and so are its challenges. To maximize link integrity, front-end applications and cloud components, Java and Go applications, public clouds and self-built data centers must all follow the same link specification and interconnect their data. The three major challenges in achieving full link tracking are unifying the multilingual protocol stack, front-end/back-end/cloud (multi-end) linkage, and cross-cloud data fusion, as shown in the following figure:

1. Unified Multilingual Protocol Stack

In the cloud native era, multilingual application architectures are increasingly common, and using each language's strengths to get the best performance and development experience has become a trend. However, languages differ in maturity, so full link tracking cannot offer fully consistent capabilities across them. The mainstream industry approach is to unify the format at the remote call protocol layer, while applications in each language implement call interception and context pass-through internally. This ensures the integrity of the basic link data.

However, most online problems cannot be located and solved with basic link tracking alone. The complexity of online systems means that an excellent Trace product must provide more comprehensive diagnostic capabilities, such as code-level diagnosis, memory analysis, thread pool analysis, and lossless statistics. Making full use of the diagnostic interfaces that each language provides, and getting the most out of each language's capabilities, is the foundation for the continued development of Trace.

Standardized pass-through protocol: all applications on the link need to follow the same context pass-through standard, so that the link context passes intact between applications written in different languages, without broken links or missing context. The current mainstream open source pass-through protocols include Jaeger, SkyWalking, Zipkin, and others.

Getting the most out of each language: beyond the basic call chain function, link tracing has gradually developed higher-level capabilities such as application/service monitoring, method stack tracing, and performance analysis. However, differences in language maturity lead to significant differences in product capabilities. For example, Java probes can implement many advanced edge-side diagnostics based on JVMTI. An excellent full link tracking solution makes the most of the technical advantages specific to each language, rather than blindly pursuing uniformity and settling for mediocrity. Interested readers can refer to the previous article 'How to Choose Open Source Self built/Hosted and Commercialized Self developed Traces'.

2. Front-end, back-end, and cloud (multi-end) linkage

At present, open source link tracking implementations mainly focus on the backend business application layer and lack effective instrumentation for user terminals and cloud components (such as cloud databases). The main reason is that the latter two are usually provided by cloud service providers or third-party vendors, so support depends on how well those vendors accommodate open source standards, and it is difficult for business teams to intervene in their development directly.

The direct impact is that when a front-end page responds slowly, it is hard to determine which backend application or service is responsible, so no definitive root cause can be given. Similarly, exceptions in cloud components are hard to link directly to exceptions in business applications, especially when multiple applications share the same database instance. Verification then requires roundabout methods, which makes troubleshooting very inefficient.

To solve such problems, cloud service providers first need to better support open source link standards: instrument core methods, and support open source protocol stack pass-through and data backflow (for example, Alibaba Cloud ARMS front-end monitoring supports Jaeger protocol pass-through and method stack tracing).

Secondly, because of issues such as ownership, different systems may not be able to unify on one protocol stack across the full link. To achieve multi-end linkage, the Trace system needs to provide a way to bridge heterogeneous protocol stacks.

Heterogeneous protocol stack connectivity

To connect heterogeneous protocol stacks (Jaeger, SkyWalking, Zipkin), the Trace system needs to support two capabilities. The first is protocol stack conversion with dynamic configuration. For example, the front end passes down the Jaeger protocol, while a newly connected downstream external system uses the Zipkin B3 protocol. The Node.js application between the two can receive the Jaeger protocol and pass the Zipkin protocol downstream, keeping full link tag transmission intact. The second is server-side data format conversion, which either converts the different reported data formats into a unified format for storage or handles compatibility on the query side. The former has relatively lower maintenance costs, while the latter has higher compatibility costs but is more flexible.
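The header conversion step described above can be illustrated with a minimal sketch. It assumes the incoming request carries Jaeger's `uber-trace-id` header and that the downstream system expects Zipkin B3 multi-header propagation. The function name `jaeger_to_b3` is hypothetical, and a real service would normally rely on its tracing SDK's propagators rather than hand-written parsing.

```python
def jaeger_to_b3(uber_trace_id: str) -> dict:
    """Translate a Jaeger uber-trace-id value into Zipkin B3 multi-headers.

    Hypothetical helper for illustration only.
    """
    parts = uber_trace_id.strip().split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {len(parts)}")
    trace_id, span_id, parent_id, flags_hex = parts
    flags = int(flags_hex, 16)

    # B3 expects fixed-width lower-hex IDs, while Jaeger clients may omit
    # leading zeros, so pad to 16 (64-bit) or 32 (128-bit) characters.
    trace_width = 32 if len(trace_id) > 16 else 16
    headers = {
        "X-B3-TraceId": trace_id.lower().zfill(trace_width),
        "X-B3-SpanId": span_id.lower().zfill(16),
    }

    # Jaeger marks a root span with parent id 0; B3 omits the header instead.
    if parent_id and int(parent_id, 16) != 0:
        headers["X-B3-ParentSpanId"] = parent_id.lower().zfill(16)

    # Jaeger flags are a bitmask: 0x1 = sampled, 0x2 = debug.
    if flags & 0x2:
        headers["X-B3-Flags"] = "1"  # in B3, debug already implies sampled
    else:
        headers["X-B3-Sampled"] = "1" if flags & 0x1 else "0"
    return headers
```

Calling `jaeger_to_b3(request_headers["uber-trace-id"])` in the intermediate service and attaching the result to the outgoing request keeps the same trace ID on both sides of the protocol boundary.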

3. Cross cloud data fusion

Many large enterprises have chosen multi-cloud deployment for reasons such as stability or data security. For example, domestic systems are deployed on Alibaba Cloud, overseas systems are deployed on AWS, and systems involving sensitive internal data are deployed in self-built data centers. Multi-cloud deployment has become a typical cloud architecture, but the network isolation between environments and the differences in infrastructure pose huge challenges for operations and maintenance staff.

Because cloud environments can only communicate through the public network, link integrity under a multi-cloud deployment can be achieved by methods such as cross-cloud reporting or cross-cloud query of link data. Whichever method is used, the goal is unified visibility of multi-cloud data, so that problems can be located or analyzed quickly from complete link data.

Cross cloud reporting

Cross-cloud reporting of link data is relatively easy to implement, maintain, and manage. It is currently the mainstream approach adopted by cloud vendors. Alibaba Cloud ARMS, for example, achieves multi-cloud data fusion through cross-cloud data reporting.

The advantages of cross-cloud reporting are low deployment cost and the ease of maintaining a single set of servers. The disadvantage is that cross-cloud transmission consumes public network bandwidth, so the cost and stability of public network traffic are important constraints. Cross-cloud reporting suits a one-primary, multiple-secondary architecture, in which the vast majority of nodes are deployed within one cloud environment and the other clouds or self-built data centers carry only a small share of business traffic. For example, an enterprise whose ToC business is deployed in Ax Cloud and whose internal applications are deployed in self-built data centers is well suited to cross-cloud reporting, as shown in the following figure.

Cross cloud query

Cross-cloud query means storing the original link data in the cloud network where it was produced, dispatching each user query to every cloud separately, and then aggregating the results for unified processing, which reduces public network transmission costs.

The advantage of cross-cloud query is that little data crosses networks. The volume of link data actually queried is usually less than one tenth of the original data volume, so public network bandwidth is greatly reduced. The disadvantage is that multiple data processing terminals need to be deployed, and complex calculations such as quantiles and global TopN are not supported. It is better suited to multi-primary architectures, and supports simple link splicing and max/min/avg statistics.

There are two ways to implement cross-cloud query. One is to build a centralized data processing terminal within the cloud network and connect it to user networks through a dedicated intranet line, so that it can process data from multiple users at the same time. The other is to build a separate set of data processing terminals in a VPC for each user. The former has lower maintenance costs and greater capacity elasticity, while the latter has better data isolation.
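The reason max/min/avg can be supported while quantiles cannot is that the former can be rebuilt exactly from small per-cloud summaries. The sketch below illustrates this under the assumption that each cloud returns a count, a sum, a minimum, and a maximum for the queried spans. The `PartialStats` type and its field names are hypothetical and do not describe any particular product's wire format.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PartialStats:
    """Summary computed inside one cloud (hypothetical shape)."""
    count: int
    total_ms: float
    min_ms: float
    max_ms: float


def merge_stats(parts: Iterable[PartialStats]) -> Optional[dict]:
    """Combine per-cloud summaries into one global result."""
    # Ignore clouds that matched no spans so their placeholder
    # min/max values cannot distort the result.
    non_empty = [p for p in parts if p.count > 0]
    if not non_empty:
        return None
    count = sum(p.count for p in non_empty)
    total = sum(p.total_ms for p in non_empty)
    return {
        "count": count,
        # The global average must come from the global sum and count;
        # averaging the per-cloud averages would weight clouds equally.
        "avg_ms": total / count,
        "min_ms": min(p.min_ms for p in non_empty),
        "max_ms": max(p.max_ms for p in non_empty),
    }
```

A P99 value, by contrast, cannot be derived exactly from the P99 of each cloud, because it depends on the full distribution of the underlying data. This is why such calculations need either the raw data in one place or an approximate mergeable structure.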

Other methods

In addition to the two schemes above, a hybrid mode or a pass-through mode can also be used in practice.

In hybrid mode, statistical data is reported over the public network for centralized processing (small data volume, high precision requirements), while link data is retrieved through cross-cloud query (large data volume, low query frequency).

Pass-through mode only ensures that the link context passes intact between cloud environments. The storage and query of link data are implemented independently in each cloud. The advantage of this mode is its extremely low implementation cost: each cloud only needs to follow the same pass-through protocol, and the specific implementations can be completely independent. Traces are then joined manually by the same TraceId or application name. This mode suits rapid integration of existing systems and has the lowest transformation cost.
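Joining by TraceId, as described above, amounts to grouping spans exported from each cloud's own storage. The following sketch assumes every span is a dictionary with `trace_id` and `start_us` keys. Those field names, and the function name `stitch_by_trace_id`, are hypothetical.

```python
from collections import defaultdict


def stitch_by_trace_id(*span_lists):
    """Group spans from several independent stores by their trace ID."""
    traces = defaultdict(list)
    for spans in span_lists:
        for span in spans:
            traces[span["trace_id"]].append(span)
    # Ordering by start time assumes the clocks of the different clouds
    # are reasonably well synchronized.
    for spans in traces.values():
        spans.sort(key=lambda s: s["start_us"])
    return dict(traces)
```

Because each cloud keeps its own storage and query implementation, this step only works if every hop passed the same trace ID downstream, which is exactly what the shared pass-through protocol guarantees.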

Full Link Tracking Access Practice

The previous sections described in detail the challenges that full link tracking faces in various scenarios and their solutions. Next, taking Alibaba Cloud ARMS as an example, we introduce how to build from scratch a complete observability system that spans the front end, gateway, server, container, and cloud components.

• Header pass-through format: the Jaeger format is used throughout. The key is uber-trace-id and the value is {trace-id}:{span-id}:{parent-span-id}:{flags}.

Front-end access: two low-code access methods are available, CDN (script injection) and NPM, supporting Web/H5, Weex, and various mini program scenarios.
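To make the header format concrete, here is a minimal sketch of how a service might derive the header value for an outgoing call from the one it received. Jaeger's propagation header is named `uber-trace-id`. The helper name `child_uber_trace_id` is hypothetical, and in practice a tracing SDK or agent does this automatically.

```python
import secrets


def child_uber_trace_id(incoming: str) -> str:
    """Build the uber-trace-id value for a call made on behalf of `incoming`.

    Hypothetical helper: keeps the trace ID and flags, creates a new span ID,
    and records the caller's span ID as the parent.
    """
    parts = incoming.strip().split(":")
    if len(parts) != 4:
        raise ValueError("uber-trace-id must have 4 colon-separated fields")
    trace_id, span_id, _parent_span_id, flags = parts

    # A span ID is 64 bits, written as hex; zero is not a valid ID.
    new_span_id = secrets.token_hex(8)
    while int(new_span_id, 16) == 0:
        new_span_id = secrets.token_hex(8)

    return f"{trace_id}:{new_span_id}:{span_id}:{flags}"
```

The trace ID stays constant across every hop, which is what lets the front end, Java applications, and non-Java applications appear in a single call chain.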

Backend access:

For Java applications, the ARMS Agent is the recommended first choice. It is non-invasive, requires no code modification, and supports advanced functions such as edge diagnosis, lossless statistics, and precise sampling. User-defined methods can be instrumented manually through the OpenTelemetry SDK.

Non-Java applications are recommended to access through Jaeger and report data to the ARMS endpoint. ARMS supports link pass-through and display across multilingual applications.

The current full link tracking solution of Alibaba Cloud ARMS is based on the Jaeger protocol. Support for the SkyWalking protocol is under development, to allow lossless migration for users of self-built SkyWalking. The call chain produced by full link tracking across the front end, Java applications, and non-Java applications is shown in the following figure:

1. Frontend Access Practice

ARMS front-end monitoring supports Web/H5, Weex, Alipay and WeChat mini programs, and more. This article uses a Web application that accesses ARMS front-end monitoring through CDN as an example to briefly describe the access process. For detailed access guidelines, refer to the official ARMS front-end monitoring documentation.

1. Log in to the ARMS console, click Access Center in the left navigation bar, and select front-end Web/H5 access.

2. Enter the application name and click Create. Select the options you need in the SDK extension configuration area to quickly generate the BI probe code to insert into the page.

3. Choose asynchronous loading, copy and paste the following code into the first line inside the * * * element in the HTML of the page, and then restart the application.

To connect the front-end and back-end links, the probe code mentioned above must include the following two parameters:

1. enableLinkTrace: true // Enables the front-end link tracking function

2. linkType: 'tracing' // Generates link data in Jaeger protocol format and allows the uber-trace-id header to be passed through

In addition, if the API is not same-origin with the current application, the enableApiCors: true parameter must also be added, and the backend server must support cross-domain requests and custom header values. Refer to the front-end and back-end link association documentation for details. To verify that the front-end and back-end link tracking configuration has taken effect, open the browser console and check whether the Request Headers of the corresponding API request contain uber-trace-id.
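On the backend side, supporting cross-domain requests with a custom header generally means answering the browser's CORS preflight with a response that lists that header. The sketch below shows the response headers involved, using standard CORS header names. The function name, the allowed-origin list, and the method list are assumptions for illustration, and the exact requirements should be taken from the ARMS documentation referenced above.

```python
def preflight_headers(origin: str, allowed_origins: set) -> dict:
    """Return CORS response headers for an OPTIONS preflight request.

    Hypothetical helper: an empty dict means the origin is not allowed.
    """
    if origin not in allowed_origins:
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
        # The custom trace header must be listed here, otherwise the
        # browser refuses to send the cross-origin request that carries it.
        "Access-Control-Allow-Headers": "Content-Type, uber-trace-id",
        # The response varies by Origin, so caches must key on it.
        "Vary": "Origin",
    }
```

If the trace header is missing from `Access-Control-Allow-Headers`, the browser blocks the API call during preflight, so the link breaks before the request ever reaches the backend.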

2. Java Application Access Practice

Java applications are recommended to use the ARMS JavaAgent, a non-invasive probe that works out of the box without modifying business code. For detailed access guidelines, refer to the official ARMS application monitoring documentation.

1. Log in to the ARMS console, click Access Center in the left navigation bar, and select backend Java access.

2. Choose manual installation, script installation, or container service installation as needed.

3. Following the operation guide, download the probe and decompress it to a local directory. After correctly configuring the appName, LicenseKey, and javaagent startup parameters, restart the application.

3. Non Java Application Access Practice

Non Java applications can report data to ARMS access points through open source SDKs (such as Jaeger). For detailed access guidelines, refer to the ARMS application monitoring official website document.

1. Log in to the ARMS console, click Access Center in the left navigation bar, and select a backend access method such as Go, C++, .NET, or Node.js.

2. Replace the access point according to the operation guide, and restart the application after the configuration is completed.
