Developer Content

1. Introduction to Pravega and Pravega connector

The name of the Pravega project comes from Sanskrit, meaning good speed. The project originated in 2016 and was open-sourced on Github based on the Apache V2 protocol. It joined the CNCF family in November 2020 and became a CNCF sandbox project.

The Pravega project is designed for large-scale data flow scenarios, and is a new enterprise-level storage system that makes up for the shortcomings of traditional message queue storage. It maintains unbounded, high-performance reading and writing of streams, and also adds some enterprise-level features: such as elastic scaling and tiered storage, which can help enterprise users reduce the cost of use and maintenance. At the same time, we also have many years of technical accumulation in the storage field, and can provide customers with persistent storage relying on the company's commercial storage products.

The above architecture diagram describes the typical reading and writing scenarios of Pravega, and introduces the terminology of Pravega to help you further understand the system architecture.

The middle part is a Pravega cluster, which is an abstract system of stream as a whole. Stream can be considered as a topic analogous to Kafka. Similarly, Pravega's Segment can be compared to Kafka's Partition as a concept of data partitioning while providing dynamic scaling.

Segment stores the binary data flow, and according to the size of the data flow, merge or split operations occur to release or concentrate resources. At this time, the Segment will perform a seal operation to prevent new data from being written, and then the newly created Segment will receive new data.

The left side of the picture is the scene of data writing, which supports append only writing. Users can specify a Routing key for each event to determine the ownership of the Segment. This can be compared to Kafka Partitioner. The data on a single routing key has order preservation, ensuring that the order of reading is the same as that of writing.

The right side of the picture is the scene of data reading, and multiple readers will be controlled by a Reader Group. Reader Group controls the load balancing between readers to ensure that all segments can be evenly distributed among readers. At the same time, it also provides a Checkpoint mechanism to form consistent stream segmentation to ensure data failure recovery. For "read", we support both batch and streaming semantics. For streaming scenarios, we support tail reading; for batch scenarios, we will consider high concurrency to achieve high throughput.

2. The past of Pravega Flink connector

The Pravega Flink connector is the connector originally supported by Pravega. This is also because the design concepts of Pravega and Flink are very consistent. They are both flow-based systems integrating batch and flow, which can form a complete solution for storage and computing.

1. The development history of Pravega

connector has been an independent Github project since 2017. In 2017, we developed based on Flink version 1.3. At that time, Flink PMC members including Stephan Ewen joined us to build the most basic Source / Sink function, which supports the most basic reading and writing, and also includes the integration of Pravega Checkpoint. This will be introduced later.

One of the most important bright spots in 2018 is end-to-end support for exactly-once semantics. At that time, the team had a lot of discussions with the Flink community. Pravega first supported the feature of transactional write client, and the community cooperated on this basis. Based on the Sink function, a set of two-phase commit semantics was used to realize the checkpoint-based distributed transaction function. Later, Flink further abstracted the two-phase commit API, which is the well-known TwoPhaseCommitSinkFunction interface, and it was also adopted by the Kafka connector. The community has blogs dedicated to this interface, and the end-to-end one-time semantics.

In 2019, there are more connectors to complement other APIs, including support for batch reading and Table API.

The main focus in 2020 is the integration of Flink 1.11, with a focus on the new feature integration of FLIP-27 and FLIP-95.

2. Checkpoint integration implementation

Taking Kafka as an example, we can first look at how Kafka integrates Flink Checkpoint.

The above figure shows a typical Kafka "read" architecture. Based on the Flink checkpoint implementation of the Chandy-Lamport algorithm, when the Job master Triggers a Checkpoint, it will send an RPC request to the Task Executor. After receiving it, it will merge the Kafka commit offset in its own state storage back to the Job Manager to form a Checkpoint Metadata.

After careful thinking, you can actually find some small problems:

Scaling and dynamic balance support. When the Partition is adjusted, or for Pravega, how to ensure the consistency of Merge when the Partition is dynamically expanded and reduced.
Another point is that Task needs to maintain an offset information, and the whole design will be coupled with Kafka's internal abstract offset.

Based on these shortcomings, Pravega has its own internally designed Checkpoint mechanism. Let's take a look at how it integrates with Flink's Checkpoint.

Also read Pravega Stream. Starting with Checkpoint, there is a difference here. The Job master no longer sends RPC requests to Task Executor, but instead sends a Checkpoint request to Pravega with the ExternallyInducedSource interface.

At the same time, Pravega will use the StateSynchronizer component internally to synchronize and coordinate all readers, and will send Checkpoint events between all readers. When the Task Executor reads the Checkpoint Event, the entire Pravega will mark the completion of the Checkpoint, and then the returned Pravega Checkpoint will be stored in the Job master state to complete the Checkpoint.

This kind of implementation is actually cleaner for Flink, because it does not couple the implementation details of external systems, and the entire checkpoint work is handed over to Pravega to implement and complete.

3. Review the experience sharing of Flink 1.11 high-level features

Flink 1.11 is an important release in 2020. There are actually many challenges for the connector, mainly focusing on the implementation of two FLIPs: FLIP-27 and FLIP-95. For these two new functions, the team also spent a lot of time to integrate, and encountered some problems and challenges in the process. Let's share with you how we stepped on and filled the pits. This article will take FLIP-95 as an example.

1. FLIP-95 Integration

FLIP-95 is a new Table API. Its motivation is similar to that of FLIP-27. It is also to realize the integrated interface of batch and stream, and at the same time, it can better support the integration of CDC. For the lengthy configuration keys, the corresponding FLIP-122 is also proposed to simplify the setting of configuration keys.

1.1 Pravega's old Table API

From the above figure, we can see a Table API of Pravega before Flink 1.10, and from the DDL of building a table in the figure, we can see:

Use update mode and append to distinguish between batches and streams, and the distinction between batch and stream data is not intuitive.

The configuration file is also very lengthy and complicated, and the read Stream needs to be configured through a very long configuration key such as connector.reader.stream-info.0.

At the code level, there are also many couplings with the DataStream API that are difficult to maintain.

In response to these problems, we have a great motivation to implement such a new set of APIs, so that users can better use the abstraction of tables. The entire framework is shown in the figure. With the help of the entire new framework, all configuration items are defined through the ConfigOption interface and are managed centrally in the PravegaOptions class.

1.2 Pravega's new Table API

The figure below is the implementation of the latest Table API table creation, which is greatly simplified compared with the previous ones. At the same time, there are also many optimizations in functions, such as the configuration of enterprise-level security options, the designation of multiple streams and the initial streamcut.

2. Flink-18641 solution process experience sharing

Next, I would like to share a little experience of Flink 1.11 integration, which is about the process of solving an issue. Flink-18641 is an issue we encountered while integrating version 1.11.0. During the upgrade process, a CheckpointException will be reported in the unit test. Next is our complete debug process.

First, I will go to step-by-step breakpoint debugging by myself. By viewing the error log and analyzing the source code of related Pravega and Flink, I will determine that it is some problem related to Flink CheckpointCoordinator;

Then we also checked some submission records in the community and found that after Flink 1.10, the CheckpointCoordinator thread model changed from the original lock-controlled model to the Mailbox model. This model led to some logic that we originally executed in synchronous serialization, which was wrongly executed in parallel, which led to this error;

After further reading the pull request of this change, I also got in touch with some related Committers by email. Finally confirmed the issue on the dev mailing list and opened this JIRA ticket.

We have also summarized the following precautions for our compatriots in the open source community:

Search the mailing list and JIRA to see if someone else has asked a similar question;
Completely describe the problem, provide detailed version information, error log and reproduction steps;
After getting feedback from community members, further meetings can be discussed to discuss solutions;
English is required in a non-Chinese environment.

In fact, as a developer in China, there are other things like mailing list and JIRA. We also have DingTalk groups and videos to connect with a lot of Committers. In fact, it is more of a process of communication. To do open source is to communicate more with the community, which can promote the common growth of projects.

4. Future Outlook

The bigger work in the future is the integration of Pravega schema registry. Pravega schema registry provides management of metadata of Pravega stream, including data schema and serialization method, and stores them. This feature came with the Pravega 0.8 release, the project's first open-source release. We will implement Pravega's Catalog based on this project in the later version 0.10, making the use of Flink table API easier;

Secondly, we also keep an eye on new trends in the Flink community, and actively integrate new versions and functions of the community. The current plans include FLIP-143 and FLIP-129;

The community is also gradually completing the transformation of the new Test Framework based on docker containers, and we are also paying attention and integrating.

Finally, I hope that small partners in the community can pay more attention to the Pravega project and promote the common development of Pravega connector and Flink.

The past, present and future of the Pravega Flink connector

Related Articles

A detailed explanation of Hadoop core architecture HDFS

What Does IOT Mean

6 Optional Technologies for Data Storage

What Is Blockchain Technology

Explore More Special Offers

Short Message Service(SMS) & Mail Service

Sales Support

Technical Support

Connect & Report Abuse