Flink CDC 2.3 Released with New Db2 Support
1. About Flink CDC
Flink CDC [1] is a change data capture (CDC) technology based on database logs. It is a data integration framework that reads full (snapshot) data and incremental data in a unified way. Combined with Flink's pipeline capabilities and its rich ecosystem of upstream and downstream systems, Flink CDC can integrate massive volumes of data efficiently in real time.
As a new-generation real-time data integration framework, Flink CDC offers unified full and incremental reading, lock-free reading, parallel reading, automatic synchronization of table schema changes, and a distributed architecture. The community also provides comprehensive documentation [2]. In the more than two years since Flink CDC was open sourced, the community has grown rapidly: it currently has 76 contributors, 7 maintainers, and more than 7,800 users in its DingTalk group.
2. Flink CDC 2.3 Overview
With the joint efforts of community users and contributors, Flink CDC 2.3 has been officially released.
In version 2.3, 49 community contributors took part, 126 issues were resolved, 133 PRs were merged, and contributors made more than 170 commits. In terms of code distribution, MySQL CDC, MongoDB CDC, the flink-cdc-base module, and the documentation module all brought users many features and improvements.
With so many improvements and features, the summary below gives a three-minute overview of the major improvements and core features of Flink CDC 2.3.
• A new Db2 CDC connector makes it possible to read from Db2 databases, with unified full and incremental synchronization.
• The MongoDB CDC connector is now connected to the incremental snapshot framework, which provides lock-free reading, parallel reading, and resuming from checkpoints.
• The MySQL CDC connector received substantial performance and stability improvements in version 2.3, making it considerably more robust in production.
• Flink CDC 2.2 is compatible with Flink 1.13 and 1.14. Flink CDC 2.3 adds compatibility with Flink 1.15 and 1.16, so it works with four major Flink versions. This means the CDC SQL connectors can run on different Flink clusters without modification. DataStream users can follow the packaging approach of the SQL connectors to achieve the same cross-version compatibility.
• The OceanBase CDC connector supports mapping all OceanBase data types to Flink SQL; that is, fields of every OceanBase type can be synchronized.
• The MySQL CDC and OceanBase CDC connectors now have Chinese documentation, which better serves Chinese-speaking users.
3. Core features and important improvements
Flink CDC 2.3 brings many important improvements and features. This article selects the four most important for a closer look.
3.1 New Db2 CDC connector
Db2 is a relational database management system developed by IBM [3]. The Db2 CDC connector captures row-level changes to tables in a Db2 database. It builds on the SQL replication capability provided by the ASN Capture/Apply agents, which store the changes of tables that have capture mode enabled in dedicated change tables. The connector first reads the historical data of a table through JDBC and then reads the incremental changes from the change table, achieving unified full and incremental synchronization.
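In Flink SQL, a CDC source such as this one is declared as a table whose WITH clause carries the connector options. The following Python sketch only assembles such a DDL string so that its shape is easy to see. The helper `render_cdc_table`, the table name, the columns, and all connection values are hypothetical, and the option names should be checked against the Db2 CDC connector documentation for the version you deploy.

```python
def render_cdc_table(name, columns, options):
    """Render a Flink SQL CREATE TABLE statement for a CDC source table."""
    column_lines = [f"  {col} {sql_type}" for col, sql_type in columns]
    option_lines = [f"  '{key}' = '{value}'" for key, value in options.items()]
    return (
        f"CREATE TABLE {name} (\n"
        + ",\n".join(column_lines)
        + "\n) WITH (\n"
        + ",\n".join(option_lines)
        + "\n);"
    )


# Hypothetical table and connection values; replace them with your own.
db2_products_ddl = render_cdc_table(
    "products",
    [("ID", "INT NOT NULL"), ("NAME", "STRING"), ("WEIGHT", "DECIMAL(10, 3)")],
    {
        "connector": "db2-cdc",        # selects the Db2 CDC connector
        "hostname": "localhost",
        "port": "50000",
        "username": "db2inst1",
        "password": "secret",
        "database-name": "testdb",
        "schema-name": "myschema",
        "table-name": "products",
    },
)

print(db2_products_ddl)
```

The printed statement can then be submitted through the Flink SQL client, provided the Db2 CDC connector JAR is on the cluster's classpath and capture mode is enabled for the table.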
3.2 MongoDB CDC connector supports the incremental snapshot algorithm
In Flink CDC 2.3, the MongoDB CDC connector is connected to the Flink CDC incremental snapshot framework and implements the incremental snapshot algorithm, which provides lock-free reading, parallel reading, and resuming from checkpoints.
The set of Flink CDC data sources that support the incremental snapshot algorithm keeps growing. In the next version, the community plans to connect more connectors to the incremental snapshot framework.
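The idea behind resuming an interrupted snapshot can be illustrated with a toy model: the data is divided into chunks, each finished chunk is recorded in state that survives a restart, and a restarted job reads only the chunks that are still pending. The Python sketch below is a simplified illustration under these assumptions, not the connector's implementation. The function names, the in-memory "table", and the `max_chunks` interruption switch are all hypothetical.

```python
def read_pending_chunks(chunks, finished, read_chunk, max_chunks=None):
    """Read chunks that are not yet in `finished`, in order.

    `finished` stands in for checkpointed state that survives a restart.
    `max_chunks` stops the run early to imitate a job being interrupted.
    """
    rows = []
    done_now = 0
    for chunk in chunks:
        if chunk in finished:
            continue  # already read before the interruption
        if max_chunks is not None and done_now >= max_chunks:
            break
        rows.extend(read_chunk(chunk))
        finished.add(chunk)  # record progress once the chunk is complete
        done_now += 1
    return rows


table = list(range(1, 11))           # toy "table": primary keys 1..10
chunks = [(1, 4), (4, 7), (7, 11)]   # half-open key ranges [start, end)


def read_chunk(chunk):
    start, end = chunk
    return [key for key in table if start <= key < end]


finished = set()
first_run = read_pending_chunks(chunks, finished, read_chunk, max_chunks=2)
second_run = read_pending_chunks(chunks, finished, read_chunk)

print(first_run)   # keys read before the simulated interruption
print(second_run)  # keys read after the restart
```

Because the chunks are independent key ranges, the same bookkeeping also allows several readers to process different chunks in parallel.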
3.3 MySQL CDC connector optimization
The MySQL CDC connector attracts the most attention from community users. In version 2.3 the community introduced many advanced features that greatly improve its performance and stability, including:
3.3.1 Starting from a specified position
The MySQL CDC connector supports starting a job from a specified position. You can specify the binlog position at which the job starts by timestamp, binlog offset, or binlog GTID. You can also choose the earliest offset to start the job from the earliest available binlog position.
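These startup choices are expressed as connector options in the table's WITH clause. The Python sketch below maps each choice to a set of options and rejects incomplete combinations. The helper `startup_options` is hypothetical, and the option names and mode values follow the MySQL CDC connector documentation as understood here, so confirm them for your connector version before relying on them.

```python
def startup_options(mode, *, binlog_file=None, binlog_pos=None,
                    gtid_set=None, timestamp_millis=None):
    """Build the startup-related options for a MySQL CDC source table."""
    options = {"scan.startup.mode": mode}
    if mode == "specific-offset":
        if gtid_set is not None:
            # Start from a GTID set.
            options["scan.startup.specific-offset.gtid-set"] = gtid_set
        elif binlog_file is not None and binlog_pos is not None:
            # Start from a binlog file name plus a position inside it.
            options["scan.startup.specific-offset.file"] = binlog_file
            options["scan.startup.specific-offset.pos"] = str(binlog_pos)
        else:
            raise ValueError("specific-offset needs a GTID set or file and pos")
    elif mode == "timestamp":
        if timestamp_millis is None:
            raise ValueError("timestamp mode needs timestamp_millis")
        options["scan.startup.timestamp-millis"] = str(timestamp_millis)
    elif mode not in ("initial", "earliest-offset", "latest-offset"):
        raise ValueError(f"unknown startup mode: {mode}")
    return options


# Hypothetical values for illustration only.
print(startup_options("earliest-offset"))
print(startup_options("timestamp", timestamp_millis=1667232000000))
print(startup_options("specific-offset",
                      binlog_file="mysql-bin.000003", binlog_pos=4))
```

Each returned dictionary can be merged into the other connection options of the source table.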
3.3.2 Chunk splitting algorithm optimization
Version 2.3 optimizes the algorithm that splits a table into chunks during the full (snapshot) phase. Chunk splitting used to be synchronous and is now asynchronous. Users can specify a column of the primary key as the chunk key column, and the splitting process supports checkpoints. Together, these changes address the performance problems caused by synchronous splitting blocking the full read phase.
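To make the idea concrete, the following Python sketch splits the value range of one numeric key column into fixed-size key ranges and produces them lazily, so a reader could start on the first range before the remaining ones have been computed. It is a simplified illustration that assumes evenly distributed integer keys; it is not the connector's actual algorithm, and the function names and the column name `id` are hypothetical.

```python
def split_chunks(min_key, max_key, chunk_size):
    """Lazily yield half-open key ranges [start, end) covering all keys.

    A bound of None means "unbounded", so keys outside the sampled
    min_key..max_key range still fall into the first or last chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = None
    end = min_key + chunk_size
    while end <= max_key:
        yield (start, end)
        start = end
        end += chunk_size
    yield (start, None)


def chunk_predicate(column, chunk):
    """Render the WHERE condition that selects the rows of one chunk."""
    start, end = chunk
    parts = []
    if start is not None:
        parts.append(f"{column} >= {start}")
    if end is not None:
        parts.append(f"{column} < {end}")
    return " AND ".join(parts) or "TRUE"


# Keys 1..10 on a hypothetical column "id", four keys per chunk.
chunk_list = list(split_chunks(min_key=1, max_key=10, chunk_size=4))
for chunk in chunk_list:
    print(chunk_predicate("id", chunk))
```

Using a generator here only imitates the asynchronous behavior described above; in a real job the splitting would run separately from the readers that consume the chunks.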
3.3.3 Stability improvements
The MySQL CDC connector now supports mapping all character sets to Flink SQL, which enables more user scenarios. It tolerates default values, improving a job's tolerance of non-standard DDL, and it automatically obtains the database's time zone to resolve time zone problems.
3.3.4 Performance improvements
MySQL CDC 2.3 focuses on optimizing memory usage and read performance. It reduces the memory usage of the JobManager (JM) and TaskManager (TM) through metadata reuse in the JM and streaming reads in the TM. It also improves binlog read performance by optimizing the binlog parsing logic.
3.4 Other improvements
• Flink CDC 2.3 is compatible with four Flink versions (1.13, 1.14, 1.15, and 1.16), which greatly reduces users' connector upgrade and operations costs.
• OceanBase CDC fixes a time zone problem, supports mapping all data types to Flink SQL, and provides more configuration options for more flexible setups. For example, the newly added table-list option supports reading from multiple OceanBase tables.
• MongoDB CDC supports more data types and optimizes how captured tables are filtered.
• TiDB CDC fixes data loss when switching from the full phase to the incremental phase and supports region changes during reads.
• Postgres CDC supports the geometry type, exposes more configuration options, and supports configuring the changelog mode to filter the data it emits.
• SqlServer CDC supports more versions and has improved documentation [4].
• The MySQL CDC and OceanBase CDC connectors provide Chinese documentation [5] [6], and video tutorials are available for the OceanBase CDC connector [7].
4. Future planning
The growth of the Flink CDC open source community owes much to the selfless work of its contributors and the open source advocacy of its maintainers, as well as to the active feedback and promotion from the many Flink CDC users. The Flink CDC community will continue to invest in building the open source community. The community is currently planning version 2.4 [8], and all users and contributors are welcome to take part and give feedback. For version 2.4, the community plans to focus on the following four areas:
• More complete data source coverage
Support more data sources, connect more CDC connectors to the incremental snapshot framework, and bring lock-free reading, parallel reading, resuming from checkpoints, and other features to them.
• Improved observability
Provide rate limiting to reduce the query pressure on the database during the full phase, and provide richer monitoring metrics, including task progress metrics, for monitoring task status.
• Performance improvement
Support Batch mode for synchronizing data in the full phase to improve full-phase performance, and automatically release idle reader resources after the full read phase ends.
• Improved usability
Improve connector usability, for example by simplifying the out-of-the-box configuration parameters and providing DataStream API example programs.