This topic describes the data synchronization process of an advanced application.
Use reindexing to synchronize full data
The preceding figure shows that three steps are involved from the construction to the use of a new application version after reindexing is triggered:
1. OpenSearch pulls full data from a data source to an offline application. To reduce the load on the data source, data is pulled at a speed that does not exceed 20 MB/s.
2. After all the data in the data source is synchronized to the offline application, incremental data starts to be synchronized. In this step, the following incremental data is synchronized:
The incremental data that is pushed by using API operations. In addition to the new data or data updates in the data source, a portion of data is synchronized to OpenSearch by using the traffic operations of OpenSearch.
The incremental data that is generated when you build the new application version. This portion of data is synchronized to OpenSearch from the data source or by using API operations.
3. Data is processed in the offline application. For example, wide tables are created or data is labeled during processing. Then, the processed data, which now contains metadata, is synchronized to the engine at a speed that does not exceed 20 MB/s to protect the engine. Note that the actual data volume may be three to four times that in the data source because the processed data contains metadata.
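The limits in the steps above can be turned into a rough lower bound on transfer time. The following is a minimal sketch, not an official sizing tool: the function name `min_reindex_seconds` and the `expansion` parameter are hypothetical, and the calculation ignores processing time and incremental catch-up. Because both limits are upper bounds on speed, real transfers can take longer.

```python
# Rough lower bound on reindexing transfer time, based on the limits in the text.
PULL_LIMIT_MB_S = 20.0    # cap on pulling full data from the data source
ENGINE_LIMIT_MB_S = 20.0  # cap on writing processed data to the engine


def min_reindex_seconds(source_mb: float, expansion: float) -> dict:
    """Return the minimum seconds spent in the pull step and the engine-write step.

    source_mb -- size of the raw data in the data source, in MB
    expansion -- assumed growth factor after metadata is added (the text says 3 to 4)
    """
    if source_mb < 0 or expansion <= 0:
        raise ValueError("source_mb must be >= 0 and expansion must be > 0")
    pull_s = source_mb / PULL_LIMIT_MB_S
    write_s = source_mb * expansion / ENGINE_LIMIT_MB_S
    return {"pull_s": pull_s, "write_s": write_s}


if __name__ == "__main__":
    # Example: 100 GB (102,400 MB) of source data with 3x and 4x expansion.
    for factor in (3.0, 4.0):
        print(factor, min_reindex_seconds(102_400, factor))
```

For 100 GB of source data, the pull step needs at least 5,120 seconds, and the engine-write step needs at least 15,360 to 20,480 seconds, depending on whether the data grows three or four times.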
Speed of synchronizing data from the data source during reindexing (Unit: MB/s): Raw data of the data source is synchronized.
Speed of synchronizing the incremental data during reindexing (Unit: MB/s)
Speed of synchronizing data from the offline application to the engine during reindexing (Unit: MB/s): Metadata is added.
Synchronize the real-time incremental data
The preceding figure shows that the incremental data consists of two portions: the data updates in the data source and the data pushed by using API operations. The incremental data is synchronized to OpenSearch in three steps:
1. The data updates in the data source are synchronized, or the data is pushed by using API operations, to an offline application in OpenSearch. You can subscribe to the binary logs of the data source in Data Transmission Service (DTS) to synchronize the data updates. The total number of transactions per second (TPS) in the primary and secondary tables cannot exceed 1,500.
2. When the incremental data is synchronized to the offline application, it is applied to an existing wide table. An update in a secondary table triggers N updates in the primary table. If the updates triggered in the primary table reach or exceed 1,500 TPS, the speed of updating secondary tables is limited to reduce the data synchronization latency in the primary table. For more information, see Data synchronization latency caused by multi-table joins.
3. The offline application writes the data that contains metadata to the engine. After the metadata is added, the data amount may be three or four times that in the data source. To protect the engine, the speed of writing data is limited to 10 MB/s.
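The arithmetic behind the limits in the steps above can be sketched as follows. This is an illustrative calculation only, not OpenSearch's internal throttling logic: the function names and the `fan_out` parameter (the N primary-table updates triggered by one secondary-table update) are hypothetical, and the sketch assumes that N is constant.

```python
# Illustrative arithmetic for the real-time incremental limits described in the text.
PRIMARY_TRIGGER_LIMIT_TPS = 1500.0  # triggered primary-table updates at which limiting starts
ENGINE_WRITE_LIMIT_MB_S = 10.0      # cap on writing data from the offline application to the engine


def triggered_primary_tps(secondary_tps: float, fan_out: float) -> float:
    """Primary-table updates per second caused by secondary-table updates."""
    return secondary_tps * fan_out


def secondary_is_limited(secondary_tps: float, fan_out: float) -> bool:
    """True if the triggered primary-table updates reach or exceed the limit."""
    return triggered_primary_tps(secondary_tps, fan_out) >= PRIMARY_TRIGGER_LIMIT_TPS


def secondary_tps_threshold(fan_out: float) -> float:
    """Secondary-table TPS at which limiting begins for a given fan-out."""
    if fan_out <= 0:
        raise ValueError("fan_out must be > 0")
    return PRIMARY_TRIGGER_LIMIT_TPS / fan_out


def source_equivalent_mb_s(expansion: float) -> float:
    """Source-side throughput that fits the engine write cap after metadata is added."""
    if expansion <= 0:
        raise ValueError("expansion must be > 0")
    return ENGINE_WRITE_LIMIT_MB_S / expansion


if __name__ == "__main__":
    print(secondary_is_limited(50, 30))   # 50 * 30 = 1,500 triggered updates per second
    print(secondary_tps_threshold(30))
    print(source_equivalent_mb_s(4.0))
```

With a fan-out of 30, the secondary tables reach the limit at 50 TPS. With a fourfold expansion, the 10 MB/s write limit corresponds to about 2.5 MB/s of source data.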
Total number of TPS in the primary and secondary tables when data is synchronized from the data source to the offline application. Unit: TPS. In this case, no trigger relationship is configured between the primary and secondary tables.
Speed of writing the real-time incremental data from the offline application to the engine. Unit: MB/s. Metadata is added.
Updates in the primary table that are triggered by the updates in secondary tables. Unit: TPS.