Connect a PolarDB for MySQL cluster to OpenSearch as a data source to automatically sync data into your search index. OpenSearch pulls full data on demand or on a schedule, and keeps the index up to date with incremental changes via binary logging.
Prerequisites
Before you begin, make sure you have:
A PolarDB for MySQL cluster running version 5.6, 5.7, or 8.0
The cluster in the same region as your OpenSearch application
The cluster created under the same Alibaba Cloud account you use to log in to the OpenSearch console (OpenSearch cannot access clusters owned by other accounts)
Binary logging enabled on the cluster (disabled by default — see Enable binary logging below)
Enable binary logging
Set the following parameters in your PolarDB for MySQL cluster before adding it as a data source:
| Parameter | Required value | Default | Notes |
|---|---|---|---|
loose_polar_log_bin | ON_WITH_GTID | OFF | Enables binary logging with GTID mode |
binlog_row_image | FULL | FULL | Already set correctly by default; verify it has not been changed |
To change these parameters, go to the Parameter Configuration page in the PolarDB console.
If binary logging is not enabled, reindexing may stall. If that happens, create a test table in the source database and write or update one or two rows per minute to generate continuous binary logs until reindexing completes.
Set up the database account
The database account used to connect the data source needs permission to read all tables in the source database via Data Transmission Service (DTS). Verify that:
The account can run
SHOW CREATE TABLE *.*successfully.The account has
SELECTpermissions on all tables you want to sync.
Do not change the database account or its permissions after the data source is configured. Account changes break real-time synchronization and may prevent new application versions from being created. If you must change the account credentials, remove the data source and re-add it with the new credentials.
Add OpenSearch IP addresses to the cluster whitelist
OpenSearch accesses your PolarDB for MySQL cluster from a fixed set of IP addresses. Add the CIDR blocks for your region to the cluster's IP address whitelist.
| Region | CIDR blocks |
|---|---|
| China (Hangzhou) | 100.104.190.128/26, 100.104.241.128/26 |
| China (Beijing) | 100.104.16.192/26, 100.104.179.0/26 |
| China (Shanghai) | 100.104.37.0/26, 100.104.46.0/26 |
| China (Shenzhen) | 100.104.87.192/26, 1100.104.132.192/26 |
| China (Qingdao) | 100.104.240.128/26, 1100.104.111.128/26 |
| China (Zhangjiakou) | 100.104.155.192/26, 100.104.238.64/26 |
| Germany (Frankfurt) | 100.104.127.0/26, 100.104.35.192/26 |
| US (Silicon Valley) and US (Virginia) | 100.104.193.128/26, 100.104.119.128/26 |
| Singapore | 100.104.58.192/26, 100.104.74.192/26 |
No traffic fee is charged when OpenSearch accesses PolarDB for MySQL data, regardless of whether you use the internal or public endpoint. Switch between endpoints at any time.
Connect a PolarDB for MySQL data source
Configure a PolarDB for MySQL data source when you create a new application. To add a data source to an existing application, use offline changes.
In the Select Data Source panel, select PolarDB and click Connect to Database.
Fill in the connection parameters and click Connect. If the connection fails, OpenSearch displays an error:
Parameter Description Cluster ID The ID of your PolarDB for MySQL cluster. Find it in the PolarDB console. The ID is case-sensitive. Example: pc-uf6c056ny9tiaj1l7Database Name The name of the database to connect. Not case-sensitive. Username The database account for reading table schemas and full data. Case-sensitive. Password The password for the database account. Must not contain the percent sign ( %).Error Resolution The PolarDB for MySQL cluster does not reside in the same region as the OpenSearch application. Verify the cluster ID and confirm both the cluster and application are in the same region. If the region matches but the error persists, submit a ticket. The connection to a PolarDB for MySQL data source failed. Check that the cluster ID, database name, username, and password are all correct. The table does not exist in the current PolarDB for MySQL cluster. Verify the table name and confirm the table exists in the cluster. The configurations of the PolarDB for MySQL cluster are invalid. Go to the parameter configuration page for this cluster in the PolarDB console and correct the settings. Select the table to sync and click OK.
Table names are case-sensitive.
Use an asterisk (
*) as a wildcard to match multiple tables with the same name prefix. For example,table_*matchestable_a,table_b, and similar tables. All matched tables must share the same schema.
Configure field mappings. OpenSearch automatically loads source fields from the table. Map them to destination fields in your application. For details on transforming field values during sync, see Data processing plug-ins.
Specify filter conditions to control which rows are synced, then click Submit to save the application schema. For filter syntax details, see Configure conditions to filter fields in ApsaraDB RDS and PolarDB data sources.
Filter conditions are not supported for standard applications.
How data sync works
Understanding how OpenSearch processes each row helps you predict index behavior and avoid data inconsistencies.
Full data sync
OpenSearch pulls all rows from the primary database. For each row:
If the row does not exist in the index, OpenSearch creates a new document.
If the row already exists in the index, OpenSearch replaces the existing document.
Rows that do not meet filter conditions are excluded from the index. Rows whose primary key values match existing documents in the destination table are also excluded.
After full data sync completes, OpenSearch switches to incremental sync using binary logs to apply subsequent changes in near real time.
Incremental sync
OpenSearch captures row-level changes via binary logs. If no incremental data is generated at the source for 15 days or more, data synchronization may fail. Recover by triggering a manual offline change. For details, see Perform offline changes.
Supported features
Scheduled or on-demand full data sync: Pull all data in the source database or table manually, or on an automatic schedule.
Multi-table row merging: Merge rows from multiple tables across one or more data sources into a single table in your application. Source tables must share the same schema and the same data processing plug-in configuration. Duplicate primary key values can be overwritten.
Field value transformation: Use data processing plug-ins to convert source field values before indexing.
Filter conditions: Apply filter conditions to include only matching rows from full or incremental data.
Wildcard table matching: Use
*to match tables with a common name prefix.Cloned instances: PolarDB for MySQL clusters support cloned instances as a data source.
Read/write splitting: PolarDB for MySQL clusters support read/write splitting.
Limitations
| Limitation | Details |
|---|---|
| Supported versions | PolarDB for MySQL 5.6, 5.7, and 8.0 only |
| Same-account requirement | The PolarDB for MySQL cluster must be created by the same Alibaba Cloud account used to access the OpenSearch console |
| Same-region requirement | The cluster and the OpenSearch application must be in the same region |
| Standard applications | Cannot use OpenSearch SDKs or API operations to push incremental data. Filter conditions are not supported. |
| Unsupported SQL statements | REPLACE INTO, TRUNCATE, and DROP are not supported. Use DELETE to remove data. |
| Password restriction | The password cannot contain the percent sign (%). A password with % causes reindexing to fail with the error "Illegal hex characters in escape (%) pattern". |
| Cross-schema field concatenation | Cannot concatenate field values from source tables that use different schemas |
| Parameter recommendations | Set both loose_max_statement_time and connect_timeout to 0 before reindexing or performing offline changes. Adjust the values based on your workload after full data sync completes. |
Usage notes
Data types
Fields of type
DATETIMEorTIMESTAMPin the source table are converted to milliseconds. Set the corresponding destination field type toTIMESTAMPin your application.When filtering rows by a
DATEorDATETIMEfield, use the full format:createtime>'2018-03-01 00:00:00'. A short format such as2018-3-1 00:00:00'returns an error.
Multiple data sources
Configure multiple data sources for a single application table, provided the source tables share the same schema and data processing plug-in configuration.
If an ApsaraDB RDS for MySQL or PolarDB data source is attached to a PolarDB-X instance, specify the name of the database shard actually used on the PolarDB-X instance. After database sharding, each logical database on PolarDB-X has one shadow database and eight shards. Data is written to shards at random.
Full data import timing
OpenSearch pulls full data from the primary database only. Schedule full data imports and index rebuilds during off-peak hours.
SSL encryption
If SSL encryption is enabled on your PolarDB for MySQL cluster, keep SSL certificates valid. Expired certificates cause connection failures.
Troubleshooting
Reindexing stalls after the data source is configured
Binary logs may not be generating continuously. Create a test table in the source database and write or update one or two rows per minute until reindexing completes. This ensures a steady stream of binary log events.
Reindexing fails due to overdue payment
If an advanced application for which a PolarDB for MySQL data source is configured has overdue payments, pay the outstanding balance, then manually trigger reindexing from the OpenSearch console.
Duplicate primary keys when syncing from multiple tables
When you sync from multiple source tables with the same schema, duplicate primary key values across tables cause documents to overwrite each other. To create unique keys, use the StringCatenateExtractor plug-in to concatenate the original primary key with the table name.
Set the source fields to pk (replace with your actual primary key field name) and $table (a system variable that resolves to the table name), and use a hyphen (-) or another custom character as the separator.
For example, if the table is my_table_0 and the primary key value is 123456, the concatenated key becomes 123456-my_table_0.