This topic describes how to use Logtail to collect MySQL binary logs.

Prerequisites

  • Logtail V0.16.0 or later is installed on a Linux server, or Logtail V1.0.0.8 or later is installed on a Windows server. For more information, see Install Logtail on a Linux server or Install Logtail on a Windows server.
    Important You must make sure that the server on which Logtail is installed can connect to the MySQL database from which you want to collect binary logs.
  • The IP address of the server on which Logtail is installed is added to a whitelist of the MySQL database from which you want to collect binary logs.

    For information about how to configure an IP address whitelist for an ApsaraDB RDS for MySQL instance, see Configure an IP address whitelist for an ApsaraDB RDS for MySQL instance.

Implementation

Logtail acts as a secondary MySQL node to communicate with the primary MySQL node. The following list describes the communication process:
  1. Logtail acts as a secondary MySQL node and sends a dump request to the primary MySQL node.
  2. After the primary MySQL node receives the dump request, the node sends binary logs to Logtail in real time.
  3. Logtail performs operations such as event parsing, event filtering, and data parsing on binary logs. Then, Logtail uploads the parsed data to Log Service.

Features

  • Binary logs can be incrementally collected. This way, you can collect data related to the update operations that are performed on your databases. MySQL databases such as ApsaraDB RDS for MySQL are supported.
  • Data in databases can be filtered by using multiple methods, such as regular expressions.
  • You can specify the positions of binary log files.
  • Checkpoints are used to synchronize data storage status.

Limits

  • Logtail V1.0.31 and later can collect binary logs from MySQL 8.0.
  • The binary logging feature must be enabled for your MySQL database, and the binlog_format parameter must be set to ROW for the database. By default, the feature is enabled for an RDS database.
    # Check whether the binary logging feature is enabled.
    mysql> show variables like "log_bin";
    +---------------+-------+
    | Variable_name | Value |
    +---------------+-------+
    | log_bin       | ON    |
    +---------------+-------+
    1 row in set (0.02 sec)
    # View the format of binary logs.
    mysql> show variables like "binlog_format";
    +---------------+-------+
    | Variable_name | Value |
    +---------------+-------+
    | binlog_format | ROW   |
    +---------------+-------+
    1 row in set (0.03 sec)
  • The ID of the secondary MySQL node whose role Logtail assumes must be unique on the primary MySQL node.
  • Limits on RDS databases:
    • Logtail cannot be installed on a server where an RDS instance resides. You must install Logtail on a server that can communicate with the RDS instance.
    • You cannot collect binary logs from a secondary RDS database. You must configure your primary RDS database to collect binary logs.

Scenarios

The MySQL binary logging feature applies to scenarios in which you need to synchronize large amounts of data and require high performance.
  • Query and analyze the incremental data of databases in real time.
  • Audit the operations that are performed on databases.
  • Use Log Service to query and analyze database updates, visualize query and analysis results, transform data for stream computing, export log data to MaxCompute for offline computing, and export log data to Object Storage Service (OSS) for long-term storage.

Usage notes

We recommend that you increase the resource limits of Logtail to accommodate traffic surges and reduce the risk of data loss. If the limits are exceeded, Logtail may be forced to restart.

You can modify the related parameters in the /usr/local/ilogtail/ilogtail_config.json file. For more information, see Configure the startup parameters of Logtail.

The following example shows how to increase the limit on CPU utilization to two cores and the limit on memory usage to 2,048 MB:
{
    ...
    "cpu_usage_limit":2,
    "mem_usage_limit":2048,

    ...
}
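
The edit above can also be scripted. The following Python sketch applies the same change programmatically; the function name and default values are illustrative, and the file path may differ on your host:

```python
import json

def raise_logtail_limits(path, cpu_cores=2, mem_mb=2048):
    """Raise the Logtail resource limits in an ilogtail_config.json file.

    This is a sketch: the function name and defaults are illustrative,
    not part of Logtail itself. Adjust the path and limits as needed.
    """
    with open(path) as f:
        config = json.load(f)
    # cpu_usage_limit is measured in CPU cores, mem_usage_limit in MB.
    config["cpu_usage_limit"] = cpu_cores
    config["mem_usage_limit"] = mem_mb
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
    return config
```

Restart Logtail after you modify the file so that the new limits take effect.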

Data reliability

We recommend that you enable the global transaction identifier (GTID) feature on your MySQL server and upgrade Logtail to V0.16.15 or later. This prevents data from being repeatedly collected after a primary/secondary switchover is triggered on your database and ensures data reliability.
  • Incomplete data collection: If the network between Logtail and your MySQL server is disconnected for a long period of time, some data may not be collected.

    If the network between Logtail and your primary MySQL node is disconnected, the primary node still generates binary logs and deletes expired binary logs. After the network connection is re-established and your primary MySQL node and Logtail are reconnected, Logtail uses a checkpoint to request binary log data from the primary MySQL node. However, if the network is disconnected for a long period of time, the data that is generated after the checkpoint may be deleted. In this case, the recovery mechanism is triggered. The recovery mechanism identifies the most recent binary log file position from which Logtail resumes collection on the primary MySQL node. The data that is generated between the checkpoint and the most recent binary log file position is not collected. This leads to incomplete data collection.

  • Repeated data collection: If a primary/secondary switchover is triggered when the sequence numbers of binary logs are inconsistent between your primary MySQL node and secondary MySQL node, binary logs may be repeatedly collected.

    If you configure primary/secondary synchronization for MySQL, the primary node automatically synchronizes the binary logs to the secondary node. The secondary node stores the logs to local binary log files. If the sequence numbers are inconsistent between the primary and secondary nodes and a primary/secondary switchover is triggered, logs may be repeatedly collected. This issue occurs because the checkpoint mechanism identifies checkpoints based on the names and size offsets of binary log files.

    For example, a data block is in the checkpoint range from (binlog.100, 4) to (binlog.105, 4) on the primary MySQL node, and in the checkpoint range from (binlog.1000, 4) to (binlog.1005, 4) on the secondary MySQL node. Logtail has obtained the data from the primary node and updated the local checkpoint to (binlog.105, 4). If a primary/secondary switchover is triggered and no error occurs, Logtail continues to collect binary logs from the new primary node based on the local checkpoint (binlog.105, 4). However, the sequence number of the data that is in the checkpoint range from (binlog.1000, 4) to (binlog.1005, 4) on the new primary node is greater than the sequence number of the data that is requested by Logtail. The new primary node returns all data in the range to Logtail. This leads to repeated data collection.
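
The checkpoint comparison described above can be modeled in a few lines of Python. This is a hypothetical sketch of the ordering logic, not Logtail's actual implementation; it shows why a checkpoint recorded against one node's file sequence can sort before another node's range and cause the whole range to be re-sent:

```python
def parse_checkpoint(name, offset):
    """Split a binlog checkpoint into (sequence number, byte offset).

    A checkpoint such as ("binlog.105", 4) is ordered first by the numeric
    suffix of the file name, then by the offset inside the file.
    """
    seq = int(name.rsplit(".", 1)[1])
    return (seq, offset)

def causes_recollection(local_checkpoint, remote_range_start):
    """Return True if the local checkpoint sorts before the start of the
    remote node's range, so the remote node would re-send data that
    Logtail has already collected."""
    return parse_checkpoint(*local_checkpoint) < parse_checkpoint(*remote_range_start)

# In the example from the text, the local checkpoint (binlog.105, 4) sorts
# before the new primary's range start (binlog.1000, 4), so the new primary
# returns the whole range again, which duplicates data.
```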

Create a Logtail configuration

  1. Log on to the Log Service console.
  2. In the Import Data section, select MySQL BinLog - Plug-in.
  3. Select the project and Logstore. Then, click Next.
  4. Create a machine group.
    • If a machine group is available, click Use Existing Machine Groups.
    • If no machine groups are available, perform the following steps to create a machine group. In this example, an Elastic Compute Service (ECS) instance is used.
      1. On the ECS Instances tab, select Manually Select Instances. Then, select the ECS instance that you want to use and click Create.

        For more information, see Install Logtail on ECS instances.

        Important If you want to collect logs from an ECS instance that belongs to a different Alibaba Cloud account, a server in an on-premises data center, or a server of a third-party cloud service provider, you must manually install Logtail. For more information, see Install Logtail on a Linux server or Install Logtail on a Windows server.

        After you manually install Logtail, you must configure a user identifier for the server. For more information, see Configure a user identifier.

      2. After Logtail is installed, click Complete Installation.
      3. In the Create Machine Group step, configure the Name parameter and click Next.

        Log Service allows you to create IP address-based machine groups and custom identifier-based machine groups. For more information, see Create an IP address-based machine group and Create a custom identifier-based machine group.

  5. Select the new machine group from Source Server Groups and move the machine group to Applied Server Groups. Then, click Next.
    Important If you apply a machine group immediately after you create the machine group, the heartbeat status of the machine group may be FAIL. This issue occurs because the machine group is not connected to Log Service. To resolve this issue, you can click Automatic Retry. If the issue persists, see What do I do if no heartbeat connections are detected on Logtail?
  6. Specify a data source and click Next.
    You can specify a data source by using form configuration or editor configuration in JSON. For more information, see Details of the Logtail configuration.
  7. Preview data, configure indexes, and then click Next.
    By default, full-text indexing is enabled for Log Service. You can also configure field indexes based on collected logs in manual mode or automatic mode. To configure field indexes in automatic mode, click Automatic Index Generation. This way, Log Service automatically creates field indexes. For more information, see Create indexes.
    Important If you want to query and analyze logs, you must enable full-text indexing or field indexing. If you enable both full-text indexing and field indexing, the system uses only field indexes.
  8. Click Log Query. You are redirected to the query and analysis page of your Logstore.
    You must wait approximately 1 minute for the indexes to take effect. Then, you can view the collected logs on the Raw Logs tab. For more information, see Query and analyze logs.

Details of the Logtail configuration

You can specify a data source by using form configuration or editor configuration in JSON.

Form configuration

In the Specify Data Source step, configure the following parameters.

  • Config Name: The name of the Logtail configuration.
  • Database Host: The address of the host on which the database resides.
  • Database Port: The port number of the database.
  • Database Username: The username of the account that is used to log on to the database.
    Make sure that the account has the read permissions on the database and the MySQL REPLICATION permission. Example:
    CREATE USER canal IDENTIFIED BY 'canal';
    GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
    -- GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%';
    FLUSH PRIVILEGES;
  • Database Password: The password of the account that is used to log on to the database.
    If you have high requirements for data security, we recommend that you set the username and password to xxx. After your configurations are synchronized to the Logtail server, find the two parameters in the /usr/local/ilogtail/user_log_config.json file and change the values based on your business requirements. For more information, see Modify the configurations on the server where Logtail is installed.
    Important If you modify this parameter in the Log Service console, the parameter setting in the Logtail configuration on the Logtail server is overwritten after the modification is synchronized to the server.
  • ServerID: The ID of the secondary MySQL node whose role Logtail assumes.
    Important The value of the ServerID parameter must be unique for your database. Otherwise, data fails to be collected.
  • Included Tables: The names of tables from which data is collected. When you specify an exact name, you must include the name of the database to which a table belongs and the name of the table. Example: test_db.test_table. You can also specify a regular expression for the parameter.
    • If you want to implement exact match, add ^ to the start of the regular expression and append $ to the end. Example: ^test_db\\.test_table$.
    • If you want to collect data from all tables, specify .*\\..*.
    • If a table does not meet the conditions that are specified by the Included Tables parameter, the data in the table is not collected.
  • Ignored Tables: The names of tables from which data is not collected. When you specify an exact name, you must include the name of the database to which a table belongs and the name of the table. Example: test_db.test_table. You can also specify a regular expression for the parameter.
    • If you want to implement exact match, add ^ to the start of the regular expression and append $ to the end. Example: ^test_db\\.test_table$.
    • If a table meets one of the conditions that are specified by the Ignored Tables parameter, the data in the table is not collected.
  • Binary Log File Name of First Collection: The name of the binary log file from which data is collected for the first time. If you do not configure this parameter, Logtail starts to collect data from the current time.
    If you want Logtail to collect data from a specified position, set this parameter to the name of the binary log file from which you want Logtail to collect data and set the Binary Log File Offset of First Collection parameter to the offset in the file. Example:
    # Set the Binary Log File Name of First Collection parameter to mysql-bin.000063 and the Binary Log File Offset of First Collection parameter to 0.
    mysql> show binary logs;
    +------------------+-----------+
    | Log_name         | File_size |
    +------------------+-----------+
    | mysql-bin.000063 |       241 |
    | mysql-bin.000064 |       241 |
    | mysql-bin.000065 |       241 |
    | mysql-bin.000066 |     10778 |
    +------------------+-----------+
    4 rows in set (0.02 sec)
    Note If you specify a value for this parameter, a large volume of traffic is generated the first time data is collected from the file.
  • Binary Log File Offset of First Collection: The offset of the binary log file from which data is collected for the first time.
  • Add GTID: Specifies whether to add GTIDs to the data that is uploaded to Log Service. For more information, see GTIDs.
  • Collect INSERT Events: Specifies whether to collect the data on INSERT events.
  • Collect UPDATE Events: Specifies whether to collect the data on UPDATE events.
  • Collect DELETE Events: Specifies whether to collect the data on DELETE events.
  • Collect DDL Events: Specifies whether to collect the data on data definition language (DDL) events.
    Note If you select Collect DDL Events, the values of the Included Tables and Ignored Tables parameters that are used for data filtering do not take effect.
  • Encoding Method: The encoding format of the data.
  • Convert Text to String: Specifies whether to convert the data of the text type to the string type.
  • Compress Events into JSON: Specifies whether to pack event data in the JSON format. If you select Compress Events into JSON, Logtail packs event data into the data and old_data fields in the JSON format. The old_data field is available only for ROW_UPDATE events.
    For example, a table contains three columns named c1, c2, and c3. If you clear Compress Events into JSON, the data on ROW_INSERT events contains the c1, c2, and c3 fields. If you select Compress Events into JSON, Logtail packs all data in the c1, c2, and c3 columns into the data field whose values are in the {"c1":"...", "c2": "...", "c3": "..."} format.
    Important This parameter is available only for Logtail V0.16.19 and later.
  • Collect Event Metadata: Specifies whether to collect the metadata of events. The metadata of binary log events includes event_time, event_log_position, event_size, and event_server_id.
    Important This parameter is available only for Logtail V0.16.21 and later.
  • Data Transformation: The configuration that is used to parse data. You can extract fields, extract log time, mask data, and filter logs. This parameter is optional. You can specify one or more data processing methods. For more information, see Use Logtail plug-ins to process data.
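
The include and ignore matching rules described above can be illustrated with a short Python sketch. The helper function is hypothetical (Logtail's internal matching is not shown here), and the patterns are the examples from this section. Note that the doubled backslashes in the console (for example, ^test_db\\.test_table$) are JSON escaping; the underlying regular expression uses a single backslash:

```python
import re

def table_collected(full_name, include_patterns, exclude_patterns):
    """Return True if data from the table (in db.table form) is collected.

    A table must match at least one include pattern and must not match
    any exclude pattern. This mirrors the rules described in the text;
    the function itself is illustrative, not Logtail code.
    """
    if not any(re.search(p, full_name) for p in include_patterns):
        return False
    return not any(re.search(p, full_name) for p in exclude_patterns)
```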

Editor configuration in JSON

In the Plug-in Configuration field, specify information about your Logtail configuration.
  • inputs is required and is used to configure the data source settings for the Logtail configuration.
    Important You can specify only one type of data source in inputs.
  • processors is optional and is used to configure the data processing settings for the Logtail configuration to parse data. For example, you can extract fields, extract log time, mask data, and filter logs. You can specify one or more processing methods. For more information, see Overview.
{
 "inputs": [
     {
         "type": "service_canal",
         "detail": {
             "Host": "************.mysql.rds.aliyuncs.com",
             "Port": 3306,
             "User" : "user1",
             "ServerID" : 56321,
             "Password": "*******",
             "IncludeTables": [
                 "user_info\\..*"
             ],
             "ExcludeTables": [
                 ".*\\.\\S+_inner"
             ],
             "TextToString" : true,
             "EnableDDL" : true
         }
     }
 ]
}
  • type (string, required): The type of the data source. Set the value to service_canal.
  • Host (string, optional): The address of the host on which the database resides. Default value: 127.0.0.1.
  • Port (int, optional): The port number of the database. Default value: 3306.
  • User (string, optional): The username of the account that is used to log on to the database. Default value: root.
    Make sure that the account has the read permissions on the database and the MySQL REPLICATION permission. Example:
    CREATE USER canal IDENTIFIED BY 'canal';
    GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
    -- GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%';
    FLUSH PRIVILEGES;
  • Password (string, optional): The password of the account that is used to log on to the database. This parameter is empty by default.
    If you have high requirements for data security, we recommend that you set the username and password to xxx. After your configurations are synchronized to the Logtail server, find the two parameters in the /usr/local/ilogtail/user_log_config.json file and change the values based on your business requirements. For more information, see Modify the configurations on the server where Logtail is installed.
    Important If you modify this parameter in the Log Service console, the parameter setting in the Logtail configuration on the Logtail server is overwritten after the modification is synchronized to the server.
  • ServerID (int, optional): The ID of the secondary MySQL node whose role Logtail assumes. Default value: 125.
    Important The value of the ServerID parameter must be unique for your database. Otherwise, data fails to be collected.
  • IncludeTables (string array, required): The names of tables from which data is collected. When you specify an exact name, you must include the name of the database to which a table belongs and the name of the table. Example: test_db.test_table. You can also specify a regular expression for the parameter.
    • If you want to implement exact match, add ^ to the start of the regular expression and append $ to the end. Example: ^test_db\\.test_table$.
    • If you want to collect data from all tables, specify .*\\..*.
    • If a table does not meet the conditions that are specified by the IncludeTables parameter, the data in the table is not collected.
  • ExcludeTables (string array, optional): The names of tables from which data is not collected. When you specify an exact name, you must include the name of the database to which a table belongs and the name of the table. Example: test_db.test_table. You can also specify a regular expression for the parameter.
    • If you want to implement exact match, add ^ to the start of the regular expression and append $ to the end. Example: ^test_db\\.test_table$.
    • If a table meets one of the conditions that are specified by the ExcludeTables parameter, the data in the table is not collected.
  • StartBinName (string, optional): The name of the binary log file from which data is collected for the first time. By default, Logtail starts to collect data from the current time.
    If you want Logtail to collect data from a specified position, set the StartBinName parameter to the name of the binary log file from which you want Logtail to collect data and set the StartBinlogPos parameter to the offset of the file. Example:
    # Set the StartBinName parameter to mysql-bin.000063 and the StartBinlogPos parameter to 0.
    mysql> show binary logs;
    +------------------+-----------+
    | Log_name         | File_size |
    +------------------+-----------+
    | mysql-bin.000063 |       241 |
    | mysql-bin.000064 |       241 |
    | mysql-bin.000065 |       241 |
    | mysql-bin.000066 |     10778 |
    +------------------+-----------+
    4 rows in set (0.02 sec)
    Note If you specify a value for the StartBinName parameter, a large volume of traffic is generated the first time data is collected from the file.
  • StartBinlogPos (int, optional): The offset of the binary log file from which data is collected for the first time. Default value: 0.
  • EnableGTID (bool, optional): Specifies whether to add GTIDs to the data that is uploaded to Log Service. For more information, see GTIDs. Valid values: true (default) and false.
  • EnableInsert (bool, optional): Specifies whether to collect the data on INSERT events. Valid values: true (default) and false.
  • EnableUpdate (bool, optional): Specifies whether to collect the data on UPDATE events. Valid values: true (default) and false.
  • EnableDelete (bool, optional): Specifies whether to collect the data on DELETE events. Valid values: true (default) and false.
  • EnableDDL (bool, optional): Specifies whether to collect the data on DDL events. Valid values: true and false (default).
    Note If you set the EnableDDL parameter to true, the values of the IncludeTables and ExcludeTables parameters that are used for data filtering do not take effect.
  • Charset (string, optional): The encoding format of the data. Default value: utf8.
  • TextToString (bool, optional): Specifies whether to convert the data of the text type to the string type. Valid values: true and false (default).
  • PackValues (bool, optional): Specifies whether to pack event data in the JSON format. If you set the PackValues parameter to true, Logtail packs event data into the data and old_data fields in the JSON format. The old_data field is available only for ROW_UPDATE events. Valid values: true and false (default).
    For example, a table contains three columns named c1, c2, and c3. If you set the PackValues parameter to false, the data on ROW_INSERT events contains the c1, c2, and c3 fields. If you set the PackValues parameter to true, Logtail packs all data in the c1, c2, and c3 columns into the data field whose values are in the {"c1":"...", "c2": "...", "c3": "..."} format.
    Important This parameter is available only for Logtail V0.16.19 and later.
  • EnableEventMeta (bool, optional): Specifies whether to collect the metadata of events. The metadata of binary log events includes event_time, event_log_position, event_size, and event_server_id. Valid values: true and false (default).
    Important This parameter is available only for Logtail V0.16.21 and later.
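
The packing behavior described for PackValues can be sketched in Python. This is a hypothetical model of the field layout based on the description above, not Logtail's implementation; the function name and event-type strings are illustrative:

```python
import json

def pack_row_event(event_type, row, old_row=None):
    """Model how PackValues folds column values into JSON fields.

    With packing enabled, all column values are serialized into a single
    "data" field; ROW_UPDATE events also carry the prior values in an
    "old_data" field, as described in the text. Illustrative only.
    """
    packed = {"data": json.dumps(row)}
    if event_type == "row_update" and old_row is not None:
        packed["old_data"] = json.dumps(old_row)
    return packed
```

Without packing, an insert into columns c1, c2, and c3 would produce three separate log fields; with packing, the same values arrive as one JSON string in the data field.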

Modify the configurations on the server where Logtail is installed

If you did not enter real information for parameters such as Host, User, and Password in the Plug-in Configuration field when you created the Logtail configuration, you can modify the parameters after the Logtail configuration is delivered to the Logtail server.

  1. Log on to the server where Logtail is installed.
  2. Find the service_canal keyword in the /usr/local/ilogtail/user_log_config.json file and modify parameters such as Host, User, and Password.
  3. Run the following command to restart Logtail:
    sudo /etc/init.d/ilogtaild stop; sudo /etc/init.d/ilogtaild start

Troubleshooting

If no data is displayed on the preview page or query page after logs are collected by using Logtail, you can troubleshoot the errors based on the instructions that are provided in What do I do if errors occur when I use Logtail to collect logs?

Sample database table and sample logs

For example, after you perform the INSERT, UPDATE, and DELETE operations on the specialalarm table of the user_info database, Logtail collects and sends binary logs to Log Service. The following section describes the schema of the table, the operations that are performed on the table, and the collected logs:

  • Sample table schema
    CREATE TABLE `specialalarm` (
    `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
    `time` datetime NOT NULL,
    `alarmtype` varchar(64) NOT NULL,
    `ip` varchar(16) NOT NULL,
    `count` int(11) unsigned NOT NULL,
    PRIMARY KEY (`id`),
    KEY `time` (`time`) USING BTREE,
    KEY `alarmtype` (`alarmtype`) USING BTREE
    ) ENGINE=MyISAM AUTO_INCREMENT=1;
  • Database-related operations
    Perform the INSERT, DELETE, and UPDATE operations.
    insert into specialalarm (`time`, `alarmType`, `ip`, `count`) values(now(), "NO_ALARM", "10.10.**.***", 55);
    delete from specialalarm where id = 4829235;
    update specialalarm set ip = "10.11.***.**" where id = "4829234";
    Create an index for zc.specialalarm.
    ALTER TABLE `zc`.`specialalarm` 
    ADD INDEX `time_index` (`time` ASC);
  • Sample logs
    You can view the logs that are collected for each operation on the Search & Analysis page of the Logstore that is specified in the Logtail configuration. Examples:
    • INSERT statement
      __source__:  10.30.**.**  
      __tag__:__hostname__:  iZbp145dd9fccu*****  
      __topic__:    
      _db_:  zc  
      _event_:  row_insert  
      _gtid_:  7d2ea78d-b631-11e7-8afb-00163e0eef52:536  
      _host_:  *********.mysql.rds.aliyuncs.com  
      _id_:  113  
      _table_:  specialalarm  
      alarmtype:  NO_ALARM  
      count:  55  
      id:  4829235  
      ip:  10.10.***.***  
      time:  2017-11-01 12:31:41
    • DELETE statement
      __source__:  10.30.**.**  
      __tag__:__hostname__:  iZbp145dd9fccu****
      __topic__:    
      _db_:  zc  
      _event_:  row_delete  
      _gtid_:  7d2ea78d-b631-11e7-8afb-00163e0eef52:537  
      _host_:  *********.mysql.rds.aliyuncs.com  
      _id_:  114  
      _table_:  specialalarm  
      alarmtype:  NO_ALARM  
      count:  55  
      id:  4829235  
      ip:  10.10.**.***
      time:  2017-11-01 12:31:41
    • UPDATE statement
      __source__:  10.30.**.**  
      __tag__:__hostname__:  iZbp145dd9fccu****  
      __topic__:    
      _db_:  zc  
      _event_:  row_update  
      _gtid_:  7d2ea78d-b631-11e7-8afb-00163e0eef52:538  
      _host_:  *********.mysql.rds.aliyuncs.com  
      _id_:  115  
      _old_alarmtype:  NO_ALARM  
      _old_count:  55  
      _old_id:  4829234  
      _old_ip:  10.10.22.133  
      _old_time:  2017-10-31 12:04:54  
      _table_:  specialalarm  
      alarmtype:  NO_ALARM  
      count:  55  
      id:  4829234  
      ip:  10.11.***.***
      time:  2017-10-31 12:04:54
    • DDL statement
      __source__:  10.30.**.**  
      __tag__:__hostname__:  iZbp145dd9fccu****  
      __topic__:    
      _db_:  zc  
      _event_:  row_update  
      _gtid_:  7d2ea78d-b631-11e7-8afb-00163e0eef52:539  
      _host_:  *********.mysql.rds.aliyuncs.com  
      ErrorCode:  0
      ExecutionTime:  0
      Query:  ALTER TABLE `zc`.`specialalarm` 
      ADD INDEX `time_index` (`time` ASC)
      StatusVars:
    The following fields are included in the collected logs:
    • _host_: The hostname of the database.
    • _db_: The name of the database.
    • _table_: The name of the table.
    • _event_: The type of the event.
    • _id_: The auto-increment ID. IDs start from 0 and increment by 1 each time the data of a binary log event is collected.
    • _gtid_: The GTID.
    • _filename_: The name of the binary log file.
    • _offset_: The size offset of the binary log file. The value is updated only when a COMMIT operation is performed.