
AnalyticDB: Synchronize SLS data via the APS data synchronization feature

Last Updated: Dec 03, 2025

You can use the AnalyticDB Pipeline Service (APS) to synchronize data generated in Simple Log Service (SLS) after a specified point in time to an AnalyticDB for MySQL cluster in real time. This lets you perform real-time analysis of log data.

Prerequisites

  • An AnalyticDB for MySQL cluster is created, and a database account is created for the cluster.

  • For Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters, a job resource group is created.

  • Simple Log Service is activated, and a project and a Logstore are created in the same region as the AnalyticDB for MySQL cluster.

Notes

A single table in an AnalyticDB for MySQL cluster can synchronize data from only one Logstore. To synchronize data from multiple Logstores, you must create multiple tables.
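
Because each destination table maps to exactly one Logstore, you create one table per Logstore in the AnalyticDB for MySQL cluster. The following is a minimal sketch of creating such a destination table over the MySQL protocol by using the pymysql client. The endpoint, credentials, database, table, and field names are hypothetical; your field list must match the fields of your own logs.

    # Hypothetical sketch: create one destination table per Logstore.
    # The endpoint, credentials, and all names below are placeholders.
    import pymysql

    DDL = """
    CREATE TABLE IF NOT EXISTS nginx_access_log (
        request_time DATETIME,
        client_ip    VARCHAR(64),
        status       INT,
        request_uri  VARCHAR(1024)
    ) DISTRIBUTED BY HASH(client_ip);
    """

    conn = pymysql.connect(
        host="am-xxxx.ads.aliyuncs.com",  # AnalyticDB for MySQL endpoint (placeholder)
        user="adb_user",
        password="your_password",
        database="log_db",
    )
    try:
        with conn.cursor() as cur:
            cur.execute(DDL)  # a second Logstore would get its own table
        conn.commit()
    finally:
        conn.close()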

Billing

You are charged for elastic resources on a pay-as-you-go basis. The fees are calculated based on the number of AnalyticDB compute units (ACUs) that are used by the data link. For more information about billing, see Pricing.
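
For example, if a data link consumes an average of 2 ACUs and runs for 10 hours, the metered usage is 2 × 10 = 20 ACU-hours, and the fee is that usage multiplied by the per-ACU unit price for your region.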

Procedure

You can create a sync task in the SLS console or the AnalyticDB for MySQL console. The differences are as follows:

  • Create a sync task in the SLS console: This method can import SLS data only from the same Alibaba Cloud account. You need to create only a data link. The system automatically creates an SLS data source based on the SLS Project and SLS Logstore parameters that you specify.

  • Create a sync task in the AnalyticDB for MySQL console: This method supports importing SLS data from other Alibaba Cloud accounts. You must create an SLS data source and then create a data link based on the data source.

Create a sync task in the SLS console

Step 1: Create a data source and a data link

  1. Log on to the Simple Log Service console.

  2. In the Project list, click the target project. In the navigation pane on the left, go to the Log Storage tab and expand the target Logstore.

  3. Click Data Processing > Export. Then, click the + icon next to AnalyticDB.

  4. In the Shipping Note dialog box that appears, select Create in AnalyticDB for MySQL Console.


  5. On the AnalyticDB for MySQL Log Synchronization page, configure the parameters in the Source and Destination Settings, Destination Database and Table Settings, and Synchronization Settings sections.

    • The following table describes the parameters in the Source and Destination Settings section.

      Parameter

      Description

      Job Name

      The name of the data link. The system automatically generates a name based on the data source type and the current time. You can change the name as needed.

      Simple Log Service Project

      The SLS project.

      Simple Log Service Logstore

      The SLS Logstore.

      Destination AnalyticDB for MySQL Cluster

      The destination AnalyticDB for MySQL cluster.

      AnalyticDB for MySQL Account

      The database account of the AnalyticDB for MySQL cluster.

      AnalyticDB for MySQL Password

      The password of the database account for the AnalyticDB for MySQL cluster.

    • The following table describes the parameters in the Destination Database and Table Settings section.

      Parameter

      Description

      Database Name

      The name of the database in the AnalyticDB for MySQL cluster.

      Table Name

      The name of the data table in the AnalyticDB for MySQL cluster.

      Source Data Preview

      Click View Latest 10 Logstore Data Entries to view the 10 most recent records in the source Logstore.

      Schema Field Mapping

      AnalyticDB for MySQL automatically populates Destination Table Field and Source Field based on the fields of the destination table. If a Destination Table Field is mapped to the wrong Source Field, correct the mapping manually.

      For example, if a field in the AnalyticDB for MySQL table is named name but the corresponding SLS field is named user_name, the system fills both Source Field and Destination Table Field with name. In this case, manually change the Source Field to user_name.

    • The following table describes the parameters in the Synchronization Settings section.

      Parameter

      Description

      Start Offset

      When the sync task starts, it consumes SLS data from the specified point in time.

      For example, if you set Start Offset to 2024-04-09 13:10, the system starts consuming data from the first record after 13:10 on April 9, 2024.

      Dirty Data Processing Mode

      During data synchronization, a source value that cannot be converted to the data type of the corresponding field in the destination table is considered dirty data. For example, if the source value is the string abc and the destination field type is int, the value cannot be converted and causes a synchronization error.

      The following values are valid:

      • Stop Synchronization (Default): The data synchronization is stopped. You must modify the field type in the destination table or change the dirty data processing mode, and then restart the sync task.

      • Treat as Null: The dirty data is written as a NULL value to the destination table, and the original dirty data is discarded.

        For example, a row of SLS data has three fields (col1, col2, and col3). If the value of col2 is dirty data, col2 is written as a NULL value, while col1 and col3 are written normally. A sketch of this behavior appears after this procedure.

      Convert Unix Timestamp into Datetime

      If a source SLS field is a Unix timestamp (for example, 1710604800) and the destination field type is DATETIME or TIMESTAMP, you must enable this feature for conversion. After you enable this feature, you can select Timestamp Accurate to Seconds, Timestamp Accurate to Milliseconds, or Timestamp Accurate to Microseconds based on the precision of the SLS timestamp data.

      Job Resource Group

      The job resource group that is used to run the incremental sync task.

      Important

      This parameter is required only when the AnalyticDB for MySQL cluster is of Enterprise Edition, Basic Edition, or Data Lakehouse Edition.

      ACUs for Incremental Synchronization

      • The initial number of ACUs that are used to run the incremental sync task. The default value is 1 ACU. The value can range from 1 ACU to the maximum number of ACUs available in the job resource group.

      • After the sync link is created, AnalyticDB for MySQL automatically scales the number of ACUs based on the business workload. The number of ACUs can be scaled up to 64 or scaled down to 1.

      Important

      This parameter is required only when the AnalyticDB for MySQL cluster is of Enterprise Edition, Basic Edition, or Data Lakehouse Edition.

  6. After you configure the parameters, click Submit.

    The system automatically creates an SLS data source and a data link in AnalyticDB for MySQL and redirects you to the Simple Log Service/Kafka Data Synchronization page in the AnalyticDB for MySQL console.
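
The following sketch summarizes the Treat as Null and Convert Unix Timestamp into Datetime semantics described in the Synchronization Settings table. It illustrates the behavior only and is not APS code; the function names and field values are hypothetical.

    # Illustrative sketch of the "Treat as Null" dirty-data semantics and the
    # Unix timestamp conversion described above. Names are hypothetical.
    from datetime import datetime, timezone

    def to_int_or_null(value):
        # Treat as Null: a value that cannot be cast to the destination type
        # (int here) is written as NULL instead of stopping the task.
        try:
            return int(value)
        except (TypeError, ValueError):
            return None  # the original dirty value is discarded

    def unix_to_datetime(ts, precision="seconds"):
        # Convert a Unix timestamp into a datetime at the selected precision.
        divisor = {"seconds": 1, "milliseconds": 1_000, "microseconds": 1_000_000}[precision]
        return datetime.fromtimestamp(int(ts) / divisor, tz=timezone.utc)

    # One SLS row with three fields; col2 holds dirty data for an int column.
    row = {"col1": "100", "col2": "abc", "col3": "300"}
    print({k: to_int_or_null(v) for k, v in row.items()})
    # {'col1': 100, 'col2': None, 'col3': 300}
    print(unix_to_datetime("1710604800"))
    # 2024-03-16 16:00:00+00:00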

Step 2: Start the sync task

  1. In the Actions column of the target data link, click Start.

  2. Click Query in the upper-right corner to refresh the status. If the status changes to Running, the sync task has started.

Step 3: Manage the sync task

You can perform the following operations in the Actions column:

Operation

Description

Start

Starts the data sync job.

View Details

Displays the details of the data sync job, including the source and destination configurations, run logs, and run monitoring.

Edit

Edits the start offset, field mappings, and other settings of the job.

Pause

Pauses the data sync job. You can click Start to resume the synchronization. The synchronization resumes from the offset where it was paused.

Delete

Deletes the data sync job. This operation cannot be undone. Proceed with caution.

Create a sync task in the AnalyticDB for MySQL console

Step 1: (Optional) Configure RAM authorization

Note

If you want to synchronize SLS data only from the current Alibaba Cloud account, you can skip this step and create a data source. For more information, see Step 2: Create a data source.

When you synchronize SLS data from another Alibaba Cloud account to AnalyticDB for MySQL, you must create a RAM role in the source account, grant fine-grained permissions to the RAM role, and modify the trust policy of the RAM role.

  1. Create a RAM role. For more information, see Create a RAM role for a trusted Alibaba Cloud account.

    Note

    When you configure the Principal Name parameter, select Other Account and enter the ID of the Alibaba Cloud account to which the AnalyticDB for MySQL cluster belongs. You can log on to the Account Center and view the Account ID on the Security Settings page.

  2. Use fine-grained authorization to attach the AliyunAnalyticDBAccessingLogRolePolicy policy to the RAM role.

  3. Modify the trust policy of the RAM role to allow the AnalyticDB for MySQL cluster that belongs to the specified Alibaba Cloud account to assume this RAM role.

    {
      "Statement": [
        {
          "Action": "sts:AssumeRole",
          "Effect": "Allow",
          "Principal": {
            "RAM": [
                "acs:ram::<Alibaba Cloud account ID>:root"
            ],
            "Service": [
                "<Alibaba Cloud account ID>@ads.aliyuncs.com"
            ]
          }
        }
      ],
      "Version": "1"
    }
    Note

    Replace <Alibaba Cloud account ID> with the ID of the Alibaba Cloud account to which the AnalyticDB for MySQL cluster belongs, which is the account ID that you obtained in Step 1. Do not include the angle brackets (<>).
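
    If it helps to see the substitution concretely, the following sketch renders the trust policy with a sample account ID and prints the final JSON. This is purely illustrative; 1234567890123456 is a placeholder, not a real account ID.

    # Hypothetical sketch: render the trust policy with a sample account ID.
    # 1234567890123456 is a placeholder; use your own account ID, without <>.
    import json

    account_id = "1234567890123456"
    policy = {
        "Statement": [
            {
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {
                    "RAM": [f"acs:ram::{account_id}:root"],
                    "Service": [f"{account_id}@ads.aliyuncs.com"],
                },
            }
        ],
        "Version": "1",
    }
    print(json.dumps(policy, indent=2))  # paste the output into the trust policy editor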

Step 2: Create a data source

Note

If you have added an SLS data source, you can skip this step and create a data link. For more information, see Step 3: Create a data link.

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.

  2. In the navigation pane on the left, choose Data Ingestion > Data Sources.

  3. In the upper-right corner, click Create Data Source.

  4. On the Create Data Source page, configure the parameters. The following table describes the parameters.

    Parameter

    Description

    Data Source Type

    Select SLS.

    Data Source Name

    The system automatically generates a name based on the data source type and the current time. You can change the name as needed.

    Data Source Description

    The description of the data source, such as the application scenario or business restrictions.

    Cloud Provider

    Only Alibaba Cloud Instance is supported.

    Region of Simple Log Service Project

    The region where the SLS project resides.

    Note

    You can select only the region where the AnalyticDB for MySQL cluster resides.

    Across Alibaba Cloud Accounts

    An AnalyticDB for MySQL cluster can synchronize SLS data from the same Alibaba Cloud account or from another Alibaba Cloud account (cross-account).

    • No: Synchronize SLS data from the current Alibaba Cloud account to the AnalyticDB for MySQL cluster.

    • Yes: Synchronize SLS data from another Alibaba Cloud account to an AnalyticDB for MySQL cluster. When you select Yes, you must configure RAM authorization and enter the Alibaba Cloud Account and RAM Role. For more information, see Configure RAM authorization.

      Note
      • Alibaba Cloud Account: The ID of the Alibaba Cloud account to which the SLS project belongs.

      • RAM Role: The RAM role that belongs to the Alibaba Cloud account to which the SLS project belongs. This is the RAM role that you created in Step 1 of Configure RAM authorization.

    Simple Log Service Project

    The source SLS project.

    Simple Log Service Logstore

    The source SLS Logstore.

  5. After you configure the parameters, click Create.

Step 3: Create a data link

  1. In the navigation pane on the left, choose Data Ingestion > Simple Log Service/Kafka Data Synchronization.

  2. In the upper-right corner, click Create Synchronization Job.

  3. On the Create Synchronization Job page, configure the parameters in the Source and Destination Settings, Destination Database and Table Settings, and Synchronization Settings sections.

    • The following table describes the parameters in the Source and Destination Settings section.

      Parameter

      Description

      Job Name

      The name of the data link. The system automatically generates a name based on the data source type and the current time. You can change the name as needed.

      Data Source

      Select an existing SLS data source or create a data source.

      Destination Type

      • For Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters, select Data Warehouse - AnalyticDB for MySQL Storage.

      • This parameter is not required for Data Warehouse Edition clusters.

      AnalyticDB for MySQL Account

      The database account of the AnalyticDB for MySQL cluster.

      AnalyticDB for MySQL Password

      The password of the database account for the AnalyticDB for MySQL cluster.

    • The following table describes the parameters in the Destination Database and Table Settings section.

      Parameter

      Description

      Database Name

      The name of the database in the AnalyticDB for MySQL cluster.

      Table Name

      The name of the data table in the AnalyticDB for MySQL cluster.

      Source Data Preview

      Click View Latest 10 Logstore Data Entries to view the 10 most recent records in the source Logstore.

      Schema Field Mapping

      AnalyticDB for MySQL automatically populates Destination Table Field and Source Field based on the fields of the destination table. If a Destination Table Field is mapped to the wrong Source Field, correct the mapping manually.

      For example, if a field in the AnalyticDB for MySQL table is named name but the corresponding SLS field is named user_name, the system fills both Source Field and Destination Table Field with name. In this case, manually change the Source Field to user_name.

    • The following table describes the parameters in the Synchronization Settings section.

      Parameter

      Description

      Start Offset

      When the sync task starts, it consumes SLS data from the specified point in time.

      For example, if you set Start Offset to 2024-04-09 13:10, the system starts consuming data from the first record after 13:10 on April 9, 2024.

      Dirty Data Processing Mode

      During data synchronization, a source value that cannot be converted to the data type of the corresponding field in the destination table is considered dirty data. For example, if the source value is the string abc and the destination field type is int, the value cannot be converted and causes a synchronization error.

      The following values are valid:

      • Stop Synchronization (Default): The data synchronization is stopped. You must modify the field type in the destination table or change the dirty data processing mode, and then restart the sync task.

      • Treat as Null: The dirty data is written as a NULL value to the destination table, and the original dirty data is discarded.

        For example, a row of SLS data has three fields (col1, col2, and col3). If the value of col2 is dirty data, col2 is written as a NULL value, while col1 and col3 are written normally.

      Convert Unix Timestamp into Datetime

      If a source SLS field is a Unix timestamp (for example, 1710604800) and the destination field type is DATETIME or TIMESTAMP, you must enable this feature for conversion. After you enable this feature, you can select Timestamp Accurate to Seconds, Timestamp Accurate to Milliseconds, or Timestamp Accurate to Microseconds based on the precision of the SLS timestamp data.

  4. After you configure the parameters, click Submit.

Step 4: Start the data sync task

  1. On the data synchronization page, find the sync task that you created and click Start in the Actions column.

  2. If the status changes to Running, the sync task has started.
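
After the task is in the Running state, one way to confirm that data is arriving is to query the destination table directly. The following sketch reuses the hypothetical endpoint, credentials, and table name from the earlier example; substitute your own values.

    # Hypothetical sketch: check that rows are arriving in the destination table.
    # The endpoint, credentials, and table name are placeholders.
    import pymysql

    conn = pymysql.connect(
        host="am-xxxx.ads.aliyuncs.com",  # AnalyticDB for MySQL endpoint (placeholder)
        user="adb_user",
        password="your_password",
        database="log_db",
    )
    try:
        with conn.cursor() as cur:
            # The row count should grow while the sync task is Running.
            cur.execute("SELECT COUNT(*) FROM nginx_access_log")
            print("rows synced so far:", cur.fetchone()[0])
    finally:
        conn.close()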

Step 5: Manage the sync task

On the data synchronization page, you can perform the following operations in the Actions column.

Operation

Description

Start

Starts the data sync job.

View Details

Displays the details of the data sync job, including the source and destination configurations, run logs, and run monitoring.

Edit

Edits the start offset, field mappings, and other settings of the job.

Pause

Pauses the data sync job. You can click Start to resume the synchronization. The synchronization resumes from the offset where it was paused.

Delete

Deletes the data sync job. This operation cannot be undone. Proceed with caution.