The Simple Log Service (SLS) sync link synchronizes data in real time from an SLS Logstore to an AnalyticDB for MySQL cluster, starting from a specific time offset. This feature supports near real-time data output, full historical data archiving, and elastic analytics. This topic describes how to add an SLS data source, create an SLS sync link, start the sync task, analyze the synchronized data, and manage the data source.
Prerequisites
An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster is created.
A job resource group is created for the AnalyticDB for MySQL cluster.
A database account is created for the AnalyticDB for MySQL cluster.
If you use an Alibaba Cloud account, you only need to create a privileged account.
If you use a Resource Access Management (RAM) user, you must create a privileged account and a standard account and associate the standard account with the RAM user.
Simple Log Service is activated, and a project and a Logstore are created in the same region as the AnalyticDB for MySQL cluster. For more information, see Collect and analyze ECS text logs using LoongCollector.
Notes
Currently, a table in an AnalyticDB for MySQL cluster can receive synchronized data from only one Simple Log Service Logstore.
After data is ingested into the data lake, you must run a commit operation to make the written data visible. To prevent small commit intervals from affecting job stability and read/write performance, the default commit interval for the AnalyticDB for MySQL data synchronization feature is 5 minutes. Therefore, after you create and start a data synchronization task for the first time, you must wait at least 5 minutes before you can view the first batch of written data.
Billing
Using the AnalyticDB for MySQL data synchronization feature to synchronize data to OSS incurs the following fees:
Elastic ACU resource fees for AnalyticDB for MySQL. For more information about billing items, see Billing items of Data Lakehouse Edition and Billing items of Enterprise Edition and Basic Edition.
OSS storage fees and fees for GET, PUT, and other requests. For more information about billing items, see Billing overview.
Procedure
Step 1 (Optional): Configure RAM authorization.
Step 2: Create a data source.
Step 3: Create a sync link.
Step 4: Start the data synchronization task.
Step 5: Analyze data.
Step 6 (Optional): Manage the data source.
Configure RAM authorization
If you synchronize SLS data to AnalyticDB for MySQL across different Alibaba Cloud accounts, you must create a RAM role on the source account, grant specific permissions to the RAM role, and modify the trust policy of the RAM role. If you are synchronizing SLS data only within the same account, you can skip this step and proceed to create a data source. For more information, see Create a data source.
Create a RAM role. For more information, see Create a RAM role for an Alibaba Cloud account.
Note: When you configure the Select Trusted Cloud Account parameter, select Other Cloud Account and enter the ID of the Alibaba Cloud account that owns the AnalyticDB for MySQL cluster. You can log on to the Account Center and view the account ID on the Overview page.
Grant the AliyunAnalyticDBAccessingLogRolePolicy permission to the RAM role. For more information, see Grant precise permissions to a RAM role.
Modify the trust policy of the RAM role. For more information, see Modify the trust policy of a RAM role.
{ "Statement": [ { "Action": "sts:AssumeRole", "Effect": "Allow", "Principal": { "RAM": [ "acs:ram::<Alibaba Cloud account ID>:root" ], "Service": [ "<Alibaba Cloud account ID>@ads.aliyuncs.com" ] } } ], "Version": "1" }NoteThe Alibaba Cloud account ID is the one you entered in Step 1. Do not include the angle brackets (<>).
Create a data source
If you want to synchronize data by using an existing data source, you can skip this step and directly create a sync link. For more information, see Create a sync link.
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose Data Ingestion > Data Sources.
In the upper-right corner of the page, click Create Data Source.
On the Create Data Source page, configure the parameters. The following table describes the parameters.
Parameter
Description
Data Source Type
Select SLS.
Data Source Name
The system generates a name based on the data source type and the current time. You can change it as needed.
Data Source Description
A description of the data source, such as its data lakehouse application scenario or business limitations.
Deployment Mode
Currently, only Alibaba Cloud Instance is supported.
Region of Simple Log Service Project
The region where the SLS project is located.
Across Alibaba Cloud Accounts
The SLS data source supports synchronizing SLS data to AnalyticDB for MySQL across different Alibaba Cloud accounts.
No: Synchronize SLS data within the current account to AnalyticDB for MySQL.
Yes: Synchronize SLS data from another account to AnalyticDB for MySQL. When you select cross-account synchronization, you must specify Cross-Alibaba Cloud Account and Cross-Alibaba Cloud Account Role Name.
Note:
Cross-Alibaba Cloud Account: the ID of the source Alibaba Cloud account.
Cross-Alibaba Cloud Account Role Name: the name of the RAM role created in the source account. This is the RAM role that you created in Step 1 of Configure RAM authorization.
Simple Log Service Project
The source SLS project.
Important: The Simple Log Service Project list displays all projects that belong to the Alibaba Cloud account and its RAM users. If you select a project that belongs to the Alibaba Cloud account, make sure that the RAM user has permissions on that project. Otherwise, data cannot be synchronized to AnalyticDB for MySQL.
Simple Log Service Logstore
The source SLS Logstore.
After you configure the parameters, click Create.
Create a sync link
In the left-side navigation pane, click Simple Log Service/Kafka Data Synchronization.
In the upper-right corner of the page, click Create Synchronization Job.
On the Create Synchronization Job page, configure the parameters in the Source and Destination Settings, Destination Database and Table Settings, and Synchronization Settings sections.
The following table describes the parameters in the Source and Destination Settings section.
Parameter
Description
Job Name
The name of the data synchronization job. The system generates a name based on the data source type and the current time. You can change it as needed.
Data Source
Select an existing SLS data source or create a new one.
Destination Type
The following options are supported:
Data Lake - User OSS.
Data Lake - ADB Lake Storage (Recommended).
Important: When you select Data Lake - ADB Lake Storage, you must enable the lake storage feature.
ADB Lake Storage
The name of the lake storage where the AnalyticDB for MySQL lake data resides.
Select the destination lake storage from the drop-down list. If no lake storage has been created, click Automatic Creation in the drop-down list to automatically create one.
Important: This parameter is required when Destination Type is set to Data Lake - ADB Lake Storage.
OSS Path
The storage path in OSS for the AnalyticDB for MySQL Data Lakehouse data.
Important: All buckets that reside in the same region as the AnalyticDB for MySQL cluster are displayed. Configure this parameter based on your business requirements. After you configure this parameter, it cannot be modified.
We recommend that you select an empty directory that does not have nested relationships with the directories of other data synchronization jobs. This prevents historical data from being overwritten. For example, assume that the involved OSS storage paths of two data synchronization jobs are oss://adb_demo/test/sls1/ and oss://adb_demo/test/. In this case, data overwriting occurs during data synchronization because these two paths have nested relationships with each other.
Storage Format
The data storage format. The following options are supported:
PAIMON.
Important: This format is supported only when Destination Type is set to Data Lake - User OSS.
ICEBERG.
The following table describes the parameters for Destination Database and Table Settings.
Parameter
Description
Database Name
The name of the database in AnalyticDB for MySQL to which data will be synchronized. If a database with the same name does not exist, a new one is created. If a database with the same name already exists, data is synchronized to the existing database. For more information about naming conventions, see Limits.
Important: In the Source and Destination Settings section, if Storage Format is set to PAIMON, an existing database must meet the following conditions. Otherwise, the data synchronization task will fail. For a sample statement that satisfies these conditions, see the sketch after this table.
The database must be an external database, created by a statement in the form CREATE EXTERNAL DATABASE <DatabaseName>.
The DBPROPERTIES parameter of the database creation statement must include the catalog property, and the catalog value must be paimon.
The DBPROPERTIES parameter of the database creation statement must include the adb.paimon.warehouse property. For example: adb.paimon.warehouse=oss://testBucketName/aps/data.
The DBPROPERTIES parameter of the database creation statement must include the LOCATION property. You must also append .db to the end of the database name in the path. Otherwise, XIHE queries will fail. For example: LOCATION=oss://testBucketName/aps/data/kafka_paimon_external_db.db/.
The bucket folder in the OSS path specified for LOCATION must exist. Otherwise, the database creation will fail.
Table Name
The name of the table in AnalyticDB for MySQL to which data will be synchronized. If a table with the same name does not exist in the database, a new one is created. If a table with the same name already exists, the data synchronization will fail. For more information about naming conventions, see Limits.
Schema Field Mapping
By default, fields are retrieved from the delivery task configuration of Simple Log Service. If no delivery task is configured for the Logstore, fields are retrieved from the most recent log data.
Supported data types: BOOLEAN, INT, BIGINT, FLOAT, DOUBLE, and STRING.
Synchronization of SLS reserved fields is supported. For more information, see Reserved fields.
Important:
You cannot modify the destination field names.
If the job has been started (including jobs in the running or completed state), you cannot modify existing column information, but you can add new columns. If the job has been created but not yet started, you can modify the field mappings as needed.
Partition Key Settings
The partition field settings for the destination table. To ensure that data is ingested and queries are executed as expected, we recommend that you configure partitions based on the log time or business logic. If you do not configure partitions, no partitions exist in the destination table.
Valid values for the Format Processing Method parameter:
Formatted Time: Select a datetime field from the Source Partition Field drop-down list, set the Format Processing Method parameter to Formatted Time, and then configure the Source Field Format and Destination Partition Format parameters. AnalyticDB for MySQL identifies the value of the partition field based on the specified source field format and converts the value into the specified destination partition format for partitioning. For example, if the source field is gmt_created whose value is 1711358834, the Source Field Format parameter is set to Timestamp Accurate to Seconds, and the Destination Partition Format parameter is set to yyyyMMdd, the value is partitioned based on 20240325. A sketch of this conversion appears after this table.
Specified Partition Field: Set the Format Processing Method parameter to Specified Partition Field and configure other required parameters.
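For reference, the following is a minimal sketch of a database creation statement that satisfies the PAIMON conditions listed for the Database Name parameter. The database name paimon_external_db is a placeholder, and the bucket testBucketName is taken from the examples above; confirm the exact DBPROPERTIES syntax against the CREATE DATABASE documentation for your cluster before you use it.
-- Sketch only: an external database with the catalog, adb.paimon.warehouse, and LOCATION properties.
CREATE EXTERNAL DATABASE paimon_external_db
DBPROPERTIES (
    'catalog' = 'paimon',
    'adb.paimon.warehouse' = 'oss://testBucketName/aps/data',
    'LOCATION' = 'oss://testBucketName/aps/data/paimon_external_db.db/'
);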
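The partition format conversion described for Partition Key Settings can be illustrated with a simple Spark SQL expression. This is only a sketch of the yyyyMMdd conversion applied to the example value; the synchronization job performs the conversion automatically, so you do not need to run this statement.
-- Converts the example value 1711358834 (a timestamp accurate to seconds)
-- to the destination partition format yyyyMMdd. The result is 20240325.
SELECT from_unixtime(1711358834, 'yyyyMMdd');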
The following table describes the parameters in the Synchronization Settings section.
Parameter
Description
Starting Consumer Offset for Incremental Synchronization
When the sync task starts, it begins consuming SLS data from the selected time point. Valid values:
Earliest offset (begin_cursor): Automatically consumes data from the earliest time point in the SLS data.
Latest offset (end_cursor): Automatically consumes data from the latest time point in the SLS data.
Custom offset: You can select any time point. The system will start consuming from the first piece of data in SLS that is at or after this time point.
Job Resource Group
The job resource group in which the data synchronization job runs.
Number Of ACUs For Incremental Sync
The number of AnalyticDB compute units (ACUs) that are required for the job resource group to run the data synchronization job. The value ranges from 2 to the maximum number of computing resources that are available in the job resource group. To improve the stability and performance of data ingestion, we recommend that you specify a larger number of ACUs.
Note: When you create a data synchronization job in a job resource group, elastic resources in the resource group are used, and the system deducts the used resources from the resource group. For example, assume that a job resource group has 48 ACUs of reserved computing resources and a synchronization job that consumes 8 ACUs is created. When you create another synchronization job in the resource group, you can select up to 40 ACUs.
Advanced Settings
Advanced settings let you customize the data synchronization job. To configure custom settings, contact technical support.
After you configure the preceding parameters, click Submit.
Start the data synchronization job
On the Simple Log Service/Kafka Data Synchronization page, find the data synchronization job that you created and click Start in the Actions column.
In the upper-right corner of the page, click Search. If the state of the job changes to Starting, the data synchronization job is started.
Analyze data
After the data synchronization job is complete, you can use Spark JAR Development to analyze the data that is synchronized to the AnalyticDB for MySQL cluster. For more information about Spark development, see Spark editor and Overview.
In the left-side navigation pane, choose Job Development > Spark JAR Development.
Enter SQL statements in the default template and click Run Now.
-- Here is just an example of Spark SQL. Modify the content and run your Spark program.
conf spark.driver.resourceSpec=medium;
conf spark.executor.instances=2;
conf spark.executor.resourceSpec=medium;
conf spark.app.name=Spark SQL Test;
conf spark.adb.connectors=oss;
-- Here are your SQL statements
show tables from lakehouse20220413156_adbTest;
(Optional) On the Applications tab, find an application and click Log in the Actions column to view the Spark SQL running log of the application.
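After the first batch of data is committed (at least 5 minutes after the job starts), you can also query the destination table directly in the same editor. The following sketch uses the hypothetical names test_db and test_sls_table; replace them with the database and table names that you configured in the sync link.
-- Spark SQL sketch: preview rows that were synchronized from the Logstore.
conf spark.driver.resourceSpec=medium;
conf spark.executor.instances=2;
conf spark.executor.resourceSpec=medium;
conf spark.app.name=SLS Sync Preview;
conf spark.adb.connectors=oss;
SELECT * FROM test_db.test_sls_table LIMIT 10;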
Manage the data source
On the Data Sources page, you can perform the operations described in the following table in the Actions column.
Operation | Description |
Create Job | Goes to the Create Synchronization Job or Create Migration Job page to create a job that uses the data source. |
View | Views the detailed configurations of the data source. |
Edit | Modifies the data source parameters, such as the data source name and description. |
Delete | Deletes the data source. Note If the data source is being used in a data synchronization or data migration job, you cannot delete the data source. In this case, you must first go to the Simple Log Service/Kafka Data Synchronization page, find the job, and then click Delete in the Actions column to delete the job. |