This topic provides answers to some frequently asked questions about the Scheduled SQL feature.
How do I ensure data accuracy when I execute SQL statements to analyze data?
Data analysis results may be inaccurate due to the following reasons: Latency exists when data is written to Simple Log Service, or the scheduling configurations of instances are invalid.
A write latency exists when data is written to Simple Log Service. In a scenario with a 5-minute write latency, if an instance whose SQL time window is [12:02:00,12:03:00) runs at 12:03:00, Simple Log Service cannot obtain data for that time window.
A query latency exists after data is written to Simple Log Service. In most cases, the latency is less than 3 seconds. Some data cannot be obtained even if the latency is low. For example, if an instance whose SQL time window is [12:02:30,12:03:30) runs at 12:03:30, Simple Log Service may fail to obtain the logs that are written to Simple Log Service at 12:03:29 for that time window.
If logs that are generated at different points in time are written to Simple Log Service at the same point in time, the logs are indexed based on the earliest log time in the batch. As a result, a log that is written later can be indexed under an earlier time. For example, an instance whose SQL time window is [12:02:30,12:03:30) runs at 12:03:30. If two logs generated at 12:02:20 and 12:02:50 are written to Simple Log Service at 12:02:50, both logs are indexed under 12:02:20. In this case, Simple Log Service cannot obtain the log that was generated at 12:02:50 for the time window [12:02:30,12:03:30).
When you use the Scheduled SQL feature, we recommend that you balance the timeliness and accuracy of data based on your business requirements.
When data is uploaded to Simple Log Service, latency may exist. In this case, the data for an SQL time window may not be completely uploaded when an instance runs. To prevent this issue, we recommend that you configure the Delay Task and SQL Time Window parameters based on the data collection latency and the maximum result viewing latency allowed for your business. To ensure that instances run as expected, we recommend that you leave some margin beyond the theoretical values.
To ensure the accuracy of processing results when some unordered data is uploaded, we recommend that you specify minute- or hour-level SQL time windows for instances.
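For example, assume that the data collection latency is at most 1 minute and your business can tolerate viewing results 5 minutes late. You might use an SQL time window of [now-5m, now) and set Delay Task to a value greater than the collection latency, such as 2 minutes, so that the instance that processes the window [12:00:00,12:05:00) does not run before 12:07:00. These values are only an illustration; adjust them based on your actual latency.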
How do I resolve the latency problem?
Method 1
To prevent Scheduled SQL computations from missing data because of log time delays, use an ingest processor to overwrite the log time with the write time before data is written to the logstore. This way, the log time always matches the write time.
Assume that a project named web-project-test and a logstore named website_log are created. The Log Public IP feature is enabled for the logstore. Logs are collected by using Logtail or Simple Log Service SDKs.
Create an ingest processor.
Log on to the Simple Log Service console. In the Projects section, click the project web-project-test.
In the left-side navigation pane, go to the ingest processor management page and click Create. In the Create Processor panel, enter a processor name and the following SPL statement, and then click OK.
* | extend __time__=cast("__tag__:__receive_time__" as bigint)
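This SPL statement overwrites the log time (__time__) of each log with the server receive time (__tag__:__receive_time__), so the log time of newly written data matches the write time.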
Associate the ingest processor with the logstore.
In the left-side navigation pane, click Log Storage, move the pointer over the website_log logstore, and then open the Logstore Attributes page. Click Modify in the upper-right corner of the page. In edit mode, select the ingest processor that you want to associate with the logstore from the Ingest Processor drop-down list, and then click Save.
Execute the following query statement to compare the write time and the log time.
* | SELECT cast("__tag__:__receive_time__" as bigint) - cast(__time__ as bigint) as diff WHERE cast("__tag__:__receive_time__" as bigint) - cast(__time__ as bigint) > 0
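This query returns the logs whose write time is later than their log time. For data written after the ingest processor takes effect, the log time equals the write time, so the query is expected to return no results.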
Method 2
A logstore can forcibly overwrite the log time with the server time to avoid latency issues. If you want to use this feature, submit a ticket.
How do I prevent failures when I execute SQL statements to analyze data?
Enter valid SQL statements.
Configure valid indexes for the fields that you want to analyze. For example, if the query statement is * | select uid, you must turn on Enable Analytics for the uid field. For more information, see Create indexes.
Make sure that your account is granted the required permissions, such as the permissions to execute SQL statements to analyze data and the permissions to read data from logstores.
Make sure that you have sufficient computing resources. For more information about resource limits, see Resource limits for query and analysis.
To prevent calculation timeout errors, we recommend that you do not use complex SQL statements or configure a long SQL time window. The timeout period of Simple Log Service is 10 minutes.
Does Simple Log Service check indexes when the calculation results of a Scheduled SQL job are written to a destination logstore or Metricstore?
No, Simple Log Service does not check indexes when the calculation results of a Scheduled SQL job are written to the destination logstore or Metricstore. If you have not configured indexes for the destination logstore or Metricstore, you cannot execute SQL statements to analyze data. Log consumption and query are not affected.
Before you create a Scheduled SQL job, we recommend that you configure indexes for the destination logstore or Metricstore. If you have not configured indexes for the destination logstore or Metricstore, you can use the reindexing feature to configure indexes for historical data. For more information, see Reindex logs for a logstore.
Does the timeout of an instance affect subsequent execution?
No, the timeout of an instance does not affect subsequent execution. The scheduled time of subsequent instances still follows the original schedule, but the creation and execution of these instances are delayed. The delay is gradually offset as subsequent instances run at a higher speed, until an instance runs on schedule again.
In delay offsetting scenarios, if the volume of data that needs to be processed is fixed, a larger scheduling interval leads to a higher offset speed. Examples:
If 24-hour data needs to be processed and the scheduling interval is 1 minute, a total of 1,440 instances are generated, and each instance runs for 20 seconds.
If 24-hour data needs to be processed and the scheduling interval is 1 hour, a total of 24 instances are generated, and each instance runs for 2 minutes.
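In the first example, offsetting the delay requires 1,440 × 20 seconds = 8 hours of total execution time. In the second example, it requires only 24 × 2 minutes = 48 minutes. The larger scheduling interval therefore offsets the delay much faster.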
Simple Log Service supports distributed query and analysis and allocates more computing resources if more data needs to be processed.
How do I trace the source of data that is written to a destination logstore or Metricstore?
By default, the following __tag__ fields are added when the data of a Scheduled SQL job is written to the destination logstore or Metricstore. You can use the fields to trace the source of the data.
__tag__:__instance_id__:2b****06a: the ID of the instance.
__tag__:__job__:from_now: the name of the job.
__tag__:__project__:ali-sls-etl-staging: the project to which the job belongs.
__tag__:__schedule_time__:1618474200: the point in time at which the job is scheduled. The value is a UNIX timestamp. Unit: seconds.
__tag__:__trigger_time__:1618474259: the point in time at which the job is run. The value is a UNIX timestamp. Unit: seconds.
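For example, you can filter result logs by these tag fields. The following is a minimal sketch that uses the sample job name from_now above and assumes that indexes with analytics enabled are configured for the tag fields in the destination logstore:
* | SELECT * FROM log WHERE "__tag__:__job__" = 'from_now'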
How do I specify a custom log time for logs that a Scheduled SQL job writes to a destination logstore?
If you want to query and analyze logs based on the log time and do not want to use the default log time, you can specify a custom log time by referring to the following cases.
The __time__ field specifies the log time. You can view the log time in the destination logstore.
The current_timestamp function returns the current date, time, and time zone. For more information, see current_timestamp function.
Which time is used as the log time for logs that a Scheduled SQL job writes to a destination logstore?
If the SQL statement of a Scheduled SQL job does not contain the __time__ field, the start time of the SQL time window is used as the default log time. By default, the * | select * statement covers the __time__ field.
For example, if the start time of the SQL time window is 2024-09-14 15:46:00, Simple Log Service automatically uses this start time as the log time at which the result logs of the instance are written to the destination logstore.
Case 1: Specify the log time of the source logstore as the log time at which logs are written to a destination logstore
When you create a Scheduled SQL job, you must add the __time__ field together with the required fields to the SQL statement that you specify. The __time__ field specifies the log time of the source logstore. This way, when the logs are written from the source logstore to the destination logstore, Simple Log Service obtains the value of the __time__ field from the results of the query statement and uses the value as the log time at which the logs are written to the destination logstore.
Case 2: Specify the current time as the log time at which logs are written to a destination logstore
When you create a Scheduled SQL job, you must add cast(to_unixtime(current_timestamp) as bigint) AS __time__ together with the required fields to the SQL statement that you specify. This expression specifies the current time. This way, Simple Log Service uses the current time as the log time at which logs are written to the destination logstore. For more information about the to_unixtime function, see to_unixtime function. For more information about the cast function, see cast function.
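A minimal sketch, again assuming the hypothetical method field:
* | SELECT cast(to_unixtime(current_timestamp) as bigint) AS __time__, method, count(*) AS pv GROUP BY method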
Case 3: Specify the value of a time field in source logs as the log time at which logs are written to a destination logstore
You must replace the start_time field in the SQL statement that you specify with an actual time field in your source logs. Make sure that the field value is a UNIX timestamp. This way, Simple Log Service uses the value of the __time__ field in the results of the query statement as the log time at which logs are written to the destination logstore.
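A minimal sketch, assuming a hypothetical method field and a start_time field that stores a UNIX timestamp in seconds:
* | SELECT cast(start_time AS bigint) AS __time__, method, count(*) AS pv GROUP BY __time__, method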