When you ship data from Simple Log Service to MaxCompute, verify data completeness at the partition level. This check confirms that all data for a specific partition in a MaxCompute table was delivered.
Verify data completeness
Reserved field
Using the __partition_time__ field
The __partition_time__ value derives from the time field of a log. Simple Log Service calculates this value by rounding down the log's event time based on a specified time format string. The event time is the actual time of the log, not the time when the data is shipped or written to the server.
For example, if a log's event time is 2017-05-19 10:43:00, the partition format string is set to yyyy_MM_dd_HH_mm, and data is shipped hourly, MaxCompute stores this log in the 2017_05_19_10_00 partition, regardless of when it was written to the server. For more details on this calculation, see Ship logs to MaxCompute (legacy).
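The rounding in this example can be reproduced with a short Python sketch. It assumes that the value is obtained by flooring the event time to the shipping interval and then formatting it. The function name partition_time, the interval_s parameter, and the use of UTC are illustrative assumptions, not part of Simple Log Service.

```python
from datetime import datetime, timezone


def partition_time(event_epoch: int, interval_s: int,
                   fmt: str = "%Y_%m_%d_%H_%M") -> str:
    """Round an event time down to the shipping interval and format it.

    fmt is the strftime equivalent of the yyyy_MM_dd_HH_mm format string.
    """
    # Floor the epoch seconds to the start of the interval (assumed behavior).
    floored = event_epoch - (event_epoch % interval_s)
    # UTC is an assumption; the time zone used by the service may differ.
    return datetime.fromtimestamp(floored, tz=timezone.utc).strftime(fmt)


# Event time 2017-05-19 10:43:00, interpreted as UTC for this sketch.
event = int(datetime(2017, 5, 19, 10, 43, 0, tzinfo=timezone.utc).timestamp())
print(partition_time(event, 3600))  # hourly shipping -> 2017_05_19_10_00
print(partition_time(event, 1800))  # 30-minute shipping -> 2017_05_19_10_30
```

The result depends only on the event time, so a log written to the server hours later still maps to the same partition.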
If you are writing logs in real time instead of shipping historical data, use one of the following methods to verify whether the data in a partition is complete:
- Use the API, SDK, or console. This is the recommended method.
  Use the API, SDK, or console to get the shipping tasks for a specific project and Logstore. The API returns a list of tasks, which the console then visualizes. The following is an example API response:
      {
        "count" : 10,
        "total" : 20,
        "statistics" : {
          "running" : 0,
          "success" : 20,
          "fail" : 0
        },
        "tasks" : [
          ...
          {
            "id" : "abcdefghijk",
            "taskStatus" : "success",
            "taskMessage" : "",
            "taskCreateTime" : 1448925013,
            "taskLastDataReceiveTime" : 1448915013,
            "taskFinishTime" : 1448926013
          },
          {
            "id" : "xfegeagege",
            "taskStatus" : "success",
            "taskMessage" : "",
            "taskCreateTime" : 1448926813,
            "taskLastDataReceiveTime" : 1448930000,
            "taskFinishTime" : 1448936910
          }
        ]
      }
  taskLastDataReceiveTime indicates the time when Simple Log Service received the data. You can use this parameter to determine whether all data from before time T has been delivered to the MaxCompute table.
  - You can consider data before time T complete if all shipping tasks with a taskLastDataReceiveTime earlier than T + 300s have a status of success. The 300-second buffer accounts for potential retries due to transient errors.
  - If any tasks are in the ready or running state, data delivery is still in progress. Wait for these tasks to finish.
  - If any tasks fail, investigate the cause, resolve the issue, and then retry the task. You may need to modify your shipping task configuration to fix the problem.
- Estimate completeness based on MaxCompute partitions
  For example, if you partition your MaxCompute table and run your shipping task every 30 minutes, you will see partitions like the following:
      2017_05_19_10_00
      2017_05_19_10_30
  When the 2017_05_19_11_00 partition is created, you can assume that the data in all previous partitions (before 11:00) is complete.
  While this method is simpler and does not require the API, it is less precise and only provides a rough estimate.
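The task-based check from the recommended method can be sketched in Python as follows. The sketch assumes that the task list has already been retrieved (for example, through the API or SDK) and parsed into dictionaries with the taskStatus and taskLastDataReceiveTime fields shown in the example response. The function name completeness_before, its return labels, and the "unknown" result for an empty match are illustrative choices, not part of Simple Log Service.

```python
from typing import Iterable, Mapping

IN_PROGRESS = {"ready", "running"}


def completeness_before(tasks: Iterable[Mapping], t: int,
                        buffer_s: int = 300) -> str:
    """Classify whether data received before epoch second t was delivered."""
    # Only tasks whose last data arrived before T + 300 s matter for T.
    relevant = [task for task in tasks
                if task["taskLastDataReceiveTime"] < t + buffer_s]
    if not relevant:
        # Conservative choice of this sketch: no matching task, no verdict.
        return "unknown"
    statuses = {task["taskStatus"] for task in relevant}
    if statuses - IN_PROGRESS - {"success"}:
        return "failed"        # investigate, fix, and retry the task
    if statuses & IN_PROGRESS:
        return "in_progress"   # wait for the tasks to finish
    return "complete"


# The two tasks from the example response, reduced to the fields used here.
tasks = [
    {"id": "abcdefghijk", "taskStatus": "success",
     "taskLastDataReceiveTime": 1448915013},
    {"id": "xfegeagege", "taskStatus": "success",
     "taskLastDataReceiveTime": 1448930000},
]
print(completeness_before(tasks, 1448920000))  # -> complete
```

Any status other than success, ready, or running is treated as a failure here, so an unexpected status value never counts as complete.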
Custom field
How this method works
You can also use a custom log field as a partition key column. For example, a log might contain a date field with values such as 20170518 and 20170519. When you configure the shipping rule, you can map this date field to a partition key column.
In this case, you must also consider the time difference between the date field's value and the log's write time. To verify completeness, combine this with the method described in Use a reserved field as the partition key column to verify data completeness, which relies on the data reception time.
Troubleshooting
Task succeeds but data is missing
If a shipping task succeeds but you find that data is missing from the MaxCompute table, it may be due to one of the following causes:
- The source field in Simple Log Service that is mapped to a partition key column does not exist. This results in a null value for the partition key, which is not allowed by MaxCompute.
- The value of the Simple Log Service field mapped to a partition key column contains a forward slash (/) or other special characters. MaxCompute prohibits these characters in partition key values because it treats them as reserved words.
When these issues occur, the shipping task skips the invalid logs and continues processing. The task still delivers other valid logs to their correct partitions.
Therefore, an incorrect field mapping can cause data loss even if the task status is success. To resolve this, correct the partition column configuration. We recommend using the __partition_time__ reserved field for partitioning.
For more details, see Limits.
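To find logs that would be skipped for the two causes above, you could pre-check a sample of logs before relying on a custom partition field. The following Python sketch is one way to do that. The function name skip_reason and the dictionary representation of a log are hypothetical, and only the forward slash is checked because the other special characters are not listed here.

```python
from typing import Mapping, Optional


def skip_reason(log: Mapping[str, object], source_field: str) -> Optional[str]:
    """Return why a log would be skipped, or None if the value looks valid."""
    value = log.get(source_field)
    if value is None:
        # A missing field yields a null partition key value.
        return "missing field"
    if "/" in str(value):
        # Only "/" is checked; other prohibited characters are not covered.
        return "contains '/'"
    return None


sample_logs = [
    {"date": "20170518", "msg": "ok"},
    {"msg": "no date field"},
    {"date": "2017/05/19", "msg": "slash in value"},
]
for log in sample_logs:
    print(skip_reason(log, "date"))
```

A log that returns None here can still be rejected for reasons this sketch does not cover.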