After a data transformation task is started, the data transformation engine sends the transformation results to the target Logstores based on routing rules. This topic describes how to troubleshoot data transformation task failures, such as when no logs are generated in the target Logstores or the data transformation process is significantly delayed.

Analyze errors

When an error occurs, first determine in which step of the data transformation task the error was produced. This helps you locate the cause more efficiently.

According to Processing principles, a data transformation task consists of four steps, as shown in the following figure (Data transformation steps).
Errors may occur in any of the four steps, with different causes, impacts, and troubleshooting methods.
  • Start the data transformation engine.
    • Errors may occur in this step if the domain-specific language (DSL) rules for log transformation fail the internal security audit performed by the data transformation engine.
    • If an error occurs in this step, the data transformation task stops. You need to modify the DSL rules and restart the data transformation task. If the retry succeeds, the data transformation task works properly without losing any logs or generating any duplicate logs.
    For more information about the troubleshooting methods in this step, see Startup errors of the data transformation engine.
  • Read data from the source Logstore.
    • Errors may occur in this step if the source Logstore is inaccessible. Possible causes include the following: The configurations of the source Logstore are incorrect. A network error occurs. The source Logstore information has changed.
    • If an error occurs in this step, the data transformation task will keep retrying until data reading succeeds or is manually stopped. If the retry succeeds, the data transformation task works properly without losing any logs.
    • If an error is returned after some data has been read, the data transformation task saves the breakpoint and keeps retrying. After the retry succeeds, it continues reading from the breakpoint without losing any logs or generating any duplicate logs. If the task is stopped during the retry process, no log is lost and no duplicate log is generated.
    For more information about the troubleshooting methods in this step, see Errors in reading data from the source Logstore.
  • Transform log events.
    • Errors may occur in this step if the transformation rules cannot be applied to some or all log events during the data transformation process.
    • If an error occurs in this step, log events that do not match the transformation rules are discarded and not included in the transformation results.
    For more information about the troubleshooting methods in this step, see Errors in data transformation rules.
  • Export the transformation results to the target Logstores.
    • Errors may occur in this step if the target Logstores are inaccessible. Possible causes include the following: The configurations of the target Logstores are incorrect. A network error occurs. The target Logstore information has changed.
    • If an error occurs in this step, the data transformation task will keep retrying until data export succeeds or is manually stopped. If the retry succeeds, the data transformation task works properly without losing any logs.
    • An error may occur after some data is exported. For example, two target Logstores are specified. Data export to one Logstore succeeds, but to the other Logstore fails. If such an error occurs, the data transformation task saves the breakpoint and keeps retrying. After the retry succeeds, the data transformation task continues data export from the breakpoint without losing any logs or generating any duplicate logs. If the data transformation task is stopped and then restarted when the error occurs, the data transformation task continues from the breakpoint. In this case, no log is lost, but duplicate logs may be generated.
    For more information about the troubleshooting methods in this step, see How can I fix errors that occur during data outputs to the target Logstore?
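The retry and breakpoint behavior described in the steps above can be sketched in Python. This is a simplified illustration of the semantics, not the engine's actual implementation; the batch reader, transform function, and checkpoint store are hypothetical stand-ins.

```python
def run_with_checkpoint(read_batch, transform, write_out, checkpoint):
    """Simplified sketch of the engine's retry loop: on a transient
    failure the saved breakpoint (cursor) lets the task retry and then
    resume without losing logs or generating duplicates."""
    cursor = checkpoint.get("cursor", 0)
    while True:
        batch = read_batch(cursor)               # step 2: read from the source Logstore
        if batch is None:                        # no more data to consume
            break
        try:
            results = [transform(e) for e in batch]  # step 3: transform log events
            write_out(results)                   # step 4: export to the target Logstores
        except IOError:
            continue                             # keep retrying from the same cursor
        cursor += len(batch)                     # advance only after success
        checkpoint["cursor"] = cursor            # save the breakpoint
```

Because the cursor advances only after a successful write, a failed attempt is simply retried from the same position, which matches the "no loss, no duplication" behavior described above.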

Troubleshoot common errors

  1. Check whether data has been written to a target Logstore.
    Check whether data has been written to a target Logstore recently by viewing data on the Consumption Preview page of the target Logstore (Consumption preview).
    Note The consumption preview data may be inaccurate because of the following causes:
    • In Log Service, logs are transformed based on the time at which they are received. When historical logs are written for transformation, the time range specified for querying the historical logs may differ from the time at which the logs are actually written.
    • Querying log data based on indexes usually lags several minutes behind data writes. If historical logs are being written by a data transformation task, the log data may not be immediately queryable in the console.
  2. View the status of the data transformation task.
    • Check whether the current task is started. For more information, see View task status. Tasks with a fixed time range automatically stop at the end time.
    • Check whether the consumer group in the current task is enabled and up to date (Consumption status).
    • Check whether errors occur by referring to View error logs. Find the causes and fix the errors by referring to Analyze errors.
  3. Check whether data is generated in the source Logstore.
    Check whether logs exist in the source Logstore within the time range of the current data transformation task.
    • If the end time of the time range is not set, check whether new logs are generated in the source Logstore. If no new logs are generated and no historical logs exist within the specified time range, the data transformation task has no data to process.
    • If you select a time range in the past, check whether logs within that range exist in the source Logstore.

    Click Modify a rule for the data transformation task, select a time range, and then check whether raw logs exist in the specified time range.

  4. Check whether the transformation rules are correct.
    Check whether any exceptions exist in the transformation rule code. For example, the following exceptions may occur:
    • The log time is modified. As a result, no logs are queried in the current time range.
    • The transformation rule code discards logs under specific conditions.
      For example, the following code discards all log events that do not have the name field or whose name field is empty. The logic that precedes this statement builds the name field. If the name field is not built correctly because of an issue in that preceding logic, no logs are generated.
      # ... preceding logic ...
      # ... builds the name field ...
      
      e_keep(e_search('name: "?"'))
    • If the task pulls data from a third party for data enrichment, check whether the third-party data size is too large. If it is, the data transformation task may remain in the initializing state and fail to start consuming data in a timely manner. An example is as follows:
      e_dict_map(res_rds_mysql(...database="userinfo", table="user"), "username", ["city", "school", "age"])

    Click Modify a rule for the data transformation task, select a time range, and then click Preview Data to view the result.

    If logs are queried, comment out the specific statement that causes the error and preview data again.
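In plain Python terms, the e_keep(e_search('name: "?"')) rule from the example above behaves roughly like the following filter. This sketch only illustrates the discard semantics; it is not how the DSL engine actually evaluates search expressions.

```python
def keep_with_nonempty_name(events):
    """Keep only log events that carry a non-empty "name" field.

    Mirrors e_keep(e_search('name: "?"')): events that lack the field,
    or whose value is empty, are discarded and never reach the target
    Logstore."""
    return [event for event in events if event.get("name")]
```

If preview returns no output, applying this kind of filter mentally to a few raw events helps you confirm whether the preceding logic really produced a non-empty name field.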

  5. Check whether the number of shards is as expected.

    If data transformation is slow, check whether the planning of the source and target Logstores meets performance expectations. If not, we recommend that you adjust the number of shards in the source or target Logstore.
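As a rough capacity check, you can estimate the required shard count from the expected throughput. The per-shard figure used below (5 MB/s of write capacity) is a typical documented Log Service limit, but verify it against the current product limits before resizing.

```python
import math

def min_shards(throughput_mb_per_s, per_shard_mb_per_s):
    """Minimum number of shards needed to sustain the given throughput."""
    return max(1, math.ceil(throughput_mb_per_s / per_shard_mb_per_s))

# A task that must write 12 MB/s to a target Logstore needs at least
# ceil(12 / 5) = 3 shards, assuming 5 MB/s of write capacity per shard.
```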

View error logs

You can view error logs in the following ways:
  • View error logs in the internal-etl-log Logstore.
    Logs generated by a data transformation task are stored in the internal-etl-log Logstore. The system automatically creates this Logstore when a data transformation task runs.
    • The internal-etl-log Logstore is a dedicated Logstore provided for you free of charge. You cannot modify its configuration or write any other data to it.
    • In the internal-etl-log Logstore, the __topic__ field of each log event indicates the status of the data transformation task. You can check whether an error occurs in the data transformation task based on this field.
    • You can check the message and reason fields of each log event to view the detailed error information, as shown in the following figure (Error information in the Logstore).
  • View error logs on the dashboard.

    Click the data transformation task and check the dashboard data in the Status section on the Data Transformation Overview page.

    The error information details appear in the reason column of the Exception detail section (Error information on the dashboard).
  • View error logs in the console.

    Error logs generated in the preview phase appear directly in the console. In the preview phase, you simulate the operations of the transformation rule and check whether the preview data meets expectations. The preview operation does not make actual changes to the source or target Logstores. Therefore, errors that occur in the preview phase do not affect the source log events.
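If you pull events from the internal-etl-log Logstore programmatically (for example, with a Log Service SDK), you can separate failure records from routine status records by inspecting the __topic__, reason, and message fields described above. The helper below is a hypothetical sketch; the field names come from this topic, but the sample values are illustrative only.

```python
def extract_errors(etl_logs):
    """Collect (topic, reason, message) triples for internal-etl-log
    events that look like failures: a non-empty reason field, or an
    "error" marker in the message field."""
    errors = []
    for event in etl_logs:
        reason = event.get("reason", "")
        message = event.get("message", "")
        if reason or "error" in message.lower():
            errors.append((event.get("__topic__", ""), reason, message))
    return errors
```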

Preview restrictions

Compared to real data transformation tasks, data transformation in the preview phase has the following restrictions:
  • An invalid AccessKey of the RAM user used for accessing the source Logstore cannot be identified.

    In the preview phase, no consumer groups are created to consume data. Therefore, the system does not check the permissions of the RAM user on consumer groups.

  • A name error of the target Logstore in the transformation rule cannot be identified.

    Data is not really written to the target Logstore in the preview phase. Therefore, the system does not check whether a correct target Logstore is configured in the transformation rule.

  • The configuration errors of the target Logstore cannot be identified.
    • The configuration errors include incorrect target project, target Logstore, and AccessKey permission configurations.
    • Data is not really written to the target Logstore in the preview phase. Therefore, the system does not check whether the configurations of the target Logstore are correct.
  • Only partial data is pulled in the preview phase.
    • By default, only 1,000 data records are pulled from the source Logstore for data transformation during preview.
    • If no transformation result is generated after the first 1,000 data records are pulled and transformed, the system continues to pull data for up to 5 minutes, or until a transformation result is generated.
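The pull strategy described above can be summarized as: transform a first batch of 1,000 records, and if nothing comes out, keep pulling for up to 5 minutes or until some result appears. The following is a minimal sketch of that loop; the batch source, transform function, and clock are injectable parameters for illustration, not the service's real internals.

```python
import time

def preview(pull_batch, transform, batch_size=1000, timeout_s=300,
            clock=time.monotonic):
    """Pull and transform records until a result is produced, the source
    is exhausted, or the timeout elapses, mimicking the preview phase's
    pull strategy."""
    deadline = clock() + timeout_s
    results = transform(pull_batch(batch_size))   # first batch of records
    while not results and clock() < deadline:
        batch = pull_batch(batch_size)
        if not batch:                             # source exhausted
            break
        results = transform(batch)
    return results
```

Injecting the clock makes the timeout behavior easy to test without waiting for real time to pass.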