You can upload log files to Object Storage Service (OSS) buckets for storage and then import the log data from OSS to Log Service, where you can query, analyze, and transform the data. You can import only OSS objects that are no more than 5 GB in size. For a compressed object, the 5 GB limit applies to the size of the compressed object.

Prerequisites

  • Log files are uploaded to an OSS bucket. For more information, see Upload objects.
  • A project and a Logstore are created. For more information, see Create a project and Create a Logstore.
  • Log Service is authorized to assume the AliyunLogImportOSSRole role to access your OSS resources. You can complete the authorization on the Cloud Resource Access Authorization page.
    If you use a RAM user, you must grant the PassRole permission to the RAM user. The following example shows a policy that you can use to grant the PassRole permission. For more information, see Create a custom policy and Grant permissions to the RAM user.
    {
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "ram:PassRole",
          "Resource": "acs:ram:*:*:role/aliyunlogimportossrole"
        }
      ],
      "Version": "1"
    }

Create a data import configuration

Important A data import task imports the full data of an updated OSS object. If new data is appended to an OSS object that has already been imported to Log Service, all data of the object is re-imported the next time a data import task for the object runs.
  1. Log on to the Log Service console.
  2. On the Data Import tab in the Import Data section, click OSS - Data Import.
  3. Select the project and Logstore. Then, click Next.
  4. In the Configure Import Settings step, create a data import configuration.
    1. In the Configure Import Settings step, configure the parameters. The following table describes the parameters.
      Config Name: The name of the data import configuration.
      OSS Region: The region where the OSS bucket that stores the objects to import resides.

      If the OSS bucket and the Log Service project reside in the same region, no Internet traffic is generated, and data is transferred at a high speed.

      Bucket: The OSS bucket that stores the objects to import.
      File Path Prefix Filter: The directory of the OSS objects. If you configure this parameter, the system can locate the objects to import more efficiently. For example, if the objects that you want to import are stored in the csv/ directory, set this parameter to csv/.
      If you leave this parameter empty, the system traverses the entire OSS bucket to find the objects.
      Note We recommend that you configure this parameter. The more objects an OSS bucket contains, the less efficient a full bucket traversal becomes.
      File Path Regex Filter: The regular expression that is used to filter OSS objects by object path. If you configure this parameter, the system can locate the objects to import more efficiently. Only the objects whose names, including their paths, match the regular expression are imported. By default, this parameter is empty, which indicates that no filtering is performed. A Python sketch after this procedure shows one way to test such an expression locally.

      For example, if an OSS object that you want to import is named testdata/csv/bill.csv, you can set this parameter to (testdata/csv/)(.*).

      For information about how to test a regular expression, see How do I test a regular expression?

      File Modification Time Filter: The modification time that is used to filter OSS objects. If you configure this parameter, the system can locate the objects to import more efficiently. Valid values:
      • All: To import all the OSS objects that meet specified conditions, select this option.
      • Start at a specified time: To import OSS objects that are modified after a specified point in time, select this option.
      • Within Specific Period: To import OSS objects that are modified within a specified time range, select this option.
      Data Format: The format of the OSS objects. Valid values:
      • CSV: You can specify the first line of an OSS object as field names or specify custom field names. All lines except the first line are parsed as the values of log fields.
      • Single-line JSON: An OSS object is read line by line. Each line is parsed as a JSON object. The fields in JSON objects are log fields.
      • Single-line Text: Each line in an OSS object is parsed as a log.
      • Multi-line Text: Multiple lines in an OSS object are parsed as a log. You can specify a regular expression to match the first line or the last line in a log.
      • ORC: An OSS object in the Optimized Row Columnar (ORC) format is automatically parsed into the format that is supported by Log Service without manual configurations.
      • Parquet: An OSS object in the Parquet format is automatically parsed into the format that is supported by Log Service without manual configurations.
      • Alibaba Cloud OSS Access Log: OSS data is parsed based on the format of access logs of Alibaba Cloud OSS. For more information, see Logging.
      • Alibaba Cloud CDN Download Log: OSS data is parsed based on the format of Alibaba Cloud CDN download logs. For more information, see Download logs.
      Compression Format: The compression format of the OSS objects that you want to import. Log Service decompresses the objects based on the specified format to read data.
      Encoding Format: The encoding format of the OSS objects that you want to import. Only UTF-8 and GBK are supported.
      New File Check Cycle: If new objects are continuously generated in the specified directory, set an interval for this parameter based on your business requirements. After you configure this parameter, the data import task runs continuously in the background, automatically detects new objects at the specified interval, and reads them. The task ensures that data in an OSS object is not repeatedly written to Log Service.

      If new objects are no longer generated in the specified directory of OSS objects, you can change the value of the New File Check Cycle parameter to Never Check. This value indicates that a data import task automatically exits after all the objects that meet specified conditions are read.

      Import Archive Files: If the OSS objects that you want to import are of the Archive or Cold Archive storage class, Log Service can read data from the objects only after they are restored. If you turn on this switch, Archive and Cold Archive objects are automatically restored.
      Note
      • Restoring Archive objects requires approximately 1 minute, which may cause the first preview to time out. If the first preview times out, try again later.
      • Restoring Cold Archive objects requires approximately 1 hour. If the preview times out, you can skip the preview or try again 1 hour later.

        By default, restored Cold Archive objects remain valid for seven days. This gives the system sufficient time to import the Cold Archive objects to Log Service.

      Log Time Settings
      Time Field: The name of the time column in an OSS object. If you set the Data Format parameter to CSV, Single-line JSON, ORC, Parquet, Alibaba Cloud OSS Access Log, or Alibaba Cloud CDN Download Log, you must configure this parameter. The value of this field is used as the log time of the imported logs.
      Regex to Extract Time: The regular expression that is used to extract the log time. If you set the Data Format parameter to Single-line Text or Multi-line Text, you must configure this parameter. A Python sketch after this procedure shows one way to test such an expression locally.
      For example, to extract the log time from the log 127.0.0.1 - - [10/Sep/2018:12:36:49 0800] "GET /index.html HTTP/1.1", you can set the Regex to Extract Time parameter to [0-9]{0,2}\/[0-9a-zA-Z]+\/[0-9:,]+.
      Note For other data formats, if you want to extract only part of the time field, you can also specify a regular expression.
      Time Field Format: The time format that is used to parse the value of the time field. A sketch after this procedure illustrates how the format and time zone determine the parsed log time.
      • The time formats that follow the syntax defined in the Java SimpleDateFormat class are supported. Example: yyyy-MM-dd HH:mm:ss. For more information about the time format syntax, see Class SimpleDateFormat. For more information about common time formats, see Time formats.
      • You can use an epoch time format, such as epoch, epochMillis, epochMicro, or epochNano.
      Time Zone: The time zone that corresponds to the time field. If the Time Field Format parameter is set to an epoch time format, you do not need to configure this parameter.

      If daylight saving time (DST) and winter time are used in your time zone, select UTC. Otherwise, select GMT.

      Advanced Settings
      OSS Metadata Indexing: If the number of OSS objects exceeds one million, we recommend that you turn on this switch to improve the efficiency of new file discovery. With metadata indexing enabled, the system can find new objects in an OSS bucket within seconds, and data in the new objects can be written to Log Service in near real time.

      Before you use the metadata indexing feature, you must enable the metadata management feature in OSS. For more information, see Data indexing.

      If you set the Data Format parameter to CSV or Multi-line Text, you must configure additional parameters. The following tables describe the parameters.
      • Unique parameters when you set Data Format to CSV. A sketch after this procedure illustrates how these settings affect parsing.
        Delimiter: The delimiter for logs. The default value is a comma (,).
        Quote: The quote that is used to enclose a CSV-formatted string.
        Escape Character: The escape character for logs. The default value is a backslash (\).
        Maximum Lines: The maximum number of lines allowed in a log if the original log spans multiple lines. Default value: 1.
        First Line as Field Name: If you turn on First Line as Field Name, the first line of a CSV file is used as the field names.
        Custom Fields: If you turn off First Line as Field Name, you can specify custom field names based on your business requirements. Separate multiple field names with commas (,).
        Lines to Skip: The number of lines that are skipped. For example, if you set this parameter to 1, the system skips the first line of a CSV file and starts collecting logs from the second line.
      • Unique parameters when you set Data Format to Multi-line Text. A sketch after this procedure illustrates how a first-line regular expression groups lines into logs.
        Position to Match with Regex: Specifies how the regular expression is applied.
        • If you select Regex to Match First Line Only, the regular expression is used to match the first line of a log. Unmatched lines are collected as part of the current log until the specified maximum number of lines is reached.
        • If you select Regex to Match Last Line Only, the regular expression is used to match the last line of a log. Unmatched lines are collected as part of the next log until the specified maximum number of lines is reached.
        Regular Expression: The regular expression. You can specify a regular expression based on the log content.

        For information about how to test a regular expression, see How do I test a regular expression?

        Max Lines: The maximum number of lines allowed in a log.
    2. Click Preview to preview the import result.
    3. Confirm the settings and click Next.
  5. Preview data, configure indexes, and then click Next.
    By default, full-text indexing is enabled for Log Service. You can also configure field indexes based on collected logs in manual mode or automatic mode. To configure field indexes in automatic mode, click Automatic Index Generation. This way, Log Service automatically creates field indexes. For more information, see Create indexes.
    Important If you want to query and analyze logs, you must enable full-text indexing or field indexing. If you enable both full-text indexing and field indexing, the system uses only field indexes.
  6. Click Log Query. On the Search & Analysis page, check whether OSS data is imported.
    Wait for approximately 1 minute. If the expected OSS data is displayed, the import is successful.
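The File Path Regex Filter and Regex to Extract Time parameters both take standard regular expressions. The following Python sketch shows one way to test such expressions locally before you save the configuration. The object keys are hypothetical, and the sketch assumes that the path filter must match the full object key, so verify the behavior against your own data.

  import re

  # Hypothetical object keys; replace them with keys from your own bucket.
  object_keys = [
      "testdata/csv/bill.csv",
      "testdata/csv/2023/usage.csv",
      "backup/archive.tar.gz",
  ]

  # Example value of File Path Regex Filter from the table above.
  # Assumption: the expression must match the full object key.
  path_filter = re.compile(r"(testdata/csv/)(.*)")

  for key in object_keys:
      matched = path_filter.fullmatch(key) is not None
      print(f"{key}: {'import' if matched else 'skip'}")

  # Example value of Regex to Extract Time from the table above.
  time_regex = re.compile(r"[0-9]{0,2}\/[0-9a-zA-Z]+\/[0-9:,]+")
  line = '127.0.0.1 - - [10/Sep/2018:12:36:49 0800] "GET /index.html HTTP/1.1"'
  match = time_regex.search(line)
  print(match.group(0) if match else "no time found")  # 10/Sep/2018:12:36:49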
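The Time Field Format and Time Zone parameters work together: the format tells the importer how to parse the raw value, and the time zone anchors a value that carries no offset of its own. The following Python sketch only illustrates that relationship with hypothetical values. The console itself expects Java SimpleDateFormat syntax (for example, yyyy-MM-dd HH:mm:ss); the %-style pattern below is just a local-testing equivalent.

  from datetime import datetime, timezone, timedelta

  # Hypothetical value of the time field in an imported object.
  raw_value = "2023-06-01 08:30:00"

  # The console setting "yyyy-MM-dd HH:mm:ss" roughly corresponds to this strptime pattern.
  parsed = datetime.strptime(raw_value, "%Y-%m-%d %H:%M:%S")

  # The Time Zone parameter tells the importer which zone the plain value belongs to.
  # Here we assume the field was written in UTC+8.
  log_time = parsed.replace(tzinfo=timezone(timedelta(hours=8)))
  print(log_time.isoformat())  # 2023-06-01T08:30:00+08:00

  # Epoch formats (epoch, epochMillis, and so on) are already absolute, so no time zone is needed.
  epoch_millis = 1685608200000  # hypothetical epochMillis value
  print(datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc))  # 2023-06-01 08:30:00+00:00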
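The CSV-specific parameters map onto ordinary CSV parsing options. The following Python sketch is only an illustration of how Delimiter, Quote, Escape Character, Lines to Skip, and First Line as Field Name shape the resulting logs; it is not the importer's implementation, and the sample content is hypothetical.

  import csv
  import io

  # Hypothetical CSV object content; the first line carries the field names.
  raw = (
      "time,level,message\n"
      '2023-06-01 08:30:00,INFO,"service started"\n'
      '2023-06-01 08:30:05,WARN,"disk usage high"\n'
  )

  lines_to_skip = 0                             # Lines to Skip
  first_line_is_header = True                   # First Line as Field Name
  custom_fields = ["time", "level", "message"]  # used only when the switch is off

  reader = csv.reader(io.StringIO(raw), delimiter=",", quotechar='"', escapechar="\\")
  rows = list(reader)[lines_to_skip:]

  if first_line_is_header:
      fields, data_rows = rows[0], rows[1:]
  else:
      fields, data_rows = custom_fields, rows

  for row in data_rows:
      # Each data line becomes one log whose keys are the field names.
      print(dict(zip(fields, row)))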
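For Multi-line Text, the regular expression decides where one log ends and the next begins. The following Python sketch models the Regex to Match First Line Only behavior on a hypothetical stack-trace-style sample; it is a simplified model of the grouping logic, not the importer's implementation.

  import re

  # Hypothetical multi-line log content: each log starts with a timestamped line.
  raw = (
      "2023-06-01 08:30:00 ERROR unexpected exception\n"
      "    at com.example.Demo.run(Demo.java:42)\n"
      "    at com.example.Demo.main(Demo.java:10)\n"
      "2023-06-01 08:30:05 INFO service recovered\n"
  )

  # Regular Expression (Regex to Match First Line Only): a line that starts a new log.
  first_line = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} .*")
  max_lines = 10  # Max Lines

  logs, current = [], []
  for line in raw.splitlines():
      starts_new_log = first_line.match(line) is not None
      if current and (starts_new_log or len(current) >= max_lines):
          logs.append("\n".join(current))
          current = []
      current.append(line)
  if current:
      logs.append("\n".join(current))

  for log in logs:
      print(repr(log))  # two logs: the ERROR log with its stack trace, then the INFO log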

View the data import configuration

After you create the data import configuration, you can view the configuration details and related statistical reports in the Log Service console.

  1. In the Projects section, click the project to which the data import configuration belongs.
  2. In the left-side navigation pane, choose Log Storage > Logstores. Click the Logstore to which the data import configuration belongs, choose Data Import > Data Import, and then click the name of the data import configuration.
  3. On the Import Configuration Overview page, view the basic information and statistical reports of the data import configuration.

What to do next

On the Import Configuration Overview page, you can perform the following operations on the data import configuration:

  • Modify the configuration

    To modify the data import configuration, click Modify Settings. For more information, see Create a data import configuration.

  • Delete the configuration
    To delete the data import configuration, click Delete Configuration.
    Warning After the data import configuration is deleted, it cannot be recovered.
  • Stop a task

    To stop the data import task, click Stop.

FAQ

Issue: No data is displayed during preview.
Cause: The OSS bucket contains no objects, the objects contain no data, or no objects meet the filter conditions.
Solution:
  • Check whether the OSS bucket contains objects that are not empty and whether the CSV files contain only header lines. If no OSS objects contain data, you cannot import the objects until they contain data.
  • Modify the File Path Prefix Filter, File Path Regex Filter, and File Modification Time Filter parameters.
Issue: Garbled characters exist.
Cause: The data format, compression format, or encoding format is not configured correctly.
Solution: Check the actual format of the OSS objects, and then modify the Data Format, Compression Format, or Encoding Format parameter.

To re-import the data that has already been imported as garbled characters, create a new Logstore and a new data import configuration.
Issue: The log time displayed in Log Service is different from the actual time of the data.
Cause: The time field is not specified in the data import configuration, or the specified time format or time zone is invalid.
Solution: Specify a time field, or specify a valid time format or time zone. For more information, see Create a data import configuration.
Issue: After data is imported, the data cannot be queried or analyzed.
Cause:
  • The data is not within the query time range.
  • No indexes are configured.
  • The indexes failed to take effect.
Solution:
  • Check whether the time of the data that you want to query is within the query time range.

    If not, adjust the query time range and query the data again.

  • Check whether indexes are configured for the Logstore to which the data is imported.

    If not, configure indexes first. For more information, see Create indexes and Reindex logs for a Logstore.

  • If indexes are configured for the Logstore and the expected volume of imported data is displayed on the Data Processing Insight dashboard, the possible cause is that the indexes failed to take effect. In this case, reindex the data. For more information, see Reindex logs for a Logstore.
Issue: The number of imported data entries is less than expected.
Cause: Some OSS objects contain lines that are larger than 3 MB in size. These lines are discarded during the import. For more information, see Limits on collection.
Solution: When you write data to an OSS object, make sure that the size of a single line does not exceed 3 MB. A sketch after this table shows a simple pre-upload check.
Issue: The number of OSS objects and the total volume of data are large, but the import speed does not meet your expectation. In most cases, the import speed can reach 80 MB/s.
Cause: The number of shards in the Logstore is excessively small. For more information, see Limits on performance.
Solution: If the number of shards in the Logstore is small, increase the number of shards to 10 or more and check whether the latency improves. For more information, see Manage shards.
Issue: You cannot select an OSS bucket when you create a data import configuration.
Cause: Log Service is not authorized to assume the AliyunLogImportOSSRole role.
Solution: Complete the authorization. For more information, see the "Prerequisites" section of this topic.
Issue: Some OSS objects failed to be imported to Log Service.
Cause: The filter conditions are invalid, or the size of a single object exceeds 5 GB. For more information, see Limits on collection.
Solution:
  • Check whether the OSS objects that you want to import meet the filter conditions. If they do not, modify the filter conditions.
  • Make sure that the size of each OSS object that you want to import does not exceed 5 GB.

    If the size of an object exceeds 5 GB, reduce the size of the object.
Issue: No Archive objects are imported to Log Service.
Cause: Import Archive Files is turned off. For more information, see Limits on collection.
Solution:
  • Method 1: Modify the existing data import configuration and turn on Import Archive Files.
  • Method 2: Create a new data import configuration and turn on Import Archive Files.
Issue: Multi-line text logs are incorrectly parsed.
Cause: The regular expression that is used to match the first line or the last line of a log is invalid.
Solution: Check whether the regular expression that is used to match the first line or the last line of a log is valid.
Issue: The latency to import new OSS objects is higher than expected.
Cause: The number of existing OSS objects that meet the conditions specified by File Path Prefix Filter exceeds the limit, and OSS Metadata Indexing is turned off in the data import configuration.
Solution: If the number of existing OSS objects that meet the conditions specified by File Path Prefix Filter exceeds one million, turn on OSS Metadata Indexing in the data import configuration. Otherwise, the efficiency of new file discovery is low.
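Because lines larger than 3 MB are discarded during import, it can help to check log files before you upload them to OSS. The following Python sketch is one way to do that locally; the file name is hypothetical, and the 3 MB threshold is taken from the limit described above.

  # A small pre-upload check for oversized lines, run against a local log file.
  MAX_LINE_BYTES = 3 * 1024 * 1024  # lines above this size are discarded during import

  def oversized_lines(path):
      """Yield (line number, size in bytes) for every line that exceeds the limit."""
      with open(path, "rb") as f:
          for number, line in enumerate(f, start=1):
              if len(line) > MAX_LINE_BYTES:
                  yield number, len(line)

  # "app.log" is a hypothetical file name; point this at your own log file.
  for number, size in oversized_lines("app.log"):
      print(f"line {number} is {size} bytes and would be dropped during import")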

Error handling

File read failure: If an OSS object fails to be completely read because a network exception occurs or the object is damaged, the data import task automatically retries to read the object. If the object still fails to be read after three retries, it is skipped.

The retry interval is the same as the value of the New File Check Cycle parameter. If the New File Check Cycle parameter is set to Never Check, the retry interval is 5 minutes.

Compression format parsing error: If an OSS object cannot be decompressed because its compression format is invalid, the data import task skips the object.
Data format parsing error:
  • If data in a binary format (ORC or Parquet) fails to be parsed, the data import task skips the OSS object.
  • If data in another format fails to be parsed, the data import task stores the original text content in the content field of logs.
Logstore does not exist: The data import task periodically retries. The task does not resume the import until the Logstore is recreated.

If the Logstore does not exist, the data import task does not skip any OSS objects. Therefore, after the Logstore is recreated, the data import task automatically imports data from the unprocessed objects in the OSS bucket to the Log Service Logstore.

OSS bucket does not exist: The data import task periodically retries. The task does not resume the import until the OSS bucket is recreated.
Permission error: If a permission error occurs when data is read from the OSS bucket or written to the Log Service Logstore, the data import task periodically retries. The task does not resume the import until the error is fixed.

If a permission error occurs, the data import task does not skip any OSS objects. Therefore, after the error is fixed, the data import task automatically imports data from the unprocessed objects in the OSS bucket to the Log Service Logstore.