This topic describes how to import data from Object Storage Service (OSS) to Log Service. After logs are stored in an OSS bucket, you can import them to Log Service, and then search, analyze, and transform the data in Log Service.

Prerequisites

  • An OSS bucket is created and the logs to be imported are stored in the OSS bucket. For more information, see Upload objects.
  • A project and a Logstore are created. For more information, see Create a project and a Logstore.
  • Log Service is authorized to use the AliyunLogImportOSSRole role to access your OSS resources. For more information, see Cloud Resource Access Authorization.
    If you use a RAM user, you must attach the PassRole policy to the RAM user. The following script provides an example. For more information, see Create a custom policy and Grant permissions to a RAM user.
    {
       "Statement": [
           {
               "Effect": "Allow",
               "Action": "ram:PassRole",
               "Resource": "acs:ram:*:*:role/aliyunlogimportossrole"
           }
       ],
       "Version": "1"
    }        

Background information

You can use Logtail or call API operations in Log Service to collect data from multiple data sources. You can also import logs that are stored in OSS, provided that the logs are in a supported file format: JSON, CSV, Parquet, or TEXT.

Create data import configurations

  1. Log on to the Log Service console.
  2. In the Import Data section, select OSS - Object Storage Service.
  3. In the Specify Logstore step, select the target project and Logstore, and click Next.
    You can also click Create Now to create a project and a Logstore. For more information, see Step 1: Create a project and a Logstore.
  4. Configure data import.
    1. On the Specify Data Source tab, set the required parameters. The following table describes the parameters.
      • Config Name: The name of the configuration.
      • OSS Region: The region where the OSS bucket resides.
      • Bucket: The OSS bucket where the objects to be imported reside.
      • Folder Prefix: The prefix of the folder where the OSS objects to be imported reside. This prefix is used to locate the folder of the OSS objects. For example, if the full path of an OSS object is csv/convertcsv.csv, you can set the prefix to csv/.
      • Regular Expression Filter: The regular expression that is used to filter OSS objects. Only the objects whose names match the regular expression are imported. Default value: null. This value indicates that objects are not filtered.
      • Data Format: The format into which the OSS objects are parsed. Valid values:
        • CSV: a delimited text file that uses commas (,) to separate values. You can use the first line of a file as the line of field names, or customize field names. Field values in all lines except the line of field names are parsed into the values of log fields.
        • JSON: Each OSS object is read line by line, and each line is parsed as a JSON object. The fields of the JSON object become log fields.
        • Parquet: OSS objects are parsed in the Parquet format. You cannot preview Parquet-formatted data.
        • Single-line Text: Each line of an OSS object is parsed into a log entry.
        • Multiple-line Text: Multiple lines of an OSS object are parsed into one log entry. You can specify a regular expression to match the first line or the last line of a log entry.
      • Compression Format: The compression format of the OSS objects to be imported. Valid values: Gzip, Bzip2, Snappy, and Uncompressed. Log Service decompresses the OSS objects based on this format before it reads data from the objects. Default value: Uncompressed.
      • Encoding Format: The encoding format of the OSS objects to be imported.
      • Restore Archived Files: If the OSS objects are Archive objects, Log Service cannot read them until they are restored. If you turn on this switch, archived objects are automatically restored.
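The Folder Prefix and Regular Expression Filter parameters work together: the prefix narrows the object listing, and the regular expression then filters the remaining object names. The following is a minimal Python sketch of that selection logic; the helper name is hypothetical, and it assumes the filter must match the full object name:

```python
import re

def select_objects(object_keys, folder_prefix, regex_filter=None):
    """Return the object keys an import job would pick up.

    Hypothetical helper: the listing is first narrowed by the folder
    prefix, then only keys that fully match the optional regular
    expression filter are kept. A filter of None means no filtering.
    """
    selected = [key for key in object_keys if key.startswith(folder_prefix)]
    if regex_filter:
        pattern = re.compile(regex_filter)
        selected = [key for key in selected if pattern.fullmatch(key)]
    return selected

keys = ["csv/convertcsv.csv", "csv/2020/app.log.gz", "json/data.json"]
print(select_objects(keys, "csv/", r"csv/.*\.csv"))  # ['csv/convertcsv.csv']
```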
    2. Click Preview to preview the objects.
    3. Click Next.
    4. On the Specify Data Type tab, set the required parameters. The following table describes the parameters.
      • Basic parameters
        • Use System Time:
          • If you turn on the Use System Time switch, the time field of a parsed log entry is set to the system time at which the log entry is imported.
          • If you turn off the Use System Time switch, you must configure the Time Field, Time Format, and Time Zone parameters.
          Note: We recommend that you turn on the Use System Time switch. You can configure an index for the time field and use the index for log queries. If you import data that was generated earlier than the current time minus the data retention period of the Logstore, the data cannot be queried in the Log Service console. For example, if the data retention period is seven days and you import data that was generated more than seven days ago, no results can be found in the Log Service console.
        • Time Field: If you turn off the Use System Time switch, you must specify the field from which the log time is extracted.
        • Time Format: If you turn off the Use System Time switch, you must specify a time format in the syntax of the Java SimpleDateFormat class. The time format is used to parse the time field. For more information, see Class SimpleDateFormat.
          Note: The Java SimpleDateFormat class does not support UNIX timestamps. If your time field contains UNIX timestamps, set the time format to epoch.
        • Time Zone: If you turn off the Use System Time switch, you must specify the time zone in which the log time is parsed. This parameter is ignored if the time format already contains time zone information.
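As a rough illustration of the Time Field, Time Format, and epoch behavior described above, the following Python sketch converts a time field into a UNIX timestamp. The helper is hypothetical; Python strptime directives stand in for the Java SimpleDateFormat patterns that the console expects, and UTC+8 is an assumed default time zone:

```python
from datetime import datetime, timezone, timedelta

def parse_log_time(value, time_format, tz=timezone(timedelta(hours=8))):
    """Turn a time field into a UNIX timestamp in seconds.

    Hypothetical helper. "epoch" means the field already holds a UNIX
    timestamp; any other format is applied to the field (written here
    in Python strptime syntax rather than SimpleDateFormat syntax).
    """
    if time_format == "epoch":
        return int(float(value))
    dt = datetime.strptime(value, time_format).replace(tzinfo=tz)
    return int(dt.timestamp())

print(parse_log_time("1588411830", "epoch"))                        # 1588411830
print(parse_log_time("2020-05-02 17:30:30", "%Y-%m-%d %H:%M:%S"))   # 1588411830
```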
      • Unique parameters for CSV files
        • Delimiter: The delimiter that is used to separate fields in CSV files. Default value: comma (,).
        • Quote: The character that encloses a field whose value contains the delimiter. Default value: double quotation mark (").
        • Escape Character: The escape character that is used in CSV files. Default value: backslash (\).
        • Max Lines for Multiline Logging: The maximum number of lines that a single log entry can span. Default value: 1.
        • First Line as Field Name: If you turn on the First Line as Field Name switch, the first line of a CSV file is used as the line of field names.
        • Custom Field List: If you turn off the First Line as Field Name switch, you can customize field names based on your business requirements. Separate field names with commas (,).
        • Lines to Skip: The number of lines that are skipped from the beginning of a file before data is read. Default value: 0.
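The CSV parameters above map closely to the options of a standard CSV parser. The following Python sketch shows how Delimiter, Quote, Escape Character, Lines to Skip, and First Line as Field Name could combine to turn a CSV object into log entries; the helper and sample data are illustrative only, not Log Service code:

```python
import csv
import io

# Sample object: one comment line to skip, then a header line,
# then two data rows (one with a quoted field that contains a comma).
raw = """# exported 2020-05-02
time,level,message
2020-05-02 17:30:30,INFO,"service started, port 8080"
2020-05-02 17:30:31,WARN,disk usage high
"""

def parse_csv(text, delimiter=",", quotechar='"', escapechar="\\",
              lines_to_skip=0, first_line_as_field_name=True,
              custom_fields=None):
    """Hypothetical sketch of CSV-to-log-entry parsing."""
    lines = text.splitlines()[lines_to_skip:]
    reader = csv.reader(io.StringIO("\n".join(lines)), delimiter=delimiter,
                        quotechar=quotechar, escapechar=escapechar)
    rows = list(reader)
    fields = rows.pop(0) if first_line_as_field_name else custom_fields
    return [dict(zip(fields, row)) for row in rows]

logs = parse_csv(raw, lines_to_skip=1)
print(logs[0]["message"])  # service started, port 8080
```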
      • Unique parameters for multiple-line text logs
        • Position to Match with Regex:
          • If you select Regex to Match First Line Only, the regular expression that you configure is used to match the first line of a log entry. The unmatched lines are collected as part of the log entry until the specified maximum number of lines is reached.
          • If you select Regex to Match Last Line Only, the regular expression that you configure is used to match the last line of a log entry. The unmatched lines are collected as part of the next log entry until the specified maximum number of lines is reached.
        • Regular Expression: Configure a regular expression based on the log content. For more information, see How do I modify a regular expression?
        • Max Lines: The maximum number of lines that each log entry can contain. Default value: 10.
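The first-line matching mode described above can be sketched as follows, assuming the regular expression is tested against the start of each raw line; the helper is illustrative, not Log Service code:

```python
import re

def group_multiline(lines, first_line_regex, max_lines=10):
    """Group raw lines into log entries in "Regex to Match First Line
    Only" mode: a matching line starts a new entry, and unmatched lines
    are appended to the current entry until max_lines is reached."""
    pattern = re.compile(first_line_regex)
    entries, current = [], []
    for line in lines:
        if pattern.match(line) or len(current) >= max_lines:
            if current:
                entries.append("\n".join(current))
            current = [line]
        else:
            current.append(line)
    if current:
        entries.append("\n".join(current))
    return entries

raw = [
    "2020-05-02 17:30:30 ERROR something failed",
    "Traceback (most recent call last):",
    '  File "app.py", line 10, in main',
    "2020-05-02 17:30:31 INFO recovered",
]
# A timestamp prefix marks the first line of each entry.
print(group_multiline(raw, r"\d{4}-\d{2}-\d{2} "))
```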
    5. Optional: After you specify the required parameters, click Test.
    6. After the test succeeds, click Next.
    7. Optional: On the Specify Scheduling Interval tab, set the required parameters. The following table describes the parameters.
      • Import Interval: The interval at which the OSS objects are imported to Log Service.
      • Import Now: If you turn on the Import Now switch, the OSS data is immediately imported.
    8. Click Next.
  5. In the Configure Query and Analysis step, configure the indexes.
    Indexes are configured by default. You can re-configure the indexes based on your business requirements. For more information, see Enable and configure the index feature for a Logstore.
    Note
    • You must configure Full Text Index or Field Search. If you configure both of them, the settings of Field Search are applied.
    • If the data type of index is long or double, the Case Sensitive and Delimiter settings are unavailable.
  6. Click Next.

View the data import configurations

After you create data import configurations, you can view the configuration details and relevant statistical report in the Log Service console.

  1. In the Projects section, click the name of your project.
  2. Find the Logstore to which the data import configurations belong, choose Data Import > Data Import, and then click the name of the data import configurations.
  3. On the Import Configuration Overview page, view the configuration details and statistical report.

Related operations

On the Import Configuration Overview page, you can perform the following operations:

  • Modify the data import configurations

    Click Modify Settings to modify the data import configurations. For more information, see Configure data import.

  • Delete the data import configurations
    You can click Delete Configuration to delete the data import configurations.
    Note After the data import configurations are deleted, the configurations cannot be recovered.

Appendix

  • Sample time formats
    Time format | Parsing syntax | Parsed value (seconds)
    2020-05-02 17:30:30 | yyyy-MM-dd HH:mm:ss | 1588411830
    2020-05-02 17:30:30:123 | yyyy-MM-dd HH:mm:ss:SSS | 1588411830
    2020-05-02 17:30 | yyyy-MM-dd HH:mm | 1588411800
    2020-05-02 17 | yyyy-MM-dd HH | 1588410000
    20-05-02 17:30:30 | yy-MM-dd HH:mm:ss | 1588411830
    2020-05-02T17:30:30V | yyyy-MM-dd'T'HH:mm:ss'V' | 1588411830
    Sat May 02 17:30:30 CST 2020 | EEE MMM dd HH:mm:ss zzz yyyy | 1588411830
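As a sanity check, the following Python sketch evaluates a few of the sample patterns above, translated from SimpleDateFormat to strptime syntax, under the UTC+8 time zone that the sample values assume:

```python
from datetime import datetime, timezone, timedelta

TZ = timezone(timedelta(hours=8))  # the sample epoch values assume UTC+8

# (time field, Python strptime equivalent of the SimpleDateFormat pattern)
samples = [
    ("2020-05-02 17:30:30", "%Y-%m-%d %H:%M:%S"),  # yyyy-MM-dd HH:mm:ss
    ("2020-05-02 17:30", "%Y-%m-%d %H:%M"),        # yyyy-MM-dd HH:mm
    ("20-05-02 17:30:30", "%y-%m-%d %H:%M:%S"),    # yy-MM-dd HH:mm:ss
]
for value, fmt in samples:
    ts = int(datetime.strptime(value, fmt).replace(tzinfo=TZ).timestamp())
    print(value, "->", ts)
```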
  • Time parsing syntax
    Character | Description | Example
    G | Era designator | AD
    y | Year | 2001
    M | Month in year | July or 07
    d | Day in month | 10
    h | Hour (12-hour clock, 1 to 12) | 12
    H | Hour (24-hour clock, 0 to 23) | 22
    m | Minute in hour | 30
    s | Second in minute | 55
    S | Millisecond | 234
    E | Day of week | Tuesday
    D | Day in year | 360
    F | Day of week in month | 2 (second Wednesday in July)
    w | Week in year | 40
    W | Week in month | 1
    a | AM/PM marker | PM
    k | Hour (24-hour clock, 1 to 24) | 24
    K | Hour (12-hour clock, 0 to 11) | 10
    z | Time zone | Eastern Standard Time
    ' | Text delimiter | Delimiter
    '' | Single quotation mark | None