
Simple Log Service: Import data from OSS to Simple Log Service

Last Updated: Dec 20, 2023

You can upload log files to Object Storage Service (OSS) buckets for storage and then import the log data from OSS to Simple Log Service, where you can query, analyze, and transform the data. You can import only OSS objects that are no more than 5 GB in size. For a compressed object, the 5 GB limit applies to the compressed size.
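
Before you create an import configuration, you can check whether your objects meet the size limit by listing them with the OSS SDK for Python (oss2). The following is a minimal sketch; the endpoint, credentials, bucket name, and prefix are illustrative placeholders for your own values.

    # Sketch: flag objects under a prefix that exceed the 5 GB import limit.
    # Assumes the oss2 package (pip install oss2); all credentials and names
    # below are illustrative placeholders.
    import oss2

    LIMIT = 5 * 1024 ** 3  # 5 GB: the maximum size of a single importable object

    auth = oss2.Auth('<access_key_id>', '<access_key_secret>')
    bucket = oss2.Bucket(auth, 'https://oss-cn-hangzhou.aliyuncs.com', '<bucket_name>')

    # ObjectIterator pages through ListObjects results automatically.
    for obj in oss2.ObjectIterator(bucket, prefix='csv/'):
        if obj.size > LIMIT:
            print('Too large to import: %s (%.2f GB)' % (obj.key, obj.size / 1024 ** 3))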

Prerequisites

  • Log files are uploaded to an OSS bucket. For more information, see Upload objects.

  • A project and a Logstore are created. For more information, see Create a project and Create a Logstore.

  • Simple Log Service is authorized to assume the AliyunLogImportOSSRole role to access your OSS resources. You can complete the authorization on the Cloud Resource Access Authorization page.

    If you use a RAM user, you must grant the PassRole permission to the RAM user. The following example shows a policy that you can use to grant the PassRole permission. For more information, see Create custom policies and Grant permissions to a RAM user.

    {
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "ram:PassRole",
          "Resource": "acs:ram:*:*:role/aliyunlogimportossrole"
        },
        {
          "Effect": "Allow",
          "Action": "oss:GetBucketWebsite",
          "Resource": "*"
        }
      ],
      "Version": "1"
    }
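
    If you manage RAM programmatically, you can create and attach this policy by using the RAM SDK for Python. The following is a minimal sketch, assuming the aliyun-python-sdk-core and aliyun-python-sdk-ram packages; the policy name and RAM user name are illustrative placeholders.

    # Sketch: create the PassRole policy shown above and attach it to a RAM user.
    # The policy name and user name are placeholders, not required values.
    import json
    from aliyunsdkcore.client import AcsClient
    from aliyunsdkram.request.v20150501.CreatePolicyRequest import CreatePolicyRequest
    from aliyunsdkram.request.v20150501.AttachPolicyToUserRequest import AttachPolicyToUserRequest

    POLICY = {
        "Statement": [
            {"Effect": "Allow", "Action": "ram:PassRole",
             "Resource": "acs:ram:*:*:role/aliyunlogimportossrole"},
            {"Effect": "Allow", "Action": "oss:GetBucketWebsite", "Resource": "*"},
        ],
        "Version": "1",
    }

    client = AcsClient('<access_key_id>', '<access_key_secret>', 'cn-hangzhou')

    create = CreatePolicyRequest()
    create.set_PolicyName('AllowPassLogImportOSSRole')
    create.set_PolicyDocument(json.dumps(POLICY))
    client.do_action_with_exception(create)

    attach = AttachPolicyToUserRequest()
    attach.set_PolicyType('Custom')
    attach.set_PolicyName('AllowPassLogImportOSSRole')
    attach.set_UserName('<ram_user_name>')
    client.do_action_with_exception(attach)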

Create a data import configuration

Important

A data import job imports the full content of an OSS object to Simple Log Service. If new data is appended to an object that has already been imported, the entire object, not only the appended data, is re-imported the next time the job reads it.

  1. Log on to the Simple Log Service console.

  2. On the Data Import tab in the Import Data section, click OSS - Data Import.

  3. Select the project and Logstore. Then, click Next.
  4. In the Configure Import Settings step, create a data import configuration.

    1. In the Configure Import Settings step, configure the parameters. The following table describes the parameters.

      Parameter

      Description

      Config Name

      The name of the data import configuration.

      OSS Region

      The region where the OSS bucket resides. The OSS bucket stores the OSS objects that you want to import to Simple Log Service.

      If the OSS bucket and the Simple Log Service project reside in the same region, no Internet traffic is generated, and data is transferred at a high speed.

      Bucket

      The OSS bucket.

      File Path Prefix Filter

      The directory of the OSS objects. If you configure this parameter, the system can find the OSS objects that you want to import in a more efficient manner. For example, if the OSS objects that you want to import are stored in the csv/ directory, you can set this parameter to csv/.

      If you leave this parameter empty, the system traverses the entire OSS bucket to find the OSS objects.

      Note

      We recommend that you configure this parameter. The more objects an OSS bucket contains, the less efficient the import becomes when the entire bucket must be traversed.

      File Path Regex Filter

      The regular expression that is used to filter OSS objects by directory. If you configure this parameter, the system can find the OSS objects that you want to import in a more efficient manner. Only the objects whose names match the regular expression are imported. The names include the paths of the objects. By default, this parameter is left empty, which indicates that no filtering is performed.

      For example, if an OSS object that you want to import is named testdata/csv/bill.csv, you can set this parameter to (testdata/csv/)(.*).

      For information about how to test a regular expression, see How do I test a regular expression?

      File Modification Time Filter

      The modification time that is used to filter OSS objects. If you configure this parameter, the system can find the OSS objects that you want to import in a more efficient manner. Valid values:

      • All: To import all the OSS objects that meet specified conditions, select this option.

      • From Specific Time: To import OSS objects that are modified after a specified point in time, select this option.

      • Within Specific Period: To import OSS objects that are modified within a specified time range, select this option.

      Data Format

      The format of the OSS objects. Valid values:

      • CSV: You can specify the first line of an OSS object as field names or specify custom field names. All lines except the first line are parsed as the values of log fields.

      • Single-line JSON: An OSS object is read line by line. Each line is parsed as a JSON object. The fields in JSON objects are log fields.

      • Single-line Text: Each line in an OSS object is parsed as a log.

      • Multi-line Text: Multiple lines in an OSS object are parsed as a log. You can specify a regular expression to match the first line or the last line in a log.

      • ORC: An OSS object in the Optimized Row Columnar (ORC) format is automatically parsed into the format that is supported by Simple Log Service without manual configurations.

      • Parquet: An OSS object in the Parquet format is automatically parsed into the format that is supported by Simple Log Service without manual configurations.

      • Alibaba Cloud OSS Access Log: OSS data is parsed based on the format of access logs of Alibaba Cloud OSS. For more information, see Logging.

      • Alibaba Cloud CDN Download Log: OSS data is parsed based on the format of download logs of Alibaba Cloud CDN. For more information, see Download logs.

      Compression Format

      The compression format of the OSS objects that you want to import. Simple Log Service decompresses the OSS objects based on the specified format to read data.

      Encoding Format

      The encoding format of the OSS objects that you want to import. Only UTF-8 and GBK are supported.

      New File Check Cycle

      If new objects are constantly generated in the specified directory of OSS objects, you can specify an interval for the New File Check Cycle parameter based on your business requirements. After you configure this parameter, a data import job continuously runs in the background, and new objects are automatically detected and read at regular intervals. This ensures that data in an OSS object is not repeatedly written to Simple Log Service.

      If new objects are no longer generated in the specified directory of OSS objects, you can change the value of the New File Check Cycle parameter to Never Check. This value indicates that a data import job automatically exits after all the objects that meet specified conditions are read.

      Import Archive Files

      If the OSS objects that you want to import are of the Archive or Cold Archive storage class, Simple Log Service can read data from the objects only after the objects are restored. If you turn on this switch, Archive and Cold Archive objects are automatically restored.

      Note
      • Restoring Archive objects requires approximately 1 minute, which may cause the first preview to time out. If the first preview times out, try again later.

      • Restoring Cold Archive objects requires approximately 1 hour. If the preview times out, you can skip the preview or try again 1 hour later.

        By default, restored Cold Archive objects remain valid for seven days, which gives the system sufficient time to import them to Simple Log Service.

      Log Time Settings

      Time Field

      The time field. Enter the name of a time column in an OSS object. If you set the Data Format parameter to CSV, Single-line JSON, ORC, Parquet, Alibaba Cloud OSS Access Log, or Alibaba Cloud CDN Download Log, you must configure this parameter. The value of this field is used as the log time in Simple Log Service.

      Regex to Extract Time

      The regular expression that is used to extract log time. If you set the Data Format parameter to Single-line Text or Multi-line Text, you must configure this parameter.

      For example, to extract log time from the 127.0.0.1 - - [10/Sep/2018:12:36:49 0800] "GET /index.html HTTP/1.1" log, you can set the Regex to Extract Time parameter to [0-9]{0,2}\/[0-9a-zA-Z]+\/[0-9:,]+.

      Note

      For other data types, if you want to extract part of the time field, you can specify a regular expression.

      Time Field Format

      The time format that is used to parse the value of the time field.

      • The time formats that follow the syntax defined in the Java SimpleDateFormat class are supported. Example: yyyy-MM-dd HH:mm:ss. For more information about the time format syntax, see Class SimpleDateFormat. For more information about common time formats, see Time formats.

      • You can use an epoch time format, such as epoch, epochMillis, epochMicro, or epochNano.

      Time Zone

      The time zone that corresponds to the time field. If the value of the Time Field Format parameter is set to an epoch time format, you do not need to configure this parameter.

      If daylight saving time (DST) and winter time are used in your time zone, select UTC. Otherwise, select GMT.

      Advanced configurations

      OSS Metadata Indexing

      If the number of OSS objects exceeds one million, we recommend that you turn on the switch to improve the efficiency of new file discovery. If you enable the metadata indexing feature, the system can find new objects in an OSS bucket within seconds. This way, data in the new objects can be written to Simple Log Service in near real time.

      Before you use the metadata indexing feature, you must enable the metadata management feature in OSS. For more information, see Data indexing.

      If you set the Data Format parameter to CSV or Multi-line Text, you must configure additional parameters. The following tables describe the parameters.

      • Unique parameters when you set Data Format to CSV

        Parameter

        Description

        Delimiter

        The delimiter for logs. The default value is a comma (,).

        Quote

        The quote that is used to enclose a CSV-formatted string.

        Escape Character

        The escape character for logs. The default value is a backslash (\).

        Maximum Lines

        The maximum number of lines allowed in a log if the original log has multiple lines. Default value: 1.

        First Line as Field Name

        If you turn on First Line as Field Name, the first line in a CSV file is extracted as field names.

        Custom Fields

        If you turn off First Line as Field Name, you can specify custom field names based on your business requirements. Separate multiple field names with commas (,).

        Lines to Skip

        The number of lines that are skipped. For example, if you set this parameter to 1, the system skips the first line of a CSV file and starts collecting logs from the second line.

      • Unique parameters when you set Data Format to Multi-line Text

        Parameter

        Description

        Position to Match with Regex

        The usage of a regular expression.

        • If you select Regex to Match First Line Only, the regular expression that you specify is used to match the first line in a log. Unmatched lines are collected as part of that log until the specified maximum number of lines is reached.

        • If you select Regex to Match Last Line Only, the regular expression that you specify is used to match the last line in a log. Unmatched lines are collected as part of the next log until the specified maximum number of lines is reached.

        Regular Expression

        The regular expression. You can specify a regular expression based on the log content.

        For information about how to test a regular expression, see How do I test a regular expression?

        Max Lines

        The maximum number of lines allowed in a log.

    2. Click Preview to preview the result.

    3. Confirm the settings and click Next.

  5. Preview data, configure indexes, and then click Next.
    By default, full-text indexing is enabled for Simple Log Service. You can also configure field indexes based on collected logs in manual mode or automatic mode. To configure field indexes in automatic mode, click Automatic Index Generation. This way, Simple Log Service automatically creates field indexes. For more information, see Create indexes.
    Important If you want to query and analyze logs, you must enable full-text indexing or field indexing. If you enable both full-text indexing and field indexing, the system uses only field indexes.
  6. Click Log Query. On the query and analysis page, check whether OSS data is imported.

    Wait for approximately 1 minute. If the required OSS data is imported, the import is successful.
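
    If you prefer to verify the import programmatically, you can query the Logstore by using the Simple Log Service SDK for Python. The following is a minimal sketch, assuming the aliyun-log-python-sdk package; the endpoint, credentials, project, and Logstore names are illustrative placeholders.

    # Sketch: count the logs that arrived in the last hour to confirm the import.
    # Endpoint, credentials, project, and Logstore are illustrative placeholders.
    import time
    from aliyun.log import LogClient

    client = LogClient('cn-hangzhou.log.aliyuncs.com',
                       '<access_key_id>', '<access_key_secret>')

    now = int(time.time())
    resp = client.get_log('<project_name>', '<logstore_name>',
                          now - 3600, now, query='* | SELECT COUNT(*) AS cnt')
    resp.log_print()  # prints the query result, including the log count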

View a data import configuration

After you create a data import configuration, you can view the configuration details and related reports in the Simple Log Service console.

  1. In the Projects section, click the project to which the data import configuration belongs.

  2. In the left-side navigation pane, choose Log Storage > Logstores. Click the Logstore to which the data import configuration belongs, choose Data Import > Data Import, and then click the name of the data import configuration.

  3. On the Import Configuration Overview page, view the basic information about the data import configuration and the related reports.


What to do next

On the Import Configuration Overview page, you can perform the following operations on the data import configuration:

  • Modify the data import configuration

    To modify the data import configuration, click Modify Settings. For more information, see Create a data import configuration.

  • Delete the data import configuration

    To delete the data import configuration, click Delete Configuration.

    Warning

    After a data import configuration is deleted, it cannot be restored. Proceed with caution.

  • Stop the data import job

    To stop the data import job, click Stop.

Billing

You are not charged for the data import feature of Simple Log Service. You are charged for the traffic and requests that are generated when the OSS API is called to import data. For information about the pricing of the related billable items, see OSS pricing. The daily fee that is incurred when you import data from OSS is calculated by using the following formula:

Daily fee = T × p_read + (N / 10,000) × p_get + (1,440 / M) × (N / 1,000) / 10,000 × p_put

For a worked example, see the sketch after the following field descriptions.

Field

Description

N

The number of objects that are imported from OSS to Simple Log Service per day.

T

The total size of data that is imported from OSS to Simple Log Service per day. Unit: GB.

p_read

The traffic fee per GB of data.

  • If you import data from OSS to Simple Log Service in the same region, outbound traffic over the internal network is generated. This type of traffic is free of charge.

  • If you import data from OSS to Simple Log Service across regions, outbound traffic over the Internet is generated.

p_put

The request fee per 10,000 PUT requests.

Simple Log Service calls the ListObjects operation to query the objects in a bucket. You are charged for PUT requests. The charges are included in your OSS bills. Each call can return a maximum of 1,000 data entries. If you have 1 million new files to import, 1,000 calls are required.

p_get

The request fee per 10,000 GET requests.

M

The interval at which new objects are checked. Unit: minutes.

You can configure the New File Check Cycle parameter when you create a data import configuration.
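
To make the formula concrete, the following sketch computes the daily fee for example values. All unit prices below are illustrative placeholders; use the actual prices from the OSS pricing page for your region.

    # Sketch: daily OSS import fee for example values; all prices are placeholders.
    N = 100_000      # objects imported per day
    T = 50.0         # data imported per day, in GB
    M = 5            # new-file check cycle, in minutes
    p_read = 0.50    # traffic fee per GB (0 for internal-network traffic)
    p_get = 0.01     # request fee per 10,000 GET requests
    p_put = 0.01     # request fee per 10,000 PUT requests

    traffic_fee = T * p_read
    get_fee = N / 10_000 * p_get
    # ListObjects runs 1,440 / M times per day and pages through objects
    # in batches of 1,000; each call is billed as a PUT request.
    put_fee = (1_440 / M) * (N / 1_000) / 10_000 * p_put

    print('Daily fee: %.4f' % (traffic_fee + get_fee + put_fee))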

FAQ

Issue

Cause

Solution

No data is displayed during preview.

The OSS bucket contains no objects, the objects contain no data, or no objects meet the filter conditions.

  • Check whether the OSS bucket contains objects that are not empty, and check whether CSV files contain only the header line. Objects that contain no data cannot be imported until data is written to them.

  • Modify the File Path Prefix Filter, File Path Regex Filter, and File Modification Time Filter parameters.

Garbled characters exist.

The data format, compression format, or encoding format is not configured as expected.

Check the actual format of the OSS objects, and then modify the Data Format, Compression Format, or Encoding Format parameter.

To handle data that was already imported with garbled characters, create a new Logstore and a new data import configuration.

The log time displayed in Simple Log Service is different from the actual time of the imported data.

The time field is not specified in the data import configuration, or the specified time format or time zone is invalid.

Specify a time field or specify a valid time format or time zone. For more information, see Create a data import configuration.

After data is imported, the data cannot be queried or analyzed.

  • The data is not within the query time range.

  • No indexes are configured.

  • The indexes failed to take effect.

  • Check whether the time of the data that you want to query is within the query time range that you specify.

    If not, adjust the query time range and query the data again.

  • Check whether indexes are configured for the Logstore to which the data is imported.

    If not, configure indexes first. For more information, see Create indexes and Reindex logs for a Logstore.

  • If indexes are configured for the Logstore and an expected volume of imported data is displayed on the Data Processing Insight dashboard, the possible cause is that the indexes failed to take effect. In this case, reindex the data. For more information, see Reindex logs for a Logstore.

The number of imported data entries is less than expected.

Some OSS objects contain data in which a line is greater than 3 MB in size. In this case, the data is discarded during the import. For more information, see Limits on collection.

When you write data to an OSS object, make sure that the size of a line does not exceed 3 MB.

The number of OSS objects and the total volume of data are large, but the import speed does not meet your expectation. In most cases, the import speed can reach 80 MB/s.

The number of shards in the Logstore is excessively small. For more information, see Limits on performance.

If the number of shards in a Logstore is small, increase the number of shards to 10 or more and check the latency. For more information, see Manage shards.

You cannot select an OSS bucket when you create a data import configuration.

The AliyunLogImportOSSRole role is not assigned to Simple Log Service.

Complete authorization. For more information, see the "Prerequisites" section of this topic.

Some OSS objects failed to be imported to Simple Log Service.

The settings of the filter conditions are invalid or the size of a single object exceeds 5 GB. For more information, see Limits on collection.

  • Check whether the OSS objects that you want to import meet the filter conditions. If the objects cannot meet the filter conditions, modify the filter conditions.

  • Make sure that the size of each OSS object that you want to import does not exceed 5 GB.

    If the size of an object exceeds 5 GB, reduce the size of the object.

No Archive objects are imported to Simple Log Service.

Import Archive Files is turned off. For more information, see Limits on collection.

  • Method 1: Modify the data import configuration and turn on Import Archive Files.

  • Method 2: Create a new data import configuration and turn on Import Archive Files.

Multi-line text logs are incorrectly parsed.

The specified regular expression that is used to match the first line or the last line in a log is invalid.

Check whether the regular expression that is used to match the first line or the last line in a log is valid. You can test the expression locally, as shown in the sketch after this table.

The latency to import new OSS objects is higher than expected.

The number of existing OSS objects that meet the conditions specified by File Path Prefix Filter exceeds the limit and OSS Metadata Indexing is turned off in the data import configuration.

If the number of existing OSS objects that meet the conditions specified by File Path Prefix Filter exceeds one million, turn on OSS Metadata Indexing in the data import configuration. Otherwise, the efficiency of new file discovery is low.
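
You can test the regular expressions in a data import configuration locally before you save the configuration. The following is a minimal sketch that uses the Python standard library re module; the sample object name, log line, and first-line pattern are illustrative.

    # Sketch: locally test the regexes used in an import configuration.
    import re

    # File Path Regex Filter: the object name, including its path, must match.
    path_filter = re.compile(r'(testdata/csv/)(.*)')
    print(bool(path_filter.fullmatch('testdata/csv/bill.csv')))  # True

    # Regex to Extract Time: extracts the time portion from a raw log line.
    time_regex = re.compile(r'[0-9]{0,2}\/[0-9a-zA-Z]+\/[0-9:,]+')
    line = '127.0.0.1 - - [10/Sep/2018:12:36:49 0800] "GET /index.html HTTP/1.1"'
    print(time_regex.search(line).group())  # 10/Sep/2018:12:36:49

    # Regex to Match First Line Only: a matching line starts a new multi-line log;
    # the pattern below is a hypothetical example for logs that start with a date.
    first_line = re.compile(r'\[\d{4}-\d{2}-\d{2}')
    for l in ['[2023-12-20 10:00:00] error', '  at frame 1', '  at frame 2']:
        print('new log ' if first_line.match(l) else 'append  ', l)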

Error handling

Item

Description

File read failure

If an OSS object fails to be completely read because a network exception occurs or the object is damaged, the corresponding data import job automatically retries to read the object. If the object fails to be read after three retries, the object is skipped.

The retry interval is the same as the value of the New File Check Cycle parameter. If the New File Check Cycle parameter is set to Never Check, the retry interval is 5 minutes.

Compression format parsing error

If the compression format is invalid when an OSS object is decompressed, the corresponding data import job skips the object.

Data format parsing error

  • If data in the binary format (ORC or Parquet) fails to be parsed, the corresponding data import job skips the OSS object.

  • If data in other formats fails to be parsed, the data import job stores the original text content in the content field of logs.

Logstore does not exist

A data import job periodically retries. The data import job does not resume the import until the Logstore is recreated.

If the Logstore does not exist, the data import job does not skip any OSS objects. Therefore, after the Logstore is recreated, the data import job automatically imports data from the unprocessed objects in the OSS bucket to the Simple Log Service Logstore.

OSS bucket does not exist

A data import job periodically retries. The data import job does not resume the import until the OSS bucket is recreated.

Permission error

If a permission error occurs when data is read from an OSS bucket or data is written to a Simple Log Service Logstore, the corresponding data import job periodically retries. The data import job does not resume the import until the error is fixed.

If a permission error occurs, the data import job does not skip any OSS objects. Therefore, after the error is fixed, the data import job automatically imports data from the unprocessed objects in the OSS bucket to the Simple Log Service Logstore.