Simple Log Service: Import data from OSS to Simple Log Service

Last Updated: Apr 15, 2024

You can import log data from Object Storage Service (OSS) buckets to Simple Log Service, and then query, analyze, and transform the data in Simple Log Service. You can import only OSS objects that do not exceed 5 GB in size. For a compressed object, the 5 GB limit applies to the size of the compressed object.

Prerequisites

  • Log files are uploaded to an OSS bucket. For more information, see Upload objects.

  • A project and a Logstore are created. For more information, see Create a project and Create a Logstore.

  • Simple Log Service is authorized to assume the AliyunLogImportOSSRole role to access your OSS resources. You can complete authorization on the Cloud Resource Access Authorization page.

  • The Resource Access Management (RAM) user that you want to use is granted the oss:ListBuckets permission to access OSS buckets. For more information, see Attach a custom policy to a RAM user.

    If you use a RAM user, you must grant the PassRole permission to the RAM user. The following example shows a policy that you can use to grant the permission. For more information, see Create custom policies and Grant permissions to a RAM user.

    {
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "ram:PassRole",
          "Resource": "acs:ram:*:*:role/aliyunlogimportossrole"
        },
        {
          "Effect": "Allow",
          "Action": "oss:GetBucketWebsite",
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": "oss:ListBuckets",
          "Resource": "*"
        }
      ],
      "Version": "1"
    }    
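
    The following minimal sketch checks whether the oss:ListBuckets permission is in effect, using the OSS Python SDK (oss2). The AccessKey pair and the endpoint are placeholders; replace them with the credentials of your RAM user and the endpoint of your region.

    # Sketch: verify that the RAM user can call oss:ListBuckets.
    # Requires the oss2 SDK (pip install oss2); the credentials and the
    # endpoint below are placeholders, not real values.
    import oss2

    auth = oss2.Auth('<yourAccessKeyId>', '<yourAccessKeySecret>')
    service = oss2.Service(auth, 'https://oss-cn-hangzhou.aliyuncs.com')

    # If the permission is granted, this iterates over the buckets that the
    # RAM user can list; a missing permission raises an oss2 exception.
    for bucket_info in oss2.BucketIterator(service):
        print(bucket_info.name)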

Create a data import configuration

Important

If an OSS object has been imported to Simple Log Service and new data is later appended to the object, all data of the object, not only the appended data, is re-imported the next time a data import job runs on the object.

  1. Log on to the Simple Log Service console.

  2. In the Import Data section, click the Data Import tab. Then, click OSS - Data Import.

  3. Select the project and Logstore. Then, click Next.

  4. In the Import Configuration step, create a data import configuration.

    1. In the Import Configuration step, configure the following parameters.

      Parameter

      Description

      Job Name

      The name of the job. The name must be globally unique.

      Display Name

      The display name of the job.

      OSS Region

      The region where the OSS bucket resides. The OSS bucket stores the OSS objects that you want to import to Simple Log Service.

      If the OSS bucket and the Simple Log Service project reside in the same region, no Internet traffic is generated, and data is transferred at a high speed.

      Bucket

      The OSS bucket.

      File Path Prefix Filter

      The directory of the OSS objects. If you configure this parameter, the system can find the OSS objects that you want to import in a more efficient manner. For example, if the OSS objects that you want to import are stored in the csv/ directory, you can set this parameter to csv/.

      If you leave this parameter empty, the system traverses the entire OSS bucket to find the OSS objects.

      Note

      We recommend that you configure this parameter. The more objects an OSS bucket contains, the less efficient the data import becomes when the entire bucket must be traversed.

      File Path Regex Filter

      The regular expression that you want to use to filter OSS objects by directory. If you configure this parameter, the system can find the OSS objects that you want to import in a more efficient manner. Only the objects whose names match the regular expression are imported. The names include the paths of the objects. By default, this parameter is empty, which indicates that no filtering is performed.

      For example, if an OSS object that you want to import is named testdata/csv/bill.csv, you can set this parameter to (testdata/csv/)(.*).

      For more information about how to debug a regular expression, see How do I debug a regular expression?
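
      As an illustration, you can test the example pattern with Python's re module. Whether the service anchors the pattern to the full object name exactly like fullmatch is not specified here; the sketch only shows how to check the pattern against sample paths.

        # Testing the example File Path Regex Filter against sample object names.
        import re

        pattern = re.compile(r'(testdata/csv/)(.*)')
        print(bool(pattern.fullmatch('testdata/csv/bill.csv')))    # True: name matches
        print(bool(pattern.fullmatch('testdata/json/bill.json')))  # False: name is filtered out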

      File Modification Time Filter

      The modification time based on which you want to filter OSS objects. If you configure this parameter, the system can find the OSS objects that you want to import in a more efficient manner. Valid values:

      • All: To import all the OSS objects that meet specified conditions, select this option.

      • From Specific Time: To import OSS objects that are modified after a point in time, select this option.

      • Specific Time Range: To import OSS objects that are modified within a time range, select this option.

      Data Format

      The format of the OSS objects. Valid values:

      • CSV: You can specify the first line of an OSS object as field names or specify custom field names. All lines except the first line are parsed as the values of log fields.

      • Single-line JSON: An OSS object is read line by line. Each line is parsed as a JSON object. The fields in JSON objects are log fields.

      • Single-line Text Log: Each line in an OSS object is parsed as a log.

      • Multi-line Text Logs: Multiple lines in an OSS object are parsed as a log. You can specify a regular expression to match the first line or the last line for a log.

      • ORC: An OSS object of the Optimized Row Columnar (ORC) format is automatically parsed into the format that is supported by Simple Log Service. You do not need to configure further settings.

      • Parquet: An OSS object of the Parquet format is automatically parsed into the format that is supported by Simple Log Service. You do not need to configure further settings.

      • Alibaba Cloud OSS Access Log: An OSS object is parsed as an access log of Alibaba Cloud OSS. For more information, see Logging.

      • Alibaba Cloud CDN Download Log: An OSS object is parsed as a download log of Alibaba Cloud CDN. For more information, see Download offline logs.

      Compression Format

      The compression format of the OSS objects that you want to import. Simple Log Service decompresses the OSS objects based on the specified format to read data.

      Encoding Format

      The encoding format of the OSS objects that you want to import. UTF-8 and GBK are supported.

      New File Check Cycle

      If new objects are constantly generated in the specified directory of OSS objects, you can specify an interval in the New File Check Cycle parameter based on your business requirements. After you configure this parameter, a data import job is continuously running in the background, and new objects are automatically detected and read at regular intervals. The system ensures that data in an OSS object is not repeatedly written to Simple Log Service.

      If new objects are no longer generated in the specified directory of OSS objects, you can change the value of the New File Check Cycle parameter to Never Check. This value indicates that a data import job automatically exits after all the objects that meet specified conditions are read.

      Import Archive Files

      If the OSS objects that you want to import are of the Archive or Cold Archive storage class, Simple Log Service can read data from the objects only after the objects are restored. If you turn on this switch, Archive and Cold Archive objects are automatically restored.

      Note
      • It takes approximately 1 minute to restore Archive objects, which may cause the first preview to time out. If the first preview times out, you must wait for a period of time and try again.

      • It takes approximately 1 hour to restore Cold Archive objects. If the preview times out, you can skip the preview or wait for an hour and try again.

        By default, restored Cold Archive objects remain valid for seven days. This allows sufficient time for the system to import Cold Archive objects to Simple Log Service.

      Log Time Configuration

      Time Field

      The time field. You must enter the name of a time column in an OSS object. If you select CSV, Single-line JSON, ORC, Parquet, Alibaba Cloud OSS Access Log, or Alibaba Cloud CDN Download Log for the Data Format parameter, you must configure this parameter. This parameter specifies the log time.

      Regular Expression to Extract Time

      The regular expression that you want to use to extract log time. If you select Single-line Text Log or Multi-line Text Logs for the Data Format parameter, you must configure this parameter.

      For example, if a sample log is 127.0.0.1 - - [10/Sep/2018:12:36:49 0800] "GET /index.html HTTP/1.1", you can set the Regular Expression to Extract Time parameter to [0-9]{0,2}\/[0-9a-zA-Z]+\/[0-9:,]+.

      Note

      For other data types, if you want to extract part of the time field, you can specify a regular expression.
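
      The following sketch applies the example expression to the sample log with Python's re module; the extracted substring is the part that is parsed as the log time.

        # Applying the example time-extraction regular expression to the sample log.
        import re

        sample = '127.0.0.1 - - [10/Sep/2018:12:36:49 0800] "GET /index.html HTTP/1.1"'
        match = re.search(r'[0-9]{0,2}\/[0-9a-zA-Z]+\/[0-9:,]+', sample)
        print(match.group())  # 10/Sep/2018:12:36:49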

      Time Field Format

      The time format that you want to use to parse the value of the time field.

      • You can specify a time format that is supported by the Java SimpleDateFormat class. Example: yyyy-MM-dd HH:mm:ss. For more information about the time format syntax, see Class SimpleDateFormat. For more information about common time formats, see Time formats.

      • You can specify an epoch time format, which can be epoch, epochMillis, epochMicro, or epochNano.
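
      For illustration, the following sketch shows Python equivalents of the example formats; the timestamp values are hypothetical. The Java SimpleDateFormat pattern yyyy-MM-dd HH:mm:ss corresponds to %Y-%m-%d %H:%M:%S in strptime, and the epoch variants are plain integer timestamps.

        # Python equivalents of the example time formats (values are hypothetical).
        from datetime import datetime, timezone

        # yyyy-MM-dd HH:mm:ss (SimpleDateFormat) ~ %Y-%m-%d %H:%M:%S (strptime)
        print(datetime.strptime('2024-04-15 12:36:49', '%Y-%m-%d %H:%M:%S'))

        # epoch (seconds) and epochMillis (milliseconds) since 1970-01-01 UTC
        print(datetime.fromtimestamp(1713184609, tz=timezone.utc))
        print(datetime.fromtimestamp(1713184609123 / 1000, tz=timezone.utc))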

      Time Zone

      The time zone for the value of the time field. If the value of the Time Field Format parameter is an epoch time format, you do not need to configure this parameter.

      If you want to use daylight saving time when you parse logs, you can select a time zone in UTC. Otherwise, select a time zone in GMT.

      Advanced Settings

      OSS Metadata Indexing

      If the number of OSS objects exceeds one million, we recommend that you turn on the switch. Otherwise, the system requires a long period of time to find new objects. If you turn on this switch, the system can find new objects in an OSS bucket within seconds. This way, data in the new objects can be written to Simple Log Service in near real time.

      Before you turn on this switch, you must enable the metadata management feature in OSS. For more information, see Data indexing.

      If you select CSV or Multi-line Text Logs for the Data Format parameter, you must configure additional parameters. The following tables describe the parameters.

      • Additional parameters when you set the Data Format parameter to CSV

        Parameter

        Description

        Delimiter

        The delimiter for logs. The default value is a comma (,).

        Quote

        The quote that is used to enclose a CSV-formatted string.

        Escape Character

        The escape character for logs. The default value is a backslash (\).

        Maximum Lines

        The maximum number of lines allowed for a log if the original log has multiple lines. Default value: 1.

        First Line as Field Name

        If you turn on First Line as Field Name, the first line of a CSV file is used to extract field names.

        Custom Fields

        If you turn off First Line as Field Name, you can specify custom field names based on your business requirements. Separate multiple field names with commas (,).

        Lines to Skip

        The number of lines that are skipped. For example, if you set this parameter to 1, the first line of a CSV file is skipped, and log collection starts from the second line.
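
        The following sketch illustrates how these CSV parameters map onto parsing, using Python's csv module. This only illustrates the parameter semantics, not the service's own parser, and the sample data is hypothetical.

          # Illustrating Delimiter, Quote, Escape Character, and First Line as Field Name.
          import csv
          import io

          raw = 'ip,method,path\n127.0.0.1,GET,"/index.html"\n'
          reader = csv.reader(io.StringIO(raw), delimiter=',', quotechar='"', escapechar='\\')
          rows = list(reader)

          # With First Line as Field Name turned on, the first row supplies the
          # field names, and each following row becomes one log entry.
          fields, records = rows[0], rows[1:]
          for record in records:
              print(dict(zip(fields, record)))  # {'ip': '127.0.0.1', 'method': 'GET', 'path': '/index.html'}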

      • Additional parameters when you set the Data Format parameter to Multi-line Text Logs

        Parameter

        Description

        Position to Match Regular Expression

        The usage of a regular expression.

        • Regular Expression to Match First Line: If you select this option, the regular expression that you specify is used to match the first line of a log. The unmatched lines are collected as a part of the log until the maximum number of lines that you specify is reached. The sketch after this table illustrates this behavior.

        • Regular Expression to Match Last Line: If you select this option, the regular expression that you specify is used to match the last line for a log. The unmatched lines are collected as a part of the next log until the maximum number of lines that you specify is reached.

        Regular Expression

        The regular expression. You can specify a regular expression based on log content.

        For more information about how to debug a regular expression, see How do I debug a regular expression?

        Maximum Lines

        The maximum number of lines allowed for a log.
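
        The following sketch illustrates the Regular Expression to Match First Line behavior: a line that matches the pattern starts a new log, and non-matching lines attach to the current log. The sample lines and the pattern are hypothetical (a timestamped log line followed by a stack trace).

          # Grouping lines into multi-line logs with a first-line regular expression.
          import re

          first_line = re.compile(r'\[\d{4}-\d{2}-\d{2}')  # e.g. matches "[2024-04-15 ..."
          lines = [
              '[2024-04-15 12:36:49] ERROR something failed',
              'java.lang.RuntimeException: boom',
              '    at com.example.Main.run(Main.java:42)',
              '[2024-04-15 12:36:50] INFO recovered',
          ]

          logs, current = [], []
          for line in lines:
              if first_line.match(line) and current:
                  logs.append('\n'.join(current))  # a matching line starts a new log
                  current = []
              current.append(line)
          if current:
              logs.append('\n'.join(current))

          print(len(logs))  # 2: the error with its stack trace, then the info line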

    2. Click Preview to preview the import result.

    3. After you confirm the result, click Next.

  5. Preview data, configure indexes, and then click Next.

    By default, full-text indexing is enabled for Simple Log Service. You can also configure field indexes based on collected logs in manual or automatic mode. To configure field indexes in automatic mode, click Automatic Index Generation. Then, Simple Log Service automatically creates field indexes. For more information, see Create indexes.

    Important

    If you want to query and analyze logs, you must enable full-text indexing or field indexing. If you enable both full-text indexing and field indexing, the system uses only field indexes.

  6. Click Query Log. On the query and analysis page that appears, check whether OSS data is imported.

    Wait for approximately 1 minute. If the required OSS data exists, the import is successful.
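
    You can also run this check programmatically. The following sketch counts the logs from the last hour using the aliyun-log-python-sdk; the endpoint, project, Logstore, and credentials are placeholders.

      # Sketch: count the logs imported in the last hour with aliyun-log-python-sdk
      # (pip install aliyun-log-python-sdk). All names below are placeholders.
      import time
      from aliyun.log import LogClient, GetLogsRequest

      client = LogClient('cn-hangzhou.log.aliyuncs.com',
                         '<yourAccessKeyId>', '<yourAccessKeySecret>')

      now = int(time.time())
      request = GetLogsRequest(project='my-project', logstore='my-logstore',
                               fromTime=now - 3600, toTime=now, query='*')
      response = client.get_logs(request)
      print(response.get_count())  # number of logs returned for the last hour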

View the data import configuration

After you create the data import configuration, you can view the configuration details and related statistical reports in the Simple Log Service console.

  1. In the Projects section, click the project to which the data import configuration belongs.

  2. On the Log Storage > Logstores tab, click the Logstore to which the data import configuration belongs, choose Data Collection > Data Import, and then click the name of the data import configuration.

  3. On the Import Configuration Overview page, view the basic information and statistical reports of the data import configuration.


What to do next

On the Import Configuration Overview page, you can perform the following operations on the data import configuration:

  • Modify the data import configuration

    To modify the data import configuration, click Edit Configurations. For more information, see Create a data import configuration.

  • Delete the data import configuration

    To delete the data import configuration, click Delete Configuration.

    Warning

    After the data import configuration is deleted, it cannot be restored.

  • Stop a data import job

    To stop a data import job, click Stop.

Billing

You are not charged for the data import feature of Simple Log Service. However, the feature calls OSS API operations, and you are charged by OSS for the traffic and requests that are generated. For more information about the pricing of related billable items, see Pricing of OSS. The daily OSS fee that is generated when you import data from OSS can be estimated by using the following formula:

Daily fee ≈ T × p_read + (N / 10,000) × p_get + ((N / 1,000 + 1,440 / M) / 10,000) × p_put

Field

Description

N

The number of objects that are imported from OSS to Simple Log Service per day.

T

The total size of data that is imported from OSS to Simple Log Service per day. Unit: GB.

p_read

The traffic fee per GB of data.

  • If you import data from OSS to Simple Log Service in the same region, outbound traffic over the internal network is generated. You are not charged for the outbound traffic.

  • If you import data from OSS to Simple Log Service across regions, outbound traffic over the Internet is generated.

p_put

The request fee per 10,000 PUT requests.

Simple Log Service calls the ListObjects operation to query the objects in a bucket. You are charged for PUT requests. The fees are included in your OSS bills. Each call can return up to 1,000 data entries. If you have 1 million new objects to import, 1,000 calls are required.

p_get

The request fee per 10,000 GET requests.

M

The interval at which new objects are detected. Unit: minutes.

You can configure the New File Check Cycle parameter when you create a data import configuration.
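
The following sketch turns the estimate above into code. The formula and all example values are approximations based on the field definitions in this table, not exact billing logic.

    # Rough estimate of the daily OSS fee for an import job, following the
    # approximate formula above. All example values are hypothetical.
    def daily_oss_fee(n_objects, total_gb, check_minutes, p_put, p_get, p_read):
        # PUT-billed ListObjects calls: about N/1,000 calls to list new objects,
        # plus one polling pass every M minutes (1,440/M per day).
        put_calls = n_objects / 1000 + 1440 / check_minutes
        return ((put_calls / 10000) * p_put
                + (n_objects / 10000) * p_get
                + total_gb * p_read)

    # Example: 1,000,000 new objects (100 GB) per day, checked every 5 minutes,
    # with hypothetical unit prices.
    print(daily_oss_fee(1_000_000, 100, 5, p_put=0.01, p_get=0.01, p_read=0.5))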

FAQ

Problem: No data is displayed during preview.

Cause: The OSS bucket contains no objects, the objects contain no data, or no objects meet the filter conditions.

Solution:

  • Check whether the OSS bucket contains objects that are not empty, or whether the CSV files contain only the header line. If no OSS objects contain data, you cannot import the objects until they contain data.

  • Modify the File Path Prefix Filter, File Path Regex Filter, and File Modification Time Filter parameters.

Problem: Garbled characters exist.

Cause: The data format, compression format, or encoding format is not configured as expected.

Solution: Check the actual format of the OSS objects and modify the Data Format, Compression Format, or Encoding Format parameter. To handle the existing garbled characters, create a new Logstore and a new data import configuration.

Problem: The log time displayed in Simple Log Service is different from the actual log time.

Cause: No time field is specified in the data import configuration, or the specified time format or time zone is invalid.

Solution: Specify a time field, or specify a valid time format and time zone. For more information, see Create a data import configuration.

Problem: After data is imported, the data cannot be queried or analyzed.

Cause: The data is not within the query time range, no indexes are configured, or the configured indexes do not take effect.

Solution:

  • Check whether the time of the log that you want to query is within the query time range. If not, adjust the query time range and query the data again.

  • Check whether indexes are configured for the Logstore. If not, configure indexes. For more information, see Create indexes and Reindex logs for a Logstore.

  • If indexes are configured and the volume of imported data is displayed as expected on the Data Processing Insight dashboard, the indexes may not have taken effect. In this case, reconfigure the indexes. For more information, see Reindex logs for a Logstore.

Problem: The number of imported data entries is less than expected.

Cause: Some OSS objects contain lines that are larger than 3 MB in size. Such lines are discarded during the import. For more information, see Limits on collection.

Solution: When you write data to an OSS object, make sure that the size of each line does not exceed 3 MB. You can check files before you upload them, as shown in the sketch at the end of this FAQ.

Problem: The number of OSS objects and the total volume of data are large, but the import speed does not meet expectations. In most cases, the import speed can reach 80 MB/s.

Cause: The number of shards in the Logstore is excessively small. For more information, see Limits on performance.

Solution: Increase the number of shards in the Logstore to 10 or more and check the latency again. For more information, see Manage shards.

Problem: OSS buckets cannot be selected during the creation of a data import configuration.

Cause: The AliyunLogImportOSSRole role is not assigned to Simple Log Service.

Solution: Complete authorization as described in the "Prerequisites" section of this topic.

Problem: Some OSS objects failed to be imported to Simple Log Service.

Cause: The filter conditions are invalid, or the size of a single object exceeds 5 GB. For more information, see Limits on collection.

Solution:

  • Check whether the OSS objects that you want to import meet the filter conditions. If they do not, modify the filter conditions.

  • Check whether the size of each OSS object that you want to import is less than 5 GB. If an object exceeds the limit, reduce its size.

Problem: Archive objects are not imported to Simple Log Service.

Cause: Import Archive Files is turned off. For more information, see Limits on collection.

Solution:

  • Method 1: Modify the data import configuration and turn on Import Archive Files.

  • Method 2: Create a different data import configuration and turn on Import Archive Files.

Problem: An error occurred in parsing an OSS object that is in the Multi-line Text Logs format.

Cause: The regular expression that is specified to match the first line or the last line of a log is invalid.

Solution: Check whether the regular expression that is used to match the first line or the last line of a log is valid.

Problem: The latency to import new OSS objects is higher than expected.

Cause: The number of existing OSS objects that meet the conditions specified by File Path Prefix Filter exceeds the upper limit, and OSS Metadata Indexing is turned off in the data import configuration.

Solution: If the number of existing OSS objects that meet the File Path Prefix Filter conditions exceeds one million, turn on OSS Metadata Indexing in the data import configuration. Otherwise, new files are discovered at low efficiency.
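
As referenced in the entry about discarded lines, the following sketch checks a local log file for lines larger than 3 MB before you upload it to OSS; the file name is a placeholder.

    # Find lines that exceed the 3 MB per-line limit before uploading to OSS.
    MAX_LINE_BYTES = 3 * 1024 * 1024

    def oversized_lines(path):
        # Yields (line_number, size_in_bytes) for every line over the limit.
        with open(path, 'rb') as f:
            for number, line in enumerate(f, start=1):
                if len(line) > MAX_LINE_BYTES:
                    yield number, len(line)

    for number, size in oversized_lines('app.log'):  # 'app.log' is a placeholder
        print(f'line {number}: {size} bytes exceeds the 3 MB limit')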

Error handling

Error

Description

File read failure

If an OSS object fails to be completely read because of a network exception or because the object is damaged, the data import job automatically retries the read. If the read still fails after three retries, the object is skipped.

The retry interval is the same as the value of the New File Check Cycle parameter. If the New File Check Cycle parameter is set to Never Check, the retry interval is 5 minutes.

Compression format parsing error

If an OSS object is in an invalid format, the data import job skips the object during decompression.

Data format parsing error

  • If an OSS object that is in a binary format such as ORC or Parquet fails to be parsed, the data import job skips the OSS object.

  • If data in other formats fails to be parsed, the data import job stores the original text content in the content field of logs.

OSS bucket absence

A data import job periodically retries. The data import job does not resume the import until the OSS bucket is recreated.

Permission error

If a permission error occurs when data is read from an OSS bucket or data is written to a Simple Log Service Logstore, the data import job periodically retries. The data import job does not resume the import until the error is fixed.

If a permission error occurs, the data import job does not skip any OSS objects. Therefore, after the error is fixed, the data import job automatically imports data from the unprocessed objects in the OSS bucket to the Simple Log Service Logstore.