Scenario

DataWorks is the data relay service of Alibaba Cloud. DataWorks can ship the log data that Log Service collects to MaxCompute for storage and analysis. MaxCompute provides offline computing. If you require online analytical processing (OLAP), you can use DataWorks to export both the log data that has been shipped to MaxCompute and the computing results back to Log Service. Log Service then searches and analyzes the exported data in real time.

Implementation

LogHub Writer obtains the data that Reader generates from the DataWorks framework, and converts the data types that DataWorks supports to the string type. When the accumulated data reaches the specified batchSize, LogHub Writer uses the Log Service Java SDK to transfer the whole batch to Log Service in one request. By default, LogHub Writer transfers 1,024 entries at a time. The maximum value of batchSize is 4,096.
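The accumulate-and-flush behavior described above can be sketched as follows. This is a minimal illustration of the batching logic, not the actual LogHub Writer implementation; the `send_batch` callback stands in for the Log Service Java SDK call, and the function name is hypothetical.

```python
# Minimal sketch of LogHub Writer's accumulate-and-flush batching.
# Illustrative only: send_batch stands in for the Log Service SDK call.

def write_records(records, send_batch, batch_size=1024):
    """Convert each record's fields to strings and flush every batch_size rows."""
    if not 1 <= batch_size <= 4096:
        raise ValueError("batchSize must be between 1 and 4096")
    batch = []
    flushed = 0
    for record in records:
        # LogHub Writer converts every supported type to the string type.
        batch.append([str(field) for field in record])
        if len(batch) == batch_size:
            send_batch(batch)
            flushed += 1
            batch = []
    if batch:  # transfer the final partial batch
        send_batch(batch)
        flushed += 1
    return flushed
```

With the default batchSize of 1,024, writing 2,500 records results in three transfers: two full batches and one final batch of 452 entries.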

Prerequisites

  1. You have activated Log Service and created the project and Logstore.
  2. You have activated MaxCompute and created tables.
  3. You have activated DataWorks.

Procedure

  1. Log on to the DataWorks console and create a LogHub data source.

    For more information about how to create a data source, see Step 1 Create a data source.

  2. Create a synchronization task in script mode.
    1. Click Sync Tasks in the left-side navigation pane, and click Script Mode to configure the synchronization task.
      Figure 1. Script Mode


    2. Specify the parameters in the import template.
      Figure 2. Import template


      Parameter Description
      Source type Select ODPS as the type of your data source.
      Data source The name of your data source. You can also click New Source to create a data source.
      Destination type Select LogHub as the type of the shipping destination.
      Data source The name of the shipping destination. Select the LogHub data source created in step 1, or click New Source to create a new one.

      Then, click Confirm to configure the synchronization task.

    3. Enter your configuration.
      The example is as follows:
      {
        "type": "job",
        "version": "1.0",
        "configuration": {
          "setting": {
            "errorLimit": {
              "record": "0"
            },
            "speed": {
              "mbps": "1",
              "concurrent": 1,
              "dmu": 1,
              "throttle": false
            }
          },
          "reader": {
            "plugin": "odps",
            "parameter": {
              "accessKey": "*****",
              "accessId": "*****",
              "column": ["*"],
              "isCompress": "false",
              "partition": ["pt=20161226"],
              "project": "aliyun_account",
              "table": "ak_biz_log_detail"
            }
          },
          "writer": {
            "plugin": "loghub",
            "parameter": {
              "endpoint": "",
              "accessId": "",
              "accessKey": "",
              "project": "",
              "logstore": "",
              "batchSize": "1024",
              "topic": "",
              "time": "time_str",
              "timeFormat": "%Y_%m_%d %H:%M:%S",
              "column": [
                "col0",
                "col1",
                "col2",
                "col3",
                "col4",
                "col5"
              ],
              "datasource": "sls"
            }
          }
        }
      }
      Parameter Required Description
      endpoint Yes The endpoint of Log Service. For more information, see Service endpoint.
      accessId Yes The AccessKey ID of your Alibaba Cloud account or RAM user.
      accessKey Yes The AccessKey secret of your Alibaba Cloud account or RAM user.
      project Yes The name of the destination project in Log Service.
      logstore Yes The name of the destination Logstore in Log Service.
      topic No The field in MaxCompute that you specify as the topic field in Log Service. Default value: an empty string.
      batchSize No The number of entries that LogHub Writer transfers at a time. Default value: 1024. Maximum value: 4096.
      column Yes The name of each column in an entry.
      Note Columns that are not specified in the column parameter are treated as dirty data.
      time No The name of the time field.
      Note If the time field is not specified, the system time is used as the log time by default.
      timeFormat No The format of the time field. If the time field is specified, timeFormat is required. Valid values:
      • bigint: The time field is a UNIX timestamp.
      • A time format string such as %Y_%m_%d %H:%M:%S: The log time is parsed from a string field.

      For example, if the time field is 1529382552 in the bigint type, set timeFormat to bigint. If the time field is 2018_06_19 12:30:25 in the string type, set timeFormat to %Y_%m_%d %H:%M:%S.

      datasource Yes The name of the data source that is defined in DataWorks.
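The two timeFormat modes above can be illustrated with a short sketch. This is a hypothetical helper for reasoning about the configuration, not part of LogHub Writer; it assumes the format string uses strftime-style directives and that string times are interpreted as UTC.

```python
# Hypothetical helper illustrating the two timeFormat modes; not part of
# LogHub Writer itself. Assumes strftime-style directives and UTC times.
from datetime import datetime, timezone

def resolve_log_time(value, time_format):
    """Return the log time as a UNIX timestamp in seconds."""
    if time_format == "bigint":
        # The time field already holds a UNIX timestamp.
        return int(value)
    # Otherwise the time field is a string parsed with the given format.
    dt = datetime.strptime(value, time_format).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())
```

Using the two examples from the table, `resolve_log_time("1529382552", "bigint")` returns the timestamp unchanged, while `resolve_log_time("2018_06_19 12:30:25", "%Y_%m_%d %H:%M:%S")` parses the string into a timestamp.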
  3. Save and run this task.
    Click Save and specify the path to save this synchronization task. You can also run this task directly, or submit it to the scheduling system.
    Figure 3. Run the synchronization task


    • Run the task.

      Click Run to directly start synchronizing all the data.

    • Schedule the task.
      Click Submit to submit the task to the scheduling system. Then, the scheduling system automatically runs this task according to your configuration.
      Note We recommend that you set the scheduling cycle the same as the partition generation cycle. For example, if the partition is generated based on hourly collected data, the scheduling cycle is one hour.

      For more information about scheduling the task, see Ship data to MaxCompute via DataWorks.

Data types

After you import MaxCompute data to Log Service using DataWorks, all data types are converted to the string type, as shown in the following table.

MaxCompute data type Data type after import to LogHub
Long String
Double String
String String
Date String
Boolean String
Bytes String
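The conversion in the table above can be exercised with a small sketch. The sample values are hypothetical, and the exact string formatting that LogHub Writer applies may differ; the point is only that every field arrives in Log Service as a string.

```python
# Illustrative only: every MaxCompute value becomes a string after import.
# Sample values are hypothetical; exact formatting in LogHub may differ.
sample_row = {
    "long_col": 42,               # Long
    "double_col": 3.14,           # Double
    "string_col": "hello",        # String
    "date_col": "2018-06-19",     # Date (shown here in its string form)
    "boolean_col": True,          # Boolean
    "bytes_col": b"\x01\x02",     # Bytes
}

converted = {name: str(value) for name, value in sample_row.items()}
```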