This topic describes how to use resource functions to pull data from another Logstore and use the pulled data to enrich logs during data transformation.

You can use the res_log_logstore_pull resource function to pull data from a Logstore by using the following two methods:
  • Pull Logstore data collected in a specified period of time.
  • Pull Logstore data in a continuous manner.
For more information, see res_log_logstore_pull.
Note The res_log_logstore_pull function only pulls data from a Logstore. It does not enrich data. Therefore, we recommend that you use it together with the e_table_map or e_search_table_map function instead of using it alone.

Example data

Two Logstores are used in the following examples. The source_logstore stores personal information. The target_logstore stores guest registration information of a hotel. The examples show how to use guest registration information to enrich personal information.
Note The pull_log interface is used to pull data. Therefore, data can be pulled regardless of whether indexing is enabled for the Logstore that stores guest registration information.
Personal information stored in the source_logstore is as follows:
topic:xxx
city:xxx
cid:12345
name:maki


topic:xxx
city:xxx
cid:12346
name:vicky

topic:xxx
city:xxx
cid:12347
name:mary
The guest registration information stored in target_logstore is as follows:
time:1567038284
status:check in
cid:12345
name:maki
room_number:1111

time:1567038284
status:check in
cid:12346
name:vicky
room_number:2222

time:1567038500
status:check in
cid:12347
name:mary
room_number:3333

time:1567038500
status:leave
cid:12345
name:maki
room_number:1111
The basic syntax is as follows:
res_log_logstore_pull(
        endpoint,
        ak_id,
        ak_secret,
        project,
        logstore,
        fields,
        from_time=None,
        to_time=None,
        fetch_include_data=None,
        fetch_exclude_data=None,
        primary_keys=None,
        delete_data=None,
        refresh_interval_max=60,
        fetch_interval=2)

Pull all data collected in a specified duration

Note The duration specified in this example refers to a period of time during which log data is collected.
  • DSL syntax
    res_log_logstore_pull(..., ["cid","name","room_number"],from_time=1567038284,to_time=1567038500)
  • Pulled data
    # In this example, the cid, name, and room_number fields are specified. The values of these fields are pulled from all logs collected in the specified period.
    
    cid:12345
    name:maki
    room_number:1111
    
    cid:12346
    name:vicky
    room_number:2222
    
    cid:12347
    name:mary
    room_number:3333
    
    cid:12345
    name:maki
    room_number:1111
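
The selection shown above can be imitated locally with a short Python sketch. It is not the service's implementation. The helper name pull_fields is hypothetical, the time field of each record is assumed to stand in for the collection time, and both time bounds are assumed to be inclusive so that the result matches the pulled data listed above.

```python
# Guest registration records copied from the example data in this topic.
TARGET_LOGSTORE = [
    {"time": 1567038284, "status": "check in", "cid": "12345", "name": "maki", "room_number": "1111"},
    {"time": 1567038284, "status": "check in", "cid": "12346", "name": "vicky", "room_number": "2222"},
    {"time": 1567038500, "status": "check in", "cid": "12347", "name": "mary", "room_number": "3333"},
    {"time": 1567038500, "status": "leave", "cid": "12345", "name": "maki", "room_number": "1111"},
]


def pull_fields(logs, fields, from_time, to_time):
    """Hypothetical helper: keep logs inside the time range and project them onto `fields`.

    Assumption: both bounds are inclusive and log["time"] represents the collection time.
    """
    return [
        {field: log[field] for field in fields if field in log}
        for log in logs
        if from_time <= log["time"] <= to_time
    ]


rows = pull_fields(TARGET_LOGSTORE, ["cid", "name", "room_number"], 1567038284, 1567038500)
for row in rows:
    print(row)
```

Only the requested fields survive the projection, which is why the time and status fields do not appear in the pulled data.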

Set a whitelist and a blacklist to filter data

  • Set a whitelist.
    • DSL syntax
      # A whitelist is set to pull data whose room_number value is 1111.
      res_log_logstore_pull(..., ["cid","name","room_number","status"],from_time=1567038284,to_time=1567038500,fetch_include_data="room_number:1111")
    • Pulled data
      # The fetch_include_data parameter specifies the whitelist. In this example, only data whose room_number value is 1111 is pulled.
      
      status:check in
      cid:12345
      name:maki
      room_number:1111
      
      status:leave
      cid:12345
      name:maki
      room_number:1111
  • Set a blacklist.
    • DSL syntax
      res_log_logstore_pull(..., ["cid","name","room_number","status"],from_time=1567038284,to_time=1567038500,fetch_exclude_data="room_number:1111")
    • Pulled data
      # The fetch_exclude_data parameter specifies the blacklist. In this example, data whose room_number value is 1111 is dropped.
      status:check in
      cid:12346
      name:vicky
      room_number:2222
      
      
      status:check in
      cid:12347
      name:mary
      room_number:3333
  • Set a blacklist and a whitelist at the same time.
    • DSL syntax
      res_log_logstore_pull(..., ["cid","name","room_number","status"],from_time=1567038284,to_time=1567038500,fetch_exclude_data="status:leave",fetch_include_data="status:check in")
    • Pulled data
      # If a blacklist and a whitelist are specified at the same time, the blacklist takes precedence.
      # The blacklist value is status:leave, which means to drop data whose status value is leave.
      # The whitelist value is status:check in, which means to pull data whose status value is check in.
      status:check in
      cid:12345
      name:maki
      room_number:1111
      
      
      status:check in
      cid:12346
      name:vicky
      room_number:2222
      
      
      status:check in
      cid:12347
      name:mary
      room_number:3333
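
The three filtering cases above can be reproduced with a small Python sketch. It is an illustration only. The helper names matches and filter_logs are hypothetical, and the sketch handles only simple field:value conditions like the ones used in these examples. It applies the blacklist first, in line with the precedence rule stated above.

```python
# Guest registration records copied from the example data in this topic.
TARGET_LOGSTORE = [
    {"time": 1567038284, "status": "check in", "cid": "12345", "name": "maki", "room_number": "1111"},
    {"time": 1567038284, "status": "check in", "cid": "12346", "name": "vicky", "room_number": "2222"},
    {"time": 1567038500, "status": "check in", "cid": "12347", "name": "mary", "room_number": "3333"},
    {"time": 1567038500, "status": "leave", "cid": "12345", "name": "maki", "room_number": "1111"},
]


def matches(log, condition):
    """Hypothetical helper: test a log against one simple "field:value" condition."""
    field, _, value = condition.partition(":")
    return str(log.get(field.strip())) == value.strip()


def filter_logs(logs, include=None, exclude=None):
    """Hypothetical helper: apply a blacklist and a whitelist to a list of logs."""
    kept = []
    for log in logs:
        # The blacklist is checked first, so it takes precedence over the whitelist.
        if exclude is not None and matches(log, exclude):
            continue
        if include is not None and not matches(log, include):
            continue
        kept.append(log)
    return kept


whitelisted = filter_logs(TARGET_LOGSTORE, include="room_number:1111")
blacklisted = filter_logs(TARGET_LOGSTORE, exclude="room_number:1111")
both = filter_logs(TARGET_LOGSTORE, include="status:check in", exclude="status:leave")

print([log["cid"] for log in whitelisted])
print([log["cid"] for log in blacklisted])
print([log["cid"] for log in both])
```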

Pull data from the target Logstore in a continuous manner

To pull data from the target Logstore in a continuous manner, set the to_time parameter to None. You can set the fetch_interval parameter to specify the data pull interval. You can also set the refresh_interval_max parameter to specify the maximum retry interval in the case of data pull errors.

DSL syntax
res_log_logstore_pull(..., ["cid","name","room_number","status"],from_time=1567038284,to_time=None,fetch_interval=15,refresh_interval_max=60)
# If an error occurs during continuous data pulls, the data transformation service will retry until the data pull succeeds.
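
The interplay of fetch_interval and refresh_interval_max can be pictured with the following Python sketch of a polling loop. This is an assumption about one plausible behavior, not the documented algorithm of the service: the sketch doubles the retry delay after each failure and caps it at refresh_interval_max, and the exact retry schedule of the service is not stated in this topic. The names pull_continuously, flaky_fetch, and max_rounds are hypothetical.

```python
import time


def pull_continuously(fetch, handle, fetch_interval=2, refresh_interval_max=60,
                      max_rounds=None, sleep=time.sleep):
    """Hypothetical polling loop: wait fetch_interval after a success, back off after errors.

    Assumption: the retry delay doubles on each failure and never exceeds refresh_interval_max.
    """
    retry_delay = fetch_interval
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        try:
            batch = fetch()
        except Exception:
            sleep(retry_delay)
            retry_delay = min(retry_delay * 2, refresh_interval_max)
            continue
        retry_delay = fetch_interval  # reset the backoff after a successful pull
        for row in batch:
            handle(row)
        sleep(fetch_interval)


# Simulated source that fails twice before returning data.
calls = {"n": 0}


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise ConnectionError("simulated pull error")
    return [{"cid": "12345"}]


delays, received = [], []
# delays.append replaces time.sleep so the example finishes immediately.
pull_continuously(flaky_fetch, received.append, fetch_interval=15,
                  refresh_interval_max=20, max_rounds=4, sleep=delays.append)
print(delays)
print(received)
```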

Enable primary key maintenance to pull data from the target Logstore

  • Limits

    This feature can be used only when all data is stored in a single shard of the target Logstore.

  • Background information

    In this example, the preceding source_logstore and target_logstore are used. Data written to a Logstore cannot be deleted, even if the data becomes invalid. To prevent invalid data from being matched by transformation rules, you can enable primary key maintenance.

  • Scenario description

    You want to pull data about guests who have checked in but have not checked out from the target_logstore. Data whose status value is leave is not pulled from the target Logstore.

  • DSL syntax
    res_log_logstore_pull(..., ["cid","name","room_number","status","time"],from_time=1567038284,to_time=None,primary_keys="cid",delete_data="status:leave")
  • Pulled data
    # The latest status of the guest named Maki is leave. Therefore, data about Maki is not pulled.
    time:1567038284
    status:check in
    cid:12346
    name:vicky
    room_number:2222
    
    time:1567038500
    status:check in
    cid:12347
    name:mary
    room_number:3333
Note The primary_keys parameter accepts only a single string. Specify a field with unique values in the Logstore data, such as the cid field in the preceding syntax. If primary_keys is specified, the delete_data parameter cannot be None.
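
The effect of primary_keys and delete_data in this scenario can be sketched in Python as a table keyed by cid. This is a simplified model, not the service's implementation. The helper name maintain_table is hypothetical, and the sketch assumes that logs are processed in the order in which they were written and that delete_data is a simple field:value condition.

```python
# Guest registration records copied from the example data, in write order.
TARGET_LOGSTORE = [
    {"time": 1567038284, "status": "check in", "cid": "12345", "name": "maki", "room_number": "1111"},
    {"time": 1567038284, "status": "check in", "cid": "12346", "name": "vicky", "room_number": "2222"},
    {"time": 1567038500, "status": "check in", "cid": "12347", "name": "mary", "room_number": "3333"},
    {"time": 1567038500, "status": "leave", "cid": "12345", "name": "maki", "room_number": "1111"},
]


def maintain_table(logs, primary_key, delete_condition):
    """Hypothetical helper: keep one row per primary key, removing rows marked for deletion."""
    field, _, value = delete_condition.partition(":")
    table = {}
    for log in logs:  # assumption: logs arrive in write order
        key = log[primary_key]
        if str(log.get(field)) == value:
            table.pop(key, None)  # a matching log deletes the row for this key
        else:
            table[key] = log  # otherwise the latest log for this key replaces the row
    return list(table.values())


current_guests = maintain_table(TARGET_LOGSTORE, "cid", "status:leave")
for guest in current_guests:
    print(guest)
```

Because the result depends on the order in which logs are processed, a model like this works only when that order is well defined, which may be related to the single-shard limit described above.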

Use functions to enrich data

  • Use the e_table_map function to enrich data.
    • DSL syntax
      # Use the e_table_map function to enrich data.
      e_table_map(res_log_logstore_pull(...,
              fields=["cid","room_number"],
              from_time="begin",
              ), "cid","room_number")
    • Enriched data
      In this example, the e_table_map function matches each personal information log against the pulled data on the cid field and adds the room_number field of the matched data to the log. For more information, see e_table_map.
      topic:xxx
      city:xxx
      cid:12345
      name:maki
      room_number:1111
      
      topic:xxx
      city:xxx
      cid:12346
      name:vicky
      room_number:2222
      
      topic:xxx
      city:xxx
      cid:12347
      name:mary
      room_number:3333
  • Use the e_search_table_map function to enrich data.
    • DSL syntax
      # Use the e_search_table_map function to enrich data.
      e_search_table_map(res_log_logstore_pull(...,
              fields=["cid","room_number"],
              from_time="begin",
              ), "cid=12346","room_number")
    • Enriched data
      In this example, the e_search_table_map function searches guest registration information for data whose cid value is 12346 and adds the room_number field of found data to personal information. For more information, see e_search_table_map.
      topic:xxx
      city:xxx
      cid:12345
      name:maki
      room_number:2222
      
      topic:xxx
      city:xxx
      cid:12346
      name:vicky
      room_number:2222
      
      topic:xxx
      city:xxx
      cid:12347
      name:mary
      room_number:2222
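
The lookup performed in the e_table_map example above can be approximated with the following Python sketch. It is a simplified model rather than the function's actual implementation. The helper name table_map is hypothetical, and the sketch assumes that when several pulled rows share the same cid, the most recent row is used.

```python
# Personal information logs from source_logstore (example data in this topic).
SOURCE_LOGS = [
    {"topic": "xxx", "city": "xxx", "cid": "12345", "name": "maki"},
    {"topic": "xxx", "city": "xxx", "cid": "12346", "name": "vicky"},
    {"topic": "xxx", "city": "xxx", "cid": "12347", "name": "mary"},
]

# Rows pulled from target_logstore with fields=["cid","room_number"].
PULLED_TABLE = [
    {"cid": "12345", "room_number": "1111"},
    {"cid": "12346", "room_number": "2222"},
    {"cid": "12347", "room_number": "3333"},
    {"cid": "12345", "room_number": "1111"},
]


def table_map(event, table, key_field, output_field):
    """Hypothetical helper: add `output_field` from the table row whose key matches the event."""
    lookup = {}
    for row in table:
        lookup[row[key_field]] = row[output_field]  # assumption: later rows overwrite earlier ones
    enriched = dict(event)  # copy so the original event is left unchanged
    if event.get(key_field) in lookup:
        enriched[output_field] = lookup[event[key_field]]
    return enriched


enriched_logs = [table_map(event, PULLED_TABLE, "cid", "room_number") for event in SOURCE_LOGS]
for log in enriched_logs:
    print(log)
```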