This topic describes how to use search-based mapping functions to enrich complex data.

If field-based mapping functions cannot meet your data enrichment requirements, you can use search-based mapping functions instead. Search-based mapping functions differ from field-based mapping functions in matching rules.
  • Field-based mapping functions
    Note: The field-based mapping functions are e_table_map and e_dict_map. The e_dict_map function receives dictionary-type data. The e_table_map function receives table-type data that is obtained by using resource functions. For more information, see Resource functions.
    Field-based mapping functions match the full value of a field: a mapping applies only when the field value is identical to a key. For example, assume that you want to convert the following status codes in NGINX logs to text.
    Status code Text
    200 Success
    300 Redirect
    400 Request error
    500 Server error
    You can call the e_dict_map function, convert the HTTP status code in the status field into text, and enter the text in the status_desc field.
    e_dict_map({"400": "Request error", "500": "Server error", "300": "Redirect", "200": "Success"}, "status", "status_desc")

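    NGINX logs also contain other HTTP status codes, such as 401 and 404. The preceding dictionary has no key for these values, so a log whose status field is 401 or 404 is not matched. To map these values, you must add a key for each of them to the dictionary in the e_dict_map function, for example, "401": "Request error" and "404": "Request error". For more information, visit HTTP Status Codes.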

  • Search-based mapping functions
    Note: The search-based mapping functions are e_search_table_map and e_search_dict_map. The e_search_dict_map function receives dictionary-type data. The e_search_table_map function receives table-type data that is obtained by using resource functions. For more information, see Resource functions.
    Search-based mapping functions match field values against search strings, so a single rule can cover a range or a pattern of values. For example, assume that you want to convert the following status codes to text.
    Status code Text
    1XX and 2XX Success
    3XX Redirect
    4XX Request error
    5XX Server error
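    In this case, you can use search-based mapping functions. Each dictionary key is a search string that describes a range of values. For more information, see Query string syntax. The preceding status codes can be expressed as the following ranges: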
    Status code Text
    ≤ 299 Success
    [300, 399] Redirect
    [400, 499] Request error
    [500, 599] Server error
    The syntax is as follows:
    e_search_dict_map({"status: [400, 499]": "Request error", "status: [500, 599]": "Server error", "status: [300, 399]": "Redirect", "status <= 299": "Success"}, "status", "status_desc")
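
The difference between the two matching methods can be sketched in plain Python. This is a local illustration under stated assumptions, not the Log Service implementation: exact_map, range_map, and RANGE_RULES are hypothetical names, and the numeric ranges stand in for the search strings used as dictionary keys above.

```python
# Hypothetical local sketch; exact_map and range_map are NOT Log Service functions.

EXACT = {"200": "Success", "300": "Redirect", "400": "Request error", "500": "Server error"}

# Each rule is (low, high, text). A low bound of None means "no lower bound",
# which mirrors the "<= 299" row in the table above.
RANGE_RULES = [
    (400, 499, "Request error"),
    (500, 599, "Server error"),
    (300, 399, "Redirect"),
    (None, 299, "Success"),
]


def exact_map(status):
    """Field-based style: the whole value must be identical to a dictionary key."""
    return EXACT.get(status)


def range_map(status):
    """Search-based style: return the text of the first rule whose range contains the value."""
    code = int(status)
    for low, high, text in RANGE_RULES:
        if (low is None or low <= code) and code <= high:
            return text
    return None


for status in ("200", "401", "404", "503"):
    print(status, exact_map(status), range_map(status))
```

In this sketch, exact_map finds nothing for 401, 404, or 503 because none of them is a key, whereas range_map resolves all three from the range rules.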

Enrich complex data by mapping a search string to a key in a dictionary

The following example uses network request log entries to demonstrate how to map complex data.
  • Raw log entries
    "Log entry 1"
    http_host:  m1.abcd.com
    http_status:  200
    request_method:  GET
    body_bytes_sent: 740
    
    "Log entry 2"
    http_host:  m2.abcd.com
    http_status:  200
    request_method:  POST
    body_bytes_sent: 1123
    
    "Log entry 3"
    http_host:  m3.abcd.com
    http_status:  404
    request_method:  GET
    body_bytes_sent: 711
    
    "Log entry 4"
    http_host:  m4.abcd.com
    http_status:  504
    request_method:  GET
    body_bytes_sent: 1822
  • Transformation requirements
    Different type values are attached to log events according to the values of the http_status and body_bytes_sent fields.
    • The type field is set to Normal for log events whose http_status value is 2XX and whose body_bytes_sent value is less than 1000.
    • The type field is set to Overlength warning for log events whose http_status value is 2XX and whose body_bytes_sent value is greater than or equal to 1000.
    • The type field is set to Redirect for log events whose http_status value is 3XX.
    • The type field is set to Error for log events whose http_status value is 4XX.
    • The type field is set to Others for all other log events.
  • DSL orchestration
    e_search_dict_map({'http_status~="2\d+" and body_bytes_sent < 1000': "Normal", 'http_status~="2\d+" and body_bytes_sent >= 1000': "Overlength warning", 'http_status~="3\d+"': "Redirect", 'http_status~="4\d+"': "Error",  "*": "Others"}, "http_status", "type")
  • Transformed log entries
    "Log entry 1"
    type: Normal
    http_host:  m1.abcd.com
    http_status:  200
    request_method:  GET
    body_bytes_sent: 740
    
    "Log entry 2"
    type: Overlength warning
    http_host:  m2.abcd.com
    http_status:  200
    request_method:  POST
    body_bytes_sent: 1123
    
    "Log entry 3"
    type: Error
    http_host:  m3.abcd.com
    http_status:  404
    request_method:  GET
    body_bytes_sent: 711
    
    "Log entry 4"
    type: Others
    http_host:  m4.abcd.com
    http_status:  504
    request_method:  GET
    body_bytes_sent: 1822
Note
  • For the syntax of the e_search_dict_map function, see e_search_dict_map. The function maps a search string to a key in a dictionary. Regular expression match, exact match, and fuzzy match are supported.
  • In dictionary-based enrichment, you can create a dictionary by using braces ({}) or based on allocated resources, Object Storage Service (OSS) resources, and tables. For more information, see Build dictionaries.
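
The rule order in the preceding example can be traced with a short Python sketch. It is a local simulation only: RULES, status_matches, and classify are hypothetical names, the catch-all predicate stands in for the "*" key, and the use of re.fullmatch is an assumption made for this sketch rather than a statement about how the ~= operator is evaluated by Log Service.

```python
import re

# Hypothetical local sketch; none of these names are Log Service APIs.


def status_matches(event, pattern):
    """Assumption: the whole http_status value must match the pattern."""
    return re.fullmatch(pattern, event["http_status"]) is not None


# Each rule pairs a predicate with the value to write to the type field.
# Order matters here: the first rule that matches decides the result.
RULES = [
    (lambda e: status_matches(e, r"2\d+") and int(e["body_bytes_sent"]) < 1000, "Normal"),
    (lambda e: status_matches(e, r"2\d+") and int(e["body_bytes_sent"]) >= 1000, "Overlength warning"),
    (lambda e: status_matches(e, r"3\d+"), "Redirect"),
    (lambda e: status_matches(e, r"4\d+"), "Error"),
    (lambda e: True, "Others"),  # stands in for the "*" catch-all key
]


def classify(event):
    """Return the label of the first matching rule, or None if no rule matches."""
    for predicate, label in RULES:
        if predicate(event):
            return label
    return None


EVENTS = [
    {"http_host": "m1.abcd.com", "http_status": "200", "request_method": "GET", "body_bytes_sent": "740"},
    {"http_host": "m2.abcd.com", "http_status": "200", "request_method": "POST", "body_bytes_sent": "1123"},
    {"http_host": "m3.abcd.com", "http_status": "404", "request_method": "GET", "body_bytes_sent": "711"},
    {"http_host": "m4.abcd.com", "http_status": "504", "request_method": "GET", "body_bytes_sent": "1822"},
]

types = [classify(event) for event in EVENTS]
print(types)
```

Because the catch-all rule is listed last, it applies only to events that no earlier rule matches, such as log entry 4 with status 504.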

Enrich data by mapping a search string to a column in a table

  • Raw log entries
    "Log entry 1"
    http_host:  m1.abcd.com
    http_status:  200
    request_method:  GET
    body_bytes_sent: 740
    
    "Log entry 2"
    http_host:  m2.abcd.com
    http_status:  200
    request_method:  POST
    body_bytes_sent: 1123
    
    "Log entry 3"
    http_host:  m3.abcd.com
    http_status:  404
    request_method:  GET
    body_bytes_sent: 711
    
    "Log entry 4"
    http_host:  m4.abcd.com
    http_status:  504
    request_method:  GET
    body_bytes_sent: 1822
  • Transformation requirements
    The http_status and body_bytes_sent fields are mapped to other fields such as type, warning_level, and warning_email. The transformation rules are stored in a table in ApsaraDB RDS for MySQL. The following table shows an example.
    content                                          type                warning_level  warning_email
    http_status~="2\d+" and body_bytes_sent < 1000   Normal              INFO           normal@etl.com
    http_status~="2\d+" and body_bytes_sent >= 1000  Overlength warning  WARNING        over-long@etl.com
    http_status~="3\d+"                              Redirect            WARNING        redirect@etl.com
    http_status~="4\d+"                              Error               ERROR          error@etl.com
    *                                                Others              INFO           others@etl.com
  • DSL orchestration
    e_search_table_map(res_rds_mysql("... MySQL connection parameters..."),"content",["type", "warning_level", "warning_email"])
    Note
    • In this example, the e_search_table_map function syntax is used. For more information, see e_search_table_map.
    • The MySQL connection parameters are passed as arguments to the res_rds_mysql() function. The function pulls data from the specified MySQL table. For more information about the syntax, see res_rds_mysql.
    • The content parameter specifies the column in the MySQL table that contains the search strings. Each search string in this column is matched against the fields of the raw log entry. Regular expression match, exact match, and fuzzy match are supported. For information about matching rules, see e_search.
  • Transformed log entries
    Different type, warning_level, and warning_email values are attached to log events according to the values of the http_status and body_bytes_sent fields.
    "Log entry 1"
    type: Normal
    warning_level: INFO
    warning_email: normal@etl.com
    http_host:  m1.abcd.com
    http_status:  200
    request_method:  GET
    body_bytes_sent: 740
    
    "Log entry 2"
    type: Overlength warning
    warning_level: WARNING
    warning_email: over-long@etl.com
    http_host:  m2.abcd.com
    http_status:  200
    request_method:  POST
    body_bytes_sent: 1123
    
    "Log entry 3"
    type: Error
    warning_level: ERROR
    warning_email: error@etl.com
    http_host:  m3.abcd.com
    http_status:  404
    request_method:  GET
    body_bytes_sent: 711
    
    "Log entry 4"
    type: Others
    warning_level: INFO
    warning_email: others@etl.com
    http_host:  m4.abcd.com
    http_status:  504
    request_method:  GET
    body_bytes_sent: 1822
    • By default, the preceding transformation rule stops at the first row in the table that matches a log and returns only that row. To match multiple rows, specify multi_match=True in the e_search_table_map function. You can also specify multi_join="," in the function to join the values from the matched rows by using commas (,).
      e_search_table_map(res_rds_mysql("... MySQL connection parameters..."),"content",["type", "warning_level", "warning_email"], multi_match=True,multi_join=",")
    • By default, the preceding transformation rule uses the column name in the table as the name of the added field. To use a different field name, pass the column name and the new field name in the same tuple. For example, the following rule writes the values of the warning_email column to a field named email:
      e_search_table_map(res_rds_mysql("... MySQL connection parameters..."),"content",["type", "warning_level", ("warning_email", "email")],multi_match=True,multi_join=",")
Note
  • For the syntax of the e_search_table_map function, see e_search_table_map. The function maps a search string to a column in a table. Regular expression match, exact match, and fuzzy match are supported.
  • In table-based enrichment, you can create a table by using the tab_parse_csv function, local resources, or OSS resources in addition to ApsaraDB RDS for MySQL. For more information, see Build tables.
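
The effect of the multi_match and multi_join options and of the (column, field) tuple can be sketched in plain Python. This is a local simulation, not the Log Service implementation: TABLE and table_map are hypothetical names, the predicates stand in for the search strings in the content column, and the per-field joining of matched values and the default separator are assumptions made for this sketch.

```python
import re

# Hypothetical local sketch; table_map is NOT the real e_search_table_map.


def status_matches(event, pattern):
    """Assumption: the whole http_status value must match the pattern."""
    return re.fullmatch(pattern, event["http_status"]) is not None


# Each row carries a predicate (standing in for the content column) plus the output columns.
TABLE = [
    {"match": lambda e: status_matches(e, r"2\d+") and int(e["body_bytes_sent"]) < 1000,
     "type": "Normal", "warning_level": "INFO", "warning_email": "normal@etl.com"},
    {"match": lambda e: status_matches(e, r"2\d+") and int(e["body_bytes_sent"]) >= 1000,
     "type": "Overlength warning", "warning_level": "WARNING", "warning_email": "over-long@etl.com"},
    {"match": lambda e: status_matches(e, r"3\d+"),
     "type": "Redirect", "warning_level": "WARNING", "warning_email": "redirect@etl.com"},
    {"match": lambda e: status_matches(e, r"4\d+"),
     "type": "Error", "warning_level": "ERROR", "warning_email": "error@etl.com"},
    {"match": lambda e: True,  # catch-all row
     "type": "Others", "warning_level": "INFO", "warning_email": "others@etl.com"},
]


def table_map(event, output_fields, multi_match=False, multi_join=" "):
    """Return a copy of event enriched from the first matching row, or from all matching rows."""
    rows = [row for row in TABLE if row["match"](event)]
    if not multi_match:
        rows = rows[:1]  # keep only the first matching row
    enriched = dict(event)
    for field in output_fields:
        # A tuple is read as (column name in the table, name of the added field).
        column, name = field if isinstance(field, tuple) else (field, field)
        if rows:
            enriched[name] = multi_join.join(row[column] for row in rows)
    return enriched


event = {"http_status": "200", "body_bytes_sent": "740"}
fields = ["type", ("warning_email", "email")]
first = table_map(event, fields)
joined = table_map(event, fields, multi_match=True, multi_join=",")
print(first)
print(joined)
```

In this sketch, the sample event matches both the first row and the catch-all row, so the multi-match call joins two values per field, while the default call keeps only the values from the first row.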