This topic describes how to use resource functions to obtain data from a table in an ApsaraDB RDS for MySQL instance to enrich log data in Log Service.

Assume that the ApsaraDB RDS for MySQL instance uses the userinfo table to store user information. The following table lists data in the userinfo table.
id province city uid
1 jiangsu nanjing 01234
2 henan zhengzhou 01235
3 heilongjiang haerbin 01236
4 jiangsu yantai 01237

Obtain full data and regularly update it

Assume that full data is obtained from the userinfo table and regularly updated for data enrichment. If you want to configure a data transformation task to automatically obtain data at regular intervals, set the transformation rule as follows:
res_rds_mysql(..., refresh_interval=300)
The preceding code returns a table schema and automatically obtains data from the MySQL table at an interval of 5 minutes to update the data in the returned table.

Obtain partial data

If you only want to enrich data based on specific fields in the MySQL table, we recommend that you use the table, sql, and fields parameters to filter rows or columns. This reduces the size of the dimension table and increases data enrichment efficiency.

The following two methods filter columns city and uid. They have the same effect.
res_rds_mysql(..., sql="select city, uid from userinfo")      # Filter columns.
res_rds_mysql(..., table="userinfo", fields=["city", "uid"])    # Filter columns.
The following two methods uses the sql parameter to filter rows and columns to select data records whose uid is greater than 1234.
res_rds_mysql(..., sql="select * from userinfo where uid > 1234")   # Filter rows.
res_rds_mysql(..., sql="select city, uid from userinfo where uid > 1234")   # Filter rows and columns.

Obtain and filter data

If you do not obtain required data by using the table, sql, and fields parameters to filter rows or columns, you can use the fetch_exclude_data and / parameters together or use the fetch_include_data parameter separately to further filter rows. Examples:
res_rds_mysql(..., fetch_include_data="uid==0123*")    # Retain all data records whose uid starts with 0123.
res_rds_mysql(..., fetch_exclude_data="uid < 1234")    # Delete all data records whose uid is smaller than 1234.
res_rds_mysql(..., fetch_include_data="city:n", fetch_exclude_data="uid < 1234")
The comments above explain the difference between the fetch_exclude_data and fetch_include_data parameters. Note that both parameters are query strings.
If you set both fetch_exclude_data and fetch_include_data, the system preferentially executes fetch_exclude_data to delete non-compliant data records. Then, the system executes fetch_include_data to add compliant data records. Both fetch_exclude_data and fetch_include_data are executed based on Query string syntax. They support multiple matching modes, including regular expression matching and fuzzy matching. The method in the third line pulls all the data records in which uid is greater than or equal to 1234 and the value of city includes letter n to the dimension table.
Note The fetch_exclude_data and fetch_include_data parameters filter data only after data records are pulled to a local directory. The efficiency is lower than that of filtering by parameters table, sql, and fields.

Adjust the schema of the returned table

By default, the returned table has the same schema as that of the MySQL table. If you need to adjust the schema, for example, changing the province field to prov, use the following methods.
res_rds_mysql(..., sql="select id, uid, province as prov, city from userinfo")
res_rds_mysql(..., table="userinfo", fields=["id", "uid", ("province", "prov"), "city" ])

The two methods have the same effect. For more information about the fields parameter, see Resource functions.

Enrich data by using e_table_map or e_search_map_table along with the res_rds_mysql function

Note It is meaningless to use the res_rds_mysql function alone in the console. The res_rds_mysql function only obtains data from the MySQL table but does not enrich data. The following section describes how to use res_rds_mysql together with other functions to enrich data.
  • Use the e_table_map function for exact matching.
    • Assume that the MySQL table contains the following content.
      province city population cid eid
      shanghai shanghai 2000 1 00001
      tianjin tianjin 800 1 00002
      beijing beijing 4000 1 00003
      henan zhengzhou 3000 2 00004
      jiangsu nanjing 1500 2 00005
    • The source Logstore contains the following data.
      # Four logs in the source Logstore.
      time:"1566379109"
      data:"test-one"
      cid:"1"
      eid:"00001"
      
      time:"1566379111"
      data:"test_second"
      cid:"1"
      eid:"12345"
      
      time:"1566379111"
      data:"test_three"
      cid:"2"
      eid:"12345"
      
      time:"1566379113"
      data:"test_four"
      cid:"2"
      eid:"12345"
      
      ...
    • Match data based on the cid column in the MySQL table and add the province, city, and population fields to the source Logstore.
      # The DSL orchestration rule.
      # This rule matches a row in the MySQL table whose cid value is the same as that in the source Logstore and adds the province, city, and population fields of the row to the logs in the source Logstore.
      e_table_map(res_rds_mysql(...),"cid",["province","city","population"])
    • New data is generated after the rule is applied.
      # Four logs in the source Logstore.
      time:"1566379109"
      data:"test-one"
      cid:"1"
      eid:"00001"
      province:"shanghai"
      city:"shanghai"
      population:"2000"
      
      time:"1566379111"
      data:"test_second"
      cid:"1"
      eid:"12345"
      province:"shanghai"
      city:"shanghai"
      population:"2000"
      
      time:"1566379111"
      data:"test_three"
      cid:"2"
      eid:"12345"
      province:"henan"
      city:"zhengzhou"
      population:"3000"
      
      time:"1566379113"
      data:"test_four"
      cid:"2"
      eid:"12345"
      province:"henan"
      city:"zhengzhou"
      population:"3000"
      
      ...
      Note As shown in the preceding example, multiple rows whose cid value is 1 are matched in the MySQL table when you match data based on the cid field. However, the rule obtains only the first matched row. This is the feature of the e_table_map function. That is, the e_table_map function can match data by one or more fields but it stops matching so long as it matches one row. The e_table_map function will be able to process multiple matched rows in the future. To add all matched rows to the source Logstore, you can use the e_search_table_map function.
  • Use the e_search_table_map function to match a string. This function supports adding multiple matched rows to the source Logstore.
    • Assume that the raw log in the source Logstore is as follows.
      time:1563436326
      data:123
      city:nanjing
      province:jiangsu
    • Assume that data in the MySQL table is as follows.
      content name age
      city~=n* aliyun 10
      province~=su$ Maki 18
      city:nanjing vicky 20
    • Use ETL to transform and orchestrate data.

      Search the raw log for the value of a specified field in the MySQL table. The field value uses the key/value format, where value is a regular expression and key is a field value in the raw log. Add the matched value of the specified field in the MySQL table to the raw log based on the matching result. The e_search_table_map function supports adding the values of multiple matched rows to a field in the raw log.

    • Example 1: single-row matching. The function stops searching after finding one matched row in the MySQL table.
      e_search_table_map(res_rds_mysql(address="rds-host", username="mysql-username",password="xxx",database="xxx",table="xx",refresh_interval=60),"content","name")
      Note
      • For more information about the e_search_table_map function, see e_search_table_map.
      • The res_rds_mysql() function pulls data from a specified MySQL table. For more information, see res_rds_mysql.
      • content is a field in the MySQL table and its value is used to match data in the raw log. For more information about the matching rules, see e_search. Multiple matching modes, including regular expression matching, exact matching, and fuzzy matching, are supported.

      DSL orchestration result:

      The system searches for the value of city that starts with n in the raw log. Then, the system adds the matched value of the name field in the MySQL table to the raw log.
      time:1563436326
      data:123
      city:nanjing
      province:jiangsu
      name:aliyun
    • Example 2: multi-row matching. Traverse all the rows in the MySQL table and add matched values to the specified field.
      The following method uses the multi_match=True and multi_join="," fields to add all matched values to the raw log. The multi_match=True field indicates that multi-row matching is enabled. The multi_join="," field indicates that all matched values are separated with commas (,).
      e_search_table_map(res_rds_mysql(address="rds-host", username="mysql-username",password="xxx",database="xxx",table="xx",refresh_interval=60),"content","name",multi_match=True,multi_join=",")

      DSL orchestration result:

      The preceding method searches for data in the raw log based on city~=n*, province~=su$, and city:nanjing. The search complies with the rules of the e_search function. In the search criteria, ~= is followed by a regular expression, and : indicates whether the specified string is included. Based on the rules, all three rows are matched in the MySQL table. Therefore, the system adds the three values of name, separated with commas (,), to the raw log.
      time:1563436326
      data:123
      city:nanjing
      province:jiangsu
      name:aliyun,Maki,vicky