Data masking reduces the exposure of sensitive data while the data is transformed, transmitted, and used, which lowers the risk of data leakage and protects user privacy. This topic describes common data masking scenarios and the corresponding masking methods, and provides examples for data transformation in Simple Log Service (SLS).
Background information
Data masking is commonly applied to sensitive information such as phone numbers, bank card numbers, email addresses, IP addresses, AccessKeys (AKs), ID card numbers, URLs, order numbers, and names. In SLS data transformation, a common masking method is to use the regexp_replace function with a regular expression.
Use case 1: Mask phone numbers
Data masking methods
If a log contains a phone number that you do not want to expose, use the regexp_replace function with a regular expression to mask it.
Example 1: Show the first three and last four digits of a phone number and hide the middle digits.
Raw log
message:{"data":{"receivePhoneNo":"13812345678"}}
SPL orchestration rule
* | extend message1 = regexp_replace(message, 'receivePhoneNo":\s*"([\+86]*1[3-9]{1}\d{1})\d{4}(\d{4,11})','receivePhoneNo":"\1****\2')
Transformation result
message:{"data":{"receivePhoneNo":"13812345678"}}
message1:{"data":{"receivePhoneNo":"138****5678"}}
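The rule in Example 1 can be checked locally before it is deployed. The following is a minimal sketch in Python, assuming that Python's re module treats this pattern the same way as the SPL regexp_replace function; the function name mask_mainland_phone is hypothetical and used only for illustration.

```python
import re

# Pattern and replacement copied from the SPL rule in Example 1.
# Python's re.sub also accepts \1 and \2 as group references.
PHONE_PATTERN = r'receivePhoneNo":\s*"([\+86]*1[3-9]{1}\d{1})\d{4}(\d{4,11})'
PHONE_REPLACEMENT = r'receivePhoneNo":"\1****\2'


def mask_mainland_phone(message: str) -> str:
    """Keep the leading part and the trailing digits, hide four middle digits."""
    return re.sub(PHONE_PATTERN, PHONE_REPLACEMENT, message)
```

Because the first capture group starts with the character class [\+86]*, a number written with the +86 prefix keeps the prefix together with the first three digits.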
Example 2: For Hong Kong and Macao phone numbers, show the first two and last two digits and hide the middle digits.
Raw log
message:{"data":{"receivePhoneNo":"59092819"}}
SPL orchestration rule
* | extend message1 = regexp_replace(message, 'receivePhoneNo":\s*"(5|6|7|8|9)(\d{1})(\d{4})(\d{2})\"','receivePhoneNo":"\1\2****\4"')
Transformation result
message:{"data":{"receivePhoneNo":"59092819"}}
message1:{"data":{"receivePhoneNo":"59****19"}}
Example 3: For Taiwan phone numbers, show the first two and last two digits and hide the middle digits.
Raw log
message:{"data":{"receivePhoneNo":"020928198"}}
SPL orchestration rule
* | extend message1 = regexp_replace(message, 'receivePhoneNo":\s*"(0[2-9])(\d{5,6})(\d{2})\"','receivePhoneNo":"\1******\3"')
Transformation result
message:{"data":{"receivePhoneNo":"020928198"}}
message1:{"data":{"receivePhoneNo":"02******98"}}
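The three phone number rules in this use case match different leading digits, so they can be applied one after another to the same field without interfering with each other. The following Python sketch illustrates this under the assumption that the patterns behave the same way in SPL; the names PHONE_RULES and mask_any_phone are hypothetical.

```python
import re

# (pattern, replacement) pairs copied from the three SPL rules in this use case.
PHONE_RULES = [
    # Example 1: keep the first three and last four digits.
    (r'receivePhoneNo":\s*"([\+86]*1[3-9]{1}\d{1})\d{4}(\d{4,11})',
     r'receivePhoneNo":"\1****\2'),
    # Example 2 (Hong Kong and Macao): keep the first two and last two digits.
    (r'receivePhoneNo":\s*"(5|6|7|8|9)(\d{1})(\d{4})(\d{2})\"',
     r'receivePhoneNo":"\1\2****\4"'),
    # Example 3 (Taiwan): keep the first two and last two digits.
    (r'receivePhoneNo":\s*"(0[2-9])(\d{5,6})(\d{2})\"',
     r'receivePhoneNo":"\1******\3"'),
]


def mask_any_phone(message: str) -> str:
    """Apply every rule in order; a rule that does not match leaves the text unchanged."""
    for pattern, replacement in PHONE_RULES:
        message = re.sub(pattern, replacement, message)
    return message
```

In SPL, the same effect would presumably be achieved by chaining one extend statement per rule on the same field.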
Use case 2: Mask bank card information
Data masking methods
If a log contains bank card or credit card information, use the regexp_replace function with a regular expression to mask it.
Example
Raw log
content: bank number is 491648411333978312 and credit card number is 4916484113339780
SPL orchestration rule
* | extend bank_number=regexp_replace(content, '([1-9]{1})(\d{11}|\d{13}|\d{14})(\d{4})', '****\3')
Transformation result
content: bank number is 491648411333978312 and credit card number is 4916484113339780
bank_number: bank number is ****978312 and credit card number is ****9780
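In the result above, the 18-digit number keeps six trailing digits because the first alternative, \d{11}, already satisfies the pattern and the last two digits fall outside the match. The following Python sketch reproduces that behavior and compares it with a stricter variant. The ANCHORED pattern is an assumption added for illustration and is not part of the original rule; verify that \b is supported in your SPL environment before you use it.

```python
import re

# Rule as written in the example: the first alternative that fits is used, so
# an 18-digit number is matched only up to its 16th digit.
ORIGINAL = r'([1-9]{1})(\d{11}|\d{13}|\d{14})(\d{4})'

# Hypothetical stricter variant: the \b anchors force the whole 16-, 18-, or
# 19-digit run to match, so only the last four digits remain visible.
ANCHORED = r'\b([1-9])(\d{11}|\d{13}|\d{14})(\d{4})\b'


def mask_cards(content: str, pattern: str = ORIGINAL) -> str:
    """Replace each matched card number with **** followed by the third group."""
    return re.sub(pattern, r'****\3', content)
```

Calling mask_cards(content, ANCHORED) on the raw log leaves ****8312 for the 18-digit number and ****9780 for the 16-digit number.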
Use case 3: Mask email addresses
Data masking methods
If a log contains an email address, use the regexp_replace function with a regular expression to mask it.
Example 1: Hide the email prefix.
Raw log
content: email is twiss2345@aliyun.com
SPL orchestration rule
* | extend email_encrypt=regexp_replace(content, '[A-Za-z\d]+([-_.][A-Za-z\d]+)*(@([A-Za-z\d]+[-.])+[A-Za-z\d]{2,4})', '****\2')
Transformation result
content: email is twiss2345@aliyun.com
email_encrypt: email is ****@aliyun.com
Example 2: Mask an email address where the prefix before the at sign (@) has fewer than three characters and the suffix is fixed.
Raw log
message:{"data":{"email":"tt@1111.com","icon":"ee@2.png"}}
SPL orchestration rule
* | extend message1 = regexp_replace(message,'":\s*"([A-Za-z0-9._%+-]{1,2})(@\w+\.)(com|net|org)\"','":"\1**\2\3"')
Transformation result
message:{"data":{"email":"tt@1111.com","icon":"ee@2.png"}}
message1:{"data":{"email":"tt**@1111.com","icon":"ee@2.png"}}
Example 3: Mask an email address where the prefix before the at sign (@) has three or more characters.
Raw log
message:{"data":{"email":"ttewew@1111.com","icon":"esdse@2.png"}}
SPL orchestration rule
* | extend message1 = regexp_replace(message, 'email":\s*"([A-Za-z0-9._%+-]{3})([A-Za-z0-9._%+-]*)(@)(\w+\.\w+)"','email":"\1**\3\4"')
Transformation result
message:{"data":{"email":"ttewew@1111.com","icon":"esdse@2.png"}}
message1:{"data":{"email":"tte**@1111.com","icon":"esdse@2.png"}}
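Examples 2 and 3 cover complementary prefix lengths, so the two rules can be combined to handle both short and long prefixes. The following Python sketch applies both rules in sequence, assuming the patterns behave the same way in SPL; the function name mask_email_field is hypothetical.

```python
import re

# Example 2: prefix of one or two characters, domain ending in com, net, or org.
SHORT_PREFIX = (r'":\s*"([A-Za-z0-9._%+-]{1,2})(@\w+\.)(com|net|org)\"',
                r'":"\1**\2\3"')
# Example 3: keep the first three characters of a longer prefix.
LONG_PREFIX = (r'email":\s*"([A-Za-z0-9._%+-]{3})([A-Za-z0-9._%+-]*)(@)(\w+\.\w+)"',
               r'email":"\1**\3\4"')


def mask_email_field(message: str) -> str:
    """Apply the short-prefix rule, then the long-prefix rule."""
    for pattern, replacement in (SHORT_PREFIX, LONG_PREFIX):
        message = re.sub(pattern, replacement, message)
    return message
```

A value that the first rule has already masked contains asterisks in its prefix, so the second rule no longer matches it. The icon values are left unchanged in both cases because neither rule matches them.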
Use case 4: Mask an AK
Data masking methods
If a log contains AK information, use the regexp_replace function with a regular expression to mask it.
Example
Raw log
content: ak id is rDhc9qxjhIhlBiyphP7buo5yg5h6Eq and ak key is XQr1EPtfnlZLYlQc
SPL orchestration rule
* | extend akid_encrypt=regexp_replace(content, '([a-zA-Z0-9]{4})(([a-zA-Z0-9]{26})|([a-zA-Z0-9]{12}))', '\1****')
Transformation result
content: ak id is rDhc9qxjhIhlBiyphP7buo5yg5h6Eq and ak key is XQr1EPtfnlZLYlQc
akid_encrypt: ak id is rDhc**** and ak key is XQr1****
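The AK rule keeps the first four characters of a 30-character or 16-character alphanumeric string. The following Python sketch checks the rule against the raw log, assuming the pattern behaves the same way in SPL; the function name mask_ak is hypothetical.

```python
import re

# Pattern and replacement copied from the SPL rule: four visible characters
# followed by either 26 or 12 more alphanumeric characters.
AK_PATTERN = r'([a-zA-Z0-9]{4})(([a-zA-Z0-9]{26})|([a-zA-Z0-9]{12}))'


def mask_ak(content: str) -> str:
    """Keep the first four characters of each match and hide the rest."""
    return re.sub(AK_PATTERN, r'\1****', content)
```

The pattern has no surrounding context, so any alphanumeric run of at least 16 characters matches. If the log contains other long tokens, consider adding context such as a field name to the pattern.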
Use case 5: Mask an IP address
Data masking methods
If a log contains an IP address, use the regexp_replace function with a regular expression to mask it.
Example
Raw log
content: ip is 192.168.1.1
SPL orchestration rule
* | extend ip_encrypt=regexp_replace(content, '(\w+\s+\w+\s+)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', '\1****')
Transformation result
content: ip is 192.168.1.1
ip_encrypt: ip is ****
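In a regular expression, an unescaped dot matches any character, so the separators between the four parts of an IPv4 address are best written as \. to match only a literal dot. The following Python sketch compares the two forms, assuming the patterns behave the same way in SPL; the names LOOSE, STRICT, and mask_ip are hypothetical.

```python
import re

# Unescaped dots: "." matches any character, including a digit.
LOOSE = r'(\w+\s+\w+\s+)\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}'
# Escaped dots: only a literal "." separates the four numeric parts.
STRICT = r'(\w+\s+\w+\s+)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'


def mask_ip(content: str, pattern: str = STRICT) -> str:
    """Keep the two leading words and replace the address with ****."""
    return re.sub(pattern, r'\1****', content)
```

With the loose pattern, a plain number such as 1234567 is also masked, whereas the strict pattern leaves it unchanged.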
Use case 6: Mask an ID card number
Data masking methods
If a log contains ID card information, use the regexp_replace function with a regular expression to mask it.
Example 1: Show the first six digits of an ID card number and hide the rest.
Raw log
content: Id card is 11010519491231002X
SPL orchestration rule
* | extend id_encrypt=regexp_replace(content, '([\d]{6})[\d]{9}([\d]{2}[\d|Xx])', '\1****')
Transformation result
content: Id card is 11010519491231002X
id_encrypt: Id card is 110105****
Example 2: Show the first four and last three digits of an ID card number and hide the middle digits.
Raw log
message:{"data":{"cardNumber":"410106171821090234","cardNumber":"E138123451","receivePhoneNo":"13812345678"}}
SPL orchestration rule
* | extend message1 = regexp_replace(message, 'cardNumber":\s*"([\d]{4})[\d]{11}([\d]{2}[\d|Xx])\"','cardNumber":"\1****\2"')
Transformation result
message:{"data":{"cardNumber":"410106171821090234","cardNumber":"E138123451","receivePhoneNo":"13812345678"}}
message1:{"data":{"cardNumber":"4101****234","cardNumber":"E138123451","receivePhoneNo":"13812345678"}}
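The rule in Example 2 matches only 18-character values of the cardNumber field, so other values in the same log are left unchanged. The following Python sketch checks this, including a number that ends with X, assuming the pattern behaves the same way in SPL; the function name mask_id_card is hypothetical.

```python
import re

# Pattern and replacement copied from the SPL rule in Example 2: four visible
# digits, eleven hidden digits, and three visible trailing characters.
ID_PATTERN = r'cardNumber":\s*"([\d]{4})[\d]{11}([\d]{2}[\d|Xx])\"'
ID_REPLACEMENT = r'cardNumber":"\1****\2"'


def mask_id_card(message: str) -> str:
    """Keep the first four and last three characters of an 18-character ID."""
    return re.sub(ID_PATTERN, ID_REPLACEMENT, message)
```

The passport-style value E138123451 and the phone number are not touched because they do not consist of 18 characters after the cardNumber key.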
Example 3: For a passport number, show the first letter and the last three digits and hide the middle part.
Raw log
message:{"data":{"cardNumber":"410106171821090234","cardNumber":"E138123451","receivePhoneNo":"13812345678"}}
SPL orchestration rule
* | extend message1 = regexp_replace(message, 'cardNumber":\s*"([G|E|H|M|P|B|D])\d{6}(\d{3})\"','cardNumber":"\1****\2"')
Transformation result
message:{"data":{"cardNumber":"410106171821090234","cardNumber":"E138123451","receivePhoneNo":"13812345678"}}
message1:{"data":{"cardNumber":"410106171821090234","cardNumber":"E****451","receivePhoneNo":"13812345678"}}
Example 4: For a Hong Kong and Macao travel permit number, show the last four digits (the fifth to eighth digits) and hide the first four.
Raw log
message:{"data":{"cardNumber":"18210902","cardNumber":"E138123451","receivePhoneNo":"13812345678"}}
SPL orchestration rule
* | extend message1 = regexp_replace(message, 'cardNumber":\s*"([\d]{4})([\d]{4})\"','cardNumber":"****\2"')
Transformation result
message:{"data":{"cardNumber":"18210902","cardNumber":"E138123451","receivePhoneNo":"13812345678"}}
message1:{"data":{"cardNumber":"****0902","cardNumber":"E138123451","receivePhoneNo":"13812345678"}}
Example 5: For a number that consists of one uppercase letter followed by nine digits, show only the first two and last two characters and hide the middle part.
Raw log
message:{"data":{"cardNumber":"18210902","cardNumber":"E138123451","receivePhoneNo":"13812345678"}}
SPL orchestration rule
* | extend message1 = regexp_replace(message, 'cardNumber":\s*"([A-Z])(\d{1})([\d]{6})([\d]{2})\"','cardNumber":"\1\2******\4"')
Transformation result
message:{"data":{"cardNumber":"18210902","cardNumber":"E138123451","receivePhoneNo":"13812345678"}}
message1:{"data":{"cardNumber":"18210902","cardNumber":"E1******51","receivePhoneNo":"13812345678"}}
Use case 7: Mask a URL
Data masking methods
To mask a URL in a log, use the url_encode function to perform URL encoding.
Example
Raw log
url: https://www.aliyun.com/sls?logstore
SPL orchestration rule
* | extend encode_url=url_encode(url)
Transformation result
url: https://www.aliyun.com/sls?logstore
encode_url: https%3A%2F%2Fwww.aliyun.com%2Fsls%3Flogstore
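URL encoding can be reproduced and reversed with standard tools, which is worth keeping in mind when you decide whether it hides enough for your scenario. The following Python sketch uses urllib.parse from the standard library and yields the same result as the example for this input. Treat it as an approximation of url_encode: other characters, such as spaces, may be encoded differently by the two implementations. The function name encode_url is hypothetical.

```python
from urllib.parse import quote, unquote


def encode_url(url: str) -> str:
    """Percent-encode reserved characters; safe="" also encodes "/"."""
    return quote(url, safe="")


def decode_url(encoded: str) -> str:
    """Reverse the percent-encoding."""
    return unquote(encoded)
```

Passing the encoded value to decode_url returns the original URL.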
Use case 8: Mask an order number
Data masking methods
To mask an order number in a log so that it cannot be directly decoded, hash it with the md5 function and convert the result to a hexadecimal string.
Example
Raw log
orderId: 15121412314
SPL orchestration rule
* | extend md5_orderId=to_hex(md5(to_utf8(orderId)))
Transformation result
orderId: 15121412314
md5_orderId: 852751F9AA48303A5691B0D020E52A0A
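The rule above chains three steps: convert the string to UTF-8 bytes, compute the MD5 digest, and render the digest as hexadecimal text. The following Python sketch mirrors those steps with hashlib, assuming that to_hex produces uppercase hexadecimal as shown in the result; the function name md5_hex_upper is hypothetical.

```python
import hashlib


def md5_hex_upper(value: str) -> str:
    """UTF-8 encode the value, hash it with MD5, and return uppercase hex."""
    digest = hashlib.md5(value.encode("utf-8"))
    return digest.hexdigest().upper()
```

The output always has 32 hexadecimal characters, and the same input always yields the same output, so masked order numbers can still be grouped and compared.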
Use case 9: Mask a name
Data masking methods
If a log contains a name, use the regexp_replace function with a regular expression to mask it.
Example: For an English name, show only the first letter of each part.
Raw log
message:{"data":{"name":"Sam Alice"}}
SPL orchestration rule
* | extend message1 = regexp_replace(message, 'name":\s*"([a-zA-Z])[a-zA-Z]+\s+([a-zA-Z])[a-zA-Z]+"','name":"\1**** \2****"')
Transformation result
message:{"data":{"name":"Sam Alice"}}
message1:{"data":{"name":"S**** A****"}}
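When a replacement string ends with a quotation mark, the pattern needs to consume the closing quotation mark of the value as well. Otherwise the mark appears twice in the output and the JSON is no longer valid. The following Python sketch shows a name rule written that way, assuming the pattern behaves the same way in SPL; the function name mask_name is hypothetical.

```python
import re

# The pattern ends with a quotation mark, so the closing mark of the value is
# part of the match and is written back exactly once by the replacement.
NAME_PATTERN = r'name":\s*"([a-zA-Z])[a-zA-Z]+\s+([a-zA-Z])[a-zA-Z]+"'
NAME_REPLACEMENT = r'name":"\1**** \2****"'


def mask_name(message: str) -> str:
    """Keep the first letter of each of the two name parts."""
    return re.sub(NAME_PATTERN, NAME_REPLACEMENT, message)
```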