Community Blog How to Mask Sensitive Information on PostgreSQL Using Anon Plug-in

How to Mask Sensitive Information on PostgreSQL Using Anon Plug-in

In this article, the author explains how to mask sensitive information such as email addresses and credit card numbers in PostgreSQL using the Anon plug-in.

By digoal


PostgreSQL security label is a security framework that opens the security label provider interface. By implementing this interface, you can achieve fine-grained data security control. As a result, some users can only see obfuscated data, for example.

  • load security label lib
  • user (labeled) —> search —> security label (object) —> security obfuscation function —> return obfuscation results

For example, SELinux Label Lib: https://www.postgresql.org/docs/12/sepgsql.html

shared_preload_libraries = 'sepgsql'  
SECURITY LABEL FOR selinux ON TABLE mytable IS 'system_u:object_r:sepgsql_table_t:s0';  

For example, prohibit direct access to credit fields.

postgres=# CREATE TABLE customer (  
               cid     int primary key,  
               cname   text,  
               credit  text  
postgres=# SECURITY LABEL ON COLUMN customer.credit  
               IS 'system_u:object_r:sepgsql_secret_table_t:s0';  

Allow access to credit through function.

postgres=# CREATE FUNCTION show_credit(int) RETURNS text  
             AS 'SELECT regexp_replace(credit, ''-[0-9]+$'', ''-xxxx'', ''g'')  
                        FROM customer WHERE cid = $1'  
           LANGUAGE sql;  
postgres=# SECURITY LABEL ON FUNCTION show_credit(int)  
               IS 'system_u:object_r:sepgsql_trusted_proc_exec_t:s0';  

An administrative user should perform the above operations.

postgres=# SELECT * FROM customer;  
ERROR:  SELinux: security policy violation  
postgres=# SELECT cid, cname, show_credit(cid) FROM customer;  
 cid | cname  |     show_credit  
   1 | taro   | 1111-2222-3333-xxxx  
   2 | hanako | 5555-6666-7777-xxxx  
(2 rows)  

In this case, a regular user cannot reference customer credit. Still, a trusted procedure show_credit allows the user to print the customers’ credit card numbers with some digits masked out.

Query specific labels.

postgres=# select * from pg_seclabel;  
 objoid | classoid | objsubid | provider | label   
(0 rows)  
postgres=# select * from pg_seclabels;  
 objoid | classoid | objsubid | objtype | objnamespace | objname | provider | label   
(0 rows)    

In addition to SELinux, some plug-ins, such as anon, also use security label interfaces to mask sensitive information.

Anon: Security Label Provider for Masking Sensitive Information

The first step is to label the user, indicating that the user needs to load the security label provider —> anon when querying data.

Then, label the object. Users who have performed labeling for objects will use the sensitive information masking function to mask sensitive information.


=# SELECT * FROM people;  
 id | fistname | lastname |   phone      
 T1 | Sarah    | Conor    | 0609110911  

Step 1: Activate the dynamic masking engine.

=# SELECT anon.start_dynamic_masking();  

Step 2: Declare a masked user.

=# CREATE ROLE skynet LOGIN;  

Step 3: Declare the masking rules.

anon.Fake_last_name and anon.partial are two masking functions.

You should label security label on people.lastname and use anon.fake_last_name to filter functions safely. In other words, when you send a query, the fake_last_name function cleans the values of the field before entering the next step.

=# SECURITY LABEL FOR anonON COLUMN people.lastname    
-# IS 'MASKED WITH FUNCTION anon.fake_last_name()';  
=# SECURITY LABEL FOR anon ON COLUMN people.phone   
-# IS 'MASKED WITH FUNCTION anon.partial(phone,2,$$******$$,2)';  

Step 4: Connect with the masked user

=# \! psql peopledb -U skynet -c 'SELECT * FROM people;'  
 id | fistname | lastname  |   phone      
 T1 | Sarah    | Stranahan | 06******11  

Support for Anon Sensitive Information Masking Algorithm Functions

The object is implemented in a masking function; when you query the object content, the masking function is carried out first and then returned.

Built-in Masking Functions

  • Adding Noise
  • Shuffling
  • Randomization
  • Faking
  • Partial scrambling

Optimization algorithm:

  • For names and other 'direct identifiers,’ faking is often useful.
  • Shuffling is convenient for foreign keys.
  • Adding noise is interesting for numeric values and dates.
  • Partial scrambling is perfect for the email address and phone numbers, etc.


Add noise:

  • anon.add_noise_on_numeric_column(table, column,ratio) if ratio = 0.33, all values of the column will be randomly shifted with a ratio of +/- 33%
  • anon.add_noise_on_datetime_column(table, column,interval) if interval = '2 days', all values of the column will be randomly shifted by +/- 2 days

Random value obfuscation:

  • anon.random_date() returns a date
  • anon.random_date_between(d1,d2) returns a date between d1 and d2
  • anon.random_int_between(i1,i2) returns an integer between i1 and i2
  • anon.random_string(n) returns a TEXT value containing n letters
  • anon.random_zip() returns a 5-digit code
  • anon.random_phone(p) return a 8-digit phone with p as a prefix

Display partial content:

  • anon.partial('abcdefgh',1,'xxxx',3) will return 'axxxxfgh';
  • anon.email ('daamien@gmail.com') will become 'da@gm**.com'

Fake data should be randomly replaced by the contents of the fake library after loading the fake library.

Once the fake data is loaded, you have access to 12 faking functions:

  • anon.fake_first_name() returns a generic first name
  • anon.fake_last_name() returns a generic last name
  • anon.fake_email() returns a valid email address
  • anon.fake_city() returns an existing city
  • anon.fake_city_in_country(c) returns a city in country c
  • anon.fake_region() returns an existing region
  • anon.fake_region_in_country(c) returns a region in country c
  • anon.fake_country() returns a country
  • anon.fake_company() returns a generic company name
  • anon.fake_iban() returns a valid IBAN
  • anon.fake_siret() returns a valid SIRET
  • non.fake_siren() returns a valid SIREN

For TEXT and VARCHAR columns, you can use the classic Lorem Ipsum generator:

  • anon.lorem_ipsum() returns 5 paragraphs
  • anon.lorem_ipsum(2) returns 2 paragraphs
  • anon.lorem_ipsum( paragraphs := 4 ) returns 4 paragraphs
  • anon.lorem_ipsum( words := 20 ) returns 20 words
  • anon.lorem_ipsum( characters := 7 ) returns 7 characters

User-defined Masking Functions

You can write your own Masks and use your own functions as a mask. The function must either be destructive (like [Partial Scrambling]) or insert some randomness in the dataset (like [faking]).

For instance, if you wrote a function foo(), you can apply it as the following:


Type Conversion

If the type returned by the obfuscated function is not the same as the original type in the field, it can be converted and then returned.

For example, the faking functions will return values in TEXT data types. The random functions will return TEXT, INTEGER, or TIMESTAMP WITH TIMEZONE.

If the column you want to mask is in another data type (for instance, VARCHAR(30), then you need to add an explicit cast directly in the COMMENT declaration, as the following:

=# COMMENT ON COLUMN clients.family_name  
-# IS 'MASKED WITH FUNCTION anon.fake_last_name()::VARCHAR(30)';  


0 0 0
Share on


237 posts | 16 followers

You may also like