×
Community Blog Public Event Sentiment Analysis in PostgreSQL

Public Event Sentiment Analysis in PostgreSQL

Recursive syntax in PostgreSQL and PL/pgSQL can meet the requirements of analyzing events before and after a specific event in public opinion analysis scenarios.

Background

Sometimes a business may want to know what will happen following a specific event. For example, will publishing house price regulations lead to large investment in the stock market?

Let's take a look at how PostgreSQL databases can meet this type of requirement.

Simulate a Realistic Scenario

1.  Construct the structure of a table holding events

create table event(  
id serial8 primary key, -- Auto-increment sequence that determines the sequential order of events  
class text, -- Event type  
info text, -- Event description  
crt_time timestamp -- Time when an event occurs  
);  

2.  Insert 10 million event records, which are of 5,000 different event types

insert into event (class, info , crt_time) select (5000*random())::int::text, 'test', clock_timestamp() from generate_series(1,10000000);  

3.  Create indexes by event type

create index idx_event_class on event (class);  

Requirements

Query a specific number of events following a certain event

Simply use the following function to meet the preceding requirement

create or replace function f(  
sql text, -- Query the IDs of the target events to be analyzed  
v_class text, -- Type of a target event  
v_n int8, -- How many subsequent events are to be checked  
v_limit int -- How many times events are analyzed. If all output is required, enter a large value  
) returns setof event as 
$$
  
declare  
  v_id int8;  
begin  
  for v_id in execute sql loop  
    return query select * from event where id>=v_id order by id limit v_n+1;  
    v_limit := v_limit - 1;  
    if v_limit<=0 then  
      return;  
    end if;  
  end loop;  
end;  

$$
 language plpgsql strict;  

Query example

Query two events following an event of type 1 and generate analysis results ten times.

postgres=# select * from f(
$$
select id from event where class='1'
$$
, '1', 2, 10);  
  id   | class | info |          crt_time            
-------+-------+------+----------------------------  
  1592 | 1     | test | 2017-03-31 15:07:23.77348  
  1593 | 3032  | test | 2017-03-31 15:07:23.773483  
  1594 | 3409  | test | 2017-03-31 15:07:23.773486  
  2784 | 1     | test | 2017-03-31 15:07:23.777265  
  2785 | 2819  | test | 2017-03-31 15:07:23.777268  
  2786 | 87    | test | 2017-03-31 15:07:23.777271  
 12176 | 1     | test | 2017-03-31 15:07:23.807489  
 12177 | 2586  | test | 2017-03-31 15:07:23.807491  
 12178 | 4101  | test | 2017-03-31 15:07:23.807494  
 19398 | 1     | test | 2017-03-31 15:07:23.83072  
 19399 | 1179  | test | 2017-03-31 15:07:23.830723  
 19400 | 4237  | test | 2017-03-31 15:07:23.830726  
 19571 | 1     | test | 2017-03-31 15:07:23.831296  
 19572 | 4368  | test | 2017-03-31 15:07:23.831299  
 19573 | 2313  | test | 2017-03-31 15:07:23.831302  
 24708 | 1     | test | 2017-03-31 15:07:23.847794  
 24709 | 1327  | test | 2017-03-31 15:07:23.847797  
 24710 | 4584  | test | 2017-03-31 15:07:23.847799  
 29756 | 1     | test | 2017-03-31 15:07:23.864234  
 29757 | 4386  | test | 2017-03-31 15:07:23.864237  
 29758 | 3044  | test | 2017-03-31 15:07:23.864239  
 30224 | 1     | test | 2017-03-31 15:07:23.865765  
 30225 | 4704  | test | 2017-03-31 15:07:23.865768  
 30226 | 332   | test | 2017-03-31 15:07:23.865771  
 32004 | 1     | test | 2017-03-31 15:07:23.871554  
 32005 | 219   | test | 2017-03-31 15:07:23.871557  
 32006 | 3548  | test | 2017-03-31 15:07:23.871559  
 36472 | 1     | test | 2017-03-31 15:07:23.886097  
 36473 | 640   | test | 2017-03-31 15:07:23.8861  
 36474 | 1139  | test | 2017-03-31 15:07:23.886103  
(30 rows)  

With the preceding results, we can know what events will happen after the occurrence of event 1.

To use the preceding results for further analysis, use hotspot analysis similar to keywords, use the MADlib machine learning database in PostgreSQL or use plR.

You always have a method to find the correlation between events.

Correlation Analysis of Non-Independent Events

Because the previous analysis targets independent events, which bear no correlation themselves, it is required to use functions to generate results.

Users' mouse clicks on Taobao (for example, what other items a user clicks after clicking a specific item) is an example of non-independent events.

For non-independent events, the structure design includes associated fields like USERID, so that we can easily establish association.

Trajectory analysis can be performed on non-independent events by using recursive calls in PostgreSQL.

Summary

Recursive syntax in PostgreSQL and PL/pgSQL can perfectly meet the requirements of analyzing events before and after a specific event in public opinion analysis scenarios.

Then hotspot analysis, cluster analysis, MADlib in PostgreSQL or plR can be used to analyze events to find event correlation.

0 0 0
Share on

digoal

277 posts | 24 followers

You may also like

Comments

digoal

277 posts | 24 followers

Related Products