Graph analysis based on PolarDB: Insurance data analysis

This topic uses a publicly available insurance dataset to demonstrate how to run graph queries in PolarDB to identify abnormal claim records and fraudulent groups in insurance claim scenarios. For example, you can query the policies that involve a claimant or analyze a policyholder's social relationships for signs of fraud. PolarDB adds graph analysis capabilities to a relational database, so enterprises can manage and analyze their data in one place.

About the graph database engine

Graph analysis is a critical field of data science. It uses graph structures to represent data and perform various computing and analytic tasks. A graph structure consists of nodes (or vertices) and edges, with nodes representing entities and edges representing the relationships between the entities. Graph computing is commonly used in areas including social network analysis, recommendation systems, knowledge graphs, and path optimization.

PolarDB for PostgreSQL is highly compatible with Apache AGE and supports storing and querying knowledge graphs. You can use both standard ANSI SQL and the openCypher graph query language to query data in the same database cluster.

Fully compatible with PolarDB for PostgreSQL
AGE is an extension for PolarDB for PostgreSQL. You can enable it in an existing PolarDB database without rebuilding the database. AGE builds on the features that PolarDB provides, including transaction support, concurrency control, and indexing and performance optimization.
Unified graph and relational queries
AGE processes relational and graph data side by side. You can combine SQL and the graph query language in a single query, which simplifies work with complex data models.
Cypher query language supported
AGE supports the Cypher query language, which is tailored for graph databases. Its syntax is simple and flexible, providing an intuitive approach to querying and manipulating graph data.
High performance
By combining the optimization features provided by PolarDB and indexes tailored for graph data, AGE can efficiently manage large-scale graph data and complex graph queries.

In short, AGE lets PolarDB handle graph queries simply and efficiently.

Scenarios

Description

Detecting insurance claim fraud typically draws on the patient, disease, and claim data held by insurance providers. By analyzing the connections between claim applications, diseases, and other related entities, you can identify abnormal claim records and expose potential fraudulent groups.

Data and models

The data is based on an insurance dataset that is publicly available. You can find the dataset here. It contains fundamental elements of the insurance industry. The data model can be abstracted into the following graph:

Vertices: policyholder, incharge, claim, patient, disease.
Edges: has_disease, policyholder_of_claim, incharge_of_claim, insured_of_claim, similar_claim, policyholder_connection.
Properties: name, high_risk, risk_score, disease_name, similarity_score, level, claim_date, charge, and more.

Best practices

Prepare the database

Note

The graph engine extension is supported only by PolarDB for PostgreSQL 14 with revision version 2.0.14.12.24.0 or later. For more information, see Getting Started.

Create the extension.
```
CREATE EXTENSION age;
```
Add ag_catalog to the search path and age to the preloaded libraries, either for the database or for your account.
Note
Compatibility issues may occur when you use Data Management (DMS) to configure the search_path. In such cases, you can use PolarDB-Tools to execute related statements.
```
ALTER DATABASE <dbname> SET search_path = public,ag_catalog;
ALTER USER <username> SET search_path = public,ag_catalog;

ALTER DATABASE <dbname> SET session_preload_libraries TO 'age';
ALTER USER <username> SET session_preload_libraries TO 'age';
```

Data import

Create a graph by using the create_graph function in the ag_catalog namespace.
```
SELECT create_graph('graph');
```

Insert vertices and edges. The downloaded data file is in CSV format and does not contain the required ID information, so convert the file before you import the data. A Python script is provided in the Appendix to convert the data into vertex and edge statements that PolarDB supports. The following listings show sample conversion results:

Policyholder

```
SELECT create_vlabel('graph','policyholder');
SELECT * FROM cypher('graph', $$ CREATE (:policyholder {policyholder_id:'PH3068',fname:'ADAM',lname:'OCHSENBEIN',risk_score:'88',high_risk:'1'}) $$ ) as (n agtype);
SELECT * FROM cypher('graph', $$ CREATE (:policyholder {policyholder_id:'PH3069',fname:'MALINDA',lname:'MEHSERLE',risk_score:'42',high_risk:'0'}) $$ ) as (n agtype);
SELECT * FROM cypher('graph', $$ CREATE (:policyholder {policyholder_id:'PH3070',fname:'SANDRA',lname:'KUHTA',risk_score:'20',high_risk:'0'}) $$ ) as (n agtype);
...
```
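The conversion from CSV rows to vertex statements can be sketched as follows. This is a minimal illustration, not the script from the Appendix. The helper names cypher_literal and vertex_statement, the sample CSV, and its column names are assumptions inferred from the policyholder listing. Backslash escaping of quotes inside Cypher strings is also assumed, so verify it on your cluster before relying on it.

```python
import csv
import io


def cypher_literal(value):
    """Quote a value as a single-quoted Cypher string (backslash escaping is assumed)."""
    if "$$" in value:
        # The statement is wrapped in $$ ... $$, so this sequence would end it early.
        raise ValueError("value must not contain '$$'")
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def vertex_statement(graph, label, props):
    """Build one SELECT ... cypher(... CREATE (:label {...})) statement."""
    body = ",".join(f"{key}:{cypher_literal(val)}" for key, val in props.items())
    return (
        f"SELECT * FROM cypher('{graph}', "
        f"$$ CREATE (:{label} {{{body}}}) $$ ) as (n agtype);"
    )


# Hypothetical CSV excerpt; the column names are assumed from the listing.
SAMPLE_CSV = """policyholder_id,fname,lname,risk_score,high_risk
PH3068,ADAM,OCHSENBEIN,88,1
PH3069,MALINDA,MEHSERLE,42,0
"""

statements = [
    vertex_statement("graph", "policyholder", row)
    for row in csv.DictReader(io.StringIO(SAMPLE_CSV))
]

for stmt in statements:
    print(stmt)
```

Every value is emitted as a quoted string, which matches the listing (for example, risk_score:'88'). The same helper could produce the claim statements by passing 'claim' as the label.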

Claim

```
-- Create vlabel
SELECT create_vlabel('graph','claim');
SELECT * FROM cypher('graph', $$ CREATE (:claim {claim_id:'C3571',charge:'6517.53',claim_date:'2013-08-11 00:00:00',duration:'13',insured_id:'28523',diagnosis:'no exception',person_incharge_id:'PI23070',type:'services',policyholder_id:'PH9507'}) $$ ) as (n agtype);
SELECT * FROM cypher('graph', $$ CREATE (:claim {claim_id:'C3572',charge:'49273.65',claim_date:'2017-02-10 00:00:00',duration:'3',insured_id:'1220',diagnosis:'no exception',person_incharge_id:'PI21197',type:'services',policyholder_id:'PH406'}) $$ ) as (n agtype);
SELECT * FROM cypher('graph', $$ CREATE (:claim {claim_id:'C3573',charge:'52005.98',claim_date:'2014-06-29 00:00:00',duration:'27',insured_id:'23735',diagnosis:'no exception',person_incharge_id:'PI22361',type:'services',policyholder_id:'PH7911'}) $$ ) as (n agtype);
...
```

Connections between policyholder and claim

```
SELECT * FROM cypher('graph', $$ MATCH (a:claim), (b:policyholder) WHERE a.claim_id = 'C1528' AND b.policyholder_id = 'PH2963' CREATE (a)-[e:RELTYPE]->(b) RETURN e $$) as (e agtype);
SELECT * FROM cypher('graph', $$ MATCH (a:claim), (b:policyholder) WHERE a.claim_id = 'C1529' AND b.policyholder_id = 'PH1353' CREATE (a)-[e:RELTYPE]->(b) RETURN e $$) as (e agtype);
SELECT * FROM cypher('graph', $$ MATCH (a:claim), (b:policyholder) WHERE a.claim_id = 'C1530' AND b.policyholder_id = 'PH1071' CREATE (a)-[e:RELTYPE]->(b) RETURN e $$) as (e agtype);
SELECT * FROM cypher('graph', $$ MATCH (a:claim), (b:policyholder) WHERE a.claim_id = 'C1531' AND b.policyholder_id = 'PH8102' CREATE (a)-[e:RELTYPE]->(b) RETURN e $$) as (e agtype);
SELECT * FROM cypher('graph', $$ MATCH (a:claim), (b:policyholder) WHERE a.claim_id = 'C1532' AND b.policyholder_id = 'PH4768' CREATE (a)-[e:RELTYPE]->(b) RETURN e $$) as (e agtype);
...
```
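Edge statements can be generated in the same way, by pairing the ID columns of each claim row. The sketch below is illustrative only: the helper names quote and edge_statement and the two-column sample CSV are hypothetical, and backslash escaping inside Cypher strings is an assumption. The edge label is passed as a parameter. The example reuses the RELTYPE label from the listing, although a label from the data model, such as policyholder_of_claim, may be what you want in practice.

```python
import csv
import io


def quote(value):
    """Single-quote a value for Cypher (backslash escaping is an assumption)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def edge_statement(graph, rel_type, src, dst):
    """Build one MATCH ... CREATE statement for a directed edge.

    src and dst are (label, id_property, id_value) tuples that identify
    the start and end vertices.
    """
    (a_label, a_key, a_val), (b_label, b_key, b_val) = src, dst
    return (
        f"SELECT * FROM cypher('{graph}', $$ "
        f"MATCH (a:{a_label}), (b:{b_label}) "
        f"WHERE a.{a_key} = {quote(a_val)} AND b.{b_key} = {quote(b_val)} "
        f"CREATE (a)-[e:{rel_type}]->(b) RETURN e $$) as (e agtype);"
    )


# Hypothetical claim rows; only the two ID columns matter for this edge.
SAMPLE_CSV = """claim_id,policyholder_id
C1528,PH2963
C1529,PH1353
"""

edges = [
    edge_statement(
        "graph",
        "RELTYPE",  # label copied from the listing; replace as needed
        ("claim", "claim_id", row["claim_id"]),
        ("policyholder", "policyholder_id", row["policyholder_id"]),
    )
    for row in csv.DictReader(io.StringIO(SAMPLE_CSV))
]

for stmt in edges:
    print(stmt)
```

Because each statement matches both vertices by ID before creating the edge, the vertex statements must be imported first.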

Save the conversion results as an SQL file, and then use a client tool such as psql, the PostgreSQL client, to import the data.
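Saving the statements could look like the following sketch. The function name write_sql_file and the file name policyholder.sql are hypothetical, and the two statements are copied from the policyholder listing. The sketch writes to a temporary directory so that it can run anywhere.

```python
import tempfile
from pathlib import Path


def write_sql_file(path, statements):
    """Write one statement per line and return the number of statements written."""
    with path.open("w", encoding="utf-8") as handle:
        for stmt in statements:
            handle.write(stmt.rstrip() + "\n")
    return len(statements)


# Two statements taken from the policyholder listing.
statements = [
    "SELECT create_vlabel('graph','policyholder');",
    "SELECT * FROM cypher('graph', $$ CREATE (:policyholder {policyholder_id:'PH3068',fname:'ADAM',lname:'OCHSENBEIN',risk_score:'88',high_risk:'1'}) $$ ) as (n agtype);",
]

out_path = Path(tempfile.mkdtemp()) / "policyholder.sql"  # hypothetical file name
written = write_sql_file(out_path, statements)
print(f"wrote {written} statements to {out_path}")
```

With psql, the resulting file can then be imported with a command along the lines of psql -h <host> -U <username> -d <dbname> -f policyholder.sql, where the placeholders stand for your own connection details.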

Examples