
Financial risk management

Last Updated: Aug 17, 2018

Overview

This project is created by using Alibaba Cloud Network Chart. A network chart illustrates the interconnections among a set of entities, for example, the relationships among a group of people. Unlike hierarchical data, network data is represented by nodes and the edges (links) that connect them. Alibaba Cloud Machine Learning Platform for AI provides several Network Chart components, including K-Core, largest connected subgraph, and label propagation classification.

Scenario

The following figure shows the relationships among a group of people. The arrows in the figure represent the relationships between these people (for example, coworkers or relatives). Enoch is a trusted customer and Evan is a fraudulent customer. Based on this information and the relationship graph, Network Chart allows you to calculate the credit scores of the remaining people for financial risk management. By referencing the credit scores, you can make predictions about which of them might be fraudulent customers.

Datasets

Data source: the dataset in this project is provided by Alibaba Cloud Machine Learning Platform for AI. The dataset includes the following attributes:

Name | Definition | Data Type | Description
start_point | Start node of an edge | string | Name of a person.
end_point | End node of an edge | string | Name of a person.
count | Relational closeness | double | The larger the value, the closer the relationship between the two people.

The following figure shows the data entries:

Data exploring procedure

The following figure shows the workflow of this project:

Largest connected subgraph

The largest connected subgraph component allows you to find the cluster that contains the most interconnected entities. In this project, the component divides the people into two groups and assigns each group a group ID (group_id). The group containing Parker, Rex, and Stan should be removed because the relationships among these people do not affect the prediction results. You can use the SQL script component and JOIN component to remove this group.
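The grouping step can be sketched outside the platform in a few lines of Python. This is a minimal illustration, not the platform component itself: it assumes edges are treated as undirected, and the edge list, the names Alice and Bob, and the helper names connected_components and keep_largest_group are hypothetical and are not taken from the project dataset.

```python
from collections import defaultdict, deque

# Hypothetical edges in (start_point, end_point, count) form.
# These rows are made up for illustration; they are not the project data.
EDGES = [
    ("Enoch", "Evan", 10.0),
    ("Enoch", "Alice", 5.0),
    ("Alice", "Bob", 3.0),
    ("Parker", "Rex", 3.0),
    ("Rex", "Stan", 7.0),
]


def connected_components(edges):
    """Map each node to a group_id, treating every edge as undirected."""
    neighbors = defaultdict(set)
    for start, end, _count in edges:
        neighbors[start].add(end)
        neighbors[end].add(start)

    group_of = {}
    next_id = 0
    for node in sorted(neighbors):
        if node in group_of:
            continue
        # Breadth-first search labels everything reachable from this node.
        group_of[node] = next_id
        queue = deque([node])
        while queue:
            current = queue.popleft()
            for other in neighbors[current]:
                if other not in group_of:
                    group_of[other] = next_id
                    queue.append(other)
        next_id += 1
    return group_of


def keep_largest_group(edges):
    """Drop every edge that is not in the largest connected group."""
    group_of = connected_components(edges)
    sizes = defaultdict(int)
    for group_id in group_of.values():
        sizes[group_id] += 1
    largest = max(sizes, key=sizes.get)
    return [edge for edge in edges if group_of[edge[0]] == largest]


group_of = connected_components(EDGES)
kept_edges = keep_largest_group(EDGES)
```

Filtering the edges by group ID here plays the role that the SQL script and JOIN components play in the workflow.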

Single-source shortest path

The single-source shortest path component measures the distance (number of nodes passed through) from a start node to each of the other nodes.
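As a rough sketch of the idea, the following Python function computes distances from one start node with a breadth-first search. It assumes that every edge is undirected and counts as one step, so the result is the number of edges traversed; the platform component may count or weight distances differently. The edge list, the names Alice, Bob, and Carol, and the function name shortest_hops are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical (start_point, end_point) pairs, made up for illustration.
EDGES = [
    ("Enoch", "Alice"),
    ("Alice", "Bob"),
    ("Bob", "Evan"),
    ("Enoch", "Carol"),
]


def shortest_hops(edges, source):
    """Return the fewest edges needed to reach each node from source."""
    neighbors = defaultdict(set)
    for start, end in edges:
        neighbors[start].add(end)
        neighbors[end].add(start)

    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for other in neighbors[current]:
            if other not in distance:
                # First visit in BFS order is always the shortest route.
                distance[other] = distance[current] + 1
                queue.append(other)
    return distance


distances = shortest_hops(EDGES, "Enoch")
```

Nodes that cannot be reached from the start node are simply absent from the returned dictionary.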

The following figure shows the distances between Enoch and the others:

Label propagation classification

Label propagation classification is a semi-supervised classification algorithm. It uses the existing label information of the nodes to predict the label information of the unlabeled nodes. Based on the correlations between the nodes, label propagation classification propagates each label to other nodes.

To use the label propagation classification component, make sure that you have a connected graph containing all entities and the data for labeling. In this project, the data for labeling is imported from the Read Data Source component. The weight column shows the probability of a person being a fraudulent customer.
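The propagation idea can be illustrated with a small Python sketch. This is one simple variant (each unlabeled node repeatedly takes the closeness-weighted average of its neighbors' scores while labeled nodes stay fixed) and is not necessarily the algorithm the platform component uses. It assumes undirected edges with positive count values, a score of 0.0 for a trusted customer and 1.0 for a fraudulent one, and a neutral starting value of 0.5. The edge list, the names Alice and Bob, and the function name propagate_labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical edges in (start_point, end_point, count) form,
# made up for illustration; they are not the project data.
EDGES = [
    ("Enoch", "Alice", 3.0),
    ("Alice", "Evan", 1.0),
    ("Bob", "Evan", 2.0),
]

# Known labels: 0.0 = trusted customer, 1.0 = fraudulent customer.
SEEDS = {"Enoch": 0.0, "Evan": 1.0}


def propagate_labels(edges, seeds, iterations=100):
    """Spread seed scores to unlabeled nodes by weighted averaging."""
    weights = defaultdict(dict)
    for start, end, count in edges:
        weights[start][end] = weights[start].get(end, 0.0) + count
        weights[end][start] = weights[end].get(start, 0.0) + count

    # Unlabeled nodes start at a neutral 0.5 (an assumption of this sketch).
    scores = {node: seeds.get(node, 0.5) for node in weights}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in weights.items():
            if node in seeds:
                updated[node] = seeds[node]  # labeled nodes never change
            else:
                total = sum(nbrs.values())  # positive counts assumed
                updated[node] = sum(
                    w * scores[n] for n, w in nbrs.items()
                ) / total
        scores = updated
    return scores


scores = propagate_labels(EDGES, SEEDS)
```

With these made-up edges, Alice ends at 0.25 because her tie to Enoch is three times as strong as her tie to Evan, while Bob, whose only neighbor is Evan, ends at 1.0.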


After SQL filtering, the final results show the probability of fraud for each person. The larger the value, the more likely the person is a fraudulent customer.
