[DSW Gallery] Financial Risk Control Based on Graph Algorithms


Graph algorithms are generally used in business scenarios built on relationship networks. Unlike conventional structured data, data for graph algorithms must be organized into a graph of pairwise relationships, where the focus is on vertices and edges. A rich set of graph algorithm components is provided, including K-Core, maximal connected subgraph, and label propagation clustering.

In this example, financial risk control is implemented with graph algorithms, using person relationship graph data and a small amount of labeled user data.

Operating environment requirements

PyAlink is installed by default in the official PAI-DSW image. At least 4 GB of memory is required.

The contents of this Notebook can be viewed directly without preparing any other files.

Import the pyalink package and enable the local running environment

• In this example, we use useLocalEnv to run Alink jobs locally (that is, inside the DSW container) and simulate distributed computing with multiple threads.

• Alink can also use usePaiEnv to submit jobs to MaxCompute.

from pyalink.alink import *

useLocalEnv(1)

Data preparation

This example requires two datasets: a person relationship graph table and a table of known user labels, which marks some users as fraudulent users and others as credit users.

Person Relationship Graph Data

• An edge between two people indicates that they have some relationship, for example as colleagues or as family members.
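To make the vertex and edge view concrete, here is a minimal sketch in plain Python (not PyAlink) that turns an edge table of (source, target) pairs into an undirected adjacency list. It assumes relationships are symmetric; the function `build_adjacency` and the toy edges are hypothetical and do not come from the dataset.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Build an undirected adjacency list from (source, target) pairs.

    Relationships such as colleague or family are treated as symmetric,
    so every edge is stored in both directions.
    """
    adj = defaultdict(set)
    for source, target in edges:
        adj[source].add(target)
        adj[target].add(source)
    return dict(adj)

# Hypothetical toy edge list standing in for the person relationship table.
toy_edges = [("A", "B"), ("B", "C"), ("D", "E")]
adjacency = build_adjacency(toy_edges)
print(sorted(adjacency["B"]))  # ['A', 'C']
```

Each person becomes a vertex and each row of the table becomes an edge; the graph algorithms in the following steps all operate on this structure.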

Person Relationship Graph Data Definition

View the person relationship table

edges.lazyPrint(5)

BatchOperator.execute()

View statistics of discrete variables

edges.select('source, target').lazyVizDive()

BatchOperator.execute()

View the statistics of the person relationship table

edges.lazyPrintStatistics()

BatchOperator.execute()

View the person relationship table statistics panel

edges.lazyVizStatistics()

BatchOperator.execute()

Known User Label Table

import pandas as pd

# Known labels: vertex name, label, and weight
df_labeled_vertices = pd.DataFrame([
    ["Enoch", "Credit User", 1.0],
    ["Evan", "Fraudulent User", 0.8]
])

labeled_vertices = BatchOperator.fromDataframe(df_labeled_vertices, schemaStr='vertices string, labels string, weight double')
labeled_vertices.print()

Use graph algorithms to determine whether a user is fraudulent

The process has three steps.

Step 1: Maximal connected subgraph

The maximal connected subgraph component splits the people in the data into two groups and writes each person's group to the group_id column. Filter and JOIN operations then remove irrelevant people from the graph.

The maximal connected subgraph component finds each maximal set of people who are connected to one another, which makes it possible to exclude people in the group who are irrelevant to risk control.
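Assuming the maximal connected subgraph component computes the connected components of an undirected graph, Step 1 can be sketched in plain Python as below. This is not Alink's implementation: `connected_components`, `keep_relevant_edges`, and the toy edges are hypothetical, and the filter on group_id stands in for the filter and JOIN operations.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Assign a group_id to every vertex; vertices in the same connected
    component share a group_id (breadth-first search, undirected graph)."""
    adj = defaultdict(set)
    for s, t in edges:
        adj[s].add(t)
        adj[t].add(s)
    group_id = {}
    next_id = 0
    for start in adj:
        if start in group_id:
            continue
        group_id[start] = next_id
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in group_id:
                    group_id[w] = next_id
                    queue.append(w)
        next_id += 1
    return group_id

def keep_relevant_edges(edges, group_id, labeled):
    """Keep only edges in components that contain a labeled user.

    Edges never cross components, so checking the source endpoint is enough.
    """
    relevant_groups = {group_id[v] for v in labeled if v in group_id}
    return [(s, t) for s, t in edges if group_id[s] in relevant_groups]

# Hypothetical toy data: B and C are unrelated to the labeled users.
toy_edges = [("Enoch", "A"), ("A", "Evan"), ("B", "C")]
groups = connected_components(toy_edges)
print(keep_relevant_edges(toy_edges, groups, {"Enoch", "Evan"}))
# [('Enoch', 'A'), ('A', 'Evan')]
```

The component containing B and C gets its own group_id and is dropped, which mirrors how the filter and JOIN remove people unrelated to the labeled users.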

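Step 2: Explore each person's first-degree and second-degree contacts

• In the output of the single-source shortest path component, distance indicates how many hops Enoch needs to reach the target person: a distance of 1 means a direct (first-degree) contact, and 2 means a contact reached through one intermediary (second-degree).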
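When every edge counts as one hop, a breadth-first search computes single-source shortest paths. The sketch below assumes unweighted edges and is not the Alink component; `hop_distances` and the toy chain of people are hypothetical.

```python
from collections import defaultdict, deque

def hop_distances(edges, source):
    """Unweighted single-source shortest path via breadth-first search.

    Returns {vertex: distance}; distance 1 is a first-degree contact,
    distance 2 a second-degree contact.
    """
    adj = defaultdict(set)
    for s, t in edges:
        adj[s].add(t)
        adj[t].add(s)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:  # first visit is the shortest path
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

# Hypothetical toy chain: Enoch - A - B - Evan
toy_edges = [("Enoch", "A"), ("A", "B"), ("B", "Evan")]
dist = hop_distances(toy_edges, "Enoch")
first_degree = sorted(v for v, d in dist.items() if d == 1)
second_degree = sorted(v for v, d in dist.items() if d == 2)
print(first_degree, second_degree)  # ['A'] ['B']
```

Filtering the distance column at 1 and 2 in this way yields each person's first-degree and second-degree contacts.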

Step 3: Use label propagation to determine the labels of unlabeled vertices

Label propagation classification is a semi-supervised classification algorithm: it uses the labels of labeled nodes to predict the labels of unlabeled nodes.

While the algorithm runs, each node's label propagates to its neighbors according to their similarity. At each propagation step, every node updates its own label based on the labels of its neighbors. The more similar a neighbor is to the node, the more weight that neighbor's label carries; and the more consistent the labels of similar nodes are, the more easily those labels propagate. Throughout the process, labeled nodes keep their labels unchanged, so they act as sources that spread labels to the unlabeled nodes.

When the iterations end, similar nodes have similar label probability distributions and can be assigned to the same class, which completes the label propagation process.
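The propagation rule described above can be sketched as an iterated weighted average of neighbor label distributions. This is a generic label propagation sketch under stated assumptions, not Alink's implementation: edge weights are assumed to represent similarity (and to be positive), unlabeled vertices start from a uniform distribution, updates are synchronous for a fixed number of iterations, and the weight column of the label table is ignored. `label_propagation` and the toy graph are hypothetical.

```python
from collections import defaultdict

def label_propagation(edges, seeds, labels, iterations=20):
    """Semi-supervised label propagation on an undirected weighted graph.

    edges:  (u, v, similarity) triples; similarity is used as the edge weight
            and is assumed to be positive.
    seeds:  {vertex: label} for labeled vertices; these stay clamped.
    labels: list of possible labels.
    Returns {vertex: {label: probability}}.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    uniform = {lab: 1.0 / len(labels) for lab in labels}
    dist = {}
    for v in adj:
        if v in seeds:
            dist[v] = {lab: float(lab == seeds[v]) for lab in labels}
        else:
            dist[v] = dict(uniform)
    for _ in range(iterations):
        new_dist = {}
        for v, neighbors in adj.items():
            if v in seeds:
                new_dist[v] = dist[v]  # labeled vertices keep their label
                continue
            total = sum(w for _, w in neighbors)
            # More similar neighbors (larger w) have more influence.
            new_dist[v] = {
                lab: sum(w * dist[n][lab] for n, w in neighbors) / total
                for lab in labels
            }
        dist = new_dist
    return dist

# Hypothetical toy graph: strong ties Enoch-A and B-Evan, weak tie A-B.
toy_edges = [("Enoch", "A", 1.0), ("A", "B", 0.2), ("B", "Evan", 1.0)]
seeds = {"Enoch": "Credit User", "Evan": "Fraudulent User"}
result = label_propagation(toy_edges, seeds, ["Credit User", "Fraudulent User"])
for person in ("A", "B"):
    probs = result[person]
    print(person, max(probs, key=probs.get), round(probs["Credit User"], 3))
# A Credit User 0.857
# B Fraudulent User 0.143
```

Because A is strongly tied to Enoch and only weakly tied to B, its Credit User probability converges to 6/7; symmetrically, B converges to 1/7 and takes Evan's Fraudulent User label.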
