AnalyticDB for PostgreSQL integrates the MADlib extension to implement machine learning.

MADlib is an open-source library that runs machine learning and graph computing modules in PostgreSQL databases. For machine learning, MADlib provides functions and stored procedures for mathematical statistics, along with a set of typical supervised and unsupervised learning algorithms.

The machine learning module of MADlib addresses the following types of problems:

- Classification and regression: MADlib provides algorithms such as k-nearest neighbors (KNN), multilayer perceptron neural networks, support vector machines (SVM), and decision trees to solve binary classification and regression problems. MADlib also provides models such as least-squares regression, generalized linear models (GLM), logistic regression, and multinomial logistic regression to solve regression problems.
- Clustering: MADlib provides the K-means algorithm for clustering analysis.
- Correlation (association) analysis: MADlib provides the Apriori algorithm for association rule mining. This algorithm can help find unexpected associations between products, such as the association between diapers and beer.
- Time series analysis: MADlib provides autoregressive integrated moving average (ARIMA) models to predict future trends of time series data.
- Others: MADlib provides principal component analysis (PCA) to extract the main factors for dimensionality reduction. MADlib also provides a Latent Dirichlet Allocation (LDA) model for document classification and topic modeling.
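
To make the diapers-and-beer example concrete, the following minimal Python sketch computes the support and confidence of one association rule, which are the two measures that Apriori-style rule mining typically thresholds on. It is illustrative only and is not the MADlib API: the basket data and the helper names (`count`, `support`, `confidence`) are invented for this example. In AnalyticDB for PostgreSQL, the equivalent analysis would be run through MADlib SQL functions.

```python
# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
    {"diapers", "beer", "bread"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions containing antecedent that also contain consequent."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)


# Rule under test: {diapers} -> {beer}
rule_support = support({"diapers", "beer"}, baskets)
rule_confidence = confidence({"diapers"}, {"beer"}, baskets)
print(rule_support, rule_confidence)
```

A rule is usually kept only when both values exceed thresholds chosen by the analyst. Apriori avoids enumerating every itemset by extending only those itemsets that are already frequent.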

MADlib also integrates graph computing models to solve problems such as finding shortest paths, computing PageRank scores, and answering social network queries such as finding the contacts of a specific user. The following table describes the graph computing models.

Category | Model/feature | Description |
---|---|---|
Shortest path | Shortest path among all vertices | Calculates the shortest paths among all vertices and saves the results to a result table. You can then query the shortest path from a start vertex to an end vertex based on the result table. |
Shortest path | Shortest path between a specific vertex and all other vertices | Calculates the shortest paths between a specific vertex and all other vertices and saves the results to a result table. You can then query the shortest path from the specific vertex to any other vertex based on the result table. |
Breadth-first search (BFS) | BFS | Uses the BFS method to query vertices that are reachable from a specific source vertex. |
HITS | HITS score | Queries the HITS scores of all vertices in a directed graph. The HITS scores include hub scores and authority scores. |
Web page ranking | PageRank | Queries the PageRank of all vertices in a directed graph. |
Weakly connected components | Weakly connected components | Queries all weakly connected components in a directed graph. |
Measure | Average path length | Calculates the average shortest path length of a graph. |
Measure | Closeness | Calculates the closeness centrality of all vertices in a graph. |
Measure | Graph diameter | Calculates the graph diameter. |
Measure | In/out-degree | Calculates the in-degree and out-degree of all vertices. |
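
As an illustration of what two of the models in the table compute, the following minimal Python sketch runs a breadth-first search from a source vertex and counts in-degrees and out-degrees over a small directed edge list. It is a sketch of the underlying techniques, not of the MADlib interface: the edge data and the function names (`build_adjacency`, `bfs_distances`, `in_out_degrees`) are assumptions made for this example. In AnalyticDB for PostgreSQL, these computations are performed by MADlib over vertex and edge tables.

```python
from collections import defaultdict, deque

# Hypothetical directed edges (source, destination), invented for illustration.
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (6, 5)]


def build_adjacency(edges):
    """Map each source vertex to the list of vertices it points to."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    return adjacency


def bfs_distances(edges, source):
    """Return {vertex: hop count} for every vertex reachable from source."""
    adjacency = build_adjacency(edges)
    distances = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in distances:  # first visit is the fewest hops
                distances[neighbor] = distances[vertex] + 1
                queue.append(neighbor)
    return distances


def in_out_degrees(edges):
    """Return {vertex: (in_degree, out_degree)} for every vertex on an edge."""
    degrees = defaultdict(lambda: [0, 0])
    for src, dst in edges:
        degrees[src][1] += 1  # outgoing edge for the source
        degrees[dst][0] += 1  # incoming edge for the destination
    return {vertex: tuple(pair) for vertex, pair in degrees.items()}


reachable = bfs_distances(edges, source=1)
degrees = in_out_degrees(edges)
print(reachable)
print(degrees)
```

In this sample graph, vertex 6 has an edge into vertex 5 but cannot be reached from vertex 1, so it is absent from the BFS result while still appearing in the degree counts.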

The machine learning module provides the following benefits:

- Easy to use. You can analyze large amounts of data by using SQL statements, which simplifies programming.
- Lightweight. AnalyticDB for PostgreSQL helps you solve complex problems, such as problems that combine classification with social network analysis.
- Elastic and high-performance. You can elastically scale computing resources such as CPU and compute nodes based on the cloud-native architecture of AnalyticDB for PostgreSQL.