This topic describes how to use the MADlib plug-in. MADlib is an open source library for running machine learning and graph computing models inside AliPG databases. For machine learning, MADlib provides functions and stored procedures for mathematical operations, along with a set of typical supervised and unsupervised learning algorithms.

Prerequisites

  • Your ApsaraDB RDS for PostgreSQL instance runs one of the following database engine versions:
  • A privileged account is used to connect to your RDS instance. You can check the type of the account that you use on the Accounts page in the ApsaraDB RDS console. If the account is a standard account, you must create a privileged account and use the privileged account to connect to your RDS instance. For more information, see Create an account on an ApsaraDB RDS for PostgreSQL instance.

Background information

The machine learning module of MADlib solves the following issues:
  • Classification and regression issues: MADlib provides a set of algorithms such as K-Nearest Neighbor (KNN), multilayer perceptron neural network, support vector machine (SVM), and decision tree to solve binary classification and regression issues. MADlib also provides a set of models such as least-squares regression, generalized linear model (GLM), logistic regression, and multinomial logistic regression to solve regression issues.
  • Clustering issues: MADlib provides the K-means algorithm for clustering analysis.
  • Correlation analysis: MADlib provides the Apriori algorithm for correlation analysis (association rule mining). The algorithm can help find unexpected correlations between products, such as the correlation between diapers and beer.
  • Analysis of time series data: MADlib provides autoregressive integrated moving average (ARIMA) models to predict future trends of time series data.
  • Others: MADlib provides principal component analysis (PCA) to extract the main factors for data dimension reduction. MADlib provides a Latent Dirichlet Allocation (LDA) model for document classification and topic modeling.
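
As an illustration of the regression support listed above, the following statements are a minimal sketch of training and applying a least-squares regression model with the MADlib linregr_train and linregr_predict functions. The table houses_demo, its columns, and the sample rows are hypothetical and exist only for this example. The sketch also assumes that the MADlib functions are installed in a schema named madlib, which may differ on your instance.

```sql
-- Hypothetical sample data: house size, number of bedrooms, and sale price.
CREATE TABLE houses_demo (
    id        serial,
    size_sqft float8,
    bedrooms  integer,
    price     float8
);

INSERT INTO houses_demo (size_sqft, bedrooms, price) VALUES
    (770,  2, 50000),
    (1410, 3, 85000),
    (1060, 3, 22500),
    (1300, 3, 90000),
    (1500, 3, 133000),
    (820,  2, 90500),
    (2130, 3, 260000),
    (1170, 2, 142500);

-- Train the model. The result is written to the output table houses_demo_model.
SELECT madlib.linregr_train(
    'houses_demo',                     -- source table
    'houses_demo_model',               -- output table (must not exist yet)
    'price',                           -- dependent variable
    'ARRAY[1, size_sqft, bedrooms]'    -- independent variables; the leading 1 adds an intercept
);

-- Inspect the fitted coefficients and the R-squared value.
SELECT coef, r2 FROM houses_demo_model;

-- Apply the model to the training rows to compare actual and predicted prices.
SELECT h.id,
       h.price,
       madlib.linregr_predict(m.coef, ARRAY[1, h.size_sqft, h.bedrooms]) AS predicted_price
FROM houses_demo h, houses_demo_model m
ORDER BY h.id;
```

The other algorithm families follow a similar pattern: a training function reads a source table and writes a model table that you then query or pass to a prediction function.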
MADlib also integrates graph computing models to solve issues such as finding the shortest path, computing PageRank rankings, and, in social media scenarios, querying the contacts of a specific user. The following table describes the algorithms related to graph computing models.
Type | Model or feature | Description
Shortest path | Shortest path among all vertices | Calculates the shortest path among all vertices and saves the result to a specific result table. This model queries the shortest path from a start vertex to an end vertex based on the result table.
Shortest path | Shortest path between a specific vertex and all other vertices | Calculates the shortest path between a specific vertex and all other vertices and saves the result to a specific result table. This model queries the shortest path from a specific vertex to any other vertex based on the result table.
Breadth-first search (BFS) | BFS | Uses the BFS method to query vertices that are reachable from a specific source vertex.
HITS | HITS score | Queries the HITS scores of all vertices in a directed graph. The HITS scores include hub scores and authority scores.
Web page ranking | PageRank | Queries the PageRank values of all vertices in a directed graph.
Weakly connected component | Weakly connected component | Queries all weakly connected components in a directed graph.
Measure | Average path length | Calculates the average shortest path length of graphs.
Measure | Proximity | Calculates the closeness centrality of all nodes in a graph.
Measure | Graph diameter | Calculates the graph diameter.
Measure | In-degree or out-degree | Calculates the in-degree and out-degree of all vertices.
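
To make the shortest path and PageRank models in the table more concrete, the following statements are a minimal sketch that uses the MADlib graph_sssp, graph_sssp_get_path, and pagerank functions. The tables graph_demo_vertex and graph_demo_edge and their contents are hypothetical. The sketch assumes that the MADlib functions are installed in a schema named madlib and that the installed MADlib version includes these graph functions.

```sql
-- Hypothetical directed graph with five vertices and weighted edges.
CREATE TABLE graph_demo_vertex (id integer);
CREATE TABLE graph_demo_edge (src integer, dest integer, weight float8);

INSERT INTO graph_demo_vertex VALUES (0), (1), (2), (3), (4);
INSERT INTO graph_demo_edge VALUES
    (0, 1, 1.0),
    (0, 2, 4.0),
    (1, 2, 2.0),
    (2, 3, 1.0),
    (3, 4, 3.0),
    (4, 0, 2.0);

-- Shortest path between a specific vertex (0) and all other vertices.
-- The result table stores, for each vertex, the path weight and the parent vertex.
SELECT madlib.graph_sssp(
    'graph_demo_vertex',                   -- vertex table
    'id',                                  -- vertex ID column
    'graph_demo_edge',                     -- edge table
    'src=src, dest=dest, weight=weight',   -- edge column mapping
    0,                                     -- source vertex
    'graph_demo_sssp'                      -- result table (must not exist yet)
);
SELECT * FROM graph_demo_sssp ORDER BY id;

-- Query the path from the source vertex to vertex 3 based on the result table.
SELECT madlib.graph_sssp_get_path('graph_demo_sssp', 3, 'graph_demo_path');
SELECT * FROM graph_demo_path;

-- PageRank values of all vertices, using the default damping factor and iteration limits.
SELECT madlib.pagerank(
    'graph_demo_vertex',    -- vertex table
    'id',                   -- vertex ID column
    'graph_demo_edge',      -- edge table
    'src=src, dest=dest',   -- edge column mapping
    'graph_demo_pagerank'   -- result table (must not exist yet)
);
SELECT * FROM graph_demo_pagerank ORDER BY pagerank DESC;
```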

Enable or disable the MADlib plug-in

  • Execute the following statement to enable the MADlib plug-in:
    Note: Before you execute the following statement, you must execute the CREATE EXTENSION plpythonu; statement to create the plpythonu plug-in.
    CREATE EXTENSION madlib;
  • Execute the following statement to disable the MADlib plug-in:
    DROP EXTENSION madlib;
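
The enable sequence described above, followed by a quick check that the plug-in is available, can be sketched as follows. The IF NOT EXISTS clause is standard PostgreSQL syntax that makes the statements safe to rerun. The call to madlib.version() assumes that the MADlib functions are installed in a schema named madlib.

```sql
-- Create the prerequisite plug-in first, then MADlib itself.
CREATE EXTENSION IF NOT EXISTS plpythonu;
CREATE EXTENSION IF NOT EXISTS madlib;

-- Confirm that both plug-ins are registered in the current database.
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('plpythonu', 'madlib');

-- Return a text description of the installed MADlib build.
SELECT madlib.version();
```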

References

For more information about the MADlib plug-in, see MADlib documentation.