This topic describes how to use the MADlib plug-in. MADlib is an open source library that runs machine learning and graph computing models in AliPG databases. In terms of machine learning, MADlib provides functions and stored procedures for mathematical operations. MADlib also provides a set of typical supervised and unsupervised algorithm libraries for machine learning.

Prerequisites

  • Your ApsaraDB RDS for PostgreSQL instance runs one of the following database engine versions:
  • A privileged account is used to connect to your RDS instance. You can check the type of the account that you use on the Accounts page in the ApsaraDB RDS console. If the account is a standard account, you must create a privileged account and use the privileged account to connect to your RDS instance. For more information, see Create an account on an ApsaraDB RDS for PostgreSQL instance.

Background information

The machine learning module of MADlib solves the following issues:
  • Classification and regression issues: MADlib provides a set of algorithms such as K-Nearest Neighbor (KNN), multilayer perceptron neural network, support vector machine (SVM), and decision tree to solve classification and regression issues. MADlib also provides a set of models such as least-squares regression, generalized linear model (GLM), logistic regression, and multinomial logistic regression to solve regression issues.
  • Clustering issues: MADlib provides the K-means algorithm for clustering analysis.
  • Correlation analysis: MADlib provides the Apriori algorithm for correlation analysis. The feature can help find unexpected correlations between products such as the correlation between diapers and beer.
  • Analysis of time series data: MADlib provides autoregressive integrated moving average (ARIMA) models to predict future trends of time series data.
  • Others: MADlib provides principal component analysis (PCA) to extract the main factors for data dimension reduction. MADlib provides a Latent Dirichlet Allocation (LDA) model for document classification and topic modeling.
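
As a hedged illustration of the regression item in the list above, the following sketch trains a least-squares model with the MADlib linregr_train function and then compares predictions with observed values. The houses table, its columns and rows, and the houses_linregr output table are hypothetical names made up for this example. The sketch also assumes that the MADlib functions are installed in the madlib schema, which may differ on your instance.

```sql
-- Hypothetical sample data; not part of the MADlib plug-in.
CREATE TABLE houses (
    id    INTEGER,
    tax   INTEGER,
    bath  DOUBLE PRECISION,
    size  INTEGER,
    price INTEGER
);
INSERT INTO houses VALUES
    (1,  590, 1.0,  770,  50000),
    (2, 1050, 2.0, 1410,  85000),
    (3,   20, 1.0, 1060,  22500),
    (4,  870, 2.0, 1300,  90000),
    (5, 1320, 2.0, 1500, 133000),
    (6, 1350, 1.0,  820,  90500);

-- Train the model. The leading 1 in the array adds an intercept term.
SELECT madlib.linregr_train(
    'houses',                     -- source table
    'houses_linregr',             -- output table that the function creates
    'price',                      -- dependent variable
    'ARRAY[1, tax, bath, size]'   -- independent variables
);

-- Inspect the fitted coefficients and the R-squared value.
SELECT coef, r2 FROM houses_linregr;

-- Compare the predicted prices with the observed prices.
SELECT h.id,
       h.price,
       madlib.linregr_predict(m.coef, ARRAY[1, h.tax, h.bath, h.size]) AS predicted
FROM houses h, houses_linregr m
ORDER BY h.id;
```

The other algorithms in the list are typically invoked the same way: a SELECT statement calls a training function that writes its result to an output table, which you then query.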
MADlib also integrates graph computing models to solve issues such as finding the shortest path, ranking pages with PageRank, and querying the contacts of a specific user in a social network. The following table describes the algorithms related to graph computing models.
| Type | Model or feature | Description |
| --- | --- | --- |
| Shortest path | Shortest path among all vertices | Calculates the shortest path among all vertices and saves the result to a specific result table. This model queries the shortest path from a start vertex to an end vertex based on the result table. |
| Shortest path | Shortest path between a specific vertex and all other vertices | Calculates the shortest path between a specific vertex and all other vertices and saves the result to a specific result table. This model queries the shortest path from a specific vertex to any other vertex based on the result table. |
| Breadth-first search (BFS) | BFS | Uses the BFS method to query vertices that are reachable from a specific source vertex. |
| HITS | HITS score | Queries the HITS scores of all vertices in a directed graph. The HITS scores include hub scores and authority scores. |
| Web page ranking | PageRank | Queries the PageRank values of all vertices in a directed graph. |
| Weakly connected component | Weakly connected component | Queries all weakly connected components in a directed graph. |
| Measure | Average path length | Calculates the average shortest path length of graphs. |
| Measure | Proximity | Calculates the closeness centrality of all nodes in a graph. |
| Measure | Graph diameter | Calculates the graph diameter. |
| Measure | In-degree or out-degree | Calculates the in-degree and out-degree of all vertices. |
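
The "shortest path between a specific vertex and all other vertices" entry above can be sketched with the MADlib graph_sssp function, which writes the result to a table that graph_sssp_get_path can then query. The vertex and edge tables, their contents, and the sssp_out and sssp_path table names are hypothetical and exist only for this example. The sketch assumes that the MADlib version on your instance includes the graph module and that its functions are installed in the madlib schema.

```sql
-- Hypothetical sample graph: five vertices and weighted, directed edges.
CREATE TABLE vertex (id INTEGER);
CREATE TABLE edge (src INTEGER, dest INTEGER, weight FLOAT8);
INSERT INTO vertex VALUES (0), (1), (2), (3), (4);
INSERT INTO edge VALUES
    (0, 1, 1.0),
    (0, 2, 1.0),
    (1, 2, 2.0),
    (2, 3, 1.0),
    (3, 4, 1.0);

-- Compute the shortest paths from vertex 0 to every other vertex.
SELECT madlib.graph_sssp(
    'vertex',                             -- vertex table
    'id',                                 -- vertex ID column
    'edge',                               -- edge table
    'src=src, dest=dest, weight=weight',  -- edge column mapping
    0,                                    -- source vertex
    'sssp_out'                            -- result table that the function creates
);

-- Each row holds a vertex, its distance from the source, and its parent vertex.
SELECT * FROM sssp_out ORDER BY id;

-- Query the path from the source vertex to vertex 4 based on the result table.
SELECT madlib.graph_sssp_get_path('sssp_out', 4, 'sssp_path');
SELECT * FROM sssp_path;
```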

Enable or disable the MADlib plug-in

  • Execute the following statement to enable the MADlib plug-in:
    Note: Before you execute the following statement, you must execute the CREATE EXTENSION plpythonu; statement to create the plpythonu plug-in.
    CREATE EXTENSION madlib;
  • Execute the following statement to disable the MADlib plug-in:
    DROP EXTENSION madlib;
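
To confirm that the plug-in is enabled, a minimal sketch such as the following runs the prerequisite and enable statements in order and then checks the pg_extension system catalog. The final statement assumes that the MADlib functions are installed in the madlib schema; if they are not on your instance, the catalog query alone is enough to verify the installation.

```sql
-- Create the prerequisite plug-in first, and then create MADlib.
CREATE EXTENSION IF NOT EXISTS plpythonu;
CREATE EXTENSION IF NOT EXISTS madlib;

-- Confirm that both plug-ins are registered in the current database.
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('plpythonu', 'madlib');

-- Assumption: the MADlib functions are installed in the madlib schema.
SELECT madlib.version();
```

IF NOT EXISTS makes the statements safe to rerun. If other database objects depend on MADlib, DROP EXTENSION madlib; reports an error unless you remove those objects first or specify CASCADE.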

References

For more information about the MADlib plug-in, see MADlib documentation.