Platform For AI: Predict the repayment ability of agricultural loan applicants

Last Updated: Feb 20, 2024

Linear regression is a common regression analysis method in mathematical statistics. You can use this method to find the quantitative relationships between two or more variables. Machine Learning Designer provides a preset linear regression template to help you build a model to predict the repayment ability of agricultural loan applicants based on historical loan records. This topic describes how to use the preset linear regression template.
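For reference, a linear regression model with p input variables is usually written as shown below. The notation is standard textbook notation and is not taken from the template: in this topic, y would correspond to the claimvalue field and the x_j to input fields such as farmsize and rainfall. Least squares is the usual fitting criterion; the solver that the template's component actually uses is not stated here.

```latex
\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j,
\qquad
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n}
  \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
```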

Background information

Predicting the repayment ability of agricultural loan applicants is a typical data mining task. Lenders can build an empirical model based on the historical data of applicants, such as annual incomes, crop types, and loan records, and use the model to predict the repayment ability of new loan applicants.

Note

The datasets that are used in this topic are only for experimental use.

Prerequisites

Datasets

The datasets that are used in this topic contain the following fields:

| Field | Type | Description |
| --- | --- | --- |
| id | STRING | The unique ID of the applicant. |
| name | STRING | The name of the applicant. |
| region | STRING | The geographic region where the applicant resides. Valid values: north, middle, and south. |
| farmsize | DOUBLE | The farmland size. |
| rainfall | DOUBLE | The rainfall in the region. |
| landquality | DOUBLE | The farmland quality. A greater value indicates better quality. |
| farmincome | DOUBLE | The annual income of the applicant. |
| maincrop | STRING | The crop type. |
| claimtype | STRING | The loan type. |
| claimvalue | DOUBLE | The loan amount. |
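To work with these fields outside the console, a typed record can make the schema explicit. The following is a minimal sketch that assumes the data has been exported as CSV with a header row. The export format, the `LoanRecord` and `parse_records` names, and the sample row are hypothetical and are not part of the template.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class LoanRecord:
    """One dataset row, using the field names and types listed in this topic."""
    id: str
    name: str
    region: str          # north, middle, or south
    farmsize: float
    rainfall: float
    landquality: float   # a greater value indicates better quality
    farmincome: float
    maincrop: str
    claimtype: str
    claimvalue: float


# Fields that have the DOUBLE type in the dataset.
DOUBLE_FIELDS = ("farmsize", "rainfall", "landquality", "farmincome", "claimvalue")


def parse_records(csv_text: str) -> list[LoanRecord]:
    """Parse CSV text with a header row into LoanRecord objects."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for field in DOUBLE_FIELDS:
            row[field] = float(row[field])
        records.append(LoanRecord(**row))
    return records


# Made-up sample row, for illustration only.
SAMPLE = (
    "id,name,region,farmsize,rainfall,landquality,farmincome,maincrop,claimtype,claimvalue\n"
    "id001,name001,north,1480,30,7,330000,crop_a,type_a,74000.5\n"
)
records = parse_records(SAMPLE)
print(records[0].region, records[0].claimvalue)
```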

Procedure

  1. Go to the Machine Learning Designer page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane, choose Model Training > Visualized Modeling (Designer) to go to the Machine Learning Designer page.

  2. Create a pipeline.

    1. On the Visualized Modeling (Designer) page, click the Preset Templates tab.

    2. On the Preset Templates tab, find the Agricultural Loan Prediction template and click Create.

    3. In the Create Pipeline dialog box, configure the required parameters. You can use the default values.

      The value of the Pipeline Data Path parameter indicates the Object Storage Service (OSS) path of the temporary data and models that are generated during the runtime of the pipeline.

    4. Click OK.

      Creating the pipeline takes about 10 seconds.

    5. On the Pipelines tab, select the created pipeline and click Open.

    6. View the components of the pipeline on the canvas. The following figure shows the pipeline that is automatically created based on the preset template.

      [Figure: the pipeline on the canvas]

      Section 1: The components in this section read the following datasets that are used in the pipeline:

      • Training dataset: contains 100 historical records that are used to train the linear regression model. The dataset contains fields such as farmsize, rainfall, and claimvalue. The claimvalue field indicates the recovered loan amount.

      • Prediction dataset: contains information about the 71 loan applicants who apply for agricultural loans this year. The claimvalue field indicates the requested loan amount.

      The pipeline predicts the repayment ability of the applicants in the prediction dataset based on the historical records in the training dataset.

      Section 2: The components in this section convert field values of the STRING type to the DOUBLE type. For example, the valid values of the region field are north, middle, and south. The components in this section map these values to numerical values (0, 1, and 2, respectively) and convert the numerical values to the DOUBLE type.

      Section 3: The linear regression component trains and generates a regression model by using historical records in the training dataset. The prediction component uses the regression model to predict the loan amount that applicants can repay. The Append Columns component merges the id, prediction_score, and claimvalue columns in the prediction results, as shown in the following figure.

      [Figure: prediction results]

      The prediction_score field indicates the predicted amount that the applicants can repay.

      Section 4: The Evaluation component evaluates the prediction performance of the model. For information about the evaluation metrics, see Table 1 (Evaluation metrics).

      Section 5: The Sql Mapping component identifies eligible loan applicants by comparing the predicted repayment amounts to the requested loan amounts. If the predicted repayment amount is higher than the requested loan amount, the applicant is considered eligible.
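The conversion and training steps in sections 2 and 3 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the component's implementation: it uses ordinary least squares solved through the normal equations, only two input fields, and made-up training rows. Only the region mapping (0, 1, and 2) comes from this topic; the `fit`, `predict`, and `solve` names are hypothetical.

```python
# Mapping described in section 2: north, middle, south -> 0, 1, 2 as DOUBLE values.
REGION_CODES = {"north": 0.0, "middle": 1.0, "south": 2.0}


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the matrix is non-singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(rows, targets):
    """Ordinary least squares with an intercept: solve (X^T X) w = X^T y."""
    xs = [[1.0] + list(row) for row in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, row):
    """Return the intercept plus the weighted sum of the feature values."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# Made-up training rows: (region, farmsize, claimvalue).
train = [
    ("north", 10.0, 120.0),
    ("middle", 20.0, 190.0),
    ("south", 15.0, 230.0),
    ("north", 30.0, 160.0),
    ("south", 40.0, 280.0),
]
features = [(REGION_CODES[region], size) for region, size, _ in train]
targets = [value for _, _, value in train]

weights = fit(features, targets)
score = predict(weights, (REGION_CODES["middle"], 25.0))
print(round(score, 2))
```

The value printed at the end plays the role of prediction_score for one hypothetical applicant.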

      Table 1. Evaluation metrics

      | Metric | Description |
      | --- | --- |
      | MAE | The mean absolute error. |
      | MAPE | The mean absolute percentage error. |
      | MSE | The mean squared error. |
      | R | The multiple correlation coefficient. |
      | R2 | The coefficient of determination. |
      | RMSE | The root-mean-square error. |
      | SAE | The sum of absolute errors. |
      | SSE | The sum of squared errors. |
      | SSR | The sum of squares due to regression. |
      | SST | The total sum of squares. |
      | count | The number of rows. |
      | predictionMean | The mean of the prediction results. |
      | yMean | The mean of the original values of the dependent variable. |
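The metrics in Table 1 can be reproduced from the actual and predicted values. The following sketch uses the common textbook definitions, which may differ in detail from what the Evaluation component reports. In particular, it assumes that MAPE is expressed as a fraction and that R is the Pearson correlation between actual and predicted values. The `regression_metrics` name and the sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute the Table 1 metrics from actual and predicted values.

    Assumes the actual values are non-zero (for MAPE) and that neither
    the actual nor the predicted values are constant (for R and R2).
    """
    n = len(y_true)
    y_mean = sum(y_true) / n
    p_mean = sum(y_pred) / n
    errors = [y - p for y, p in zip(y_true, y_pred)]

    sae = sum(abs(e) for e in errors)              # sum of absolute errors
    sse = sum(e * e for e in errors)               # sum of squared errors
    sst = sum((y - y_mean) ** 2 for y in y_true)   # total sum of squares
    ssr = sum((p - y_mean) ** 2 for p in y_pred)   # sum of squares due to regression

    # Pearson correlation between actual and predicted values (assumed definition of R).
    cov = sum((y - y_mean) * (p - p_mean) for y, p in zip(y_true, y_pred))
    var_p = sum((p - p_mean) ** 2 for p in y_pred)

    return {
        "MAE": sae / n,
        "MAPE": sum(abs(e / y) for e, y in zip(errors, y_true)) / n,
        "MSE": sse / n,
        "R": cov / math.sqrt(sst * var_p),
        "R2": 1 - sse / sst,
        "RMSE": math.sqrt(sse / n),
        "SAE": sae,
        "SSE": sse,
        "SSR": ssr,
        "SST": sst,
        "count": n,
        "predictionMean": p_mean,
        "yMean": y_mean,
    }


# Made-up actual and predicted values, for illustration only.
metrics = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 310.0])
print(metrics["MAE"], metrics["RMSE"], metrics["R2"])
```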

  3. Run the pipeline and view the prediction results.

    1. In the upper-left corner of the canvas, click the Run icon.

    2. After the pipeline completes, right-click the Sql Mapping component on the canvas and choose View Data > Output Port. On the tab that appears, you can view the eligible loan applicants.

      [Figure: output of the Sql Mapping component]
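The eligibility rule that the Sql Mapping component applies can be expressed as a simple filter: an applicant is eligible when prediction_score is higher than claimvalue. The following sketch illustrates that rule only. The template's actual SQL statement is not shown in this topic, and the `eligible_applicants` name, the `predictions` table name in the comment, and the sample rows are hypothetical.

```python
def eligible_applicants(rows):
    """Return the IDs of applicants whose predicted repayment amount is
    strictly higher than the requested loan amount.

    Roughly equivalent to a query such as:
        SELECT id FROM predictions WHERE prediction_score > claimvalue
    where "predictions" is a hypothetical table name.
    """
    return [row["id"] for row in rows if row["prediction_score"] > row["claimvalue"]]


# Made-up prediction results with the id, prediction_score, and claimvalue columns.
rows = [
    {"id": "a1", "prediction_score": 52000.0, "claimvalue": 50000.0},
    {"id": "a2", "prediction_score": 18000.0, "claimvalue": 30000.0},
    {"id": "a3", "prediction_score": 40000.0, "claimvalue": 40000.0},
]
print(eligible_applicants(rows))  # ['a1']
```

Because the comparison is strict, an applicant whose predicted amount equals the requested amount (a3 in the sample) is not treated as eligible.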

References

For more information about algorithm components, see the following topics: