Linear regression is a common regression analysis method used in mathematical statistics. The method can be used to find the quantitative relationships between two or more variables. This topic describes how to use linear regression to predict the repayment ability of agricultural loan applicants based on historical loan records.

Background information

Repayment ability prediction is a typical data mining task. A lender can build an empirical model from historical data about loan applicants, such as annual income, crop type, and loan records, and then use the model to predict the repayment ability of new applicants.
Note: The datasets that are used in this topic are for experimental use only.

Dataset

The following table describes the fields in the datasets that are used in this topic.
Field        Data type  Description
id           STRING     The unique ID of the applicant.
name         STRING     The name of the applicant.
region       STRING     The geographic region where the applicant resides. Valid values: north, middle, and south.
farmsize     DOUBLE     The farmland size.
rainfall     DOUBLE     The rainfall in the region.
landquality  DOUBLE     The farmland quality. A higher value indicates better quality.
farmincome   DOUBLE     The annual income of the applicant.
maincrop     STRING     The crop type.
claimtype    STRING     The loan type.
claimvalue   DOUBLE     The loan amount.
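
For readers who want to handle the same fields outside the console, the table above can be sketched as a typed record. This is an illustration only: the LoanRecord class and the parse_row helper are hypothetical names, and the assumption that each record arrives as a dictionary of strings (for example, from a CSV export) does not come from this topic.

```python
from dataclasses import dataclass

# Valid values of the region field, as listed in the table above.
VALID_REGIONS = ("north", "middle", "south")

# Fields whose data type is DOUBLE in the table above.
DOUBLE_FIELDS = ("farmsize", "rainfall", "landquality", "farmincome", "claimvalue")


@dataclass
class LoanRecord:  # hypothetical name, not part of PAI
    id: str
    name: str
    region: str
    farmsize: float
    rainfall: float
    landquality: float
    farmincome: float
    maincrop: str
    claimtype: str
    claimvalue: float


def parse_row(row: dict) -> LoanRecord:
    """Convert a string-valued row into a LoanRecord (hypothetical helper)."""
    values = dict(row)
    for field in DOUBLE_FIELDS:
        # Cast DOUBLE fields from text to float.
        values[field] = float(values[field])
    if values["region"] not in VALID_REGIONS:
        raise ValueError(f"unexpected region: {values['region']!r}")
    return LoanRecord(**values)
```

Validating region at load time keeps an unexpected value from silently reaching the later STRING-to-DOUBLE conversion.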

Procedure

  1. Go to the Machine Learning Studio console.
    1. Log on to the PAI console.
    2. In the left-side navigation pane, choose Model Training > Studio-Modeling Visualization.
    3. On the PAI Visualization Modeling page, find the project in which you want to create an experiment and click Machine Learning in the Operation column.
  2. Create an experiment.
    1. In the left-side navigation pane, click Home.
    2. In the Templates section, click Create below Agricultural Loan Prediction.
    3. In the New Experiment dialog box, specify the experiment parameters. You can use the default values of the parameters.
      Parameter    Description
      Name         The name of the experiment. Default value: Agricultural Loan Prediction.
      Project      The project in which you want to create the experiment. You cannot change the value of this parameter.
      Description  The description of the experiment. Default value: Use regression algorithms to create models and achieve agricultural loan prediction.
      Save To      The directory for storing the experiment. Default value: My Experiments.
    4. Click OK.
    5. Optional: Wait about 10 seconds. Then, click Experiments in the left-side navigation pane.
    6. Optional: Click Agricultural Loan Prediction_XX under My Experiments. The canvas of the experiment appears.
      My Experiments is the directory that stores the experiment that you created, and Agricultural Loan Prediction_XX is the name of the experiment. In the experiment name, _XX is the ID that the system automatically generates for the experiment.
    7. View the components of the experiment on the canvas, as shown in the following figure. The system automatically creates the experiment based on the preset template.
      Figure: Repayment ability prediction experiment
      Area No. Description
      1 The components in this area provide the datasets for the experiment, including a training dataset and a prediction dataset.
      • Training dataset: contains more than 200 historical records that are used to train the regression model of this experiment. The dataset contains fields such as farmsize, rainfall, and claimvalue. The claimvalue field indicates the recovered loan amount.
      • Prediction dataset: contains records of 71 applicants who applied for agricultural loans this year. The claimvalue field indicates the requested loan amount.
      The experiment predicts the repayment ability of the applicants in the prediction dataset based on the historical records in the training dataset.
      2 The components in this area convert the values of STRING fields to the DOUBLE type. For example, the valid values of the region field are north, middle, and south. The components map these values to 0, 1, and 2, and then convert the numbers to the DOUBLE type.
      3 The Linear Regression (Old Version)-1 component trains a regression model on the historical data. The Prediction-1 component uses the model to predict the loan amount that each applicant in the prediction dataset can repay. The Merge Columns-1 component merges the id, prediction_score, and claimvalue columns into the result, as shown in the following figure.
      Figure: Prediction result
      The prediction_score field indicates the predicted amount that the applicant can repay.
      4 The Regression Model Evaluation component evaluates the regression model.
      5 The Lendee_Filtering and mapping component allows you to view the applicants who are eligible to receive loans. If the predicted amount that an applicant can repay is greater than the requested loan amount, the applicant is eligible to receive the loan.
  3. Run the experiment and view the result.
    1. In the top toolbar of the canvas, click Run.
    2. After the experiment finishes running, right-click Lendee_Filtering and mapping on the canvas and select View Data. In the dialog box that appears, you can view the applicants who are eligible to receive loans.
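
The data flow of the experiment (areas 2, 3, and 5 in the component table) can be approximated offline to see what each stage does. The following is a minimal sketch, not the implementation of the PAI components. It assumes ordinary least squares solved through the normal equations, and it assumes that north, middle, and south map to 0, 1, and 2 in that order. The function names (encode_region, solve, fit_linear, predict, eligible) and the applicant dictionary layout are hypothetical.

```python
# Assumed order of the mapping described for area 2: north=0, middle=1, south=2.
REGION_CODES = {"north": 0.0, "middle": 1.0, "south": 2.0}


def encode_region(region):
    """Map a region string to its DOUBLE code (area 2)."""
    return REGION_CODES[region]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(features, targets):
    """Ordinary least squares (area 3). Returns [intercept, w1, w2, ...]."""
    rows = [[1.0] + list(f) for f in features]  # prepend the intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, feature_row):
    """Predicted repayable amount for one applicant (the prediction_score)."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_row))


def eligible(applicants, weights):
    """Area 5: keep ids whose predicted amount exceeds the requested claimvalue."""
    result = []
    for app in applicants:
        if predict(weights, app["features"]) > app["claimvalue"]:
            result.append(app["id"])
    return result
```

In this sketch, each feature row would hold the numeric inputs of one record, for example [encode_region(region), farmsize, rainfall, landquality, farmincome], and targets would hold the claimvalue of the historical records. Which fields the template actually feeds into the regression is not stated in this topic, so that choice is an assumption.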