Issue agriculture loans

Last Updated: May 14, 2020

The data in this topic is fictitious and is only used for experimental purposes.

Background

Issuing agriculture loans is a typical data mining case. Lenders use an empirical model built on statistics from past years (including a borrower’s yearly income, types of planted crops, loan history, and other factors) to predict that borrower’s repayment ability.
This topic is based on real agriculture loan scenarios and shows how to use the linear regression algorithm to handle loan issuance. Linear regression is a widely used statistical analysis method that determines the quantitative relationship between two or more variables. This topic predicts whether to issue the requested loan amounts to users in the prediction set by analyzing historical agriculture loan data.

Dataset

The fields are as follows.

| Field | Definition | Type | Description |
|---|---|---|---|
| id | The unique identifier of a data item | string | Person. |
| name | The name of a user | string | Person. |
| region | The region where the user is located | string | Arranged from north to south. |
| farmsize | The size of the farmland owned by the user | double | Farmland area. |
| rainfall | The rainfall in the region | double | Rainfall. |
| landquality | The land quality of the region | double | Higher land quality values indicate better land quality. |
| farmincome | The income of the user from the farmland | double | Yearly income. |
| maincrop | The crops cultivated on the farmland | string | Types of crops. |
| claimtype | Loan type | string | Two types. |
| claimvalue | Loan amount | double | Loan amount. |

The following is a screenshot of the data.

Data exploration procedure

The following figure shows the experiment process.

1. Data source preparation

Input data is divided into two parts:

  • Loan training set: More than 200 loan records are used to train the regression model. This training set includes features such as “farmsize” and “rainfall”. “claimvalue” is the recovered loan amount.
  • Loan prediction set: This prediction set includes the 71 loan applicants for this year. “claimvalue” is the loan amount that a farmer requests.

Predict which of the 71 applicants will receive loans based on the 200+ historical records.

2. Data preprocessing

Map string fields to numbers according to their meanings. For example, map the “region” values “north”, “middle”, and “south” to 0, 1, and 2, respectively. Then, use the Data Type Conversion component to convert the fields to the double type, as shown in the following figure. After preprocessing, you can train the model.
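
The mapping step can be sketched in plain Python as follows. This is a minimal illustration outside PAI, not the Data Type Conversion component itself. The codes in `REGION_CODES` follow the example above; the function name `encode_region` and the sample rows are hypothetical.

```python
# Ordinal codes for "region", following the north-to-south order in the text.
REGION_CODES = {"north": 0, "middle": 1, "south": 2}


def encode_region(row):
    """Return a copy of row with "region" replaced by its numeric code as a float."""
    encoded = dict(row)
    # float() stands in for the conversion to the double type.
    encoded["region"] = float(REGION_CODES[row["region"]])
    return encoded


# Hypothetical sample rows, for illustration only.
rows = [
    {"id": "id601", "region": "north", "farmsize": 1480.0},
    {"id": "id602", "region": "south", "farmsize": 1780.0},
]
encoded_rows = [encode_region(r) for r in rows]
print(encoded_rows)
```

Other string fields, such as “maincrop” and “claimtype”, would need their own mapping tables. Their values are not listed in this topic, so they are left out of the sketch.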

3. Model training and prediction

Use the Linear Regression component to train a regression model on the historical data, and then use that model in the Prediction component to predict values for the prediction set. Use the Merge Columns component to merge the user ID, prediction score, and claim value, as shown in the following figure.
The prediction score indicates a user’s loan repayment ability (the expected loan repayment amount).
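
To make the idea of training and prediction concrete, the following is a minimal sketch of ordinary least squares in plain Python. It is not the implementation of the Linear Regression or Prediction components, whose internals are not described here. The helper names (`solve_linear_system`, `fit_linear_regression`, `predict`) and the toy data are assumptions made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_regression(features, targets):
    """Fit least-squares weights; the first returned weight is the intercept."""
    # Prepend a constant 1.0 to each row so the model learns an intercept.
    x = [[1.0] + list(row) for row in features]
    k = len(x[0])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, row):
    """Return intercept + sum(weight * feature) for one feature row."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# Made-up training rows: (farmsize, rainfall) -> recovered claimvalue.
train_x = [[10.0, 30.0], [20.0, 25.0], [30.0, 40.0], [40.0, 20.0], [50.0, 35.0]]
train_y = [2.0 * a + 0.5 * b + 1.0 for a, b in train_x]  # exactly linear toy data

weights = fit_linear_regression(train_x, train_y)
score = predict(weights, [25.0, 30.0])  # prediction score for one applicant
print(weights, score)
```

Because the toy targets are generated from a known linear rule, the fitted weights should recover that rule up to floating-point error. Real loan data would be noisy, and the fit would only approximate it.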

4. Regression model evaluation

Use the Regression Model Evaluation component to evaluate the model. The following table lists evaluation results.
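
As a rough illustration of what a regression evaluation measures, the sketch below computes three common metrics: mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²). Which metrics the Regression Model Evaluation component actually reports is not stated in this topic, so this selection is an assumption, and the function name `evaluate_regression` and the sample values are hypothetical.

```python
import math


def evaluate_regression(actual, predicted):
    """Return MAE, RMSE, and R^2 for paired actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    sse = sum(e * e for e in errors)  # sum of squared errors
    rmse = math.sqrt(sse / n)
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1.0 - sse / sst
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Made-up values, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = evaluate_regression(actual, predicted)
print(metrics)
```

Lower MAE and RMSE and an R² closer to 1 indicate that predictions track the recovered loan amounts more closely.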

5. Loan issuance

Use the Filtering and Mapping component to determine which applicants can receive loans. The rule in this experiment is that an applicant receives a loan if the applicant’s predicted repayment ability is greater than the requested loan amount. This rule is applied to every applicant.
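
The decision rule can be sketched in Python as follows. This is an illustration of the comparison only, not the Filtering and Mapping component. The key name `prediction_score`, the function name `approve_loans`, and the sample rows are hypothetical.

```python
def approve_loans(applicants):
    """Return IDs of applicants whose predicted repayment exceeds the request."""
    return [
        a["id"]
        for a in applicants
        if a["prediction_score"] > a["claimvalue"]
    ]


# Hypothetical merged rows: user ID, prediction score, and requested amount.
applicants = [
    {"id": "id701", "prediction_score": 52000.0, "claimvalue": 48000.0},
    {"id": "id702", "prediction_score": 30000.0, "claimvalue": 35000.0},
    {"id": "id703", "prediction_score": 41000.0, "claimvalue": 41000.0},
]
approved = approve_loans(applicants)
print(approved)
```

The comparison is strict, so an applicant whose predicted repayment exactly equals the requested amount is not approved, which follows the “greater than” wording of the rule.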

References

You can log on to Alibaba Cloud Machine Learning Platform for AI (PAI) to try this product, and visit the Yunqi Community to discuss it with us.