The data in this topic is fictitious and is only used for experimental purposes.
Issuing agriculture loans is a typical data mining case. Lenders use an empirical model built from statistics of past years (including a borrower’s yearly income, types of planted crops, loan history, and other factors) to predict that borrower’s repayment ability.
This topic is based on real agriculture loan scenarios and shows how to use the linear regression algorithm to support loan issuing decisions. Linear regression is a widely used statistical analysis method that determines the quantitative relationship between two or more variables. This topic analyzes historical records of issued agriculture loans to predict whether to issue the requested loan amounts to users in the prediction set.
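As a point of reference, the model behind linear regression can be written as follows. This is a sketch under the assumption of ordinary least squares with an intercept; the notation ($x_{ij}$ for feature $j$ of record $i$, $y_i$ for its "claimvalue", $w_j$ for the fitted weights) is introduced here for illustration, and the solver actually used by the platform component is not described in this topic.

```latex
\hat{y}_i = w_0 + \sum_{j=1}^{p} w_j x_{ij},
\qquad
(w_0, \dots, w_p) = \arg\min_{w} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```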
The fields are as follows.
| Field | Meaning | Type | Description |
|---|---|---|---|
| id | The unique identifier of a data item | string | Person. |
| name | The name of a user | string | Person. |
| region | The region where the user is located | string | Arranged from north to south. |
| farmsize | The size of the farmland owned by the user | double | Farmland area. |
| rainfall | The rainfall in the region | double | Rainfall. |
| landquality | The land quality of the region | double | Higher land quality values indicate better land quality. |
| farmincome | The income of the user from the farmland | double | Yearly income. |
| maincrop | The crops cultivated on the farmland | string | Types of crops. |
| claimtype | Loan type | string | Two types. |
| claimvalue | Loan amount | double | Loan amount. |
The following is a screenshot of the data.
The following figure shows the experiment process.
Input data is divided into two parts:
- Loan training set: More than 200 historical loan records are used to train the regression model. The features include “farmsize” and “rainfall”, among others. Here, “claimvalue” is the loan amount that was recovered.
- Loan prediction set: This set contains the 71 loan applicants for this year. Here, “claimvalue” is the loan amount that the farmer requests.
Predict which of the 71 applicants should receive loans, based on the 200+ historical records.
Map string-type fields to numbers according to their meaning. For example, for the “region” field, map “north”, “middle”, and “south” to 0, 1, and 2, respectively. Then, convert the mapped fields to the double type by using the Data Type Conversion component, as shown in the following figure. After the data is preprocessed, you can train the model.
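Outside the platform, the same preprocessing idea can be sketched in a few lines of Python. This is only an illustration: the helper `encode_column`, the `REGION_CODES` table, and the sample rows are hypothetical, and only the north/middle/south ordering comes from this topic.

```python
# Mapping taken from the text: north -> 0, middle -> 1, south -> 2.
REGION_CODES = {"north": 0, "middle": 1, "south": 2}


def encode_column(rows, field, codes):
    """Return copies of the rows with rows[field] replaced by its code as a float."""
    encoded = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row[field] = float(codes[row[field]])  # the "double" conversion step
        encoded.append(new_row)
    return encoded


# Made-up sample rows, for illustration only.
sample = [
    {"id": "id1", "region": "north", "farmsize": 1480.0},
    {"id": "id2", "region": "south", "farmsize": 1780.0},
]
encoded = encode_column(sample, "region", REGION_CODES)
```

The other string fields, such as “maincrop” and “claimtype”, would need their own code tables; this topic does not list those mappings.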
Use the Linear Regression component to train a regression model on the historical data. The Prediction component then applies this model to the data in the prediction set. Use the Merge Columns component to combine the user ID, prediction score, and claim value, as shown in the following figure.
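To make the train, predict, and merge steps concrete, here is a minimal sketch in plain Python. It assumes ordinary least squares solved through the normal equations, which may differ from what the Linear Regression component does internally. The function names and the toy numbers are hypothetical; the targets are constructed to follow 2 + 3·farmsize + 0.5·rainfall exactly so the fitted weights are easy to check.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_regression(features, targets):
    """Ordinary least squares; returns [intercept, w1, w2, ...]."""
    rows = [[1.0] + list(f) for f in features]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, feature_row):
    """Apply the fitted weights to one row of features."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_row))


# Toy training data: (farmsize, rainfall) -> recovered claimvalue.
train_features = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
train_targets = [6.0, 8.5, 13.0, 15.5, 19.5]
weights = fit_linear_regression(train_features, train_targets)

# Toy prediction set: claimvalue is the requested loan amount.
applicants = [
    {"id": "a1", "features": (2.0, 2.0), "claimvalue": 8.0},
    {"id": "a2", "features": (1.0, 1.0), "claimvalue": 7.0},
]

# Equivalent of merging the user ID, prediction score, and claim value.
merged = [
    {
        "id": a["id"],
        "prediction_score": predict(weights, a["features"]),
        "claimvalue": a["claimvalue"],
    }
    for a in applicants
]
```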
The prediction score indicates a user’s loan repayment ability (expected loan repayment amount).
Use the Regression Model Evaluation component to evaluate the model. The following table lists evaluation results.
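For orientation, the sketch below computes three metrics that are commonly used to evaluate regression models: mean absolute error, root mean squared error, and the coefficient of determination (R²). Which metrics the Regression Model Evaluation component reports is not stated in this topic, so treat the selection and the function name `regression_metrics` as assumptions.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE, and R^2 for paired actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,  # undefined if all actual values are equal
    }


# Made-up values: predicting the mean everywhere gives R^2 = 0.
metrics = regression_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```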
Use the Filtering and Mapping component to determine the applicants who can receive loans. The principle of the experiment is that, if an applicant’s repayment ability is predicted to be greater than the requested loan amount, that applicant will receive a loan. This principle applies to each potential customer.
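The decision rule itself is a single comparison per applicant. A minimal sketch, assuming each merged row carries the hypothetical keys `id`, `prediction_score`, and `claimvalue`, with made-up values:

```python
def approve_loans(rows):
    """Return the IDs whose predicted repayment exceeds the requested amount."""
    return [row["id"] for row in rows if row["prediction_score"] > row["claimvalue"]]


# Made-up merged rows, for illustration only.
rows = [
    {"id": "a", "prediction_score": 120.0, "claimvalue": 100.0},
    {"id": "b", "prediction_score": 80.0, "claimvalue": 100.0},
    {"id": "c", "prediction_score": 100.0, "claimvalue": 100.0},
]
approved = approve_loans(rows)
```

The comparison is strict, matching "greater than" in the rule above, so an applicant whose predicted repayment equals the requested amount is not approved.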
You can log on to Alibaba Cloud Machine Learning Platform for AI (PAI) to experience this product and go to Yunqi Community to discuss with us.