This article uses middle school students' data and machine learning algorithms to identify the key factors that affect their academic performance. These factors include their parents' occupations, their parents' education, and whether they have an Internet connection at home. A logistic regression algorithm trains an offline model and generates an evaluation report on academic indicators. Together they predict whether a student will achieve a high score in the final examination. The trained offline model is then deployed as an online prediction API so that it can be used in online scenarios.
We will build the predictor with the Alibaba Cloud Machine Learning Platform for Artificial Intelligence (PAI) service.
The dataset consists of 25 feature columns and 1 target column. The detailed fields are as follows.
The following is a screenshot of the data.
The following diagram shows the experiment process.
Data flows through the experiment from top to bottom. It passes through preprocessing, splitting, training, prediction, and evaluation in turn.
The SQL script is provided as follows.
select (case sex when 'F' then 1 else 0 end) as sex,
(case address when 'U' then 1 else 0 end) as address,
(case famsize when 'LE3' then 1 else 0 end) as famsize,
(case Pstatus when 'T' then 1 else 0 end) as Pstatus,
Medu,
Fedu,
(case Mjob when 'teacher' then 1 else 0 end) as Mjob,
(case Fjob when 'teacher' then 1 else 0 end) as Fjob,
(case guardian when 'mother' then 0 when 'father' then 1 else 2 end) as guardian,
traveltime,
studytime,
failures,
(case schoolsup when 'yes' then 1 else 0 end) as schoolsup,
(case fumsup when 'yes' then 1 else 0 end) as fumsup,
(case paid when 'yes' then 1 else 0 end) as paid,
(case activities when 'yes' then 1 else 0 end) as activities,
(case higher when 'yes' then 1 else 0 end) as higher,
(case internet when 'yes' then 1 else 0 end) as internet,
famrel,
freetime,
goout,
Dalc,
Walc,
health,
absences,
(case when G3>14 then 1 else 0 end) as finalScore
from ${t1};
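The SQL Script component uses this script to convert the text fields into numbers. Two-valued fields become 1 or 0. The guardian field becomes 0 (mother), 1 (father), or 2 (other). The final grade G3 becomes the binary target finalScore, which is 1 when G3 is greater than 14 and 0 otherwise.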
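To check the encoding outside PAI, the following Python sketch reproduces a subset of the CASE WHEN rules from the script above for a single record. The function name `encode_row` and the sample record are hypothetical. The field names, category values, and the G3 > 14 threshold come from the SQL.

```python
def encode_row(raw: dict) -> dict:
    """Encode one raw student record like the SQL script does (subset of fields only)."""
    guardian_codes = {"mother": 0, "father": 1}  # anything else maps to 2
    return {
        "sex": 1 if raw["sex"] == "F" else 0,
        "address": 1 if raw["address"] == "U" else 0,
        "famsize": 1 if raw["famsize"] == "LE3" else 0,
        "Mjob": 1 if raw["Mjob"] == "teacher" else 0,
        "guardian": guardian_codes.get(raw["guardian"], 2),
        "internet": 1 if raw["internet"] == "yes" else 0,
        # Numeric fields pass through unchanged (cast from text here).
        "Medu": int(raw["Medu"]),
        "studytime": int(raw["studytime"]),
        # Target: 1 if the final grade G3 is greater than 14, otherwise 0.
        "finalScore": 1 if int(raw["G3"]) > 14 else 0,
    }


# Hypothetical sample record, for illustration only.
row = {"sex": "F", "address": "R", "famsize": "GT3", "Mjob": "teacher",
       "guardian": "other", "internet": "yes", "Medu": "4", "studytime": "2", "G3": "15"}
print(encode_row(row))
```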
The Normalization component removes the units (dimensions) of the fields and scales every field into the range [0, 1]. This eliminates the effect of differing value ranges across fields. The result is shown in the figure below.
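The sketch below assumes the component uses min-max normalization, (x - min) / (max - min), which is the usual way to map values into [0, 1]. The function name and the handling of constant columns (mapped to 0.0) are choices made for this illustration. They are not documented PAI behavior.

```python
def min_max_normalize(columns: dict[str, list[float]]) -> dict[str, list[float]]:
    """Scale each column independently to [0, 1] using (x - min) / (max - min)."""
    normalized = {}
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        span = hi - lo
        # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
        normalized[name] = [(v - lo) / span if span else 0.0 for v in values]
    return normalized


data = {"absences": [0, 4, 10, 2], "studytime": [1, 2, 4, 3]}
print(min_max_normalize(data))
```

After scaling, a field like absences (which can run into double digits) and studytime (a small ordinal scale) share the same [0, 1] range.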
The dataset is split in a ratio of 8:2. 80% of the rows are used for model training, and the remaining 20% are used for prediction.
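A random 8:2 split can be sketched as follows. The fixed seed was added for this illustration so the split is reproducible. The PAI Split component may shuffle differently.

```python
import random


def split_8_2(rows: list, seed: int = 42) -> tuple[list, list]:
    """Shuffle rows and split them 80% / 20% into training and prediction sets."""
    shuffled = rows[:]  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


train, test = split_8_2(list(range(100)))
print(len(train), len(test))  # 80 20
```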
The offline model is trained with the logistic regression algorithm. If you are new to this algorithm, you can read more about logistic regression on Wikipedia.
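The sketch below shows what training means here. It fits a binary logistic regression model with plain batch gradient descent on the log loss. It is a simplified stand-in, not the optimizer used by PAI's Binary Logistic Regression component, which may use a different solver and regularization. The learning rate, epoch count, and toy data are arbitrary.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Per-sample error: sigmoid(w . x + b) - y
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the averaged gradient.
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 if the predicted probability of class 1 reaches the threshold."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold else 0


# Toy, already-normalized features; not data from this experiment.
X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.2], [0.8, 0.9], [0.9, 0.8], [1.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(X, y)
print([predict(w, b, x) for x in X])
```

This toy data is linearly separable, so the fitted model classifies all six samples correctly.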
The Confusion Matrix component shows how accurate the model's predictions are. As the figure below shows, the prediction accuracy of this experiment is 82.911%.
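Accuracy is read from the confusion matrix as the share of correctly classified samples, (TP + TN) / (TP + FP + FN + TN). The labels below are made-up toy values, not this experiment's predictions.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    # Correctly classified samples divided by all samples.
    return (tp + tn) / (tp + fp + fn + tn)


y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]
counts = confusion_matrix(y_true, y_pred)
print(counts, f"{accuracy(*counts):.3%}")  # (3, 1, 1, 5) 80.000%
```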
Because of how logistic regression works, its model coefficients can reveal valuable information. Right-click the Binary Logistic Regression component to view the model. The results are shown below.
In logistic regression, each feature is multiplied by its own weight. Because all features were normalized to the range [0, 1], the weights are on a comparable scale: the larger the absolute value of a weight, the greater the impact of that feature on the result. A positive weight indicates a positive correlation with result 1 (a high score in the final exam), and a negative weight indicates a negative correlation. Several features with large weights are analyzed in the following table.
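The ranking logic can be sketched as follows. It sorts features by the absolute value of their weights and reports each sign as the direction of the effect. Logistic regression models log-odds, so exp(w) is also the factor by which the odds of class 1 change when a normalized feature moves from 0 to 1. The feature names and weights are placeholders, not coefficients from this experiment.

```python
import math


def rank_by_influence(weights: dict[str, float]) -> list[tuple[str, float, str]]:
    """Sort features by |weight| (largest first) and label the direction of their effect."""
    ranked = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [
        (name, w, "positive" if w > 0 else "negative" if w < 0 else "none")
        for name, w in ranked
    ]


# Placeholder weights for illustration only.
example = {"feature_a": 0.4, "feature_b": -1.5, "feature_c": 0.9, "feature_d": 0.0}
for name, w, direction in rank_by_influence(example):
    # exp(w): multiplicative change in the odds of class 1 as the feature goes from 0 to 1.
    print(f"{name:10s} {w:+.2f} {direction:8s} odds x{math.exp(w):.2f}")
```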
Because the dataset in this experiment is small, the analysis above is not necessarily accurate and is for reference only.
Once the offline model is generated, it can be deployed online, and online prediction can be performed by calling its RESTful API.
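The article does not document the request format, so what follows is only a sketch of assembling a JSON POST request with Python's standard library. Several details are hypothetical:

- the endpoint URL
- passing a token in the `Authorization` header
- the payload shape (a JSON list of encoded, normalized feature records)

Check the deployed service's documentation for the real format. The request is built here but not sent.

```python
import json
import urllib.request


def build_prediction_request(endpoint: str, token: str, features: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one encoded student record.

    The payload shape and the Authorization header usage are assumptions for illustration.
    """
    body = json.dumps([features]).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": token},
        method="POST",
    )


req = build_prediction_request(
    "https://example.com/api/predict/student_score",  # hypothetical endpoint
    "YOUR_TOKEN",                                      # hypothetical credential
    {"sex": 1, "studytime": 0.33, "failures": 0.0},
)
# To send it (not done here):
#     with urllib.request.urlopen(req) as resp:
#         result = json.load(resp)
print(req.get_method(), req.full_url)
```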
To learn more about Alibaba Cloud Machine Learning Platform for Artificial Intelligence (PAI), visit www.alibabacloud.com/product/machine-learning