GBDT-LR Model

Related Tags:
1. GBDT Regression
2. GBDT Binary Classification

Abstract: The GBDT+LR model was proposed by Facebook in 2014. It uses a gradient boosted decision tree (GBDT) model to automatically select and combine features: each sample is mapped to the leaf it reaches in every tree, and these leaf indices form a new discrete feature vector. That vector is then used as the input of a logistic regression (LR) model, which produces the final prediction.

Introduction: The model can combine many kinds of features, such as user, item, and context features, to produce better-informed recommendations, and it is widely used for click-through rate (CTR) prediction.
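To make the encoding step concrete, here is a minimal Python sketch of the GBDT+LR idea: each tree routes a sample to one leaf, the leaf reached in every tree is one-hot encoded, and the concatenated vector is scored by logistic regression. The two trees are hand-built toys (hypothetical, not a trained GBDT and not Alink's GbdtEncoder), the Adult-style feature names are used only for illustration, and the LR weights are made up.

```python
import math

# Hypothetical hand-built trees (a real GBDT would learn these by boosting).
# Internal node: {"feature", "threshold", "left", "right"}; leaf node: {"leaf": index}.
TREES = [
    {"feature": "age", "threshold": 30.0,
     "left": {"leaf": 0},
     "right": {"feature": "hours_per_week", "threshold": 40.0,
               "left": {"leaf": 1}, "right": {"leaf": 2}}},
    {"feature": "education_num", "threshold": 12.0,
     "left": {"leaf": 0}, "right": {"leaf": 1}},
]
LEAVES_PER_TREE = [3, 2]


def leaf_index(tree, sample):
    """Walk one tree and return the index of the leaf the sample lands in."""
    node = tree
    while "leaf" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]


def gbdt_encode(sample):
    """One-hot encode the leaf reached in every tree and concatenate the results."""
    vec = []
    for tree, n_leaves in zip(TREES, LEAVES_PER_TREE):
        one_hot = [0] * n_leaves
        one_hot[leaf_index(tree, sample)] = 1
        vec.extend(one_hot)
    return vec


def lr_predict(weights, bias, x):
    """Logistic regression on the encoded vector: sigmoid(w . x + b)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


sample = {"age": 45, "hours_per_week": 50, "education_num": 13}
x = gbdt_encode(sample)
print(x)  # [0, 0, 1, 0, 1]
print(lr_predict([0.1, 0.2, 0.8, -0.3, 0.5], -1.0, x))
```

Exactly one position per tree is set, so the encoded vector is sparse and binary; this is what lets a linear model like LR pick up the feature interactions captured by each tree path.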

Use directly



Open the Gbdt-LR model and click "Open in DSW" (Data Science Workshop) in the upper-right corner.

GBDT+LR Integrated Model Training and Service Deployment

This article introduces how to use Alink on DSW to quickly build a GBDT+LR model and how to easily deploy the trained model as a service.

Scale to Larger Data



In this example, we use useLocalEnv to run the Alink job locally (that is, in the DSW container), and use multi-threading to simulate distributed computing.
For larger datasets, use usePAIEnv to submit the job to a large-scale cluster; run help(usePAIEnv) to see detailed usage.
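The idea of simulating distributed computing with threads can be illustrated with a generic partition, aggregate, and merge pattern. The sketch below is plain Python (concurrent.futures), not Alink's execution engine; parallel_mean and partial_sum are hypothetical names. In CPython the GIL limits the speedup of pure-Python work, so the point here is the structure: each thread plays a worker that aggregates its own partition, and the partial results are merged at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(partition):
    """Work done by one simulated worker: a local aggregate over its slice of the data."""
    return sum(partition), len(partition)


def parallel_mean(data, parallelism=4):
    """Split data into partitions, aggregate each in its own thread, then merge the results."""
    if not data:
        raise ValueError("data must not be empty")
    chunk = (len(data) + parallelism - 1) // parallelism  # ceiling division
    partitions = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        results = list(pool.map(partial_sum, partitions))
    total = sum(s for s, _ in results)
    count = sum(n for _, n in results)
    return total / count


print(parallel_mean(list(range(1, 101)), parallelism=4))  # 50.5
```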

Data preparation

Data source (Adult): https://archive.ics.uci.edu/ml/datasets/Adult
Algorithm related documents:
• https://www.yuque.com/pinshu/alink_doc/csvsourcebatchop
The Adult data set (also known as the "Census Income" data set) was extracted from the US Census database and contains 48,842 records in total: 23.93% have an annual income greater than $50K and 76.07% have an annual income of $50K or less. It is already split into 32,561 training records and 16,281 test records. The class variable indicates whether annual income exceeds $50K, and the 14 attribute variables include age, work class, education, occupation, and so on; 8 of them are categorical (discrete) and the other 6 are numerical (continuous). The task is therefore binary classification: predict whether a person's annual income exceeds $50K.
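The raw Adult files are comma-separated text with a space after each comma, and the test file's labels carry a trailing period (">50K."). The sketch below reads such rows with Python's csv module outside Alink, splitting the 6 numerical and 8 categorical columns described above. The snake_case column names are chosen here for illustration (the UCI files use hyphenated names such as "education-num"); in the notebook itself this step is done by CsvSourceBatchOp.

```python
import csv
import io

# Column order of the UCI Adult files (label last); snake_case names are illustrative.
COLUMNS = ["age", "workclass", "fnlwgt", "education", "education_num",
           "marital_status", "occupation", "relationship", "race", "sex",
           "capital_gain", "capital_loss", "hours_per_week", "native_country", "label"]
NUMERIC = ["age", "fnlwgt", "education_num", "capital_gain", "capital_loss", "hours_per_week"]
CATEGORICAL = ["workclass", "education", "marital_status", "occupation",
               "relationship", "race", "sex", "native_country"]


def parse_rows(text):
    """Parse Adult-format rows into dicts; label is 1 if income > 50K, else 0."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        rec = dict(zip(COLUMNS, fields))
        for name in NUMERIC:
            rec[name] = float(rec[name])
        # startswith also handles the trailing "." used in the test file's labels
        rec["label"] = 1 if rec["label"].startswith(">50K") else 0
        rows.append(rec)
    return rows


SAMPLE = ("39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
          "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n")
rows = parse_rows(SAMPLE)
print(rows[0]["age"], rows[0]["workclass"], rows[0]["label"])  # 39.0 State-gov 0
```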



Model training



Algorithm related documents:
• https://www.yuque.com/pinshu/alink_doc/intro
• https://www.yuque.com/pinshu/alink_doc/gbdtencoder
• https://www.yuque.com/pinshu/alink_doc/logisticregression
We train the two stages together by placing the GbdtEncoder and LogisticRegression (LR) operators in a single Pipeline: GbdtEncoder encodes the input data, and the encoded result is passed to LR for training. Fitting the pipeline yields a pipeline model, which can be used to run inference on new data and can also be deployed as a service.
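The fit/transform chaining that a Pipeline performs can be sketched in plain Python. This is a hedged illustration, not Alink's API: LeafEncoder is a hypothetical stand-in for GbdtEncoder that uses fixed decision stumps instead of boosted trees, LogisticRegressionGD is ordinary logistic regression trained by batch gradient descent, and the toy data is made up. What it does show faithfully is the pipeline contract: every stage except the last is fitted and then used to transform the data for the next stage, and the fitted pipeline applies the same transforms at prediction time.

```python
import math


class LeafEncoder:
    """Stand-in for a GBDT encoder: fixed stumps (hypothetical, not learned), one-hot leaves."""
    def __init__(self, stumps):
        self.stumps = stumps  # list of (feature_index, threshold)

    def fit(self, X, y):
        return self  # the trees are given up front in this sketch

    def transform(self, X):
        out = []
        for row in X:
            vec = []
            for feat, thr in self.stumps:
                vec.extend([1, 0] if row[feat] <= thr else [0, 1])
            out.append(vec)
        return out


class LogisticRegressionGD:
    """Logistic regression trained by batch gradient descent on the log loss."""
    def __init__(self, lr=0.5, epochs=500):
        self.lr, self.epochs = lr, epochs

    def _proba(self, x):
        z = sum(w * v for w, v in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y):
        n = len(X[0])
        self.w, self.b = [0.0] * n, 0.0
        for _ in range(self.epochs):
            gw, gb = [0.0] * n, 0.0
            for xi, yi in zip(X, y):
                err = self._proba(xi) - yi  # d(log loss)/dz
                for j in range(n):
                    gw[j] += err * xi[j]
                gb += err
            self.w = [w - self.lr * g / len(X) for w, g in zip(self.w, gw)]
            self.b -= self.lr * gb / len(X)
        return self

    def predict_proba(self, X):
        return [self._proba(x) for x in X]


class Pipeline:
    """Fit/transform every stage but the last, then fit the last stage on the result."""
    def __init__(self, *stages):
        self.stages = stages

    def fit(self, X, y):
        for stage in self.stages[:-1]:
            X = stage.fit(X, y).transform(X)
        self.stages[-1].fit(X, y)
        return self

    def predict_proba(self, X):
        for stage in self.stages[:-1]:
            X = stage.transform(X)
        return self.stages[-1].predict_proba(X)


# Made-up toy data: [age, hours_per_week]; label 1 means income > 50K.
X = [[25, 30], [28, 40], [45, 50], [52, 60], [33, 35], [48, 45]]
y = [0, 0, 1, 1, 0, 1]
model = Pipeline(LeafEncoder([(0, 40), (1, 42)]), LogisticRegressionGD()).fit(X, y)
print([round(p, 2) for p in model.predict_proba([[26, 32], [50, 55]])])
```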

Model evaluation



Algorithm related documents:
• https://www.yuque.com/pinshu/alink_doc/evalbinaryclassbatchop
• https://www.yuque.com/pinshu/alink_doc/jsonvaluebatchop
In the evaluation phase, we first use the trained pipeline model to run inference on testData, then evaluate the predictions with the EvalBinaryClassBatchOp component, and finally extract the evaluation metrics with the JsonValueBatchOp component.
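As a rough picture of what this evaluation step computes, the sketch below derives AUC and accuracy from (label, score) pairs, packs them into a JSON string, and pulls single fields out with a minimal "$.key" path. The metric names, the 0.5 threshold, and json_value are assumptions for illustration; the exact metrics and JSON layout produced by EvalBinaryClassBatchOp may differ. The AUC here is the rank-based definition (probability that a random positive scores above a random negative, ties counting one half), computed in O(n²) for clarity.

```python
import json


def auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly; ties count 1/2."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def evaluate(labels, scores, threshold=0.5):
    """Return binary-classification metrics as a JSON string."""
    preds = [1 if s >= threshold else 0 for s in scores]
    acc = sum(p == l for p, l in zip(preds, labels)) / len(labels)
    return json.dumps({"AUC": auc(labels, scores), "Accuracy": acc})


def json_value(json_str, path):
    """Extract a value with a minimal '$.a.b' path (dotted object keys only)."""
    if not path.startswith("$."):
        raise ValueError("path must start with '$.'")
    node = json.loads(json_str)
    for key in path[2:].split("."):
        node = node[key]
    return node


metrics = evaluate([1, 0, 1, 0, 1], [0.9, 0.3, 0.6, 0.7, 0.8])
print(json_value(metrics, "$.Accuracy"))  # 0.8
```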

Model write out



Algorithm related documents:
• https://www.yuque.com/pinshu/alink_doc/aksinkbatchop
To write the model out, we use AkSinkBatchOp, which saves it to a file system: either the local file system (as shown in the code) or a remote file system such as OSS.
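The general pattern of persisting a trained model to a path can be sketched with the standard library. This is not the Ak file format and handles only local paths (OSS is out of scope here); write_model and the JSON parameter layout are hypothetical. Writing to a temporary file and then calling os.replace means a reader never sees a half-written model, since the rename is atomic on POSIX when both paths are on the same file system.

```python
import json
import os
import tempfile


def write_model(model, path, overwrite=True):
    """Serialize a model (a plain dict of parameters) to a JSON file on the local file system."""
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(path)
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(model, f)
    os.replace(tmp, path)  # atomic rename on POSIX (same file system)


# Usage with made-up parameters: encoder stumps plus LR weights and bias.
path = os.path.join(tempfile.mkdtemp(), "gbdt_lr", "model.json")
write_model({"stumps": [[0, 40], [1, 42]], "weights": [-1.0, 1.0, -1.0, 1.0], "bias": 0.0}, path)
print(os.path.exists(path))  # True
```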

Load the model and run inference



The model is loaded from the same path it was written to, which can be on the local file system (as shown in the code) or on a remote file system such as OSS.
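A hedged sketch of the load-then-infer step: the model is read back from the same path and the stored encoder and LR parameters are applied to a new row. The JSON layout (stump thresholds, weights, bias), the helper names, and the parameter values are all hypothetical stand-ins, not what Alink stores; the sketch writes its own toy model first so it runs on its own.

```python
import json
import math
import os
import tempfile


def load_model(path):
    """Read back a model that was written as JSON to this same path."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def predict(model, row):
    """Encode the row with the stored stumps (one-hot leaves), then apply the stored LR."""
    x = []
    for feat, thr in model["stumps"]:
        x.extend([1, 0] if row[feat] <= thr else [0, 1])
    z = sum(w * v for w, v in zip(model["weights"], x)) + model["bias"]
    return 1.0 / (1.0 + math.exp(-z))


# Usage: write a toy model with made-up parameters, then load it from the same path.
path = os.path.join(tempfile.mkdtemp(), "gbdt_lr_model.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump({"stumps": [[0, 40], [1, 42]], "weights": [-1.0, 1.0, -1.0, 1.0], "bias": 0.0}, f)
model = load_model(path)
print(round(predict(model, [50, 55]), 3))  # 0.881
```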
