How to use machine learning methods
▌Background introduction
The role of life stages has been studied for decades in marketing and sociology. Although many of these studies did not focus on consumer behavior directly, they examined the life cycles of various people and events, which provides a solid basis for studying how life stages affect consumer behavior. In e-commerce, more research has focused on product recommendations based on consumers' historical behavior than on users' life-stage transitions.
For example, an e-commerce company will mine users with similar preferences and make recommendations to the current user based on what those similar users prefer. However, research on the relationship between consumer life stages and consumer behavior has only just begun. Motivated by this work, we propose a dynamic fusion algorithm for life-stage inference in e-commerce.
We use multiple logistic regression models to classify life stages and generate the corresponding probability distributions. We also develop a dynamic fusion method that continuously improves prediction accuracy while keeping computation efficient. Every time a new probability distribution is generated, we update and maintain a set of probability distributions. Doing so helps identify consumers' short-term interests and is also very helpful for predicting the life stages of multiple children.
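As a rough illustration of how a logistic regression classifier yields a probability distribution over life stages, here is a minimal sketch. It assumes a multinomial (softmax) formulation, which is only one possible reading of "multiple logistic regression models"; the function names, feature names, and weights are hypothetical and not taken from the article.

```python
import math

def softmax(scores):
    """Turn raw per-stage scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_distribution(features, weights, biases):
    """Score each stage as w . x + b, then normalize with softmax.

    features: sparse dict, feature name -> value
    weights:  one dict per life stage, feature name -> weight (hypothetical)
    biases:   one float per life stage (hypothetical)
    """
    scores = [
        b + sum(w.get(name, 0.0) * value for name, value in features.items())
        for w, b in zip(weights, biases)
    ]
    return softmax(scores)

# Made-up weights for two stages, purely for illustration.
weights = [{"diapers": 2.0}, {"school_shoes": 2.0}]
biases = [0.0, 0.0]
dist = predict_distribution({"diapers": 1.0}, weights, biases)
print(dist)  # the first stage receives the larger probability
```

The point of returning the whole distribution rather than only the top stage is that later steps can keep and combine distributions from different months.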
To evaluate the effectiveness of the algorithm, we conduct extensive offline and online numerical experiments, which show that our method can significantly improve the accuracy of consumer life stage inference.
The main contributions of this paper are:
We provide an industrial-grade solution for life-stage inference.
We develop a dynamic fusion method that continuously improves prediction accuracy while substantially saving computing resources, and that can easily maintain predictions for the age stages of multiple children.
We validate the effectiveness of our solution for life stage inference with real data.
Life stages of maternal and child users
User behavior changes with life stage, and changes in consumption behavior usually track changes in life stage. This is especially visible in vertical industries such as mother and baby products. For example, mothers buy diapers when their children are newborns; two to three years later, when it is time for kindergarten, they buy more clothes and shoes. The same phenomenon exists in other industries, such as home improvement and automobiles. In this article, we focus on the maternal and child industry, that is, on inferring life stages from parents' consumption behavior.
Based on our industry experience, a child's development through life stages is a continuous process closely tied to age. We therefore divide the life stages of mother and baby users by the child's age: before birth (the mother's pregnancy); 0-6 months (newborn); 6 months to 1 year; 2-3 years old (nursery); 3-7 years old (kindergarten). Parents of children of different ages are interested in different products. If we can accurately predict the age of their children and recommend suitable products, the conversion rate can be greatly improved.
Dynamic fusion method
To infer a child's life stage, we developed an algorithm that continuously predicts and refines the inference. Instead of using a complex model to make a one-off prediction, our algorithm produces a prediction from the current data each time and then continuously updates our inference, a process we call dynamic fusion. The process is described in detail below.
The predicted probability distribution over a child's life stages contains more information than a single predicted label. For example, when two life stages in the distribution both have high probabilities, the consumer may have two children, or the consumer's child may be at the boundary between two life stages; a single predicted label cannot convey this. Preserving the probability distributions lets us infer the life stages of the user's children and update the inference when appropriate. However, consumer behavior in different months may lead to distributions whose highest-probability life stages differ. How to maintain and update these distributions is the key to our solution, and we design the dynamic fusion algorithm to solve this problem.
With the feature vector X as input, the model predicts a probability distribution over life stages each month; we introduce the training details of the model in the next section. Now suppose we already have a probability distribution from an earlier month and, at the end of the next month, the model produces another one. To fuse the two, the earlier distribution must first be translated (shifted) forward by ∆, where ∆ is the time difference between the output time of the earlier distribution and the current month. There are several ways to translate a probability distribution; the method we use is described in detail in Algorithm 1.
We then compare the shifted distribution with the new one. If the life stage with the highest probability is the same for both, we fuse the two distributions together and normalize the result so that it is again a probability distribution.
If the distributions have different life stages with the highest probability, they are kept separately for possible future fusion. We count the number of fusions for each distribution, and the inferred life stage is determined by the distribution with the most fusions. When a new monthly distribution is generated, the next iteration of the algorithm follows the same logic.
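The shift, compare, fuse, and count loop can be sketched as follows. This is a sketch under stated assumptions, not the article's Algorithm 1: the shift rule (each month, a share 1/length of every stage's probability moves to the next stage) and the fusion rule (element-wise product, renormalized) are stand-ins chosen for illustration, and an averaging rule would be an equally plausible reading. The stage durations and all function names are hypothetical.

```python
def normalize(dist):
    total = sum(dist)
    return [p / total for p in dist]

def argmax(dist):
    return max(range(len(dist)), key=dist.__getitem__)

def shift(dist, delta_months, stage_months):
    """Age a distribution by delta_months (assumed rule, not Algorithm 1).

    Each month, a share 1/length of every stage's mass moves to the next
    stage; the last stage keeps its mass. Total probability is conserved.
    """
    dist = list(dist)
    for _ in range(delta_months):
        moved = [p / m for p, m in zip(dist, stage_months)]
        moved[-1] = 0.0
        for i, out in enumerate(moved):
            dist[i] -= out
            if i + 1 < len(dist):
                dist[i + 1] += out
    return dist

def fuse(p, q):
    """Element-wise product, renormalized (assumed fusion rule)."""
    product = [a * b for a, b in zip(p, q)]
    if sum(product) == 0.0:
        return list(q)  # no overlap at all: fall back to the newer distribution
    return normalize(product)

def update(tracks, new_dist, month, stage_months):
    """Fuse new_dist into a matching track, or keep it as a new track.

    Each track is a dict with keys 'dist', 'month', and 'fusions'.
    """
    for track in tracks:
        track["dist"] = shift(track["dist"], month - track["month"], stage_months)
        track["month"] = month
    matches = [t for t in tracks if argmax(t["dist"]) == argmax(new_dist)]
    if matches:
        best = max(matches, key=lambda t: t["fusions"])
        best["dist"] = fuse(best["dist"], new_dist)
        best["fusions"] += 1
    else:
        tracks.append({"dist": list(new_dist), "month": month, "fusions": 0})
    return tracks

def inferred_stage(tracks):
    """Index of the top stage in the track that has been fused most often."""
    best = max(tracks, key=lambda t: t["fusions"])
    return argmax(best["dist"])

# Hypothetical three-stage example with made-up monthly distributions.
stage_months = [6, 6, 12]
tracks = []
update(tracks, [0.7, 0.2, 0.1], 0, stage_months)
update(tracks, [0.6, 0.3, 0.1], 1, stage_months)  # same top stage: fused
update(tracks, [0.1, 0.1, 0.8], 2, stage_months)  # different top stage: kept apart
print(len(tracks), inferred_stage(tracks))
```

Keeping the unmatched distribution as its own track is what allows the method to follow a second child or a short-term interest without overwriting the main inference.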
▌Feature Engineering
In the e-commerce scenario, all features come from five types of consumer behavior: search, click, favorite, add to cart, and purchase. The features we use fall into the following categories:
1. Category features
The category system has a multi-level structure. First-level categories cover major groups such as clothes and shoes; the hierarchy can be up to 4 or 5 levels deep, and a lowest-level category with no subcategories is called a leaf category. In theory, we could use the product ID as a feature, but this would make the feature matrix too sparse, with only a very small number of samples containing any given feature. To avoid this while still capturing users' different consumption interests, we use the first-level category and the leaf category of each product as features.
2. Category attribute features
Products in the same category share some attributes: for example, the brand attribute may take specific brands such as IBM and New Balance as values, and the size attribute may be "S", "M" or "L". Category attribute features are combinations of a product category and an attribute. We use all category attribute features in the maternal and child industry as model input.
3. Product attribute features
In addition to category attribute features, we also use the attributes of the product itself as input features.
4. Search term features
A search term is a keyword the user searches for, which may correspond directly to the child's age group, such as "3-year-old baby's clothes" or "stage 3 milk powder". We select some specific keywords as input features.
5. Product title features
Product titles contain a wealth of information, which may include age groups or life stages. We compiled about 200 age-related keywords, processed the titles, and added them to the model as input features.
6. Time features
The same purchase means different things on different dates: a mom buying diapers 6 months ago signals something different from a mom buying diapers 1 week ago. Therefore, behaviors in different months are treated as different features. To reduce the computational burden during model training, we use all of the user's behaviors in the last month and the purchase behavior of the same month last year as input features.
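To make the feature categories above more concrete, the following sketch builds a sparse feature dictionary from a list of behavior events. It is only an illustration: the event fields, the feature-name format, and the three keywords are hypothetical, and attribute features are left out for brevity. It keeps all behaviors from the latest month and only purchases from the same month one year earlier, as the text describes.

```python
from collections import Counter

# Hypothetical age-related keywords; the article uses about 200 of them.
AGE_KEYWORDS = ["newborn", "3-year-old", "stage 3"]

def extract_features(events, current_month):
    """Build a sparse feature dict from behavior events.

    Each event is a dict with hypothetical keys: 'action', 'month' (an
    integer month index), 'top_category', 'leaf_category', and 'text'
    (a search query or product title).
    """
    features = Counter()
    for e in events:
        if e["month"] == current_month:
            period = "last_month"  # every behavior type is kept
        elif e["month"] == current_month - 12 and e["action"] == "purchase":
            period = "same_month_last_year"  # purchases only
        else:
            continue
        prefix = f"{period}|{e['action']}"
        # First-level and leaf categories instead of sparse product IDs.
        features[f"{prefix}|top={e['top_category']}"] += 1
        features[f"{prefix}|leaf={e['leaf_category']}"] += 1
        # Keyword hits in the search query or product title.
        text = e.get("text", "").lower()
        for kw in AGE_KEYWORDS:
            if kw in text:
                features[f"{prefix}|kw={kw}"] += 1
    return dict(features)

events = [
    {"action": "purchase", "month": 24, "top_category": "baby care",
     "leaf_category": "diapers", "text": "Newborn diapers size S"},
    {"action": "click", "month": 12, "top_category": "clothes",
     "leaf_category": "shoes", "text": ""},
    {"action": "purchase", "month": 12, "top_category": "clothes",
     "leaf_category": "shoes", "text": "3-year-old shoes"},
]
print(extract_features(events, current_month=24))
```

Prefixing every feature with its time period and behavior type means the same category bought in different months becomes a different feature, which is the effect the time features are meant to achieve.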
▌Numerical experiments
To show that our method is effective, we conducted multiple experiments on half a year of taobao.com data (September 2016 to February 2017). Figures 1 and 2 briefly illustrate the distribution of life stages in the training and test sets, respectively.
It is worth noting that children grow as time goes by, so the distribution of life stages moves to the right along the time axis. In the experiment, we first use the data from September 2016 to make predictions. The resulting life-stage prediction is fused with the next month's prediction according to Equation (2), and the fused result serves as the final conclusion on life stages for October. This conclusion is in turn fused with the single-month prediction for the following month (the life stage obtained from the feature data of November 2016), and so on. The prediction obtained from single-month features alone serves as the control. We refer to the control method as the memoryless method, because it uses only the feature data of the latest month to make inferences.
Tables 2, 3, and 4 show the experimental results from September 2016 to February 2017. Because September 2016 is the starting point, the results of the two methods are identical for that month. As shown in Table 2 and Table 3, the method based on dynamic fusion is almost 10% better than the memoryless method. Table 4 compares the two methods month by month. In accuracy, the dynamic fusion method significantly outperformed the memoryless method in 2 of the 5 months (September 2016 is excluded from the comparison), by up to 15%; in the other 3 months the memoryless method was more accurate, by less than 10%. In recall, the dynamic fusion method outperformed the memoryless method in 3 of the 5 months, by as much as 21.76%, and was only slightly behind in the other 2 months, never by more than 1%.
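For readers who want to reproduce this kind of comparison on their own data, here is a minimal sketch of the two metrics. The article does not say how recall is aggregated across life stages, so averaging per-stage recall (macro recall) is an assumption here. The labels below are made up for illustration and are not the article's results.

```python
def accuracy(y_true, y_pred):
    """Share of users whose predicted stage equals the true stage."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_recall(y_true, y_pred):
    """Average, over stages present in y_true, of the share recovered."""
    recalls = []
    for stage in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == stage]
        recalls.append(sum(y_pred[i] == stage for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Made-up stage labels for six users, for one month.
y_true = [0, 0, 1, 1, 2, 2]
memoryless = [0, 1, 1, 1, 2, 0]
fusion = [0, 0, 1, 1, 2, 0]

for name, pred in [("memoryless", memoryless), ("dynamic fusion", fusion)]:
    print(name, round(accuracy(y_true, pred), 3), round(macro_recall(y_true, pred), 3))
```

Running both methods through the same functions month by month gives the kind of per-month comparison that Table 4 reports.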
The reason is that the intensity of consumer behavior varies drastically between the peak and low seasons of the year. For a month with a wealth of consumer behavior data, the accuracy of the two methods is quite similar. For a month in which consumer behavior data is very sparse, however, the accuracy of the memoryless method drops significantly, while the dynamic fusion method overcomes this shortcoming by fusing in the conclusions obtained in months with rich consumer behavior.
The more months the experiment covers, the more stable the life-stage conclusions become, because more and more consumer behavior is implicitly taken into account. Another advantage of this approach is that wrong inferences can be corrected in subsequent months. This is illustrated by the results for November and December 2016 and January and February 2017: the wrong conclusions in November were gradually corrected in the following months. Readers may ask whether correct conclusions can also be changed into wrong ones, but our experiments show that, in general, the dynamic fusion method is no worse than the memoryless method.
▌Conclusion
In this paper, we introduce an innovative approach called Dynamic Fusion to infer young children's life stages based on parents' consumption behavior. We cover feature engineering in detail, along with algorithmic details. We also conduct computational experiments to demonstrate the advantages of this algorithm.
In months with abundant consumer behavior data, the method is no worse than the control; in months with sparse consumer behavior data, it can outperform the control method. In future work, we can explore other machine learning models for single-month prediction, or multi-level models (for example, one model for predicting some specific life stages and another for the remaining life stages) to further improve performance. In addition, the life-stage predictions can be used in other application scenarios, such as recommender systems.