Many merchants now provide online platforms where consumers can comment on and give feedback about purchased items. This feedback includes both praise and criticism. Merchants rely on these opinions to judge whether product quality meets consumer needs, and they read comments to track opinion trends and guide future product development.
Hotels, restaurants, and retail stores receive a large number of comments on their comment platforms every day. Manually compiling statistics on public opinion is inefficient and cannot produce accurate figures at that scale. An automated approach is needed to collect these statistics and determine the public opinion trend on comment platforms.
Machine Learning Platform for AI (PAI) provides a set of text vectorization and classification algorithms. These algorithms can be used to train a classification model on historical comments that are labeled as positive (praising) or negative (critical). The trained model can then automatically classify new comments. The modeling framework described here is built on PAI and uses 11,987 labeled comments collected from a takeaway comment platform. It supports risk control for positive and negative public opinion, with an accuracy of about 75%.
Required knowledge: basic knowledge of natural language processing (NLP) and classification algorithms, especially how such knowledge is applied to model debugging.
Development cycle: one to two days.
Required data: more than 1,000 labeled data items. Prediction quality improves as more labeled data items become available.
|Field|Description|
|---|---|
|label|Label. 1 indicates a positive comment, and 0 indicates a negative comment.|
|review|Actual comment data.|
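As an illustration of the two fields described above, the following minimal sketch parses labeled comments and checks that every label is 0 or 1. The CSV layout, the sample rows, and the `load_comments` helper are assumptions made for this example; they are not part of PAI or of the original dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the label/review layout; these rows are not real data.
SAMPLE_CSV = """label,review
1,"Fast delivery, and the food was still hot."
0,"Order arrived an hour late and cold."
1,"Tasty and good value."
"""


def load_comments(text):
    """Parse CSV text into (label, review) pairs and reject invalid labels."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        label = int(record["label"])
        if label not in (0, 1):
            raise ValueError(f"unexpected label: {label}")
        rows.append((label, record["review"].strip()))
    return rows


comments = load_comments(SAMPLE_CSV)
# Count how many positive (1) and negative (0) comments were loaded.
label_counts = Counter(label for label, _ in comments)
print(label_counts)
```

Checking the class balance before training is a cheap way to notice a dataset that is dominated by one label.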
Log on to PAI Studio at https://pai.data.aliyun.com/console
The solution data and experiment environment are built into the corresponding template on the homepage.
Open the experiment:
- Data source
The data source is the comments described in the preceding sections.
- Deprecated words (stop words)
Manually upload the deprecated word (stop word) table, which is used to filter out auxiliary verbs and punctuation marks.
- Text vectorization
Use the Doc2Vec algorithm to convert each comment into a semantic vector. Each row of the output contains one vector, and each vector represents the meaning of one comment.
- Create a classification model
Use the splitting algorithm to divide the vectorized text into a training set and a test set. Train a binary classification model on the training set by using the logistic regression algorithm. The model determines whether a comment is positive or negative.
- Model effect verification
Use the confusion matrix algorithm to evaluate the actual performance of the model.
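The steps above can be sketched outside PAI in plain Python. This is a rough illustration under stated assumptions, not the PAI implementation: the toy comments and the stop word set are hypothetical, a bag-of-words count vector stands in for Doc2Vec, and logistic regression is fitted with a simple stochastic gradient descent loop rather than the PAI component.

```python
import math
import random
from collections import Counter

# Hypothetical toy data in (label, review) form; 1 = positive, 0 = negative.
DATA = [
    (1, "fast delivery and tasty food"),
    (1, "the food was delicious and hot"),
    (1, "great taste and friendly courier"),
    (1, "tasty and great value"),
    (0, "cold food and slow delivery"),
    (0, "the order was late and cold"),
    (0, "terrible taste and rude courier"),
    (0, "slow and terrible service"),
]
STOP_WORDS = {"and", "the", "was"}  # hypothetical stop word table


def tokenize(text, stop_words=STOP_WORDS):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in stop_words]


def build_vocab(rows):
    """Sorted list of every token that survives stop word filtering."""
    return sorted({w for _, text in rows for w in tokenize(text)})


def vectorize(text, vocab):
    """Bag-of-words count vector (a simple stand-in for Doc2Vec)."""
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]


def split(rows, train_fraction=0.75, seed=0):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def probability(weights, bias, x):
    """Sigmoid of the linear score: estimated P(label = 1)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, lr=0.5, epochs=200):
    """Logistic regression fitted with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            error = probability(weights, bias, x) - target
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (positive) when the estimated probability is at least 0.5."""
    return 1 if probability(weights, bias, x) >= 0.5 else 0


def confusion_matrix(y_true, y_pred):
    """Counts of true/false positives and negatives."""
    cells = Counter(zip(y_true, y_pred))
    return {"tp": cells[(1, 1)], "fn": cells[(1, 0)],
            "fp": cells[(0, 1)], "tn": cells[(0, 0)]}


train_rows, test_rows = split(DATA)
vocab = build_vocab(train_rows)
weights, bias = train([vectorize(t, vocab) for _, t in train_rows],
                      [label for label, _ in train_rows])
predictions = [predict(weights, bias, vectorize(t, vocab)) for _, t in test_rows]
matrix = confusion_matrix([label for label, _ in test_rows], predictions)
accuracy = (matrix["tp"] + matrix["tn"]) / len(test_rows)
print(matrix, accuracy)
```

With only eight toy comments the printed accuracy says little; the sketch is meant to show how the stages connect, with the vocabulary built from the training set only so that the test set stays unseen during training.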
With PAI, a public opinion risk control approach based on comment analysis can be developed in one to two days. The approach analyzes comments automatically and in batches, and model accuracy improves as the number of labeled comments increases. The same approach applies to other text classification tasks, such as spam classification and classification of positive and negative public opinion on news.