This topic describes how to implement public opinion risk control based on the reviews from a takeaway platform.

Background information

Many merchants provide online platforms where consumers can review and give feedback on the products that they purchase. Feedback can be positive (praise) or negative (criticism). Based on these opinions, merchants can determine whether product quality meets consumer needs. By analyzing review content, merchants can also identify the trend in consumer opinion and use it to guide product research and development.

A large number of reviews are submitted on the online review platforms of merchants every day. Traditionally, merchants collect statistics on public opinion manually. This method is inefficient and becomes inaccurate when the volume of data is large. Therefore, merchants need a way to automatically collect statistics on public opinion and determine the public opinion trend.

Machine Learning Platform for AI (PAI) provides a set of algorithms for text vectorization and classification. These algorithms can train a classification model on historical reviews that are labeled as positive or negative. You can then use the model to automatically predict whether new reviews are positive or negative. Machine Learning Studio provides a preset modeling framework that is developed based on 11,987 labeled reviews collected from a takeaway platform. The framework implements automatic risk control of positive and negative public opinion, with an accuracy of about 75%.

You can use the experiment template that is preset in Machine Learning Studio to develop a solution for public opinion risk control within one to two days. Then, you can use the solution to analyze a large number of reviews at a time. As more labeled reviews become available for training, the prediction accuracy of the model can be expected to improve. This solution also applies to other text analysis scenarios, such as spam classification and classification of positive and negative opinions on news.
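
The approach described in this section (vectorize each review, train a binary classifier on labeled reviews, then predict new reviews) can be illustrated with a minimal Python sketch. This is not the PAI implementation: it assumes a bag-of-words vector as a simple stand-in for the semantic vectors that PAI produces, uses a hand-written logistic regression, and runs on invented toy reviews. All function names are hypothetical.

```python
import math

def tokenize(text):
    """Lowercase and split on whitespace; real reviews need a proper tokenizer."""
    return text.lower().split()

def build_vocab(reviews):
    """Map each distinct token to a column index."""
    vocab = {}
    for review in reviews:
        for token in tokenize(review):
            vocab.setdefault(token, len(vocab))
    return vocab

def vectorize(review, vocab):
    """Bag-of-words counts: an assumed stand-in for a learned text embedding."""
    vec = [0.0] * len(vocab)
    for token in tokenize(review):
        if token in vocab:
            vec[vocab[token]] += 1.0
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Batch gradient descent on the log loss; returns (weights, bias)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(review, vocab, w, b):
    """Return 1.0 for a positive review or 0.0 for a negative review."""
    x = vectorize(review, vocab)
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1.0 if p >= 0.5 else 0.0

# Invented toy reviews, not taken from the takeaway dataset.
train_reviews = [
    "food was delicious and delivery was fast",
    "delicious noodles and friendly courier",
    "cold food and slow delivery",
    "slow courier and terrible noodles",
]
train_labels = [1.0, 1.0, 0.0, 0.0]

vocab = build_vocab(train_reviews)
X = [vectorize(r, vocab) for r in train_reviews]
w, b = train_logistic_regression(X, train_labels)
prediction = predict("delicious and fast", vocab, w, b)
```

With only four training reviews, the sketch shows the mechanics rather than realistic accuracy. A production model needs far more labeled data and a richer text representation.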

Dataset

The experiment described in this topic is based on real data that is collected from a takeaway platform after data masking. The following table describes the fields in the data.
Field | Data type | Description
label | DOUBLE | Indicates whether the review is positive or negative. Valid values: 1 (positive review) and 0 (negative review).
review | STRING | The review content.
The following figure shows the sample data that is used in the experiment.
Figure: Sample data of the experiment
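
To make the two-field schema concrete, the following Python sketch parses records that have a label and a review, and rejects any label other than 0 or 1. It assumes the data has been exported as CSV text, which this topic does not state. The sample rows are invented for illustration, and `load_reviews` is a hypothetical helper.

```python
import csv
import io

# Invented rows that only mimic the two-field schema (label, review).
SAMPLE_CSV = """label,review
1,Fast delivery and the food was still hot
0,Order arrived late and the soup had spilled
"""

def load_reviews(text):
    """Parse CSV text into (label, review) pairs, validating each label."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        label = float(record["label"])  # the label field is of the DOUBLE type
        if label not in (0.0, 1.0):
            raise ValueError(f"unexpected label: {record['label']!r}")
        rows.append((label, record["review"]))
    return rows

reviews = load_reviews(SAMPLE_CSV)
positive_count = sum(1 for label, _ in reviews if label == 1.0)
```

Validating labels at load time is a design choice of this sketch: a binary classifier trained on labels outside 0 and 1 would silently produce misleading results.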

Procedure

  1. Go to the Machine Learning Studio console.
    1. Log on to the PAI console.
    2. In the left-side navigation pane, choose Model Training > Studio-Modeling Visualization.
    3. On the PAI Visualization Modeling page, find the project in which you want to create an experiment and click Machine Learning in the Operation column.
  2. Create an experiment.
    1. In the left-side navigation pane, click Home.
    2. In the Templates section, click Create below Public opinion risk control based on takeaway reviews.
    3. In the New Experiment dialog box, set the experiment parameters. You can use the default values of the parameters.
      Parameter | Description
      Name | The name of the experiment. Default value: Public opinion risk control based on takeaway reviews. The name must be 1 to 32 characters in length. Enter a name that meets this requirement, for example, Public opinion risk control.
      Project | The project in which you want to create the experiment. You cannot change the value of this parameter.
      Description | The description of the experiment. Default value: Use NLP algorithm to analyze takeaway reviews to determine the positive and negative emotions of users.
      Save To | The directory for storing the experiment. Default value: My Experiments.
    4. Click OK.
    5. Optional: Wait about 10 seconds. Then, click Experiments in the left-side navigation pane.
    6. Optional: Click Public opinion risk control_XX under My Experiments. The canvas of the experiment appears.
      My Experiments is the directory for storing the experiment that you created and Public opinion risk control_XX is the name of the experiment. In the experiment name, _XX is the ID that the system automatically creates for the experiment.
    7. View the components of the experiment on the canvas, as shown in the following figure. The system automatically creates the experiment based on the preset template.
      Figure: Experiment for the takeaway platform
      Area No. | Description
      1 | The pai_online_project.text_emotion_analysis-1 component imports review data.
      2 | The pai_online_project.stop_word-2 component imports stopwords. Stopwords include auxiliary verbs and punctuation marks. You must manually upload a stopword table, as shown in the following figure. (Figure: stopword table)
      3 | The components in this area vectorize the text. The Doc2Vec-1 component uses the Doc2Vector algorithm to convert each review to a semantic vector. Each row of the output table is one vector, and each vector represents the meaning of one review. After the experiment is run, right-click Doc2Vec-1 on the canvas and choose View Data > View Output Port 1. In the dialog box that appears, view the text vector table.
      4 | The components in this area generate a binary classification model. The Split-1 component splits the vectorized text into a training dataset and a prediction dataset. Then, the Logistic Regression for Binary Classification-1 component uses the logistic regression algorithm to train a binary classification model on the training dataset. The model determines whether a review is positive or negative.
      5 | The components in this area use a confusion matrix to evaluate the quality of the model.
  3. Run the experiment and view the result.
    1. In the top toolbar of the canvas, click Run.
    2. After the experiment is run, right-click Confusion Matrix-1 on the canvas and select View Evaluation Report.
    3. In the Confusion Matrix dialog box, click the Statistics tab and view the model statistics.
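
The stopword filtering, dataset splitting, and confusion matrix evaluation that the experiment performs can be sketched in plain Python as follows. This is an illustrative sketch, not the code behind the PAI components: the function names, the 80/20 split ratio, the stopword set, and the label lists are all assumptions made for the example.

```python
import random

def remove_stopwords(tokens, stopwords):
    """Drop tokens that appear in the stopword table."""
    return [t for t in tokens if t not in stopwords]

def split_dataset(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of rows and split it into (training, prediction) sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def confusion_matrix(actual, predicted):
    """Count TP, FP, FN, and TN, where 1 is positive and 0 is negative."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["fn" if a == 1 else "tn"] += 1
    return counts

def statistics(counts):
    """Derive accuracy, precision, recall, and F1 from the four counts."""
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical stopword set and tokens, for illustration only.
filtered = remove_stopwords("the food is , really good".split(),
                            {"the", "is", ","})

# Split ten placeholder rows into training and prediction sets.
train_rows, test_rows = split_dataset(range(10))

# Invented labels, not results from the experiment.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
matrix = confusion_matrix(actual, predicted)
stats = statistics(matrix)
```

The statistics reported in the Confusion Matrix dialog box may include other metrics or use different names. The sketch only shows how common metrics follow from the four counts.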