Introduction

Human-computer dialogue has long been an important research direction in natural language processing. In recent years, as human-computer interaction technology has advanced, dialogue systems have gradually moved into practical use. Among them, intelligent customer service systems have received wide attention from enterprises, especially medium and large ones. An intelligent customer service system aims to reduce the heavy manpower that the traditional customer service model requires. While saving manpower, it frees human customer service staff to provide higher-quality service for special problems or special users, so that the combination of "smart customer service + manual customer service" improves overall service efficiency and service quality. In recent years, many medium and large companies have built their own intelligent customer service systems, such as Fujitsu's FRAP, JD.com's JIMI, and Alibaba's AliMe.

Building an intelligent customer service system relies on industry data and on technologies such as large-scale knowledge processing and natural language understanding. First-generation intelligent customer service systems mainly handled business content and answered high-frequency business questions. This process relied on business experts to curate accurate answers to those questions, and the main technical challenge was accurate text matching between user questions and knowledge points. Newer intelligent customer service systems define their service scope as a pan-business scenario. Beyond solving core high-frequency business problems, capabilities such as intelligent shopping guidance, obstacle prediction, intelligent voice chat, life assistant functions, and lifestyle and entertainment interaction are equally valued and covered. Among these, emotional ability, an important aspect of human-like ability, has been applied in practice across many dimensions of the intelligent customer service system and plays a vital role in making the system more human-like.

One: Sentiment Analysis Technology Architecture in Intelligent Customer Service System

Figure 1 shows the classic intelligent customer service mode that combines humans and machines. Users receive service from a robot or from manual customer service through dialogue. While being served by the robot, they can be transferred to manual customer service either by issuing an instruction or through automatic recognition by the robot. Within this complete customer service mode, sentiment analysis technology has been applied in practice to capabilities in multiple dimensions.

Two: User emotion detection

1 Introduction to User Emotion Detection Model
User emotion detection is the foundation and core of many emotion-related applications. In this paper, we propose an emotion classification model that integrates word-level semantic features, multi-phrase semantic features, and sentence-level semantic features to identify the emotions "anxious", "angry", and "grateful" in user dialogues with intelligent customer service systems. Techniques for extracting semantic features at each of these levels have been covered extensively in related work. We combine features from the different levels to improve the final emotion recognition results. Figure 2 shows the architecture of the sentiment classification model.

2 Sentence-level semantic feature extraction
Shen et al. [3] proposed the SWEM model, which applies simple pooling strategies to word embedding vectors to extract sentence-level semantic features. Classification and text matching models trained on these features achieve experimental results nearly on par with those of classic convolutional neural network and recurrent neural network models.

In our model, we utilize the feature extraction capability of the SWEM model to obtain the sentence-level semantic features of user questions and use them in a sentiment classification model for user questions.
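The pooling idea can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the concatenation of average and max pooling (one of the SWEM variants); the text does not say which variant is used in the model, and the function name `swem_concat` and the toy embeddings are hypothetical.

```python
import numpy as np

def swem_concat(word_vectors: np.ndarray) -> np.ndarray:
    """Concatenate average- and max-pooled word embeddings.

    word_vectors has shape (num_words, embedding_dim).
    """
    avg_pooled = word_vectors.mean(axis=0)  # average over the word axis
    max_pooled = word_vectors.max(axis=0)   # per-dimension maximum over words
    return np.concatenate([avg_pooled, max_pooled])

# Hypothetical 3-word user question with 4-dimensional embeddings.
question = np.array([
    [0.1, -0.2, 0.3, 0.0],
    [0.5, 0.1, -0.1, 0.2],
    [-0.3, 0.4, 0.2, 0.1],
])
sentence_features = swem_concat(question)  # shape (8,)
```

The result has a fixed size regardless of the question length, which is what allows it to be merged with the other feature groups later.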

3 Semantic Feature Extraction of Multiple Phrases
Traditional CNN models are often used to extract semantic features of n-gram phrases, where n is the convolution window size. In this paper, we set n to 2, 3, and 4 based on experience, and for each window size we use 16 convolution kernels to extract rich n-gram phrase semantic information from the original word vector matrix.
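The following NumPy sketch shows how window sizes 2, 3, and 4 with 16 kernels each yield a fixed-length phrase feature vector. The ReLU activation, max-over-time pooling, random kernels, and the name `ngram_conv_features` are assumptions for illustration; the text only specifies the window sizes and the kernel count.

```python
import numpy as np

def ngram_conv_features(word_vectors, kernels_by_window):
    """Convolve each n-gram kernel bank over the sentence, then max-pool over time.

    word_vectors: (seq_len, dim); kernels_by_window[n]: (num_kernels, n, dim).
    The sentence must contain at least max(n) words.
    """
    seq_len, _ = word_vectors.shape
    pooled = []
    for n, kernels in sorted(kernels_by_window.items()):
        # Every contiguous n-word window: (num_windows, n, dim).
        windows = np.stack([word_vectors[i:i + n] for i in range(seq_len - n + 1)])
        # Dot product of each window with each kernel: (num_windows, num_kernels).
        activations = np.einsum("wnd,knd->wk", windows, kernels)
        activations = np.maximum(activations, 0.0)  # ReLU (assumed)
        pooled.append(activations.max(axis=0))      # max over window positions
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
dim = 8
sentence = rng.normal(size=(6, dim))  # hypothetical 6-word sentence
kernels = {n: rng.normal(size=(16, n, dim)) for n in (2, 3, 4)}
phrase_features = ngram_conv_features(sentence, kernels)  # 3 windows x 16 kernels = 48 values
```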

4 Word-level semantic feature extraction
We use the LEAM model [1] to extract word-level semantic features. LEAM embeds words and category labels in the same semantic space and performs text classification on top of this joint representation. By using the label representations to increase the semantic interaction between words and labels, it captures word-level semantic information more deeply. Figure 3(2) illustrates the semantic interaction between category labels and words, along with a comparison between the LEAM model and traditional models.
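The word-label interaction can be illustrated with a simplified sketch: words and labels share one embedding space, each word is scored by its best cosine similarity to any label, and the scores become attention weights over the words. This omits parts of the published LEAM model (such as the phrase-level convolution applied to the compatibility matrix), and the function name and toy vectors are hypothetical.

```python
import numpy as np

def label_attentive_features(word_vectors, label_vectors):
    """Weight words by their best cosine similarity to any label embedding."""
    words = word_vectors / np.linalg.norm(word_vectors, axis=1, keepdims=True)
    labels = label_vectors / np.linalg.norm(label_vectors, axis=1, keepdims=True)
    compatibility = words @ labels.T     # (num_words, num_labels) cosine matrix
    scores = compatibility.max(axis=1)   # strongest label match per word
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # softmax over words
    return weights @ word_vectors, weights

# Toy 2-dimensional space with two labels and three words.
label_vectors = np.array([[1.0, 0.0], [0.0, 1.0]])
word_vectors = np.array([[1.0, 0.0], [1.0, 1.0], [-1.0, -1.0]])
word_features, attention = label_attentive_features(word_vectors, label_vectors)
```

In this toy case the first word points exactly along a label direction, so it receives the largest attention weight, while the third word, which opposes both labels, receives the smallest.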

Finally, the semantic features from the different levels are merged and fed into the last layer of the model, where a logistic regression model performs the final classification.
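The fusion step amounts to concatenating the feature groups and applying a linear layer with a softmax. The sketch below assumes a multinomial logistic regression with untrained random weights; the feature sizes, the label list, and in particular the "other" class are hypothetical placeholders.

```python
import numpy as np

# Hypothetical label set; the "other" class is an assumption for illustration.
LABELS = ["anxious", "angry", "grateful", "other"]

def softmax(logits):
    shifted = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def classify(feature_groups, weights, bias):
    """Concatenate the feature groups and apply multinomial logistic regression."""
    fused = np.concatenate(feature_groups)
    return softmax(weights @ fused + bias)

rng = np.random.default_rng(0)
sentence_feats = rng.normal(size=8)   # stand-in for sentence-level features
phrase_feats = rng.normal(size=48)    # stand-in for n-gram phrase features
word_feats = rng.normal(size=4)       # stand-in for word-level features
total_dim = sentence_feats.size + phrase_feats.size + word_feats.size
weights = rng.normal(scale=0.1, size=(len(LABELS), total_dim))  # untrained
bias = np.zeros(len(LABELS))

probabilities = classify([sentence_feats, phrase_feats, word_feats], weights, bias)
predicted = LABELS[int(np.argmax(probabilities))]
```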

Table 1 compares, on real online evaluation data, our proposed ensemble model with three baseline models that each consider only a single level of features.

Three: User emotional appeasement
1 Introduction to the overall framework of user emotional comfort
The user emotional comfort framework proposed in this paper includes an offline part and an online part, as shown in Figure 4.

Offline part

First, the user's emotion needs to be identified. We selected seven common user emotions that call for a soothing response: fear, insult, disappointment, grievance, anxiety, anger, and gratitude.

Second, we identify the topics contained in user questions. Business experts summarized 35 common topics expressed by users, including "complaints about service quality" and "feedback that logistics are too slow". For the topic recognition model, we use the same classification model design as for emotion recognition.

Knowledge construction targets cases where users express more specific content: we collect the user questions that occur frequently and need a soothing response. These specific questions are not merged into the topic dimension above because topic-level processing is relatively coarse-grained. We want to answer these high-frequency, more focused questions with equally focused soothing replies and thereby achieve better response quality.

For the emotion dimension, the "emotion + topic" dimension, and the high-frequency user question dimension, business experts prepared soothing reply scripts at different granularities. In the high-frequency user question dimension, we call each "question-reply" pair a piece of knowledge.

Online part

Knowledge-based appeasement targets users who express specific emotional content. We use a text matching model to score how well the user question matches the questions in our curated knowledge. If the knowledge contains a question whose meaning is very close to the current user input, the corresponding reply is returned to the user directly.

Emotion-and-topic-based replies give the user an appropriate emotional response by considering both the emotion and the topic in what the user said. Compared with knowledge-based appeasement, these replies are more general.

Emotion-category-based replies respond to the user by considering only the emotion in what the user said. This method supplements the two methods above and serves as a fallback, and its replies are the most general.
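The three online reply methods form a fallback cascade from most specific to most general, which can be sketched as follows. Everything here is illustrative: `token_overlap` is a toy stand-in for the trained text matching model, the 0.8 threshold is an assumed value, and the knowledge entries and reply texts are invented examples.

```python
from typing import Callable, Dict, Optional, Tuple

def token_overlap(a: str, b: str) -> float:
    """Toy Jaccard similarity; a stand-in for the trained text matching model."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def choose_reply(
    question: str,
    emotion: str,
    topic: Optional[str],
    knowledge: Dict[str, str],
    emotion_topic_replies: Dict[Tuple[str, str], str],
    emotion_replies: Dict[str, str],
    similarity: Callable[[str, str], float] = token_overlap,
    threshold: float = 0.8,  # assumed cut-off for a "very similar" question
) -> Tuple[Optional[str], str]:
    # Tier 1: knowledge-based reply for a closely matching curated question.
    if knowledge:
        best = max(knowledge, key=lambda known: similarity(question, known))
        if similarity(question, best) >= threshold:
            return knowledge[best], "knowledge"
    # Tier 2: reply keyed on both emotion and topic.
    if topic is not None and (emotion, topic) in emotion_topic_replies:
        return emotion_topic_replies[(emotion, topic)], "emotion+topic"
    # Tier 3: most general fallback keyed on emotion alone.
    if emotion in emotion_replies:
        return emotion_replies[emotion], "emotion"
    return None, "none"

knowledge = {"my package is still not here yet": "Sorry for the delay, we are checking the shipment."}
emotion_topic_replies = {("anger", "complaints about service quality"): "We apologize for the poor service."}
emotion_replies = {"anxiety": "Please do not worry, we are here to help."}

tier1 = choose_reply("my package is still not here", "anxiety", None,
                     knowledge, emotion_topic_replies, emotion_replies)
tier2 = choose_reply("this is terrible", "anger", "complaints about service quality",
                     knowledge, emotion_topic_replies, emotion_replies)
tier3 = choose_reply("please hurry", "anxiety", None,
                     knowledge, emotion_topic_replies, emotion_replies)
```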

Table 2 gives a comparison of the effects of the classification models for the emotions that need to be appeased, including the individual effects of each emotion category and the final overall effect. Table 3 gives a comparison of the performance of classification models for topics. Table 4 shows the effect of improving user satisfaction after adding emotional comfort to several negative emotions. Table 5 shows the effect of improving user satisfaction after adding emotional comfort to the emotion of gratitude.

Four: Emotional Generative Chat

1 Emotional Generative Chat Model
Figure 6 shows the emotional generative chat model in the intelligent customer service system. In the figure, the source RNN acts as an encoder that maps the source sequence s to an intermediate semantic vector C. The target RNN acts as a decoder that decodes the target sequence y from C together with the emotion representation E and topic representation T that we set. Here s and y correspond to the word sequences of the two sentences in the figure, "I am in a good mood today" and "I am so happy!".

Usually, in order for the decoder to preserve the information from the encoder, the last state of the encoder is passed to the decoder as the initial state. At the same time, the encoder and decoder often use different RNN networks to capture different expression patterns of questions and responses. The specific calculation formula is as follows:
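A standard encoder-decoder formulation consistent with this description is sketched below. It is offered under assumptions and is not necessarily the exact set of equations used in the model: the symbols x_t, h_t, z_t, f_enc, f_dec, W_o, and b_o are introduced here for illustration, with s = (x_1, ..., x_n) the source words, y = (y_1, ..., y_m) the target words, h_t the encoder state, and z_t the decoder state.

```latex
\begin{aligned}
h_t &= f_{\mathrm{enc}}(x_t, h_{t-1}), \quad t = 1, \dots, n, \\
C &= h_n, \\
z_0 &= C, \\
z_t &= f_{\mathrm{dec}}(y_{t-1}, z_{t-1}), \quad t = 1, \dots, m, \\
P(y_t \mid y_{<t}, s) &= \mathrm{softmax}(W_o z_t + b_o).
\end{aligned}
```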

Although Seq2Seq-based dialogue generation models have achieved good results, in practice they tend to generate safe but meaningless replies. The reason is that the decoder receives only the encoder's final state C. This mechanism handles long-term dependencies poorly, because as new words are generated the decoder's state memory of the source sequence gradually weakens or is lost entirely. An effective way to alleviate this problem is to introduce an attention mechanism [2].

In the Seq2Seq framework that introduces the attention mechanism, the output layer of the final decoder predicts the probability of a word based on the input:
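A common attention formulation that matches this description is sketched below, under stated assumptions. Here h_i denotes the encoder state at source position i, z_t the decoder state at step t, a(·,·) an alignment scoring function, and c_t the attention context; all of these symbols, as well as W_o and b_o, are introduced for illustration. Feeding the emotion representation E and topic representation T into the decoder by concatenation with its input is an assumption, since the text does not state how they are injected.

```latex
\begin{aligned}
e_{t,i} &= a(z_{t-1}, h_i), \\
\alpha_{t,i} &= \frac{\exp(e_{t,i})}{\sum_{j=1}^{n} \exp(e_{t,j})}, \\
c_t &= \sum_{i=1}^{n} \alpha_{t,i} \, h_i, \\
z_t &= f_{\mathrm{dec}}\big([y_{t-1}; c_t; E; T],\, z_{t-1}\big), \\
P(y_t \mid y_{<t}, s, E, T) &= \mathrm{softmax}(W_o z_t + b_o).
\end{aligned}
```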

The training objective and the search strategy used at prediction time are the same as in a traditional RNN model and are not repeated here.

2 Results of emotional generative speech chat model
After training, the model was tested on real user questions and the results were checked by business experts. The final pass rate of the answers is about 72%. The average reply length is 8.8 characters, which fits the reply length needed in chat scenarios well. Table 6 compares the model in this paper, AET (Attention-based Emotional & Topical Seq2Seq model), with the traditional Seq2Seq model on two aspects: content pass rate and reply length. After emotional information is added, replies are richer than those of the traditional Seq2Seq model, and the share of replies falling within "5-20 characters", the best chat reply length for a robot according to user research, also rises significantly. Together these raise the overall reply pass rate significantly.

Figure 7 shows an example of the application of the Ali emotional generative chat model in Space. For a user input that insults the robot for being too stupid, our model can generate different answers according to the reasonable topics and emotions that are set, which enriches the diversity of answers. The two answers in the figure were generated by the emotional generative model under the emotions "grievance" and "sorry", respectively.

Five: customer service quality inspection

1 Definition of customer service quality problems
The customer service quality inspection discussed in this article detects problematic service content that may arise in dialogues between manual customer service and customers. The aim is to uncover problems in the service process, help customer service personnel improve, raise customer service quality, and ultimately improve customer satisfaction. As far as the author knows, there is as yet no publicly implemented artificial intelligence algorithm model for customer service quality inspection in customer service systems.

Unlike human-machine dialogue, dialogue between manual customer service and a customer does not follow a one-question-one-answer form: both the customer and the customer service agent can send several consecutive messages. Our goal is to detect whether each customer service utterance exhibits either of two service quality problems, "negative" or "poor attitude".

2 Customer Service Quality Inspection Model
To assess the service quality of a customer service utterance, we need to consider its context, including the user's questions and the other customer service utterances. The features we consider include text length, speaker role, and text content. For text content, besides using the SWEM model to extract features of the customer service utterance under inspection, we also run emotion detection on every utterance in the context and use the resulting user emotion categories and customer service emotion categories as model features. The emotion recognition model is the same as the one described in Section Two and is not repeated here. In addition, we consider two structures (Model 1 in Figure 8 and Model 2 in Figure 9) for extracting semantic features of the text sequence from the context.

Model 1 first encodes the current customer service utterance and each utterance in its context with a GRU or LSTM. On top of these encodings, it applies a forward GRU or LSTM to the encodings of the preceding context followed by the current utterance, and a backward GRU or LSTM to the encodings of the following context, read in reverse, followed by the current utterance. Both resulting sequence encodings therefore end at the current utterance, which better reflects its semantic information. The model structure is shown in Figure 8.

Model 2 takes the encodings of the current customer service utterance and its context and runs a single forward GRU or LSTM over them in chronological order to produce the final semantic features. Part of the model structure is shown in Figure 9. Compared with Model 2, Model 1 emphasizes the semantic information of the utterance under inspection, while Model 2 reflects more of the sequential semantic information of the overall context.
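The difference between the two structures lies in the order in which utterance encodings are fed to the sequence encoder, which the sketch below makes concrete. It assumes each utterance has already been encoded into a vector, and it uses a plain tanh recurrent cell as a stand-in for the GRU or LSTM; the function names, sizes, and random parameters are all hypothetical.

```python
import numpy as np

def run_rnn(inputs, w_in, w_rec):
    """Run a plain tanh recurrent cell (stand-in for GRU/LSTM); return the final state."""
    state = np.zeros(w_rec.shape[0])
    for x in inputs:
        state = np.tanh(w_in @ x + w_rec @ state)
    return state

def model1_features(preceding, current, following, params_fwd, params_bwd):
    # Forward pass: preceding context in order, ending on the current utterance.
    forward = run_rnn(list(preceding) + [current], *params_fwd)
    # Backward pass: following context in reverse, also ending on the current utterance.
    backward = run_rnn(list(reversed(following)) + [current], *params_bwd)
    return np.concatenate([forward, backward])

def model2_features(preceding, current, following, params):
    # One forward pass over the whole context in chronological order.
    return run_rnn(list(preceding) + [current] + list(following), *params)

rng = np.random.default_rng(0)
dim, hidden = 6, 5

def make_params():
    return rng.normal(scale=0.5, size=(hidden, dim)), rng.normal(scale=0.5, size=(hidden, hidden))

params_fwd, params_bwd, params_all = make_params(), make_params(), make_params()
preceding = [rng.normal(size=dim) for _ in range(2)]  # utterance encodings before the target
current = rng.normal(size=dim)                        # utterance under inspection
following = [rng.normal(size=dim) for _ in range(2)]  # utterance encodings after the target

features_model1 = model1_features(preceding, current, following, params_fwd, params_bwd)
features_model2 = model2_features(preceding, current, following, params_all)
```

Because the current utterance is the last input of both passes in Model 1, it has the most direct influence on both final states, whereas in Model 2 it sits in the middle of the sequence.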

3 Customer Service Quality Inspection Experimental Results

We compare the two models for extracting contextual semantic information. Table 7 gives the results, which show that Model 1 outperforms Model 2. This suggests that the semantic information of the utterance under inspection does need more weight, while the semantic information of the context plays a supporting role in recognition. In addition, GRU and LSTM differ little in results during actual model training, but GRU trains faster than LSTM, so all model experiments use GRU.

Beyond model-level metrics, we also analyzed system-level metrics in actual use, covering two dimensions: quality inspection efficiency and recall rate. Both are obtained by comparing the model's results with those of the earlier purely manual quality inspection. As shown in Table 8, both quality inspection efficiency and recall rate improve greatly. The recall rate of manual quality inspection is relatively low because it is impossible to manually inspect all customer service records.

Six: Session Satisfaction Estimation
1 Session satisfaction
At present, one of the most important performance evaluation indicators for an intelligent customer service system is user session satisfaction. As far as the author knows, there are no published research results on automatically estimating user session satisfaction in intelligent customer service systems.

For session satisfaction estimation in the intelligent customer service system, we propose a session satisfaction analysis model that better reflects how satisfied users currently are with the intelligent customer service. Because evaluation standards differ among users, many sessions whose content, answer sources, and emotional information are exactly the same still receive different satisfaction labels. We therefore tried two training methods: the first trains a classification model to fit the satisfaction category (satisfied, average, or dissatisfied), and the second trains a regression model to fit the distribution of session satisfaction. We compared the results of the two methods.
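One way to obtain a distribution target for the regression variant is to group sessions that are identical in content and average their labels. The sketch below assumes this construction, which the text does not spell out; the session keys stand in for a hashable summary of content, answer sources, and emotions, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

LABELS = ("satisfied", "average", "dissatisfied")

def satisfaction_distributions(samples):
    """Turn (session_key, label) pairs into an empirical label distribution per key."""
    counts = defaultdict(Counter)
    for session_key, label in samples:
        counts[session_key][label] += 1
    return {
        key: [label_counts[label] / sum(label_counts.values()) for label in LABELS]
        for key, label_counts in counts.items()
    }

# Hypothetical feedback: four users saw an identical session "s1" but rated it differently.
samples = [
    ("s1", "satisfied"), ("s1", "satisfied"), ("s1", "satisfied"), ("s1", "dissatisfied"),
    ("s2", "average"),
]
targets = satisfaction_distributions(samples)
# targets["s1"] is [0.75, 0.0, 0.25]
```

A classifier would be forced to pick one label for "s1", while a regression target of this kind preserves the disagreement among users.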

2 Feature Selection of Conversational Satisfaction
The conversational satisfaction model considers various dimensions of information: semantic information (user's speech), emotional information (obtained through the emotion detection model), and answer source information (the source of the answer that replies to the current speech).

Semantic information is the content expressed during communication between the user and the intelligent customer service, and the user's words reflect the user's current satisfaction well. The semantic information used in the model is the multi-turn utterance information in the session. To ensure the model processes the same number of turns each time, we use only the last 4 utterances of the session in the experiments. We chose this because session data analysis shows that the user's semantic information at the end of a session is more strongly related to overall session satisfaction. For example, an expression of gratitude at the end of a session usually indicates that the user is satisfied, while an expression of criticism likely indicates dissatisfaction with the service.
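Selecting a fixed number of final utterances is a small preprocessing step, sketched here. Padding shorter sessions on the left with an empty string is an assumption; the text does not say how sessions with fewer than 4 utterances are handled, and the function name is hypothetical.

```python
def last_turns(utterances, num_turns=4, pad=""):
    """Keep the final num_turns utterances, left-padding shorter sessions."""
    tail = list(utterances[-num_turns:])
    return [pad] * (num_turns - len(tail)) + tail

long_session = ["hi", "where is my order", "it is late", "ok", "thanks", "bye"]
short_session = ["refund please", "thanks"]

fixed_long = last_turns(long_session)    # ["it is late", "ok", "thanks", "bye"]
fixed_short = last_turns(short_session)  # ["", "", "refund please", "thanks"]
```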

Emotional information is a very strong reference signal for user satisfaction. When users express extreme emotions such as anger or insults, the probability of dissatisfied feedback is very high. The emotional information corresponds one-to-one with the utterances in the semantic information: emotion recognition is run on each selected utterance to obtain its emotion category.

Answer source information can well reflect what kind of problems users encounter. Since different answer sources represent different business scenarios, the differences in user satisfaction status caused by problems in different scenarios are quite obvious. For example, complaints and rights protection are more likely to cause user dissatisfaction than consultation.

3 Session Satisfaction Model
In this paper, we propose a conversational satisfaction prediction model that combines semantic information features, emotional information features, and answer source information features. The model fully considers the semantic information in the conversation, and uses data compression to fully express the emotional information and the source information of the answer. The model structure is shown in Figure 10.

Semantic feature extraction: Semantic information is extracted with a hierarchical GRU/LSTM. The first layer produces a representation of each sentence (the first-layer GRU/LSTM part in Figure 10). The second layer takes the first layer's sentence representations and produces a higher-level representation of the user's utterances across multiple turns (the second-layer GRU/LSTM part in Figure 10), which makes full use of the order of the user's utterances. In addition, the SWEM sentence features of the last sentence are extracted to strengthen the influence of the last sentence's semantic features.

Emotional feature extraction: The emotion features obtained are one-hot vectors, which have obvious shortcomings: they are sparse and cannot represent relationships between emotions. We therefore learn an emotion embedding to express the emotion features better.

Answer source feature extraction: The initial answer source feature is also one-hot, but with more than 50 answer sources the data is very sparse, so feature compression is required. Here too, an embedding is learned to represent the answer source feature.
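The compression from one-hot features to embeddings can be illustrated with a lookup table: multiplying a one-hot vector by an embedding matrix is the same as selecting one row of that matrix. In the sketch below the matrix is random rather than learned, and the sizes (50 sources, 8 dimensions) and source ids are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
num_sources, embed_dim = 50, 8  # hypothetical sizes
# In a real model this matrix would be learned; here it is random for illustration.
source_embeddings = rng.normal(size=(num_sources, embed_dim))

def embed(source_ids):
    """Look up dense vectors for integer answer-source ids."""
    return source_embeddings[np.asarray(source_ids)]

source_ids = [3, 17]
one_hot = np.eye(num_sources)[source_ids]  # sparse 50-dimensional inputs
dense = embed(source_ids)                  # dense 8-dimensional features

# The lookup is equivalent to multiplying the one-hot rows by the matrix.
same = np.allclose(one_hot @ source_embeddings, dense)
```

The same lookup applies to the emotion features, with one row per emotion category.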

Model prediction layer: We tried satisfaction category prediction and satisfaction distribution prediction separately. The former is a classification model and the latter a regression model.

4 Experimental Results of Conversational Satisfaction Prediction

The experimental results are shown in Figure 11. The classification model predicts satisfaction poorly: its average is more than 4 percentage points higher than the actual user feedback, which falls short of expectations. As shown in Table 9, the mean predicted by the regression model differs from the real user feedback by only 0.007, and the variance is one-third lower than before, which demonstrates the effectiveness of the regression model.

Seven: Summary

This paper summarizes several practical application scenarios of sentiment analysis in current intelligent customer service systems, together with the corresponding models and their results. Although sentiment analysis now reaches every part of the human-machine dialogue process in the intelligent customer service system, this is only a promising start, and it still has a greater role to play in building human-like capabilities into intelligent customer service systems.

Sentiment Analysis Technology: Let Intelligent Customer Service Understand Human Emotions Better