How to make machine customer service more like human customer service


Humans express the knowledge in their heads through language and transfer it to one another through dialogue. By learning from a large corpus, a machine can produce fluent sentences to a certain extent, but without knowledge it will only generate polished yet meaningless replies. A traditional modular dialogue system can fill key information into an answer template through database queries and similar methods, but doing the same in an end-to-end dialogue generation model is far more difficult.

Memory Networks are usually a good technical approach to this problem. However, existing combinations of memory networks and dialogue systems only provide a way to introduce knowledge; they cannot handle knowledge from multiple sources with different structures well.

Therefore, in this paper we propose Heterogeneous Memory Networks (HMNs) to process user utterances, dialogue history, and a background knowledge base simultaneously. HMNs consist of a context-free memory network and our proposed context-aware memory network, which encode and store structured knowledge tuples and serialized user utterances plus dialogue history, respectively. They generate two small vocabulary distributions (a knowledge vocabulary and a dialogue-history vocabulary) and one large vocabulary distribution (over all trained words) for word selection during reply generation. Experimental results on three datasets show that HMNs surpass existing SOTA models and significantly improve the performance of end-to-end task-oriented dialogue models.

Problem background

When answering customer questions, a human agent first understands the user's language, then queries the required knowledge in the relevant databases and knowledge bases, and finally organizes it into an answer. In this process, if the corresponding knowledge is missing, even a human can hardly answer accurately, because what the customer needs may be precisely the knowledge in the database, and a polished reply that misses the point is still unacceptable. The same goes for machines: if a generative model learns only from historical dialogue data, it may end up producing only the safest, most generic replies, and the lack of key knowledge in an actual dialogue will make it unable to handle specific transactions. Therefore, properly introducing knowledge into the dialogue model is essential.

The figure shows a typical task-oriented dialogue. For the user question "What is the address for keen", the system reply must be generated using hotel_keen from the preceding dialogue and 2_miles and 578_arbol_dr retrieved from the knowledge base as key information. Traditional pipeline dialogue systems usually find such key information through slot filling and retrieval, which requires a great deal of manual annotation. The development of deep learning has prompted further exploration: several works [7, 8] have shown that fully data-driven task-oriented dialogue is also feasible and promising. Existing fully data-driven task-oriented dialogue models are usually based on the Sequence-to-Sequence model and use an attention mechanism to introduce external knowledge. Building on this, Madotto et al. proposed the Mem2Seq (Memory to Sequence) model [1], which introduces multi-hop attention and the pointer network into dialogue generation.

Existing models mix information from different sources and structures together and express it with a single network (an RNN or a memory network). But humans do not think by mixing all information together; they follow a process: in this dialogue task, for example, a real customer-service agent first considers the question, then combines it with the context, consults the database based on this information, and finally produces the reply. Guided by this idea, we observe that different kinds of knowledge play different roles in dialogue. The dialogue history mainly guides the context and answer patterns, while knowledge-base information is more like the "values" inserted into the reply. Likewise, dialogue history is serialized information, while a knowledge base is usually structured data, so each kind of information also calls for a representation model suited to its structure. Therefore, we propose Heterogeneous Memory Networks (HMNs), combined with the Encoder-Decoder framework, to better complete the dialogue task. The main contributions of this paper are:

We propose, for the first time, the HMNs model combined with the Encoder-Decoder framework, exploring the use of different memories to represent the dialogue history and the knowledge base.
We propose a context-aware memory network enhanced with a gating mechanism to better learn and represent context-dependent information such as the dialogue history.
We evaluated on three popular public datasets; the results show that the model significantly exceeds existing SOTA models overall, and that each of the new components is effective.
Model structure

Our model adopts the Encoder-Decoder framework commonly used in end-to-end generative models. This section describes the model structure in detail.

1. Encoder

The role of the Encoder is to encode the dialogue history and the user query into a context vector. The Encoder uses a context-sensitive memory network (the left half of the figure below) that takes the dialogue history and user query as input. Each input word consists of (1) the token itself, (2) the turn of the sentence, and (3) the speaker identity. For example, if the user says "hello" in the first turn and the system replies "may i help you", the input is: [(hello, t1, user), (may, t1, sys), (i, t1, sys), (help, t1, sys), (you, t1, sys)], where t1 is the first turn of the dialogue and user and sys indicate whether the user or the system spoke. Converting hello, t1, and user into vectors and adding them gives the embedding of this word. Finally, the embedding sequence is fed into the context-sensitive memory network to obtain the context vector.
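The summed embedding described above can be sketched as follows. This is a minimal NumPy sketch with hypothetical table sizes and word indices, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # small embedding dimension for illustration (hypothetical)

# Randomly initialized embedding tables for tokens, turns, and speakers.
token_emb = rng.normal(size=(100, dim))   # vocabulary of 100 words
turn_emb = rng.normal(size=(10, dim))     # up to 10 dialogue turns
speaker_emb = rng.normal(size=(2, dim))   # 0 = user, 1 = sys

def embed(token_id, turn_id, speaker_id):
    # The embedding of an input word is the sum of its token, turn,
    # and speaker embeddings, as described above.
    return token_emb[token_id] + turn_emb[turn_id] + speaker_emb[speaker_id]

# [(hello, t1, user), (may, t1, sys), ...] with hypothetical word indices
sequence = [(1, 0, 0), (2, 0, 1), (3, 0, 1), (4, 0, 1), (5, 0, 1)]
embedded = np.stack([embed(*w) for w in sequence])
print(embedded.shape)  # (5, 8)
```

The resulting sequence of summed embeddings is what the context-sensitive memory network consumes.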

2. Context-sensitive memory network

Since a traditional memory network discards the contextual sequence information when storing the dialogue history, we modify the end-to-end memory network [5]. The memory network consists of multiple hops; in each hop, the memory slots are formed by summing word embeddings drawn from that hop's own randomly initialized embedding matrix. We adopt the adjacent weight-sharing scheme, in which the input embedding matrix of the k-th hop is also the output embedding matrix of the (k-1)-th hop. To let the memory network better learn and express contextual information, we add a gating mechanism between storage units, following the bidirectional GRU [6]. When the sequence is loaded, a context-dependent representation of each word is obtained through the gating mechanism and stored in the memory network.

The query vector is input, attention weights over the memory are computed, and the output vector is obtained. Adding the query and output vectors gives the output of the k-th hop, which also serves as the query vector of the (k+1)-th hop.
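The hop computation above can be sketched as follows. This is a simplified version using a single memory matrix (the actual model uses separate input/output embedding matrices per hop with adjacent weight sharing, plus the GRU gate):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_hops(memory, query, n_hops=3):
    """Multi-hop read over a memory of shape (n_slots, dim).

    At each hop: attention weights are the softmax of the inner products
    between the query and the memory slots; the output vector is the
    attention-weighted sum of the slots; the next hop's query is
    query + output.
    """
    for _ in range(n_hops):
        weights = softmax(memory @ query)   # attention over memory slots
        output = weights @ memory           # weighted sum of slots
        query = query + output              # query for the next hop
    return query, weights

rng = np.random.default_rng(0)
memory = rng.normal(size=(6, 8))  # 6 memory slots, dim 8 (hypothetical)
query = rng.normal(size=8)
q_out, attn = memory_hops(memory, query)
```

The final attention weights `attn` are what the copy mechanism later reads as word probabilities.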

3. Heterogeneous memory network

HMNs, as shown in the figure above, contain a context-sensitive memory network and a context-free memory network. The context-sensitive memory network has been described in detail above, while the context-free memory network is identical to the end-to-end memory network. The context-sensitive memory network stores and represents the dialogue history, while the context-free memory network represents the knowledge base. The two are connected by feeding the query vector output by the context-sensitive memory network into the context-free memory network as its input. In this way, the model first queries the dialogue history to capture, to some extent, the context and the pattern of the answer, and then uses this information to query the key data. This resembles the human thinking process: first determine how to answer based on the question, then find the missing data according to that answer.

4. Decoder

The decoder consists of the HMNs and an RNN controller, as shown in the figure.

At each step, the RNN controller feeds a query vector into the HMNs. The query vector first passes through the context-sensitive memory network to generate a dialogue-history vocabulary distribution (from the output of the last hop) and a large vocabulary distribution (predicted from the output of the first hop). The output vector of the context-sensitive network is then fed into the context-free memory network as a query vector to retrieve knowledge-base information and generate a knowledge-base vocabulary distribution. Finally, the word-selection strategy picks a word from the three distributions as the word generated at this step.
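One decoding step described above can be sketched as follows, using a single-hop read per memory for brevity; `vocab_proj` is a hypothetical projection matrix standing in for the large-vocabulary prediction layer:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_read(memory, query):
    # Single-hop read for brevity; the full model uses multiple hops.
    weights = softmax(memory @ query)
    return query + weights @ memory, weights

def decoder_step(query, history_mem, kb_mem, vocab_proj):
    """One decoding step: the controller's query first reads the
    context-sensitive history memory, then the resulting vector reads
    the context-free knowledge-base memory."""
    h_out, history_dist = memory_read(history_mem, query)  # dialogue-history vocabulary
    large_dist = softmax(vocab_proj @ h_out)               # large vocabulary
    _, kb_dist = memory_read(kb_mem, h_out)                # knowledge-base vocabulary
    return history_dist, kb_dist, large_dist

rng = np.random.default_rng(1)
history_mem = rng.normal(size=(5, 8))   # 5 history slots (hypothetical)
kb_mem = rng.normal(size=(4, 8))        # 4 knowledge tuples (hypothetical)
vocab_proj = rng.normal(size=(50, 8))   # large vocabulary of 50 words
query = rng.normal(size=8)
h_dist, k_dist, v_dist = decoder_step(query, history_mem, kb_mem, vocab_proj)
```

The three distributions are then handed to the word-selection strategy described next.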

5. Copy mechanism and word selection strategy

We employ a copy mechanism to copy words from the memory networks. The probability of a target word in a memory network is its attention weight in that network. If the target word does not appear, the sentinel (jump) token added in the preprocessing stage is selected. During word selection, if neither memory network outputs the jump token, we choose the word with the higher probability of the two. If exactly one outputs the jump token, the highest-probability word from the other network is chosen. If both output the jump token, the most probable word in the large vocabulary is chosen.
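The selection logic above can be sketched as follows; here "$" is a stand-in for the sentinel/jump token added in preprocessing, and the word lists and probabilities are hypothetical:

```python
def select_word(history_dist, kb_dist, large_dist,
                history_words, kb_words, large_words, sentinel="$"):
    """Pick one word from the three distributions following the
    strategy described above."""
    h_word, h_p = max(zip(history_words, history_dist), key=lambda x: x[1])
    k_word, k_p = max(zip(kb_words, kb_dist), key=lambda x: x[1])
    if h_word != sentinel and k_word != sentinel:
        # Neither memory chose the sentinel: take the more confident one.
        return h_word if h_p >= k_p else k_word
    if h_word != sentinel:
        return h_word
    if k_word != sentinel:
        return k_word
    # Both memories chose the sentinel: fall back to the large vocabulary.
    return max(zip(large_words, large_dist), key=lambda x: x[1])[0]

# Both memories confident: the KB word wins with the higher probability.
word = select_word([0.7, 0.3], [0.2, 0.8], [0.5, 0.5],
                   ["address", "$"], ["$", "578_arbol_dr"], ["the", "is"])
print(word)  # 578_arbol_dr
```

If both memories return the sentinel, the fallback is simply the argmax of the large-vocabulary distribution.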

Experiments and Results
To validate our model, we designed experiments to test whether it achieves better results and whether its new components (stacking multiple memory networks, and the context-sensitive memory network) are effective.

1. Dataset

As shown in the table, we use three popular datasets: the Key-Value Retrieval Datasets [2], DSTC2 [3], and the (6) dialog bAbI tasks [4]. Only the dialogues and the knowledge base (in tuple form; since DSTC2 has no knowledge base, it is generated from the ontology file) are used, and all labels such as slot fillings are removed. All three datasets are of high quality and cover both multi-domain and single-domain settings.

2. Experimental indicators

To verify our assumptions, we use several metrics.

BLEU: We employ BLEU to verify whether the model can generate fluent reply sentences.
F1: We use F1 to verify whether the model can accurately extract the key information needed for the reply from the knowledge base.
Per-response accuracy and per-dialog accuracy: Used mainly on the bAbI dataset to verify whether the model can generate and reproduce the learned expressions.
3. Compare models

We compared the following models:

SEQ2SEQ: The seq2seq (LSTM) model, a reliable and stable baseline widely used in generative-model studies.
SEQ2SEQ+Attn.: seq2seq with an attention mechanism, which handles unknown and rare words better.
Mem2Seq: Memory to Sequence, which introduces multi-hop attention into task-oriented dialogue generation.
HMNs-CFO: HMNs with context-free memory only. This variant replaces the context-sensitive network with a context-free one, to verify whether the context-sensitive network really helps.
4. Experimental results

It can be seen that our model achieves the best performance on most of the main evaluation metrics.

The random sample of generated replies in the table below also shows that our model generates the most fluent sentences and extracts the most accurate knowledge.

5. Experimental Analysis

The effect of the model

In all experiments, we found that HMNs achieved the best results in almost every case, so the overall model is, to a certain extent, reliable and effective. The improvement is especially significant on the F1 metric. Combined with the actual generated sentences (the model can even extract more entities than the ground truth while remaining accurate), we believe that HMNs learn how to use the query and the dialogue-history information to extract the relevant key information from the knowledge base.

Do context-sensitive networks really work?
In the bAbI experimental results (Table 3), we compared HMNs-CFO and HMNs: HMNs significantly outperformed HMNs-CFO on all metrics. Both variants stack two memory networks; the only difference is that HMNs-CFO uses two identical context-free memory networks to store the dialogue history and the knowledge base, respectively. Furthermore, the training-loss curves (Figure 4) show that the context-sensitive network learns the dialogue generation task better and faster.

Is it effective to learn dialogue history and knowledge base information separately with multiple networks?
In the same experiment, we compared Mem2Seq and HMNs-CFO: HMNs-CFO achieved better results on most metrics (especially per-dialog accuracy). The two key differences are that HMNs-CFO uses a dedicated memory network for each kind of information, and that it connects the two networks by passing the query vector produced from the dialogue history to the knowledge-base memory. This demonstrates the effectiveness of using dedicated memory networks and connecting the memory modules in a suitable way.

Future work

From the earlier results, we notice that in the Key-Value Retrieval Datasets experiment, HMNs, Mem2Seq, and the other memory-network-based models all perform poorly on the weather-prediction metric.

We investigated this and found that in the weather-prediction task, the average number of tuples in the knowledge base is three times that of the other tasks. After we reduced the number of tuples in this task to roughly the same level as the other tasks, the F1 score rose to 48. We therefore believe that the current training method for memory networks may run into problems when the amount of stored information grows significantly; in practice, the number of candidates should be reduced through matching or similar pre-filtering. We need to further study the performance of HMNs at different data scales and how to improve it.
