The Unified Cross-Task Few-Shot Learning Algorithm UPT Offers a Solution

Background
As pre-trained language models continue to grow in scale, distributed training and optimization of models with hundreds of billions or even trillions of parameters have become increasingly common. This growth has brought steady gains on related tasks such as natural language understanding. However, the parameter space of these models is very large: fine-tuning them directly on downstream tasks requires a substantial amount of training data to achieve good generalization. In real business scenarios, especially in vertical domains and specific industries, training samples are often scarce, which significantly hurts the accuracy of these models on downstream tasks. Few-shot learning based on Prompt Tuning can make full use of the knowledge acquired during pre-training and train a more accurate model on a given small training set. Even so, the limited training data still constrains model accuracy in the few-shot setting. If other cross-task datasets can be used effectively during the few-shot learning stage, accuracy can be improved further.
Algorithm Architecture
The cross-task few-shot learning algorithm UPT (Unified Prompt Tuning) is a deep extension of the learning mechanism of existing few-shot learning algorithms. UPT is a unified learning paradigm that casts various downstream tasks and pre-training tasks into the POV (Prompt-Options-Verbalizer) form, so that the model learns to use prompts to solve a variety of NLP tasks. In our work, UPT constructs tasks as follows:
Whether the task is single-sentence classification, sentence-pair matching, or a self-supervised task from the pre-training stage, UPT transforms it into this unified paradigm for learning. This approach retains the advantages of classic few-shot learning algorithms and introduces the idea of meta learning into the training process, which greatly improves the model's generalization to downstream tasks and alleviates the overfitting problem encountered in the few-shot learning stage. Once this Meta Learner has been trained, the earlier algorithms can be reused to perform few-shot fine-tuning on it.
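As a rough illustration of this unified task form, the sketch below expresses a downstream task and a self-supervised pre-training task as the same kind of Prompt-Options-Verbalizer record. The data structure and field names are our own illustration, not part of EasyNLP's API:

```python
# Illustrative-only sketch: a single POV (Prompt-Options-Verbalizer) record
# that can describe both downstream and pre-training tasks.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class POVExample:
    prompt: str                 # task-related prompt containing a [MASK] token
    options: List[str]          # candidate label words offered to the model
    verbalizer: Dict[str, str]  # label word -> task label

# Downstream single-sentence classification (supervised).
review = POVExample(
    prompt="The movie was a waste of time. Is great or bad? It was [MASK].",
    options=["great", "bad"],
    verbalizer={"great": "positive", "bad": "negative"},
)

# Self-supervised Prompt-MLM sample built from raw text (no human labels).
prompt_mlm = POVExample(
    prompt="Disney movies are so [MASK] that I insist on watching two every week. "
           "Is it wonderful or awful?",
    options=["wonderful", "awful"],
    verbalizer={"wonderful": "wonderful", "awful": "awful"},  # identity mapping
)
```

Because every task is reduced to filling a [MASK] from a small set of option words, the same MLM head can be shared across all of them during meta-training.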
Unified Prompting Paradigm
Specifically, when a pre-trained model performs Prompt Tuning on different downstream tasks, a fixed prompt pattern (PVP, or Prompt-Verbalizer Pair) must be designed for each specific task, which makes it difficult for the model to exploit the information shared across these tasks. We unify various NLP tasks into the following format:
• P (Prompt): a task-related prompt containing at least one [MASK] token;
• O (Options): the candidate label words from the Verbalizer, presented as a question;
• V (Verbalizer): the mapping between label words and labels.
For supervised learning tasks, we give two examples below, corresponding to single-sentence text classification and sentence-pair text matching (a scoring sketch follows these examples):
• Review classification: "[X]. Is great or bad? It was [MASK].";
• Paragraph coherence prediction: "[X1]. Is this paragraph the same as the next: [X2]? It was [MASK]."
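The Verbalizer then maps whichever option word the model assigns the highest probability at the [MASK] position back to a task label. The snippet below is a minimal sketch of that scoring step, using made-up label-word probabilities in place of real MLM-head outputs (the function and variable names are illustrative, not EasyNLP's API):

```python
# Minimal sketch of verbalizer-based prediction for the review-classification
# template above; `mask_scores` stands in for real MLM-head probabilities.
def classify_review(text: str, mask_scores: dict) -> str:
    prompt = f"{text}. Is great or bad? It was [MASK]."
    # In practice, `prompt` is fed to the model to obtain `mask_scores`
    # for the [MASK] position; here we take them as given.
    verbalizer = {"great": "positive", "bad": "negative"}  # label word -> label
    best_word = max(verbalizer, key=lambda w: mask_scores.get(w, 0.0))
    return verbalizer[best_word]

# Pretend the MLM head produced these probabilities at the [MASK] position.
fake_scores = {"great": 0.71, "bad": 0.12, "nice": 0.09}
print(classify_review("The plot is engaging and the acting is superb", fake_scores))
# -> "positive"
```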
Incorporating Self-Supervised Tasks
The self-supervised tasks used in the pre-training stage never see this prompted format, which makes it difficult for the model to quickly pick up the prompt information during Prompt Tuning. This section therefore improves the original self-supervised task, Masked Language Modeling (MLM), and extends it to a prompted form. Note that we do not retrain the language model; instead, we use Prompt-MLM as an auxiliary task.
In the original MLM task, given a sentence, one or more positions are randomly selected and replaced with [MASK], and the model predicts the word (or sub-word) at each [MASK] position through the MLM head. For example, given the sentence "Disney movies are so wonderful that I insist on watching two every week.", we randomly mask a word: "Disney movies are so [MASK] that I insist on watching two every week.", and the model then predicts the likely words for that position.
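For reference, the snippet below is a minimal sketch of this vanilla MLM prediction using the Hugging Face `transformers` fill-mask pipeline; it is not part of UPT itself, and note that RoBERTa writes its mask token as `<mask>` rather than `[MASK]`:

```python
# Minimal sketch of vanilla MLM prediction with an off-the-shelf model.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="roberta-large")

masked = "Disney movies are so <mask> that I insist on watching two every week."
for pred in unmasker(masked, top_k=3):
    # Each prediction carries the filled-in token and its probability.
    print(pred["token_str"], round(pred["score"], 4))
```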
The main process of our proposed Prompt-MLM is shown in the figure below:
We first detect high-frequency adjectives in the pre-training corpus and cluster them by word-sense similarity. For an input sentence, we run part-of-speech tagging and choose the position of an adjective as the position to mask. From the clusters, we then select the adjective most dissimilar to the masked one as the other candidate and use the pair to construct the Options. In this way, the MLM task is transformed into a prompt-based binary classification task without any manual labeling.
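The following is a minimal sketch of this sample-construction step. The adjective cluster, the dissimilarity choice, and the exact template are illustrative stand-ins for UPT's actual implementation:

```python
# Illustrative sketch of Prompt-MLM sample construction from raw text.
import random
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

# Hypothetical word-sense cluster around "wonderful"; real clusters are mined
# from the pre-training corpus.
ADJ_CLUSTERS = {"wonderful": ["great", "awful", "terrible", "boring"]}

def most_dissimilar(adj: str) -> str:
    # Placeholder: in practice this uses word-sense similarity within the cluster.
    return ADJ_CLUSTERS.get(adj, ["bad"])[-1]

def build_prompt_mlm_sample(sentence: str):
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)
    adj_positions = [i for i, (_, tag) in enumerate(tagged) if tag.startswith("JJ")]
    if not adj_positions:
        return None  # No adjective to mask in this sentence.
    i = random.choice(adj_positions)
    answer = tokens[i]
    distractor = most_dissimilar(answer)
    masked = " ".join(tokens[:i] + ["[MASK]"] + tokens[i + 1:])
    # POV form: a binary choice between the original adjective and the distractor.
    options = [answer, distractor]
    random.shuffle(options)
    prompt = f"{masked} Is it {options[0]} or {options[1]}?"
    return {"prompt": prompt, "options": options, "label": answer}

print(build_prompt_mlm_sample(
    "Disney movies are so wonderful that I insist on watching two every week."))
```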
Algorithm Accuracy Evaluation
To verify the effectiveness of the above algorithm, we evaluated the accuracy of both classic few-shot learning algorithms and our self-developed one. In the experiments, we use RoBERTa-large as the pre-trained language model. For each downstream task, we draw only 16 samples per class for training and evaluate on the full test set. The table below lists the results of standard fine-tuning and the classic few-shot learning algorithms LM-BFF, PET, P-tuning, and PPT on 9 public datasets, using accuracy (%) as the evaluation metric:
The results above show that our self-developed UPT algorithm delivers a clear accuracy improvement on multiple datasets. We also evaluated UPT on several SuperGLUE datasets; the results are as follows:
In addition, the PAI team took first place on the public FewCLUE Chinese few-shot learning leaderboard, ahead of established vendors such as Tencent, Baidu, and Ping An. To better serve the open-source community, the source code of the UPT algorithm will be contributed to the natural language processing algorithm framework EasyNLP; NLP practitioners and researchers are welcome to use it.
EasyNLP open source framework: https://github.com/alibaba/EasyNLP
