# Attention-based Pairwise Multi-Perspective Convolutional Neural Network for Answer Selection in Ques

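PS: Additional background

Learning to rank (LTR) is a supervised-learning approach to ranking. LTR has been widely applied in many areas of text mining, for example ranking the returned documents in IR, ranking candidate products and users in recommender systems, and ranking candidate translations in machine translation.

LTR methods generally fall into three classes: pointwise (single document), pairwise (document pair), and listwise (document list).

The three classes differ in how many documents are considered at a time when training the model.

* The pointwise method builds the LTR model by computing the relevance of a single document.
* The pairwise method builds the LTR model by comparing the relevance of documents in pairs. Any two query-document pairs in the training set have a relative order, so a classifier is trained on the ordering of the document pairs in the training set. It then predicts the relative order of the documents for a new query, and the documents are ranked accordingly.
* The listwise method builds the LTR model from the ranking of the documents over the whole data set. In other words, the listwise method considers the entire document list rather than a single document or a document pair.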
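To make the difference between the three classes concrete, here is a minimal sketch of one typical loss per class. The specific losses (squared error, hinge with a margin, and a ListNet-style top-one cross-entropy) and all function names are illustrative assumptions, not something these notes or the paper prescribe.

```python
import math


def pointwise_loss(score, label):
    """Squared error between one document's score and its relevance label."""
    return (score - label) ** 2


def pairwise_hinge_loss(score_pos, score_neg, margin=1.0):
    """Zero only when the relevant document outscores the other by `margin`."""
    return max(0.0, margin - (score_pos - score_neg))


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(scores, labels):
    """ListNet-style cross-entropy between label and score distributions."""
    p_true = softmax(labels)
    p_pred = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))


scores = [2.0, 0.5, 1.0]  # hypothetical model scores for three documents
labels = [1.0, 0.0, 0.0]  # only the first document is relevant

print(pointwise_loss(scores[0], labels[0]))       # looks at one document
print(pairwise_hinge_loss(scores[0], scores[1]))  # looks at one document pair
print(listwise_loss(scores, labels))              # looks at the whole list
```

The only thing that changes between the three functions is how many documents enter a single loss term, which is exactly the distinction drawn in the list above.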

![img](https://img-blog.csdnimg.cn/2019011002113193.jpg?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2NoaWtpbHlfeW9uZ2Zlbmc=,size_16,color_FFFFFF,t_70)

<https://blog.csdn.net/chikily_yongfeng/article/details/81396607#fn3>

<https://tech.meituan.com/tags/learning-to-rank.html>

## 0. Abstract

Question answering and information retrieval systems have become widely used. These systems attempt to find the answers to the questions asked from raw text sources. A component of these systems is Answer Selection, which selects the most relevant answer from the candidate answers. Syntactic similarities were mostly used to compute the similarity, but in recent works deep neural networks have been used, which has brought a significant improvement in this field. In this research, a model is proposed to select the most relevant answers to a factoid question from the candidate answers. The proposed model ranks the candidate answers in terms of semantic and syntactic similarity to the question, using convolutional neural networks. In this research, an attention mechanism and a sparse feature vector use the context-sensitive interactions between question and answer sentences. Wide convolution increases the importance of the interrogative word. Pairwise ranking is used to learn differentiable representations to distinguish positive and negative answers. Our model shows strong performance on TrecQA, beating previous state-of-the-art systems by 2.62% in MAP and 2.13% in MRR while requiring no additional syntactic parsers or external tools. The results show that using context-sensitive interactions between question and answer sentences can help to find the correct answer more accurately.
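The abstract reports results in MAP and MRR. As a reminder of what these two ranking metrics measure, the sketch below computes them from 0/1 relevance labels that are already sorted by model score. The function names and the toy labels are illustrative assumptions; this is not the evaluation script used for TrecQA.

```python
def average_precision(ranked_labels):
    """Mean of precision@k over the ranks k that hold a correct answer."""
    hits = 0
    precisions = []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def reciprocal_rank(ranked_labels):
    """1 / rank of the first correct answer, or 0 if there is none."""
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            return 1.0 / rank
    return 0.0


def mean_over_questions(metric, ranked_labels_per_question):
    """Average a per-question metric over all questions."""
    values = [metric(labels) for labels in ranked_labels_per_question]
    return sum(values) / len(values)


# Hypothetical rankings for two questions (1 = correct candidate answer).
ranked = [[0, 1, 0, 1], [1, 0, 0]]
map_score = mean_over_questions(average_precision, ranked)
mrr_score = mean_over_questions(reciprocal_rank, ranked)
print(map_score, mrr_score)
```

MAP rewards placing every correct answer high in the list, while MRR only looks at the position of the first correct answer.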

## 1. Introduction

The human need for information about a particular subject is called the Information Need. The answer to this need can be text, audio, image, video, or a combination of them (Kolomiyets & Moens, 2011). One of the most important information needs that mankind has been involved with throughout history is to answer the questions that have emerged in the minds of humankind over and over again. Finding answers to questions requires extensive study and a lot of time. Nowadays, the web has built up a vast repository of data with extremely high redundancy (Brill, Dumais, & Banko, 2002). Thus, it can be a good resource for finding answers. That is why many users try to find pages related to their information needs and questions on the Internet and go through them to extract the answer.

There are several ways to find answers using the Internet, such as forums, social networks, search engines, and question answering systems. One of the most important ways to find answers to questions is to use question answering (QA) systems. Instead of presenting the entire document, these systems return only specific parts of the document as an answer. This answer may be a word, a sentence, a paragraph, or an audio/video clip (Dwivedi & Singh, 2013). There are two kinds of question answering systems: knowledge-based systems and Information Retrieval-based (IR-based) systems. In knowledge-based systems, documents are structured, or they are converted into a structured form such as relational databases or knowledge databases. In relational databases and knowledge databases, objects and their attributes are stored in the database, and the connections between them are defined. **IR-based systems, by contrast, usually handle queries and unstructured documents.** **Unstructured information is more detailed than structured information.** That is why more attention is now focused on this type of information, and IR-based systems are used more than knowledge-based systems. **IR-based systems have three general phases: Question Processing (QP), Phrase Retrieval (PR), and Answer Processing (AP)** (Jurafsky & Martin, 2014). The overall structure of an IR-based system is shown in Figure 1.

![](/files/-LpJVq1kmQ9jnztKt_y3)

In the QP phase, the user's question is processed and the question information is extracted. This phase includes the query formulation and answer type detection sections. Query formulation creates a query from the keywords of the user's question and sends it to the PR phase. Answer type detection attempts to discover the answer type using various analyses of the words and structure of the user's question. The PR phase retrieves related phrases from the documents after receiving the query. Clearly, a whole document is not a suitable unit for a QA system, so it is necessary to examine different parts of the document, such as paragraphs and sentences. After analyzing the different parts of the documents, this phase ranks the phrases by relevance and returns them. In the AP phase, the answer to the question is extracted from the phrases returned by the PR phase. This phase presents answers in two ways: the extracted answer and the generated answer. An extracted answer is taken directly from the existing text, while a generated answer is produced from the answer found in the text using another set of rules (Jurafsky & Martin, 2014; Mishra & Jain, 2016).
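The three phases described above can be sketched as a toy pipeline. Everything here is a simplifying assumption for illustration: the stop-word list, the answer-type table, and the keyword-overlap ranking are hypothetical stand-ins, and a real IR-based system would use far richer components in each phase.

```python
import re

# Hypothetical, deliberately tiny resources for the sketch.
STOPWORDS = {"the", "a", "an", "of", "is", "was", "did", "in",
             "who", "what", "when", "where"}
ANSWER_TYPES = {"who": "PERSON", "when": "DATE", "where": "LOCATION"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def question_processing(question):
    """QP phase: query formulation plus answer type detection."""
    tokens = tokenize(question)
    query = [t for t in tokens if t not in STOPWORDS]
    answer_type = ANSWER_TYPES.get(tokens[0], "OTHER") if tokens else "OTHER"
    return query, answer_type


def phrase_retrieval(query, documents):
    """PR phase: split documents into sentences and rank them by overlap."""
    sentences = [s.strip() for doc in documents
                 for s in doc.split(".") if s.strip()]
    scored = [(len(set(query) & set(tokenize(s))), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored if score > 0]


def answer_processing(phrases):
    """AP phase (extractive): return the top-ranked phrase, if any."""
    return phrases[0] if phrases else None


documents = [
    "Paris is the capital of France. France is in Europe.",
    "Berlin is the capital of Germany.",
]
query, answer_type = question_processing("What is the capital of France?")
phrases = phrase_retrieval(query, documents)
answer = answer_processing(phrases)
print(query, answer_type, answer)
```

Note that the retrieval step works on sentences rather than whole documents, matching the point above that a document is too coarse a unit for a QA system.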

Determining the semantic similarity between two sentences is one of the essential issues in natural language processing and information retrieval. This issue is also called sentence matching and is used in paraphrase recognition (Magnolini, 2014). Another application of this method is to find the answer that is semantically most relevant to a question in the AP phase of QA systems. This method is very suitable for factoid questions because the meaning and sentence structure of the answer to a factoid question are very similar to those of the question. In QA systems where the AP phase operates based on answer extraction, there is a special component called Answer Selection (AS). **This component is responsible for identifying the best answer to the question from the candidate answers as the correct answer.** **AS can be considered a supervised learning problem.** Given a set of questions $$q=\{q_1,q_2,\ldots,q_n\}$$, each question $$q_i$$ comes with a candidate answer set $$\{(s_{i1},y_{i1}),(s_{i2},y_{i2}),\ldots,(s_{im},y_{im})\}$$, in which $$s_{ij}$$ refers to the $$j$$th candidate answer and $$y_{ij}$$ refers to its answer label. If the $$j$$th candidate answer is a positive answer to the $$i$$th question, then $$y_{ij}=1$$; otherwise $$y_{ij}=0$$. A classifier can be trained with this labeled data to predict candidate answers for unseen questions (Yih, Chang, Meek, & Pastusiak, 2013). Indeed, the classifier predicts the semantic similarity between question and answer (Echihabi & Marcu, 2003). The classifier's input is a pair $$(q_i, s_{ij})$$, each element of which is a sentence, and the output is $$y_{ij}\in\{0,1\}$$, which represents the correctness of the answer. For example, consider the question q in Table I, with three candidate answers a1, a2, and a3. This classifier must output zero for the inputs (q, a1) and (q, a2), and output one for the input (q, a3).

![](/files/-LpM31c5RDARsablWdrs)
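The supervised formulation can be written down as a small data structure. The sentences below are hypothetical toy data, not the contents of Table I, and the helper names are assumptions made for this sketch. It shows the labeled data viewed as pointwise classifier examples and as the (question, positive, negative) triples that pairwise ranking trains on.

```python
# Hypothetical toy data: each question q_i with its candidates (s_ij, y_ij).
dataset = [
    ("Who wrote Hamlet?", [
        ("Hamlet is set in Denmark.", 0),
        ("The play was first performed around 1600.", 0),
        ("William Shakespeare wrote Hamlet.", 1),
    ]),
]


def pointwise_examples(data):
    """Flatten into ((q_i, s_ij), y_ij): input and target of a 0/1 classifier."""
    return [((q, s), y) for q, candidates in data for s, y in candidates]


def pairwise_examples(data):
    """Build (q_i, positive answer, negative answer) triples per question."""
    triples = []
    for q, candidates in data:
        positives = [s for s, y in candidates if y == 1]
        negatives = [s for s, y in candidates if y == 0]
        triples.extend((q, p, n) for p in positives for n in negatives)
    return triples


for (question, sentence), label in pointwise_examples(dataset):
    print(label, "|", question, "|", sentence)
print(len(pairwise_examples(dataset)), "pairwise training triples")
```

The pointwise view has one example per candidate, while the pairwise view has one example per positive/negative combination for the same question.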

Various ways to solve the answer selection problem have been presented so far. These methods can be divided into two general categories. The first category includes methods that try to measure the similarity between question and answer using feature engineering and syntactic methods. The second category includes methods that use deep neural networks. Over the past few years, neural networks and deep learning have become more prominent and have been used to solve the answer selection problem. In this paper, we use deep neural networks and try to provide a model with a convolutional neural network (CNN). The proposed model does not use external sources such as WordNet (Miller, 1998), syntactic parsers (Jurafsky & Martin, 2014), or a named entity recognizer (NER) (Jurafsky & Martin, 2014), and uses only the raw text of the question and the answer.

In section II, related works will be explained. In section III, the proposed model will be described in detail. In section IV, the proposed model will be evaluated on the TrecQA dataset (M. Wang, et al., 2007) and its accuracy will be compared to that of other models. Finally, in section V, the paper will be concluded.

## 2. Related Works

The early works used string overlap features to measure sentence similarity. These included features such as bag-of-words overlap and n-gram overlap (Wan, Dras, Dale, & Paris, 2006). However, these approaches cannot capture linguistic and semantic features. For example, the question "Who did Jamshid call?" cannot be answered with "Ali called Jamshid", although these two sentences have the most overlap. Hence, bag-of-words features are poor for sentence similarity (Surdeanu, Ciaramita, & Zaragoza, 2008). To solve this problem, various approaches have used lexical resources like WordNet to take synonymous words into account (Fernando & Stevenson, 2008). In these approaches, it was difficult to deal with words that were not covered by the lexical resources.
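The weakness of bag-of-words overlap can be reproduced in a few lines. In this sketch, the second candidate sentence ("Jamshid called Ali") is a hypothetical addition that actually answers the question, and the `overlap` function is an assumed, very simple overlap measure. Because word order is discarded, the two candidates receive exactly the same score.

```python
import re
from collections import Counter


def bag_of_words(sentence):
    """Multiset of lowercase words; word order is thrown away."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def overlap(a, b):
    """Number of word occurrences shared by two sentences."""
    return sum((bag_of_words(a) & bag_of_words(b)).values())


question = "Who did Jamshid call?"
candidate_1 = "Ali called Jamshid"   # from the text: does not answer it
candidate_2 = "Jamshid called Ali"   # hypothetical: does answer it

print(overlap(question, candidate_1))
print(overlap(question, candidate_2))
print(bag_of_words(candidate_1) == bag_of_words(candidate_2))
```

Any feature computed only from these bags cannot tell who called whom, which is the motivation for the syntactic and neural approaches discussed next.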

Many approaches have used semantic and syntactic structure to capture the linguistic and semantic features. In question answering, it is possible to build a dependency tree for the question and for each candidate answer. The answers can then be ranked in increasing order of the edit distance between the question's dependency tree and each answer's dependency tree (Punyakanok, Roth, & Yih, 2004).

In recent years, new methods have been developed to solve the answer selection problem using neural networks and deep learning. Neural network methods aim to eliminate the need for manual feature engineering, but they need a lot of input data to extract the features, which is not a simple challenge and can be a constraint.

Yu et al. (Yu, Hermann, Blunsom, & Pulman, 2014) introduced the first work on AS using neural networks. The proposed model uses two methods to represent inputs. In the first method, a bag-of-words representation is used, which computes the sum of the word vectors of a sentence and normalizes it. In the second method, bigrams are used: the word vectors are combined as bigrams using convolutional neural networks. This model focuses on the use of the convolutional neural network in answer selection and does not consider any semantic relationships between question and answer. Feng et al. (Feng, Xiang, Glass, Wang, & Zhou, 2015) combined convolutional neural networks with fully-connected neural networks in different ways and produced various models. These models come from combinations of hidden layers, convolutional operators, pooling operators, and the activation function. In these models, the ranking method is changed to pairwise, and it is shown that pairwise ranking is better than pointwise ranking. Severyn et al. (Severyn & Moschitti, 2015) provided a single lightweight model instead of various models, with a shorter training time than other models. This is the first model that uses feature vectors and wide convolution, and it shows that using external vectors can be useful. Tay et al. (Tay, Phan, Tuan, & Hui, 2017) presented a model similar to that of Severyn et al. (Severyn & Moschitti, 2015) but used recurrent neural networks instead of convolutional neural networks. This model shows that recurrent neural networks have a better understanding of the context, and it also uses external vectors. He et al. (He, Gimpel, & Lin, 2015) presented a model using a convolutional neural network. This model uses a multi-perspective convolutional neural network (MPCNN) instead of a lightweight model such as that of Severyn et al. (Severyn & Moschitti, 2015) or several different models such as those of Feng et al. (Feng, et al., 2015). The model does not use any external components, to demonstrate that the neural network alone is sufficient for answer selection. Tu (Tu, 2018) showed that some components of the MPCNN model (He, et al., 2015) can be omitted without altering the accuracy of the model. That work also showed that the attention mechanism can be somewhat useful. Rao et al. (Rao, He, & Lin, 2016) presented a pairwise model that uses MPCNN (He, et al., 2015) as its pointwise model. This model shows that pairwise is better than pointwise for answer selection. The related works are shown in Table II.

![](/files/-LpX4I1932sBfjwLd8xF)
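Wide convolution, mentioned above for Severyn et al. and in the abstract, differs from narrow convolution only in zero-padding the input by k-1 positions on each side for a filter of width k. The sketch below is a generic illustration, not code from any of the cited models. The `window_coverage` helper is a hypothetical diagnostic, and reading its result as the reason why the interrogative word gains importance is an interpretation that the source text does not spell out.

```python
def conv1d(values, kernel, wide=False):
    """1-D convolution (cross-correlation form) over a sequence of numbers.

    With wide=True the input is zero-padded by k-1 on both sides, so the
    output has n + k - 1 entries instead of n - k + 1.
    """
    k = len(kernel)
    if wide:
        values = [0.0] * (k - 1) + list(values) + [0.0] * (k - 1)
    return [
        sum(values[i + j] * kernel[j] for j in range(k))
        for i in range(len(values) - k + 1)
    ]


def window_coverage(n, k, wide=False):
    """Count how many filter windows include each of the n input positions."""
    pad = k - 1 if wide else 0
    coverage = [0] * n
    for start in range(-pad, n - k + pad + 1):
        for pos in range(start, start + k):
            if 0 <= pos < n:
                coverage[pos] += 1
    return coverage


sequence = [1.0, 2.0, 3.0, 4.0, 5.0]   # stand-in for a 5-word sentence
kernel = [1.0, 1.0, 1.0]               # filter of width k = 3

print(len(conv1d(sequence, kernel)))             # narrow output length
print(len(conv1d(sequence, kernel, wide=True)))  # wide output length
print(window_coverage(5, 3))                     # narrow coverage per word
print(window_coverage(5, 3, wide=True))          # wide coverage per word
```

Under narrow convolution the first and last words fall into fewer windows than the middle words, whereas under wide convolution every word falls into k windows. Since the interrogative word is usually the first word of a question, it is covered as often as any other word.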

## 3. Model Architecture

In this paper, **we propose an Attention-based Pairwise Multi-Perspective Convolutional Neural Network model**, **AP-MPCNN**. This model follows the Siamese structure (Bromley, et al., 1993), in which there are two parallel subnets, each of which processes one input sentence. All parameters are shared between these two subnets. Figure 2 shows the AP-MPCNN architecture.

![](/files/-LpX4I1DgL27f1jp0VOO)
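The Siamese idea of two subnets with shared parameters can be illustrated without any deep learning library. In this sketch the encoder is a hypothetical stand-in that simply averages toy word vectors; it is not the multi-perspective CNN with attention used by AP-MPCNN, and the embeddings are made-up values. The point is only that the same encoder object, and therefore the same parameters, processes both sentences.

```python
import math


class SentenceEncoder:
    """Hypothetical stand-in for one subnet: averages per-word vectors."""

    def __init__(self, embeddings, dim):
        self.embeddings = embeddings  # the parameters shared by both branches
        self.dim = dim

    def encode(self, sentence):
        words = sentence.lower().split()
        vectors = [self.embeddings[w] for w in words if w in self.embeddings]
        if not vectors:
            return [0.0] * self.dim
        return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0


def siamese_score(encoder, question, answer):
    """Both inputs go through the same encoder, so all weights are shared."""
    return cosine(encoder.encode(question), encoder.encode(answer))


# Made-up 2-dimensional embeddings, for illustration only.
embeddings = {
    "who": [1.0, 0.0], "wrote": [0.0, 1.0], "hamlet": [1.0, 1.0],
    "shakespeare": [0.5, 1.0], "denmark": [1.0, -1.0],
}
encoder = SentenceEncoder(embeddings, dim=2)

print(siamese_score(encoder, "who wrote hamlet", "shakespeare wrote hamlet"))
print(siamese_score(encoder, "who wrote hamlet", "hamlet denmark"))
```

Sharing one encoder keeps the question and the answer in the same representation space, so their vectors can be compared directly.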

