# Sequence to Sequence Learning with Neural Networks

<https://arxiv.org/pdf/1409.3215.pdf>

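Before attention mechanisms came into wide use, Seq2Seq models had already been applied broadly. The model originated in the field of language modeling. In short, its goal is to transform an input sequence into a new target sequence, where neither the input nor the target needs to have a fixed length. Applications of Seq2Seq models include machine translation, question-answer dialogue generation, and syntactic parsing.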

A Seq2Seq model is essentially an **Encoder-Decoder** architecture, which consists of:

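* **Encoder**: compresses the information in the input sequence into a fixed-length context vector (also called a sentence embedding or thought vector). The hope is that this vector is a good summary of the entire input.
* **Decoder**: is initialized with the context vector and then emits the transformed output sequence. Early work used only the last state of the Encoder to initialize the Decoder.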
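
The two roles above can be written compactly as follows. This is a sketch in assumed notation that does not appear in these notes: $x_1,\dots,x_T$ is the input sequence, $h_t$ the Encoder state, $c$ the context vector, $s_t$ the Decoder state, $y_1,\dots,y_{T'}$ the output sequence, and $f$, $g$ the recurrent units.

$$
h_t = f(x_t, h_{t-1}), \qquad c = h_T
$$

$$
s_0 = c, \qquad s_t = g(y_{t-1}, s_{t-1}), \qquad p(y_1,\dots,y_{T'} \mid x_1,\dots,x_T) = \prod_{t=1}^{T'} p(y_t \mid c, y_1,\dots,y_{t-1})
$$

The output length $T'$ may differ from the input length $T$, while $c$ always has the same size.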

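Both the Encoder and the Decoder are recurrent neural networks, typically built from LSTM or GRU units. In the machine translation example below, we feed **She is eating a green apple** into the Encoder and expect the Decoder to output **她在吃一个绿苹果**. What we usually care about is not any single translation, but that the intermediate context vector learns a representation of the important information in the input.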

![](/files/-Lpsm_O10yVmOL8L7-uS)
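
The data flow of the translation example can be sketched in a few lines of plain Python. This is only an illustration under several assumptions: a simple tanh RNN cell stands in for the LSTM or GRU unit, the weights are random and untrained (so the output tokens are meaningless), and the vocabulary sizes, token ids, and names such as `RNNCell`, `encode`, and `decode` are hypothetical.

```python
import math
import random

random.seed(0)

HIDDEN = 8          # size of the hidden state / context vector (assumed)
SRC_VOCAB = 10      # hypothetical source vocabulary size
TGT_VOCAB = 12      # hypothetical target vocabulary size
BOS, EOS = 0, 1     # hypothetical begin/end-of-sequence token ids


def rand_matrix(rows, cols):
    """Small random weights; a real model would learn these."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class RNNCell:
    """Plain tanh RNN cell, standing in for an LSTM or GRU unit."""

    def __init__(self, vocab_size, hidden):
        self.embed = rand_matrix(vocab_size, hidden)  # one row per token id
        self.w_in = rand_matrix(hidden, hidden)
        self.w_hh = rand_matrix(hidden, hidden)

    def step(self, token, h):
        a = matvec(self.w_in, self.embed[token])
        b = matvec(self.w_hh, h)
        return [math.tanh(p + q) for p, q in zip(a, b)]


def encode(cell, tokens):
    """Fold the whole input into one fixed-length context vector."""
    h = [0.0] * HIDDEN
    for t in tokens:
        h = cell.step(t, h)
    return h


def decode(cell, w_out, context, max_len=10):
    """Greedy decoding, initialized from the Encoder's last state."""
    h, token, output = context, BOS, []
    for _ in range(max_len):
        h = cell.step(token, h)
        scores = matvec(w_out, h)
        token = max(range(TGT_VOCAB), key=scores.__getitem__)
        if token == EOS:
            break
        output.append(token)
    return output


encoder = RNNCell(SRC_VOCAB, HIDDEN)
decoder = RNNCell(TGT_VOCAB, HIDDEN)
w_out = rand_matrix(TGT_VOCAB, HIDDEN)

source = [4, 7, 2, 9, 3, 5]  # stand-in ids for "She is eating a green apple"
context = encode(encoder, source)
translation = decode(decoder, w_out, context)
print(len(context), translation)
```

The only link between the two networks is `context`: the Decoder never sees the source tokens directly, which is exactly the design described above.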

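A key weakness of this fixed-length context vector is that it cannot remember long sentences: by the time it has finished processing an input, it has forgotten parts of what it learned earlier. The attention mechanism was introduced to solve this problem.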
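
The bottleneck is easy to see in code. The sketch below is an assumption-laden toy, not the paper's model: it uses an untrained tanh RNN with hypothetical sizes, and shows only that a 3-token input and a 200-token input are squeezed into vectors of the same size.

```python
import math
import random

random.seed(0)
HIDDEN = 4    # assumed context-vector size
VOCAB = 20    # hypothetical vocabulary size

# Random, untrained parameters; a real Encoder would learn these.
embed = [[random.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for _ in range(VOCAB)]
w_hh = [[random.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for _ in range(HIDDEN)]


def encode(tokens):
    """Run a plain tanh RNN over the tokens and return its last state."""
    h = [0.0] * HIDDEN
    for t in tokens:
        rec = [sum(w * x for w, x in zip(row, h)) for row in w_hh]
        h = [math.tanh(e + r) for e, r in zip(embed[t], rec)]
    return h


short_input = [3, 7, 1]
long_input = [random.randrange(VOCAB) for _ in range(200)]

# Both inputs end up in vectors of the same size.
print(len(encode(short_input)), len(encode(long_input)))  # prints "4 4"
```

However long the sentence is, the Decoder receives only these `HIDDEN` numbers, so each additional token has to share the same limited capacity.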

