Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction
By Maha Elbayad et al.
Published on Nov. 1, 2018
Table of Contents
1 Introduction
2 Related work
3 Translation by 2D Convolution
4 Self attention
4 Experimental evaluation
Summary
Current state-of-the-art machine translation systems are based on encoder-decoder architectures with an attention mechanism. This paper proposes an alternative: a single 2D convolutional neural network applied jointly across the source and target sequences. Each layer re-codes the source tokens in light of the output sequence produced so far, so attention-like properties are present throughout the network rather than confined to a separate module. In experiments on the IWSLT 2014 dataset, the model achieves results competitive with encoder-decoder systems while being conceptually simpler and using fewer parameters. The paper also reviews related work, describes the model architecture in detail, and compares its results against state-of-the-art models.
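To make the idea of a 2D convolution over source and target concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`build_grid`, `masked_conv2d`), the toy sizes, the random weights, and the single output channel are all hypothetical, and the paper's actual network is deeper than one layer. The sketch pairs every target position with every source position in a grid, then applies a convolution whose kernel only looks at the current and earlier target positions, so a later output token cannot influence earlier ones.

```python
import random


def build_grid(src_emb, tgt_emb):
    """Cell (i, j) concatenates target embedding i with source embedding j."""
    return [[t + s for s in src_emb] for t in tgt_emb]


def masked_conv2d(grid, weights, k=3):
    """Causal 2D convolution with a single output channel (illustrative only).

    weights[a][b] is a per-channel weight vector for kernel row a (target axis)
    and kernel column b (source axis). Only k // 2 + 1 kernel rows exist: they
    cover the current target position and earlier ones, which is equivalent to
    zeroing the part of a k x k kernel that would see future target tokens.
    """
    T, S, C = len(grid), len(grid[0]), len(grid[0][0])
    half = k // 2
    out = [[0.0] * S for _ in range(T)]
    for i in range(T):
        for j in range(S):
            acc = 0.0
            for a in range(half + 1):      # target offsets -half .. 0
                ti = i - half + a
                if ti < 0:
                    continue               # zero padding before the first target token
                for b in range(k):         # source offsets -half .. +half
                    sj = j - half + b
                    if 0 <= sj < S:
                        cell = grid[ti][sj]
                        w = weights[a][b]
                        acc += sum(w[c] * cell[c] for c in range(C))
            out[i][j] = acc
    return out


# Toy example with assumed sizes: embedding size 4, 5 source tokens, 3 target tokens.
rng = random.Random(0)
d, S, T, k = 4, 5, 3, 3
src = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(S)]
tgt = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(T)]
weights = [[[rng.uniform(-1, 1) for _ in range(2 * d)] for _ in range(k)]
           for _ in range(k // 2 + 1)]
features = masked_conv2d(build_grid(src, tgt), weights, k)
```

Because the kernel never reaches forward along the target axis, changing the last target embedding leaves the feature rows for earlier target positions untouched, while every row still sees the whole source sentence. That is the sense in which the source is re-coded based on the output produced so far.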