Conformer-Based End-to-End Speech Recognition With Rotary Position Embedding
By Shengqiang Li et al.
Table of Contents
Abstract Introduction Related Work Method Experiments Conclusions
Summary
This paper explores the use of rotary position embedding (RoPE) in the conformer architecture for end-to-end speech recognition. RoPE encodes absolute positional information into the input sequence with a rotation matrix, which in turn brings explicit relative position information into the self-attention module. Experiments on the AISHELL-1 and LibriSpeech corpora show that the conformer enhanced with RoPE outperforms the original conformer and other models, with lower word error rates on the test sets. The study also compares position embedding methods and finds that RoPE performs better than both absolute and relative position embeddings.
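The rotation idea in the summary can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names `rope` and `attention_score`, the frequency base of 10000, and the choice to rotate adjacent feature pairs are assumptions made for illustration. It shows that rotating queries and keys by angles tied to their absolute positions makes their dot product depend only on the position offset.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Rotate adjacent feature pairs of x by position-dependent angles.

    x: array of shape (seq_len, dim), with dim even.
    positions: sequence of seq_len absolute positions.
    base: frequency base (assumed value, not taken from the summary).
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "feature dimension must be even"
    # One rotation frequency per feature pair.
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    # angles[t, i] = positions[t] * inv_freq[i]
    angles = np.asarray(positions, dtype=np.float64)[:, None] * inv_freq[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=np.float64)
    # 2-D rotation applied to each (even, odd) feature pair.
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

def attention_score(q, k, q_pos, k_pos):
    """Unscaled dot-product score between one rotated query and one rotated key."""
    q_rot = rope(q[None, :], [q_pos])
    k_rot = rope(k[None, :], [k_pos])
    return (q_rot @ k_rot.T).item()

rng = np.random.default_rng(0)
q = rng.standard_normal(8)
k = rng.standard_normal(8)

# Same offset (4) at different absolute positions gives the same score.
score_a = attention_score(q, k, 3, 7)
score_b = attention_score(q, k, 10, 14)
print(np.isclose(score_a, score_b))
```

Because each pair is only rotated, the vector norms are unchanged, and the position information enters the attention score purely through the difference between the two positions.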