Conformer-Based End-to-End Speech Recognition With Rotary Position Embedding

By Shengqiang Li et al.

Table of Contents

Abstract
Introduction
Related Work
Method
Experiments
Conclusions

Summary

This paper explores rotary position embedding (RoPE) in the Conformer architecture for end-to-end speech recognition. RoPE encodes absolute positional information into the input sequence with a rotation matrix, which in turn brings explicit relative position information into the self-attention module. Experiments on the AISHELL-1 and LibriSpeech corpora show that the Conformer enhanced with RoPE outperforms the original Conformer and the other models it is compared against, with relative word error rate reductions on the test sets. The study also compares position embedding methods and finds that RoPE performs better than both absolute and relative position embeddings.
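The link between absolute and relative position described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names (`rope`, `dot`, `attention_score`), the pairing of adjacent feature dimensions, and the frequency base of 10000 are assumptions taken from the common RoPE formulation. Each query and key is rotated by an angle that depends only on its own absolute position, yet the resulting dot product depends only on the offset between the two positions.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each adjacent pair of features by an angle proportional to `pos`.

    Assumption: pair j uses frequency base ** (-2j / d), as in the common
    RoPE formulation; the summary does not confirm these details.
    """
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even feature dimension"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # i = 2j, so this is pos * base^(-2j/d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]  # 2-D rotation of the pair (x, y)
    return out

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def attention_score(q, k, m, n):
    """Unnormalized attention score between a query at position m and a key at position n."""
    return dot(rope(q, m), rope(k, n))

q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Same offset (2 positions apart), different absolute positions.
score_near = attention_score(q, k, 3, 1)
score_far = attention_score(q, k, 10, 8)
print(abs(score_near - score_far) < 1e-9)
```

Because each rotation preserves vector length, the embedding changes only the relative angle between query and key, which is why shifting both positions by the same amount leaves the score unchanged.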