Summary
The paper adapts the GPT, GPT-2, and BERT language models (LMs) for automatic speech recognition (ASR) and reports results with fine-tuned GPT, GPT-2, and their combination. It compares unidirectional and bidirectional LMs, highlighting the importance of accurate language prior probabilities, and shows significant ASR improvements from the combined models. The paper also provides insights into LM combination techniques and into converting bidirectional LM outputs into valid probabilities for ASR applications.
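To make the LM combination idea concrete, below is a minimal sketch of N-best rescoring with a log-linear interpolation of two LM scores added to the acoustic score. The Hypothesis fields, the lm_scale and interp weights, and the function names are illustrative assumptions, not the paper's actual formulation.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    am_logprob: float    # acoustic model log-probability (assumed precomputed)
    gpt_logprob: float   # log P(text) under a fine-tuned GPT (assumed precomputed)
    gpt2_logprob: float  # log P(text) under a fine-tuned GPT-2 (assumed precomputed)

def combined_score(h: Hypothesis, lm_scale: float = 0.5, interp: float = 0.5) -> float:
    """Interpolate the two LM log-probabilities, then add the scaled
    result to the acoustic score (one common combination scheme)."""
    lm_logprob = interp * h.gpt_logprob + (1.0 - interp) * h.gpt2_logprob
    return h.am_logprob + lm_scale * lm_logprob

def rescore(nbest: list[Hypothesis]) -> Hypothesis:
    """Return the hypothesis with the highest combined score."""
    return max(nbest, key=combined_score)

# Example usage with dummy scores:
nbest = [
    Hypothesis("recognize speech", -10.0, -8.0, -7.5),
    Hypothesis("wreck a nice beach", -9.5, -12.0, -11.0),
]
print(rescore(nbest).text)  # "recognize speech"
```

In practice the interpolation and scaling weights would be tuned on a development set, and a bidirectional LM such as BERT would first require its outputs to be converted into valid sentence probabilities, as the paper discusses.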