Summary
This paper presents a comparative study of synthetic versus human-written documents as training data for cross-encoder re-rankers. The authors introduce the ChatGPT-RetrievalQA dataset and evaluate re-rankers fine-tuned on ChatGPT-generated versus human-written responses. The study shows that models trained on ChatGPT responses are more effective zero-shot re-rankers, while models trained on human responses perform better in the supervised setting. A domain-level analysis further indicates that human-trained models are stronger in specific domains such as Medicine. The study also examines the effectiveness of BM25 on human- and ChatGPT-generated responses across several datasets, as well as the performance of the cross-encoder re-rankers on unseen documents from the human-generated collection. Overall, the findings highlight the potential of generative LLMs for augmenting training data for neural retrieval models.
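For readers unfamiliar with the setup being evaluated, the following is a minimal sketch of cross-encoder re-ranking, where candidates (typically retrieved by a first-stage ranker such as BM25) are scored jointly with the query. This is an illustration of the general technique, not the authors' exact pipeline; the checkpoint name, query, and candidate texts are assumptions for demonstration only.

```python
# Minimal sketch of cross-encoder re-ranking (illustrative only).
# Assumes the sentence-transformers package and a public MS MARCO
# cross-encoder checkpoint; texts below are made up for the example.
from sentence_transformers import CrossEncoder

query = "What are the side effects of ibuprofen?"
# Hypothetical candidate responses, e.g., the top-k from BM25.
candidates = [
    "Ibuprofen can cause stomach upset, heartburn, and dizziness.",
    "Ibuprofen is a nonsteroidal anti-inflammatory drug (NSAID).",
    "Aspirin was first synthesized in 1897.",
]

# The cross-encoder scores each (query, candidate) pair jointly;
# higher scores indicate higher estimated relevance.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = model.predict([(query, c) for c in candidates])

# Re-rank candidates by descending relevance score.
for score, text in sorted(zip(scores, candidates), reverse=True):
    print(f"{score:.3f}  {text}")
```

In the paper's zero-shot condition, a model like this is applied to a target collection without further fine-tuning on it; in the supervised condition, it is first fine-tuned on labeled query-response pairs from either the ChatGPT-generated or the human-written data.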