Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
By L. Xie et al.
Published on June 6, 2023
Table of Contents
Abstract
Introduction
Artificial General Intelligence
AGI in Computer Vision
Imaginary Pipeline towards AGI in CV
Conclusion
Summary
The AI community has been pursuing artificial general intelligence (AGI): algorithms that can be applied to any kind of real-world problem. Recently, chat systems powered by large language models (LLMs) have emerged and rapidly become a promising direction for achieving AGI in natural language processing (NLP), but the path towards AGI in computer vision (CV) remains unclear. This paper draws on lessons learned from GPT and LLMs to shed light on the challenges and potential solutions in achieving AGI in CV. The authors propose an imaginary pipeline towards AGI in CV with three stages: establishing interactive environments, training agents, and fine-tuning for various tasks. The paper concludes by highlighting the need for substantial research and engineering efforts to advance this vision.