A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS
By Juan R. Terven et al.
Published on Feb. 4, 2024
Table of Contents
1. Abstract
2. Introduction
3. Object Detection Metrics and Non-Maximum Suppression (NMS)
4. YOLO: You Only Look Once
5. YOLOv1 Architecture
6. YOLOv1 Training
Summary
This paper presents a comprehensive analysis of the evolution of the YOLO family, from YOLOv1 through YOLOv8 and YOLO-NAS. It reviews the innovations and contributions of each iteration, from the standard evaluation metrics to the major changes in network architecture and training techniques introduced by each model, highlights the trade-off between speed and accuracy that each version negotiates, and envisions future directions for real-time object detection systems.

YOLO's applications across diverse fields are surveyed, showcasing its impact in autonomous vehicles, surveillance, agriculture, healthcare, remote sensing, security systems, manufacturing, traffic applications, wildlife monitoring, robotics, and more.

Object detection metrics such as Average Precision (AP) and the Non-Maximum Suppression (NMS) post-processing step are explained in detail. The architecture and training process of YOLOv1 are then described, emphasizing its end-to-end approach to object detection, in which convolutional layers extract features and fully connected layers predict bounding boxes and class probabilities. The training methodology, pre-training on ImageNet followed by fine-tuning on the PASCAL VOC dataset, is outlined, along with the loss function YOLOv1 uses for its bounding-box and confidence-score predictions.
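To make the NMS step mentioned above concrete, here is a minimal sketch of greedy Non-Maximum Suppression. The `[x1, y1, x2, y2]` box format and the default `iou_threshold=0.5` are illustrative assumptions, not details taken from the paper.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, suppress boxes that
    overlap it above the threshold, and repeat with the remainder."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

For example, two heavily overlapping detections of the same object collapse to the higher-scoring one, while a distant box survives:

```python
boxes = [[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
nms(boxes, scores)  # → [0, 2]
```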