You Only Look Once: Unified, Real-Time Object Detection

By Joseph Redmon et al.
Published on May 9, 2016

Table of Contents

1. Introduction
2. Unified Detection
2.1. Network Design
2.2. Training
2.3. Inference
2.4. Limitations of YOLO
3. Comparison to Other Detection Systems

Summary

YOLO is a new approach to object detection that frames detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. This unified architecture is extremely fast and processes images in real time. Because it sees the entire image during training and testing, YOLO reasons globally about the image, and it learns generalizable representations of objects: when trained on natural images and tested on artwork, it outperforms other detection methods such as DPM and R-CNN. By unifying the separate components of object detection into a single network, the system simplifies the detection pipeline. Training consists of pretraining on the ImageNet dataset followed by fine-tuning for detection. At test time, YOLO predicts detections for an image with a single network evaluation, which is what makes inference fast. However, YOLO struggles with small objects, especially those that appear in groups, and with objects in new or unusual aspect ratios or configurations. Compared with other detection systems such as Deformable Parts Models (DPM) and R-CNN, it offers a favorable balance of speed and accuracy: it makes more localization errors but is less likely to predict false positives on background.
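To make the regression-style output concrete, here is a minimal Python sketch of the inference step: decoding a grid of per-cell predictions into scored boxes, then removing duplicates with non-maximum suppression. It assumes the paper's S×S grid, in which each cell predicts B boxes as (x, y, w, h, confidence) plus one set of C conditional class probabilities, and it multiplies the two to obtain class-specific scores. The flat per-cell layout, the names `decode`, `nms`, and `Detection`, and the thresholds `score_thresh=0.2` and `iou_thresh=0.5` are illustrative assumptions, not the authors' code. The sketch also assumes w and h are already plain fractions of the image size; the paper's network predicts their square roots, so a real decoder would square them first.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical layout (an assumption for this sketch): each grid cell holds
# B boxes as (x, y, w, h, confidence) followed by C conditional class probabilities.
# x, y: box centre as an offset within the cell, in [0, 1].
# w, h: box size as a fraction of the whole image (already squared back from sqrt).


@dataclass
class Detection:
    x1: float
    y1: float
    x2: float
    y2: float
    class_id: int
    score: float


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    iy = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = ix * iy
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def decode(grid: Sequence[Sequence[Sequence[float]]], S: int, B: int, C: int,
           score_thresh: float = 0.2) -> List[Detection]:
    """Turn an S x S grid of cell predictions into class-specific scored boxes."""
    dets: List[Detection] = []
    for row in range(S):
        for col in range(S):
            cell = grid[row][col]
            class_probs = cell[B * 5 : B * 5 + C]  # shared by all B boxes in the cell
            for b in range(B):
                x, y, w, h, conf = cell[b * 5 : b * 5 + 5]
                # Convert the cell-relative centre to image-relative coordinates.
                cx, cy = (col + x) / S, (row + y) / S
                for c, p in enumerate(class_probs):
                    score = p * conf  # Pr(class | object) * box confidence
                    if score >= score_thresh:
                        dets.append(Detection(cx - w / 2, cy - h / 2,
                                              cx + w / 2, cy + h / 2, c, score))
    return dets


def nms(dets: List[Detection], iou_thresh: float = 0.5) -> List[Detection]:
    """Greedy per-class non-maximum suppression: keep the highest-scoring box
    and drop same-class boxes that overlap a kept box by at least iou_thresh."""
    kept: List[Detection] = []
    for d in sorted(dets, key=lambda det: det.score, reverse=True):
        if all(k.class_id != d.class_id or iou(k, d) < iou_thresh for k in kept):
            kept.append(d)
    return kept


if __name__ == "__main__":
    # Toy 2 x 2 grid with B = 1 box and C = 2 classes per cell (7 values each).
    # Two neighbouring cells predict heavily overlapping class-0 boxes.
    grid = [
        [[0.9, 0.5, 0.5, 0.5, 0.9, 0.9, 0.1], [0.1, 0.5, 0.5, 0.5, 0.8, 0.9, 0.1]],
        [[0.0] * 7, [0.0] * 7],
    ]
    raw = decode(grid, S=2, B=1, C=2)
    final = nms(raw)
    print(len(raw), len(final))  # 2 candidate boxes, 1 left after NMS
```

In the toy example, two neighbouring cells both fire on the same object, and NMS collapses their overlapping class-0 boxes into one, which is the kind of duplicate detection the paper addresses with non-maximal suppression. Per-class greedy suppression is one common choice, assumed here; other variants would also fit this decoding step.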