You Only Look Once: Unified, Real-Time Object Detection

By Joseph Redmon et al.

Published on May 9, 2016

1. Introduction
2. Unified Detection
2.1. Network Design
2.2. Training
2.3. Inference
2.4. Limitations of YOLO
3. Comparison to Other Detection Systems

Summary

YOLO frames object detection as a single regression problem, mapping image pixels directly to spatially separated bounding boxes and their associated class probabilities. A single neural network predicts all detections from the full image in one evaluation, so the whole pipeline can be optimized end to end on detection performance. This unified architecture is extremely fast: the paper reports 45 frames per second for the base model and 155 frames per second for the smaller Fast YOLO variant. Because the network sees the entire image during training and inference, YOLO reasons globally about context and makes fewer background false positives than Fast R-CNN. It also learns generalizable representations of objects: when trained on natural images and tested on artwork, it outperforms DPM and R-CNN. Training pretrains the convolutional layers on ImageNet classification and then fine-tunes the full network for detection. Inference requires only a single network evaluation per test image. YOLO does have limitations: it makes more localization errors than state-of-the-art detectors, and it struggles with small objects that appear in groups and with objects in new or unusual aspect ratios or configurations. Compared with Deformable Parts Models (DPM) and R-CNN, which rely on sliding windows or region proposals in multi-stage pipelines, YOLO trades some localization accuracy for much higher speed.
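To make the "detection as regression" idea concrete, here is a minimal sketch of how a YOLO-style output tensor could be decoded into scored boxes. It assumes the S x S x (B*5 + C) encoding described in the paper, with S = 7, B = 2, and C = 20 for PASCAL VOC. The per-cell memory layout (boxes first, then class probabilities), the function name decode_predictions, and the 0.2 score threshold are illustrative assumptions, not the authors' implementation, and non-maximum suppression is omitted.

```python
import numpy as np

# Grid size, boxes per cell, and class count used in the paper for PASCAL VOC.
S, B, C = 7, 2, 20


def decode_predictions(output, threshold=0.2):
    """Decode an S x S x (B*5 + C) tensor into scored boxes.

    Assumed layout per cell (an illustration, not the original code):
    B boxes of (x, y, w, h, confidence), followed by C class probabilities.
    x and y are offsets inside the cell in [0, 1]; w and h are fractions
    of the full image. The threshold value is hypothetical.
    """
    assert output.shape == (S, S, B * 5 + C)
    detections = []
    for row in range(S):
        for col in range(S):
            cell = output[row, col]
            class_probs = cell[B * 5:]
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                # Class-specific score: box confidence times class probability.
                scores = conf * class_probs
                class_id = int(np.argmax(scores))
                score = float(scores[class_id])
                if score >= threshold:
                    # Convert the cell-relative center to image-relative coordinates.
                    cx = (col + x) / S
                    cy = (row + y) / S
                    detections.append((cx, cy, float(w), float(h), score, class_id))
    return detections


if __name__ == "__main__":
    # Synthetic tensor with one confident box in cell (row 3, col 4).
    demo = np.zeros((S, S, B * 5 + C))
    demo[3, 4, 0:5] = [0.5, 0.5, 0.2, 0.3, 0.9]
    demo[3, 4, B * 5 + 11] = 0.8
    print(decode_predictions(demo))
```

Every cell contributes its predictions in the same pass, which is why one forward evaluation of the network is enough to produce all candidate detections for an image.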
