YOLO Architecture Evolution (v3 → v8)
The YOLO (You Only Look Once) family of object detection models has evolved rapidly since its introduction.
Below is a concise overview of the main architectural changes and improvements starting from YOLOv3.
YOLOv3 (2018)
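- Introduced the Darknet-53 backbone (53 convolutional layers with residual connections).
- Multi-scale prediction: three detection layers at strides 32, 16, and 8 (13×13, 26×26, and 52×52 grids for a 416×416 input), aimed at large, medium, and small objects respectively, with 3 anchor boxes per scale.
- Class predictions use independent logistic classifiers trained with binary cross-entropy instead of a softmax, which allows multi-label outputs (e.g. overlapping labels such as "Woman" and "Person").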
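To make the multi-scale layout and the class loss concrete, here is a minimal Python sketch, assuming a 416×416 input, the 80 COCO classes, and 3 anchors per scale. The function names are hypothetical helpers, not part of any YOLO codebase; the class loss uses the standard numerically stable form of binary cross-entropy on logits and, as an illustrative choice, sums over classes.

```python
import math


def yolov3_output_shapes(img_size=416, num_classes=80, anchors_per_scale=3,
                         strides=(32, 16, 8)):
    """Return (grid_h, grid_w, channels) for each detection scale."""
    # Each anchor predicts 4 box offsets + 1 objectness score + one score per class.
    channels = anchors_per_scale * (4 + 1 + num_classes)
    return [(img_size // s, img_size // s, channels) for s in strides]


def total_predictions(shapes, anchors_per_scale=3):
    """Number of candidate boxes summed over all scales."""
    return sum(h * w * anchors_per_scale for h, w, _ in shapes)


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy computed from a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def class_loss(logits, targets):
    """Independent sigmoid + BCE per class (no softmax), summed over classes."""
    return sum(bce_with_logits(z, t) for z, t in zip(logits, targets))


shapes = yolov3_output_shapes()
print(shapes)                     # → [(13, 13, 255), (26, 26, 255), (52, 52, 255)]
print(total_predictions(shapes))  # → 10647

# Multi-label target: the first two classes are both present (hypothetical logits).
print(class_loss([3.0, 3.0, -3.0], [1, 1, 0]))
```

With these defaults each of the three layers emits 255 channels (3 × (4 + 1 + 80)), giving 10,647 candidate boxes per image. Because every class has its own sigmoid, a target such as [1, 1, 0] with two overlapping labels can be fit directly, which a single softmax over classes cannot do.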
YOLOv4 (2020)
- Uses a CSPDarknet53 backbone (Darknet-53 with Cross Stage Partial connections).
- Added SPP (Spatial Pyramid Pooling) and PAN (Path Aggregation Network) neck.
- Training tricks: Mosaic data augmentation, DropBlock regularization, CIoU loss.
- Better accuracy/speed trade-off than YOLOv3: the paper reports 43.5% AP (65.7% AP50) on MS COCO at roughly 65 FPS on a Tesla V100.
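CIoU extends plain IoU with a centre-distance penalty and an aspect-ratio term: L = 1 - IoU + ρ²/c² + αv, where ρ is the distance between box centres, c is the diagonal of the smallest enclosing box, and v measures aspect-ratio mismatch. The sketch below computes it for one pair of boxes in (x1, y1, x2, y2) form, following the definition by Zheng et al. rather than any particular YOLOv4 source file; `ciou_loss` and `eps` are illustrative names, and boxes are assumed to have positive width and height.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """Return 1 - CIoU for two boxes given as (x1, y1, x2, y2) with positive sizes."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Plain intersection-over-union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Squared centre distance, normalised by the squared enclosing-box diagonal.
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(tw / th) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - (iou - rho2 / c2 - alpha * v)


print(ciou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # ≈ 0.968
print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # ≈ 1.444 (no overlap)
```

Unlike 1 - IoU, which is stuck at 1 for every pair of non-overlapping boxes, the distance term still tells the optimizer which way to move: for (0, 0, 1, 1) against (2, 2, 3, 3) the loss is 1 + 8/18, and it shrinks as the centres approach each other.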
YOLOv5 (2020, Ultralytics)
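- Released without an accompanying paper; a PyTorch implementation that became very popular in industry.
- Introduced an easy-to-use training pipeline and a modular design.
- Backbone: CSPDarknet variants. AutoAnchor checks how well the default anchors fit the dataset's boxes and, if the fit is poor, re-learns them with k-means followed by genetic evolution.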
- Variants: YOLOv5s, v5m, v5l, v5x (small → extra large).
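YOLOv5's AutoAnchor uses SciPy's k-means followed by genetic evolution; the sketch below shows only the simpler idea it builds on, k-means over box (width, height) pairs with distance 1 - IoU, as introduced for YOLOv2's anchor priors. The helpers `wh_iou` and `kmeans_anchors` are hypothetical, and the deterministic initialization (boxes spread evenly by area) is an assumption chosen for reproducibility.

```python
def wh_iou(a, b):
    """IoU of two boxes given only as (w, h), assuming they share the same centre."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(whs, k, iters=100):
    """Cluster (w, h) pairs with distance 1 - IoU; return k anchors sorted by area."""
    # Deterministic init (an assumption): k boxes spread evenly through the area-sorted list.
    ordered = sorted(whs, key=lambda wh: wh[0] * wh[1])
    centers = [ordered[i * len(ordered) // k] for i in range(k)]
    for _ in range(iters):
        # Assignment step: each box joins the anchor it overlaps most (smallest 1 - IoU).
        clusters = [[] for _ in range(k)]
        for wh in whs:
            best = max(range(k), key=lambda i: wh_iou(wh, centers[i]))
            clusters[best].append(wh)
        # Update step: move each anchor to the mean (w, h) of its cluster; keep it if empty.
        new_centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda wh: wh[0] * wh[1])


# Hypothetical label sizes in pixels: three small boxes and three large ones.
boxes = [(10, 12), (12, 10), (11, 11), (100, 90), (90, 110), (105, 100)]
anchors = kmeans_anchors(boxes, k=2)
print(anchors)  # small anchor (11.0, 11.0), large anchor about (98.3, 100.0)
```

Using 1 - IoU instead of Euclidean distance keeps large boxes from dominating the clustering, since IoU is scale-invariant in a way raw pixel differences are not.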
YOLOv6 (2022, Meituan)
- Optimized for real-time inference in industrial applications.
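- Introduced a RepVGG-style backbone (EfficientRep) with structural re-parameterization: multi-branch blocks used during training are fused into single 3×3 convolutions for inference.
- Anchor-free detection head with decoupled classification and regression branches.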
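The re-parameterization trick can be checked numerically. The sketch below, in the style of a RepVGG block, uses a single input/output channel and made-up weights and BatchNorm statistics (all hypothetical): a training-time block with a 3×3 branch, a 1×1 branch, and an identity branch, each followed by BatchNorm, gives the same output as one 3×3 convolution whose kernel and bias come from folding each BatchNorm into its branch and summing. Real implementations do this per output channel on multi-channel tensors.

```python
import math


def conv2d(x, k, bias=0.0):
    """Single-channel 2D cross-correlation, stride 1, zero padding that keeps the size."""
    n, pad = len(k), len(k) // 2
    H, W = len(x), len(x[0])
    out = []
    for i in range(H):
        row = []
        for j in range(W):
            s = bias
            for a in range(n):
                for b in range(n):
                    ii, jj = i + a - pad, j + b - pad
                    if 0 <= ii < H and 0 <= jj < W:
                        s += k[a][b] * x[ii][jj]
            row.append(s)
        out.append(row)
    return out


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm applied element-wise to a 2D map."""
    scale = gamma / math.sqrt(var + eps)
    return [[(v - mean) * scale + beta for v in row] for row in y]


def fuse_bn(k, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm into the preceding kernel: return (scaled kernel, bias)."""
    scale = gamma / math.sqrt(var + eps)
    return [[w * scale for w in row] for row in k], beta - mean * scale


def pad_to_3x3(k1):
    """Place a 1x1 kernel at the centre of a 3x3 kernel of zeros."""
    return [[0.0, 0.0, 0.0], [0.0, k1[0][0], 0.0], [0.0, 0.0, 0.0]]


def add_maps(*maps):
    """Element-wise sum of equally sized 2D lists."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*maps)]


# Hypothetical input, kernels, and BN stats given as (gamma, beta, mean, var).
x = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0], [2.0, 1.0, 1.0]]
k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, 0.0]]
k1 = [[0.7]]
bn3, bn1, bn_id = (1.2, 0.1, 0.3, 0.9), (0.8, -0.2, 0.1, 1.1), (1.0, 0.05, -0.4, 0.5)

# Training-time block: three parallel branches, each followed by BatchNorm, then summed.
y_train = add_maps(
    batch_norm(conv2d(x, k3), *bn3),
    batch_norm(conv2d(x, k1), *bn1),
    batch_norm(x, *bn_id),  # identity branch
)

# Inference-time block: fold each BN into its kernel, pad to 3x3, sum kernels and biases.
k3f, b3 = fuse_bn(k3, *bn3)
k1f, b1 = fuse_bn(k1, *bn1)
kidf, bid = fuse_bn([[1.0]], *bn_id)  # identity == 1x1 kernel with weight 1
k_fused = add_maps(k3f, pad_to_3x3(k1f), pad_to_3x3(kidf))
y_infer = conv2d(x, k_fused, bias=b3 + b1 + bid)

# Prints a value near 0 (floating-point rounding only).
print(max(abs(a - b) for ra, rb in zip(y_train, y_infer) for a, b in zip(ra, rb)))
```

Because convolution and inference-mode BatchNorm are both affine, the three branches collapse exactly. The multi-branch form only helps optimization during training, while the deployed model runs a single 3×3 convolution.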
YOLOv7 (2022)
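- Introduced a "trainable bag-of-freebies": training-time techniques that raise accuracy without increasing inference cost.
- E-ELAN (Extended Efficient Layer Aggregation Network) blocks, designed to let deeper networks keep learning effectively without disrupting the original gradient path.
- Planned re-parameterized convolution and a coarse-to-fine, lead-guided label assignment for auxiliary heads.
- Reported 56.8% AP on COCO, the highest among known real-time detectors running at 30 FPS or more on a V100 GPU at the time of publication.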
YOLOv8 (2023, Ultralytics)
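- Anchor-free design: boxes are predicted from grid-cell centres rather than relative to preset anchor shapes.
- Decoupled head with separate classification and box-regression branches, as in other modern detectors.
- Supports detection, instance segmentation, pose estimation, and image classification.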
- Available in sizes: n, s, m, l, x.
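Anchor-free box decoding in YOLOv8 can be sketched as follows. As an assumption for this illustration, each of the four box sides (left, top, right, bottom distances from a grid-cell centre) is predicted as logits over `reg_max` = 16 discrete bins, as in the Ultralytics head's Distribution Focal Loss (DFL) layer; the expected bin value, multiplied by the stride, gives the distance in pixels. Helper names are hypothetical and the code handles a single grid cell in plain Python.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dfl_expectation(logits):
    """Expected bin index (0 .. reg_max-1) for one box side, in stride units."""
    return sum(i * p for i, p in enumerate(softmax(logits)))


def decode_box(side_logits, cell_x, cell_y, stride):
    """Turn (left, top, right, bottom) bin logits at one grid cell into (x1, y1, x2, y2) pixels."""
    l, t, r, b = (dfl_expectation(s) for s in side_logits)
    # The "anchor point" is just the cell centre, in grid units; no anchor shapes are used.
    ax, ay = cell_x + 0.5, cell_y + 0.5
    return ((ax - l) * stride, (ay - t) * stride,
            (ax + r) * stride, (ay + b) * stride)


reg_max = 16
peaked = [100.0 if i == 2 else 0.0 for i in range(reg_max)]  # hypothetical logits peaked at bin 2
print(dfl_expectation([0.0] * reg_max))                       # → 7.5
print(decode_box([peaked] * 4, cell_x=3, cell_y=4, stride=8))  # ≈ (12.0, 20.0, 44.0, 52.0)
```

No anchor shapes appear anywhere: the only priors are the grid-cell centres, and predicting a distribution over bins rather than a single number lets the head express uncertainty about where each edge lies.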
In summary, YOLO has progressed from a single-task, anchor-based detector (v3) to a versatile, anchor-free family (v8) that supports multiple computer vision tasks,
with improvements in backbone networks, neck and head design, training strategies, and loss functions.