YOLO Architecture Evolution (v3 → v8)

The YOLO (You Only Look Once) family of object detection models has evolved rapidly since its introduction. Below is a concise overview of the main architectural changes and improvements starting from YOLOv3.

YOLOv3 (2018)

Introduced Darknet-53 backbone (53 convolutional layers, residual connections).
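Multi-scale prediction: 3 detection layers at different feature map resolutions (strides 32, 16, and 8), so coarse maps handle large objects and fine maps handle small ones.
Class prediction uses independent logistic classifiers trained with binary cross-entropy instead of a softmax, which allows multi-label predictions.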
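
To make the multi-scale idea concrete, here is a minimal sketch of the grid arithmetic. It assumes a 416x416 input, strides of 32, 16, and 8, 3 anchors per scale, and the 80 COCO classes; the function name yolo_v3_grids is hypothetical and not part of any YOLO codebase.

```python
def yolo_v3_grids(input_size=416, strides=(32, 16, 8),
                  anchors_per_scale=3, num_classes=80):
    """Describe each detection layer for a square input (illustrative helper)."""
    layers = []
    for stride in strides:
        grid = input_size // stride  # cells along one side of the feature map
        layers.append({
            "stride": stride,
            "grid": grid,
            # every cell predicts one box per anchor
            "boxes": grid * grid * anchors_per_scale,
            # per anchor: 4 box coords + 1 objectness + class scores
            "channels": anchors_per_scale * (5 + num_classes),
        })
    return layers


layers = yolo_v3_grids()
for layer in layers:
    print(layer)
print("total candidate boxes:", sum(l["boxes"] for l in layers))
```

With these assumptions the three grids are 13x13, 26x26, and 52x52, giving 10647 candidate boxes before non-maximum suppression.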

YOLOv4 (2020)

Uses a CSPDarknet53 backbone (Cross Stage Partial connections).
Added SPP (Spatial Pyramid Pooling) and PAN (Path Aggregation Network) neck.
Training tricks: Mosaic data augmentation, DropBlock regularization, CIoU loss.
Improved on YOLOv3's AP and FPS by roughly 10% and 12%, respectively, as reported in the YOLOv4 paper.
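
The CIoU loss mentioned above extends plain IoU with a centre-distance penalty and an aspect-ratio penalty. The following is a minimal single-pair sketch in plain Python, assuming boxes in (x1, y1, x2, y2) format; the function name ciou_loss and the eps constant are illustrative choices, not taken from any specific YOLO implementation.

```python
import math


def ciou_loss(box_a, box_b, eps=1e-9):
    """CIoU loss (1 - CIoU) for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Plain IoU
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = wa * ha + wb * hb - inter + eps
    iou = inter / union

    # Squared diagonal of the smallest box enclosing both boxes
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between the two box centres
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0

    # Aspect-ratio consistency term and its trade-off weight
    v = (4.0 / math.pi ** 2) * (
        math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - (iou - rho2 / c2 - alpha * v)


print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # close to 0 for identical boxes
print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # above 1 for disjoint boxes
```

Unlike plain IoU, the loss keeps growing as disjoint boxes move further apart, so non-overlapping predictions still receive a useful gradient signal.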

YOLOv5 (2020, Ultralytics)

Released as a PyTorch implementation without an accompanying paper (very popular in industry).
Introduced an easy-to-use training pipeline and a modular design.
Backbone: CSPDarknet variants, plus automatic anchor fitting that adapts the bounding-box anchors to the training dataset.
Variants: YOLOv5s, v5m, v5l, v5x (small → extra large).
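
The size variants share one layer layout and differ mainly in a depth multiplier and a width multiplier. The sketch below illustrates that scaling under stated assumptions: the multiplier values are those I understand the public YOLOv5 model configs to use, and the helper names (MULTIPLIERS, make_divisible, scale_layer) are illustrative rather than the project's exact API.

```python
import math

# (depth_multiple, width_multiple) per variant; assumed from the public configs
MULTIPLIERS = {
    "s": (0.33, 0.50),
    "m": (0.67, 0.75),
    "l": (1.00, 1.00),
    "x": (1.33, 1.25),
}


def make_divisible(value, divisor=8):
    """Round a channel count up to the nearest multiple of divisor."""
    return math.ceil(value / divisor) * divisor


def scale_layer(repeats, channels, variant):
    """Scale a block's repeat count and channel width for a given variant."""
    depth, width = MULTIPLIERS[variant]
    # Blocks that repeat only once are left alone; others never drop below 1
    scaled_repeats = max(round(repeats * depth), 1) if repeats > 1 else repeats
    scaled_channels = make_divisible(channels * width)
    return scaled_repeats, scaled_channels


for variant in MULTIPLIERS:
    print(variant, scale_layer(9, 1024, variant))
```

Under these assumptions, a block defined with 9 repeats and 1024 channels becomes 3 repeats and 512 channels in the small variant, while the large variant keeps the base definition unchanged.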

YOLOv6 (2022, Meituan)

Optimized for real-time inference in industrial applications.
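Introduced a RepVGG-style backbone (EfficientRep) based on structural re-parameterization: a multi-branch block used during training is fused into a single convolution for inference.
Anchor-free detection head.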
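
The re-parameterization idea behind a RepVGG-style block can be checked numerically. This is a deliberately simplified sketch: it assumes a single channel, no bias, and no batch normalization (real implementations fold batch norm into each branch before fusing), and all function names are hypothetical.

```python
def conv3x3(image, kernel):
    """Zero-padded 3x3 cross-correlation on a 2D list (single channel)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for ky in range(3):
                for kx in range(3):
                    iy, ix = y + ky - 1, x + kx - 1
                    if 0 <= iy < h and 0 <= ix < w:
                        out[y][x] += image[iy][ix] * kernel[ky][kx]
    return out


def train_time_block(image, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch + identity branch."""
    y3 = conv3x3(image, k3)
    return [
        [y3[r][c] + k1 * value + value for c, value in enumerate(row)]
        for r, row in enumerate(image)
    ]


def fuse(k3, k1):
    """Fold the 1x1 and identity branches into the centre tap of the 3x3 kernel."""
    fused = [row[:] for row in k3]
    fused[1][1] += k1 + 1.0
    return fused


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3 = [[0.5, -1.0, 0.25], [0.0, 2.0, 1.0], [-0.5, 0.75, 0.125]]
k1 = 0.5

multi_branch = train_time_block(image, k3, k1)
single_conv = conv3x3(image, fuse(k3, k1))
print(multi_branch == single_conv)
```

Because convolution is linear, the three branches sum into one kernel, so the deployed network runs a plain stack of 3x3 convolutions while keeping the training benefits of the multi-branch form.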

YOLOv7 (2022)

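Introduced a "trainable bag-of-freebies": training-time methods that raise accuracy without adding inference cost.
Used E-ELAN (Extended Efficient Layer Aggregation Network) blocks, which allow deeper networks to train without a drop in performance.
Introduced planned re-parameterized convolution and coarse-to-fine, lead-guided label assignment with an auxiliary head.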
State-of-the-art real-time detection on COCO at the time.

YOLOv8 (2023, Ultralytics)

Anchor-free design by default.
Decoupled head with separate classification and regression branches, similar to other modern detectors.
Supports detection, instance segmentation, pose estimation, and image classification.
Available in sizes: n, s, m, l, x.
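
In an anchor-free head, each feature-map cell predicts distances from its centre to the four box edges instead of offsets relative to predefined anchor boxes. The following minimal sketch shows that decoding step, assuming the cell centre sits at a 0.5 offset and that distances are expressed in grid units; decode_box is a hypothetical helper, and the distribution-based regression used in practice is omitted.

```python
def decode_box(grid_x, grid_y, distances, stride):
    """Turn (left, top, right, bottom) distances into an (x1, y1, x2, y2) box.

    grid_x, grid_y: integer cell indices on the feature map
    distances: predicted distances from the cell centre, in grid units
    stride: downsampling factor of this feature map relative to the image
    """
    left, top, right, bottom = distances
    cx, cy = grid_x + 0.5, grid_y + 0.5  # cell centre in grid units
    return (
        (cx - left) * stride,
        (cy - top) * stride,
        (cx + right) * stride,
        (cy + bottom) * stride,
    )


print(decode_box(2, 3, (1.0, 1.5, 2.0, 0.5), stride=8))
# → (12.0, 16.0, 36.0, 32.0)
```

This removes the need to tune anchor shapes per dataset, since the box size comes directly from the predicted distances.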

In summary, YOLO has progressed from a simple real-time detector (v3) to a versatile family (v8) supporting multiple computer vision tasks, with improvements in backbone networks, training strategies, and loss functions.