Lab 3 Part 2 Report
Group Members:
Carolyn Cui 100399399
Haolin Li 218600849
Robert Palermo
Introduction
In our exploration of object detection models, we aim to evaluate the performance of existing popular models and, in addition,
to compare the process of single-node deep learning with that of distributed deep learning.
The topic that best suits our needs in this foray into object detection models is stop signs. Our motivation for choosing this topic is
that stop sign detection, along with general street sign detection, remains a key challenge that any fully or semi-automated car must
overcome to be viable for the market.
Nearly 700,000 accidents occur around stop signs each year in the U.S., of which more than 80% are preventable. Numerous
solutions already exist, and their performance continues to improve. Understandably, given the scrutiny and safety concerns
involved, these models must be rigorously trained, tested, and fine-tuned. The rollout of AI-assisted technology in cars benefits
not only day-to-day drivers but also disabled, elderly, and otherwise impaired drivers, and we inch ever closer to fully automated
rider experiences as these models improve. As such, we decided that this topic is doable within the time limit, familiar to our
group, and gives us firsthand experience of the challenges that accompany an object detection task.
Due to familiarity and ease of use, we decided to use the YOLO family of models, known for their accuracy and real-time
performance. This makes them well suited to tasks that require both high accuracy and speed, such as ours: detecting stop signs.
Dataset Curation
The dataset was created last year from photos taken around UC Merced, the Bay Area, and San Diego. Additional images are
screenshots courtesy of Google Street View. Totaling around 70 images, our dataset includes nighttime photos, stop signs at varying
distances, and other street signs (e.g., do not enter). Roboflow was used to annotate the dataset.
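As a sketch of how an annotated Roboflow dataset can be pulled into a training environment in YOLO format, the snippet below uses the Roboflow Python package; the API key, workspace, project, and version identifiers are placeholders rather than our actual values:

```python
from roboflow import Roboflow  # pip install roboflow

# Placeholder credentials and identifiers; substitute your own.
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("stop-sign-detection")

# Download a given dataset version with YOLOv5-format annotations.
dataset = project.version(1).download("yolov5")
print(dataset.location)  # local path to the exported images and labels
```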
We acknowledge that, for object detection, this dataset is on the smaller end. Though the optimal dataset size depends on the
nature of the data and how accurate we want the model to be, roughly 150 images would be preferred for a more usable model.
We opted for a lower number due to time and resource constraints.
Single-Node
For all single-node training, we decided to use the YOLO models. The original YOLO, and all versions through YOLOv3, were
created by Joseph Redmon and built on the Darknet neural network framework. Alexey Bochkovskiy continued Redmon's work
with YOLOv4, which boasts higher accuracy but is not necessarily faster.
Branching off from there, we chose YOLOv5 because of its existing documentation for both single-node and distributed training;
its adjacency in version numbering also makes it a natural point of comparison against YOLOv4.
All training was conducted through Google Colab with a provided Tesla T4 GPU.
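As a quick sanity check, which is not part of the training itself, the GPU assigned by Colab can be confirmed from a notebook cell with PyTorch:

```python
import torch

# Verify that Colab assigned a CUDA-capable GPU to this runtime.
assert torch.cuda.is_available(), "No GPU assigned; check the Colab runtime type."
print(torch.cuda.get_device_name(0))  # e.g., "Tesla T4"
```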
YOLOv4
Overview
As this initial application of YOLOv4 was one of our first experiences with anything related to AI/ML, the approach and evaluation
could be better, and some metrics may be missing.
Training
Though a model was provided, we made adjustments to suit our needs better. Our model runs for up to 2,000 iterations, with some
values predetermined as required by Darknet, such as a learning rate of 0.001 and a batch size of 64. We settled on 32 subdivisions,
meaning the 64-image batch is split into 32 mini-batches, so two images are evaluated simultaneously.
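For reference, the hyperparameters above correspond to lines in the [net] section of the Darknet .cfg file. This is a sketch showing only the values we mention, not our complete configuration:

```
[net]
batch=64
subdivisions=32
learning_rate=0.001
max_batches=2000
```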