# Vehicle Detection Project
<p align="center">
<a href="https://www.youtube.com/watch?v=Cd7p5pnP3e0"><img src="./img/overview.gif" alt="Overview" width="50%" height="50%"></a>
<br>Qualitative results. (click for full video)
</p>
## [Rubric](https://review.udacity.com/#!/rubrics/513/view) Points
### Abstract
The goal of the project was to develop a pipeline to reliably detect cars in a video from a roof-mounted camera: in this README the reader will find a short summary of how I tackled the problem.
**Long story short**:
- (baseline) HOG features + linear SVM to detect cars, with temporal smoothing to discard false positives
- (submission) [SSD deep network](https://arxiv.org/pdf/1512.02325.pdf) for detection, with thresholds on detection confidence and label to discard false positives
*That said, let's go into the details!*
### Good old CV: Histogram of Oriented Gradients (HOG)
#### 1. Feature Extraction.
In the field of computer vision, a *feature* is a compact representation that encodes information relevant to a given task. In our case, features must be informative enough to distinguish between *car* and *non-car* image patches as accurately as possible.
Here is an example of what the `vehicle` and `non-vehicle` classes look like in this dataset:
<p align="center">
<img src="./img/noncar_samples.png" alt="non_car_img">
<br>Randomly-sampled non-car patches.
</p>
<p align="center">
<img src="./img/car_samples.png" alt="car_img">
<br>Randomly-sampled car patches.
</p>
Most of the code that relates to feature extraction is contained in [`functions_feat_extraction.py`](functions_feat_extraction.py). Nonetheless, all parameters used in the feature extraction phase are stored as a dictionary in [`config.py`](config.py), so that they can be accessed from anywhere in the project.
Actual feature extraction is performed by the function `image_to_features`, which takes as input an image and the dictionary of parameters, and returns the features computed for that image. In order to perform batch feature extraction on the whole dataset (for training), `extract_features_from_file_list` takes as input a list of images and returns a list of feature vectors, one for each input image.
For the task of car detection I used *color histograms* and *spatial features* to encode the object's visual *appearance*, and HOG features to encode the object's *shape*. While the first two are easy to understand and implement (a minimal sketch follows below), HOG features can be a little bit trickier to master.
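As a rough idea of what the first two feature types look like in code, here is a minimal sketch (the helper names `bin_spatial` and `color_hist` are illustrative, not necessarily the ones used in `functions_feat_extraction.py`):
```
import cv2
import numpy as np

def bin_spatial(img, size=(32, 32)):
    # Spatial features: raw pixel values of the downsampled patch, unrolled
    return cv2.resize(img, size).ravel()

def color_hist(img, nbins=16):
    # Color histogram features: one histogram per channel, concatenated
    hists = [np.histogram(img[:, :, ch], bins=nbins, range=(0, 256))[0]
             for ch in range(img.shape[2])]
    return np.concatenate(hists)
```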
#### 2. Choosing HOG parameters.
HOG stands for *Histogram of Oriented Gradients* and refers to a powerful descriptor that has met with wide success in the computer vision community since its [introduction](http://vc.cs.nthu.edu.tw/home/paper/codfiles/hkchiu/201205170946/Histograms%20of%20Oriented%20Gradients%20for%20Human%20Detection.pdf) in 2005, where its main purpose was people detection.
<p align="center">
<img src="./img/hog_car_vs_noncar.jpg" alt="hog" height="128">
<br>Representation of HOG descriptors for a car patch (left) and a non-car patch (right).
</p>
The bad news is that HOG comes along with a *lot* of parameters to tune in order to work properly. The main ones are the size of the cell in which the gradients are accumulated and the number of orientations used to discretize the histogram of gradients. Furthermore, one must specify the number of cells that compose a block, over which feature normalization is later performed. Finally, since HOG is computed on a single-channel image, one must decide which channel to use, or alternatively compute the feature on all channels and concatenate the results.
In order to select the right parameters, both classifier accuracy and computational efficiency must be taken into account. After various attempts, I settled on the following parameters, which are stored in [`config.py`](config.py):
```
# parameters used in the phase of feature extraction
feat_extraction_params = {'resize_h': 64,           # resize image height before feat extraction
                          'resize_w': 64,           # resize image width before feat extraction
                          'color_space': 'YCrCb',   # can be RGB, HSV, LUV, HLS, YUV, YCrCb
                          'orient': 9,              # HOG orientations
                          'pix_per_cell': 8,        # HOG pixels per cell
                          'cell_per_block': 2,      # HOG cells per block
                          'hog_channel': "ALL",     # can be 0, 1, 2, or "ALL"
                          'spatial_size': (32, 32), # spatial binning dimensions
                          'hist_bins': 16,          # number of histogram bins
                          'spatial_feat': True,     # spatial features on or off
                          'hist_feat': True,        # histogram features on or off
                          'hog_feat': True}         # HOG features on or off
```
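To make these concrete, here is how the HOG parameters above would map onto `skimage.feature.hog` (a sketch assuming the features are computed with scikit-image; `channel_img` stands for a single channel of the resized 64x64 patch):
```
from skimage.feature import hog

# 9 orientations, 8x8-pixel cells, 2x2-cell blocks (values from config.py)
hog_features = hog(channel_img,
                   orientations=9,
                   pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2),
                   feature_vector=True)
```
With these values, each 64x64 channel yields 7 × 7 × 2 × 2 × 9 = 1764 HOG values; setting `hog_channel` to `"ALL"` concatenates the three channels.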
#### 3. Training the classifier
Once we have decided which features to use, we can train a classifier on them. In [`train.py`](train.py) I train a linear SVM for the task of binary classification *car* vs *non-car*. First, the training data are listed and a feature vector is extracted for each image:
```
cars = get_file_list_recursively(root_data_vehicle)
notcars = get_file_list_recursively(root_data_non_vehicle)
car_features = extract_features_from_file_list(cars, feat_extraction_params)
notcar_features = extract_features_from_file_list(notcars, feat_extraction_params)
```
Then, the actual training set is composed as the set of all car and all non-car features (labels are given accordingly, as sketched below). Furthermore, feature vectors are standardized so that all features lie in a similar range, which eases training.
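Here is a minimal sketch of how the design matrix `X` and the labels might be assembled before the standardization below (assuming NumPy; names other than `car_features` and `notcar_features` are illustrative):
```
import numpy as np

# Stack car and non-car feature vectors into a single design matrix
X = np.vstack((car_features, notcar_features)).astype(np.float64)

# Label cars as 1, non-cars as 0
y = np.hstack((np.ones(len(car_features)), np.zeros(len(notcar_features))))
```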
```
from sklearn.preprocessing import StandardScaler

feature_scaler = StandardScaler().fit(X)  # per-column scaler
scaled_X = feature_scaler.transform(X)
```
Now, training the `LinearSVC` classifier is as easy as:
```
from sklearn.svm import LinearSVC

svc = LinearSVC()  # alternatively: svc = SVC(kernel='rbf')
svc.fit(X_train, y_train)
```
To get an idea of the classifier's performance, we can evaluate it on the test set with `svc.score(X_test, y_test)`. Training the SVM with the features described above took around 10 minutes on my laptop.
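The `X_train`/`X_test` variables above would come from a split along these lines (a sketch assuming scikit-learn; the 80/20 ratio and random seed are illustrative):
```
from sklearn.model_selection import train_test_split

# Hold out 20% of the (already scaled) data for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    scaled_X, y, test_size=0.2, random_state=42)
```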
### Sliding Window Search
#### 1. Describe how (and identify where in your code) you implemented a sliding window search. How did you decide what scales to search and how much to overlap windows?
In a first phase, I implemented a naive sliding-window approach in order to get windows at different scales for the purpose of classification. This is shown in the function `compute_windows_multiscale` in [`functions_detection.py`](functions_detection.py). This turned out to be very slow. I ultimately implemented a function that jointly searches the region of interest and classifies each window, as suggested by the course instructor. The performance boost comes from the fact that HOG features are computed only once for the whole region of interest and then subsampled at different scales, which has the same effect as a multiscale search but is computationally much cheaper. This function is called `find_cars` and is implemented in [`functions_feat_extraction.py`](functions_feat_extraction.py). Of course the *tradeoff* is evident: the more scales are searched and the more adjacent windows overlap, the slower the search becomes. The sketch after this paragraph conveys the subsampling idea.
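This is a simplified, single-channel, HOG-only illustration of the trick, not the actual `find_cars` implementation: the real pipeline also appends color histogram and spatial features, and obtains different scales by resizing the region of interest before computing HOG. `clf` and `scaler` stand for a classifier and scaler trained on matching features.
```
from skimage.feature import hog

def find_cars_sketch(roi, clf, scaler, orient=9, pix_per_cell=8, cell_per_block=2):
    # Compute HOG once for the whole region of interest,
    # keeping the block layout (feature_vector=False)
    hog_roi = hog(roi, orientations=orient,
                  pixels_per_cell=(pix_per_cell, pix_per_cell),
                  cells_per_block=(cell_per_block, cell_per_block),
                  feature_vector=False)

    blocks_per_window = 64 // pix_per_cell - cell_per_block + 1  # 7 for a 64x64 window
    cells_per_step = 2  # window overlap is controlled by the step, in cells
    detections = []
    for row in range(0, hog_roi.shape[0] - blocks_per_window + 1, cells_per_step):
        for col in range(0, hog_roi.shape[1] - blocks_per_window + 1, cells_per_step):
            # Sub-sample the precomputed HOG blocks for this 64x64 window
            feats = hog_roi[row:row + blocks_per_window,
                            col:col + blocks_per_window].ravel()
            if clf.predict(scaler.transform([feats]))[0] == 1:
                x, y = col * pix_per_cell, row * pix_per_cell
                detections.append((x, y, x + 64, y + 64))
    return detections
```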
#### 2. Show some examples of test images to demonstrate how your pipeline is working. What did you do to optimize the performance of your classifier?
The whole classification pipeline using the classic computer vision approach is implemented in [`main_hog.py`](main_hog.py). Each test image goes through the `process_pipeline` function, which is responsible for all phases: feature extraction, classification and displaying the results.
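In skeleton form, such a per-frame pipeline typically looks like the sketch below (`find_cars` arguments are abbreviated, the scales are illustrative, and `draw_detections` is a hypothetical helper; see `main_hog.py` for the actual implementation):
```
def process_pipeline_sketch(frame, clf, scaler):
    # Search the region of interest at multiple scales
    hot_windows = []
    for scale in (1.0, 1.5, 2.0):
        hot_windows += find_cars(frame, scale, clf, scaler)

    # Merge overlapping detections, discard false positives,
    # and draw the surviving bounding boxes on the frame
    return draw_detections(frame, hot_windows)
```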
<p align="center">
<img src="./img/pipeline_hog.jpg" alt="hog" height="256">
<br>Result of HOG pipeline
</p>