MATH 304 - Numerical Analysis and Optimization
Project – Traffic flow prediction by using LSR and SVR
Haitong Lin
hl345@duke.edu
Abstract
In this project, I use Least Squares Regression (LSR) and
Support Vector Regression (SVR) to predict traffic flow
in June 2016. This report gives an overview of the
prediction project, the related mathematical formulation, and
the implementation. It also presents the experimental results
and discusses them, comparing the different models and
analyzing how performance varies within each model.
1. Overview
This project uses a dataset collected by Highways England [1],
which records traffic flow for June 2016. I use the
Thursday traffic flow data for this project. The training
set is the data from the first four Thursdays (June 2, 9, 16 and 23),
and the test set is the data from June 30. I used Matlab for this
project.
In the first part of the project, I implement Least Squares
Regression (LSR) models for prediction. I tried nine polynomial
models (n = 1, 2, ..., 9), fitted each to the training data, and
tested the fitted models on the test data. I compared the errors
of the different models and selected the best model, the one with
the smallest error on both the training and test data.
In the second part of the project, I implement Support Vector
Regression (SVR) models for prediction. I use three
different kernels to train the models: Gaussian, RBF and
polynomial. For each kernel, I test three different settings (one
default setting and two settings I selected myself) as well as
the optimized setting.
Finally, I took the best models from LSR and SVR (one best
model for each kernel) and plotted them together to
display the differences.
2. Mathematical formulation and implementation
Least Squares Regression (LSR) minimizes the sum of
the squares of the residuals between the measured y and the y
calculated by the model, which yields a unique best-fit line for a
given set of data [2]. LSR can be extended to polynomial regression,
which is what I used in this project. The logic of LSR is to minimize
the prediction error.
Given $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m}$, a general solution of the LSR
problem is obtained by minimizing $\|AX - B\|^2$, which is

$$X = (A^T A)^{-1} A^T B$$
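As a minimal sketch of the formula above, the following Python snippet solves the normal equations $(A^T A) X = A^T B$ for a small polynomial fit. It is an illustration only, not the Matlab code used in this project: the function name `normal_equations_solve` and the toy data are hypothetical, and the sketch assumes that A has full column rank so that $A^T A$ is invertible.

```python
def normal_equations_solve(A, B):
    """Return X minimizing ||A X - B||^2 by solving (A^T A) X = A^T B."""
    m, n = len(A), len(A[0])
    # Build the augmented system [A^T A | A^T B].
    aug = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           + [sum(A[k][i] * B[k] for k in range(m))] for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("A^T A is singular: A must have full column rank")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    X = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * X[j] for j in range(i + 1, n))
        X[i] = (aug[i][n] - tail) / aug[i][i]
    return X


# Toy data (made up, not the traffic data) lying exactly on y = 1 + 2t + 3t^2.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
B = [1.0 + 2.0 * ti + 3.0 * ti ** 2 for ti in t]
# Each row of A holds the powers of t for a quadratic model.
A = [[1.0, ti, ti ** 2] for ti in t]
X = normal_equations_solve(A, B)
print(X)  # coefficients, lowest power first
```

Because the toy data lie exactly on a quadratic, the recovered coefficients should match 1, 2 and 3 up to rounding. The sketch solves the linear system rather than forming the inverse explicitly, which gives the same X with less rounding error.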
Figure 1 Flowchart for LSR
The flowchart shows my implementation process for the
LSR part of this project. I first loaded the data and specified
the training and test sets. I then used the training set to fit
LSR models with polynomials from n=1 to n=9.
Next, I applied the fitted models to the test data and
compared the predicted values with the actual data. The results
and discussion appear in later sections of this report.
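The procedure described above can be sketched in Python as follows. This is a hedged illustration rather than the Matlab implementation: it assumes that n is the polynomial degree, that the error measure is the root-mean-square error (RMSE), and that the best model is picked by test error. The flow values and all function names are hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used only because the normal equations become ill-conditioned in floating point at high degree.

```python
from fractions import Fraction
from math import sqrt


def fit_polynomial(t, y, degree):
    """Exact least-squares polynomial coefficients, lowest power first."""
    n = degree + 1
    # Augmented normal equations [A^T A | A^T y] for the Vandermonde matrix A.
    rows = [[sum(ti ** (i + j) for ti in t) for j in range(n)]
            + [sum(yi * ti ** i for ti, yi in zip(t, y))] for i in range(n)]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [v / p for v in rows[col]]
        for r in range(n):
            if r != col:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


def predict(coeffs, t):
    """Evaluate the fitted polynomial at each time in t."""
    return [sum(c * ti ** p for p, c in enumerate(coeffs)) for ti in t]


def rmse(pred, actual):
    """Root-mean-square error between predictions and observations."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


# Hypothetical flow counts at 12 equally spaced times of day (not real data).
t = [Fraction(k, 12) for k in range(12)]
train_y = [20, 15, 12, 18, 40, 75, 90, 80, 85, 95, 70, 35]
test_y = [22, 14, 13, 20, 38, 78, 88, 83, 82, 97, 68, 33]

results = []  # one (degree, train RMSE, test RMSE) tuple per model
for degree in range(1, 10):
    coeffs = fit_polynomial(t, train_y, degree)
    fitted = predict(coeffs, t)
    results.append((degree, rmse(fitted, train_y), rmse(fitted, test_y)))

best_degree = min(results, key=lambda r: r[2])[0]
for degree, train_err, test_err in results:
    print(degree, round(train_err, 3), round(test_err, 3))
print("best degree by test RMSE:", best_degree)
```

On real data, a QR-based least-squares solver would be the more usual choice than exact arithmetic; the loop structure (fit on training data, score on both sets, compare across n) stays the same.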
Support Vector Machine (SVM) is a well-known classification
algorithm that separates two sets of points with the maximum
margin. SVM can also be applied to regression
problems, where it is referred to as Support Vector Regression
(SVR) [3]. For the second part of this project, I applied SVR
models to traffic flow prediction.
In SVR, the objective is to minimize the error while maximizing
the margin, keeping in mind that some errors are tolerated.
Figure 2 below illustrates SVR.
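One common way to express "some errors are tolerated" is the ε-insensitive loss of standard ε-SVR, in which a residual smaller than ε costs nothing. The short Python sketch below assumes that formulation; the function name, the value of ε and the numbers are hypothetical and are not taken from this project's data.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Sum of max(0, |y - f(x)| - epsilon) over all points.

    Residuals inside the epsilon tube contribute zero loss.
    """
    return sum(max(0.0, abs(yt - yp) - epsilon)
               for yt, yp in zip(y_true, y_pred))


# Made-up values: the residuals are 0.5, 2.0 and 0.0.
y_true = [10.0, 12.0, 15.0]
y_pred = [10.5, 14.0, 15.0]
loss = epsilon_insensitive_loss(y_true, y_pred, epsilon=1.0)
print(loss)  # only the residual of 2.0 exceeds epsilon, by 1.0
```

With ε = 1.0, the first and third points fall inside the tolerance band and are ignored, so only the second point contributes to the loss.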