CS 229 – Machine Learning https://stanford.edu/~shervine
VIP Refresher: Linear Algebra and Calculus
Afshine Amidi and Shervine Amidi
October 6, 2018
General notations
❒ Vector – We note $x \in \mathbb{R}^n$ a vector with $n$ entries, where $x_i \in \mathbb{R}$ is the $i$-th entry:
$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^n$$
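❒ Matrix – We note $A \in \mathbb{R}^{m \times n}$ a matrix with $m$ rows and $n$ columns, where $A_{i,j} \in \mathbb{R}$ is the entry located in the $i$-th row and $j$-th column:
$$A = \begin{pmatrix} A_{1,1} & \cdots & A_{1,n} \\ \vdots & & \vdots \\ A_{m,1} & \cdots & A_{m,n} \end{pmatrix} \in \mathbb{R}^{m \times n}$$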
Remark: the vector $x$ defined above can be viewed as an $n \times 1$ matrix and is more specifically called a column vector.
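As a small illustration of these notations, the sketch below stores a vector as a flat list and a matrix as a list of rows. This storage choice and the helper name `shape` are assumptions made for the example, not part of the notes.

```python
# A vector x in R^3, stored as a flat list of its entries.
x = [1.0, 2.0, 3.0]

# A matrix A in R^{2x3}, stored as a list of rows.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]


def shape(M):
    """Return (rows, columns) of a matrix stored as a list of rows."""
    return len(M), len(M[0])


# The same vector viewed as an n x 1 matrix (a column vector).
x_col = [[entry] for entry in x]

# The notes index from 1, Python from 0: A_{i,j} is A[i - 1][j - 1].
i, j = 2, 3
a_23 = A[i - 1][j - 1]
```

Here `shape(A)` gives the pair (m, n) and `shape(x_col)` gives (n, 1), matching the remark above.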
❒ Identity matrix – The identity matrix $I \in \mathbb{R}^{n \times n}$ is a square matrix with ones on its diagonal and zeros everywhere else:
$$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{pmatrix}$$
Remark: for all matrices $A \in \mathbb{R}^{n \times n}$, we have $A \times I = I \times A = A$.
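The remark can be checked numerically on a small case. The following is a minimal sketch in plain Python; the helpers `identity` and `matmul` are hypothetical names introduced for this example.

```python
def identity(n):
    """Build the n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    inner_dim = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(inner_dim))
             for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[2.0, -1.0],
     [0.5, 3.0]]
I = identity(2)

left = matmul(I, A)   # I x A
right = matmul(A, I)  # A x I
```

Both `left` and `right` compare equal to `A` for this example.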
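❒ Diagonal matrix – A diagonal matrix $D \in \mathbb{R}^{n \times n}$ is a square matrix whose entries outside the diagonal are all zero (the diagonal entries $d_1, \dots, d_n$ are typically nonzero):
$$D = \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & d_n \end{pmatrix}$$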
Remark: we also note $D$ as $\mathrm{diag}(d_1, \dots, d_n)$.
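The diag notation maps directly onto a small constructor. The sketch below is illustrative only; the function name `diag` mirrors the notation, and the scaling check on a vector is an added observation rather than something stated in the notes.

```python
def diag(values):
    """Build diag(d_1, ..., d_n) as a list of rows."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


D = diag([2.0, 5.0, 7.0])

# Multiplying a vector by D scales its i-th entry by d_i.
x = [1.0, 1.0, 2.0]
Dx = [sum(D[i][j] * x[j] for j in range(len(x))) for i in range(len(D))]
```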
Matrix operations
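❒ Vector-vector multiplication – There are two types of vector-vector products:
• inner product: for $x, y \in \mathbb{R}^n$, we have:
$$x^T y = \sum_{i=1}^{n} x_i y_i \in \mathbb{R}$$
• outer product: for $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$, we have:
$$x y^T = \begin{pmatrix} x_1 y_1 & \cdots & x_1 y_n \\ \vdots & & \vdots \\ x_m y_1 & \cdots & x_m y_n \end{pmatrix} \in \mathbb{R}^{m \times n}$$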
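Both products can be sketched in a few lines of plain Python. The function names `inner` and `outer` are chosen for this example; vectors are assumed to be flat lists.

```python
def inner(x, y):
    """Inner product x^T y; both vectors must have the same length."""
    if len(x) != len(y):
        raise ValueError("inner product needs vectors of equal length")
    return sum(x_i * y_i for x_i, y_i in zip(x, y))


def outer(x, y):
    """Outer product x y^T; entry (i, j) is x_i * y_j."""
    return [[x_i * y_j for y_j in y] for x_i in x]


scalar = inner([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])  # a single real number
matrix = outer([1.0, 2.0], [3.0, 4.0, 5.0])       # a 2 x 3 matrix
```

The inner product collapses two vectors of the same size into a scalar, while the outer product of an m-vector and an n-vector yields an m × n matrix.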
❒ Matrix-vector multiplication – The product of matrix $A \in \mathbb{R}^{m \times n}$ and vector $x \in \mathbb{R}^n$ is a vector in $\mathbb{R}^m$, such that:
$$Ax = \begin{pmatrix} a_{r,1}^T x \\ \vdots \\ a_{r,m}^T x \end{pmatrix} = \sum_{i=1}^{n} a_{c,i} x_i \in \mathbb{R}^m$$
where $a_{r,i}^T$ are the vector rows and $a_{c,j}$ are the vector columns of $A$, and $x_i$ are the entries of $x$.
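The two expressions for the product correspond to two ways of computing it: one inner product per row, or a weighted sum of columns. The sketch below implements both under the assumption that the matrix is a list of rows; the function names are hypothetical.

```python
def matvec_rows(A, x):
    """Row view: entry k of Ax is the inner product of row k of A with x."""
    return [sum(a * x_i for a, x_i in zip(row, x)) for row in A]


def matvec_columns(A, x):
    """Column view: Ax is the sum of the columns of A weighted by x_i."""
    m = len(A)
    result = [0.0] * m
    for i, x_i in enumerate(x):
        for k in range(m):
            result[k] += A[k][i] * x_i  # add x_i times column i
    return result


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]  # 2 x 3
x = [1.0, 0.0, -1.0]   # length 3

by_rows = matvec_rows(A, x)
by_columns = matvec_columns(A, x)
```

Both views return the same vector of length m = 2.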
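❒ Matrix-matrix multiplication – The product of matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$ is a matrix in $\mathbb{R}^{m \times p}$, such that:
$$AB = \begin{pmatrix} a_{r,1}^T b_{c,1} & \cdots & a_{r,1}^T b_{c,p} \\ \vdots & & \vdots \\ a_{r,m}^T b_{c,1} & \cdots & a_{r,m}^T b_{c,p} \end{pmatrix} = \sum_{i=1}^{n} a_{c,i} b_{r,i}^T \in \mathbb{R}^{m \times p}$$
where $a_{r,i}^T$, $b_{r,i}^T$ are the vector rows and $a_{c,j}$, $b_{c,j}$ are the vector columns of $A$ and $B$ respectively.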
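As with the matrix-vector case, the product can be computed entry by entry from row-column inner products, or accumulated as a sum of outer products of columns of A with rows of B. The following sketch shows both; the function names are assumptions for illustration.

```python
def matmul_inner(A, B):
    """Entry (i, j) of AB is the inner product of row i of A and column j of B."""
    n, p = len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


def matmul_outer(A, B):
    """AB as the sum over k of (column k of A) times (row k of B)."""
    m, n, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(m)]
    for k in range(n):
        for i in range(m):
            for j in range(p):
                C[i][j] += A[i][k] * B[k][j]  # outer-product contribution
    return C


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]  # m x n = 2 x 3
B = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]       # n x p = 3 x 2

C_inner = matmul_inner(A, B)
C_outer = matmul_outer(A, B)
```

The result has m = 2 rows and p = 2 columns; the shared dimension n = 3 disappears in the product.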
❒ Transpose – The transpose of a matrix $A \in \mathbb{R}^{m \times n}$, noted $A^T$, is such that its entries are flipped:
$$\forall i,j, \quad A^T_{i,j} = A_{j,i}$$
Remark: for matrices $A, B$, we have $(AB)^T = B^T A^T$.
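A short numerical check of the transpose identity, as a sketch on one pair of matrices rather than a proof. The helper names `transpose` and `matmul` are introduced here for illustration.

```python
def transpose(A):
    """Swap rows and columns: entry (i, j) of the result is A[j][i]."""
    return [list(column) for column in zip(*A)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]  # 2 x 3
B = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]       # 3 x 2

lhs = transpose(matmul(A, B))             # (AB)^T
rhs = matmul(transpose(B), transpose(A))  # B^T A^T
```

Note the reversed order on the right-hand side: the transposed factors only have compatible shapes when B^T comes first.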
❒ Inverse – The inverse of an invertible square matrix $A$ is noted $A^{-1}$ and is the only matrix such that:
$$A A^{-1} = A^{-1} A = I$$
Remark: not all square matrices are invertible. Also, for invertible matrices $A, B$ of the same size, we have $(AB)^{-1} = B^{-1} A^{-1}$.
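To make the definition concrete, the sketch below inverts a 2 × 2 matrix with the standard closed-form formula (swap the diagonal entries, negate the off-diagonal ones, divide by ad − bc). This formula is not given in the notes and is used here only as a convenient special case. The names `inverse_2x2`, `matmul` and `is_identity` are hypothetical.

```python
def inverse_2x2(A):
    """Invert a 2 x 2 matrix [[a, b], [c, d]] using the closed-form formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        # Exact comparison is a simplification; numerical code would
        # usually compare against a small tolerance instead.
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det],
            [-c / det, a / det]]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def is_identity(M, tol=1e-12):
    """Check that M equals the identity up to floating-point rounding."""
    n = len(M)
    return all(abs(M[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))


A = [[4.0, 7.0],
     [2.0, 6.0]]
A_inv = inverse_2x2(A)

right_ok = is_identity(matmul(A, A_inv))  # A A^{-1} = I
left_ok = is_identity(matmul(A_inv, A))   # A^{-1} A = I
```

A matrix such as [[1, 2], [2, 4]], whose rows are proportional, makes `inverse_2x2` raise, which illustrates that not every square matrix is invertible.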
❒ Trace – The trace of a square matrix $A$, noted $\mathrm{tr}(A)$, is the sum of its diagonal entries:
$$\mathrm{tr}(A) = \sum_{i=1}^{n} A_{i,i}$$
Remark: for matrices $A, B$, we have $\mathrm{tr}(A^T) = \mathrm{tr}(A)$ and $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.
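The two trace identities can be checked on small examples. This is a sketch, with hypothetical helper names; the second check deliberately uses non-square factors to show that AB and BA may have different sizes and still share the same trace.

```python
def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    if any(len(row) != len(A) for row in A):
        raise ValueError("trace is defined for square matrices only")
    return sum(A[i][i] for i in range(len(A)))


def transpose(A):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(column) for column in zip(*A)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


S = [[1.0, 2.0],
     [3.0, 4.0]]
trace_S = trace(S)
trace_S_transposed = trace(transpose(S))

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]  # 2 x 3
B = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]       # 3 x 2
trace_AB = trace(matmul(A, B))  # trace of a 2 x 2 matrix
trace_BA = trace(matmul(B, A))  # trace of a 3 x 3 matrix
```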
❒ Determinant – The determinant of a square matrix $A \in \mathbb{R}^{n \times n}$, noted $|A|$ or $\det(A)$, is expressed recursively in terms of $A_{\backslash i, \backslash j}$, which is the matrix $A$ without its $i$-th row and $j$-th column, as follows:
Stanford University 1 Fall 2018