CONVEX OPTIMIZATION
FOR MACHINE LEARNING
CHANGHO SUH
Published, sold and distributed by:
now Publishers Inc.
PO Box 1024
Hanover, MA 02339
United States
Tel. +1-781-985-4510
www.nowpublishers.com
sales@nowpublishers.com
Outside North America:
now Publishers Inc.
PO Box 179
2600 AD Delft
The Netherlands
Tel. +31-6-51115274
ISBN: 978-1-63828-052-1
E-ISBN: 978-1-63828-053-8
DOI: 10.1561/9781638280538
Copyright © 2022 Changho Suh
Suggested citation: Changho Suh. (2022). Convex Optimization for Machine Learning. Boston–Delft:
Now Publishers
The work will be available online in open access and governed by the Creative Commons
“Attribution-NonCommercial” License (CC BY-NC), according to https://creativecommons.org/licenses/by-nc/4.0/
We would like to express our sincere gratitude to Hyun Seung Suh, who has provided a flurry of
constructive comments and feedback on the structure of the book, writing, and readability from a
beginner’s viewpoint.
This work was supported by the 2019 Google Education Grant; and Institute of Information &
Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government
(MSIT) (2020-0-00626, Ensuring high AI learning performance with only a small amount of training
data).
Table of Contents
Preface v
Chapter 1 Convex Optimization Basics 1
1.1 Overview of the Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Definition of Convex Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 Tractability of Convex Optimization and Gradient Descent . . . . . . . 18
Problem Set 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.4 Linear Program (LP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.5 LP: Examples and Relaxation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.6 LP: Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
1.7 LP: CVXPY Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Problem Set 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
1.8 Least Squares (LS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
1.9 LS: Test Error, Regularization and CVXPY Implementation . . . . . . 64
1.10 LS: Computed Tomography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Problem Set 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
1.11 Quadratic Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
1.12 Second-order Cone Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
1.13 Semi-definite Program (SDP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
1.14 SDP Relaxation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Problem Set 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Chapter 2 Duality 124
2.1 Strong Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
2.2 Interior Point Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Problem Set 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
2.3 Proof of Strong Duality Theorem (1/2) . . . . . . . . . . . . . . . . . . . . . . . . 143
2.4 Proof of Strong Duality Theorem (2/2) . . . . . . . . . . . . . . . . . . . . . . . . 150
Problem Set 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
2.5 Weak Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
2.6 Lagrange Relaxation for Boolean Problems . . . . . . . . . . . . . . . . . . . . . 169
2.7 Lagrange Relaxation for the MAXCUT Problem . . . . . . . . . . . . . . . . 175
Problem Set 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
Chapter 3 Machine Learning Applications 185
3.1 Supervised Learning and Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 185
3.2 Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
3.3 Deep Learning I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
3.4 Deep Learning II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
3.5 Deep Learning: TensorFlow Implementation . . . . . . . . . . . . . . . . . . . 221
Problem Set 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
3.6 Unsupervised Learning: Generative Modeling . . . . . . . . . . . . . . . . . . . 240
3.7 Generative Adversarial Networks (GANs) . . . . . . . . . . . . . . . . . . . . . . 246
3.8 GANs: TensorFlow Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
Problem Set 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
3.9 Wasserstein GAN I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
3.10 Wasserstein GAN II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
3.11 Wasserstein GAN: TensorFlow Implementation . . . . . . . . . . . . . . . . . 285
Problem Set 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
3.12 Fair Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
3.13 A Fair Classifier and Its Connection to GANs . . . . . . . . . . . . . . . . . . . 307
3.14 A Fair Classifier: TensorFlow Implementation . . . . . . . . . . . . . . . . . . . 313
Problem Set 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
Appendix A Python Basics 329
A.1 Jupyter Notebook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
A.2 Basic Python Syntaxes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
Appendix B CVXPY Basics 343
Appendix C TensorFlow and Keras Basics 348
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
About the Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
Preface
Features of the book The writing of this book was prompted by the huge surge
of research activity in machine learning and deep learning, and by the crucial role
that convex optimization plays in these fields. This motivation shapes three key
features of the book.
The first feature is a focus on optimization content tailored to modern appli-
cations in machine learning and deep learning. Because the optimization field has
contributed to many disciplines, ranging from statistics, dynamical systems and
control, and complexity theory to algorithms, numerous optimization books have
been published over the past decades. Moreover, the optimization discipline has
been applied to a widening array of contexts, including science, engineering, eco-
nomics, finance, and management. Hence, those books cover a broad spectrum of
content spanning many application areas. This book, in contrast, focuses on a single
yet prominent field: machine learning. Among the vast body of material, we place
special emphasis on the concepts and theories relevant to machine learning and
deep learning. We also employ many toy examples to ease the illustration of impor-
tant concepts. The examples include historical and canonical instances as well as
ones that arise in machine learning.
Second, this book is written in a lecture style. Most optimization books present
many mathematical concepts and theories together with numerous applications
across a wide variety of domains. The concepts and relevant theories are thus simply
enumerated and presented sequentially, following a dictionary-style organization.
While the dictionary style eases the search for specific material, it often lacks a
coherent storyline that can motivate readers. This book aims to motivate readers
who are interested in machine learning, drawing on optimization fundamentals. To
that end, we craft an engaging storyline that demonstrates the