Abstract
Many problems of recent interest in statistics and machine learning
can be posed in the framework of convex optimization. Due to the
explosion in size and complexity of modern datasets, it is increasingly
important to be able to solve problems with a very large number of fea-
tures or training examples. As a result, both the decentralized collection
or storage of these datasets as well as accompanying distributed solu-
tion methods are either necessary or at least highly desirable. In this
review, we argue that the alternating direction method of multipliers
is well suited to distributed convex optimization, and in particular to
large-scale problems arising in statistics, machine learning, and related
areas. The method was developed in the 1970s, with roots in the 1950s,
and is equivalent or closely related to many other algorithms, such
as dual decomposition, the method of multipliers, Douglas–Rachford
splitting, Spingarn’s method of partial inverses, Dykstra’s alternating
projections, Bregman iterative algorithms for
1
problems, proximal
methods, and others. After briefly surveying the theory and history of
the algorithm, we discuss applications to a wide variety of statistical
and machine learning problems of recent interest, including the lasso,
sparse logistic regression, basis pursuit, covariance selection, support
vector machines, and many others. We also discuss general distributed
optimization, extensions to the nonconvex setting, and efficient imple-
mentation, including some details on distributed MPI and Hadoop
MapReduce implementations.