Foundations and Trends® in Machine Learning
Vol. 9, No. 4-5 (2016) 249–429
© 2017 A. Cichocki et al.
DOI: 10.1561/2200000059
Tensor Networks for Dimensionality Reduction
and Large-Scale Optimization
Part 1 Low-Rank Tensor Decompositions
Andrzej Cichocki
RIKEN Brain Science Institute (BSI), Japan and
Skolkovo Institute of Science and Technology (SKOLTECH)
a.cichocki@riken.jp
Namgil Lee
RIKEN BSI, namgil.lee@riken.jp
Ivan Oseledets
Skolkovo Institute of Science and Technology (SKOLTECH) and
Institute of Numerical Mathematics of Russian Academy of Sciences
i.oseledets@skolkovotech.ru
Anh-Huy Phan
RIKEN BSI, phan@brain.riken.jp
Qibin Zhao
RIKEN BSI, qbzhao@brain.riken.jp
Danilo P. Mandic
Department of Electrical and Electronic Engineering
Imperial College London
d.mandic@imperial.ac.uk
Contents

1 Introduction and Motivation 250
1.1 Challenges in Big Data Processing 251
1.2 Tensor Notations and Graphical Representations 252
1.3 Curse of Dimensionality and Generalized Separation of Variables for Multivariate Functions 260
1.4 Advantages of Multiway Analysis via Tensor Networks 268
1.5 Scope and Objectives 269

2 Tensor Operations and Tensor Network Diagrams 272
2.1 Basic Multilinear Operations 272
2.2 Graphical Representation of Fundamental Tensor Networks 292
2.3 Generalized Tensor Network Formats 310

3 Constrained Tensor Decompositions: From Two-way to Multiway Component Analysis 314
3.1 Constrained Low-Rank Matrix Factorizations 314
3.2 The CP Format 317
3.3 The Tucker Tensor Format 323
3.4 Higher Order SVD (HOSVD) for Large-Scale Problems 332
3.5 Tensor Sketching Using Tucker Model 342
3.6 Multiway Component Analysis (MWCA) 351
3.7 Nonlinear Tensor Decompositions – Infinite Tucker 360

4 Tensor Train Decompositions: Graphical Interpretations and Algorithms 363
4.1 Tensor Train Decomposition – Matrix Product State 363
4.2 Matrix TT Decomposition – Matrix Product Operator 369
4.3 Links Between CP, BTD Formats and TT/TC Formats 374
4.4 Quantized Tensor Train (QTT) – Blessing of Dimensionality 377
4.5 Basic Operations in TT Formats 383
4.6 Algorithms for TT Decompositions 393

5 Discussion and Conclusions 407

Acknowledgements 409

References 410
Abstract
Modern applications in engineering and data science are increasingly
based on multidimensional data of exceedingly high volume, variety,
and structural richness. However, standard machine learning algo-
rithms typically scale exponentially with data volume and complexity
of cross-modal couplings – the so-called curse of dimensionality – which
is prohibitive for the analysis of large-scale, multi-modal, and
multi-relational datasets. Given that such data are often efficiently
represented as multiway arrays or tensors, it is therefore timely and
valuable for the multidisciplinary machine learning and data analytics
communities to review low-rank tensor decompositions and tensor net-
works as emerging tools for dimensionality reduction and large-scale
optimization problems. Our particular emphasis is on elucidating that,
by virtue of the underlying low-rank approximations, tensor networks
have the ability to alleviate the curse of dimensionality in a number
of applied areas. In Part 1 of this monograph we provide innovative
solutions to low-rank tensor network decompositions and easy-to-
interpret graphical representations of the mathematical operations on
tensor networks. Such a conceptual insight allows for seamless migra-
tion of ideas from the flat-view matrices to tensor network operations
and vice versa, and provides a platform for further developments, prac-
tical applications, and non-Euclidean extensions. It also permits the
introduction of various tensor network operations without an explicit
notion of mathematical expressions, which may be beneficial for many
research communities that do not directly rely on multilinear algebra.
Our focus is on the Tucker and tensor train (TT) decompositions and
their extensions, and on demonstrating the ability of tensor networks
to provide linearly or even super-linearly (e.g., logarithmically) scalable
solutions, as illustrated in detail in Part 2 of this monograph.
A. Cichocki et al. Tensor Networks for Dimensionality Reduction and Large-Scale
Optimization Part 1 Low-Rank Tensor Decompositions. Foundations and Trends®
in Machine Learning, vol. 9, no. 4-5, pp. 249–429, 2016.
DOI: 10.1561/2200000059.
1 Introduction and Motivation
This monograph aims to present a coherent account of ideas and
methodologies related to tensor decompositions (TDs) and tensor net-
work models (TNs). TDs decompose complex data tensors of exceedingly
high dimensionality into their factor (component) tensors and matrices,
while TNs decompose
higher-order tensors into sparsely interconnected small-scale factor ma-
trices and/or low-order core tensors. These low-order core tensors are
called “components”, “blocks”, “factors” or simply “cores”. In this way,
large-scale data can be approximately represented in highly compressed
and distributed formats.
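To make this compression concrete, consider the tensor train (TT) format treated in Chapter 4: an Nth-order tensor with I^N entries is represented by N third-order cores holding only O(N I R^2) parameters, where R denotes the maximal TT rank. The following minimal NumPy sketch (our own illustrative code, not the authors' implementation; the function name, the fixed-rank truncation, and the random test tensor are assumptions) builds such cores by sequential truncated SVDs:

import numpy as np

def tt_decompose(tensor, max_rank=8):
    # Split off one mode at a time with a truncated SVD, producing
    # third-order TT cores of shape (r_{n-1}, I_n, r_n).
    dims = tensor.shape
    cores = []
    C = np.asarray(tensor)
    r_prev = 1
    for n in range(len(dims) - 1):
        C = C.reshape(r_prev * dims[n], -1)
        U, S, Vt = np.linalg.svd(C, full_matrices=False)
        r = min(max_rank, len(S))        # truncate to the target TT rank
        cores.append(U[:, :r].reshape(r_prev, dims[n], r))
        C = np.diag(S[:r]) @ Vt[:r]      # carry the remaining factor forward
        r_prev = r
    cores.append(C.reshape(r_prev, dims[-1], 1))
    return cores

# A 10 x 10 x 10 x 10 tensor holds 10,000 entries; its rank-4 TT
# representation stores only 400 parameters across four small cores
# (the approximation is lossy for unstructured data).
X = np.random.rand(10, 10, 10, 10)
cores = tt_decompose(X, max_rank=4)
print([G.shape for G in cores], sum(G.size for G in cores))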
In this monograph, the TDs and TNs are treated in a unified way,
by considering TDs as simple tensor networks or sub-networks; the
terms “tensor decompositions” and “tensor networks” will therefore be
used interchangeably. Tensor networks can be thought of as special
graph structures which break down high-order tensors into a set of
sparsely interconnected low-order core tensors, thus allowing for both
enhanced interpretation and computational advantages. Such an ap-
proach is valuable in many application contexts which require the com-
putation of eigenvalues and the corresponding eigenvectors of extremely
high-dimensional linear or nonlinear operators. These operators typi-
cally describe the coupling between many degrees of freedom within
real-world physical systems; such degrees of freedom are often only
weakly coupled. Indeed, quantum physics provides evidence that cou-
plings between multiple data channels usually do not exist among all