SSA Toolbox 1.4 — Manual
Jan Saputra Müller, Paul von Bünau,
Frank C. Meinecke, Franz J. Király, Klaus-Robert Müller
December 3, 2018
Contents
1 Introduction and Overview
2 Installation and Running
3 Stationary Subspace Analysis in Brief
4 Input, Output and Parameters
5 Examples and Toy Data
6 Frequently Asked Questions
7 Developers
8 Contact and Support
Appendices
A Interpreting the objective function value
B Measuring the Error of SSA
1 Introduction and Overview
This is the manual for the SSA Toolbox, an efficient open-source implementation
of the Stationary Subspace Analysis [1] algorithm. Stationary Subspace
Analysis (SSA) is a general-purpose algorithm for the exploratory analysis of
non-stationary data, i.e. data whose statistical properties change over time.
SSA can help to detect, characterize and visualize temporal changes in complex
high-dimensional data sets.
The SSA Toolbox is written entirely in Java and is thus platform-independent.
It has been tested successfully under Windows, Linux and Mac OS X. The SSA
Toolbox comes with a state-of-the-art native linear algebra library (BLAS, LAPACK).
For maximum platform independence, the user can also choose the purely Java-based
library COLT. Data and results can be imported and exported as comma-separated
values (.csv) files, the fail-safe format of last resort, and through Matlab's
proprietary .mat files, a de facto standard in the machine learning community.
The source code of the SSA Toolbox is fully documented (using the JavaDoc
standard) and accompanied by a set of unit tests written in JUnit. The latest
version is always available from GitHub¹, a hosting service for the git versioning
system. Section 7 contains further information for developers, including a
high-level overview of the class structure.
There are four ways to use the SSA Toolbox:
1. As a standalone application with a graphical user interface.
2. As a standalone application from the command line.
3. From within Matlab, using the wrapper script ssa.m.
4. As a Java library, from your own application.
2 Installation and Running
Obtaining the latest SSA Toolbox. The latest version of the SSA Toolbox
is available from the official SSA homepage:
http://www.stationary-subspace-analysis.org/toolbox
There you can also find pointers to further references, example data and a link
to the SSA mailing list.
Platforms. The SSA Toolbox is written in the platform-independent Java programming
language; platform-specific numerical libraries² are included for several target
architectures. The SSA Toolbox requires the Java Runtime Environment³ version 1.5
or later. Most operating systems have a Java Runtime Environment pre-installed;
you can check the version by typing java -version on the command line. The SSA
Toolbox has been tested on the following platforms:
• Microsoft Windows (32 and 64 bit)
• Linux (32 and 64 bit)
• Mac OS X (32 and 64 bit)
¹ See https://github.com/paulbuenau/SSA-Toolbox
² BLAS+LAPACK provided through jblas (see http://www.jblas.org).
³ See http://www.java.com/getjava
Installation and Running. The SSA Toolbox comes as a single .zip or
.tar.gz archive. After unpacking, you can start the SSA Toolbox by opening
the file ssa.jar with the default method of your operating system, e.g. by
double-clicking on it under Microsoft Windows, OS X and some Linux distributions.
You can also manually invoke the SSA Toolbox by typing
java -jar ssa.jar
on the command line of your operating system.
If you want to run SSA on very large data sets, it may be necessary to start
the toolbox with more Java heap space; the SSA Toolbox will inform you when
this is the case. Section 6 (Frequently Asked Questions) explains how to
increase the heap space.
If you want to use the SSA Toolbox directly from Matlab, you can use the
wrapper script ssa.m. Type help ssa on the Matlab command line to find
out about the format of its input and output parameters. Note that if you
invoke the SSA Toolbox from within Matlab, it will use Matlab's internal JVM unless
you specify an external JVM, e.g. using the environment variable MATLAB_JAVA
under Linux.
You can also use the toolbox from the command line. In this case, you have
to pass options after java -jar ssa.jar. The following table shows the avail-
able options:
Option   Meaning/Argument
-i       Input file (in .csv or .mat format). Required.
-o       Output file or directory. If it ends with .mat, a Matlab file will be
         created; otherwise .csv files are created in the specified directory.
         Required.
-d       Number of stationary sources. Required.
-r       Number of restarts. Optional. Default: 5
-n       Number of equally-sized epochs. Optional. If this option is not
         specified and no custom epochization has been given, a heuristic
         is used to determine the number of epochs.
-e       Epochization file in .csv format. Optional.
-m       Use the means during optimization. Has to be 0 or 1. Optional.
         Default: 1
-c       Use the covariances during optimization. Has to be 0 or 1. Optional.
         Default: 1
-s       Random seed. Optional.
-j       Use jBlas instead of Colt. Has to be 0 or 1. Optional. Default: 0
For example, the following line would run SSA on the example data set:
java -jar ssa.jar -i example_data/example_data.mat -d 2 -o results.mat
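When the toolbox is driven from a script, it can be convenient to assemble the command line programmatically. The following Python sketch only uses options from the table above; the function name build_ssa_command, its keyword arguments and the file names data.csv and results are hypothetical and not part of the toolbox.

```python
import shlex


def build_ssa_command(input_file, output, num_stationary, restarts=None,
                      epochs=None, seed=None, jar="ssa.jar"):
    """Assemble the argument list for a command-line SSA run.

    Hypothetical helper: it only strings together the documented options
    -i, -o, -d, -r, -n and -s; it does not validate them.
    """
    cmd = ["java", "-jar", jar,
           "-i", input_file,           # input file (.csv or .mat)
           "-o", output,               # output file or directory
           "-d", str(num_stationary)]  # number of stationary sources
    if restarts is not None:
        cmd += ["-r", str(restarts)]
    if epochs is not None:
        cmd += ["-n", str(epochs)]
    if seed is not None:
        cmd += ["-s", str(seed)]
    return cmd


# Example with made-up file names; prints the command as a shell string.
cmd = build_ssa_command("data.csv", "results", 2, restarts=10, seed=42)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once ssa.jar is in the working directory.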
3 Stationary Subspace Analysis in Brief
Stationary Subspace Analysis [1] factorizes a multivariate time series into its
stationary and non-stationary components. That is, we assume that the data-generating
system consists of d stationary source signals $s^{s}(t) = [s_1(t), \ldots, s_d(t)]^\top$
and D − d non-stationary source signals $s^{n}(t) = [s_{d+1}(t), \ldots, s_D(t)]^\top$, and that
the observed signals x(t) are a linear superposition of these sources,
\[
x(t) = A s(t) = \begin{bmatrix} A^{s} & A^{n} \end{bmatrix} \begin{bmatrix} s^{s}(t) \\ s^{n}(t) \end{bmatrix} \tag{1}
\]
where A is an invertible matrix. Note that we do not assume that the sources
s(t) are independent. We refer to the spaces spanned by the columns of $A^{s}$
and $A^{n}$ as the stationary (s-) and non-stationary (n-) space respectively.
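To make the mixing model of Equation 1 concrete, the following Python sketch draws toy data from it. It is an illustration under stated assumptions and not the toolbox's own toy data generator: the function name generate_toy_data, the Gaussian sources, and the epoch-wise change of mean and standard deviation in the non-stationary sources are all choices made for this example.

```python
import random


def generate_toy_data(n_samples, d, D, n_epochs=10, seed=0):
    """Draw x(t) = A s(t) with d stationary and D - d non-stationary sources.

    Assumptions of this sketch: all sources are Gaussian; the stationary
    ones are standard normal throughout, while the non-stationary ones get
    a new mean and standard deviation at the start of every epoch.
    Returns (X, A), where X has n_samples rows of D channels each.
    """
    rng = random.Random(seed)
    # Random D x D mixing matrix; a Gaussian matrix is invertible with
    # probability one.
    A = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(D)]
    epoch_len = max(1, n_samples // n_epochs)
    X = []
    mean, std = [], []
    for t in range(n_samples):
        if t % epoch_len == 0:
            # New epoch: redraw the parameters of the n-sources.
            mean = [rng.uniform(-2.0, 2.0) for _ in range(D - d)]
            std = [rng.uniform(0.5, 2.0) for _ in range(D - d)]
        s_stationary = [rng.gauss(0.0, 1.0) for _ in range(d)]
        s_nonstationary = [rng.gauss(m, sd) for m, sd in zip(mean, std)]
        s = s_stationary + s_nonstationary
        # Mix the sources: x(t) = A s(t).
        X.append([sum(A[i][j] * s[j] for j in range(D)) for i in range(D)])
    return X, A


X, A = generate_toy_data(n_samples=1000, d=2, D=4)
```

Here X is in (time × channels) layout; every observed channel is a mixture of all sources, so the non-stationarity is not visible in any single channel alone.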
The SSA algorithm factorizes the observed signals x(t) according to Equation 1,
i.e. it finds a linear transformation
\[
\hat{A}^{-1} = \begin{bmatrix} \hat{P}^{s} \\ \hat{P}^{n} \end{bmatrix} \tag{2}
\]
that separates the s-sources from the n-sources. The inverse of the estimated
demixing matrix $\hat{A}^{-1}$ is the estimated mixing matrix,
\[
\hat{A} = \begin{bmatrix} \hat{A}^{s} & \hat{A}^{n} \end{bmatrix}, \tag{3}
\]
and the estimated stationary and non-stationary sources are thus given by
\begin{align}
\hat{s}^{s}(t) &= \hat{P}^{s} x(t) \tag{4} \\
\hat{s}^{n}(t) &= \hat{P}^{n} x(t) \tag{5}
\end{align}
respectively. Note that only the s-projection and the n-space are uniquely
identifiable. The projection to the n-sources $\hat{P}^{n}$ is found by maximizing
the non-stationarity of the estimated n-sources.
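As a consistency check on Equations 2 and 3, multiplying the estimated demixing matrix by the estimated mixing matrix must give the identity. Written out in blocks (the block sizes d and D − d follow from the definitions above):

```latex
\hat{A}^{-1}\hat{A}
= \begin{bmatrix} \hat{P}^{s} \\ \hat{P}^{n} \end{bmatrix}
  \begin{bmatrix} \hat{A}^{s} & \hat{A}^{n} \end{bmatrix}
= \begin{bmatrix}
    \hat{P}^{s}\hat{A}^{s} & \hat{P}^{s}\hat{A}^{n} \\
    \hat{P}^{n}\hat{A}^{s} & \hat{P}^{n}\hat{A}^{n}
  \end{bmatrix}
= \begin{bmatrix} I_{d} & 0 \\ 0 & I_{D-d} \end{bmatrix}
```

In particular, the s-projection annihilates the estimated n-space and the n-projection annihilates the estimated s-space, which is why Equations 4 and 5 separate the two groups of sources.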
The SSA Toolbox allows for input and output in (time × channels) and
(channels × time) format. Note that the above definitions of sources, projections
and bases correspond to the (channels × time) format.
4 Input, Output and Parameters
The input to the SSA Toolbox consists of
• Data: the time series x(t), either as (time × channels) or (channels × time).
• Segmentation of the time series x(t) into epochs, either
– equally-sized, where the number of epochs is supplied by the user;
– equally-sized, where the number of epochs is set automatically by a
heuristic; or
– according to a user-supplied custom epoch definition.
• Parameters to the SSA Algorithm (see Section 4.3).
The parameters are set via the graphical user interface. The time series
x(t) and a custom epoch definition can be loaded from comma-separated values
(CSV) and Matlab (.mat) files.
4.1 Comma-Separated-Values File Format
Comma Separated Values (CSV) files are human-readable text files for storing
tabular data. The columns are separated by commas and each line of the file
corresponds to a row. Lines starting with a hash (#) are ignored. If the data has
more rows than columns, then each row will be interpreted as a time point and
each column as a channel. Otherwise, the format is assumed to be (channels ×
time). See Figure 1 for an example time series file.
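The reading rules described above can be sketched in a few lines of Python. This is one reading of the description, not the toolbox's actual parser: the function name read_time_series is hypothetical, and skipping blank lines is an extra assumption of this sketch.

```python
import csv
import io


def read_time_series(lines):
    """Parse CSV lines into a list of time points, each a list of channels.

    Lines starting with '#' are ignored. A table with more rows than
    columns is taken as (time x channels); otherwise it is treated as
    (channels x time) and transposed. Blank lines are skipped, which is
    an assumption of this sketch.
    """
    kept = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    rows = [[float(v) for v in row] for row in csv.reader(kept)]
    if not rows:
        raise ValueError("no data rows found")
    n_rows, n_cols = len(rows), len(rows[0])
    if n_rows > n_cols:
        return rows                           # already (time x channels)
    return [list(col) for col in zip(*rows)]  # transpose (channels x time)


# Two channels with four time points each, stored as (channels x time).
sample = io.StringIO("# example\n1,2,3,4\n5,6,7,8\n")
series = read_time_series(sample)
print(len(series), len(series[0]))  # prints: 4 2
```

Note the consequence of the rule: a file with as many rows as columns, or with fewer time points than channels, is read as (channels × time).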
A custom segmentation of the time series into epochs can be specified by
means of a separate CSV file, which must have the same number of rows as the
time series and one column. The entries correspond to the index (starting with
1) of the epoch that a time point belongs to. Figure 2 shows an example CSV
file for a segmentation of the time series into custom epochs.
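A custom epoch file of this kind can be generated with a short script. The Python sketch below builds the one-column index list from a list of epoch lengths and writes it as CSV; the helper names epoch_indices_from_lengths and write_epoch_file and the file name epochs.csv are hypothetical.

```python
import csv


def epoch_indices_from_lengths(lengths):
    """Return one epoch index (starting with 1) per time point.

    lengths[k] is the number of consecutive time points in epoch k + 1.
    """
    indices = []
    for epoch, length in enumerate(lengths, start=1):
        indices.extend([epoch] * length)
    return indices


def write_epoch_file(path, indices):
    """Write the indices as a single-column CSV file, one row per time point."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for idx in indices:
            writer.writerow([idx])


# Three epochs of unequal length covering a series of 10 time points.
indices = epoch_indices_from_lengths([3, 2, 5])
write_epoch_file("epochs.csv", indices)
```

The sum of the lengths has to equal the number of time points in the data file, since both files must have the same number of rows. On the command line, such a file would be passed with the -e option.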