TAUCS
A Library of
Sparse
Linear Solvers
S T
School of Computer Science
Tel-Aviv University
stoledo@tau.ac.il
http://www.tau.ac.il/~stoledo/taucs
4th September 2003
With contributions by:
D C Maximum-weight-basis and Vaidya preconditioners
V R Out-of-core sparse Cholesky factorization
O M New configuration system
This document is the user manual for version 2.2 of T. Version 2.2 is the first to support
multithreading.
The main new innovations in Version 2.1 were a new build and configuration system, and a unified
interface to all the linear solvers. Smaller innovations in 2.1 include compilation with no warnings
on several platforms, and out-of-the-box builds for Windows and MacOS (in addition to Linux,
Irix, and Solaris, which were already well supported).
Version 2.0 was the first version to support both real and complex data type (both in single and
double precisions). As a consequence, the interfaces to subroutines this version are somewhat
different than in version 1.0.
Contents
1 Preliminaries 2
1.1 Introduction ....................................... 2
1.2 License .......................................... 5
2 Installation and Configuration 5
2.1 QuickStart........................................ 5
2.2 AnOverviewoftheConfigurationandBuildProcess................. 6
2.3 Configuration ...................................... 6
1
Preliminaries 2
2.4 Controlling Build Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.5 VariantBuilds ...................................... 8
2.6 BuildingaMultithreadedLibrary............................ 9
2.7 Testing .......................................... 9
2.8 ObscureDetails ..................................... 10
3 Using T without Programming 10
3.1 Ready-to-useExecutables................................ 10
3.2 CallingTfromM .............................. 11
3.3 TRoutinesinotherSoftware ........................... 11
4TFundamentals 12
4.1 SparseMatrixRepresentationandInterfaceConventions .............. 12
4.2 Vectors .......................................... 14
4.3 UtilityRoutines ..................................... 15
4.4 ErrorCodes ....................................... 15
5 The Unified Linear Solver 16
5.1 UsagebyExample .................................... 16
5.2 MoreonOptionsandtheirArguments ........................ 17
5.3 ACatalogofOptions .................................. 18
6 Matrix Reordering 19
7 Sparse Direct Linear Solvers 20
7.1 In-CoreSparseSymmetricFactorizations ....................... 20
7.2 Out-of-CoreSparseSymmetricFactorizations..................... 22
7.3 Out-of-CoreSparseUnsymmetricFactorizations ................... 23
7.4 InverseFactorizations.................................. 24
8 Iterative Linear Solvers 24
9 Preconditioners for Iterative Linear Solvers 25
9.1 Drop-ToleranceIncompleteCholesky ......................... 25
9.2 Maximum-Weight-Basis(Vaidya’s)Preconditioners ................. 25
9.3 Multilevel Support-Graph Preconditioners (Including Gremban-Miller Precondi-
tioners).......................................... 26
10 Matrix Generators 27
1 Preliminaries
1.1 Introduction
T is a C library of sparse linear solvers. The current version of the library includes the
following functionality:
Multifrontal Supernodal Cholesky Factorization. This code is quite fast (several times faster
than M 6’s sparse Cholesky). It uses the and to factor and compute
updates from supernodes. It uses relaxed and amalgamated supernodes. This routine is
multithreaded.
Preliminaries 3
Left-Looking Supernodal Cholesky Factorization. Slower than the multifrontal solver but uses
less memory.
Out-of-core Sparse Choleksy Factorization. This is a supernodal left-looking factorization code
with an associated solve routine that can solve very large problems by storing the Cholesky
factor on disk. See [13] for further details.
Out-of-core Sparse Pivoting LU Factorization. This is a supernodal left-looking factorization
code with an associated solve routine that can solve very large problems by storing the LU
factors on disk. The algorithm is a supernodal version of the algorithm described in [8].
Drop-Tolerance Incomplete-Cholesky Factorization. Much slower than the supernodal solvers
when it factors a matrix completely, but it can drop small elements from the factorization. It
can also modify the diagonal elements to maintain row sums. The code uses a column-based
left-looking approach with row lists.
LDL
T
Factorization. Column-based left-looking with row lists. Use the supernodal codes in-
stead, since they are faster, unless you really need an LDL
T
factorization and not an LL
T
Cholesky factorization.
Ordering Codes and Interfaces to Existing Ordering Codes. The library includes a unified in-
terface to several ordering codes, mostly existing ones. The ordering codes include Joseph
Liu’s
genmmd (a minimum-degree code in Fortran), Tim Davis’s amd codes (approximate
minimum degree), M (a nested-dissection/minimum-degree code by George Karypis
and Vipin Kumar), and a special-purpose minimum-degree code for no-fill ordering of
tree-structured matrices. All of these are symmetric orderings. The library also includes
an interface to Tim Davis’s
colamd column ordering code for LU factorization with partial
pivoting.
Matrix Operations. Matrix-vector multiplication, triangular solvers, matrix reordering.
Matrix Input/Output. Routines to read and write sparse matrices using a simple file format with
one line per nonzero, specifying the row, column, and value.
Matrix Generators. Routines that generate finite-differences discretizations of 2- and 3-
dimensional partial differential equations. Useful for testing the solvers.
Iterative Solvers. Preconditioned conjugate-gradients and preconditioned (See [1], for
example).
Support-Graph Preconditioners. These preconditioners construct a matrix larger than the co-
efficient matrix and use the Schur complement of the larger matrix as the preconditioner.
The construction routine can construct Gremban-Miller preconditioners [9, 10] along with
other (yet undocumented) variants.
Vaidya’s Preconditioners. Augmented Maximum-weight-basis and Maximum-spanning-tree
preconditioners [2, 4, 6, 16]. These preconditioners work by dropping nonzeros from
the coefficient matrix and them factoring the preconditioner directly.
Recursive Vaidya’s Preconditioners. These preconditioners [3, 12, 16] also drop nonzeros, but
they don’t factor the resulting matrix completely. Instead, they eliminate rows and columns
which can be eliminated without producing much fill. They then form the Schur complement
of the matrix with respect to these rows and columns and drop elements from the Schur
complement, and so on. During the preconditioning operation, we solve for the Schur
complement elements iteratively.
Preliminaries 4
Utility Routines. Timers (wall-clock and CPU time), physical-memory estimator, and logging.
The routines that you are not likely to find in other libraries of sparse linear solvers are the direct
supernodal solvers, the out-of-core solvers, and Vaidya’s preconditioners. The supernodal solvers
are fast and not many libraries include them; in particular, I don’t think any freely-distributed
library includes a sparse Cholesky factorization that is as fast as T’s multifrontal code. I am
not aware of any othe library at all that includes efficient out-of-core sparse factorizations.
As of version 2.0, the direct solvers work on real and complex matrices, single or double
precision. The iterative solvers work on real matrices only.
To get a sense of the speed of the in-core multifrontal sparse Cholesky routine, let’s compare
it to M 6’s sparse Cholesky solver. On a 600 × 600 mode problem (matrix order is 360000)
T reorders the matrix using a minimum degree code that results in a Cholesky factor with
approximately 12 million nonzeros. T factors the reordered matrix in 15.6 seconds, whereas
M 6 takes 81.6 seconds to perform the same factorization, more than 5 times slower. The
ratio is probably even higher on 3D meshes. (These experiments were performed with version 1.0
of the library on one processor of a 600MHz dual-Pentium III computer running Linux.)
T is easy to use and easy to cut up in pieces. It uses a nearly trivial design with only
one externally-visible structure. If you need to use just a few routines from the library (say, the
supernodal solvers), you should be able to compile and use almost only the files that include these
routines; there are not many dependences among source files. The new configuration system,
introduced in Version 2.1 makes it almost trivial to build a subset library that contains only the
routines that you need (and the ones they depend on).
Two minor design goals that the library does attempt to achieve is avoidance of name-space
pollution and clean failures. All the C routines in the library start with the prefix
taucs and so do
the name of structures and preprocessor macros. Therefore, you should not have any problems
using the library together with other libraries. Also, the library attempts to free all the memory
it allocates even if it fails, so you should not worry about memory leaks. This also allows you
to try to call a solver in your program, and if it fails, simply call another. The failed call to the
first solver should not have any side effects. In particular, starting in version 2.0 we use special
infrastructure to find and eliminate memory leaks. This infrastructure allows us to ensure that no
memory remains allocated after the user’s program calls the appropriate
free routines, and that
no memory remains allocated in case of failures. This infrastructure also allows us to artificially
induce failures; we use this feature to test the parts of the code that handle failures (e.g., failures
of
malloc), parts that are normally very rarely used.
The library is currently sequential. You can use parallelized , which may give some
speedup on shared-memory multiprocessors. We have an experimental parallel version of the
multifrontal Cholesky factorization, but it is not part of this release.
A Preview of Things to Come
The next versions of the library should include
• In-core and out-of-core sparse symmetric indefinite factorizations.
• High-performance multithreaded LU factorization for unsymmetric matrices.
• A drop-tolerance incomplete LU factorization and nonsymmetric iterative solvers. The
code is written but some of it needs to be converted from Fortran to C and it needs to be
integrated into the library.
More distant versions may include
• A multithreaded version of the supernodal Cholesky factorizations.
Installation and Configuration 5
Your input is welcome regarding which features you would like to see. We have implemented
quite a few features as a direct response to users’s requests (e.g., the complex routines and the
out-of-core sparse LU), so don’t be shy!
1.2 License
T comes with no warranty whatsoever and is distributed under the GNU LGPL (Library or
Lesser GNU Public Library). The license is available in
www.gnu.org. Alternatively, you can also
elect to use T under the following -style license, which is simpler to understand
than the LGPL:
T Version 1.0, November 29, 2001. Copyright (c) 2001 by Sivan Toledo, Tel-Aviv
Univesity, stoledo@tau.ac.il. All Rights Reserved.
TAUCS License:
Your use or distribution of TAUCS or any derivative code implies that you agree to
this License OR to the GNU LGPL.
THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK.
Permission is hereby granted to use or copy this program, provided that the Copyright,
this License, and the Availability of the original version is retained on all copies. User
documentation of any code that uses this code or any derivative code must cite the
Copyright, this License, the Availability note, and "Used by permission." If this code or
any derivative code is accessible from within M, then typing "help taucs" must
cite the Copyright, and "type taucs" must also cite this License and the Availability
note. Permission to modify the code and to distribute modified code is granted,
provided the Copyright, this License, and the Availability note are retained, and a
notice that the code was modified is included. This software is provided to you free
of charge.
The distribution also includes the AMD symmetric ordering routines, which come under a
different, more restrictive license. Please consult this license in the source files (say
src/amdtru.f).
You can compile and use the library without these routines if you cannot accept their license.
2 Installation and Configuration
This section explains how to build and how to configure it to suit different needs and
different platforms. The configuration and build system described here was introduced in Ver-
sion 2.1 of . This system simplifies the installation process, allows the user to control the
configuration of the library, and allows the user to maintain in a single diretory tree builds with
different configurations and for different platforms.
2.1 Quick Start
In the top-level directory that you unpacked, type configure and then type make (or nmake on
windows). This should build the library and a few test programs. If this fails, or if you need to
tune the library to your needs, you’ll have to read a bit more. If this succeeds, the build process
will put the resulting library in the directory
lib/OSTYPE,whereOSTYPE stands for the operating
system, such as
win32 (Windows), darwin (MacOS X), linux, solaris, irix, aix,andsoon.
Test programs will reside in
bin/OSTYPE, and object files, which you probably do not need ,will
be stored in
obj/OSTYPE.
- 1
- 2
- 3
- 4
- 5
- 6
前往页