oligoFAR 3.101 03-NOV-2009 1-NCBI
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
This file may be obsolete and will be removed - see man/oligofar.*
for documentation
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
NAME
oligoFAR version 3.101 - global alignment of single or paired short reads
SYNOPSIS
usage: [-hV] [--help[=full|brief|extended]] [-U version]
[short-read-options] [-0 qbase] [-d genomedb] [-b snpdb] [-g guidefile]
[-v featfile] [-l gilist|-y seqID] [--hash-bitmap-file=file]
[-o output] [-O -eumxtdhz] [-B batchsz] [-s 1|2|3] [-k skipPos]
[--pass0 hash-options] [--pass1 hash-options]
[-a maxamb] [-A maxamb] [-P phrap] [-F dust] [-X xdropoff] [-Y bandhw]
[-I idscore] [-M mismscore] [-G gapscore] [-Q gapextscore]
[-D minPair[-maxPair]] [-m margin] [-R geometry]
[-p cutoff] [-x dropoff] [-u topcnt] [-t toppct] [-L memlimit] [-T +|-]
[--NaHSO3=yes|no]
where hash-options are:
[-w win[/word]] [-N wcnt] [-f wstep] [-r wstart] [-S stride] [-H bits]
[-n mism] [-e gaps] [-j ins] [-J del] [-E dist]
[--add-splice=pos([min:]max)] [--longest-del=val] [--longest-ins=val]
[--max-inserted=val] [--max-deleted=val]
and short-read-options are:
[-i reads.col] [-1 reads1] [-2 reads2] [-q 0|1|4] [-c yes|no]
EXAMPLES
oligofar -i pairs.tbl -d contigs.fa -b snpdb.bdb -l gilist -g pairs.guide \
-w 20/12 -B 250000 -H32 -n2 -p90 -D100-500 -m50 -Rp \
-L16G -o output -Omx
INPUT FORMAT OPTIONS
The following combinations of input format and data flags are allowed:
1. with column file:
-q0 -i input.col -c no
-q1 -i input.col -c no
-q0 -i input.col -c yes
2. with fasta or fastq files:
-q0 -1 reads1.fa [-2 reads2.fa] -c yes|no
-q1 -1 reads1.faq [-2 reads2.faq] -c no
3. with Solexa 4-channel data
-q4 -i input.id -1 reads1.prb [-2 reads2.prb] -c no
See options and file formats for more info.
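The allowed combinations listed above can be written down as a small
lookup table. The Python sketch below is only an illustration: the names
ALLOWED and combination_allowed are hypothetical and are not part of
oligoFAR, and the table simply restates the list in this section.

```python
# Key: (value of -q, is -i given, is -1 given); value: allowed -c settings.
# The table restates the combinations listed in INPUT FORMAT OPTIONS.
ALLOWED = {
    (0, True, False): {"yes", "no"},   # column file, no quality
    (1, True, False): {"no"},          # column file, 1-channel quality
    (0, False, True): {"yes", "no"},   # fasta file(s)
    (1, False, True): {"no"},          # fastq file(s)
    (4, True, True): {"no"},           # Solexa 4-channel data
}


def combination_allowed(q, has_i, has_1, c):
    """Return True if the flag combination appears in the list above."""
    return c in ALLOWED.get((q, has_i, has_1), set())
```

For instance, combination_allowed(1, True, False, "yes") is False, since
-q1 with a column file is listed only with -c no.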
CHANGES
The following parameters are new, have changed, or have been removed
in version 3.25: -n, -w, -N, -S, -x, -f, -R
in version 3.26: -n, -w, -N, -z, -Z, -D, -m, -S, -x, -f, -k
in version 3.27: -n, -w, -e, -H, -S, -a, -A, --pass0, --pass1
in version 3.28: -y, -R, -N
in version 3.29: --NaHSO3 (Development)
in version 3.91: -X -Y -r -O --NaHSO3
in version 3.98: -x -g -O -B
in version 3.100: -v
in version 3.101: -i -1 -2 -q -O
DESCRIPTION
Performs global alignments of multiple single or paired short reads
with noticeable error rate to a genome or to a set of transcripts
provided in a blast-db or a fasta file.
Reads may be provided as IUPACna base calls, possibly accompanied
by phrap scores (referred to below as 1-channel quality scores),
or as 4-channel Solexa scores. The input file format is described
below in section FILE FORMATS.
Output of srsearch (referred to below as a guide-file), or of a similar
program which performs exact or nearly exact short read alignment,
may be used as input for oligoFAR. This lets oligoFAR skip processing
of perfectly matched reads while still formatting those matches in the
output uniformly with its own matches.
Input is processed in batches whose size is controlled by option -B.
Reads to match are hashed (one window per read unless option -N is used,
preferably at the 5' end) with a window size controlled by option -w.
Option -n controls how many mismatches are allowed within hashed values.
Option -a controls how many ambiguous bases within a window of a read
may be hashed independently of the mismatches allowed. Low quality 3' ends
of the reads may be clipped. Low complexity reads (controlled by the -F
argument) and low quality reads may be ignored.
OligoFAR may use different implementations of the hash table (see -H):
vector (uses a lot of memory, but is faster for big batches) and
arraymap (lower memory requirements for smaller batches).
For vector, -L should always be used and set to a large value (gigabytes).
The database is scanned. If the database is provided as a blastdb, it is
possible to limit the scan to a set of gis with option -l. If snpdb is
provided, all variants of alleles are used to compute hash values, as well
as regular IUPACna ambiguities of the sequences in the database. Option -A
controls the maximum number of ambiguities in the same window.
Alignments are seeded by hash and may be extended by the Smith-Waterman
algorithm (unless -X0 or -Y0 is used).
Alignments are filtered (see -p option). For paired reads geometrical
constraints are applied (reads of the same pair should be mutually
oriented according to the -R option; distance is set by the -D and -m
options). Then hits are ranked by score (hits of the same score have the
same rank, best hits have rank 0). Weak hits or too repetitive hits are
thrown away (see -t and -u options).
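The ranking rule described above (hits of equal score share a rank, best
hits get rank 0) can be sketched in Python as follows. This is only an
illustration under assumptions: the function name rank_hits and the
(hit_id, score) tuple layout are hypothetical, dense ranking is assumed
(the text does not say whether ranks skip after ties), and the -t and -u
filters are not modelled.

```python
def rank_hits(hits):
    """Assign ranks to hits: best score gets rank 0, ties share a rank.

    `hits` is a list of (hit_id, score) tuples. This layout is an
    assumption made for the example, not oligoFAR's data structure.
    """
    # Distinct scores, best first; the index of a score is its rank.
    distinct = sorted({score for _, score in hits}, reverse=True)
    rank_of = {score: rank for rank, score in enumerate(distinct)}
    # Keep the input order and attach the rank to every hit.
    return [(hit_id, score, rank_of[score]) for hit_id, score in hits]
```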
At the end of each batch both alignments produced by oligoFAR and
alignments imported from guide-file which have passed filtering and
ranking get printed to output file (if set) or stdout (see FILE FORMATS
for output format).
NOTE
Since oligoFAR is a global alignment tool, independent runs against, say,
individual chromosomes and a run against the full genome will produce
different results.
To save disk space and computational resources, oligoFAR ranks hits by
score and reports only the best hits and ties to the best hits.
In the two-pass mode tie hits may be incompletely reported. In this
case only hits of the same score as the best are guaranteed to appear in
the output, no matter what value of -t is set.
Scores of reported hits are given as a percentage of the best score
theoretically possible for the reads. Scores of paired hits are sums of
individual scores, so they may be as high as 200.
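The percent scoring described above can be illustrated with a short
Python sketch. It rests on an assumption that is not stated in this
document: the best theoretically possible score is taken to be the read
length times the identity score (cf. -I idscore), ignoring base
qualities. The names percent_score and pair_score are hypothetical.

```python
def percent_score(raw_score, read_length, id_score=1.0):
    """Express a raw alignment score as a percentage of the best score.

    Assumption for this sketch: the best possible score is
    read_length * id_score, i.e. every base is an identity.
    """
    best_possible = read_length * id_score
    return 100.0 * raw_score / best_possible


def pair_score(percent_a, percent_b):
    """A paired hit's score is the sum of its two component scores."""
    return percent_a + percent_b
```

Under this assumption two perfectly matching reads of a pair each score
100, and pair_score(100.0, 100.0) gives the maximum of 200.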
PAIRED READS
Pairs are looked up subject to the following requirements:
- relative orientation (geometry) which may be set by --geometry or -R
(see section OPTIONS subsection ``Filtering and ranking options'')
- the distance between the lowest position of the two reads and the
highest position of the two reads should be in range [ $a - $m ; $b + $m ]
where $a, $b and $m are arguments of parameters -D $a-$b -m $m.
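The distance requirement above can be sketched in Python. This is an
illustration only: pair_distance_ok is a hypothetical name, the distance
is assumed to be a plain difference of coordinates (whether oligoFAR
counts it inclusively is not stated here), and the orientation check of
-R is not modelled.

```python
def pair_distance_ok(lo1, hi1, lo2, hi2, a, b, m):
    """Check the -D a-b -m m distance constraint for two placed reads.

    lo1/hi1 and lo2/hi2 are the lowest and highest positions of each
    read on the subject sequence (hypothetical argument layout).
    """
    # Distance between the lowest and the highest position of the pair.
    span = max(hi1, hi2) - min(lo1, lo2)
    return a - m <= span <= b + m
```

With -D100-500 -m50, as in EXAMPLES, the accepted range is [ 50 ; 550 ],
so a pair spanning 335 positions passes and one spanning 635 does not.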
If a pair has no hits which comply with the constraints mentioned above,
individual hits for the pair components will still be reported. Also, for
each component, unpaired hits better than the best paired hit will be
reported.
Paired reads have one ID per pair. Individual reads in this case do not
have individual IDs, although the report indicates which component(s) of
the pair produced the hit.
SODIUM BISULFITE TREATMENT
To discover the methylation state of DNA, sodium bisulfite treatment may
be used before producing reads. To simulate this procedure, oligoFAR has
a special mode, which may be turned on by:
--NaHSO3=true
It is advised to use longer words and windows in this mode for better
performance.
This mode is not compatible with colorspace computations.
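For orientation, the chemistry being simulated can be sketched in
Python: bisulfite treatment converts unmethylated cytosine to uracil,
which is read as T. The function below is a hypothetical illustration of
that conversion for fully unmethylated DNA. How oligoFAR implements this
mode internally is not described here.

```python
def bisulfite_convert(seq, strand="+"):
    """Simulate full bisulfite conversion of an unmethylated sequence.

    On the forward strand every C reads as T. For reads derived from
    the reverse strand the same change appears as G -> A when written
    in forward-strand coordinates.
    """
    seq = seq.upper()
    if strand == "+":
        return seq.replace("C", "T")
    return seq.replace("G", "A")
```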
MULTIPASS MODE
By default oligoFAR aligns all reads just once, but if option --pass1 is
used, oligoFAR switches to the two-pass mode. Parameters -w, -n, -e, -H,
and some others, when preceding --pass1 or following --pass0, affect the
first run; the same parameters following --pass1 apply to the second run.
In the second run only reads (or pairs) having more mismatches or indels
than allowed by the first-pass parameters will be hashed and aligned. So
using something like:
oligofar --pass0 -w22/22 -n0 -e0 --pass1 -w22/13 -n2 -e1
will pick up exact matches first, and then run search with less strict
parameters only for those reads which did not have exact hits.
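The selection of reads for the second pass, as described above, can be
sketched in Python. This is an illustration under assumptions: the
function name reads_for_second_pass and the dictionary layout are
hypothetical, and reads with no first-pass hit at all are assumed to be
passed on as well.

```python
def reads_for_second_pass(best_hits, max_mism, max_gaps):
    """Return IDs of reads that still need the second, less strict pass.

    `best_hits` maps read ID -> (mismatches, indels) of the read's best
    first-pass hit, or None if the read had no hit (assumed layout).
    max_mism and max_gaps stand for the first-pass -n and -e values.
    """
    selected = []
    for read_id, hit in best_hits.items():
        # Re-run reads with no hit or with a hit beyond first-pass limits.
        if hit is None or hit[0] > max_mism or hit[1] > max_gaps:
            selected.append(read_id)
    return selected
```

With first-pass settings -n0 -e0, a read whose best hit is exact is left
out, while reads with one mismatch or with no hit go to the second pass.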
WINDOW