Copyright (C) 2018 Howard Hughes Medical Institute.
HMMER and its documentation are freely distributed under the 3-Clause BSD open source license. For a copy of the license, see opensource.org/licenses/BSD-3-Clause.
HMMER development is supported in part by the National Human Genome Research Institute of the US National Institutes of Health under grant number R01HG009116. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Contents
Introduction 7
How to avoid reading this manual . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Background and brief history . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Problems HMMER is designed for . . . . . . . . . . . . . . . . . . . . . . . . . . 9
HMMER uses ensemble algorithms, not optimal alignment . . . . . . . . . . . . 10
Assumptions and limitations of profile HMMs . . . . . . . . . . . . . . . . . . . 12
How to learn more . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
How to cite HMMER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
How to report a bug . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
When’s HMMER4 coming? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
What’s still missing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
How to avoid using this software (links to similar software) . . . . . . . . . . . 15
Installation 17
Quickest: install a precompiled binary package . . . . . . . . . . . . . . . . . . 17
Quick-ish: compile the source code . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Geeky: compile source from our github repository . . . . . . . . . . . . . . . . . 18
Gory details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
  System requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
  Multicore parallelization is default . . . . . . . . . . . . . . . . . . . . . . . 20
  MPI cluster parallelization is optional . . . . . . . . . . . . . . . . . . . . . 21
  Using build directories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
  Makefile targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
  Compiling the user guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
  What gets installed by make install, and where? . . . . . . . . . . . . . . . . 22
  Installing both HMMER2 and HMMER3 . . . . . . . . . . . . . . . . . . . 23
  Seeing more output from make . . . . . . . . . . . . . . . . . . . . . . . . . . 24
  Staged installations in a buildroot, for a packaging system . . . . . . . . . 24
  Workarounds for unusual configure/compilation problems . . . . . . . . 24
Tutorial 27
Tap, tap; is this thing on? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
The programs in HMMER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Running a HMMER program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4 sean r. eddy
Files used in the tutorial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
On sequence file formats, briefly . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Searching a sequence database with a profile . . . . . . . . . . . . . . . . . . . . 30
  Step 1: build a profile with hmmbuild . . . . . . . . . . . . . . . . . . . . . . . 31
  Step 2: search the sequence database with hmmsearch . . . . . . . . . . . 32
Single sequence protein queries using phmmer . . . . . . . . . . . . . . . . . . . 41
Iterative protein searches using jackhmmer . . . . . . . . . . . . . . . . . . . . . 42
Searching a profile database with a query sequence . . . . . . . . . . . . . . . . 44
  Step 1: create a profile database file . . . . . . . . . . . . . . . . . . . . . . 44
  Step 2: compress and index the flatfile with hmmpress . . . . . . . . . . . 46
  Step 3: search the profile database with hmmscan . . . . . . . . . . . . . . 46
Creating multiple alignments with hmmalign . . . . . . . . . . . . . . . . . . . 48
Searching DNA sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
  Step 1: build a profile with hmmbuild . . . . . . . . . . . . . . . . . . . . . 50
  Step 2: search the DNA sequence database with nhmmer . . . . . . . . . . 51
The HMMER profile/sequence comparison pipeline 55
Null model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
MSV filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Biased composition filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Viterbi filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Forward filter/parser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Domain definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Modifications to the pipeline as used for DNA search . . . . . . . . . . . . . . . 63
  SSV, not MSV. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
  There are no domains, but there are envelopes . . . . . . . . . . . . . . . . 64
  Biased composition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Tabular output formats 65
The target hits table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
The domain hits table (protein search only) . . . . . . . . . . . . . . . . . . . . . 68
Manual pages for HMMER programs 71
alimask - calculate and add column mask to a multiple sequence alignment . . 71
hmmalign - align sequences to a profile . . . . . . . . . . . . . . . . . . . . . . . . 75
hmmbuild - construct profiles from multiple sequence alignments . . . . . . . . . 77
hmmconvert - convert profile file to various formats . . . . . . . . . . . . . . . . . 83
hmmemit - sample sequences from a profile . . . . . . . . . . . . . . . . . . . . . . 84
hmmfetch - retrieve profiles from a file . . . . . . . . . . . . . . . . . . . . . . . . . 87
hmmlogo - produce a conservation logo graphic from a profile . . . . . . . . . . . 89
hmmpgmd - daemon for database search web services . . . . . . . . . . . . . . . . . 90
hmmpress - prepare a profile database for hmmscan . . . . . . . . . . . . . . . . . 92
hmmscan - search sequence(s) against a profile database . . . . . . . . . . . . . . . 93
hmmsearch - search profile(s) against a sequence database . . . . . . . . . . . . . 98
hmmsim - collect profile score distributions on random sequences . . . . . . . . . 103
hmmer user’s guide 5
hmmstat - summary statistics for a profile file . . . . . . . . . . . . . . . . . . . . 109
jackhmmer - iteratively search sequence(s) against a sequence database . . . . . . 111
makehmmerdb - build nhmmer database from a sequence file . . . . . . . . . . . . 120
nhmmer - search DNA queries against a DNA sequence database . . . . . . . . . 121
nhmmscan - search DNA sequence(s) against a DNA profile database . . . . . . . 129
phmmer - search protein sequence(s) against a protein sequence database . . . . . 134
Manual pages for Easel miniapps 141
esl-afetch - retrieve alignments from a multi-MSA database . . . . . . . . . . . 141
esl-alimanip - manipulate a multiple sequence alignment . . . . . . . . . . . . . 143
esl-alimap - map two alignments to each other . . . . . . . . . . . . . . . . . . . 147
esl-alimask - remove columns from a multiple sequence alignment . . . . . . . 149
esl-alimerge - merge alignments based on their reference (RF) annotation . . . . 155
esl-alipid - calculate pairwise percent identities for all sequence . . . . . . . . . 157
esl-alirev - reverse complement a multiple alignment . . . . . . . . . . . . . . . 158
esl-alistat - summarize a multiple sequence alignment file . . . . . . . . . . . 160
esl-compalign - compare two multiple sequence alignments . . . . . . . . . . . . 163
esl-compstruct - calculate accuracy of RNA secondary structure predictions . . . 165
esl-construct - describe or create a consensus secondary structure . . . . . . . . 167
esl-histplot - collate data histogram, output xmgrace datafile . . . . . . . . . . 169
esl-mask - mask sequence residues with X’s (or other characters) . . . . . . . . . 170
esl-reformat - convert sequence file formats . . . . . . . . . . . . . . . . . . . . . 172
esl-selectn - select random subset of lines from file . . . . . . . . . . . . . . . . 175
esl-seqrange - determine a range of sequences for one of many parallel . . . . . 176
esl-seqstat - summarize contents of a sequence file . . . . . . . . . . . . . . . . 177
esl-sfetch - retrieve (sub-)sequences from a sequence file . . . . . . . . . . . . . 178
esl-shuffle - shuffling sequences or generating random ones . . . . . . . . . . . 181
esl-ssdraw - create postscript secondary structure diagrams . . . . . . . . . . . . 184
esl-translate - translate DNA sequence in six frames into individual . . . . . . 195
esl-weight - calculate sequence weights in MSA(s) . . . . . . . . . . . . . . . . . 198
Input files and formats 199
Reading from files, compressed files, and pipes . . . . . . . . . . . . . . . . . . 199
  .gz compressed files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
HMMER profile HMM files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
  header section . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
  main model section . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
Stockholm, the recommended multiple sequence alignment format . . . . . . . 208
  syntax of Stockholm markup . . . . . . . . . . . . . . . . . . . . . . . . . . 209
  semantics of Stockholm markup . . . . . . . . . . . . . . . . . . . . . . . . 209
  recognized #=GF annotations . . . . . . . . . . . . . . . . . . . . . . . . . . 210
  recognized #=GS annotations . . . . . . . . . . . . . . . . . . . . . . . . . . 210
  recognized #=GC annotations . . . . . . . . . . . . . . . . . . . . . . . . . . 211
  recognized #=GR annotations . . . . . . . . . . . . . . . . . . . . . . . . . . 211