# vcflib
### A C++ library for parsing and manipulating VCF files.
#### author: Erik Garrison <erik.garrison@bc.edu>
#### license: MIT
[![Gitter](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/ekg/vcflib?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) [![Build Status](https://travis-ci.org/vcflib/vcflib.svg?branch=master)](https://travis-ci.org/vcflib/vcflib)
---
## overview
The [Variant Call Format (VCF)](http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41)
is a flat-file, tab-delimited textual format
intended to concisely describe reference-indexed variations between individuals.
VCF provides a common interchange format for the description of variation in individuals and populations of samples,
and has become the _de facto_ standard reporting format for a wide array of genomic variant detectors.
vcflib provides methods to manipulate and interpret sequence variation as it can be described by VCF.
It is both:
* an API for parsing and operating on records of genomic variation as it can be described by the VCF format,
* and a collection of command-line utilities for executing complex manipulations on VCF files.
The API itself provides a quick and extremely permissive method to read and write VCF files.
For most users, the bulk of the library's value lies in the included utilities (*.cpp), which extend and apply the API.
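
To make the tab-delimited layout described above concrete, here is a minimal Python sketch that splits one VCF data line into its eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It is not vcflib's API: the names `VcfRecord` and `parse_vcf_line` are hypothetical, sample columns are ignored, and the example line is made up for illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class VcfRecord(NamedTuple):
    """The eight fixed VCF columns (hypothetical container, not vcflib's)."""
    chrom: str
    pos: int
    variant_id: str
    ref: str
    alts: List[str]
    qual: Optional[float]
    filters: str
    info: Dict[str, str]


def parse_vcf_line(line: str) -> VcfRecord:
    """Split one tab-delimited VCF data line into its fixed fields."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt, qual, filters, info = fields[:8]
    info_map: Dict[str, str] = {}
    if info != ".":
        for entry in info.split(";"):
            # Flag entries such as "DB" carry no value and map to "".
            key, _, value = entry.partition("=")
            info_map[key] = value
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        variant_id=variant_id,
        ref=ref,
        alts=alt.split(","),  # multiple alternates are comma-separated
        qual=None if qual == "." else float(qual),
        filters=filters,
        info=info_map,
    )


record = parse_vcf_line("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;DB")
print(record.ref, record.alts, record.info["DP"])  # prints: G ['A'] 14
```

A permissive reader in this spirit keeps INFO values as strings and leaves their interpretation to the caller.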
## download and install
1. Under the repository name, click to copy the clone URL for the repository. ![](https://help.github.com/assets/images/help/repository/clone-repo-clone-url-button.png)
2. Go to the location where you want the cloned directory to be made: `cd <PathWhereIWantToCloneVcflib>`
3. Type `git clone --recursive`, and then paste the URL you copied in Step 1.
4. Enter the cloned directory and type `make` to compile the programs. If you want to use threading, type `make openmp` instead of `make`. Only a few vcflib tools are threaded.
5. Once `make` has finished, the executables are ready in the folder `<PathWhereIWantToCloneVcflib>/vcflib/bin/`. Either add this folder to the `PATH` environment variable in your `.bashrc` file so that the executables are accessible from everywhere in your profile, or call the executables from the path where they are.
## usage
vcflib provides a variety of functions for VCF manipulation:
### comparison
* Generate **haplotype-aware intersections** ([vcfintersect](#vcfintersect) -i), **unions** (vcfintersect -u), and **complements** (vcfintersect -v -i).
* **Overlay-merge** multiple VCF files together, using provided order as precedence ([vcfoverlay](#vcfoverlay)).
* **Combine** multiple VCF files together, handling samples when alternate allele descriptions are identical ([vcfcombine](#vcfcombine)).
* **Validate** the integrity and identity of the VCF by verifying that the VCF record's REF matches a given reference file ([vcfcheck](#vcfcheck)).
### format conversion
* Convert a VCF file into a per-allele or per-genotype **tab-separated (.tsv)** file ([vcf2tsv](#vcf2tsv)).
* Store a VCF file in an **SQLite3** database (vcf2sqlite.py).
* Make a **BED file** from the intervals in a VCF file (vcf2bed.py).
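
As an illustration of what turning VCF records into BED intervals involves, the following Python sketch maps a record to the span covered by its REF allele. It is an assumption about one reasonable convention, not a description of what `vcf2bed.py` actually does: VCF positions are 1-based, BED intervals are 0-based and half-open, and the function name `vcf_line_to_bed` is hypothetical.

```python
def vcf_line_to_bed(line: str) -> str:
    """Return a three-column BED line covering the REF allele of a VCF record."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]
    start = pos - 1          # 1-based VCF POS -> 0-based BED start
    end = start + len(ref)   # half-open end, spanning the reference bases
    return f"{chrom}\t{start}\t{end}"


# Made-up records: a SNP and a two-base deletion.
vcf_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "20\t14370\t.\tG\tA\t29\tPASS\t.\n",
    "20\t1234567\t.\tGTC\tG\t50\tPASS\t.\n",
]
bed_lines = [vcf_line_to_bed(l) for l in vcf_lines if not l.startswith("#")]
print("\n".join(bed_lines))
```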
### filtering and subsetting
* **Filter** variants and genotypes using arbitrary expressions based on values in the INFO and sample fields ([vcffilter](#vcffilter)).
* **Randomly sample** a subset of records from a VCF file, given a rate ([vcfrandomsample](#vcfrandomsample)).
* **Select variants** of a certain type (vcfsnps, vcfbiallelic, vcfindels, vcfcomplex, etc.)
### annotation
* **Annotate** one VCF file with fields from the INFO column of another, based on position ([vcfaddinfo](#vcfaddinfo), [vcfintersect](#vcfintersect)).
* Incorporate annotations or targets provided by a *BED* file ([vcfannotate](#vcfannotate), [vcfintersect](#vcfintersect)).
* Examine **genotype correspondence** between two VCF files by annotating samples in one file with genotypes from another ([vcfannotategenotypes](#vcfannotategenotypes)).
* Annotate variants with the **distance** to the nearest variant ([vcfdistance](#vcfdistance)).
* Count the number of alternate alleles represented in samples at each variant record ([vcfaltcount](#vcfaltcount)).
* **Subset INFO fields** to decrease file size and processing time ([vcfkeepinfo](#vcfkeepinfo)).
* Lighten up VCF files by keeping only a **subset of per-sample information** ([vcfkeepgeno](#vcfkeepgeno)).
* **Numerically index** alleles in a VCF file ([vcfindex](#vcfindex)).
### samples
* Quickly obtain the **list of samples** in a given VCF file ([vcfsamplenames](#vcfsamplenames)).
* **Remove samples** from a VCF file ([vcfkeepsamples](#vcfkeepsamples), [vcfremovesamples](#vcfremovesamples)).
### ordering
* **Sort variants** by genome coordinate ([vcfstreamsort](#vcfstreamsort)).
* **Remove duplicate** variants in vcfstreamsort'ed files according to their REF and ALT fields ([vcfuniq](#vcfuniq)).
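
The reason duplicate removal wants sorted input can be sketched in a few lines of Python: once identical variants are adjacent, a single pass that remembers only the previous record is enough. This is a sketch of the idea, not vcfuniq's implementation. The function name is hypothetical, and the choice to compare CHROM and POS alongside REF and ALT is an assumption.

```python
from typing import Iterable, Iterator, Optional, Tuple


def drop_adjacent_duplicates(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and the first of each run of identical variants."""
    previous_key: Optional[Tuple[str, str, str, str]] = None
    for line in lines:
        if line.startswith("#"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], fields[1], fields[3], fields[4])  # CHROM, POS, REF, ALT
        if key != previous_key:
            yield line
        previous_key = key


# Made-up, already sorted records; the second one repeats the first.
sorted_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "20\t14370\t.\tG\tA\t29\tPASS\t.\n",
    "20\t14370\t.\tG\tA\t31\tPASS\t.\n",
    "20\t14370\t.\tG\tT\t12\tPASS\t.\n",
    "20\t17330\t.\tT\tA\t3\tq10\t.\n",
]
unique_lines = list(drop_adjacent_duplicates(sorted_lines))
print(len(unique_lines))
```

Because only one key is held in memory, the pass works on streams of any length, which suits use inside a pipe.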
### variant representation
* **Break multiallelic** records into multiple records ([vcfbreakmulti](#vcfbreakmulti)), retaining allele-specific INFO fields.
* **Combine overlapping biallelic** records into a single record ([vcfcreatemulti](#vcfcreatemulti)).
* **Decompose complex variants** into a canonical SNP and indel representation ([vcfallelicprimitives](#vcfallelicprimitives)), generating phased genotypes for available samples.
* **Reconstitute complex variants** provided a phased VCF with samples ([vcfgeno2haplo](#vcfgeno2haplo)).
* **Left-align indel and complex variants** ([vcfleftalign](#vcfleftalign)).
### genotype manipulation
* **Set genotypes** in a VCF file provided genotype likelihoods in the GL field ([vcfglxgt](#vcfglxgt)).
* Establish putative **somatic variants** using reported differences between germline and somatic samples ([vcfsamplediff](#vcfsamplediff)).
* Remove samples for which the reported genotype (GT) and the observation counts (AO, RO) disagree ([vcfremoveaberrantgenotypes](#vcfremoveaberrantgenotypes)).
### interpretation and classification of variants
* Obtain aggregate **statistics** about VCF files ([vcfstats](#vcfstats)).
* Print the **receiver-operating characteristic (ROC)** of one VCF given a truth set ([vcfroc](#vcfroc)).
* Annotate VCF records with the **Shannon entropy** of flanking sequence ([vcfentropy](#vcfentropy)).
* Calculate the heterozygosity rate ([vcfhetcount](#vcfhetcount)).
* Generate potential **primers** from VCF records ([vcfprimers](#vcfprimers)), to check for genome uniqueness.
* Convert the numerical representation of genotypes provided by the GT field to a **human-readable genotype format** ([vcfgenotypes](#vcfgenotypes)).
* Observe how different alignment parameters, including context and entropy-dependent ones, influence **variant classification and interpretation** ([vcfremap](#vcfremap)).
* **Classify variants** by annotations in the INFO field using a self-organizing map ([vcfsom](#vcfsom)); **re-estimate their quality** given known variants.
A number of "helper" Perl and Python scripts (e.g. vcf2bed.py, vcfbiallelic) further extend functionality.
In practice, users are encouraged to drive the utilities in the library in a streaming fashion, using pipes, to fully utilize resources on multi-core systems during interactive work. Piping also provides a convenient way to connect with other tools and libraries that read and write VCF (vcf-tools, BedTools, GATK, htslib, bcftools, freebayes), allowing the composition of an immense variety of processing functions.
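
A custom step can join such a pipe as long as it reads VCF lines and writes VCF lines. The Python sketch below shows the shape of a streaming filter that passes header lines through and keeps records whose QUAL meets a threshold. It is not one of the vcflib utilities: the function name `filter_by_qual` and the threshold are hypothetical, and the input lines are made up.

```python
from typing import Iterable, Iterator


def filter_by_qual(lines: Iterable[str], min_qual: float) -> Iterator[str]:
    """Lazily yield header lines plus records with QUAL >= min_qual."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep the header so downstream tools can parse the stream
            continue
        qual = line.split("\t")[5]  # QUAL is the sixth fixed column
        if qual != "." and float(qual) >= min_qual:
            yield line


# In-memory demonstration; records with a missing or low QUAL are dropped.
demo_lines = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "20\t14370\t.\tG\tA\t29\tPASS\t.\n",
    "20\t17330\t.\tT\tA\t3\tq10\t.\n",
    "20\t1110696\t.\tA\tG\t.\t.\t.\n",
]
kept = list(filter_by_qual(demo_lines, 20.0))
print("".join(kept), end="")
```

Because the function is a generator over any iterable of lines, passing `sys.stdin` and writing the result with `sys.stdout.writelines` would let the same code process one record at a time between two other programs in a pipe.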
## development
See src/vcfecho.cpp for basic usage. src/Variant.h and src/Variant.cpp describe methods available in the API.
vcflib is incorporated into several projects, such as [freebayes](https://github.com/ekg/freebayes), which may provide a point of reference for prospective developers.
Additionally, developers should be aware that vcflib contains submodules (git repositories) comprising its dependencies.