# JCVI utility libraries
[![DOI](https://zenodo.org/badge/doi/10.5281/zenodo.31631.svg)](https://doi.org/10.5281/zenodo.594205)
[![Latest PyPI
version](https://img.shields.io/pypi/v/jcvi.svg)](https://pypi.python.org/pypi/jcvi)
[![Github Actions](https://github.com/tanghaibao/jcvi/workflows/build/badge.svg)](https://github.com/tanghaibao/jcvi/actions)
Collection of Python libraries to parse bioinformatics files, or perform
computation related to assembly, annotation, and comparative genomics.
| | |
| ------- | ---------------------------------------------------------------- |
| Authors | Haibao Tang ([tanghaibao](http://github.com/tanghaibao)) |
| | Vivek Krishnakumar ([vivekkrish](https://github.com/vivekkrish)) |
| | Jingping Li ([Jingping](https://github.com/Jingping)) |
| | Xingtan Zhang ([tangerzhang](https://github.com/tangerzhang)) |
| Email | <tanghaibao@gmail.com> |
| License | [BSD](http://creativecommons.org/licenses/BSD/) |
## Citations
- If you use the MCscan pipeline for synteny inference, please cite:
_Tang et al. (2008) Synteny and Collinearity in Plant Genomes. [Science](https://science.sciencemag.org/content/320/5875/486)_
![MCSCAN example](https://www.dropbox.com/s/9vl3ys3ndvimg4c/grape-peach-cacao.png?raw=1)
- If you use the ALLMAPS pipeline for genome scaffolding, please cite:
_Tang et al. (2015) ALLMAPS: robust scaffold ordering based on multiple maps. [Genome Biology](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0573-1)_
![ALLMAPS animation](https://www.dropbox.com/s/jfs8xavcxix37se/ALLMAPS.gif?raw=1)
- For other uses, please cite the package directly:
_Tang et al. (2015). jcvi: JCVI utility libraries. Zenodo. [10.5281/zenodo.31631](http://dx.doi.org/10.5281/zenodo.31631)_
![GRABSEEDS example](https://www.dropbox.com/s/yu9ehsi6sqifuaa/bluredges.png?raw=1)
## Contents
The following modules provide generic bioinformatics handling methods.
- <kbd>algorithms</kbd>
  - Linear programming solver with SCIP and GLPK.
  - Supermap: find set of non-overlapping anchors in BLAST or NUCMER output.
  - Longest or heaviest increasing subsequence.
  - Matrix operations.
- <kbd>apps</kbd>
  - GenBank entrez accession, Phytozome, Ensembl and SRA downloader.
  - Calculate (non)synonymous substitution rate between gene pairs.
  - Basic phylogenetic tree construction using PHYLIP, PhyML, or RAxML, and visualization.
  - Wrapper for BLAST+, LASTZ, LAST, BWA, BOWTIE2, CLC, CDHIT, CAP3, etc.
- <kbd>formats</kbd>

  Currently supports `.ace` format (phrap, cap3, etc.), `.agp`
  (goldenpath), `.bed` format, `.blast` output, `.btab` format,
  `.coords` format (`nucmer` output), `.fasta` format, `.fastq`
  format, `.fpc` format, `.gff` format, `.obo` format (ontology),
  `.psl` format (UCSC blat, GMAP, etc.), `.posmap` format (Celera
  assembler output), `.sam` format (read mapping), `.contig`
  format (TIGR assembly format), etc.

- <kbd>graphics</kbd>
  - BLAST or synteny dot plot.
  - Histogram using R and ASCII art.
  - Paint regions on set of chromosomes.
  - Macro-synteny and micro-synteny plots.
- <kbd>utils</kbd>
  - Grouper can be used as a disjoint set data structure.
  - range contains common range operations, like overlap
    and chaining.
  - Miscellaneous cookbook recipes, iterators, decorators,
    table utilities.
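
To illustrate what "disjoint set data structure" means in the <kbd>utils</kbd> entry above, here is a minimal union-find sketch. It is a standalone illustration, not the library's own `Grouper` code: the class name `DisjointSet`, its method names, and the gene identifiers are all assumptions made for this example.

```python
class DisjointSet:
    """Minimal union-find with path compression (illustrative only)."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        # Unseen items start out as their own singleton group.
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node directly at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def join(self, first, *others):
        # Merge the groups of all given items into the group of `first`.
        root = self.find(first)
        for other in others:
            self.parent[self.find(other)] = root

    def joined(self, a, b):
        # True when both items belong to the same group.
        return self.find(a) == self.find(b)

    def groups(self):
        # Collect members under their root and return the groups as lists.
        out = {}
        for item in self.parent:
            out.setdefault(self.find(item), []).append(item)
        return list(out.values())


ds = DisjointSet()
ds.join("geneA", "geneB")
ds.join("geneB", "geneC")
ds.join("geneX", "geneY")
print(ds.joined("geneA", "geneC"))
print(sorted(sorted(group) for group in ds.groups()))
```

A structure like this is handy for clustering, for example collecting genes linked by pairwise homology into families.
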
Then there are modules that contain domain-specific methods.
- <kbd>assembly</kbd>
  - K-mer histogram analysis.
  - Preparation and validation of tiling path for clone-based assemblies.
  - Scaffolding through ALLMAPS, optical map and genetic map.
  - Pre-assembly and post-assembly QC procedures.
- <kbd>annotation</kbd>
  - Training of _ab initio_ gene predictors.
  - Calculate gene, exon and intron statistics.
  - Wrapper for PASA and EVM.
  - Launch multiple MAKER processes.
- <kbd>compara</kbd>
  - C-score based BLAST filter.
  - Synteny scan (de-novo) and lift over (find nearby anchors).
  - Ancestral genome reconstruction using Sankoff's and PAR method.
  - Ortholog and tandem gene duplicates finder.
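
As a rough illustration of the "C-score based BLAST filter" idea listed under <kbd>compara</kbd>, the sketch below assumes the commonly used definition: the C-score of a hit is its score divided by the larger of the best scores seen for its query and for its subject. The function name `cscore_filter`, the `(query, subject, score)` tuple layout, and the 0.7 cutoff are assumptions for this example and are not taken from the package.

```python
from collections import defaultdict


def cscore_filter(hits, cutoff=0.7):
    """Keep hits whose C-score is at least `cutoff`.

    `hits` is a list of (query, subject, score) tuples with positive scores.
    Assumed definition: score / max(best score of query, best score of subject).
    """
    best_query = defaultdict(float)
    best_subject = defaultdict(float)
    # First pass: record the best score for every query and every subject.
    for query, subject, score in hits:
        best_query[query] = max(best_query[query], score)
        best_subject[subject] = max(best_subject[subject], score)

    # Second pass: keep hits that are close to the best for both partners.
    kept = []
    for query, subject, score in hits:
        cscore = score / max(best_query[query], best_subject[subject])
        if cscore >= cutoff:
            kept.append((query, subject, score))
    return kept


hits = [("a1", "b1", 100.0), ("a1", "b2", 60.0), ("a2", "b2", 80.0)]
print(cscore_filter(hits))
```

In this toy input, the `a1`/`b2` hit is dropped because both genes have a clearly better partner elsewhere.
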
## Applications
Please visit [wiki](https://github.com/tanghaibao/jcvi/wiki) for
full-fledged applications.
## Dependencies
The following third-party Python packages are used by some routines in
the library. These dependencies are _not_ mandatory, since only a few
modules use them.
- [Biopython](http://www.biopython.org)
- [numpy](http://numpy.scipy.org)
- [matplotlib](http://matplotlib.org/)
Various scripts import other Python modules as well. The simplest
approach is to install them via `pip install` when you see an
`ImportError`.
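
If you would rather check up front than wait for an `ImportError`, a small sketch along these lines can report which optional packages are missing. The helper name `missing_packages` and the import-name to pip-name mapping are assumptions made for this example.

```python
import importlib.util

# Import name -> name to pass to `pip install` (assumed mapping).
OPTIONAL = {
    "Bio": "biopython",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
}


def missing_packages(modules=OPTIONAL):
    """Return the pip names of top-level modules that cannot be found."""
    return [
        pip_name
        for import_name, pip_name in modules.items()
        if importlib.util.find_spec(import_name) is None
    ]


missing = missing_packages()
if missing:
    print("Consider running: pip install " + " ".join(missing))
else:
    print("All optional dependencies found.")
```
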
## Installation
The easiest way is to install it via PyPI:
```console
pip install jcvi
```
To install the development version:
```console
pip install git+https://github.com/tanghaibao/jcvi.git
```
Alternatively, if you want to install manually:
```console
cd ~/code  # or any directory of your choice
git clone https://github.com/tanghaibao/jcvi.git
cd jcvi
pip install -e .
```
In addition, a few modules might ask for the locations of external
programs if the executables cannot be found in your `PATH`. The external
programs that are often used are:
- [Kent tools](http://hgdownload.cse.ucsc.edu/admin/jksrc.zip)
- [BEDTOOLS](http://code.google.com/p/bedtools/)
- [EMBOSS](http://emboss.sourceforge.net/)
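
To see ahead of time which external programs are visible on your `PATH`, a check like the following may help. The helper name `locate_programs` is made up for this example, and the executable names (`faSize` from the Kent tools, `bedtools`, and `needle` from EMBOSS) are only representative picks; the tools a given module needs may differ.

```python
import shutil


def locate_programs(names):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


# Representative executables only (an assumption, adjust to your needs).
for name, path in locate_programs(["faSize", "bedtools", "needle"]).items():
    print(f"{name}: {path or 'not found in PATH'}")
```
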
Most of the scripts in this package contain multiple actions. Take
`fasta` as an example:
```console
Usage:
python -m jcvi.formats.fasta ACTION
Available ACTIONs:
clean | Remove irregular chars in FASTA seqs
diff | Check if two fasta records contain same information
extract | Given fasta file and seq id, retrieve the sequence in fasta format
fastq | Combine fasta and qual to create fastq file
filter | Filter the records by size
format | Trim accession id to the first space or switch id based on 2-column mapping file
fromtab | Convert 2-column sequence file to FASTA format
gaps | Print out a list of gap sizes within sequences
gc | Plot G+C content distribution
identical | Given 2 fasta files, find all exactly identical records
ids | Generate a list of headers
info | Run `sequence_info` on fasta files
ispcr | Reformat paired primers into isPcr query format
join | Concatenate a list of seqs and add gaps in between
longestorf | Find longest orf for CDS fasta
pair | Sort paired reads to .pairs, rest to .fragments
pairinplace | Starting from fragment.fasta, find if adjacent records can form pairs
pool | Pool a bunch of fastafiles together and add prefix
qual | Generate dummy .qual file based on FASTA file
random | Randomly take some records
sequin | Generate a gapped fasta file for sequin submission
simulate | Simulate random fasta file for testing
some | Include or exclude a list of records (also performs on .qual file if available)
sort | Sort the records by IDs, sizes, etc.
summary | Report the real no of bases and N's in fasta files
tidy | Normalize gap sizes and remove small components in fasta
translate | Translate CDS to proteins
trim | Given a cross_match screened fasta, trim the sequence
trimsplit | Split sequences at lower-cased letters
uniq | Remove records that are the same
```
When you need to use one action, you can just do:
```console
python -m jcvi.formats.fasta extract
```
This will tell you the options and arguments it expects.
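
For readers curious how a single script can expose many actions, here is a minimal sketch of the general dispatch pattern. It is not the package's actual implementation: the toy actions, the `ACTIONS` table, and the `dispatch` function are hypothetical and only mimic the usage listing shown above.

```python
def ids(lines):
    """Generate a list of headers"""
    # Take the first word after ">" on each non-empty FASTA header line.
    return [ln[1:].split()[0] for ln in lines if ln.startswith(">") and ln[1:].strip()]


def count(lines):
    """Count the records"""
    return sum(1 for ln in lines if ln.startswith(">"))


# Action name -> function; docstrings double as the help text.
ACTIONS = {"ids": ids, "count": count}


def usage():
    rows = ["Available ACTIONs:"]
    for name, func in sorted(ACTIONS.items()):
        rows.append(f"    {name:<6} | {func.__doc__}")
    return "\n".join(rows)


def dispatch(argv):
    """Run the action named by argv[0], or return the usage text."""
    if not argv or argv[0] not in ACTIONS:
        return usage()
    return ACTIONS[argv[0]](argv[1:])


print(dispatch([]))
print(dispatch(["ids", ">seq1 first record", "ACGT", ">seq2"]))
```

Calling the script with no action (or an unknown one) prints the listing, which is the same behavior you see when running a module without an `ACTION`.
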
**Feel free to check out other scripts in the package; it is not just
for FASTA.**