# cblaster
[![Python package](https://github.com/gamcil/cblaster/actions/workflows/pythonapp.yml/badge.svg)](https://github.com/gamcil/cblaster/actions/workflows/pythonapp.yml)
[![codecov](https://codecov.io/gh/gamcil/cblaster/branch/master/graph/badge.svg?token=O61R3ORNDT)](https://codecov.io/gh/gamcil/cblaster)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![PyPI version](https://badge.fury.io/py/cblaster.svg)](https://badge.fury.io/py/cblaster)
[![Documentation Status](https://readthedocs.org/projects/cblaster/badge/?version=latest)](https://cblaster.readthedocs.io/en/latest/?badge=latest)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3660769.svg)](https://doi.org/10.5281/zenodo.3660769)
## Outline
`cblaster` is a tool for finding clusters of co-located homologous sequences
in BLAST searches.
<img src="docs/source/_static/workflow.png" alt="cblaster search workflow" width=600>
Given a collection of protein sequences, `cblaster` can search sequence databases
remotely (via the NCBI BLAST API) or locally (via `DIAMOND`). Search results are parsed
and filtered using user-defined thresholds for identity, coverage and e-value. The genomic
coordinates of the remaining hits are obtained from the NCBI's Identical Protein
Groups (IPG) database (or from a local database in local searches). Finally,
`cblaster` scans for clusters of co-located hits and generates visualisations:
<img src="docs/source/_static/results.png" alt="cblaster search results" width=700>
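The filtering and collocation steps described above can be illustrated with a short Python sketch. It is not `cblaster`'s implementation: the `Hit` record, the function names and all default values (30% identity, 50% coverage, e-value 0.01, a 20 kb maximum gap between neighbouring hits, at least 3 distinct queries per cluster) are assumptions chosen for illustration. Hits are grouped per scaffold after sorting by start coordinate, and a new group begins whenever the gap to the furthest end seen so far exceeds the maximum.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    """One search hit with its genomic location (hypothetical record)."""
    query: str
    subject: str
    identity: float  # percent identity
    coverage: float  # percent query coverage
    evalue: float
    scaffold: str
    start: int
    end: int


def filter_hits(
    hits: List[Hit],
    min_identity: float = 30.0,  # illustrative placeholder threshold
    min_coverage: float = 50.0,  # illustrative placeholder threshold
    max_evalue: float = 0.01,    # illustrative placeholder threshold
) -> List[Hit]:
    """Keep only hits that pass all three score thresholds."""
    return [
        h
        for h in hits
        if h.identity >= min_identity
        and h.coverage >= min_coverage
        and h.evalue <= max_evalue
    ]


def find_clusters(
    hits: List[Hit], max_gap: int = 20000, min_unique: int = 3
) -> List[List[Hit]]:
    """Group hits on the same scaffold separated by at most max_gap bp and
    keep groups containing hits to at least min_unique distinct queries."""
    by_scaffold: Dict[str, List[Hit]] = {}
    for h in hits:
        by_scaffold.setdefault(h.scaffold, []).append(h)

    clusters: List[List[Hit]] = []
    for scaffold_hits in by_scaffold.values():
        scaffold_hits.sort(key=lambda h: h.start)
        group = [scaffold_hits[0]]
        group_end = scaffold_hits[0].end
        for h in scaffold_hits[1:]:
            if h.start - group_end <= max_gap:
                # Close enough to the current group: extend it.
                group.append(h)
                group_end = max(group_end, h.end)
            else:
                # Gap too large: close the current group and start a new one.
                clusters.append(group)
                group = [h]
                group_end = h.end
        clusters.append(group)

    return [c for c in clusters if len({h.query for h in c}) >= min_unique]
```

Hits would pass through `filter_hits` first, with the survivors handed to `find_clusters`. For the tool's actual options and defaults, refer to the `cblaster` documentation.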
## Installation
`cblaster` can be installed via pip:
```bash
$ pip3 install cblaster --user
```
or by cloning the repository and installing:
```bash
$ git clone https://github.com/gamcil/cblaster.git
...
$ cd cblaster/
$ pip3 install .
```
Additionally, we provide executables for Windows and Mac which can be downloaded [from here](https://github.com/gamcil/cblaster/releases/latest).
Once installed, make sure you configure cblaster with your email address:
```bash
$ cblaster config --email name@domain.com
```
You can find example search files, along with generated output, in the [examples folder
of the repository](https://github.com/gamcil/cblaster/tree/master/example).
## Dependencies
`cblaster` is tested on Python 3.6, and its only external Python dependency is
the `requests` module (used for interaction with NCBI APIs).
If you want to perform local searches, you should have `diamond` installed and available
on your system `$PATH`.
`cblaster` will throw an error if a local search is started but it cannot find
`diamond` or `diamond-aligner` (alias when installed via apt) on the system.
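`cblaster` performs this check itself. If you want to confirm your setup before starting a local search, a lookup of the same two executable names can be sketched with Python's standard `shutil.which`. The function name `find_diamond` and its optional `search_path` argument are hypothetical and are not part of `cblaster`.

```python
import shutil
from typing import Optional


def find_diamond(search_path: Optional[str] = None) -> str:
    """Return the path to a DIAMOND executable, trying both common names.

    When search_path is None, shutil.which searches the current $PATH.
    """
    for name in ("diamond", "diamond-aligner"):
        path = shutil.which(name, path=search_path)
        if path is not None:
            return path
    raise FileNotFoundError(
        "Neither 'diamond' nor 'diamond-aligner' was found on $PATH; "
        "install DIAMOND before running a local search"
    )
```

Calling `find_diamond()` with no arguments searches the current `$PATH` and raises `FileNotFoundError` if neither name is found.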
## Usage
`cblaster` accepts FASTA files and collections of valid NCBI sequence identifiers
(GIs, accession numbers) as input.
A remote search can be performed as simply as:
```bash
$ cblaster search --query_file query.fasta
```
For example, to remotely search the
[burnettramic acids gene cluster (*bua*)](https://pubs.acs.org/doi/10.1021/acs.orglett.8b04042)
against the NCBI's nr database:
```bash
$ cblaster search -qf bua.fasta
[12:14:17] INFO - Starting cblaster in remote mode
[12:14:17] INFO - Launching new search
[12:14:19] INFO - Request Identifier (RID): WHS0UGYJ015
[12:14:19] INFO - Request Time Of Execution (RTOE): 25s
[12:14:44] INFO - Polling NCBI for completion status
[12:14:44] INFO - Checking search status...
[12:15:44] INFO - Checking search status...
[12:16:44] INFO - Checking search status...
[12:16:46] INFO - Search has completed successfully!
[12:16:46] INFO - Retrieving results for search WHS0UGYJ015
[12:16:51] INFO - Parsing results...
[12:16:51] INFO - Found 3944 hits meeting score thresholds
[12:16:51] INFO - Fetching genomic context of hits
[12:17:14] INFO - Searching for clustered hits across 705 organisms
[12:17:14] INFO - Writing summary to <stdout>
Aspergillus mulundensis DSM 5745
================================
NW_020797889.1
--------------
Query Subject Identity Coverage E-value Bitscore Start End Strand
QBE85641.1 XP_026607259.1 75.56 99.5918 0 742 1717881 1719409 -
QBE85642.1 XP_026607260.1 89.916 100 0 667 1719650 1720797 +
QBE85643.1 XP_026607261.1 89.532 83.1169 0 832 1721494 1722934 +
QBE85644.1 XP_026607262.1 64.829 98.9218 6.51e-157 455 1723252 1724467 -
QBE85645.1 XP_026607263.1 69.97 100 6.93e-157 449 1725113 1726277 -
QBE85646.1 XP_026607264.1 82.759 96.8447 0 670 1726892 1728302 +
QBE85647.1 XP_026607265.1 72.674 99.2048 0 764 1729735 1731338 +
QBE85648.1 XP_026607266.1 56.098 98.324 4.24e-64 205 1731701 1732402 -
QBE85649.1 XP_026607267.1 79.623 99.8746 0 6573 1732820 1745289 +
...
```
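The log above follows the submit-then-poll pattern of the NCBI BLAST URL API: a submitted search returns a Request Identifier (RID) and an estimated Request Time Of Execution (RTOE), after which the client checks the search status until it is ready (NCBI's usage guidelines ask clients not to poll a single RID more than once a minute). The sketch below parses those fields from the `QBlastInfoBegin`/`QBlastInfoEnd` blocks NCBI embeds in its responses. It is an illustration, not `cblaster`'s code: the response layout is assumed from NCBI's public URL API documentation, and the function names, the `fetch_status_page` callable (which would perform the real HTTP request, e.g. with `requests`) and the polling limits are hypothetical.

```python
import re
import time
from typing import Callable, Optional, Tuple


def _info_block(text: str) -> str:
    """Return the contents of the QBlastInfoBegin/QBlastInfoEnd block, if any."""
    match = re.search(r"QBlastInfoBegin(.*?)QBlastInfoEnd", text, re.DOTALL)
    return match.group(1) if match else ""


def parse_submission(text: str) -> Tuple[str, int]:
    """Extract the RID and the RTOE (in seconds) from a submission response."""
    block = _info_block(text)
    rid = re.search(r"RID\s*=\s*(\S+)", block)
    rtoe = re.search(r"RTOE\s*=\s*(\d+)", block)
    if rid is None or rtoe is None:
        raise ValueError("Response contains no RID/RTOE")
    return rid.group(1), int(rtoe.group(1))


def parse_status(text: str) -> Optional[str]:
    """Return the search status (e.g. WAITING, READY, FAILED) or None."""
    match = re.search(r"Status\s*=\s*(\w+)", _info_block(text))
    return match.group(1) if match else None


def wait_until_ready(
    fetch_status_page: Callable[[], str],
    rtoe: int,
    interval: int = 60,   # poll at most once a minute
    max_polls: int = 120,  # hypothetical upper bound on status checks
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Wait out the RTOE, then check the status once per interval until READY."""
    sleep(rtoe)
    for _ in range(max_polls):
        status = parse_status(fetch_status_page())
        if status == "READY":
            return
        if status in ("FAILED", "UNKNOWN"):
            raise RuntimeError(f"BLAST search ended with status {status}")
        sleep(interval)
    raise TimeoutError("BLAST search did not finish in time")
```

Passing `sleep` as a parameter keeps the polling logic testable without waiting on real timers.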
A presence/absence matrix of query sequences can be generated using the `--binary` argument:
```
Organism Scaffold Start End QBE85641.1 QBE85642.1 QBE85643.1 QBE85644.1 QBE85645.1 QBE85646.1 QBE85647.1 QBE85648.1 QBE85649.1
Aspergillus mulundensis DSM 5745 NW_020797889.1 1717881 1745289 1 1 1 1 1 1 1 1 1
Aspergillus versicolor CBS 583.65 KV878126.1 3162095 3187090 1 1 1 0 1 1 1 1 1
Pseudomassariella vexata CBS 129021 MCFJ01000004.1 1606356 1628483 1 1 1 0 0 1 0 1 1
Hypoxylon sp. CO27-5 KZ112517.1 92119 112957 1 1 1 0 0 0 1 0 1
Hypoxylon sp. EC38 KZ111255.1 514739 535366 1 1 1 0 0 0 1 0 1
Epicoccum nigrum ICMP 19927 KZ107839.1 2116719 2142558 1 1 0 0 0 1 1 0 1
Aureobasidium subglaciale EXF-2481 NW_013566983.1 700476 718693 1 1 0 0 0 1 1 0 0
Aureobasidium pullulans EXF-6514 QZBF01000009.1 18721 34295 1 1 0 0 0 1 1 0 0
Aureobasidium pullulans EXF-5628 QZBI01000512.1 329 13401 1 0 0 0 0 1 1 0 0
```
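Each row of that table records which query sequences have a hit in one cluster. A minimal Python sketch of building such rows follows, assuming each cluster is summarised by a hypothetical `Cluster` record (organism, scaffold, coordinates and the set of query IDs with hits). The tab-separated rendering is also an assumption; this is not `cblaster`'s own formatter.

```python
from typing import List, NamedTuple, Sequence, Set


class Cluster(NamedTuple):
    """Hypothetical summary of one cluster of co-located hits."""
    organism: str
    scaffold: str
    start: int
    end: int
    queries: Set[str]  # query IDs with at least one hit in the cluster


def binary_rows(clusters: Sequence[Cluster], queries: Sequence[str]) -> List[list]:
    """Build one row per cluster: location columns, then 1/0 per query."""
    return [
        [c.organism, c.scaffold, c.start, c.end]
        + [1 if q in c.queries else 0 for q in queries]
        for c in clusters
    ]


def to_tsv(clusters: Sequence[Cluster], queries: Sequence[str]) -> str:
    """Render the matrix as tab-separated text with a header line."""
    header = ["Organism", "Scaffold", "Start", "End", *queries]
    lines = ["\t".join(header)]
    lines += ["\t".join(str(v) for v in row) for row in binary_rows(clusters, queries)]
    return "\n".join(lines)
```

For instance, a `Cluster` for *Aspergillus versicolor* CBS 583.65 with hits to every query except `QBE85644.1` yields the `1 1 1 0 1 1 1 1 1` pattern shown in the table.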
`cblaster` can also generate fully interactive visualisations of the binary
table. To view an example, click [here](https://cblaster.readthedocs.io/en/latest/_static/example.html).
For further usage examples and API documentation, please refer to the
[documentation](https://cblaster.readthedocs.io/en/latest/).
## Citation
If you found this tool useful, please cite:
```
Gilchrist, C.L.M., Booth, T.J., Chooi, Y.-H., 2020. cblaster: a remote search tool for rapid identification and visualisation of homologous gene clusters. bioRxiv 2020.11.08.370601. https://doi.org/10.1101/2020.11.08.370601
```
`cblaster` makes use of the following tools:
```
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Acland, A. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 42, 7–17 (2014).
```