![travis](https://travis-ci.org/jeremymcrae/denovonear.svg?branch=master)
### Denovonear
This code assesses whether *de novo* single-nucleotide variants are closer
together within the coding sequence of a gene than expected by chance. We use
local-sequence based mutation rates to account for differential mutability of
regions. The default rates are per-trinucleotide (see
[Nature Genetics 46:944–950](http://www.nature.com/ng/journal/v46/n9/full/ng.3050.html)),
but you can supply your own rates, including rates for longer sequence
contexts such as 5-mers or 7-mers.
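
To make the idea concrete, here is a minimal sketch of a proximity test of this kind. It is an illustration under stated assumptions, not denovonear's actual implementation: the function names, the geometric-mean distance statistic, and the `+1` offset are choices made for this example, and `site_weights` is a hypothetical stand-in for per-site mutation rates.

```python
import itertools
import math
import random


def geomean_distance(positions):
    """Geometric mean of all pairwise distances between positions.

    Adds 1 to each distance before taking logs so that repeated sites
    (distance 0) are tolerated. Needs at least two positions.
    """
    dists = [abs(a - b) for a, b in itertools.combinations(positions, 2)]
    return math.exp(sum(math.log(d + 1) for d in dists) / len(dists)) - 1


def cluster_p_value(observed, site_weights, iterations=10000, seed=0):
    """Estimate how often randomly placed mutations are at least as close
    together as the observed ones.

    site_weights maps each candidate position to a relative mutation rate
    (an assumption for this sketch), so more mutable sites are sampled
    more often.
    """
    rng = random.Random(seed)
    sites = list(site_weights)
    weights = [site_weights[s] for s in sites]
    observed_distance = geomean_distance(observed)

    as_close = 0
    for _ in range(iterations):
        simulated = rng.choices(sites, weights=weights, k=len(observed))
        if geomean_distance(simulated) <= observed_distance:
            as_close += 1

    # add-one correction so the estimate is never exactly zero
    return (as_close + 1) / (iterations + 1)


# toy gene with 1000 equally mutable sites
toy_weights = {pos: 1.0 for pos in range(1, 1001)}
p_value = cluster_p_value([500, 502, 505], toy_weights, iterations=2000)
```

Tightly grouped positions give a small `p_value` here, while positions spread across the toy gene give a large one.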
### Install
```sh
pip install denovonear
```
### Usage
Analyse *de novo* mutations with the CLI tool:
```sh
denovonear cluster \
--in data/example.grch38.dnms.txt \
--gencode data/example.grch38.gtf \
--fasta data/example.grch38.fa \
--out output.txt
```
Explanation of options:
- `--in`: path to a tab-separated table of de novo mutations. See the example table below for the columns, or `example.grch38.dnms.txt` in the data folder.
- `--gencode`: path to GENCODE annotations in
  [GTF format](https://www.ensembl.org/info/website/upload/gff.html) for
  transcripts and exons, e.g.
  [example release](https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/gencode.v38.annotation.gtf.gz). Can be gzipped or uncompressed.
- `--fasta`: path to the genome FASTA file, matching the genome build of the GENCODE file.

If the `--gencode` or `--fasta` options are skipped (e.g.
`denovonear cluster --in INFILE --out OUTFILE`), gene annotations will be
retrieved via an Ensembl web service. In that case, you might need to specify
`--genome-build grch38` to ensure the gene coordinates match your de novo
mutation coordinates.

Other options are:
* `--rates PATH_TO_RATES`
* `--cache-folder PATH_TO_CACHE_DIR`
* `--genome-build "grch37" or "grch38" (default=grch37)`

The optional rates file is a tab-separated file with three columns: 'from',
'to', and 'mu_snp'. The 'from' column contains a DNA sequence (of odd length)
with the base to change at the central nucleotide. The 'to' column contains
the same sequence with the central base modified. The 'mu_snp' column contains
the probability of the change (per site per generation).
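
As a rough illustration of that layout, the sketch below parses a rates table and checks the constraints described above. It assumes the file starts with a header row naming the three columns; `read_rates` is a hypothetical helper written for this example, not part of denovonear, and the numbers are placeholders rather than real mutation rates.

```python
import csv
import io


def read_rates(handle):
    """Parse a tab-separated rates table into {(from, to): mu_snp}.

    Assumes a header row with the columns 'from', 'to' and 'mu_snp'.
    """
    rates = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        ref, alt = row["from"], row["to"]
        mid = len(ref) // 2
        if len(ref) % 2 == 0 or len(alt) != len(ref):
            raise ValueError(f"expected odd, equal-length contexts: {ref!r} -> {alt!r}")
        flanks_match = ref[:mid] == alt[:mid] and ref[mid + 1:] == alt[mid + 1:]
        if not flanks_match or ref[mid] == alt[mid]:
            raise ValueError(f"only the central base may change: {ref!r} -> {alt!r}")
        rates[(ref, alt)] = float(row["mu_snp"])
    return rates


# placeholder values for illustration only, not real mutation rates
example = "from\tto\tmu_snp\nAAA\tACA\t1e-9\nAACAA\tAATAA\t2e-9\n"
rates = read_rates(io.StringIO(example))
```

The same check works for 3-mers, 5-mers, or 7-mers, since it only relies on the context length being odd.
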
The cache folder defaults to making a folder named "cache" within the working
directory. The genome build indicates which genome build the coordinates of the
de novo variants are based on, and defaults to GRCh37.
#### Example de novo table
gene_name | chr | pos | consequence | snp_or_indel
--- | --- | --- | --- | ---
OR4F5 | chr1 | 69500 | missense_variant | DENOVO-SNP
OR4F5 | chr1 | 69450 | missense_variant | DENOVO-SNP
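
If you are building this table programmatically, something like the following may help. It is a sketch that assumes the input file carries a header row with exactly the column names shown above; `write_dnm_table` and `positions_by_gene` are hypothetical helpers for this example, not denovonear functions.

```python
import csv
import io
from collections import defaultdict

COLUMNS = ["gene_name", "chr", "pos", "consequence", "snp_or_indel"]


def write_dnm_table(rows, handle):
    """Write de novo mutations as a tab-separated table with a header row."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


def positions_by_gene(handle):
    """Group positions by gene symbol, then by consequence string."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(handle, delimiter="\t"):
        grouped[row["gene_name"]][row["consequence"]].append(int(row["pos"]))
    return grouped


rows = [
    {"gene_name": "OR4F5", "chr": "chr1", "pos": 69500,
     "consequence": "missense_variant", "snp_or_indel": "DENOVO-SNP"},
    {"gene_name": "OR4F5", "chr": "chr1", "pos": 69450,
     "consequence": "missense_variant", "snp_or_indel": "DENOVO-SNP"},
]

# an in-memory buffer stands in for a file opened with open(path, "w")
buffer = io.StringIO()
write_dnm_table(rows, buffer)
buffer.seek(0)
grouped = positions_by_gene(buffer)
```
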
### Python usage
```py
from denovonear.gencode import Gencode
from denovonear.cluster_test import cluster_de_novos

# load gene annotations from the GTF and the matching genome FASTA
gencode = Gencode('./data/example.grch38.gtf', './data/example.grch38.fa')

symbol = 'OR4F5'
# de novo positions for the gene, grouped by consequence
de_novos = {'missense': [69500, 69450, 69400], 'nonsense': []}
p_values = cluster_de_novos(symbol, de_novos, gencode[symbol], iterations=1000000)
```
Pull out site-specific rates by creating Transcript objects, then get the
rates by consequence at each site:
```py
import asyncio

from denovonear.rate_limiter import RateLimiter
from denovonear.load_mutation_rates import load_mutation_rates
from denovonear.load_gene import construct_gene_object
from denovonear.site_specific_rates import SiteRates

async def main():
    # extract transcript coordinates and sequence from Ensembl
    async with RateLimiter(per_second=15) as ensembl:
        transcript = await construct_gene_object(ensembl, 'ENST00000346085')

    mut_rates = load_mutation_rates()
    rates = SiteRates(transcript, mut_rates)

    # rates are stored by consequence, but you can iterate through to find all
    # possible sites in and around the CDS:
    for cq in ['missense', 'nonsense', 'splice_lof', 'synonymous']:
        for site in rates[cq]:
            site['pos'] = transcript.get_position_on_chrom(site['pos'], site['offset'])

    # or if you just want the summed rate
    return rates['missense'].get_summed_rate()

# `async with` and `await` need to run inside a coroutine
summed_rate = asyncio.run(main())
```
### Identify transcripts containing de novo events
You can identify transcripts containing de novo events with the
`identify_transcripts.py` script. This either identifies all transcripts for a
gene with one or more de novo events, or identifies the minimal set of
transcripts that contains all de novos (where transcripts are prioritised on
the basis of number of de novo events, and length of coding sequence).
Transcripts can be identified with:
```sh
denovonear transcripts \
--de-novos data/example_de_novos.txt \
--out output.txt \
--all-transcripts
```
Other options are:
* `--minimise-transcripts` in place of `--all-transcripts`, to find the minimal
set of transcripts
* `--genome-build "grch37" or "grch38" (default=grch37)`
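
One way to picture the minimal-set selection is as a greedy cover: repeatedly pick the transcript that contains the most de novo events not yet covered, breaking ties by coding sequence length. The sketch below follows that idea. It is an assumption about how such prioritisation could work, not a description of denovonear's code, and `minimal_transcripts` and the transcript IDs are hypothetical.

```python
def minimal_transcripts(transcripts):
    """Greedily choose transcripts until every de novo site is covered.

    transcripts maps a transcript ID to a tuple of
    (set of de novo positions it contains, coding sequence length).
    """
    remaining = set().union(*(sites for sites, _ in transcripts.values()))
    chosen = []
    while remaining:
        # most uncovered de novos first, then the longest coding sequence
        best = max(
            transcripts,
            key=lambda tx: (len(transcripts[tx][0] & remaining), transcripts[tx][1]),
        )
        chosen.append(best)
        remaining -= transcripts[best][0]
    return chosen


# hypothetical transcripts for a single gene
candidates = {
    "tx_a": ({69400, 69450}, 900),
    "tx_b": ({69400, 69450}, 1200),
    "tx_c": ({69500}, 300),
}
selected = minimal_transcripts(candidates)
```

With these inputs, `tx_b` is preferred over `tx_a` because both contain the same two sites but `tx_b` has the longer coding sequence, and `tx_c` is then needed for the remaining site.
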
### Gene or transcript based mutation rates
You can generate mutation rates for either the union of alternative transcripts
for a gene, or for a specific Ensembl transcript ID, with the
`construct_mutation_rates.py` script. LoF and missense mutation rates can be
generated with:
```sh
denovonear rates \
--genes data/example_gene_ids.txt \
--out output.txt
```
The tab-separated output file will contain one row per gene/transcript, with
each line containing a transcript ID or gene symbol, a log10-transformed
missense mutation rate, a log10-transformed nonsense mutation rate, and a
log10-transformed synonymous mutation rate.
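
To work with these values on the linear scale, the rates need to be back-transformed with `10 ** value`. The sketch below does this for a table in the column order described above. It assumes the identifier comes first, followed by the missense, nonsense, and synonymous rates, and it skips any row whose rate fields are not numeric (such as a header row, if present). `read_rates_output` is a hypothetical helper and the numbers are placeholders.

```python
import io

RATE_NAMES = ("missense", "nonsense", "synonymous")


def read_rates_output(handle):
    """Return {identifier: {consequence: linear-scale rate}}."""
    results = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue
        try:
            log_rates = [float(value) for value in fields[1:4]]
        except ValueError:
            continue  # non-numeric row, assumed to be a header
        results[fields[0]] = {
            name: 10 ** value for name, value in zip(RATE_NAMES, log_rates)
        }
    return results


# placeholder values for illustration only
example = "id\tmissense\tnonsense\tsynonymous\nGENE1\t-5.0\t-6.0\t-5.5\n"
linear_rates = read_rates_output(io.StringIO(example))
```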