![Oxford Nanopore Technologies logo](https://github.com/nanoporetech/medaka/raw/master/images/ONT_logo_590x106.png)

Medaka
======

[![Build Status](https://travis-ci.org/nanoporetech/medaka.svg?branch=master)](https://travis-ci.org/nanoporetech/medaka)
[![](https://img.shields.io/pypi/v/medaka.svg)](https://pypi.org/project/medaka/)
[![](https://img.shields.io/pypi/wheel/medaka.svg)](https://pypi.org/project/medaka/)
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](https://anaconda.org/bioconda/medaka)
[![](https://img.shields.io/conda/pn/bioconda/medaka.svg)](https://anaconda.org/bioconda/medaka)


`medaka` is a tool to create a consensus sequence from nanopore sequencing data.
This task is performed using neural networks applied to a pileup of individual
sequencing reads against a draft assembly. It outperforms graph-based methods
operating on basecalled data, and can be competitive with state-of-the-art
signal-based methods, whilst being much faster.

© 2018 Oxford Nanopore Technologies Ltd.

Features
--------

* Requires only basecalled data (`.fasta` or `.fastq`).
* Improved accuracy over graph-based methods (e.g. Racon).
* 50X faster than Nanopolish (and can run on GPUs).
* Methylation aggregation from Guppy `.fast5` files.
* Benchmarks are provided [here](https://nanoporetech.github.io/medaka/benchmarks.html).
* Includes extras for implementing and training bespoke correction
networks.
* Works on Linux and MacOS.
* Open source (Mozilla Public License 2.0).

Tools to enable the creation of draft assemblies can be found in a sister
project [pomoxis](https://github.com/nanoporetech/pomoxis).

Documentation can be found at https://nanoporetech.github.io/medaka/.

Installation
------------

Medaka can be installed in one of several ways.

**Installation with conda**

Perhaps the simplest way to start using medaka on both Linux and MacOS is
through conda; medaka is available via the
[bioconda](https://anaconda.org/bioconda/medaka) channel:

    conda create -n medaka -c conda-forge -c bioconda medaka

**Installation with pip**

For those who prefer Python's native package manager, medaka is also available
on PyPI and can be installed using pip:

    pip install medaka

On Linux platforms this will install a precompiled binary; on MacOS (and other)
platforms this will fetch and compile a source distribution.

We recommend using medaka within a virtual environment, for example:

    virtualenv medaka --python=python3 --prompt "(medaka) "
    . medaka/bin/activate
    pip install medaka

Using this method requires the user to provide several binaries:

* [samtools](https://github.com/samtools/samtools),
* [minimap2](https://github.com/lh3/minimap2),
* [tabix](https://github.com/samtools/htslib), and
* [bgzip](https://github.com/samtools/htslib)

and place these within the `PATH`. `samtools`, `bgzip` and `tabix` version 1.9 and
`minimap2` version 2.17 are recommended, as these are the versions used in the
development of medaka.
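
Whether these binaries are visible can be checked before running medaka. The following is a minimal sketch in Python and is not part of medaka itself; the function name `missing_tools` is hypothetical, and the check only confirms that an executable with each name is on the `PATH`, not that its version matches the recommendations above.

```python
import shutil

# Tools the pip installation expects to find on PATH (names taken from the list above).
REQUIRED_TOOLS = ("samtools", "minimap2", "tabix", "bgzip")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

An empty list from `missing_tools()` means all four executables were found; any names returned still need to be installed or added to the `PATH`.
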

**Installation from source**

Medaka can be installed from its source quite easily on most systems.

Before installing medaka it may be necessary to install some
prerequisite libraries, best installed by a package manager. On Ubuntu
these are:

> bzip2 g++ zlib1g-dev libbz2-dev liblzma-dev libffi-dev libncurses5-dev
> libcurl4-gnutls-dev libssl-dev curl make cmake wget python3-all-dev
> python-virtualenv

In addition, git LFS must be installed and set up before cloning
the repository.

A Makefile is provided to fetch, compile and install all direct dependencies
into a Python virtual environment. To set up the environment run:

    # Note: certain files are stored in git-lfs, https://git-lfs.github.com/,
    #       which must therefore be installed first.
    git clone https://github.com/nanoporetech/medaka.git
    cd medaka
    make install
    . ./venv/bin/activate

Using this method both `samtools` and `minimap2` are built from source and need
not be provided by the user.

**Using a GPU**

All installation methods allow medaka to be used with CPU resources only.
To enable the use of GPU resources it is necessary to install the
`tensorflow-gpu` package. Unfortunately, depending on your Python version, it
may be necessary to modify the requirements of the `medaka` package for it
to run without complaining. Using the source code from GitHub, a working
GPU-powered `medaka` can be configured with:

    # Note: certain files are stored in git-lfs, https://git-lfs.github.com/,
    #       which must therefore be installed first.
    git clone https://github.com/nanoporetech/medaka.git
    cd medaka
    sed -i 's/tensorflow/tensorflow-gpu/' requirements.txt
    make install

Note, however, that the `tensorflow-gpu` package is compiled against
specific versions of the NVIDIA CUDA and cuDNN libraries; users are directed to the
[tensorflow installation](https://www.tensorflow.org/install/gpu) pages
for further information. cuDNN can be obtained from the
[cuDNN Archive](https://developer.nvidia.com/rdp/cudnn-archive), and CUDA
from the [CUDA Toolkit Archive](https://developer.nvidia.com/cuda-toolkit-archive).

Depending on your GPU, `medaka` may show out-of-memory errors when running.
To avoid these, the inference batch size can be reduced from the default
value by setting the `-b` option when running `medaka_consensus`. A value of
`-b 100` is suitable for GPUs with 11 GB of memory.

For users with RTX series GPUs it may additionally be necessary to set an
environment variable for `medaka` to run without failure:

    export TF_FORCE_GPU_ALLOW_GROWTH=true

In this situation a further reduction in batch size may be required.
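
The batch size option and the environment variable above can be combined when launching `medaka_consensus` from a script. The sketch below is an illustration rather than part of medaka: the helper names `consensus_command` and `gpu_environment` are hypothetical, and only the `-i`, `-d`, `-o`, `-t`, `-m` and `-b` options mentioned in this README are used.

```python
import os


def consensus_command(basecalls, draft, outdir, model, threads=1, batch_size=None):
    """Build the argument list for medaka_consensus; nothing is executed here."""
    cmd = [
        "medaka_consensus",
        "-i", basecalls,
        "-d", draft,
        "-o", outdir,
        "-t", str(threads),
        "-m", model,
    ]
    if batch_size is not None:
        # A smaller batch size lowers GPU memory use, e.g. 100 for an 11 GB card.
        cmd += ["-b", str(batch_size)]
    return cmd


def gpu_environment(allow_growth=False, base_env=None):
    """Copy an environment, optionally setting TF_FORCE_GPU_ALLOW_GROWTH=true."""
    env = dict(os.environ if base_env is None else base_env)
    if allow_growth:
        env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return env


if __name__ == "__main__":
    cmd = consensus_command(
        "basecalls.fa", "draft_assm/assm_final.fa", "medaka_consensus",
        "r941_min_high_g303", threads=4, batch_size=100,
    )
    print(" ".join(cmd))
```

To actually run the command, the two results can be passed to the standard library, for example `subprocess.run(cmd, env=gpu_environment(allow_growth=True), check=True)`.
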

Usage
-----

`medaka` can be run using its default settings through the `medaka_consensus`
program. An assembly in `.fasta` format and basecalls in `.fasta` or `.fastq`
format are required. The program uses both `samtools` and `minimap2`. If
medaka has been installed using the from-source method these will be present
within the medaka environment; otherwise they will need to be provided by
the user.

    source ${MEDAKA}  # i.e. medaka/venv/bin/activate
    NPROC=$(nproc)
    BASECALLS=basecalls.fa
    DRAFT=draft_assm/assm_final.fa
    OUTDIR=medaka_consensus
    medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${OUTDIR} -t ${NPROC} -m r941_min_high_g303

The variables `BASECALLS`, `DRAFT`, and `OUTDIR` in the above should be set
appropriately. For the selection of the model (`-m r941_min_high_g303` in the
example above) see the Models section below.

When `medaka_consensus` has finished running, the consensus will be saved to
`${OUTDIR}/consensus.fasta`.
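
A quick sanity check of the result is to list the sequences in the consensus file with their lengths. The following pure-Python sketch assumes only that the output is a plain-text FASTA file; the function name `fasta_lengths` is hypothetical and is not a medaka API.

```python
def fasta_lengths(path):
    """Return {sequence name: length} for a FASTA file, in file order."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                # The sequence name is the header text up to the first whitespace.
                fields = line[1:].split()
                name = fields[0] if fields else ""
                lengths[name] = 0
            elif name is not None:
                # Sequence lines may be wrapped, so lengths are accumulated.
                lengths[name] += len(line)
    return lengths
```

For the example above this would be called as `fasta_lengths("medaka_consensus/consensus.fasta")`. Comparing the result with the same summary of the draft assembly shows whether any contigs are missing or have changed length substantially.
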

Models
------

For best results it is important to specify the correct model, `-m` in the
above, according to the basecaller used. Allowed values can be found by
running `medaka tools list_models`.

Medaka models are named to indicate i) the pore type, ii) the sequencing
device (MinION or PromethION), iii) the basecaller variant, and iv) the
basecaller version, with the format:

    {pore}_{device}_{caller variant}_{caller version}

For example, the model named `r941_min_fast_g303` should be used with data from
MinION (or GridION) R9.4.1 flowcells using the fast Guppy basecaller version
3.0.3. By contrast, the model `r941_prom_hac_g303` should be used with PromethION
data and the high accuracy basecaller (termed "hac" in Guppy configuration
files). Where a version of Guppy has been used without an exactly corresponding
medaka model, the medaka model with the highest version equal to or less than
the Guppy version should be selected.
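
The selection rule above can be sketched in Python. This is an illustration, not medaka's own logic: the function names are hypothetical, and the decoding of the version suffix assumes that the first two digits after `g` are the major and minor version and the remaining digits the patch version (so `g303` is read as 3.0.3, which is the only case confirmed by this README).

```python
import re

# {pore}_{device}_{caller variant}_g{caller version}
MODEL_PATTERN = re.compile(
    r"^(?P<pore>[^_]+)_(?P<device>[^_]+)_(?P<variant>[^_]+)_g(?P<version>\d{3,})$"
)


def parse_model(name):
    """Split a model name into its four fields; the version becomes a tuple."""
    match = MODEL_PATTERN.match(name)
    if match is None:
        raise ValueError("unrecognised model name: {!r}".format(name))
    fields = match.groupdict()
    digits = fields["version"]
    # Assumption: one digit each for major and minor, the rest is the patch.
    fields["version"] = (int(digits[0]), int(digits[1]), int(digits[2:]))
    return fields


def parse_guppy_version(text):
    """Turn a version string such as '3.0.3' into the tuple (3, 0, 3)."""
    major, minor, patch = text.split("+")[0].split(".")[:3]
    return (int(major), int(minor), int(patch))


def select_model(available, pore, device, variant, guppy_version):
    """Return the matching model with the highest version <= guppy_version."""
    target = parse_guppy_version(guppy_version)
    best_name, best_version = None, None
    for name in available:
        try:
            fields = parse_model(name)
        except ValueError:
            continue  # skip names that do not follow the four-field format
        if (fields["pore"], fields["device"], fields["variant"]) != (pore, device, variant):
            continue
        version = fields["version"]
        if version <= target and (best_version is None or version > best_version):
            best_name, best_version = name, version
    return best_name


if __name__ == "__main__":
    # Illustrative list only: r941_min_high_g330 is a hypothetical name.
    # The real list comes from `medaka tools list_models`.
    models = ["r941_min_fast_g303", "r941_min_high_g303", "r941_min_high_g330"]
    print(select_model(models, "r941", "min", "high", "3.2.1"))
```

The function returns `None` when no model for the requested pore, device and variant has a version at or below the Guppy version, in which case the choice has to be made by hand.
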

Methylation Calling
-------------------

`medaka` includes a basic workflow for aggregating Guppy basecalling results
for Dcm, Dam, and CpG methylation. The workflow is currently very preliminary
and subject to change and improvement.

Aggregating the information from Guppy outputs is a two stage process, first
the basecalling results are extracted `.