Python library | medaka-1.0.2.tar.gz resource - CSDN Library


72 views · uploaded 2022-04-10 19:15:55 · 38.43 MB · GZ

304 files in total

c: 72

h: 60

sam: 43


Python library | medaka-1.0.2.tar.gz (304 files)

tabix.1 6KB

bgzip.1 5KB

htsfile.1 3KB

faidx.5 6KB

vcf.5 3KB

sam.5 3KB

configure.ac 15KB

range.bam.bai 360B

range.bam 13KB

bed_file.bed 2KB

vcf.c 130KB

cram_io.c 129KB

cram_decode.c 118KB

cram_encode.c 109KB

sam.c 92KB

hts.c 90KB

bgzf.c 62KB

cram_codecs.c 56KB

hfile_libcurl.c 42KB

thread_pool.c 39KB

synced_bcf_reader.c 37KB

sam_header.c 34KB

vcfutils.c 32KB

hfile.c 29KB

faidx.c 26KB

rANS_static.c 26KB

test_bgzf.c 24KB

sam.c 22KB

bcf_sr_sort.c 21KB

tabix.c 19KB

knetfile.c 19KB

cram_index.c 18KB

medaka_counts.c 17KB

hts_endian.c 17KB

mFILE.c 16KB

bgzip.c 14KB

open_trace_file.c 13KB

cram_external.c 13KB

test-vcf-api.c 13KB

hfile_s3.c 13KB

medaka_trimbam.c 12KB

probaln.c 12KB

test_view.c 12KB

tbx.c 11KB

realn.c 11KB

hfile.c 10KB

md5.c 10KB

regidx.c 10KB

textutils.c 9KB

kstring.c 9KB

kfunc.c 9KB

test-bcf-translate.c 8KB

multipart.c 8KB

htsfile.c 8KB

errmod.c 6KB

kstring.c 6KB

cram_stats.c 6KB

cram_samtools.c 5KB

plugin.c 5KB

vcf_sweep.c 5KB

test_realn.c 5KB

test-bcf-sr.c 5KB

pooled_alloc.c 5KB

hfile_gcs.c 4KB

test-regidx.c 4KB

string_alloc.c 4KB

test-vcf-sweep.c 4KB

medaka_common.c 3KB

thrash_threads6.c 3KB

hfile_net.c 3KB

rand.c 3KB

files.c 2KB

fieldarith.c 2KB

thrash_threads4.c 2KB

thrash_threads5.c 2KB

thrash_threads3.c 2KB

hts_os.c 2KB

thrash_threads1.c 2KB

medaka_bamiter.c 2KB

thrash_threads2.c 2KB

fastrle.c 1KB

medaka_pytrimbam.c 1KB

setup.cfg 38B

configure 177KB

range.cram.crai 94B

xx#large_aux_java.cram 22KB

range.cram 11KB

ce#5b_java.cram 7KB

auxf#values_java.cram 5KB

ce.fa 1.01MB

realn02.fa 4KB

realn01.fa 719B

faidx.fa 289B

xx.fa 51B

md.fa 45B

auxf.fa 29B

c1.fa 15B

c2.fa 14B

ce.fa.fai 230B

xx.fa.fai 29B


![Oxford Nanopore Technologies logo](https://github.com/nanoporetech/medaka/raw/master/images/ONT_logo_590x106.png)

Medaka
======

[![Build Status](https://travis-ci.org/nanoporetech/medaka.svg?branch=master)](https://travis-ci.org/nanoporetech/medaka)
[![](https://img.shields.io/pypi/v/medaka.svg)](https://pypi.org/project/medaka/)
[![](https://img.shields.io/pypi/wheel/medaka.svg)](https://pypi.org/project/medaka/)
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](https://anaconda.org/bioconda/medaka)
[![](https://img.shields.io/conda/pn/bioconda/medaka.svg)](https://anaconda.org/bioconda/medaka)

`medaka` is a tool to create a consensus sequence from nanopore sequencing data. This task is performed using neural networks applied to a pileup of individual sequencing reads against a draft assembly. It outperforms graph-based methods operating on basecalled data, and can be competitive with state-of-the-art signal-based methods, whilst being much faster.

© 2018 Oxford Nanopore Technologies Ltd.

Features
--------

* Requires only basecalled data (`.fasta` or `.fastq`).
* Improved accuracy over graph-based methods (e.g. Racon).
* 50X faster than Nanopolish (and can run on GPUs).
* Methylation aggregation from Guppy `.fast5` files.
* Benchmarks are provided [here](https://nanoporetech.github.io/medaka/benchmarks.html).
* Includes extras for implementing and training bespoke correction networks.
* Works on Linux and MacOS.
* Open source (Mozilla Public License 2.0).

Tools to enable the creation of draft assemblies can be found in a sister project, [pomoxis](https://github.com/nanoporetech/pomoxis).

Documentation can be found at https://nanoporetech.github.io/medaka/.

Installation
------------

Medaka can be installed in one of several ways.
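Whichever of the installation methods below is chosen, one quick way to confirm from Python that the package is present, and which release it is, is to query the installed distribution metadata. This is a minimal sketch rather than part of medaka: it assumes Python 3.8+ (for `importlib.metadata`), assumes the distribution is registered under the PyPI name `medaka` (as a pip install does), and `installed_version` is a hypothetical helper.

```python
"""Report which version of the medaka distribution, if any, is installed.

Sketch only: it reads installed package metadata and does not import medaka
itself, so it still works if TensorFlow or other dependencies fail to load.
"""
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "medaka") -> Optional[str]:
    """Return the installed version string of `dist_name`, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("medaka is not installed in this environment")
    else:
        print(f"medaka {version} is installed")
```

Reading metadata instead of running `import medaka` keeps the check cheap and avoids triggering the GPU/TensorFlow initialisation discussed under Using a GPU.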
**Installation with conda**

Perhaps the simplest way to start using medaka on both Linux and MacOS is through conda; medaka is available via the [bioconda](https://anaconda.org/bioconda/medaka) channel:

    conda create -n medaka -c conda-forge -c bioconda medaka

**Installation with pip**

For those who prefer Python's native package manager, medaka is also available on PyPI and can be installed using pip:

    pip install medaka

On Linux platforms this will install a precompiled binary; on MacOS (and other) platforms this will fetch and compile a source distribution.

We recommend using medaka within a virtual environment, viz.:

    virtualenv medaka --python=python3 --prompt "(medaka) "
    . medaka/bin/activate
    pip install medaka

Using this method requires the user to provide several binaries:

* [samtools](https://github.com/samtools/samtools),
* [minimap2](https://github.com/lh3/minimap2),
* [tabix](https://github.com/samtools/htslib), and
* [bgzip](https://github.com/samtools/htslib)

and to place these within the `PATH`. `samtools`/`bgzip`/`tabix` version 1.9 and `minimap2` version 2.17 are recommended, as these are the versions used in the development of medaka.

**Installation from source**

Medaka can be installed from its source quite easily on most systems.

Before installing medaka it may be necessary to install some prerequisite libraries, best installed by a package manager. On Ubuntu these are:

> bzip2 g++ zlib1g-dev libbz2-dev liblzma-dev libffi-dev libncurses5-dev
> libcurl4-gnutls-dev libssl-dev curl make cmake wget python3-all-dev
> python-virtualenv

In addition, git LFS must be installed and set up before cloning the repository.

A Makefile is provided to fetch, compile and install all direct dependencies into a Python virtual environment. To set up the environment run:

    # Note: certain files are stored in git-lfs, https://git-lfs.github.com/,
    # which must therefore be installed first.
    git clone https://github.com/nanoporetech/medaka.git
    cd medaka
    make install
    . ./venv/bin/activate

Using this method both `samtools` and `minimap2` are built from source and need not be provided by the user.

**Using a GPU**

All installation methods will allow medaka to be used with CPU resources only. To enable the use of GPU resources it is necessary to install the `tensorflow-gpu` package. Unfortunately, depending on your Python version, it may be necessary to modify the requirements of the `medaka` package for it to run without complaining. Using the source code from GitHub, a working GPU-powered `medaka` can be configured with:

    # Note: certain files are stored in git-lfs, https://git-lfs.github.com/,
    # which must therefore be installed first.
    git clone https://github.com/nanoporetech/medaka.git
    cd medaka
    sed -i 's/tensorflow/tensorflow-gpu/' requirements.txt
    make install

Note, however, that the `tensorflow-gpu` package is compiled against specific versions of the NVIDIA CUDA and cuDNN libraries; users are directed to the [tensorflow installation](https://www.tensorflow.org/install/gpu) pages for further information. cuDNN can be obtained from the [cuDNN Archive](https://developer.nvidia.com/rdp/cudnn-archive), and CUDA from the [CUDA Toolkit Archive](https://developer.nvidia.com/cuda-toolkit-archive).

Depending on your GPU, `medaka` may show out-of-memory errors when running. To avoid these, the inference batch size can be reduced from the default value by setting the `-b` option when running `medaka_consensus`. A value of `-b 100` is suitable for 11 GB GPUs.

For users with RTX series GPUs it may be necessary to additionally set an environment variable to have `medaka` run without failure:

    export TF_FORCE_GPU_ALLOW_GROWTH=true

In this situation a further reduction in batch size may be required.

Usage
-----

`medaka` can be run using its default settings through the `medaka_consensus` program. An assembly in `.fasta` format and basecalls in `.fasta` or `.fastq` format are required. The program uses both `samtools` and `minimap2`.
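Because `medaka_consensus` depends on external binaries, a script that drives it may want to verify them before launching a run. The following is a hedged sketch of such a wrapper, not part of medaka: `missing_tools`, `build_command`, `build_env` and `run_consensus` are hypothetical helpers, and only the options that appear in this README (`-i`, `-d`, `-o`, `-t`, `-m`, `-b`) and the `TF_FORCE_GPU_ALLOW_GROWTH` variable are used.

```python
"""Hypothetical Python wrapper around the medaka_consensus command line.

Only the options shown in this README (-i, -d, -o, -t, -m, -b) and the
TF_FORCE_GPU_ALLOW_GROWTH variable are used; the helper names are
illustrative and are not part of medaka's API.
"""
import os
import shutil
import subprocess
from typing import Dict, Iterable, List, Optional

# Binaries that a pip install expects the user to supply on PATH.
REQUIRED_TOOLS = ("samtools", "minimap2", "tabix", "bgzip")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the names in `tools` that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def build_command(basecalls: str, draft: str, outdir: str, threads: int,
                  model: str, batch_size: Optional[int] = None) -> List[str]:
    """Assemble the argument list for a medaka_consensus run."""
    cmd = ["medaka_consensus", "-i", basecalls, "-d", draft,
           "-o", outdir, "-t", str(threads), "-m", model]
    if batch_size is not None:
        # A smaller inference batch lowers GPU memory use (e.g. 100 for 11 GB GPUs).
        cmd += ["-b", str(batch_size)]
    return cmd


def build_env(allow_gpu_growth: bool = False) -> Dict[str, str]:
    """Copy the current environment, optionally setting TF_FORCE_GPU_ALLOW_GROWTH."""
    env = dict(os.environ)
    if allow_gpu_growth:
        env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return env


def run_consensus(cmd: List[str], allow_gpu_growth: bool = False) -> None:
    """Fail early if required tools are missing, then run `cmd` (raises on error)."""
    absent = missing_tools()
    if absent:
        raise RuntimeError("not found on PATH: " + ", ".join(absent))
    subprocess.run(cmd, env=build_env(allow_gpu_growth), check=True)
```

For example, `run_consensus(build_command("basecalls.fa", "draft_assm/assm_final.fa", "medaka_consensus", 8, "r941_min_high_g303", batch_size=100), allow_gpu_growth=True)` would correspond to the shell invocation in this section with a reduced batch size for an RTX-series GPU. Checking all four tools is deliberately conservative: a from-source install already provides `samtools` and `minimap2` inside its virtual environment.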
If medaka has been installed using the from-source method these will be present within the medaka environment; otherwise they will need to be provided by the user.

    source ${MEDAKA}  # i.e. medaka/venv/bin/activate
    NPROC=$(nproc)
    BASECALLS=basecalls.fa
    DRAFT=draft_assm/assm_final.fa
    OUTDIR=medaka_consensus
    medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${OUTDIR} -t ${NPROC} -m r941_min_high_g303

The variables `BASECALLS`, `DRAFT`, and `OUTDIR` in the above should be set appropriately. For the selection of the model (`-m r941_min_high_g303` in the example above) see the Models section below.

When `medaka_consensus` has finished running, the consensus will be saved to `${OUTDIR}/consensus.fasta`.

Models
------

For best results it is important to specify the correct model, `-m` in the above, according to the basecaller used. Allowed values can be found by running `medaka tools list_models`.

Medaka models are named to indicate i) the pore type, ii) the sequencing device (MinION or PromethION), iii) the basecaller variant, and iv) the basecaller version, with the format:

    {pore}_{device}_{caller variant}_{caller version}

For example, the model named `r941_min_fast_g303` should be used with data from MinION (or GridION) R9.4.1 flowcells using the fast Guppy basecaller version 3.0.3. By contrast, the model `r941_prom_hac_g303` should be used with PromethION data and the high accuracy basecaller (termed "hac" in Guppy configuration files). Where a version of Guppy has been used without an exactly corresponding medaka model, the medaka model with the highest version equal to or less than the Guppy version should be selected.

Methylation Calling
-------------------

`medaka` includes a basic workflow for aggregating Guppy basecalling results for Dcm, Dam, and CpG methylation. The workflow is currently very preliminary and subject to change and improvement. Aggregating the information from Guppy outputs is a two-stage process: first the basecalling results are extracted …
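The model naming scheme and selection rule given under Models can be made concrete with a short Python sketch. `parse_model` and `select_model` are hypothetical helpers, not medaka functions, and reading the version field as "g" followed by one digit per component is an assumption: it fits the `g303` = Guppy 3.0.3 example, but the README does not specify it further.

```python
"""Parse medaka model names and apply the README's model-selection rule.

Assumptions (this is an illustration, not medaka's code): names follow
{pore}_{device}_{caller variant}_{caller version}, and the version field is
"g" plus one digit per version component, e.g. g303 -> (3, 0, 3).
"""
from typing import Iterable, NamedTuple, Optional, Tuple


class ModelName(NamedTuple):
    pore: str
    device: str
    variant: str
    version: Tuple[int, ...]


def parse_model(name: str) -> ModelName:
    """Split e.g. 'r941_min_fast_g303' into pore, device, variant and version."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields in {name!r}")
    pore, device, variant, caller = parts
    if not (caller.startswith("g") and caller[1:].isdecimal()):
        raise ValueError(f"unexpected caller version field in {name!r}")
    return ModelName(pore, device, variant, tuple(int(d) for d in caller[1:]))


def select_model(available: Iterable[str], pore: str, device: str,
                 variant: str, guppy_version: Tuple[int, ...]) -> Optional[str]:
    """Return the matching model with the highest version <= guppy_version."""
    best_name: Optional[str] = None
    best_version: Optional[Tuple[int, ...]] = None
    for name in available:
        try:
            model = parse_model(name)
        except ValueError:
            continue  # skip names that do not follow the assumed scheme
        if (model.pore, model.device, model.variant) != (pore, device, variant):
            continue
        # Tuples compare component by component, so (3, 0, 3) <= (3, 4, 5).
        if model.version <= guppy_version and (
                best_version is None or model.version > best_version):
            best_name, best_version = name, model.version
    return best_name
```

In practice `available` would be filled from the output of `medaka tools list_models`; that output format is not described here, so parsing it is left out. If a version component ever had more than one digit, the one-digit-per-component assumption would misread it and the version field would need a different parser.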

评论收藏

内容反馈

版权申诉