# News
- 2021.03.15: We developed [deepsignal2](https://github.com/PengNi/deepsignal2). Compared to deepsignal, deepsignal2 has a much smaller DNN model and slightly better performance in human 5mCpG detection.
# DeepSignal
[![Python](https://img.shields.io/pypi/pyversions/deepsignal)](https://www.python.org/)
[![PyPI version](https://img.shields.io/pypi/v/deepsignal)](https://pypi.org/project/deepsignal/)
[![GitHub License](https://img.shields.io/github/license/bioinfomaticsCSU/deepsignal)](https://github.com/bioinfomaticsCSU/deepsignal/blob/master/LICENSE)
[![PyPI-Downloads](https://pepy.tech/badge/deepsignal)](https://pepy.tech/project/deepsignal)
[![PyPI-Downloads/m](https://pepy.tech/badge/deepsignal/month)](https://pepy.tech/project/deepsignal/month)
## A deep-learning method for detecting DNA methylation state from Oxford Nanopore sequencing reads.
DeepSignal constructs a BiLSTM+Inception structure to detect DNA methylation state from Nanopore reads. It is
built with **TensorFlow** and **Python 3**.
## Contents
- [Installation](#Installation)
- [Trained models](#Trained-models)
- [Example data](#Example-data)
- [Quick start](#Quick-start)
- [Usage](#Usage)
## Installation
deepsignal is built on Python 3. [tombo](https://github.com/nanoporetech/tombo) is required to re-squiggle the raw signals from nanopore reads before running deepsignal.
- Prerequisites:\
[Python 3.*](https://www.python.org/)\
[tensorflow](https://www.tensorflow.org/) (1.8.0<=tensorflow<=1.13.1)\
[tombo](https://github.com/nanoporetech/tombo)
- Dependencies:\
[numpy](http://www.numpy.org/)\
[h5py](https://github.com/h5py/h5py)\
[statsmodels](https://github.com/statsmodels/statsmodels/)\
[scikit-learn](https://scikit-learn.org/stable/)
#### 1. Create an environment
We highly recommend using a virtual environment for the installation of deepsignal and its dependencies. A virtual environment can be created and (de)activated as follows by using [conda](https://conda.io/docs/):
```bash
# create
conda create -n deepsignalenv python=3.7
# activate
conda activate deepsignalenv
# deactivate
conda deactivate
```
The virtual environment can also be created by using [virtualenv](https://github.com/pypa/virtualenv/).
#### 2. Install deepsignal
- After creating and activating the environment, download and install deepsignal (**latest version**) from GitHub:
```bash
git clone https://github.com/bioinfomaticsCSU/deepsignal.git
cd deepsignal
python setup.py install
```
**or** install deepsignal using *pip*:
```bash
pip install deepsignal
```
- [tombo](https://github.com/nanoporetech/tombo) must be installed in the same environment:
```bash
# install using conda
conda install -c bioconda ont-tombo
# or install using pip
pip install ont-tombo
```
- Install [tensorflow](https://www.tensorflow.org/) (version: 1.8.0<=tensorflow<=1.13.1) in the same environment:
```bash
# install using conda
conda install -c anaconda tensorflow==1.13.1
# or install using pip
pip install 'tensorflow==1.13.1'
```
On a machine with a GPU, install the GPU version of tensorflow instead; the CPU version is then not required:
```bash
# install using conda
conda install -c anaconda tensorflow-gpu==1.13.1
# or install using pip
pip install 'tensorflow-gpu==1.13.1'
```
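Because deepsignal only supports a fixed tensorflow range, it can help to verify the installed version before running anything. The following is a minimal sketch of such a check, not part of deepsignal itself; the helper names `parse_version`, `is_supported_tensorflow`, and `installed_tensorflow_status` are hypothetical, and the bounds are taken from the prerequisites above.

```python
import re

# Version bounds taken from the prerequisites: 1.8.0 <= tensorflow <= 1.13.1
MIN_TF = (1, 8, 0)
MAX_TF = (1, 13, 1)


def parse_version(version):
    """Return the first three numeric components of a version string as a tuple."""
    numbers = [int(n) for n in re.findall(r"\d+", version)[:3]]
    while len(numbers) < 3:
        numbers.append(0)  # pad short versions such as "1.8"
    return tuple(numbers)


def is_supported_tensorflow(version):
    """True if the version string lies within the supported range (inclusive)."""
    return MIN_TF <= parse_version(version) <= MAX_TF


def installed_tensorflow_status():
    """Describe the tensorflow installed in the current environment, if any."""
    try:
        import tensorflow as tf
    except ImportError:
        return "tensorflow is not installed in this environment"
    if is_supported_tensorflow(tf.__version__):
        return "tensorflow {} is within the supported range".format(tf.__version__)
    return "tensorflow {} is outside 1.8.0-1.13.1".format(tf.__version__)
```

Calling `print(installed_tensorflow_status())` inside the activated environment reports whether the installed version is usable.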
## Trained models
The models we trained can be downloaded from [Google Drive](https://drive.google.com/open?id=1zkK8Q1gyfviWWnXUBMcIwEDw3SocJg7P).
Currently we have trained the following models:
* _model.CpG.R9.4_1D.human_hx1.bn17.sn360.v0.1.7+.tar.gz_: A CpG model trained using HX1 R9.4 1D reads (for **deepsignal>=0.1.7**).
* _model.CpG.R9.4_1D.human_hx1.bn17.sn360.tar.gz_: A CpG model trained using HX1 R9.4 1D reads (for **deepsignal<=0.1.6**).
* _model.GATC.R9_2D.tem.puc19.bn17.sn360.tar.gz_: A G*A*TC model trained using pUC19 R9 2D template reads (for **deepsignal<=0.1.6**).
## Example data
The example data can be downloaded from [Google Drive](https://drive.google.com/open?id=1zkK8Q1gyfviWWnXUBMcIwEDw3SocJg7P).
* _fast5s.sample.tar.gz_: The data contain ~4000 yeast R9.4 1D reads, each with called events (basecalled by Albacore), along with a genome reference.
## Quick start
To call modifications, the raw fast5 files should first be basecalled ([Guppy or Albacore](https://nanoporetech.com/community)) and then re-squiggled by [tombo](https://github.com/nanoporetech/tombo). Finally, modifications of specified motifs can be called by deepsignal. The following commands call 5mC in CG contexts from the example data:
```bash
# 1. guppy basecall
guppy_basecaller -i fast5s.al -r -s fast5s.al.guppy --config dna_r9.4.1_450bps_hac_prom.cfg
cat fast5s.al.guppy/*.fastq > fast5s.al.guppy.fastq
# 2. tombo resquiggle
tombo preprocess annotate_raw_with_fastqs --fast5-basedir fast5s.al --fastq-filenames fast5s.al.guppy.fastq --sequencing-summary-filenames fast5s.al.guppy/sequencing_summary.txt --basecall-group Basecall_1D_000 --basecall-subgroup BaseCalled_template --overwrite --processes 10
tombo resquiggle fast5s.al GCF_000146045.2_R64_genomic.fna --processes 10 --corrected-group RawGenomeCorrected_001 --basecall-group Basecall_1D_000 --overwrite
# 3. deepsignal call_mods
deepsignal call_mods --input_path fast5s.al/ --model_path model.CpG.R9.4_1D.human_hx1.bn17.sn360.v0.1.7+/bn_17.sn_360.epoch_9.ckpt --result_file fast5s.al.CpG.call_mods.tsv --corrected_group RawGenomeCorrected_001 --nproc 10 --is_gpu no
python /path/to/deepsignal/scripts/call_modification_frequency.py --input_path fast5s.al.CpG.call_mods.tsv --result_file fast5s.al.CpG.call_mods.frequency.tsv
```
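The last command above turns per-read calls into per-site methylation frequencies. The sketch below illustrates only the idea behind that aggregation. It assumes each per-read call has already been reduced to a hypothetical `(chrom, pos, strand, label)` tuple with `label` 1 for methylated and 0 for unmethylated; the actual column layout of the call_mods output and the logic of `call_modification_frequency.py` are not described here and may differ.

```python
from collections import defaultdict


def methylation_frequency(calls):
    """Aggregate per-read calls into per-site (methylated, total, frequency).

    calls: iterable of (chrom, pos, strand, label) tuples, label in {0, 1}.
    This tuple layout is an assumption made for illustration only.
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [methylated reads, all reads]
    for chrom, pos, strand, label in calls:
        site = (chrom, pos, strand)
        counts[site][0] += int(label)
        counts[site][1] += 1
    return {
        site: (methylated, total, methylated / total)
        for site, (methylated, total) in sorted(counts.items())
    }


# Made-up calls: three reads cover one site, one read covers another.
example_calls = [
    ("chrI", 100, "+", 1),
    ("chrI", 100, "+", 0),
    ("chrI", 100, "+", 1),
    ("chrI", 250, "-", 0),
]
frequencies = methylation_frequency(example_calls)
```

Here the site `("chrI", 100, "+")` is covered by three reads, two of which were called methylated, so its frequency is 2/3.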
## Usage
#### 1. Basecall and re-squiggle
Before running deepsignal, the raw reads should be basecalled ([Guppy or Albacore](https://nanoporetech.com/community)) and then processed by the *re-squiggle* module of [tombo](https://github.com/nanoporetech/tombo).
Note:
- If the fast5 files are in multi-read FAST5 format, please use the _multi_to_single_fast5_ command from the [ont_fast5_api package](https://github.com/nanoporetech/ont_fast5_api) to convert the fast5 files first (see [issue #173](https://github.com/nanoporetech/tombo/issues/173) in [tombo](https://github.com/nanoporetech/tombo)).
```bash
multi_to_single_fast5 -i $multi_read_fast5_dir -s $single_read_fast5_dir -t 30 --recursive
```
- If the basecall results are saved as fastq, run the [*tombo preprocess annotate_raw_with_fastqs*](https://nanoporetech.github.io/tombo/resquiggle.html) command before *re-squiggle*.
For the example data:
```bash
# 1. basecall
guppy_basecaller -i fast5s.al -r -s fast5s.al.guppy --config dna_r9.4.1_450bps_hac_prom.cfg
# 2. preprocess fast5 if basecall results are saved in fastq format
cat fast5s.al.guppy/*.fastq > fast5s.al.guppy.fastq
tombo preprocess annotate_raw_with_fastqs --fast5-basedir fast5s.al --fastq-filenames fast5s.al.guppy.fastq --sequencing-summary-filenames fast5s.al.guppy/sequencing_summary.txt --basecall-group Basecall_1D_000 --basecall-subgroup BaseCalled_template --overwrite --processes 10
# 3. resquiggle, cmd: tombo resquiggle $fast5_dir $reference_fa
tombo resquiggle fast5s.al GCF_000146045.2_R64_genomic.fna --processes 10 --corrected-group RawGenomeCorrected_001 --basecall-group Basecall_1D_000 --overwrite
```
#### 2. Extract features
Features of targeted sites can be extracted for training or testing.
For the example data, deepsignal extracts 17-mer-seq and 360-signal features of each CpG motif in reads by default. Note that the value of *--corrected_group* must be the same as that of *--corrected-group* in tombo:
```bash
deepsignal extract --fast5_dir fast5s.al/ --write_path fast5s.al.CpG.signal_features.17bases.rawsignals_360.tsv --corrected_group RawGenomeCorrected_001 --nproc 10
```
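To make the "17-mer-seq" part of the features more concrete, the sketch below finds CG motifs in a read sequence and returns a 17-base window around each targeted cytosine. It is an illustration under stated assumptions, not deepsignal's extraction code: the function name `cpg_kmers` is hypothetical, the window is assumed to be centered on the cytosine, and sites too close to the read ends are simply skipped. The signal part of the features (360 raw signal values per site, taken from the re-squiggled fast5 files) is not covered.

```python
def cpg_kmers(sequence, kmer_len=17, motif="CG", mod_offset=0):
    """Return (position, k-mer) pairs for each motif hit in the sequence.

    position is the 0-based index of the targeted base (motif start + mod_offset);
    the k-mer is the kmer_len-base window centered on that base.
    """
    if kmer_len % 2 == 0:
        raise ValueError("kmer_len must be odd so the window has a center base")
    flank = kmer_len // 2
    sequence = sequence.upper()
    windows = []
    start = sequence.find(motif)
    while start != -1:
        center = start + mod_offset
        # Keep only sites with a full window on both sides.
        if center - flank >= 0 and center + flank < len(sequence):
            windows.append((center, sequence[center - flank:center + flank + 1]))
        start = sequence.find(motif, start + 1)
    return windows


# Toy read: the first CG is too close to the start, the second has full flanks.
toy_read = "CG" + "A" * 8 + "CG" + "T" * 8
toy_windows = cpg_kmers(toy_read)
```

For the toy read, only the second CG (cytosine at index 10) yields a full 17-base window.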
The extracted_features file is a tab-delimited text file in the following form