# metaDMG: Estimating ancient damage in (meta)genomic DNA rapidly
---
#### Work in progress. Please contact christianmichelsen@gmail.com for further information.
---
You can now see a preview of the [interactive dashboard](https://metadmg.herokuapp.com).
---
## Installation:
```
conda env create --file environment.yaml
```
or, if you have mamba installed (faster):
```
mamba env create --file environment.yaml
```
or, by using pip:
```
pip install "metaDMG[all]"
```
or, with Poetry:
```
poetry add "metaDMG[all]"
```
---
## Workflow:
Create a `config.yaml` file:
```console
$ metaDMG config ./raw_data/example.bam \
--names raw_data/names.dmp.gz \
--nodes raw_data/nodes.dmp.gz \
--acc2tax raw_data/combined_taxid_accssionNO_20200425.gz
```
Run the actual program:
```console
$ metaDMG compute
```
See the results in the dashboard:
```console
$ metaDMG dashboard
```
---
## Usage:
metaDMG works by first creating a config file with the `config` command. This file contains all the settings metaDMG needs, so you only have to type them once. The config file is saved in the current directory as `config.yaml` and can subsequently be edited in any text editor you like.
After the config file has been created, we run the actual program with the `compute` command. This can take a while, depending on the number (and size) of the files.
Finally, the results are saved in the `{output-dir}/results` directory (`data/results` by default). They can be viewed in the interactive dashboard with the `dashboard` command.
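
The first two steps can also be scripted, for example in a batch job. Below is a minimal Python sketch that uses only the commands and flags documented in this README; `build_commands` and `run_pipeline` are hypothetical helpers, not part of metaDMG. The `dashboard` step is left out because it starts a long-running server; launch it by hand afterwards with `metaDMG dashboard config.yaml`.

```python
import subprocess


def build_commands(samples, names, nodes, acc2tax, config_path="config.yaml"):
    """Hypothetical helper: the documented `config` and `compute` steps as argument lists."""
    config_cmd = [
        "metaDMG", "config", *samples,
        "--names", names,
        "--nodes", nodes,
        "--acc2tax", acc2tax,
        "--config-path", config_path,  # name of the generated config file
    ]
    # `compute` takes the config file as an optional positional argument.
    compute_cmd = ["metaDMG", "compute", config_path]
    return [config_cmd, compute_cmd]


def run_pipeline(*args, **kwargs):
    """Run the commands in order; check=True stops at the first non-zero exit status."""
    for cmd in build_commands(*args, **kwargs):
        subprocess.run(cmd, check=True)
```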
---
# `config`
#### CLI options:
`metaDMG config` takes a single argument, `samples`, and a number of additional options. `samples` refers to one or more alignment files (or a directory containing them) with one of the file extensions `.bam`, `.sam`, or `.sam.gz`.
The options are listed below:
- Input files:
    - `--names`: Path to the (NCBI) `names.dmp.gz`. Mandatory for LCA.
    - `--nodes`: Path to the (NCBI) `nodes.dmp.gz`. Mandatory for LCA.
    - `--acc2tax`: Path to the (NCBI) `acc2tax.gz`. Mandatory for LCA.
- LCA parameters:
    - `--min-similarity-score`: Minimum normalised edit distance (read-to-reference similarity). Number between 0 and 1. Default: 0.95.
    - `--max-similarity-score`: Maximum normalised edit distance (read-to-reference similarity). Number between 0 and 1. Default: 1.0.
    - `--min-edit-dist`: Minimum edit distance (read-to-reference similarity). Number between 0 and 10. Default: 0.
    - `--max-edit-dist`: Maximum edit distance (read-to-reference similarity). Number between 0 and 10. Default: 10.
    - `--min-mapping-quality`: Minimum mapping quality. Default: 0.
    - `--max-position`: Maximum position in the sequence to include. Default: (+/-) 15 (forward/reverse).
    - `--weight-type`: Method for calculating weights. Default: 1.
    - `--fix-ncbi`: Fix the (NCBI) database. Disable (0) if using a custom database. Default: 1.
    - `--lca-rank`: The LCA rank used in ngsLCA. Can be `family`, `genus`, `species`, or `""` (everything). Default: `""`.
- Non-LCA parameters:
    - `--damage-mode`: `[lca|local|global]`. `lca` is the recommended and default setting. `local` calculates damage patterns for each chromosome/scaffold (contig); `global` produces one global estimate. Note that in `local` or `global` mode none of the LCA parameters above have any effect, except `--max-position`.
- General parameters:
    - `--forward-only`: Only fit the forward strand.
    - `--output-dir`: Path where the generated output files and folders are stored. Default: `./data/`.
    - `--parallel-samples`: The maximum number of cores to use. Default: 1.
    - `--cores-per-sample`: Number of cores per sample. Do not change unless you know what you are doing.
    - `--sample-prefix`: Prefix for the sample names.
    - `--sample-suffix`: Suffix for the sample names.
    - `--config-path`: The name of the generated config file. Default: `config.yaml`.
- Boolean flags (these take no value; just pass the flag). Default: false.
    - `--bayesian`: Include a fully Bayesian model (probably better, but also _a lot_ slower, roughly by a factor of 100).
    - `--long-name`: Use the full, long name for the sample.
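
The LCA thresholds above act as per-alignment filters. The sketch below shows how such a filter could be applied to a single alignment, assuming the similarity score is computed as `1 - edit_distance / read_length`; that formula and the function `passes_lca_filters` are assumptions for illustration, not metaDMG's own code. The keyword defaults mirror the documented defaults.

```python
def passes_lca_filters(
    edit_distance: int,
    read_length: int,
    mapping_quality: int,
    min_similarity_score: float = 0.95,
    max_similarity_score: float = 1.0,
    min_edit_dist: int = 0,
    max_edit_dist: int = 10,
    min_mapping_quality: int = 0,
) -> bool:
    """Hypothetical filter mirroring the `metaDMG config` LCA options.

    Assumption: similarity = 1 - edit_distance / read_length.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    similarity = 1.0 - edit_distance / read_length
    return (
        min_similarity_score <= similarity <= max_similarity_score
        and min_edit_dist <= edit_distance <= max_edit_dist
        and mapping_quality >= min_mapping_quality
    )


# A 50 bp read with 1 mismatch has similarity 0.98 and is kept.
print(passes_lca_filters(edit_distance=1, read_length=50, mapping_quality=30))  # True
# A 50 bp read with 5 mismatches has similarity 0.90 and is rejected.
print(passes_lca_filters(edit_distance=5, read_length=50, mapping_quality=30))  # False
```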
```console
$ metaDMG config ./raw_data/example.bam \
--names raw_data/names.dmp.gz \
--nodes raw_data/nodes.dmp.gz \
--acc2tax raw_data/combined_taxid_accssionNO_20200425.gz \
--parallel-samples 4
```
metaDMG is flexible about its input argument and also accepts multiple alignment files:
```console
$ metaDMG config ./raw_data/*.bam [...]
```
or even an entire directory containing alignment files (`.bam`, `.sam`, and `.sam.gz`):
```console
$ metaDMG config ./raw_data/ [...]
```
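
How a `samples` argument could expand into alignment files is sketched below, assuming files are matched purely by the documented extensions and that only the top level of a directory is scanned. `collect_alignment_files` is a hypothetical helper, not metaDMG's implementation. Note that `.sam.gz` must be matched on the full file name, because `Path.suffix` only returns the last suffix (`.gz`). With a shell glob such as `./raw_data/*.bam`, the shell expands the pattern first, so the helper simply receives several file paths.

```python
from pathlib import Path

# Extensions documented for `metaDMG config`.
ALIGNMENT_EXTENSIONS = (".bam", ".sam", ".sam.gz")


def is_alignment_file(path: Path) -> bool:
    # Compare against the full name: Path("x.sam.gz").suffix is only ".gz".
    return path.is_file() and path.name.endswith(ALIGNMENT_EXTENSIONS)


def collect_alignment_files(*samples: str) -> list[Path]:
    """Hypothetical helper: expand files and/or directories into alignment files."""
    found: list[Path] = []
    for sample in samples:
        path = Path(sample)
        if path.is_dir():
            # Only the top level of the directory is scanned in this sketch.
            found.extend(p for p in sorted(path.iterdir()) if is_alignment_file(p))
        elif is_alignment_file(path):
            found.append(path)
    return found
```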
To run metaDMG in non-LCA mode, an example could be:
```console
$ metaDMG config ./raw_data/example.bam --damage-mode local --max-position 15 --bayesian
```
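
The difference between the `local` and `global` damage modes comes down to how mismatch counts are grouped before fitting. The sketch below assumes each record is a `(contig, position, c_to_t, c_total)` tuple, with `position` counted from one read end starting at 1; this record layout and `aggregate_counts` are hypothetical and only illustrate the grouping, not metaDMG's internal data structures. Positions beyond `max_position` are dropped, mirroring `--max-position`.

```python
from collections import defaultdict


def aggregate_counts(records, damage_mode="local", max_position=15):
    """Hypothetical aggregation of C->T counts per group and position.

    records: iterable of (contig, position, c_to_t, c_total) tuples.
    damage_mode: "local" groups by contig, "global" pools all records.
    Returns {group: {position: (c_to_t, c_total)}}.
    """
    if damage_mode not in ("local", "global"):
        raise ValueError("this sketch only covers 'local' and 'global'")
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for contig, position, c_to_t, c_total in records:
        if position > max_position:
            continue  # mirrors --max-position
        group = contig if damage_mode == "local" else "global"
        counts = totals[group][position]
        counts[0] += c_to_t
        counts[1] += c_total
    return {
        group: {pos: tuple(counts) for pos, counts in sorted(by_pos.items())}
        for group, by_pos in totals.items()
    }
```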
---
# `compute`
The `metaDMG compute` command takes an optional config file as its argument
(defaults to `config.yaml` if not specified).
#### CLI options:
- `--force`: Force the computation, even if the output files already exist. Boolean flag.
#### Example:
```console
$ metaDMG compute
```
```console
$ metaDMG compute non-default-config.yaml --force
```
---
# `dashboard`
You can now see a preview of the [interactive dashboard](https://metadmg.herokuapp.com).
The `metaDMG dashboard` command takes an optional config file as its first argument
(defaults to `config.yaml` if not specified), followed by the CLI options below:
#### CLI options:
- `--port`: The port to be used for the dashboard. Default is `8050`.
- `--host`: The dashboard host address. Default is `0.0.0.0`.
- `--debug`: Boolean flag that allows for debugging the dashboard. Only for internal usage.
#### Example:
```console
$ metaDMG dashboard
```
```console
$ metaDMG dashboard non-default-config.yaml --port 8050 --host 0.0.0.0
```
---
# Results
The columns in the results and their meaning:
- General parameters:
    - `tax_id`: The tax ID. int64.
    - `tax_name`: The tax name. string.
    - `tax_rank`: The tax rank. string.
    - `sample`: The name of the original sample. string.
    - `N_reads`: The number of reads. int64.
    - `N_alignments`: The number of alignments. int64.
- Fit-related parameters:
    - `lambda_LR`: The likelihood ratio between the null model and the ancient damage model. This can be interpreted as the fit certainty, where higher values mean higher certainty. float32.
    - `lambda_LR_P`: The likelihood ratio expressed as a probability. float32.
    - `lambda_LR_z`: The likelihood ratio expressed as a number of ![](https://latex.codecogs.com/svg.image?%5Csigma). float32.
    - `D_max`: The estimated damage. This can be interpreted as the amount of damage in the specific taxon. float32.
    - `q`: The damage decay rate. float32.
    - `A`: The background-independent damage. float32.
    - `c`: The background. float32.
    - `phi`: The concentration of a beta-binomial distribution (parametrised by ![](https://latex.codecogs.com/svg.image?%5Cmu) and ![](https://latex.codecogs.com/svg.image?%5Cphi)). float32.
    - `rho_Ac`: The correlation between `A` and `c`. High values are often a sign of a bad fit. float32.
    - `valid`: Whether or not the fit is valid (as defined by [iminuit](https://iminuit.readthedocs.io/en/stable/)). bool.
    - `asymmetry`: An estimate of the asymmetry of the forward and reverse fits. See below for more information. float32.
    - `XXX_std`: The uncertainty (standard deviation) of the variable `XXX`, for `D_max`, `A`, `q`, `c`, and `phi`.
    - `forward__XXX`: The same description as above for variable `XXX`, but only for the forward read.
    - `reverse__XXX`: The same description as above for variable `XXX`, but only for the reverse read.
- Read-related parameters:
    - `mean_L`: The mean read length of all the individual, unique reads that map to the specific taxon. float64.
    - `std_L`: The standard deviation of the above. float64.
    - `mean_GC`: The mean GC content of all the individual, unique reads that map to the specific taxon.
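
The fit-related columns can be read as parameters of a position-dependent damage curve combined with a beta-binomial likelihood for the observed mismatch counts. The sketch below assumes the curve `f(x) = A * (1 - q) ** (x - 1) + c` (with `x` the position from the read end, starting at 1) and the common mean/concentration parametrisation of the beta-binomial, `alpha = mu * phi` and `beta = (1 - mu) * phi`. Both are assumptions made for illustration: the exact model, and how `D_max` is derived from it, are defined in metaDMG's fitting code, not here. All function names are hypothetical.

```python
import math


def damage_curve(x: int, A: float, q: float, c: float) -> float:
    """Assumed damage frequency at position x (1 = read end): decays from A + c towards c."""
    return A * (1.0 - q) ** (x - 1) + c


def log_beta_binomial(k: int, n: int, mu: float, phi: float) -> float:
    """Log-pmf of a beta-binomial with mean mu (0 < mu < 1) and concentration phi > 0."""
    alpha, beta = mu * phi, (1.0 - mu) * phi
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    # log B(k + alpha, n - k + beta) - log B(alpha, beta)
    log_beta_ratio = (
        math.lgamma(k + alpha) + math.lgamma(n - k + beta) - math.lgamma(n + alpha + beta)
        - (math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta))
    )
    return log_choose + log_beta_ratio


def log_likelihood(counts, A: float, q: float, c: float, phi: float) -> float:
    """Sum the log-pmf over (position x, mismatches k, trials n) tuples."""
    return sum(
        log_beta_binomial(k, n, damage_curve(x, A, q, c), phi) for x, k, n in counts
    )
```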
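
To pick out taxa with a valid and convincing damage signal, you might filter the result rows on the columns above. The sketch below works on rows as plain dictionaries keyed by the documented column names; how you load them (the file format in `{output-dir}/results` is not covered here) and the thresholds `min_lambda_LR=10` and `min_D_max=0.05` are assumptions chosen for illustration, not recommended values. `damaged_taxa` and `describe` are hypothetical helpers.

```python
from collections.abc import Iterable


def damaged_taxa(
    rows: Iterable[dict], min_lambda_LR: float = 10.0, min_D_max: float = 0.05
) -> list[dict]:
    """Hypothetical filter: keep valid fits with high certainty and damage, highest D_max first."""
    kept = [
        row
        for row in rows
        if row["valid"] and row["lambda_LR"] >= min_lambda_LR and row["D_max"] >= min_D_max
    ]
    return sorted(kept, key=lambda row: row["D_max"], reverse=True)


def describe(row: dict) -> str:
    """Format D_max with its uncertainty, using the documented `XXX_std` naming."""
    return f"{row['tax_name']} ({row['tax_id']}): D_max = {row['D_max']:.3f} ± {row['D_max_std']:.3f}"
```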