Python library | metaDMG-0.21.6.tar.gz resource - CSDN Library

150 views · Uploaded 2022-04-10 20:13:01 · 59 KB · GZ

40 files in total

py: 31

mplstyle: 4

pkg-info: 1

metaDMG-0.21.6.tar.gz (40 files)

metaDMG-0.21.6

PKG-INFO 15KB

pyproject.toml 4KB

LICENSE 1KB

src

metaDMG

loggers

log_config.yaml 1KB

loggers.py 2KB

__init__.py 0B

utils.py 8KB

errors.py 467B

__main__.py 50B

filters.py 2KB

__init__.py 82B

viz

app.py 27KB

results.py 16KB

content.py 20KB

viz_utils.py 19KB

dashboard.py 556B

__init__.py 50B

_taxonomy.py 1KB

figures.py 16KB

mpl_styles

scatter.mplstyle 333B

ieee.mplstyle 393B

__init__.py 0B

no-latex.mplstyle 40B

science.mplstyle 1KB

__version__.py 160B

fit

results.py 4KB

fits.py 14KB

mismatch_to_mapDamage.py 2KB

__init__.py 168B

serial.py 13KB

workflow.py 1KB

mismatches.py 6KB

frequentist.py 16KB

bayesian.py 9KB

fit_utils.py 10KB

cli

cli.py 13KB

cli_utils.py 2KB

__init__.py 0B

setup.py 16KB

README.md 14KB

# metaDMG: Estimating ancient damage in (meta)genomic DNA rapidly

---

#### Work in progress. Please contact christianmichelsen@gmail.com for further information.

---

You can now see a preview of the [interactive dashboard](https://metadmg.herokuapp.com).

---

## Installation:

```
conda env create --file environment.yaml
```

or, if you have mamba installed (faster)

```
mamba env create --file environment.yaml
```

or, by using pip:

```
pip install "metaDMG[all]"
```

or, with Poetry:

```
poetry add "metaDMG[all]"
```

---

## Workflow:

Create `config.yaml` file:

```console
$ metaDMG config ./raw_data/example.bam \
    --names raw_data/names.dmp.gz \
    --nodes raw_data/nodes.dmp.gz \
    --acc2tax raw_data/combined_taxid_accssionNO_20200425.gz
```

Run actual program:

```console
$ metaDMG compute
```

See the results in the dashboard:

```console
$ metaDMG dashboard
```

---

## Usage:

metaDMG works by first creating a config file using the `config` command. This file contains all of the information related to metaDMG such that you only have to type this once. The config file is saved in the current directory as `config.yaml` and can subsequently be edited in any text editor of your choice.

After the config has been created, we run the actual program using the `compute` command. This can take a while depending on the number (and size) of the files.

Finally the results are saved in the `{output-dir}/results` directory (`data/results` by default). These can be viewed with the interactive dashboard using the `dashboard` command.

---

# `config`

#### CLI options:

`metaDMG config` takes a single argument, `samples`, and a number of additional options. `samples` refers to one or more alignment files (or a directory containing them), each with one of the file extensions `.bam`, `.sam`, or `.sam.gz`.

The options are listed below:

- Input files:
  - `--names`: Path to the (NCBI) `names.dmp.gz`. Mandatory for LCA.
  - `--nodes`: Path to the (NCBI) `nodes.dmp.gz`. Mandatory for LCA.
  - `--acc2tax`: Path to the (NCBI) `acc2tax.gz`. Mandatory for LCA.

- LCA parameters:
  - `--min-similarity-score`: Normalised edit distance (read to reference similarity) minimum. Number between 0-1. Default: 0.95.
  - `--max-similarity-score`: Normalised edit distance (read to reference similarity) maximum. Number between 0-1. Default: 1.0.
  - `--min-edit-dist`: Minimum edit distance (read to reference similarity). Number between 0-10. Default: 0.
  - `--max-edit-dist`: Maximum edit distance (read to reference similarity). Number between 0-10. Default: 10.
  - `--min-mapping-quality`: Minimum mapping quality. Default: 0.
  - `--max-position`: Maximum position in the sequence to include. Default is (+/-) 15 (forward/reverse).
  - `--weight-type`: Method for calculating weights. Default is 1.
  - `--fix-ncbi`: Fix the (ncbi) database. Disable (0) if using a custom database. Default is 1.
  - `--lca-rank`: The LCA rank used in ngsLCA. Can be either `family`, `genus`, `species` or `""` (everything). Default is `""`.

- Non-LCA parameters:
  - `--damage-mode`: `[lca|local|global]`. `lca` is the recommended and automatic setting. With `local`, damage patterns are calculated for each chr/scaffold contig. With `global`, a single global estimate is calculated. Note that when using `[local|global]`, none of the parameters in the LCA section above matter, except `--max-position`.

- General parameters:
  - `--forward-only`: Only fit the forward strand.
  - `--output-dir`: Path where the generated output files and folders are stored. Default: `./data/`.
  - `--parallel-samples`: The maximum number of cores to use. Default is 1.
  - `--cores-per-sample`: Number of cores per sample. Do not change unless you know what you are doing.
  - `--sample-prefix`: Prefix for the sample names.
  - `--sample-suffix`: Suffix for the sample names.
  - `--config-path`: The name of the generated config file. Default: `config.yaml`.

- Boolean flags (these do not take any values, only the flag). Default is false.
  - `--bayesian`: Include a fully Bayesian model (probably better, but also _a lot_ slower, about a factor of 100).
  - `--long-name`: Use the full, long, name for the sample.

```console
$ metaDMG config ./raw_data/example.bam \
    --names raw_data/names.dmp.gz \
    --nodes raw_data/nodes.dmp.gz \
    --acc2tax raw_data/combined_taxid_accssionNO_20200425.gz \
    --parallel-samples 4
```

metaDMG is pretty versatile regarding its input argument and also accepts multiple alignment files:

```console
$ metaDMG config ./raw_data/*.bam [...]
```

or even an entire directory containing alignment files (`.bam`, `.sam`, and `.sam.gz`):

```console
$ metaDMG config ./raw_data/ [...]
```

To run metaDMG in non-LCA mode, an example could be:

```console
$ metaDMG config ./raw_data/example.bam --damage-mode local --max-position 15 --bayesian
```

---

# `compute`

The `metaDMG compute` command takes an optional config-file as argument (defaults to `config.yaml` if not specified).

#### CLI options:

- `--force`: Force the computation, even if the output files already exist. Bool flag.

#### Example:

```console
$ metaDMG compute
```

```console
$ metaDMG compute non-default-config.yaml --force
```

---

# `dashboard`

You can now see a preview of the [interactive dashboard](https://metadmg.herokuapp.com).

The `metaDMG dashboard` command takes an optional config-file as its first argument (defaults to `config.yaml` if not specified), followed by the CLI options below.

#### CLI options:

- `--port`: The port to be used for the dashboard. Default is `8050`.
- `--host`: The dashboard host address. Default is `0.0.0.0`.
- `--debug`: Boolean flag that allows for debugging the dashboard. Only for internal usage.

#### Example:

```console
$ metaDMG dashboard
```

```console
$ metaDMG dashboard non-default-config.yaml --port 8050 --host 0.0.0.0
```

---

# Results

The column names in the results and their explanation:

- General parameters:
  - `tax_id`: The tax ID. int64.
  - `tax_name`: The tax name. string.
  - `tax_rank`: The tax rank. string.
  - `sample`: The name of the original sample. string.
  - `N_reads`: The number of reads. int64.
  - `N_alignments`: The number of alignments. int64.

- Fit related parameters:
  - `lambda_LR`: The likelihood ratio between the null model and the ancient damage model. This can be interpreted as the fit certainty, where higher values mean higher certainty. float32.
  - `lambda_LR_P`: The likelihood ratio expressed as a probability. float32.
  - `lambda_LR_z`: The likelihood ratio expressed as number of ![](https://latex.codecogs.com/svg.image?%5Csigma). float32.
  - `D_max`: The estimated damage. This can be interpreted as the amount of damage in the specific taxon. float32.
  - `q`: The damage decay rate. float32.
  - `A`: The background independent damage. float32.
  - `c`: The background. float32.
  - `phi`: The concentration for a beta binomial distribution (parametrised by ![](https://latex.codecogs.com/svg.image?%5Cmu) and ![](https://latex.codecogs.com/svg.image?%5Cphi)). float32.
  - `rho_Ac`: The correlation between `A` and `c`. High values of this are often a sign of a bad fit. float32.
  - `valid`: Whether or not the fit is valid (defined by [iminuit](https://iminuit.readthedocs.io/en/stable/)). bool.
  - `asymmetry`: An estimate of the asymmetry of the forward and reverse fits. See below for more information. float32.
  - `XXX_std`: The uncertainty (standard deviation) of the variable `XXX` for `D_max`, `A`, `q`, `c`, and `phi`.
  - `forward__XXX`: The same description as above for variable `XXX`, but only for the forward read.
  - `reverse__XXX`: The same description as above for variable `XXX`, but only for the reverse read.

- Read related parameters:
  - `mean_L`: The mean read length of all the individual, unique reads that map to the specific taxon. float64.
  - `std_L`: The standard deviation of the above. float64.
  - `mean_GC`: The me
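
As a rough illustration of how the result columns described above might be combined, the sketch below keeps only rows with a valid fit, a minimum number of reads, and damage and significance above chosen cut-offs. It is not part of metaDMG: the function `select_damaged`, the threshold values, and the example rows are all hypothetical, and because this excerpt does not describe the on-disk format of the results, the rows are represented as plain dictionaries keyed by the documented column names.

```python
from typing import Dict, List


def select_damaged(
    rows: List[Dict],
    min_d_max: float = 0.05,  # assumed cut-off on D_max, not a metaDMG default
    min_z: float = 2.0,       # assumed cut-off on lambda_LR_z (number of sigma)
    min_reads: int = 100,     # assumed cut-off on N_reads
) -> List[Dict]:
    """Keep rows whose fit is valid and whose values pass all three cut-offs."""
    return [
        row
        for row in rows
        if row["valid"]
        and row["N_reads"] >= min_reads
        and row["D_max"] >= min_d_max
        and row["lambda_LR_z"] >= min_z
    ]


# Made-up rows for illustration only; these are not real metaDMG output.
rows = [
    {"tax_name": "taxon_a", "N_reads": 500, "D_max": 0.20, "lambda_LR_z": 4.0, "valid": True},
    {"tax_name": "taxon_b", "N_reads": 800, "D_max": 0.01, "lambda_LR_z": 0.5, "valid": True},
    {"tax_name": "taxon_c", "N_reads": 50, "D_max": 0.30, "lambda_LR_z": 5.0, "valid": True},
    {"tax_name": "taxon_d", "N_reads": 900, "D_max": 0.25, "lambda_LR_z": 6.0, "valid": False},
]

kept = select_damaged(rows)
print([row["tax_name"] for row in kept])  # prints ['taxon_a']
```

The cut-offs are keyword arguments so they can be adjusted per data set; for instance, `select_damaged(rows, min_reads=0)` would also keep `taxon_c`.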
