Download from the official PyPI site | fseq2-2.0.0.tar.gz resource - CSDN Library


Python library

196 views, uploaded 2022-01-11 21:29:38, 71KB, GZ

50 files in total (py: 15, rst: 10, txt: 4)


fseq2-2.0.0.tar.gz (50 files)

- fseq2-2.0.0/
  - MANIFEST.in 243B
  - PKG-INFO 755B
  - bin/
    - fseq2 27KB
  - fseq2.egg-info/
    - PKG-INFO 755B
    - requires.txt 85B
    - not-zip-safe 1B
    - SOURCES.txt 1KB
    - top_level.txt 6B
    - dependency_links.txt 1B
  - fseq2/
    - callpeak_idr_main.py 3KB
    - idr_2_0_3/
      - idr.py 36KB
      - utility.py 5KB
      - \_\_init\_\_.py 1KB
      - symbolic.py 5KB
      - optimization.py 16KB
      - examine_inv_cdf.py 6KB
      - stuff_i_pbly_wont_use.py 25KB
    - \_\_init\_\_.py 130B
    - callpeak_main.py 10KB
    - idr_main.py 3KB
    - fseq2.py 52KB
  - docs/
    - history.rst 28B
    - conf.py 5KB
    - usage.rst 65B
    - make.bat 767B
    - installation.rst 1KB
    - authors.rst 28B
    - readme.rst 26B
    - Makefile 606B
    - index.rst 302B
    - contributing.rst 33B
  - AUTHORS.rst 201B
  - CONTRIBUTING.rst 3KB
  - tests/
    - unit/
      - .DS_Store 6KB
    - .DS_Store 8KB
    - integration/
      - test_callpeak_idr.py 5KB
      - .DS_Store 8KB
      - fixtures/
        - treatment_1.bam 870B
        - control_2.bam 871B
        - control_1.bam 870B
        - hg19.chrom.sizes 365B
        - result_1_result_2.wig 4KB
        - .DS_Store 6KB
        - result_1_result_2_conservative_IDR_thresholded_peaks.narrowPeak 108B
        - treatment_2.bam 873B
  - LICENSE 34KB
  - HISTORY.rst 87B
  - setup.cfg 400B
  - setup.py 1KB
  - README.md 11KB

# F-Seq2

## Improving the feature density based peak caller with dynamic statistics

Tag sequencing with high-throughput sequencing technologies, such as DNase-seq, ATAC-seq, ChIP-seq, and FAIRE-seq, is employed to identify specific sequence features. To intuitively summarize and display individual sequence data as an accurate and interpretable signal, we developed the original [F-Seq](http://fureylab.web.unc.edu/software/fseq/) ([GitHub](https://github.com/aboyle/F-seq)). It is a software package that generates a continuous tag sequence density estimate. This estimate allows identification of biologically meaningful sites, and its output can be displayed directly in the UCSC Genome Browser.

F-Seq2 is a complete rewrite of the original version in Python. In this second version, we designed a new statistical framework and introduced new features to further improve performance. F-Seq2 implements a dynamic parameter to conduct local statistical analysis with an underlying "continuous" Poisson distribution. We combine the power of the local test with kernel density estimation (KDE), which models the read probability distribution with statistical rigor. This lets us robustly account for local biases and resolve ties that occur when ranking candidate summits, making the results suitable for irreproducible discovery rate (IDR) analysis.

## Table of contents

1. [Installation](./INSTALL.md)
2. [Usage](#usage)
   - [`callpeak`](#callpeak)
   - [`callpeak_idr`](#callpeak_idr)
   - [`idr`](#idr)
3. [Output files and formats](#output-files-and-formats)
4. [Examples](#examples)
5. [Reference](#reference)
6. [Troubleshooting](#troubleshooting)

## Installation

Prerequisite: [BEDTools](https://bedtools.readthedocs.io/en/latest/content/installation.html).

See [here](./INSTALL.md) for more details on installing F-Seq2.

## Usage

```
fseq2 [-h] [--version] {callpeak, callpeak_idr, idr}
```

Available subcommands:

Subcommand | Description
-----------|----------
`callpeak` | F-Seq2 main function to call peaks from alignment results.
`callpeak_idr` | Call peaks followed by the IDR framework with recommended settings.
`idr` | A wrapper for the [IDR package](https://github.com/nboley/idr) for customized IDR analysis.

## `callpeak`

#### Command line input:

##### `-treatment_file`

REQUIRED argument for fseq2. Treatment file(s) in bam or bed format. If multiple files are specified (separated by spaces), they are considered as one treatment experiment. See [here](./INPUT_FORMAT.md) for more details about input format.

##### `-control_file`

Control file(s) corresponding to the treatment file(s).

##### `-pe`

Paired-end mode. If this flag is on, treatment (and control) file(s) are paired-end data in either BAMPE or BEDPE format. Default is False, which treats all data as single-end. See [here](./INPUT_FORMAT.md) for more details about paired-end mode.

##### `-chrom_size_file`

A file specifying chrom sizes, where each line has one chrom and its size. This is required if the output signal format is bigwig. Note that if this file is specified, fseq2 only processes the chroms in this file. Default is False, which processes all chroms but cannot output bigwig.

##### `-o`

Output directory. Default is the current directory.

##### `-name`

Prefix for all output files. Existing files with this prefix are overwritten. Default is `fseq2_result`.

##### `-sig_format`

Signal format for the reconstructed signal. Available formats: `wig`, `bigwig`, `np_array`. Note that if `np_array` is chosen, arrays for each chrom are stored in [`NAME_sig.h5`](#name_sigh5) with `chrom` as key, and no Gaussian smoothing is applied. Default is False, with no signal output.

##### `-sort_by`

Sort peaks and summits by `pValue` or `chromAndStart`. Default is `chromAndStart`.

##### `-v`

Verbose output. Default is False.

##### `-f`

Fragment size of treatment data. Default is to estimate it from the data. This determines the shift size, where `offset = fragment_size/2`. For DNase-seq and ATAC-seq data, set `-f 0`.

##### `-l`

Feature length for treatment data. Default is 600.
Recommend 50 for TF ChIP-seq, 600 for DNase-seq and ATAC-seq, and 1000 for histone ChIP-seq.

##### `-fc`

Fragment size of control data.

##### `-t`

Threshold (in standard deviations) to call candidate summits. Default is 4.0. Recommend 4.0 for broad peaks and 8.0 for sharp peaks.

##### `-p_thr`

P value threshold. Default is 0.01. Consider relaxing it to 0.05 when there is no control data or when calling broad peaks.

##### `-q_thr`

Q value (FDR) threshold. Default is not set, in which case `p_thr` is used. If set, only `q_thr` is used.

##### `-cpus`

Number of CPUs to use. Default is 1.

##### `-tp`

Threshold (in standard deviations) to call peak regions. Default is 4.0.

##### `-sparse_data`

If this flag is on, the statistical test includes a 1k region for more accurate background estimation. This can be useful for single-cell data.

##### `-nfr_upper_limit`

Nucleosome-free region upper limit. Default is 150. Used as window_size and min_distance when `-f 0`.

##### `-pe_fragment_size_range`

Effective only if `-pe` is on. Only PE fragments whose size is within the range are kept to call peaks. Default is False, without any selection. Useful for ATAC-seq data:

1. to call peaks on nucleosome-free regions, specify `0 150`;
2. to call peaks on nucleosome centers, specify `150 inf`;
3. to call peaks on open chromatin regions, specify `auto`.

> `auto` is a filter designed for ATAC-seq open chromatin peak calling, where we filter out fragments whose sizes are related to mono-, di-, tri-, and multi-nucleosomes. Size information is taken from the original ATAC-seq paper (Buenrostro et al.). You can design your own auto filter based on specific experiment data by specifying the `-nucleosome_size` parameter.

##### `-nucleosome_size`

Effective only if `-pe` is on and `-pe_fragment_size_range auto` is specified. Default is `180, 247, 315, 473, 558, 615`. These are the ATAC-seq PE fragment sizes related to mono-, di-, and tri-nucleosomes. Fragments whose sizes fall within these ranges or above the largest bound (i.e. 615) are filtered out when calling peaks.
Change those numbers to design your own auto filter.

##### `-prior_pad_summit`

Prior knowledge about peak length, which is only padded into `NAME_summits.narrowPeak`. Default is 0. Useful for IDR analysis: in `callpeak_idr`, we set it to the estimated fragment size.

##### `-num_peaks`

Maximum number of peaks called. Default is not set. If set, it overrides `p_thr` and `q_thr`.

## `callpeak_idr`

#### Command line input:

Most arguments are shared between `callpeak` and `callpeak_idr`. Here are the unique ones.

> Note whether `-` or `--` precedes an argument: `--` arguments come from the IDR package, and `-` arguments come from fseq2.

##### `-treatment_file_1`

Treatment file in bam or bed format as replicate 1.

##### `-treatment_file_2`

Treatment file in bam or bed format as replicate 2.

##### `-control_file_1`

Control file in bam or bed format, paired with the replicate 1 treatment file.

##### `-control_file_2`

Control file in bam or bed format, paired with the replicate 2 treatment file.

##### `-name_1`

Prefix for output files for replicate 1 (default=`fseq2_result_1`).

##### `-name_2`

Prefix for output files for replicate 2 (default=`fseq2_result_2`).

##### `-prior_pad_summit`

Prior knowledge about peak length, which is only padded into `NAME_summits.narrowPeak`. Default is the estimated fragment size.

##### `--idr_threshold`

Only return peaks with a global IDR below this value. Default: report all peaks.

##### `--soft_idr_threshold`

Report statistics for peaks with a global IDR below this value, but return all peaks with an IDR below `--idr`. Default: 0.05.

##### `--plot`

Plot IDR results. Specify False for no plot. Default is to plot to `NAME_1_NAME_2.png`, and another file name can be specified here. Note that this differs from the original IDR package, where it is only a flag.

## `idr`

#### Command line input and output:

See the original [IDR documentation](https://github.com/nboley/idr#usage).

> Note that all single-letter arguments are removed to avoid conflicts with fseq2, e.g. there is no `-s`; use `--samples` instead.

## Output files a
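The `auto` filter described under `-pe_fragment_size_range` and `-nucleosome_size` above can be illustrated with a short Python sketch. This is not fseq2's code, and it rests on several assumptions. First, the six default bounds pair up as (180, 247), (315, 473), and (558, 615) for mono-, di-, and tri-nucleosome fragments. Second, the bounds are inclusive. Third, fragment sizes are already available as integers. The function names `auto_filter_ranges`, `keep_fragment`, and `filter_fragment_sizes` are hypothetical and are not part of the fseq2 API.

```python
from typing import Iterable, List, Sequence, Tuple

# Default -nucleosome_size bounds quoted in the README.
DEFAULT_NUCLEOSOME_SIZE: Tuple[int, ...] = (180, 247, 315, 473, 558, 615)


def auto_filter_ranges(bounds: Sequence[int]) -> List[Tuple[int, int]]:
    """Pair sorted bounds into (low, high) ranges whose fragments are excluded.

    Assumption: consecutive bounds form the mono-, di-, and tri-nucleosome ranges.
    """
    if len(bounds) == 0 or len(bounds) % 2 != 0:
        raise ValueError("expected a non-empty, even number of bounds")
    ordered = sorted(bounds)
    return [(ordered[i], ordered[i + 1]) for i in range(0, len(ordered), 2)]


def keep_fragment(size: int, bounds: Sequence[int] = DEFAULT_NUCLEOSOME_SIZE) -> bool:
    """Return True if a PE fragment of `size` bp would survive the auto filter.

    Assumption: range bounds are inclusive.
    """
    if size > max(bounds):
        return False  # multi-nucleosome fragments above the largest bound
    return not any(low <= size <= high for low, high in auto_filter_ranges(bounds))


def filter_fragment_sizes(
    sizes: Iterable[int], bounds: Sequence[int] = DEFAULT_NUCLEOSOME_SIZE
) -> List[int]:
    """Keep only the fragment sizes that pass the auto filter."""
    return [s for s in sizes if keep_fragment(s, bounds)]


if __name__ == "__main__":
    print(filter_fragment_sizes([100, 200, 280, 400, 500, 600, 700]))  # prints [100, 280, 500]
```

Passing a different even-length tuple as `bounds` mirrors the README's suggestion to change the `-nucleosome_size` numbers for a custom filter.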

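The `-f` description states that the shift size is `offset = fragment_size/2`. Below is a minimal sketch of one common way to apply such a shift: a strand-aware move of each read's 5' end toward the fragment center, as done in MACS-style peak callers. The following are all assumptions for illustration, not F-Seq2's confirmed implementation:

- the strand handling;
- floor division for odd fragment sizes;
- clipping at position 0;
- the hypothetical `Read` type and `shifted_cut_site` function.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical single-end read in BED-style 0-based, half-open coordinates."""

    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def shifted_cut_site(read: Read, fragment_size: int) -> int:
    """Move the read's 5' end toward the fragment center by fragment_size/2.

    Assumptions: a strand-aware (MACS-style) shift, floor division for odd
    fragment sizes, and clipping at position 0 on the minus strand. With
    fragment_size=0, as the README recommends for DNase-seq and ATAC-seq,
    the 5' end is returned unchanged.
    """
    offset = fragment_size // 2
    if read.strand == "+":
        return read.start + offset  # 5' end is the leftmost base
    if read.strand == "-":
        return max(0, read.end - 1 - offset)  # 5' end is the rightmost base
    raise ValueError(f"unknown strand: {read.strand!r}")


if __name__ == "__main__":
    print(shifted_cut_site(Read("chr1", 1000, 1050, "-"), 200))  # prints 949
```

Shifting both strands inward makes reads from opposite ends of the same fragment converge near its middle. In DNase-seq and ATAC-seq, by contrast, the read end itself marks the cut or insertion site, so `-f 0` leaves it in place.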