PyPI official download | remixt-0.5.18.tar.gz resource - CSDN Library


171 views · uploaded 2022-01-15 20:15:50 · 870KB · GZ

264 files in total

h: 85

cpp: 64

py: 62

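PyPI official download | remixt-0.5.18.tar.gz: exploring how Python libraries are published and used

PyPI (the Python Package Index) is one of the most important repositories for Python developers. It hosts a large number of reusable Python packages that users can download, install, and manage. PyPI plays a central role in the Python ecosystem, giving Python developers worldwide access to a rich set of open source tools and libraries.

`remixt-0.5.18.tar.gz` is a resource file from the PyPI website that contains version 0.5.18 of the Python library `remixt`. The archive is in tar.gz format, which is common on Linux and Unix systems for packaging and compressing files. It combines the archiving of tar with the compression of gzip, which reduces file size and makes storage and transfer easier. Unpacking `remixt-0.5.18.tar.gz` produces a `remixt-0.5.18` directory, which typically contains the library's source code, documentation, test cases, and installation scripts. The source code usually lives under a `src` or `remixt` directory, where developers can read it to understand the library's internals, learn how to use it, or build on it.

A Python library typically includes the following key parts:

1. `setup.py`: the project's build script, which defines basic information such as name, version, author, and dependencies, and provides commands for installing and packaging.
2. `MANIFEST.in`: specifies which additional non-Python files, such as documentation and data files, should be included when packaging.
3. `LICENSE`: the library's license, which defines how users may use, modify, and distribute it.
4. `README.md` or `README.rst`: a basic introduction to the project, with usage and installation instructions, so users can quickly understand what the library does.
5. `requirements.txt`: lists the other Python libraries the library depends on and their version requirements, used to install dependencies automatically.
6. `tests` or `test` directory: unit and integration tests that check the library's correctness.
7. `docs` directory: project documentation, usually generated with tools such as Sphinx.

For the `remixt` library, we also need to study its API (application programming interface) and example code to understand what it provides and how to use it in a project. Usually the library's main module (such as `remixt/__init__.py`) defines the core functionality, while other modules contain more specific sub-features. In practice, developers can install the library from PyPI with Python's `pip` tool:

```bash
pip install remixt
```

Once installation finishes, the functionality provided by `remixt` can be imported and used in Python code.

In summary, `remixt-0.5.18.tar.gz` is a Python library obtained from PyPI. It represents one specific version of `remixt` and contains all of the library's source code and metadata. Examining its structure and contents helps in understanding and using the library, and in understanding how Python libraries are published and managed.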
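The per-extension file counts shown at the top of this page (for example h: 85, cpp: 64, py: 62) can be reproduced without unpacking the archive. The following is a minimal sketch using only Python's standard `tarfile` module; the function name `count_extensions` is hypothetical and not part of `remixt`, and the path passed to it is assumed to point at a locally downloaded copy of the archive.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def count_extensions(archive_path):
    """Count regular files in a .tar.gz archive, grouped by file extension.

    Hypothetical helper for illustration; it is not part of remixt.
    """
    counts = Counter()
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            # Skip directories, links, and other non-regular entries.
            if not member.isfile():
                continue
            suffix = PurePosixPath(member.name).suffix or "(none)"
            counts[suffix] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_extensions.py remixt-0.5.18.tar.gz
    if len(sys.argv) > 1:
        for suffix, count in count_extensions(sys.argv[1]).most_common():
            print(f"{suffix}: {count}")
```

Reading the member list this way avoids extracting files from an archive you have not inspected yet.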


PyPI official download | remixt-0.5.18.tar.gz (264 files)

gtest-all.cc 329KB

gtest.cc 176KB

gtest-death-test.cc 45KB

gtest-port.cc 25KB

gtest-filepath.cc 14KB

gtest-printers.cc 12KB

gtest-test-part.cc 4KB

gtest-typed-test.cc 4KB

gtest-all.cc 2KB

gtest_main.cc 2KB

setup.cfg 212B

CHANGES 5KB

internal_utils.cmake 9KB

ExportHeader.cmake 830B

CONTRIBUTORS 1KB

COPYING 1KB

bpmodel.cpp 1.83MB

bamreader.cpp 274KB

bamtools_resolve.cpp 51KB

bamtools_filter.cpp 42KB

BamAlignment.cpp 39KB

json_value.cpp 38KB

BamStandardIndex_p.cpp 35KB

bamtools_convert.cpp 32KB

BamMultiReader_p.cpp 30KB

BamToolsIndex_p.cpp 23KB

bamtools_split.cpp 20KB

json_reader.cpp 20KB

json_writer.cpp 20KB

BamWriter_p.cpp 19KB

bamtools_fasta.cpp 18KB

BamMultiReader.cpp 16KB

SamHeaderValidator_p.cpp 16KB

BgzfStream_p.cpp 15KB

BamHttp_p.cpp 15KB

BamReader_p.cpp 15KB

BamFtp_p.cpp 14KB

BamReader.cpp 13KB

TcpSocket_p.cpp 13KB

bamtools_pileup_engine.cpp 12KB

bamtools_options.cpp 12KB

bamtools_sort.cpp 12KB

HttpHeader_p.cpp 12KB

SamProgramChain.cpp 12KB

bamtools_utilities.cpp 11KB

HostAddress_p.cpp 11KB

bamtools_random.cpp 10KB

BamAlleleReader.cpp 10KB

SamSequenceDictionary.cpp 10KB

bamtools_stats.cpp 10KB

SamReadGroupDictionary.cpp 10KB

BamRandomAccessController_p.cpp 10KB

RollingBuffer_p.cpp 8KB

SamFormatParser_p.cpp 8KB

TcpSocketEngine_win_p.cpp 8KB

bamtools_merge.cpp 8KB

HostInfo_p.cpp 7KB

TcpSocketEngine_unix_p.cpp 7KB

SamFormatPrinter_p.cpp 7KB

SamHeader.cpp 7KB

bamtools_count.cpp 6KB

SamReadGroup.cpp 6KB

bamtools.cpp 6KB

bamtools_revert.cpp 6KB

TcpSocketEngine_p.cpp 5KB

bamtools_coverage.cpp 5KB

BamWriter.cpp 5KB

SamSequence.cpp 4KB

BamIndexFactory_p.cpp 4KB

BamHeader_p.cpp 4KB

bamtools_header.cpp 4KB

SamProgram.cpp 4KB

bamtools_index.cpp 3KB

ByteArray_p.cpp 3KB

BamPipe_p.cpp 2KB

BamFile_p.cpp 2KB

ILocalIODevice_p.cpp 2KB

BamDeviceFactory_p.cpp 1KB

BamException_p.cpp 688B

test.cpp 0B

Doxyfile 66KB

.git 52B

.gitignore 4B

gtest.h 783KB

gtest-internal-inl.h 39KB

json_value.h 33KB

BamAlignment.h 23KB

bamtools_filter_engine.h 21KB

BamAux.h 14KB

Sort.h 12KB

BamConstants.h 9KB

bamtools_filter_ruleparser.h 9KB

BamStandardIndex_p.h 9KB

bamtools_filter_properties.h 8KB

BamMultiMerger_p.h 7KB

bamtools_options.h 7KB

json_reader.h 7KB

json_writer.h 6KB

BamToolsIndex_p.h 6KB


# ReMixT

ReMixT is a tool for joint inference of clone specific segment and breakpoint copy number in whole genome sequencing data. The input for the tool is a set of segments, a set of breakpoints predicted from the sequencing data, and normal and tumour bam files. Where multiple tumour samples are available, they can be analyzed jointly for additional benefit.

## Installation

Conda is a prerequisite; install [anaconda python](https://store.continuum.io/cshop/anaconda/) from the continuum website.

### Installing from pip

The recommended method of installation for ReMixT is using `pip`.

    pip install remixt

You will also need `shapeit` and `samtools` on your path. They can be installed using conda:

    conda install samtools
    conda install -c dranew shapeit

### Installing from conda

The conda distribution is now out of date. However, to use conda, add my channel and the bioconda channel, and install ReMixT as follows.

    conda config --add channels https://conda.anaconda.org/dranew
    conda config --add channels 'bioconda'
    conda install remixt

### Installing from source

#### Clone Source Code

To install the code, first clone from bitbucket. A recursive clone is preferred to pull in all submodules.

    git clone --recursive git@bitbucket.org:dranew/remixt.git

#### Dependencies

To install from source you will need several dependencies. A list of dependencies can be found in the `conda` `yaml` file in the repo at `conda/remixt/meta.yaml`.

#### Build executables and install

To build executables and install the ReMixT code as a python package, run the following command in the ReMixT repo:

    python setup.py install

## Setup ReMixT

### Reference genome

Download and setup of the reference genome is automated. The default is hg19. Select a directory on your system that will contain the reference data, herein referred to as `$ref_data_dir`. The `$ref_data_dir` directory will be used in many of the subsequent scripts when running ReMixT.
Download the reference data and build the required indexes:

    remixt create_ref_data $ref_data_dir

### Mappability file

Additionally, ReMixT requires a mappability file to be generated. We have provided a workflow for generating a mappability file based on `bwa` alignments. For other aligners, you may want to create your own mappability workflow; see `remixt/mappability/bwa/workflow.py` as an example.

To create a mappability file for `bwa`, run:

    remixt mappability_bwa $ref_data_dir

Note that this workflow will take a considerable amount of time, and it is recommended you run this part of ReMixT setup on a cluster or multicore machine. For parallelism options see the section [Parallelism using pypeliner](#markdown-header-parallelism-using-pypeliner).

## Running ReMixT

### Input Data

ReMixT takes multiple bam files as input. Bam files should be multiple samples from the same patient, with one bam sequenced from a normal sample from that patient. Additionally, ReMixT takes a list of predicted breakpoints detected by paired end sequencing as an additional input.

#### Breakpoint Prediction Input Format

The predicted breakpoints should be provided in a tab separated file with the following columns:

* `prediction_id`
* `chromosome_1`
* `strand_1`
* `position_1`
* `chromosome_2`
* `strand_2`
* `position_2`

The first line should be the column names, which should be identical to the above list. Each subsequent line is a breakpoint prediction. The `prediction_id` should be unique to each breakpoint prediction. The `chromosome_`, `strand_` and `position_` columns give the position and orientation of each end of the breakpoint.

The values for `strand_` should be either `+` or `-`. A value of `+` means that sequence to the right of `chromosome_`, `position_` is preserved in the tumour chromosome containing the breakpoint. Conversely, a value of `-` means that sequence to the left of `chromosome_`, `position_` is preserved in the tumour chromosome containing the breakpoint.
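The format rules above (fixed header, unique `prediction_id`, strand values of `+` or `-`) can be checked before handing a file to ReMixT. The following is a minimal sketch using Python's standard `csv` module. The function `validate_breakpoints` is hypothetical and not part of ReMixT, and two of its checks are assumptions that go beyond what the README states: it requires the columns in exactly the listed order, and it requires positions to be non-negative integers.

```python
import csv
import io

# Column names as listed in the README.
EXPECTED_COLUMNS = [
    "prediction_id",
    "chromosome_1", "strand_1", "position_1",
    "chromosome_2", "strand_2", "position_2",
]


def validate_breakpoints(handle):
    """Return a list of problems found in a tab-separated breakpoint table.

    Hypothetical helper for illustration; it is not part of ReMixT.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    # Assumption: column order must match the README list exactly.
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected header: {reader.fieldnames}"]

    problems = []
    seen_ids = set()
    for line_number, row in enumerate(reader, start=2):
        prediction_id = row["prediction_id"]
        if prediction_id in seen_ids:
            problems.append(f"line {line_number}: duplicate prediction_id {prediction_id!r}")
        seen_ids.add(prediction_id)
        for side in ("1", "2"):
            strand = row[f"strand_{side}"]
            if strand not in ("+", "-"):
                problems.append(f"line {line_number}: strand_{side} is {strand!r}, expected '+' or '-'")
            # Assumption: positions are non-negative integers.
            position = row[f"position_{side}"] or ""
            if not position.isdigit():
                problems.append(f"line {line_number}: position_{side} is not an integer")
    return problems


# A single made-up deletion-like breakpoint, used only to exercise the checks.
example = (
    "prediction_id\tchromosome_1\tstrand_1\tposition_1\tchromosome_2\tstrand_2\tposition_2\n"
    "1\t1\t+\t1000\t1\t-\t5000\n"
)
print(validate_breakpoints(io.StringIO(example)))
```

For a file on disk, pass an open handle instead, for example `validate_breakpoints(open(path, newline=""))`.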
The following table may assist in understanding the strand of a break-end. Note that an inversion event produces two breakpoints; the strand configurations for both are shown. Additionally, for inter-chromosomal events, any strand configuration is possible.

| Structural Variation     | Strand of Leftmost Break-End | Strand of Rightmost Break-End |
| ------------------------ | ---------------------------- | ----------------------------- |
| Deletion                 | +                            | -                             |
| Duplication              | -                            | +                             |
| Inversion (Breakpoint A) | +                            | +                             |
| Inversion (Breakpoint B) | -                            | -                             |

### ReMixT Command Line

Running ReMixT involves invoking a single command, `remixt run`. The result of ReMixT is an [hdf5](https://www.hdfgroup.org) file storing [pandas](http://pandas.pydata.org) tables.

Suppose we have the following list of inputs:

* Normal sample with ID `123N` and bam filename `$normal_bam`
* Tumour sample with ID `123A` and bam filename `$tumour_a_bam`
* Tumour sample with ID `123B` and bam filename `$tumour_b_bam`
* Breakpoint table in TSV format with filename `$breakpoints`

Additionally, ReMixT will generate the following outputs:

* Results as HDF5 file storing pandas tables with filename `$results_h5`
* Temporary files and logs stored in directory `$remixt_tmp_dir` (directory created if it doesn't exist)

Given the above inputs and outputs, run ReMixT as follows:

    remixt run $ref_data_dir $raw_data_dir $breakpoints \
        --normal_sample_id 123N \
        --normal_bam_file $normal_bam \
        --tumour_sample_ids 123A 123B \
        --tumour_bam_files $tumour_a_bam $tumour_b_bam \
        --results_files $results_h5 --tmpdir $remixt_tmp_dir

Note that ReMixT creates multiple jobs and many parts of ReMixT are massively parallelizable, thus it is recommended you run ReMixT on a cluster or multicore machine. For parallelism options see the section [Parallelism using pypeliner](#markdown-header-parallelism-using-pypeliner).

### Output File Formats

The main output file is an HDF5 store containing pandas dataframes.
These can be extracted in python or viewed using the ReMixT viewer. Important tables include:

* `stats`: statistics for each restart
* `solutions/solution_{idx}/cn`: segment copy number table for solution `idx`
* `solutions/solution_{idx}/brk_cn`: breakpoint copy number table for solution `idx`
* `solutions/solution_{idx}/h`: haploid depths for solution `idx`

#### Statistics

ReMixT uses optimal restarts and model selection by BIC. The statistics table contains one row per restart, sorted by BIC. The table contains the following columns:

* `idx`: the solution index, used to refer to `solutions/solution_{idx}/*` tables.
* `bic`: the bic of this solution
* `log_posterior`: log posterior of the HMM
* `log_posterior_graph`: log posterior of the genome graph model
* `num_clones`: number of clones including normal
* `num_segments`: number of segments
* `h_converged`: whether haploid depths estimation converged
* `h_em_iter`: number of iterations for convergence of h
* `graph_opt_iter`: number of iterations for convergence of genome graph copy number
* `decreased_log_posterior`: whether the genome graph optimization stopped due to a move that decreased the log posterior

#### Segment Copy Number

The segment copy number table adds additional columns to the segment counts table described above, including but not limited to:

* `major_1`
* `minor_1`
* `major_2`
* `minor_2`

The columns refer to the major and minor copy number in tumour clone 1 and 2.

#### Breakpoints Copy Number

The breakpoint copy number table contains the following columns:

* `prediction_id`
* `cn_1`
* `cn_2`

The `prediction_id` column matches the column of the same name
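
As a rough illustration of extracting the output tables in python, the sketch below reads the `stats` table and the tables of one solution using `pandas.HDFStore`. The helper names `solution_keys` and `load_solution` are hypothetical and not part of ReMixT. The sketch assumes the results file can be opened by pandas (which requires the PyTables package) and that the table names match the list under "Output File Formats".

```python
def solution_keys(idx):
    """Build the HDF5 keys for one solution from the table names in the README.

    Hypothetical helper for illustration; it is not part of ReMixT.
    """
    prefix = f"solutions/solution_{idx}"
    return {
        "cn": f"{prefix}/cn",          # segment copy number
        "brk_cn": f"{prefix}/brk_cn",  # breakpoint copy number
        "h": f"{prefix}/h",            # haploid depths
    }


def load_solution(results_h5, idx):
    """Return the stats table and the tables of solution `idx`.

    Assumes pandas and PyTables are installed; imported here so the key
    helper above stays usable without them.
    """
    import pandas as pd

    with pd.HDFStore(results_h5, mode="r") as store:
        stats = store["stats"]
        tables = {name: store[key] for name, key in solution_keys(idx).items()}
    return stats, tables


print(solution_keys(0)["cn"])
```

A typical use would be to inspect the `idx` and `bic` columns of `stats` first, then pass the chosen `idx` to `load_solution` together with the path used for `$results_h5`.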
