PyPI official download | toil-vg-1.4.1a1.dev928.tar.gz resource - CSDN Library


Uploaded 2022-01-30 09:10:10, 98KB, GZ

30 files in total

py: 20

txt: 5

pkg-info: 2

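The title, "PyPI official download | toil-vg-1.4.1a1.dev928.tar.gz", indicates a package published on the Python Package Index (PyPI) under the name `toil-vg`, version `1.4.1a1.dev928`, and distributed as a `.tar.gz` archive, the usual format for Python source distributions. PyPI is the main platform on which Python developers publish and obtain third-party libraries, and users can install those libraries with tools such as `pip`. The description states that the "resource comes from the PyPI official site", which identifies PyPI as the origin of the archive. The full file name, `toil-vg-1.4.1a1.dev928.tar.gz`, consists of the project name followed by the version: major version 1, minor version 4, patch version 1, the pre-release marker `a1` (first alpha), and the development-release marker `.dev928`.

The page is tagged "zookeeper", "distributed", and "cloud native", which suggests that `toil-vg` relates to distributed systems and cloud computing. ZooKeeper is a widely used distributed coordination service that handles configuration management, naming, distributed synchronization, and group services for large distributed systems; the README excerpt reproduced below does not mention it, so this tag is not confirmed by the package itself. "Distributed" matches the README, which describes running pipelines either locally or in a distributed computing environment. "Cloud native" usually refers to the practices promoted by the Cloud Native Computing Foundation (CNCF), such as microservices, containerization, and dynamic orchestration; the README describes running tools through Docker and running on Amazon EC2.

As the name suggests, `toil-vg` builds on Toil, an open-source workflow engine used, among other things, for large-scale bioinformatics workflows. The README describes toil-vg as a Toil-based framework for running common vg pipelines at scale, and its examples use Mesos as the batch system. vg is a toolkit for DNA sequence analysis using variation graphs, which represent genomic variation as a graph structure. The archive's file list has a single top-level entry, `toil-vg-1.4.1a1.dev928`, which contains the source code, a README, `setup.py` (the Python script used for building and installing), test code, and packaging metadata. Extracting the archive and reading these files shows how `toil-vg` works and how to install and use it, including its module structure, dependencies, and examples.

In summary, `toil-vg` is a Python library for bioinformatics and distributed computing that runs vg pipelines on Toil, locally or in cloud environments. To understand and use the library in depth, extract the archive and read its source code, documentation, and examples.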
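To make the structure of the version string concrete, here is a minimal sketch that splits `1.4.1a1.dev928` into its parts. The function name `split_version` is hypothetical, and the pattern is a deliberate simplification that covers only the pieces seen in this file name (release numbers, an optional `a`/`b`/`rc` pre-release, an optional `.devN` suffix); it is not a full PEP 440 parser.

```python
import re

# Simplified pattern: release numbers, optional pre-release (a/b/rc + number),
# optional ".devN" suffix. Assumption: this is NOT a complete PEP 440 grammar.
VERSION_RE = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)"
    r"(?:(?P<pre_kind>a|b|rc)(?P<pre_num>\d+))?"
    r"(?:\.dev(?P<dev>\d+))?$"
)


def split_version(version):
    """Return the release tuple, pre-release, and dev number of a version string."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in match.group("release").split("."))
    pre = None
    if match.group("pre_kind"):
        pre = (match.group("pre_kind"), int(match.group("pre_num")))
    dev = int(match.group("dev")) if match.group("dev") else None
    return {"release": release, "pre": pre, "dev": dev}


print(split_version("1.4.1a1.dev928"))
# {'release': (1, 4, 1), 'pre': ('a', 1), 'dev': 928}
```

Because the version carries pre-release and development markers, `pip` skips it by default, which is consistent with the README's use of `pip install --pre toil-vg`.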


toil-vg-1.4.1a1.dev928.tar.gz (30 files)

toil-vg-1.4.1a1.dev928/
  MANIFEST.in 19B
  PKG-INFO 322B
  src/
    toil_vg/
      vg_vcfeval.py 11KB
      test/
        test_vg.py 24KB
        __init__.py 0B
      vg_mapeval.py 88KB
      vg_toil.py 21KB
      vg_config.py 17KB
      vg_construct.py 42KB
      vg_common.py 28KB
      vg_calleval.py 21KB
      __init__.py 0B
      context.py 6KB
      vg_surject.py 9KB
      vg_call.py 33KB
      vg_index.py 36KB
      singularity.py 6KB
      vg_map.py 28KB
      iostore.py 35KB
      vg_sim.py 16KB
    toil_vg.egg-info/
      PKG-INFO 322B
      requires.txt 69B
      SOURCES.txt 704B
      entry_points.txt 50B
      top_level.txt 8B
      dependency_links.txt 1B
  setup.cfg 45B
  setup.py 2KB
  README.md 16KB
  version.py 853B
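The per-extension counts shown on this page (py: 20, txt: 5, pkg-info: 2) could be reproduced from a local copy of the archive with Python's standard `tarfile` module. The sketch below is illustrative only: `summarize_sdist` is a hypothetical helper name, and it assumes the archive has already been downloaded to the given path.

```python
import tarfile
from collections import Counter


def summarize_sdist(path):
    """Count regular files in a .tar.gz archive, grouped by lowercased extension.

    Files without a dot (for example PKG-INFO) are counted under their
    lowercased name, which mirrors the "pkg-info" label used on the page.
    """
    counts = Counter()
    with tarfile.open(path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories and links
            name = member.name.rsplit("/", 1)[-1]
            if "." in name:
                key = name.rsplit(".", 1)[-1].lower()
            else:
                key = name.lower()
            counts[key] += 1
    return counts


# Example call, assuming the file is in the current directory:
# summarize_sdist("toil-vg-1.4.1a1.dev928.tar.gz")
```

The function only reads member metadata and never extracts anything, so it can be run on an archive before deciding whether to unpack it.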

# TOIL-VG

## University of California, Santa Cruz Genomics Institute

### Please contact us on [github with any issues](https://github.com/BD2KGenomics/toil-vg/issues/new)

[vg](https://github.com/vgteam/vg) is a toolkit for DNA sequence analysis using variation graphs. Toil-vg is a [toil](https://github.com/BD2KGenomics/toil)-based framework for running common vg pipelines at scale, either locally or on a distributed computing environment:

* `toil-vg construct`: Create a vg graph from FASTA and VCF, constructing contigs in parallel.
* `toil-vg run`: Given input vg graph(s), create indexes, map reads, then produce VCF variant calls.
* `toil-vg index`: Produce a GCSA and/or XG index from input graph(s).
* `toil-vg map`: Produce a graph alignment (GAM) for each chromosome from input reads and index.
* `toil-vg call`: Produce VCF from input XG index and GAM(s).

## Installation

### Local TOIL-VG Pip Installation

Installation requires Python and Toil. We recommend installing within virtualenv as follows:

    virtualenv toilvenv
    source toilvenv/bin/activate
    pip install toil[aws,mesos]==3.13.0
    pip install --pre toil-vg

## WIKI

See the [Wiki](https://github.com/vgteam/toil-vg/wiki) in addition to below for examples.

### Docker

toil-vg can run vg, along with some other tools, via [Docker](http://www.docker.com). Docker can be installed locally (not required when running via cgcloud), as follows.

* [**Linux Docker Installation**](https://docs.docker.com/engine/installation/linux/): If running `docker version` doesn't work, try adding user to docker group with `sudo usermod -aG docker $USER`, then log out and back in.
* [**Mac Docker Installation**](https://docs.docker.com/docker-for-mac/): If running `docker version` doesn't work, try adding docker environment variables: `docker-machine start; docker-machine env; eval "$(docker-machine env default)"`
* **Running Without Docker**: If Docker is not installed or is disabled with `--container None`, toil-vg requires the following command line tools to be installed on the system: `vg`, `pigz`, `bcftools`, `tabix`. `jq`, `samtools` and `rtg vcfeval` are also necessary for certain tests.

## Configuration

A configuration file can be used as an alternative to most command line options. A default configuration file can be generated using

    toil-vg generate-config > config.yaml

Pass this file to `toil-vg` commands using the `--config` option. For non-trivial inputs, care must be taken to specify the resource requirements for the different pipeline phases (via the command line or by editing the config file), as they all default to single-core and 4G of RAM.

To generate a default configuration for running at genome scale on a cluster with 32-core worker nodes, use

    toil-vg generate-config --whole_genome > config_wg.yaml

## Testing

    make test

A faster test to see if toil-vg runs on the current machine (replace `myname` with a unique prefix):

    ./scripts/bakeoff.sh -f myname f1.tsv

Or on a Toil cluster:

    ./scripts/bakeoff.sh -fm myname f1.tsv

In both cases, verify that `f1.tsv` contains a number (should be approx. 0.9). Note that this script will create some directories (or S3 buckets) of the form `myname-bakeoff-out-store-brca1` and `myname-bakeoff-job-store-brca1`. These will have to be manually removed.

## A Note on IO conventions

The jobStore and outStore arguments to toil-vg are directories that will be created if they do not already exist. When starting a new job, toil will complain if the jobStore exists, so use `toil clean <jobStore>` first. When running on Mesos, these stores should be S3 buckets.
They are specified using the format `aws:region:bucket` (see examples below). All other input files can either be local (best to specify an absolute path) or URLs specified in the normal manner, e.g. `http://address/input_file` or `s3://bucket/input_file`. The config file must always be local. When using an S3 jobstore, it is preferable to pass input files from S3 as well, as they load much faster and less cluster time will be wasted importing data.

## Running on Amazon EC2 with Toil

### Install Toil

Please read Toil's [installation documentation](http://toil.readthedocs.io/en/latest/install/basic.html).

Install Toil locally. This can be done with virtualenv as follows:

    virtualenv ~/toilvenv
    . ~/toilvenv/bin/activate
    pip install toil[aws,mesos]

### Create a leader node

    wget https://raw.githubusercontent.com/BD2KGenomics/toil-vg/master/scripts/create-ec2-leader.sh
    chmod u+x ./create-ec2-leader.sh
    ./create-ec2-leader.sh <leader-name> <keypair-name>

Log into the leader with

    toil ssh-cluster <leader-name> --zone us-west-2a

In order to log onto a worker node instead of the leader, find its public IP from the EC2 Management Console or command line, and log in using the core username: `ssh core@public-ip`

Destroy the leader when finished with it. After logging out with `exit`:

    toil destroy-cluster <leader-name>

### Small AWS Test

Run a small test from the leader node as follows.

    wget https://raw.githubusercontent.com/BD2KGenomics/toil-vg/master/scripts/bakeoff.sh
    chmod u+x ./bakeoff.sh
    ./bakeoff.sh -fm <NAME>

### Processing a Whole Genome

From the leader node, begin by making a toil-vg configuration file suitable for processing whole genomes, then customize it as necessary.

    toil-vg generate-config --whole_genome > wg.yaml

Toil-vg can be used to construct vg graphs as, for example, [described here](https://github.com/vgteam/vg/wiki/working-with-a-whole-genome-variation-graph).
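The `aws:region:bucket` store format described under IO conventions can be illustrated with a small parser. This is a sketch only: `parse_aws_store` and `AwsStore` are hypothetical names introduced here, not part of Toil or toil-vg, and Toil performs its own locator handling internally.

```python
from typing import NamedTuple


class AwsStore(NamedTuple):
    """Region and bucket taken from an 'aws:region:bucket' string."""
    region: str
    bucket: str


def parse_aws_store(locator):
    """Split 'aws:region:bucket' into its parts, rejecting other shapes.

    An 's3://bucket' URL is rejected on purpose: the README says not to
    prefix job or output store names with s3://.
    """
    parts = locator.split(":")
    if len(parts) != 3 or parts[0] != "aws" or not all(parts[1:]):
        raise ValueError(f"expected aws:region:bucket, got {locator!r}")
    return AwsStore(region=parts[1], bucket=parts[2])


store = parse_aws_store("aws:us-west-2:JOB_STORE")
print(store.region, store.bucket)  # us-west-2 JOB_STORE
```

Checking the shape of these arguments before launching a cluster job is cheap compared with discovering a malformed store name after nodes have been provisioned.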
Files will be written to the S3 bucket `OUT_STORE`, and the S3 bucket `JOB_STORE` will be used by Toil (both buckets are created automatically if necessary; do not prefix `OUT_STORE` or `JOB_STORE` with `s3://`).

    REF=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
    VCF=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz
    MASTER_IP=`ifconfig eth0 |grep "inet addr" |awk '{print $2}' |awk -F: '{print $2}'`
    toil-vg construct aws:us-west-2:JOB_STORE aws:us-west-2:OUT_STORE --fasta $REF --vcf $VCF --config wg.yaml --out_name hs37d5 --batchSystem=mesos --mesosMaster=${MASTER_IP}:5050 --nodeTypes r3.8xlarge:0.85 --maxNodes 8 --provisioner aws --realTimeLogging --logInfo --defaultPreemptable --logFile construct.log --retryCount 3 --regions $(for i in $(seq 1 22; echo X; echo Y); do echo $i; done)

Indexes can be created above using the `--xg_index` and `--gcsa_index` options (and switching to i2.8xlarge nodes), or by running `toil-vg index` as follows:

    MASTER_IP=`ifconfig eth0 |grep "inet addr" |awk '{print $2}' |awk -F: '{print $2}'`
    toil-vg index aws:us-west-2:JOB_STORE aws:us-west-2:OUT_STORE --batchSystem=mesos --mesosMaster=${MASTER_IP}:5050 --graphs $(for i in $(seq 22; echo X; echo Y); do echo s3://OUT_STORE/hs37d5-${i}; done) --chroms $(for i in $(seq 22; echo X; echo Y); do echo $i; done) --realTimeLogging --logInfo --config wg.yaml --index_name my_index --defaultPreemptable --nodeTypes i2.8xlarge:1.00 --maxNodes 5 --provisioner aws 2> index.log

Note that the spot request node type (i2.8xlarge) and amount ($1.00) can be adjusted in the above command. Keep in mind that indexing is very memory and disk intensive. If successful, this will produce four files in `s3://OUT_STORE/`:

    my_index.xg
    my_index.gcsa
    my_index.gcsa.lcp
    my_index_id_ranges.tsv

We can now align reads and produce a VCF in a single call to `toil-vg run`.
(See `toil-vg map` and `toil-vg call` to do these steps separately.) The invocation is similar to the above, except we use r3.8xlarge instances as we do not need as much disk and memory.

    toil-vg run aws:us-west-2:JOB_STORE READ_LOCATION/reads.fastq.gz SAMPLE_NAME aws:us-west-2:OUT_STORE --batchSystem=mesos --mesosMaster=${MASTER_IP}:5050 --gcsa_index s3://OUT_STORE/my_index.gcsa --xg_index s3://
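The shell loops in the `construct` and `index` commands above expand to one value per chromosome (1 to 22, then X and Y), and to one graph URL per chromosome of the form `s3://OUT_STORE/hs37d5-<chrom>`. As a rough illustration of what those loops produce, the same lists can be built in Python; the function names are hypothetical, and `OUT_STORE` is the README's placeholder bucket name.

```python
def chromosome_names():
    """Return ['1', ..., '22', 'X', 'Y'], like `seq 22; echo X; echo Y`."""
    return [str(number) for number in range(1, 23)] + ["X", "Y"]


def graph_urls(out_store, out_name):
    """Build one s3://<out_store>/<out_name>-<chrom> URL per chromosome."""
    return [f"s3://{out_store}/{out_name}-{chrom}" for chrom in chromosome_names()]


chroms = chromosome_names()
graphs = graph_urls("OUT_STORE", "hs37d5")
print(len(chroms), chroms[0], chroms[-1])  # 24 1 Y
print(graphs[0])                           # s3://OUT_STORE/hs37d5-1
```

Joining either list with spaces gives the argument string that the `$(for ...)` substitutions pass to `--regions`, `--chroms`, and `--graphs`.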
