PyPI official site download | taxoniq-0.1.3.tar.gz resource - CSDN文库


104 views · uploaded 2022-01-30 04:27:32 · 273KB · GZ

41 files in total

py: 9

cc: 9

cpp: 9

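Python is a widely used programming language, particularly in data processing, scientific computing, and web development. PyPI (Python Package Index) is the Python community's official package repository, where developers publish libraries for others to download and use. The title "PyPI official site download | taxoniq-0.1.3.tar.gz" indicates a Python library named taxoniq, version 0.1.3, downloaded from PyPI as a tar.gz archive.

taxoniq-0.1.3.tar.gz is a compressed source archive. Such an archive typically contains the source code, metadata, a README file, and the other information needed to build and install the package. It is built through the `setup.py` script, and users can run the installation with the pip command-line tool, for example `pip install taxoniq-0.1.3.tar.gz`.

The page's tags mention "zookeeper", "distributed", and "cloud native". Zookeeper is an Apache project that provides distributed configuration, naming, and synchronization services, and "cloud native" refers to designing applications so that they run efficiently in cloud environments built on containers and microservices. These tags do not match the package contents: the bundled README, reproduced further down this page, describes Taxoniq as a Python and command-line interface to the NCBI Taxonomy database, and nothing in the README text or file listing shown here refers to Zookeeper or container orchestration.

The archive's file listing appears further down this page. In general, a Python project's source archive contains components such as:

1. `setup.py`: defines the project's metadata and installation process.
2. `MANIFEST.in`: specifies extra files to include in the source distribution.
3. `LICENSE`: the project's license file, which states the legal terms for using, modifying, and distributing the code.
4. `README`: a project overview and usage instructions.
5. `requirements.txt`: lists the Python packages the project depends on.
6. `src` or `taxoniq` directory: contains the Python source code.
7. `tests` directory: holds the test cases that check code quality.
8. `docs` directory: project documentation, possibly including a developer guide and a user manual.

Of these, the listing on this page shows `setup.py`, `MANIFEST.in`, `README.md`, a `taxoniq` package directory, and a `test` directory.

To use the library, a developer first unpacks the archive and then installs or tests it through the `setup.py` script. The documentation and example code show how to integrate taxoniq into a project and use its features, such as offline lookups of taxon information. In ongoing development, keep track of the library's updates and maintenance to pick up new features and security fixes.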
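Before installing a downloaded source archive, it can be useful to inspect its contents without unpacking it. The following is a minimal sketch using only the Python standard library. The helper names `count_extensions` and `build_demo_archive` and the `demo-0.1/...` file names are hypothetical and exist only for illustration; the demo builds a tiny archive in memory so that the example runs without the real file.

```python
import io
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def count_extensions(fileobj):
    """Count regular files in a gzip-compressed tar archive by file extension."""
    counts = Counter()
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            if member.isfile():
                suffix = PurePosixPath(member.name).suffix or "(none)"
                counts[suffix] += 1
    return counts


def build_demo_archive():
    """Build a tiny in-memory .tar.gz with hypothetical file names for the demo."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name in ("demo-0.1/setup.py", "demo-0.1/demo/__init__.py", "demo-0.1/README.md"):
            data = b"# placeholder\n"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


demo_counts = count_extensions(build_demo_archive())
print(dict(demo_counts))
```

For a real download, the same function can be applied to a file opened in binary mode, for example `with open("taxoniq-0.1.3.tar.gz", "rb") as fh: count_extensions(fh)`.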


taxoniq-0.1.3.tar.gz (41 files)

taxoniq-0.1.3

setup.cfg 106B

README.md 9KB

marisa-trie

lib

marisa

agent.cc 1KB

keyset.cc 6KB

trie.cc 6KB

grimoire

trie

louds-trie.cc 26KB

tail.cc 6KB

vector

bit-vector.cc 25KB

writer.cc 3KB

reader.cc 3KB

mapper.cc 4KB

src

keyset.cpp 97KB

trie.cpp 98KB

query.cpp 97KB

std_iostream.cpp 97KB

key.cpp 97KB

agent.cpp 97KB

marisa_trie.cpp 909KB

base.cpp 97KB

iostream.cpp 98KB

test

test.py 4KB

sample_wikipedia_extract.json 81B

__pycache__

__init__.cpython-36.pyc 128B

test.cpython-36.pyc 1KB

__init__.py 0B

PKG-INFO 12KB

taxoniq.egg-info

dependency_links.txt 1B

PKG-INFO 12KB

SOURCES.txt 1KB

top_level.txt 8B

entry_points.txt 45B

requires.txt 55B

MANIFEST.in 39B

taxoniq

vendored

__init__.py 0B

tax_dump_readers.py 7KB

util.py 844B

build.py 20KB

cli.py 2KB

__init__.py 11KB

setup.py 3KB

Changes.rst 3KB

Taxoniq: Taxon Information Query - fast, offline querying of NCBI Taxonomy and related data
===========================================================================================

Taxoniq is a Python and command-line interface to the [NCBI Taxonomy database](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7408187/) and selected data sources that cross-reference it.

Taxoniq's features include:

- Pre-computed indexes updated monthly from NCBI, [WoL](https://biocore.github.io/wol/) and cross-referenced databases
- Offline operation: all indexes are bundled with the package; no network calls are made when querying taxon information (separately, Taxoniq can fetch the nucleotide or protein sequences over the network given a taxon or accession - see **Retrieving sequences** below)
- A CLI capable of JSON I/O, batch processing and streaming of inputs for ease of use and pipelining in shell scripts
- A stable, well-documented, type-hinted Python API (Python 3.6 and higher is supported)
- Comprehensive testing and continuous integration
- An intuitive interface with useful defaults
- Compactness, readability, and extensibility

The Taxoniq package bundles an indexed, compressed copy of the [NCBI taxonomy database files](https://ncbiinsights.ncbi.nlm.nih.gov/2018/02/22/new-taxonomy-files-available-with-lineage-type-and-host-information/), the [NCBI RefSeq](https://www.ncbi.nlm.nih.gov/refseq/) nucleotide and protein accessions associated with each taxon, the [WoL](https://biocore.github.io/wol/) kingdom-wide phylogenomic distance database, and relevant information from other databases. Accessions which appear in the NCBI RefSeq BLAST databases are indexed so that given a taxon ID, accession ID, or taxon name, you can quickly retrieve the taxon's rank, lineage, description, citations, representative RefSeq IDs, LCA information, evolutionary distance, sequence (with a network call), and more, as described in the **Cookbook** section below.

## Installation

    pip3 install taxoniq

## Synopsis

```python
>>> import taxoniq
>>> t = taxoniq.Taxon(9606)
>>> t.scientific_name
'Homo sapiens'
>>> t.common_name
'human'
>>> t.ranked_lineage
[taxoniq.Taxon(9606), taxoniq.Taxon(9605), taxoniq.Taxon(9604), taxoniq.Taxon(9443), taxoniq.Taxon(40674), taxoniq.Taxon(7711), taxoniq.Taxon(33208), taxoniq.Taxon(2759)]
>>> len(t.lineage)
32
>>> [(t.rank.name, t.scientific_name) for t in t.ranked_lineage]
[('species', 'Homo sapiens'), ('genus', 'Homo'), ('family', 'Hominidae'), ('order', 'Primates'), ('class', 'Mammalia'), ('phylum', 'Chordata'), ('kingdom', 'Metazoa'), ('superkingdom', 'Eukaryota')]
>>> [(c.rank.name, c.common_name) for c in t.child_nodes]
[('subspecies', 'Neandertal'), ('subspecies', 'Denisova hominin')]
>>> t.refseq_representative_genome_accessions[:10]
[taxoniq.Accession('NC_000001.11'), taxoniq.Accession('NC_000002.12'), taxoniq.Accession('NC_000003.12'), taxoniq.Accession('NC_000004.12'), taxoniq.Accession('NC_000005.10'), taxoniq.Accession('NC_000006.12'), taxoniq.Accession('NC_000007.14'), taxoniq.Accession('NC_000008.11'), taxoniq.Accession('NC_000009.12'), taxoniq.Accession('NC_000010.11')]
>>> t.url
'https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=9606'

# Wikidata provides structured links to many databases about taxa represented on Wikipedia
>>> t.wikidata_url
'https://www.wikidata.org/wiki/Q15978631'
```
```
>>> t2 = taxoniq.Taxon(scientific_name="Bacillus anthracis")
>>> t2.description
' Bacillus anthracis is the agent of anthrax—a common disease of livestock and, occasionally, of humans—and the only obligate pathogen within the genus Bacillus. This disease can be classified as a zoonosis, causing infected animals to transmit the disease to humans. B. anthracis is a Gram-positive, endospore-forming, rod-shaped bacterium, with a width of 1.0–1.2 µm and a length of 3–5 µm. It can be grown in an ordinary nutrient medium under aerobic or anaerobic conditions. It is one of few bacteria known to synthesize a protein capsule (poly-D-gamma-glutamic acid). Like Bordetella pertussis, it forms a calmodulin-dependent adenylate cyclase exotoxin known as anthrax edema factor, along with anthrax lethal factor. It bears close genotypic and phenotypic resemblance to Bacillus cereus and Bacillus thuringiensis. All three species share cellular dimensions and morphology...'
```
```python
>>> t3 = taxoniq.Taxon(accession_id="NC_000913.3")
>>> t3.scientific_name
'Escherichia coli str. K-12 substr. MG1655"'
>>> t3.parent.parent.common_name
'E. coli'
>>> t3.refseq_representative_genome_accessions[0].length
4641652

# The get_from_s3() method is the only command that will trigger a network call.
>>> seq = t3.refseq_representative_genome_accessions[0].get_from_s3().read()
>>> len(seq)
4641652
>>> seq[:64]
b'AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGAT'
```

## Retrieving sequences

Mirrors of the NCBI BLAST databases are maintained on [AWS S3](https://registry.opendata.aws/ncbi-blast-databases/) (`s3://ncbi-blast-databases`) and Google Storage (`gs://blast-db`). This is a key resource, since S3 and GS have superior bandwidth and throughput compared to the NCBI FTP server, so range requests can be used to retrieve individual sequences from the database files without downloading and keeping a copy of the whole database.

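The range requests mentioned above rely on the standard HTTP `Range` header, which asks a server for a slice of a file rather than the whole file. The following is a minimal sketch of building such a header, assuming the byte offset and length of the wanted sequence are already known. How Taxoniq stores or computes those values is not described here, and `byte_range_header` and the numbers used are hypothetical.

```python
def byte_range_header(offset, length):
    """Build an HTTP Range header selecting `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends, hence the "- 1".
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


# Hypothetical values: request 64 bytes starting at byte 1024.
headers = byte_range_header(offset=1024, length=64)
print(headers)
```

A dictionary like this can be passed as the `headers` argument of `urllib.request.Request`; a server that honors the range replies with status 206 (Partial Content) and only the requested bytes.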
The Taxoniq PyPI distribution (the package you install using `pip3 install taxoniq`) indexes accessions for the following NCBI BLAST databases:

- Refseq viruses representative genomes (`ref_viruses_rep_genomes`) (nucleotide)
- Refseq prokaryote representative genomes (contains refseq assembly) (`ref_prok_rep_genomes`) (nucleotide)
- RefSeq Eukaryotic Representative Genome Database (`ref_euk_rep_genomes`) (nucleotide)
- Betacoronavirus (nucleotide)

Given an accession ID, Taxoniq can issue a single HTTP request and return a file-like object streaming the nucleotide sequence for this accession from the S3 or GS mirror as follows:

```python
import taxoniq

with taxoniq.Accession("NC_000913.3").get_from_s3() as fh:
    fh.read()
```

To retrieve many sequences quickly, you may want to use a threadpool to open multiple network connections at once:

```python
from concurrent.futures import ThreadPoolExecutor

import taxoniq

def fetch_seq(accession):
    # Each call opens its own network connection and reads the full sequence.
    seq = accession.get_from_s3().read()
    return (accession, seq)

taxon = taxoniq.Taxon(scientific_name="Apis mellifera")
for accession, seq in ThreadPoolExecutor().map(fetch_seq, taxon.refseq_representative_genome_accessions):
    print(accession, len(seq))
```

## Command-line interface

`pip3 install taxoniq` installs a command-line utility, `taxoniq`, which can be used to perform many of the same functions provided by the Python API:

```
>taxoniq child_nodes --taxon-id 2 --output-format '{tax_id}: {scientific_name}'
[
    "1224: Proteobacteria",
    "2323: Bacteria incertae sedis",
    "32066: Fusobacteria",
    "40117: Nitrospirae",
    "48479: environmental samples",
    "49928: unclassified Bacteria",
    "57723: Acidobacteria",
    "68297: Dictyoglomi",
    "74152: Elusimicrobia",
    "200783: Aquificae",
    "200918: Thermotogae",
    "200930: Deferribacteres",
    "200938: Chrysiogenetes",
    "200940: Thermodesulfobacteria",
    "203691: Spirochaetes",
    "508458: Synergistetes",
    "1783257: PVC group",
    "1783270: FCB group",
    "1783272: Terrabacteria group",
    "1802340: Nitrospinae/Tectomicrobia group",
    "1930617: Calditrichaeota",
    "2138240: Coprothermobacterota",
    "2498710: Caldiserica/Cryosericota group",
    "2698788: Candidatus Krumholzibacteriota",
    "2716431: Coleospermum",
    "2780997: Vogosella"
]
```

See `taxoniq --help` for full details.

## Using the nr/nt databases

Because of their size, taxoniq wheels with indexes of the NT (GenBank Non-redundant nucleotide) BLAST database are distributed on GitHub

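The `taxoniq child_nodes` example in the command-line interface section prints a JSON array of strings shaped by `--output-format '{tax_id}: {scientific_name}'`. The following is a minimal sketch of turning that output into a dictionary using only the standard library. The helper name `parse_child_nodes` is hypothetical, and the input is an abbreviated copy of the output shown in that section.

```python
import json

# Abbreviated copy of the JSON array printed by the child_nodes example.
cli_output = '["1224: Proteobacteria", "32066: Fusobacteria", "40117: Nitrospirae"]'


def parse_child_nodes(text):
    """Map taxon ID (int) to scientific name from '{tax_id}: {scientific_name}' entries."""
    names_by_id = {}
    for entry in json.loads(text):
        # Split only on the first ": " so names containing a colon stay intact.
        tax_id, name = entry.split(": ", 1)
        names_by_id[int(tax_id)] = name
    return names_by_id


children = parse_child_nodes(cli_output)
print(children[1224])
```

In a script, the text could instead be captured from the installed CLI with `subprocess.run(..., capture_output=True, text=True).stdout`, assuming the `taxoniq` command is on the PATH.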