PyPI Official Download | clust-1.10.7.tar.gz Resource - CSDN Wenku

73 views · uploaded 2022-01-31 13:20:51 · 48 KB · GZ

27 files in total

py: 18

txt: 5

pkg-info: 2

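The title, "PyPI official download | clust-1.10.7.tar.gz", indicates that this is an archive named clust-1.10.7.tar.gz downloaded from the Python Package Index (PyPI). PyPI is the central repository where Python developers publish and share packages, and from which users obtain and install Python libraries. The description gives the full resource name as clust-1.10.7.tar.gz, which confirms that the file is a tar.gz archive: the source distribution used to build and install version 1.10.7 of the clust library.

The resource is tagged "zookeeper", "distributed", "cloud native", and "Python library". Only the last tag matches the package contents. The README.md bundled in the archive (the portion reproduced below) describes *Clust* as a tool for optimised consensus clustering of one or more heterogeneous gene expression datasets. It does not mention Apache ZooKeeper, distributed systems, or cloud-native infrastructure, and the dependencies it lists (numpy, scipy, matplotlib, scikit-learn, pandas, joblib, portalocker) are scientific Python packages. The module names in the archive, such as `preprocess_data.py`, `clustering.py`, and `uncles.py`, are consistent with the README. The first three tags therefore appear to be mislabels and should not be read as describing the library.

The archive unpacks into a single top-level directory, clust-1.10.7, which holds the source code, the package metadata (PKG-INFO and clust.egg-info), `setup.py`, `setup.cfg`, and README.md. The library can be installed the standard Python way: with `pip` from PyPI, or from the unpacked source via `setup.py`. Using it requires basic familiarity with the command line and with gene expression data.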
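Because the page tags and the package contents disagree, it can help to inspect the archive directly before installing it. The following is a minimal sketch using only Python's standard `tarfile` module; the function names `list_archive`, `read_member`, and `write_demo_archive` are hypothetical helpers introduced here, and the demo archive is a tiny stand-in, not the real clust-1.10.7.tar.gz.

```python
import io
import tarfile


def list_archive(path):
    """Return (member name, size in bytes) for every regular file in a .tar.gz archive."""
    with tarfile.open(path, mode="r:gz") as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


def read_member(path, member_name):
    """Return the decoded text of one archive member, for example README.md."""
    with tarfile.open(path, mode="r:gz") as archive:
        extracted = archive.extractfile(member_name)
        if extracted is None:  # member exists but is not a regular file
            raise FileNotFoundError(member_name)
        return extracted.read().decode("utf-8", errors="replace")


def write_demo_archive(path):
    """Write a tiny stand-in archive (illustration only, NOT the real package)."""
    files = {
        "clust-1.10.7/README.md": b"# Clust\nOptimised consensus clustering\n",
        "clust-1.10.7/setup.py": b"# placeholder\n",
    }
    with tarfile.open(path, mode="w:gz") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
```

With the real download in the working directory, `list_archive("clust-1.10.7.tar.gz")` would list its members, and `read_member("clust-1.10.7.tar.gz", "clust-1.10.7/README.md")` would return the README text, assuming the member path matches the listing shown on this page.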


clust-1.10.7.tar.gz (27 files)

clust-1.10.7

PKG-INFO 702B

clust.egg-info

PKG-INFO 702B

requires.txt 57B

SOURCES.txt 632B

entry_points.txt 47B

top_level.txt 6B

dependency_links.txt 1B

setup.cfg 38B

setup.py 4KB

README.md 20KB

clust

__main__.py 6KB

__init__.py 35B

scripts

statistical.py 2KB

mnplots.py 12KB

datastructures.py 5KB

io.py 12KB

__init__.py 0B

validation.py 297B

uncles.py 23KB

glob.py 868B

numeric.py 7KB

preprocess_data.py 28KB

clustering.py 6KB

graphics.py 6KB

output.py 23KB

postprocess_results.py 27KB

clustpipeline.py 18KB

# Clust

Optimised consensus clustering of one or more heterogeneous datasets.

Try our *clust* Beta website front-end at http://clust.baselabujamous.com, or read below for an easy-to-use *clust* command line!

### Contents

* [What does *Clust* do?](#what-does-clust-do)
* [How does *Clust* do it?](#how-does-clust-do-it)
* [Install *Clust*](#install-clust)
* [Run *Clust*](#run-clust)
* [Normalisation](#normalisation)
* [Handling replicates](#handling-replicates)
* [Data from multiple species](#data-from-multiple-species)
* [Data from multiple technologies (e.g. mixing RNA-seq and microarrays)](#data-from-multiple-technologies-eg-mixing-rna-seq-and-microarrays)
* [Handling missing genes](#handling-missing-genes)
* [Handling genes with low expression](#handling-genes-with-low-expression)
* [Are you obtaining noisy clusters?](#are-you-obtaining-noisy-clusters)
* [List of all parameters](#list-of-all-parameters)
* [Example datasets](#example-datasets)
* [Citation](#citation)

# What does Clust do?

*Clust* is a fully automated method for identification of clusters (groups) of genes that are consistently co-expressed (well-correlated) in one or more heterogeneous datasets from one or multiple species.

#### The single dataset case:

![Clusters_oneDS](Images/Clusters_1DS.png)

*Figure 1: Clust processes one gene expression dataset to identify (*K*) clusters of co-expressed genes. Clust automatically identifies the number of clusters (*K*).*

#### The multiple datasets case:

![Clusters_multiDS](Images/Clusters.png)

*Figure 2: Clust processes multiple gene expression datasets (X1, X2, ... X(*L*)) to identify clusters of genes that are co-expressed (well-correlated) in each of the input datasets. The left-hand panel shows the gene expression profiles of all genes in each one of the input datasets, while the right-hand panel shows the gene expression profiles of the genes in the clusters (C1, C2, ... C(*k*)). Note that the number of conditions or time points is different for each dataset.*

### Features!

1. No need to pre-process your data; *clust* automatically normalises the data.
2. No need to preset the number of clusters; *clust* finds this number automatically.
3. You can control the tightness of the clusters by varying a single parameter `-t`
4. It is okay if the datasets:
    * Were generated by different technologies (e.g. RNA-seq or microarrays)
    * Are from different species
    * Have different numbers of conditions or time points
    * Have multiple replicates for the same condition
    * Require different types of normalisation
    * Were generated in different years and laboratories
    * Have some missing values
    * Do not include every single gene in every single dataset
5. *Clust* generates the following output files:
    * A table of clustering statistics
    * A table listing genes included in each cluster
    * Pre-processed (normalised, summarised, and filtered) datasets' files
    * Plotted gene expression profiles of clusters (a PDF file)

# How does Clust do it?
![Clust workflow](Images/Workflow_PyPkg.png)

*Figure 3: Automatic Clust analysis pipeline*

# Install *Clust*

### Way 1

* `sudo pip install clust`

Then run it from any directory as:

* `clust ...`

### Way 2

* `pip install --user clust`

Then run it from any directory as:

* `clust ...`

### Way 3 (less recommended)

First, make sure you have all of the following Python packages installed:

* numpy
* scipy
* matplotlib
* scikit-learn
* pandas
* joblib
* portalocker

Then, download the latest release file (`clust-*.*.*.tar.gz`) from the [release tab](https://github.com/BaselAbujamous/clust/releases) and run *clust* directly, without installation, using the script `clust.py` in the top-level directory of the source code:

* `python clust.py ...`

**Hint**: you can check which packages you have installed by:

* `pip freeze`

### Upgrade clust to a newer version

If you already have *clust* and you want to upgrade it, use the command that matches the way you installed it (from the ways above):

- Way 1. `sudo pip install clust --upgrade`
- Way 2. `pip install --user clust --upgrade`
- Way 3. Download the newer release file (`clust-*.*.*.tar.gz`) and use it to run clust instead of the older one

### For Windows users

Clust has not been tested thoroughly on Windows. If you try it, your feedback will be much appreciated.

We recommend that you download and install WinPython from http://winpython.github.io/, which provides many of the Python packages that *clust* requires.

Open `WinPython Powershell Prompt.exe` from the directory in which you installed WinPython. Run:

* `pip install clust`

Then you can run *clust* by:

* `clust ...`

# Run *Clust*

For normalised homogeneous datasets, simply run:

- `clust data_path`
- `clust data_path -o output_directory [...]`

Where `data_path` is either the path to a single data file (**v1.8.5+**), or a path to a directory including one or more data files. This command runs *clust* with default parameters.
If the output directory is not provided using the `-o` option, *clust* creates a new directory for the results within the current working directory.

For raw RNA-seq TPM, FPKM, or RPKM data, consider the [Normalisation](#normalisation) section below. Other sections below address handling [replicates](#handling-replicates), handling data from [multiple species](#data-from-multiple-species), and handling [microarray data](#data-from-multiple-technologies-eg-mixing-rna-seq-and-microarrays) (only or mixed with RNA-seq data).

### Data files format

Each dataset is represented in a single TAB-delimited (TSV) file in which the first column represents gene IDs, the first row represents unique labels of the samples, and the rest of the file includes numerical values, mainly gene expression values.

![Data_simple](Images/Data_simple.png)

*Figure 4: Snapshots of the first few lines of three data files X1.txt, X2.txt, and X3.txt.*

* When the same gene ID appears in different datasets, it is considered to refer to the same gene.
* If more than one row in the same file has the same identifier, those rows are automatically summarised by summing up their values.
* **IMPORTANT**: Gene names should not include spaces, commas, or semicolons.

# Normalisation

**NEW FEATURE: AUTOMATIC NORMALISATION! (V1.7.0 and newer)**

*Clust* applies data normalisation during its pre-processing step.

* Version 1.7.0 and newer: *Clust* **automatically detects** the most suitable normalisation for each dataset unless otherwise stated by the user via the `-n` option. The normalisation codes that *clust* decides to apply are stored in the output file `/Normalisation_actual`
* Version 1.6.0 and earlier: The required normalisation techniques should be stated by the user via the `-n` option. Otherwise, no normalisation is applied.

#### The `-n` option:

Tell *clust* how to normalise your data in one of two ways:

1. `clust data_path -n code1 [code2 code3 ...] [...]` **(V1.7.0 and newer)**
    * List one or more normalisation codes (from the table below) to be applied to your one or more datasets
    * Example: `clust data_path -n 101 3 4 [...]`
2. `clust data_path -n normalisation_file [...]`
    * Provide a file listing the normalisation codes for each dataset (see Fig. 5).
    * Each line of the file includes these elements in order:
        1. The name of the dataset file (e.g. X0.txt)
        2. One or more normalisation codes. **The order** of these codes defines the order of the application of normalisation techniques.
    * Delimiters between these elements can be spaces, TABs, commas, or semicolons.

![NormalisationFile](Images/NormalisationFile.png)

*Figure 5: Normalisation file indicating the types of normalisation that should be applied to each of the datasets.*

#### Codes suggested for commonly used datasets

* RNA-seq TPM, FPKM, and RPKM data: **101 3 4**
* Log2 RNA-seq TPM, FPKM, and RPKM data: **101 4**
* One-colour microarray gene expression da
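
The normalisation file layout described above (a dataset file name followed by one or more codes, separated by spaces, TABs, commas, or semicolons) can be illustrated with a small parser. This is only a sketch of the format, not *clust*'s own reader: the function names are hypothetical, and it assumes that codes are integers and that dataset file names contain none of the delimiter characters.

```python
import re

# Delimiters named in the README: spaces, TABs, commas, or semicolons.
_DELIMITERS = re.compile(r"[ \t,;]+")


def parse_normalisation_line(line):
    """Split one line into (dataset file name, ordered list of integer codes)."""
    fields = [field for field in _DELIMITERS.split(line.strip()) if field]
    if len(fields) < 2:
        raise ValueError(f"expected a file name and at least one code: {line!r}")
    # Assumption: codes are integers such as 101, 3, 4.
    return fields[0], [int(code) for code in fields[1:]]


def parse_normalisation_text(text):
    """Map each dataset file name to its codes, keeping the order of application."""
    codes_by_dataset = {}
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            name, codes = parse_normalisation_line(line)
            codes_by_dataset[name] = codes
    return codes_by_dataset
```

For example, a file whose first line is `X0.txt 101 3 4` would map `X0.txt` to the codes 101, 3, and 4, applied in that order. The list keeps the order because the README states that the order of the codes defines the order in which the normalisation techniques are applied.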
