<img align="left" width="256" height="256" src="https://github.com/szymonmaszke/torchdata/blob/master/assets/logos/medium.png">
* Use `map`, `apply`, `reduce` or `filter` directly on `Dataset` objects
* `cache` data in RAM/disk or via your own method (partial caching supported)
* Full support for PyTorch's [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) and [`IterableDataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.IterableDataset)
* General `torchdata.maps` like `Flatten` or `Select`
* Extensible interface (your own cache methods, cache modifiers, maps etc.)
* Useful `torchdata.datasets` classes designed for general tasks (e.g. file reading)
* Support for `torchvision` datasets (e.g. `ImageFolder`, `MNIST`, `CIFAR10`) via `td.datasets.WrapDataset`
* Minimal overhead (single call to `super().__init__()`)
| Version | Docs | Tests | Coverage | Style | PyPI | Python | PyTorch | Docker | Roadmap |
|---------|------|-------|----------|-------|------|--------|---------|--------|---------|
| [![Version](https://img.shields.io/static/v1?label=&message=0.2.0&color=377EF0&style=for-the-badge)](https://github.com/szymonmaszke/torchdata/releases) | [![Documentation](https://img.shields.io/static/v1?label=&message=docs&color=EE4C2C&style=for-the-badge)](https://szymonmaszke.github.io/torchdata/) | ![Tests](https://github.com/szymonmaszke/torchdata/workflows/test/badge.svg) | ![Coverage](https://img.shields.io/codecov/c/github/szymonmaszke/torchdata?label=%20&logo=codecov&style=for-the-badge) | [![codebeat](https://img.shields.io/static/v1?label=&message=CB&color=27A8E0&style=for-the-badge)](https://codebeat.co/projects/github-com-szymonmaszke-torchdata-master) | [![PyPI](https://img.shields.io/static/v1?label=&message=PyPI&color=377EF0&style=for-the-badge)](https://pypi.org/project/torchdata/) | [![Python](https://img.shields.io/static/v1?label=&message=3.6&color=377EF0&style=for-the-badge&logo=python&logoColor=F8C63D)](https://www.python.org/) | [![PyTorch](https://img.shields.io/static/v1?label=&message=>=1.2.0&color=EE4C2C&style=for-the-badge)](https://pytorch.org/) | [![Docker](https://img.shields.io/static/v1?label=&message=docker&color=309cef&style=for-the-badge)](https://hub.docker.com/r/szymonmaszke/torchdata) | [![Roadmap](https://img.shields.io/static/v1?label=&message=roadmap&color=009688&style=for-the-badge)](https://github.com/szymonmaszke/torchdata/blob/master/ROADMAP.md) |
# :bulb: Examples
__Check documentation here:__
[https://szymonmaszke.github.io/torchdata](https://szymonmaszke.github.io/torchdata)
## General example
- Create an image dataset, convert it to tensors, cache it, and concatenate it with smoothed labels:
```python
import pathlib

import torchdata as td
import torchvision
from PIL import Image


class Images(td.Dataset):  # Different inheritance
    def __init__(self, path: str):
        super().__init__()  # This is the only change
        self.files = list(pathlib.Path(path).glob("*"))

    def __getitem__(self, index):
        return Image.open(self.files[index])

    def __len__(self):
        return len(self.files)


images = Images("./data").map(torchvision.transforms.ToTensor()).cache()
```
You can concatenate the above dataset with another (say `labels`, sketched below) and iterate over both as usual:
```python
for data, label in images | labels:
    pass  # Do whatever you want with your data
```
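The `labels` dataset is left undefined above; below is a minimal sketch of one way to build it, assuming integer class labels and a simple smoothing function applied via `map`. The `Labels` class, `smooth` function and class count are illustrative and not part of `torchdata`:

```python
import torch
import torchdata as td


class Labels(td.Dataset):  # Hypothetical labels dataset, for illustration only
    def __init__(self, labels, num_classes: int = 10):
        super().__init__()
        self.labels = labels
        self.num_classes = num_classes

    def __getitem__(self, index):
        # One-hot encode the integer class label
        return torch.nn.functional.one_hot(
            torch.tensor(self.labels[index]), self.num_classes
        ).float()

    def __len__(self):
        return len(self.labels)


def smooth(one_hot, epsilon: float = 0.1):
    # Simple label smoothing: spread epsilon of the probability mass
    # uniformly over all classes
    return one_hot * (1 - epsilon) + epsilon / one_hot.shape[-1]


# Smoothing is just another per-sample `map`
labels = Labels([0, 1, 2, 1, 0]).map(smooth)
```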
- Cache the first `1000` samples in memory and save the rest to disk in the `./cache` folder:
```python
images = (
    Images("./data").map(torchvision.transforms.ToTensor())
    # First 1000 samples cached in memory
    .cache(td.modifiers.UpToIndex(1000, td.cachers.Memory()))
    # Samples from 1000 to the end pickled to disk
    .cache(td.modifiers.FromIndex(1000, td.cachers.Pickle("./cache")))
    # You can define your own cachers and modifiers, see the documentation
)
```
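As a sketch of the extensible interface mentioned above, a custom cacher could look roughly like the following. It assumes the cacher protocol described in the documentation (`__contains__`, `__setitem__` and `__getitem__`, keyed by sample index); the `TensorDiskCacher` class itself is hypothetical, so check the docs for the exact interface before relying on it:

```python
import pathlib

import torch


class TensorDiskCacher:
    """Illustrative cacher storing every sample as a separate .pt file.

    Assumes the cacher protocol from the documentation:
    __contains__ / __setitem__ / __getitem__, keyed by sample index.
    """

    def __init__(self, path: str):
        self.path = pathlib.Path(path)
        self.path.mkdir(parents=True, exist_ok=True)

    def _file(self, index):
        return self.path / f"{index}.pt"

    def __contains__(self, index):
        # Checked to decide whether the sample was already cached
        return self._file(index).exists()

    def __setitem__(self, index, data):
        # Called the first time the sample is produced
        torch.save(data, self._file(index))

    def __getitem__(self, index):
        # Called on later accesses instead of recomputing the sample
        return torch.load(self._file(index))


# Usage would mirror the built-in cachers, e.g.:
# images = Images("./data").map(torchvision.transforms.ToTensor()).cache(TensorDiskCacher("./cache"))
```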
To see what else you can do, check the [**torchdata documentation**](https://szymonmaszke.github.io/torchdata/).
## Integration with `torchvision`
With `torchdata` you can easily split `torchvision` datasets and apply augmentation
only to the training part of the data:
```python
import torch
import torchdata as td
import torchvision

# Wrap torchvision dataset with WrapDataset
dataset = td.datasets.WrapDataset(torchvision.datasets.ImageFolder("./images"))

# Split the dataset 60/20/20; the split sizes have to sum to len(dataset)
train_size = int(0.6 * len(dataset))
validation_size = int(0.2 * len(dataset))
test_size = len(dataset) - train_size - validation_size

train_dataset, validation_dataset, test_dataset = torch.utils.data.random_split(
    dataset, (train_size, validation_size, test_size)
)
# Apply torchvision transformations ONLY to the train dataset
train_dataset.map(
    td.maps.To(
        torchvision.transforms.Compose(
            [
                torchvision.transforms.RandomResizedCrop(224),
                torchvision.transforms.RandomHorizontalFlip(),
                torchvision.transforms.ToTensor(),
                torchvision.transforms.Normalize(
                    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
                ),
            ]
        )
    ),
    # Apply the transformation only to the zeroth element of each sample (the image);
    # the first element is the label and stays untouched
    0,
)
```
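The resulting splits are still regular `torch.utils.data.Dataset` objects, so they can be consumed by a standard `DataLoader`. A minimal sketch, assuming the splits from above (batch size and worker count are arbitrary):

```python
from torch.utils.data import DataLoader

# Standard PyTorch DataLoader, nothing torchdata-specific here
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=2)

for images, labels in train_loader:
    ...  # one training step would go here
```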
Note that you can use `td.datasets.WrapDataset` with any existing `torch.utils.data.Dataset`
instance to give it additional caching and mapping powers!
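For instance, a minimal sketch of wrapping an existing `torchvision` dataset and caching it (the dataset choice and arguments are illustrative):

```python
import torchdata as td
import torchvision

# Any existing torch.utils.data.Dataset can be wrapped, e.g. MNIST from torchvision
mnist = td.datasets.WrapDataset(
    torchvision.datasets.MNIST("./mnist", train=True, download=True)
)

# After wrapping, torchdata functionality such as map() and cache() becomes available,
# e.g. convert the zeroth element (the image) to a tensor and cache the samples
mnist = mnist.map(td.maps.To(torchvision.transforms.ToTensor()), 0).cache()
```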
# :wrench: Installation
## :snake: [pip](<https://pypi.org/project/torchdata/>)
### Latest release:
```shell
pip install --user torchdata
```
### Nightly:
```shell
pip install --user torchdata-nightly
```
## :whale2: [Docker](https://hub.docker.com/r/szymonmaszke/torchdata)
__CPU standalone__ and various versions of __GPU enabled__ images are available
at [dockerhub](https://hub.docker.com/r/szymonmaszke/torchdata/tags).
For CPU quickstart, issue:
```shell
docker pull szymonmaszke/torchdata:18.04
```
Nightly builds are also available; just prefix the tag with `nightly_`. If you are going for a GPU image, make sure you have
[nvidia/docker](https://github.com/NVIDIA/nvidia-docker) installed and its runtime set.
# :question: Contributing
If you find an issue, or think some functionality might be useful to others and fits this library, please [open a new Issue](https://help.github.com/en/articles/creating-an-issue) or [create a Pull Request](https://help.github.com/en/articles/creating-a-pull-request-from-a-fork).
To get an overview of things you can do to help this project, see the [Roadmap](https://github.com/szymonmaszke/torchdata/blob/master/ROADMAP.md).