# TensorFlow Datasets
TensorFlow Datasets provides many public datasets as `tf.data.Datasets`.
[![Kokoro](https://storage.googleapis.com/tfds-kokoro-public/kokoro-build.svg)](https://storage.googleapis.com/tfds-kokoro-public/kokoro-build.html)
[![PyPI version](https://badge.fury.io/py/tensorflow-datasets.svg)](https://badge.fury.io/py/tensorflow-datasets)
[![Documentation](https://img.shields.io/badge/api-reference-blue.svg)](https://www.tensorflow.org/datasets/api_docs/python/tfds)
* [List of datasets](https://www.tensorflow.org/datasets/catalog/overview)
* Getting started:
  * [Introduction](https://www.tensorflow.org/datasets/overview) ([Try it in Colab](https://colab.research.google.com/github/tensorflow/datasets/blob/master/docs/overview.ipynb))
  * [End-to-end example with Keras](https://colab.research.google.com/github/tensorflow/datasets/blob/master/docs/keras_example.ipynb)
* Features & performance:
  * [Using splits and slicing API](https://www.tensorflow.org/datasets/splits)
  * [Performance advice](https://www.tensorflow.org/datasets/performances)
  * [Datasets versioning](https://www.tensorflow.org/datasets/datasets_versioning)
  * [Feature decoding](https://www.tensorflow.org/datasets/decode)
  * [Store your dataset on GCS](https://www.tensorflow.org/datasets/gcs)
* Add your dataset:
  * [Add a dataset](https://www.tensorflow.org/datasets/add_dataset)
  * [Add a huge dataset with Beam (>>100GiB)](https://www.tensorflow.org/datasets/beam_datasets)
* [API docs](https://www.tensorflow.org/datasets/api_docs/python/tfds)
Note: [`tf.data`](https://www.tensorflow.org/guide/data) is a built-in
TensorFlow library for building efficient data pipelines.
[TFDS](https://www.tensorflow.org/datasets) (this library) uses `tf.data` to
build an input pipeline when you load a dataset.
**Table of Contents**
* [Installation](#installation)
* [Usage](#usage)
* [`DatasetBuilder`](#datasetbuilder)
* [NumPy usage with `tfds.as_numpy`](#numpy-usage-with-tfdsas_numpy)
* [Citation](#citation)
* [Want a certain dataset?](#want-a-certain-dataset)
* [*Disclaimers*](#disclaimers)
### Installation
```sh
pip install tensorflow-datasets
# Requires TF 1.15+ to be installed.
# Some datasets require additional libraries; see setup.py extras_require
pip install tensorflow
# or:
pip install tensorflow-gpu
```
Join [our Google group](https://groups.google.com/forum/#!forum/tensorflow-datasets-public-announce)
to receive updates on the project.
### Usage
```python
import tensorflow_datasets as tfds
import tensorflow as tf
# Here we assume Eager mode is enabled (TF2), but tfds also works in Graph mode.
# Construct a tf.data.Dataset
ds_train = tfds.load('mnist', split='train', shuffle_files=True)
# Build your input pipeline
ds_train = ds_train.shuffle(1000).batch(128).prefetch(10)
for features in ds_train.take(1):
  image, label = features['image'], features['label']
```
Try it interactively in a
[Colab notebook](https://colab.research.google.com/github/tensorflow/datasets/blob/master/docs/overview.ipynb).
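To build intuition for what `shuffle(1000)` and `batch(128)` do in the pipeline above, here is a minimal pure-Python sketch of buffered shuffling followed by batching. It is an illustration only, not how `tf.data` is implemented: `buffered_shuffle` and `batch` are hypothetical helper names, and the small integers stand in for dataset records.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(
    items: Iterable[T], buffer_size: int, seed: int = 0
) -> Iterator[T]:
    """Yield items in pseudo-random order using a fixed-size buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # Fill the buffer before emitting anything.
            continue
        # Emit a random buffered element and store the new item in its slot.
        index = rng.randrange(buffer_size)
        yield buffer[index]
        buffer[index] = item
    # Input exhausted: drain the remaining elements in random order.
    rng.shuffle(buffer)
    yield from buffer


def batch(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group consecutive items into lists; the last list may be shorter."""
    current: List[T] = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


examples = range(10)  # Stand-in for dataset records.
batches = list(batch(buffered_shuffle(examples, buffer_size=4), batch_size=3))
```

Because only `buffer_size` elements are candidates at any moment, a buffer smaller than the dataset gives an approximate shuffle. The `tf.data` documentation notes that a buffer at least as large as the dataset is needed for a perfect shuffle.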
### `DatasetBuilder`
All datasets are implemented as subclasses of `tfds.core.DatasetBuilder`. TFDS
has two entry points:
* `tfds.builder`: Returns the `tfds.core.DatasetBuilder` instance, giving
control over `builder.download_and_prepare()` and
`builder.as_dataset()`.
* `tfds.load`: Convenience wrapper which hides the `download_and_prepare` and
`as_dataset` calls, and directly returns the `tf.data.Dataset`.
```python
import tensorflow_datasets as tfds
# The following is the equivalent of the `load` call above.
# You can fetch the DatasetBuilder class by string
mnist_builder = tfds.builder('mnist')
# Download the dataset
mnist_builder.download_and_prepare()
# Construct a tf.data.Dataset
ds = mnist_builder.as_dataset(split='train')
# Get the `DatasetInfo` object, which contains useful information about the
# dataset and its features
info = mnist_builder.info
print(info)
```
This will print the dataset info content:
```python
tfds.core.DatasetInfo(
    name='mnist',
    version=3.0.1,
    description='The MNIST database of handwritten digits.',
    homepage='http://yann.lecun.com/exdb/mnist/',
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    }),
    total_num_examples=70000,
    splits={
        'test': 10000,
        'train': 60000,
    },
    supervised_keys=('image', 'label'),
    citation="""@article{lecun2010mnist,
      title={MNIST handwritten digit database},
      author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
      journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
      volume={2},
      year={2010}
    }""",
    redistribution_info=,
)
```
You can also get details about the classes (number of classes and their names).
```python
info = tfds.builder('cats_vs_dogs').info
info.features['label'].num_classes # 2
info.features['label'].names # ['cat', 'dog']
info.features['label'].int2str(1) # "dog"
info.features['label'].str2int('cat') # 0
```
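Conceptually, a class-label feature is a two-way mapping between integer ids and class names. The following toy sketch mirrors the calls shown above in plain Python. `SimpleClassLabel` is a hypothetical stand-in written for illustration, not the TFDS implementation.

```python
from typing import List


class SimpleClassLabel:
    """Toy stand-in for a class-label feature (not the TFDS class)."""

    def __init__(self, names: List[str]) -> None:
        self.names = list(names)
        # Reverse lookup table: class name -> integer id.
        self._name_to_index = {name: i for i, name in enumerate(self.names)}

    @property
    def num_classes(self) -> int:
        return len(self.names)

    def int2str(self, index: int) -> str:
        """Convert an integer id to its class name."""
        return self.names[index]

    def str2int(self, name: str) -> int:
        """Convert a class name to its integer id."""
        return self._name_to_index[name]


label = SimpleClassLabel(["cat", "dog"])
# Turn a batch of integer predictions back into readable names.
decoded = [label.int2str(i) for i in [1, 0, 1]]
```

With the real library, the same pattern is handy for turning integer labels or predictions into readable class names when inspecting results.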
### NumPy Usage with `tfds.as_numpy`
As a convenience for users that want simple NumPy arrays in their programs, you
can use `tfds.as_numpy` to return a generator that yields NumPy array
records out of a `tf.data.Dataset`. This allows you to build high-performance
input pipelines with `tf.data` but use whatever you'd like for your model
components.
```python
train_ds = tfds.load("mnist", split="train")
train_ds = train_ds.shuffle(1024).batch(128).repeat(5).prefetch(10)
for example in tfds.as_numpy(train_ds):
  numpy_images, numpy_labels = example["image"], example["label"]
```
You can also use `tfds.as_numpy` in conjunction with `batch_size=-1` to
get the full dataset in NumPy arrays from the returned `tf.Tensor` object:
```python
train_ds = tfds.load("mnist", split=tfds.Split.TRAIN, batch_size=-1)
numpy_ds = tfds.as_numpy(train_ds)
numpy_images, numpy_labels = numpy_ds["image"], numpy_ds["label"]
```
Note that the library still requires `tensorflow` as an internal dependency.
### Citation
Please include the following citation when using `tensorflow-datasets` for a
paper, in addition to any citations specific to the datasets you used.
```
@misc{TFDS,
title = {{TensorFlow Datasets}, A collection of ready-to-use datasets},
howpublished = {\url{https://www.tensorflow.org/datasets}},
}
```
## Want a certain dataset?
Adding a dataset is straightforward: follow
[our guide](https://github.com/tensorflow/datasets/tree/master/docs/add_dataset.md).
To request a dataset, open a
[Dataset request GitHub issue](https://github.com/tensorflow/datasets/issues/new?assignees=&labels=dataset+request&template=dataset-request.md&title=%5Bdata+request%5D+%3Cdataset+name%3E).
You can also vote on the current
[set of requests](https://github.com/tensorflow/datasets/labels/dataset%20request)
by adding a thumbs-up reaction to the issue.
#### *Disclaimers*
*This is a utility library that downloads and prepares public datasets. We do*
*not host or distribute these datasets, vouch for their quality or fairness, or*
*claim that you have license to use the dataset. It is your responsibility to*
*determine whether you have permission to use the dataset under the dataset's*
*license.*
*If you're a dataset owner and wish to update any part of it (description,*
*citation, etc.), or do not want your dataset to be included in this*
*library, please get in touch through a GitHub issue. Thanks for your*
*contribution to the ML community!*
*If you're interested in learning more about responsible AI practices, including*
*fairness, please see Google AI's [Responsible AI Practices](https://ai.google/education/responsible-ai-practices).*
*`tensorflow/datasets` is Apache 2.0 licensed. See the
[`LICENSE`](LICENSE) file.*