tensorflow-datasets-2.1.0.tar.gz resource - CSDN Library

Uploaded 2024-03-21 12:36:00 · 2.58MB · GZ

642 files in total

py: 456

txt: 179

md: 4


tensorflow-datasets-2.1.0.tar.gz (642 files)

setup.cfg 38B

dataset.mako.md 7KB

README.md 7KB

schema_org.mako.md 2KB

catalog_overview.mako.md 1KB

PKG-INFO 2KB

dataset_builder.py 49KB

cbis_ddsm.py 38KB

wmt.py 37KB

super_glue.py 28KB

glue.py 27KB

dataset_info.py 25KB

dataset_builder_test.py 25KB

corruptions.py 23KB

splits.py 22KB

splits_test.py 21KB

tfrecords_reader.py 19KB

dataset_info_generated_pb2.py 17KB

subword_text_encoder.py 17KB

tfrecords_reader_test.py 17KB

download_manager.py 17KB

feature.py 17KB

coco.py 16KB

text_encoder.py 16KB

sequence_feature_test.py 16KB

tfrecords_writer.py 16KB

dataset_builder_testing.py 15KB

test_utils.py 15KB

open_images.py 15KB

file_format_adapter_test.py 15KB

c4_utils.py 15KB

registered.py 14KB

features_test.py 13KB

mnist.py 13KB

nsynth.py 13KB

qa4mre.py 13KB

c4.py 12KB

sun.py 12KB

dataset_info_test.py 12KB

abstract_reasoning.py 12KB

trivia_qa.py 12KB

download_manager_test.py 12KB

diabetic_retinopathy_detection.py 12KB

example_serializer.py 12KB

wikipedia.py 12KB

text_encoder_test.py 11KB

cars196.py 11KB

cityscapes.py 11KB

py_utils.py 11KB

caltech_birds.py 11KB

c4_utils_test.py 11KB

kitti.py 10KB

cnn_dailymail.py 10KB

bigearthnet.py 10KB

dataset_utils.py 9KB

downloader.py 9KB

resource.py 9KB

big_patent.py 9KB

imagenet2012_corrupted.py 9KB

curated_breast_imaging_ddsm.py 9KB

groove.py 9KB

lost_and_found.py 8KB

sequence_feature.py 8KB

tfrecords_writer_test.py 8KB

voc.py 8KB

moving_sequence.py 8KB

math_dataset.py 8KB

shuffle.py 8KB

file_format_adapter.py 8KB

imagenet.py 8KB

download_and_prepare.py 8KB

smallnorb.py 8KB

downloader_test.py 8KB

starcraft.py 8KB

visual_domain_decathlon.py 8KB

celeba.py 8KB

registered_test.py 7KB

dataset_utils_test.py 7KB

ucf101.py 7KB

cifar.py 7KB

amazon_us_reviews.py 7KB

yelp_polarity.py 7KB

features_dict.py 7KB

librispeech.py 7KB

natural_questions.py 7KB

wikihow.py 7KB

duke_ultrasound.py 7KB

stanford_dogs.py 7KB

speech_commands.py 7KB

subword_text_encoder_test.py 7KB

create_new_dataset.py 7KB

wider_face.py 7KB

image_feature.py 7KB

dataset_builder_beam_test.py 7KB

mocking.py 7KB

cifar10_corrupted.py 7KB

ted_hrlr.py 7KB

example_serializer_test.py 7KB

flores.py 6KB

642 entries in total

# TensorFlow Datasets

TensorFlow Datasets provides many public datasets as `tf.data.Datasets`.

[![Kokoro](https://storage.googleapis.com/tfds-kokoro-public/kokoro-build.svg)](https://storage.googleapis.com/tfds-kokoro-public/kokoro-build.html)
[![PyPI version](https://badge.fury.io/py/tensorflow-datasets.svg)](https://badge.fury.io/py/tensorflow-datasets)

* [List of datasets](https://www.tensorflow.org/datasets/catalog/overview)
* [Try it in Colab](https://colab.research.google.com/github/tensorflow/datasets/blob/master/docs/overview.ipynb)
* [API docs](https://www.tensorflow.org/datasets/api_docs/python/tfds)
* Guides
    * [Overview](https://www.tensorflow.org/datasets/overview)
    * [Datasets versioning](https://www.tensorflow.org/datasets/datasets_versioning)
    * [Using splits and slicing API](https://www.tensorflow.org/datasets/splits)
    * [Add a dataset](https://www.tensorflow.org/datasets/add_dataset)
    * [Add a huge dataset (>>100GiB)](https://www.tensorflow.org/datasets/beam_datasets)

**Table of Contents**

* [Installation](#installation)
* [Usage](#usage)
* [`DatasetBuilder`](#datasetbuilder)
* [NumPy usage](#numpy-usage-with-tfdsas-numpy)
* [Citation](#citation)
* [Want a certain dataset?](#want-a-certain-dataset)
* [Disclaimers](#disclaimers)

### Installation

```sh
pip install tensorflow-datasets

# Requires TF 1.15+ to be installed.
# Some datasets require additional libraries; see setup.py extras_require
pip install tensorflow
# or:
pip install tensorflow-gpu
```

Join [our Google group](https://groups.google.com/forum/#!forum/tensorflow-datasets-public-announce) to receive updates on the project.

### Usage

```python
import tensorflow_datasets as tfds
import tensorflow as tf

# Here we assume Eager mode is enabled (TF2), but tfds also works in Graph mode.

# See available datasets
print(tfds.list_builders())

# Construct a tf.data.Dataset
ds_train = tfds.load(name="mnist", split="train", shuffle_files=True)

# Build your input pipeline
ds_train = ds_train.shuffle(1000).batch(128).prefetch(10)
for features in ds_train.take(1):
  image, label = features["image"], features["label"]
```

Try it interactively in a [Colab notebook](https://colab.research.google.com/github/tensorflow/datasets/blob/master/docs/overview.ipynb).

### `DatasetBuilder`

All datasets are implemented as subclasses of `tfds.core.DatasetBuilder`. TFDS has two entry points:

* `tfds.builder`: Returns the `tfds.core.DatasetBuilder` instance, giving control over `builder.download_and_prepare()` and `builder.as_dataset()`.
* `tfds.load`: Convenience wrapper which hides the `download_and_prepare` and `as_dataset` calls, and directly returns the `tf.data.Dataset`.

```python
import tensorflow_datasets as tfds

# The following is the equivalent of the `load` call above.

# You can fetch the DatasetBuilder class by string
mnist_builder = tfds.builder('mnist')

# Download the dataset
mnist_builder.download_and_prepare()

# Construct a tf.data.Dataset
ds = mnist_builder.as_dataset(split='train')

# Get the `DatasetInfo` object, which contains useful information about the
# dataset and its features
info = mnist_builder.info
print(info)
```

This will print the dataset info content:

```
tfds.core.DatasetInfo(
    name='mnist',
    version=1.0.0,
    description='The MNIST database of handwritten digits.',
    homepage='http://yann.lecun.com/exdb/mnist/',
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10)
    }),
    total_num_examples=70000,
    splits={
        'test': <tfds.core.SplitInfo num_examples=10000>,
        'train': <tfds.core.SplitInfo num_examples=60000>
    },
    supervised_keys=('image', 'label'),
    citation='"""
        @article{lecun2010mnist,
          title={MNIST handwritten digit database},
          author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
          journal={ATT Labs [Online]. Available: http://yann. lecun. com/exdb/mnist},
          volume={2},
          year={2010}
        }
    """',
)
```

You can also get details about the classes (number of classes and their names).

```python
info = tfds.builder('cats_vs_dogs').info

info.features['label'].num_classes  # 2
info.features['label'].names  # ['cat', 'dog']
info.features['label'].int2str(1)  # "dog"
info.features['label'].str2int('cat')  # 0
```

### NumPy Usage with `tfds.as_numpy`

As a convenience for users that want simple NumPy arrays in their programs, you can use `tfds.as_numpy` to return a generator that yields NumPy array records out of a `tf.data.Dataset`. This allows you to build high-performance input pipelines with `tf.data` but use whatever you'd like for your model components.

```python
train_ds = tfds.load("mnist", split="train")
train_ds = train_ds.shuffle(1024).batch(128).repeat(5).prefetch(10)
for example in tfds.as_numpy(train_ds):
  numpy_images, numpy_labels = example["image"], example["label"]
```

You can also use `tfds.as_numpy` in conjunction with `batch_size=-1` to get the full dataset as NumPy arrays from the returned dict of `tf.Tensor` objects:

```python
train_ds = tfds.load("mnist", split=tfds.Split.TRAIN, batch_size=-1)
numpy_ds = tfds.as_numpy(train_ds)
numpy_images, numpy_labels = numpy_ds["image"], numpy_ds["label"]
```

Note that the library still requires `tensorflow` as an internal dependency.

### Citation

Please include the following citation when using `tensorflow-datasets` for a paper, in addition to any citation specific to the datasets used.

```
@misc{TFDS,
  title = {{TensorFlow Datasets}, A collection of ready-to-use datasets},
  howpublished = {\url{https://www.tensorflow.org/datasets}},
}
```

## Want a certain dataset?

Adding a dataset is straightforward if you follow [our guide](https://github.com/tensorflow/datasets/tree/master/docs/add_dataset.md).

Request a dataset by opening a [Dataset request GitHub issue](https://github.com/tensorflow/datasets/issues/new?assignees=&labels=dataset+request&template=dataset-request.md&title=%5Bdata+request%5D+%3Cdataset+name%3E).

Vote on the current [set of requests](https://github.com/tensorflow/datasets/labels/dataset%20request) by adding a thumbs-up reaction to the issue.

#### *Disclaimers*

*This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license.*

*If you're a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!*

*If you're interested in learning more about responsible AI practices, including fairness, please see Google AI's [Responsible AI Practices](https://ai.google/education/responsible-ai-practices).*

*`tensorflow/datasets` is Apache 2.0 licensed. See the `LICENSE` file.*
