# TensorFlow Datasets
TensorFlow Datasets provides many public datasets as `tf.data.Datasets`.
[![Kokoro](https://storage.googleapis.com/tfds-kokoro-public/kokoro-build.svg)](https://storage.googleapis.com/tfds-kokoro-public/kokoro-build.html)
[![PyPI version](https://badge.fury.io/py/tensorflow-datasets.svg)](https://badge.fury.io/py/tensorflow-datasets)
* [List of datasets](https://www.tensorflow.org/datasets/catalog/overview)
* [Try it in Colab](https://colab.research.google.com/github/tensorflow/datasets/blob/master/docs/overview.ipynb)
* [API docs](https://www.tensorflow.org/datasets/api_docs/python/tfds)
* Guides
  * [Overview](https://www.tensorflow.org/datasets/overview)
  * [Datasets versioning](https://www.tensorflow.org/datasets/datasets_versioning)
  * [Using splits and slicing API](https://www.tensorflow.org/datasets/splits)
  * [Add a dataset](https://www.tensorflow.org/datasets/add_dataset)
  * [Add a huge dataset (>>100GiB)](https://www.tensorflow.org/datasets/beam_datasets)
**Table of Contents**
* [Installation](#installation)
* [Usage](#usage)
* [`DatasetBuilder`](#datasetbuilder)
* [NumPy usage](#numpy-usage-with-tfdsas_numpy)
* [Want a certain dataset?](#want-a-certain-dataset)
* [Disclaimers](#disclaimers)
### Installation
```sh
pip install tensorflow-datasets
# Requires TF 1.15+ to be installed.
# Some datasets require additional libraries; see setup.py extras_require
pip install tensorflow
# or:
pip install tensorflow-gpu
```
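Once both packages are installed, a quick sanity check is to import them and list the registered builders. A minimal sketch (the exact number of datasets depends on the installed version):
```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Installed versions.
print("TensorFlow:", tf.__version__)
print("TensorFlow Datasets:", tfds.__version__)

# list_builders() returns the names of all registered datasets.
print(len(tfds.list_builders()), "datasets available")
```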
Join [our Google group](https://groups.google.com/forum/#!forum/tensorflow-datasets-public-announce)
to receive updates on the project.
### Usage
```python
import tensorflow_datasets as tfds
import tensorflow as tf
# tfds works in both Eager and Graph modes
tf.compat.v1.enable_eager_execution()
# See available datasets
print(tfds.list_builders())
# Construct a tf.data.Dataset
ds_train, ds_test = tfds.load(name="mnist", split=["train", "test"])
# Build your input pipeline
ds_train = ds_train.shuffle(1000).batch(128).prefetch(10)
for features in ds_train.take(1):
  image, label = features["image"], features["label"]
```
Try it interactively in a
[Colab notebook](https://colab.research.google.com/github/tensorflow/datasets/blob/master/docs/overview.ipynb).
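To turn the loaded dataset into an end-to-end training loop, one common pattern is to pass `as_supervised=True` so that each element is an `(image, label)` pair and then feed the pipeline straight into Keras. A minimal sketch assuming TF 2.x-style eager Keras training; the tiny model below is illustrative, not part of the library:
```python
import tensorflow as tf
import tensorflow_datasets as tfds

# as_supervised=True yields (input, label) tuples instead of feature dicts.
ds_train = tfds.load("mnist", split="train", as_supervised=True)

def normalize(image, label):
  # Cast uint8 pixels to floats in [0, 1].
  return tf.cast(image, tf.float32) / 255.0, label

ds_train = ds_train.map(normalize).shuffle(1024).batch(128).prefetch(10)

# A deliberately small model, just to show the dataset plugging into fit().
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(ds_train, epochs=1)
```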
### `DatasetBuilder`
All datasets are implemented as subclasses of
[`DatasetBuilder`](https://www.tensorflow.org/datasets/api_docs/python/tfds/core/DatasetBuilder.md)
and
[`tfds.load`](https://www.tensorflow.org/datasets/api_docs/python/tfds/load.md)
is a thin convenience wrapper.
[`DatasetInfo`](https://www.tensorflow.org/datasets/api_docs/python/tfds/core/DatasetInfo.md)
documents the dataset.
```python
import tensorflow_datasets as tfds
# The following is the equivalent of the `load` call above.
# You can fetch the DatasetBuilder class by string
mnist_builder = tfds.builder("mnist")
# Download the dataset
mnist_builder.download_and_prepare()
# Construct a tf.data.Dataset
ds = mnist_builder.as_dataset(split=tfds.Split.TRAIN)
# Get the `DatasetInfo` object, which contains useful information about the
# dataset and its features
info = mnist_builder.info
print(info)

tfds.core.DatasetInfo(
    name='mnist',
    version=1.0.0,
    description='The MNIST database of handwritten digits.',
    urls=[u'http://yann.lecun.com/exdb/mnist/'],
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10)
    },
    total_num_examples=70000,
    splits={
        u'test': <tfds.core.SplitInfo num_examples=10000>,
        u'train': <tfds.core.SplitInfo num_examples=60000>
    },
    supervised_keys=(u'image', u'label'),
    citation='"""
        @article{lecun2010mnist,
          title={MNIST handwritten digit database},
          author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
          journal={ATT Labs [Online]. Available: http://yann. lecun. com/exdb/mnist},
          volume={2},
          year={2010}
        }
    """',
)
```
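The fields printed above are also available programmatically on the `DatasetInfo` object, which is useful for sizing training loops or validating feature shapes. A short sketch, continuing from `mnist_builder` above:
```python
info = mnist_builder.info

# Example counts per split.
print(info.splits["train"].num_examples)  # 60000
print(info.splits["test"].num_examples)   # 10000

# Feature metadata and the (input, target) keys used by as_supervised.
print(info.features["image"].shape)        # (28, 28, 1)
print(info.features["label"].num_classes)  # 10
print(info.supervised_keys)                # ('image', 'label')
```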
You can also get details about the classes (number of classes and their names).
```python
info = tfds.builder('cats_vs_dogs').info
info.features['label'].num_classes # 2
info.features['label'].names # ['cat', 'dog']
info.features['label'].int2str(1) # "dog"
info.features['label'].str2int('cat') # 0
```
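These label utilities can be combined with the data itself to inspect examples in human-readable form. A minimal sketch using `with_info=True`, which returns the dataset together with its `DatasetInfo` (note that this downloads cats_vs_dogs the first time it runs):
```python
import tensorflow_datasets as tfds

ds, info = tfds.load("cats_vs_dogs", split="train", with_info=True)

label_feature = info.features["label"]
for example in ds.take(3):
  # int2str maps the integer class id back to its name ('cat' or 'dog').
  print(label_feature.int2str(int(example["label"])))
```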
### NumPy Usage with `tfds.as_numpy`
As a convenience for users who want simple NumPy arrays in their programs, you
can use
[`tfds.as_numpy`](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_numpy.md)
to return a generator that yields NumPy array
records out of a `tf.data.Dataset`. This allows you to build high-performance
input pipelines with `tf.data` but use whatever you'd like for your model
components.
```python
train_ds = tfds.load("mnist", split=tfds.Split.TRAIN)
train_ds = train_ds.shuffle(1024).batch(128).repeat(5).prefetch(10)
for example in tfds.as_numpy(train_ds):
  numpy_images, numpy_labels = example["image"], example["label"]
```
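Because every yielded record is plain NumPy, anything downstream of the pipeline can be framework-agnostic. For example, a small sketch that tallies the digit distribution seen by the pipeline above using NumPy only:
```python
import numpy as np

label_counts = np.zeros(10, dtype=np.int64)
for example in tfds.as_numpy(train_ds):
  # example["label"] is a NumPy array of shape (batch_size,).
  label_counts += np.bincount(example["label"], minlength=10)
print(label_counts)  # counts cover all 5 repeats of the training split
```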
You can also use `tfds.as_numpy` in conjunction with `batch_size=-1` to
get the full dataset in NumPy arrays from the returned `tf.Tensor` object:
```python
train_ds = tfds.load("mnist", split=tfds.Split.TRAIN, batch_size=-1)
numpy_ds = tfds.as_numpy(train_ds)
numpy_images, numpy_labels = numpy_ds["image"], numpy_ds["label"]
```
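With the whole split materialized as arrays, the data can be handed directly to NumPy-based tooling. A small sketch checking shapes and a basic statistic; this assumes the split fits in memory, so the pattern is best reserved for small datasets like MNIST:
```python
print(numpy_images.shape)  # (60000, 28, 28, 1)
print(numpy_labels.shape)  # (60000,)

# Mean pixel intensity over the full training split, computed with NumPy only.
print(numpy_images.mean())
```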
Note that the library still requires `tensorflow` as an internal dependency.
### Want a certain dataset?
Adding a dataset is straightforward; just follow
[our guide](https://github.com/tensorflow/datasets/tree/master/docs/add_dataset.md).
Request a dataset by opening a
[Dataset request GitHub issue](https://github.com/tensorflow/datasets/issues/new?assignees=&labels=dataset+request&template=dataset-request.md&title=%5Bdata+request%5D+%3Cdataset+name%3E).
And vote on the current
[set of requests](https://github.com/tensorflow/datasets/labels/dataset%20request)
by adding a thumbs-up reaction to the issue.
#### *Disclaimers*
*This is a utility library that downloads and prepares public datasets. We do*
*not host or distribute these datasets, vouch for their quality or fairness, or*
*claim that you have license to use the dataset. It is your responsibility to*
*determine whether you have permission to use the dataset under the dataset's*
*license.*
*If you're a dataset owner and wish to update any part of it (description,*
*citation, etc.), or do not want your dataset to be included in this*
*library, please get in touch through a GitHub issue. Thanks for your*
*contribution to the ML community!*
*If you're interested in learning more about responsible AI practices, including*
*fairness, please see Google AI's [Responsible AI Practices](https://ai.google/education/responsible-ai-practices).*
*`tensorflow/datasets` is Apache 2.0 licensed. See the `LICENSE` file.*