# TensorFlow Similarity: Metric Learning for Humans
TensorFlow Similarity is a [TensorFlow](https://tensorflow.org) library for [similarity learning](https://en.wikipedia.org/wiki/Similarity_learning), also known as metric learning and contrastive learning.
TensorFlow Similarity is still in beta.
## Introduction
TensorFlow Similarity offers state-of-the-art algorithms for metric learning and all the necessary components to research, train, evaluate, and serve similarity-based models.
![Example of nearest neighbors search performed on the embedding generated by a similarity model trained on the Oxford IIIT Pet Dataset.](assets/images/similar-cats-and-dogs.jpg)
With TensorFlow Similarity you can train and serve models that find similar items (such as images) in a large corpus of examples. For example, as shown above, you can train a similarity model to find and cluster similar-looking images of cats and dogs from the [Oxford IIIT Pet Dataset](https://www.tensorflow.org/datasets/catalog/oxford_iiit_pet) by training on only a few classes. To train your own similarity model, see this [notebook](examples/supervised_visualization.ipynb).
Metric learning differs from traditional classification because its objective is different. The model learns to minimize the distance between similar examples and maximize the distance between dissimilar examples, in a supervised or self-supervised fashion. Either way, TensorFlow Similarity provides the necessary losses, metrics, samplers, visualizers, and indexing sub-system to make this quick and easy.
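To make the objective concrete, here is a minimal pure-Python sketch of a triplet-style loss, one of the loss families this library lists as supported. It is only an illustration of the idea, not the library's implementation: the function names `euclidean` and `triplet_loss`, the `margin` default, and the toy points are all assumptions made for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    return max(0.0, gap)


anchor, positive, negative = [0.0, 0.0], [0.0, 1.0], [3.0, 4.0]

# Positive is close (distance 1) and negative is far (distance 5): no penalty.
well_separated = triplet_loss(anchor, positive, negative)

# Swapping the roles puts the "similar" example far away: large penalty.
poorly_separated = triplet_loss(anchor, negative, positive)
```

Minimizing a loss of this shape pulls similar examples together and pushes dissimilar ones apart, which is the behavior described above.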
**Currently, TensorFlow Similarity supports supervised training.** In future releases, it will support semi-supervised and self-supervised training.
To learn more about the benefits of using similarity training, you can check out the blog post.
## What's new
- [Aug 21]: Interactive embedding `projector()` added. See this [notebook](examples/supervised_visualization.ipynb)
- [Aug 21]: [`CircleLoss()`](api/TFSimilarity/losses/CircleLoss.md) added
- [Aug 21]: [`PNLoss()`](api/TFSimilarity/losses/PNLoss.md) added.
- [Aug 21]: [`MultiSimilarityLoss()`](api/TFSimilarity/losses/MultiSimilarityLoss.md) added.
For previous changes, see [the release changelog](./releases.md).
## Getting Started
### Installation
Use pip to install the library:
```shell
pip install tensorflow_similarity
```
### Documentation
The detailed and narrated [notebooks](examples/) are a good way to get started with TensorFlow Similarity. There is likely to be one that is similar to your data or your problem (if not, let us know). You can start working with the examples immediately in Google Colab by clicking the Google Colab icon.
For more information about specific functions, you can [check the API documentation](api/).
To contribute to the project, please check out the [contribution guidelines](CONTRIBUTING.md).
### Minimal Example: MNIST similarity
Here is a bare-bones example demonstrating how to train a TensorFlow Similarity model on the MNIST data. This example illustrates some of the main components provided by TensorFlow Similarity and how they fit together. Please refer to the [hello_world notebook](examples/supervised_hello_world.ipynb) for a more detailed introduction.
### Preparing data
TensorFlow Similarity provides [data samplers](api/TFSimilarity/samplers/) for various dataset types. They balance the batches to ensure smoother training.
In this example, we use the multi-shot sampler, which loads data directly from the TensorFlow dataset catalog.
```python
from tensorflow_similarity.samplers import TFDatasetMultiShotMemorySampler
# Data sampler that generates balanced batches from MNIST dataset
sampler = TFDatasetMultiShotMemorySampler(dataset_name='mnist', classes_per_batch=10)
```
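As a rough illustration of what "balanced batches" means, the following pure-Python sketch draws the same number of examples from each class picked for a batch. It is not the sampler's actual code: `balanced_batch`, `examples_per_class`, and the synthetic labels are hypothetical names and data chosen for this example; only `classes_per_batch` mirrors the argument shown above.

```python
import random
from collections import Counter, defaultdict


def balanced_batch(labels, classes_per_batch, examples_per_class, rng):
    """Return example indices with an equal count for each sampled class."""
    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Pick which classes appear in this batch, then sample within each class.
    chosen = rng.sample(sorted(by_class), classes_per_batch)
    batch = []
    for label in chosen:
        batch.extend(rng.sample(by_class[label], examples_per_class))
    return batch


# Synthetic labels: 10 classes with 100 examples each.
labels = [i % 10 for i in range(1000)]
batch = balanced_batch(
    labels, classes_per_batch=4, examples_per_class=8, rng=random.Random(0)
)
class_counts = Counter(labels[i] for i in batch)
```

Every class present in the batch contributes the same number of examples, so each batch always contains both similar and dissimilar pairs for the loss to compare.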
### Building a Similarity model
Building a TensorFlow Similarity model is similar to building a standard Keras model, except the output layer is usually a [`MetricEmbedding()`](api/TFSimilarity/layers/) layer that enforces L2 normalization and the model is instantiated as a specialized subclass [`SimilarityModel()`](api/TFSimilarity/models/SimilarityModel.md) that supports additional functionality.
```python
from tensorflow.keras import layers
from tensorflow_similarity.layers import MetricEmbedding
from tensorflow_similarity.models import SimilarityModel
# Build a Similarity model using standard Keras layers
inputs = layers.Input(shape=(28, 28, 1))
x = layers.experimental.preprocessing.Rescaling(1/255)(inputs)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation='relu')(x)
outputs = MetricEmbedding(64)(x)
# Build a specialized Similarity model
model = SimilarityModel(inputs, outputs)
```
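The effect of L2 normalization on the output embedding can be sketched in plain Python as follows. This is an illustration only, assuming nothing about how `MetricEmbedding()` is implemented internally; `l2_normalize` and `cosine_distance` are hypothetical helper names.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps guards against zero norm)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]


def cosine_distance(a, b):
    """1 - dot product, which equals cosine distance for unit-length vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


u = l2_normalize([3.0, 4.0])
v = l2_normalize([30.0, 40.0])  # same direction as u, ten times the magnitude
w = l2_normalize([-4.0, 3.0])   # orthogonal to u

same_direction = cosine_distance(u, v)
orthogonal = cosine_distance(u, w)
```

After normalization only the direction of an embedding matters, so distances between embeddings stay on a fixed, comparable scale.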
### Training model via contrastive learning
To output metric embeddings that are searchable via approximate nearest neighbor search, the model needs to be trained using a similarity loss. Here we use `MultiSimilarityLoss()`, which is one of the most efficient loss functions.
```python
from tensorflow_similarity.losses import MultiSimilarityLoss
# Train Similarity model using contrastive loss
model.compile('adam', loss=MultiSimilarityLoss())
model.fit(sampler, epochs=5)
```
### Building images index and querying it
Once the model is trained, reference examples must be indexed via the model index API to be searchable. After indexing, you can use the model lookup API to search the index for the K most similar items.
```python
from tensorflow_similarity.visualization import viz_neigbors_imgs
# Index 100 embedded MNIST examples to make them searchable
sx, sy = sampler.get_slice(0,100)
model.index(x=sx, y=sy, data=sx)
# Find the top 5 most similar indexed MNIST examples for a given example
qx, qy = sampler.get_slice(3713, 1)
nns = model.single_lookup(qx[0])
# Visualize the query example and its top 5 neighbors
viz_neigbors_imgs(qx[0], qy[0], nns)
```
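Conceptually, a lookup ranks the indexed embeddings by distance to the query embedding and returns the K closest. The sketch below does this with an exact brute-force scan in pure Python, which is a deliberately simple stand-in for the approximate nearest neighbor search mentioned earlier; `knn_lookup` and the toy index are hypothetical and are not part of the library API.

```python
import math


def knn_lookup(index, query, k=5):
    """Return the k (distance, label) pairs closest to `query`, nearest first.

    `index` is a list of (embedding, label) pairs; every entry is scanned,
    so this is exact search rather than approximate search.
    """
    scored = [(math.dist(query, embedding), label) for embedding, label in index]
    return sorted(scored)[:k]


# Toy index of 2-D embeddings with string labels.
index = [
    ([0.0, 0.0], "a"),
    ([1.0, 0.0], "b"),
    ([0.0, 2.0], "c"),
    ([5.0, 5.0], "d"),
]
neighbors = knn_lookup(index, query=[0.9, 0.1], k=2)
neighbor_labels = [label for _, label in neighbors]
```

A brute-force scan costs time proportional to the index size for every query, which is why large indexes typically rely on approximate search instead.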
## Supported Algorithms
### Supervised Losses
- Triplet Loss
- PN Loss
- Multi Sim Loss
- Circle Loss
### Metrics
TensorFlow Similarity offers many of the most common metrics used for [classification](api/TFSimilarity/classification_metrics/) and [retrieval](api/TFSimilarity/retrieval_metrics/) evaluation, including:
| Name | Type | Description |
| ---- | ---- | ----------- |
| Precision | Classification | |
| Recall | Classification | |
| F1 Score | Classification | |
| Recall@K | Retrieval | |
| Binary NDCG | Retrieval | |
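For readers unfamiliar with these metrics, here is a short pure-Python sketch of the textbook definitions of precision, recall, F1 score, and Recall@K. It assumes the Recall@K convention common in metric learning (a query counts as a hit if at least one of its top K neighbors shares its label); the library's own implementations and exact definitions may differ, and the function names below are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def recall_at_k(query_labels, neighbor_labels, k):
    """Fraction of queries with at least one same-label item in their top-k neighbors."""
    hits = sum(
        1 for label, nns in zip(query_labels, neighbor_labels) if label in nns[:k]
    )
    return hits / len(query_labels)


precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)

# Each inner list holds the labels of one query's neighbors, nearest first.
query_labels = ["cat", "dog", "cat"]
neighbor_labels = [["dog", "cat"], ["cat", "cat"], ["cat", "dog"]]
r_at_1 = recall_at_k(query_labels, neighbor_labels, k=1)
r_at_2 = recall_at_k(query_labels, neighbor_labels, k=2)
```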
## Citing
Please cite this reference if you use any part of TensorFlow Similarity in your research:
```bibtex
@article{EBSIM21,
title={TensorFlow Similarity: A Usable, High-Performance Metric Learning Library},
author={Elie Bursztein and James Long and Shun Lin and Owen Vallis and Francois Chollet},
journal={Fixme},
year={2021}
}
```
## Disclaimer
This is not an official Google product.