# TensorFlow Similarity: Metric Learning for Humans
TensorFlow Similarity is a [TensorFlow](https://tensorflow.org) library focused
on making metric learning easy. TensorFlow Similarity is still in beta; some
features, such as semi-supervised training, are not yet implemented.
## Introduction
TensorFlow Similarity offers state-of-the-art algorithms for metric learning and
all the components needed to research, train, evaluate, and serve models
that learn from similar-looking examples. With it you can quickly and easily:
- Train and serve models that find similar items, such as images,
in large indexes.
- Perform semi-supervised or self-supervised training to
train/boost classification models when you have a large corpus with
few labeled examples. **Not yet available**.
### Supervised models
The metric learning objective differs from traditional classification:
- *Supervised models* learn to output metric embeddings (1D float tensors)
that exhibit the property that if two examples are close in the real world,
their embeddings will be close in the
projected [metric space](https://en.wikipedia.org/wiki/Metric_space).
Representing items by their metric embeddings makes it possible to build
indexes that contain "classes" that were not seen during training,
to add classes to the index without retraining, and requires
only a few examples per class for both training and retrieval
(a short sketch of this workflow appears after this list).
This ability to operate on a few examples per class is sometimes
referred to as few-shot learning in the literature.
What makes retrieving similar items from the index very efficient is that
metric learning allows the use of [Approximate Nearest Neighbor Search](https://en.wikipedia.org/wiki/Nearest_neighbor_search) to search over the embeddings in sublinear
time, instead of an exhaustive [Nearest Neighbor Search](https://en.wikipedia.org/wiki/Nearest_neighbor_search) whose cost grows linearly with the index size.
In practice, TensorFlow Similarity's built-in `Index()`, which leverages
[NMSLIB](https://github.com/nmslib/nmslib), can find the closest items
in a fraction of a second even when the index contains over 1M elements.
- **Self-supervised contrastive models** help train more accurate models by
performing large-scale pretraining that aims at learning a consistent
representation of the data by "contrasting" different representations of
the same example generated via data augmentation, and/or contrasting the
representations of different examples to separate them. The model is then
fine-tuned on the few labeled examples like any classification model.
**This part is still a work in progress.**
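As a minimal illustration of the few-shot workflow mentioned above, the sketch below adds a handful of examples of a class that was never seen during training to an existing index and queries it, using the same `index()` and `single_lookup()` calls as the MNIST example further down. The variables `new_x` and `new_y` are hypothetical placeholders for the new examples and their labels.
```python
# Sketch only: `model` is an already-trained SimilarityModel.
# `new_x`, `new_y` are a few examples/labels of a class unseen during training.
model.index(x=new_x, y=new_y, data=new_x)   # add the new class without retraining
neighbors = model.single_lookup(new_x[0])   # retrieve the nearest indexed items
```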
Overall, TensorFlow Similarity's well-tested, composable components
follow Keras best practices to ensure they can be seamlessly integrated
into your TensorFlow workflows and get you results faster, whether you
are doing research or building innovative products.
## What's new
- August 2021 (v0.13.x): Added many new contrastive losses,
including Circle Loss, PN Loss, Lifted Structure Loss, and
MultiSimilarity Loss.
For previous changes, see the [changelog -- FIXME](FIXME)
## Getting Started
### Installation
Use pip to install the library:
```shell
pip install tensorflow_similarity
```
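To verify the installation, import the package; this assumes it exposes a `__version__` attribute, as most Python packages do.
```python
import tensorflow_similarity as tfsim
print(tfsim.__version__)  # assumes the package exposes __version__
```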
### Documentation
The detailed and narrated notebooks are a good way to get started
with TensorFlow Similarity. There is likely to be one that is similar to
your data or your problem (if not, let us know). You can start working with
the examples immediately in Google Colab by clicking the Google colab icon.
For more information about specific functions, you can [check the API documentation -- FIXME]()
## Example: MNIST similarity
### Preparing data
```python
from tensorflow_similarity.samplers import TFDatasetMultiShotMemorySampler

# Sample the MNIST dataset from TensorFlow Datasets, with 10 classes per batch.
sampler = TFDatasetMultiShotMemorySampler(dataset_name='mnist', classes_per_batch=10)
```
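The sampler keeps the dataset in memory and exposes it through its `x` and `y` attributes (they are used again in the indexing step below), so you can quickly sanity-check what was loaded:
```python
# Quick sanity check of the data held by the sampler.
print(len(sampler.x), 'examples of shape', sampler.x[0].shape)
```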
### Building a Similarity model
```python
from tensorflow.keras import layers
from tensorflow_similarity.layers import MetricEmbedding
from tensorflow_similarity.models import SimilarityModel

# Build a small convolutional model that outputs a 64-d metric embedding.
inputs = layers.Input(shape=sampler.x[0].shape)
x = layers.experimental.preprocessing.Rescaling(1 / 255)(inputs)
x = layers.Conv2D(32, 7, activation='relu')(x)
x = layers.MaxPool2D()(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.Flatten()(x)
outputs = MetricEmbedding(64)(x)
model = SimilarityModel(inputs, outputs)
```
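Because `SimilarityModel` extends `tf.keras.Model`, the usual Keras introspection applies; for example, you can confirm that the model ends with the 64-dimensional metric embedding:
```python
model.summary()  # the final layer should output the 64-d metric embedding
```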
### Training model via contrastive learning
```python
from tensorflow_similarity.losses import TripletLoss

# Use TripletLoss to learn a projection into the metric space.
tloss = TripletLoss()
model.compile('adam', loss=tloss)
model.fit(sampler, epochs=5)
```
### Building images index and querying it
```python
from tensorflow_similarity.visualization import viz_neigbors_imgs

# Index embeddings for fast retrieval via ANN.
model.index(x=sampler.x[:100], y=sampler.y[:100], data=sampler.x[:100])

# Look up the nearest indexed images for one example.
nns = model.single_lookup(sampler.x[4242])

# Visualize the results.
viz_neigbors_imgs(sampler.x[4242], sampler.y[4242], nns)
```
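`single_lookup()` queries a single example; `lookup()`, listed under Package components below, handles a batch of queries. A hedged sketch, assuming a `k` parameter controls how many neighbors are returned per query:
```python
# Assumed signature: the `k` argument and the slice are illustrative only.
batch_nns = model.lookup(sampler.x[4240:4245], k=3)
```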
## Supported Algorithms
### Supervised learning
| Name | Description |
| ----------- | ----------- |
| Triplet Loss | Pulls an anchor closer to a positive example than to negatives by at least a margin. |
| PN Loss | |
| MultiSimilarity Loss | Weights positive and negative pairs by their relative similarity (Wang et al., 2019). |
| Circle Loss | Re-weights similarity scores so that less-optimized pairs receive larger gradients (Sun et al., 2020). |
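All of these losses are drop-in replacements for `TripletLoss` in `model.compile()`. A sketch swapping in Circle Loss, assuming the class is exported as `CircleLoss` from `tensorflow_similarity.losses`:
```python
from tensorflow_similarity.losses import CircleLoss  # class name assumed from the table above

model.compile('adam', loss=CircleLoss())
model.fit(sampler, epochs=5)
```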
## Package components
![TensorFlow Similarity Overview](api/images/tfsim_overview.png)
TensorFlow Similarity, as visible in the diagram above, offers the following
components to help research, train, evaluate, and serve metric models:
- **`SimilarityModel()`**: This class subclasses `tf.keras.Model` and extends it with additional properties that are useful for metric learning. For example, it adds the methods:
  1. `index()`: Enables indexing of the embeddings.
  2. `lookup()`: Takes samples, calls `predict()`, and searches for neighbors within the index.
- **`MetricLoss()`**: This abstract class, which extends `tf.keras.losses.Loss`, is the base class from which metric losses are derived. This subclassing ensures proper error checking (i.e., that the user is training the model with a metric loss), enables better static analysis, and enforces additional constraints such as using a distance function that is supported by the index. Additionally, metric losses make use of the fully tested and highly optimized pairwise distance functions provided by TF Similarity, available under the `Distances.*` classes.
- **`Samplers()`**: Samplers ensure that each batch has at least n (with n >= 2) examples of each class, as losses such as TripletLoss can't work properly if this condition is not met. TF Similarity offers an in-memory sampler for small datasets and a TFRecordDataset-based sampler for large-scale ones.
- **`Indexer()`**: The Indexer and its sub-components index known embeddings alongside their metadata. The embedding metadata is stored within `Table()`, while `Matcher()` is used to perform [fast approximate nearest neighbor searches](https://en.wikipedia.org/wiki/Nearest_neighbor_search) that quickly retrieve the indexed elements closest to the embeddings supplied to the `lookup()` and `single_lookup()` functions.
The `Evaluator()` component is used to compute `EvalMetrics()` on the index for evaluation and calibration purposes.
The default `Index()` sub-components run in memory and are optimized for interactive settings such as Jupyter notebooks, Colab, and metric computation during training (e.g., using the provided `EvalCallback()`). Indexes are serialized as part of `model.save()`, so you can reload them via `model.index_load()` for serving purposes or further training/evaluation.
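A minimal sketch of that save/reload cycle, with a hypothetical path and an assumed `index_load()` signature:
```python
# Saving the model also serializes its index (see above).
model.save('similarity_model')        # hypothetical path
# Later, e.g. in a serving process, restore the index after reloading the model.
model.index_load('similarity_model')  # exact signature may differ
```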
The default implementation can easily scale up to medium deployments (1M-10M+ points) provided the machines used have enough memory. For very large-scale deployments you will need to subclass the components to match your own architecture. See the FIXME colab for how to deploy TF Similarity in production.
For more information about a given component, please refer to the API documentation.