# TensorFlow Similarity: Metric Learning for Humans
TensorFlow Similarity is a [TensorFlow](https://tensorflow.org) library focused
on making metric learning easy. TensorFlow Similarity is still in beta,
and some features, such as semi-supervised learning, are not yet implemented.
## Introduction
TensorFlow Similarity offers state-of-the-art algorithms for metric learning and
all the components needed to research, train, evaluate, and serve models
that learn from similar-looking examples. With it you can quickly and easily:
- Train and serve models that find similar items, such as images,
in large indexes.
- Perform semi-supervised or self-supervised training to
train or boost classification models when you have a large corpus with
few labeled examples. **Not yet available**.
### Supervised models
The objective function of metric learning differs from that of traditional classification:
- *Supervised models* learn to output metric embeddings (1D float tensors)
with the property that if two examples are close in the real world,
their embeddings are close in the
projected [metric space](https://en.wikipedia.org/wiki/Metric_space).
Representing items by their metric embeddings makes it possible to build
indexes that contain "classes" that were not seen during training,
to add classes to the index without retraining, and to work with
only a few examples per class for both training and retrieval.
This ability to operate on a few examples per class is sometimes
referred to as few-shot learning in the literature.
Retrieving similar items from the index is very efficient because
metric learning makes it possible to use [Approximate Nearest Neighbors Search](https://en.wikipedia.org/wiki/Nearest_neighbor_search) to search the embeddings in sublinear
time, instead of the standard [Nearest Neighbors Search](https://en.wikipedia.org/wiki/Nearest_neighbor_search), whose exhaustive comparison takes time linear in the index size for each query.
In practice, TensorFlow Similarity's built-in `Index()`, which leverages
[NMSLIB](https://github.com/nmslib/nmslib), can find the closest items
in a fraction of a second even when the index contains over 1M elements.
- **Self-supervised contrastive models** help train more accurate models by
performing large-scale pretraining that aims to learn a consistent
representation of the data by "contrasting" different representations of
the same example generated via data augmentation and/or contrasting the
representations of different examples to separate them. The model is then
fine-tuned on the few labeled examples like any classification model.
**This part is still a work in progress.**
Overall, TensorFlow Similarity's well-tested, composable components
follow Keras best practices to ensure they can be seamlessly integrated
into your TensorFlow workflows and get you results faster, whether you
are doing research or building innovative products.
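To illustrate the index-based retrieval described in this section, here is a minimal sketch that uses plain Python lists as stand-in embeddings and an exhaustive (non-approximate) search. The `euclidean` and `nearest` helpers and the toy vectors are hypothetical and are not part of TensorFlow Similarity. The sketch only shows why a class that was never seen during training can be added to an index without retraining: adding it amounts to storing its embedding.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(index, query, k=1):
    """Exhaustive search: computes one distance per indexed item."""
    ranked = sorted(index, key=lambda item: euclidean(item[0], query))
    return [label for _, label in ranked[:k]]


# Toy index of (embedding, label) pairs; the values are made up.
index = [
    ([0.0, 1.0], "cat"),
    ([0.1, 0.9], "cat"),
    ([1.0, 0.0], "dog"),
]

print(nearest(index, [0.9, 0.1]))

# Adding a new class only requires storing its embedding.
index.append(([-1.0, -1.0], "bird"))
print(nearest(index, [-0.9, -1.1]))
```

An approximate nearest neighbor index replaces the exhaustive `sorted` scan with a data structure that inspects only a small part of the index per query.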
## What's new
- August 2021 (v0.13.x): Added many new contrastive losses,
including Circle Loss, PN Loss, LiftedStructure Loss, and
Multisimilarity Loss.
For previous changes, see the [changelog -- Fixme](FIXME)
## Getting Started
### Installation
Use pip to install the library:
```shell
pip install tensorflow_similarity
```
### Documentation
The detailed, narrated notebooks are a good way to get started
with TensorFlow Similarity. There is likely to be one that is similar to
your data or your problem (if not, let us know). You can start working with
the examples immediately in Google Colab by clicking the Google Colab icon.
For more information about specific functions, you can [check the API documentation -- FIXME]()
## Example: MNIST similarity
### Preparing data
```python
from tensorflow_similarity.samplers import TFDatasetMultiShotMemorySampler
# Load MNIST into an in-memory sampler that draws 10 classes per batch
spl = TFDatasetMultiShotMemorySampler(dataset_name='mnist', class_per_batch=10)
```
### Building a Similarity model
```python
from tensorflow.keras import layers
from tensorflow_similarity.layers import MetricEmbedding
from tensorflow_similarity.models import SimilarityModel
# The input shape is taken from the first example held by the sampler
inputs = layers.Input(shape=(spl.x[0].shape))
# Scale pixel values from [0, 255] to [0, 1]
x = layers.experimental.preprocessing.Rescaling(1/255)(inputs)
x = layers.Conv2D(32, 7, activation='relu')(x)
x = layers.MaxPool2D()(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.Flatten()(x)
# Output a 64-dimensional metric embedding
x = MetricEmbedding(64)(x)
model = SimilarityModel(inputs, x)
```
### Training model via contrastive learning
```python
from tensorflow_similarity.losses import TripletLoss
# Use TripletLoss to project examples into the metric space
tloss = TripletLoss()
model.compile('adam', loss=tloss)
# `spl` is the sampler created in "Preparing data"
model.fit(spl, epochs=5)
```
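For intuition about what a triplet loss optimizes, here is a minimal pure-Python sketch of the textbook formulation, `max(d(anchor, positive) - d(anchor, negative) + margin, 0)`. It is not the library's `TripletLoss` implementation, which operates on batches. The function names, the choice of Euclidean distance, and the margin value are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther than the positive."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


anchor = [0.0, 0.0]
near = [0.0, 1.0]  # distance 1 from the anchor
far = [3.0, 4.0]   # distance 5 from the anchor

# Positive is near and negative is far: the constraint is satisfied.
satisfied = triplet_loss(anchor, positive=near, negative=far)
# Positive is far and negative is near: the constraint is violated.
violated = triplet_loss(anchor, positive=far, negative=near)
print(satisfied, violated)
```

Minimizing this quantity pulls examples of the same class together and pushes examples of different classes apart, which is what makes distances in the embedding space meaningful for retrieval.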
### Building images index and querying it
```python
from tensorflow_similarity.visualization import viz_neigbors_imgs
# Index embeddings for fast retrieval via ANN
model.index(x=spl.x[:100], y=spl.y[:100], data=spl.x[:100])
# Look up the indexed images nearest to an example
nns = model.single_lookup(spl.x[4242])
# Visualize the results
viz_neigbors_imgs(spl.x[4242], spl.y[4242], nns)
```
## Supported Algorithms
### Supervised learning
| name | Description |
| ----------- | ----------- |
| Triplet Loss | |
| PN Loss | |
| Multisimilarity Loss | |
| Circle Loss | |
## Package components
![TensorFlow Similarity Overview](api/images/tfsim_overview.png)
TensorFlow Similarity, as visible in the diagram above, offers the following
components to help research, train, evaluate and serve metric models:
- **`SimilarityModel()`**: This class subclasses the `tf.keras.Model` class and extends it with additional properties that are useful for metric learning. For example, it adds the methods:
1. `index()`: Enables indexing of the embeddings.
2. `lookup()`: Takes samples, calls `predict()`, and searches for neighbors within the index.
- **`MetricLoss()`**: This virtual class, which extends the `tf.keras.losses.Loss` class, is the base class from which metric losses are derived. This subclassing ensures proper error checking (i.e., that the user is training the model with a metric loss), enables better static analysis, and enforces additional constraints such as having a distance function that is supported by the index. Additionally, metric losses make use of the fully tested and highly optimized pairwise distance functions provided by TF Similarity, which are available under the `Distances.*` classes.
- **`Samplers()`**: Samplers are meant to ensure that each batch has at least n (with n >= 2) examples of each class, as losses such as TripletLoss can’t work properly if this condition is not met. TF Similarity offers an in-memory sampler for small datasets and a TFRecordDatasets sampler for large-scale ones.
- **`Indexer()`**: The Indexer and its sub-components are meant to index known embeddings alongside their metadata. The embedding metadata is stored within `Table()`, while the `Matcher()` is used to perform [fast approximate neighbor searches](https://en.wikipedia.org/wiki/Nearest_neighbor_search) that quickly retrieve the indexed elements closest to the embeddings supplied to the `lookup()` and `single_lookup()` functions.
The `Evaluator()` component is used to compute `EvalMetrics()` on the specific index for evaluation and calibration purposes.
The default `Index()` sub-components run in memory and are optimized for interactive settings such as Jupyter notebooks, Colab, and metric computation during training (e.g., using the provided `EvalCallback()`). Indexes are serialized as part of `model.save()`, so you can reload them via `model.index_load()` for serving purposes or further training and evaluation.
The default implementation can easily scale up to medium-sized deployments (1M-10M+ points), provided the computers used have enough memory. For very large-scale deployments you will need to subclass the components to match your own architecture. See FIXME colab to see how to deploy TF Similarity in production.
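The batch constraint described for samplers above can be sketched in a few lines of plain Python. This is a hypothetical illustration, not the library's sampler code: the `balanced_batch` function, its parameters, and the toy labels are assumptions. It picks a fixed number of classes and then draws the same number of examples from each, so every class present in the batch has at least two members.

```python
import random
from collections import Counter, defaultdict


def balanced_batch(labels, classes_per_batch, examples_per_class, rng):
    """Return example indices with `examples_per_class` items per chosen class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Choose which classes appear in this batch.
    chosen = rng.sample(sorted(by_class), classes_per_batch)
    batch = []
    for label in chosen:
        # Draw distinct examples of the chosen class.
        batch.extend(rng.sample(by_class[label], examples_per_class))
    return batch


# Toy dataset: 5 classes with 10 examples each.
labels = [i % 5 for i in range(50)]
rng = random.Random(0)
batch = balanced_batch(labels, classes_per_batch=3, examples_per_class=2, rng=rng)
counts = Counter(labels[i] for i in batch)
print(counts)
```

With purely random batching, a class could appear only once in a batch, leaving no positive pair for losses that compare examples of the same class.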
For more information about a giv