# clustimage
[![Python](https://img.shields.io/pypi/pyversions/clustimage)](https://img.shields.io/pypi/pyversions/clustimage)
[![PyPI Version](https://img.shields.io/pypi/v/clustimage)](https://pypi.org/project/clustimage/)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/erdogant/clustimage/blob/master/LICENSE)
[![Github Forks](https://img.shields.io/github/forks/erdogant/clustimage.svg)](https://github.com/erdogant/clustimage/network)
[![GitHub Open Issues](https://img.shields.io/github/issues/erdogant/clustimage.svg)](https://github.com/erdogant/clustimage/issues)
[![Project Status](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)
[![Sphinx](https://img.shields.io/badge/Sphinx-Docs-blue)](https://erdogant.github.io/clustimage/)
[![Downloads](https://pepy.tech/badge/clustimage/month)](https://pepy.tech/project/clustimage/month)
[![Downloads](https://pepy.tech/badge/clustimage)](https://pepy.tech/project/clustimage)
[![BuyMeCoffee](https://img.shields.io/badge/buymea-coffee-yellow.svg)](https://www.buymeacoffee.com/erdogant)
[![DOI](https://zenodo.org/badge/423822054.svg)](https://zenodo.org/badge/latestdoi/423822054)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/erdogant/clustimage/blob/master/notebooks/clustimage.ipynb)
[![Medium](https://img.shields.io/badge/Medium-Blog-blue)](https://towardsdatascience.com/a-step-by-step-guide-for-clustering-images-4b45f9906128)
<!---[![Coffee](https://img.shields.io/badge/coffee-black-grey.svg)](https://erdogant.github.io/donate/?currency=USD&amount=5)-->
* [**A step-by-step guide for clustering images**](https://towardsdatascience.com/a-step-by-step-guide-for-clustering-images-4b45f9906128)
The aim of ``clustimage`` is to detect natural groups or clusters of images.
Image recognition is a computer vision task for identifying and verifying objects or persons in a photograph.
The image recognition task can be separated into two broad tasks: supervised and unsupervised.
In the supervised task, an image has to be classified into a fixed number of learned categories. Most packages rely on (deep) neural networks and try to solve the problem of predicting "what's in the image".
In the unsupervised task, no training data is required; instead, the input data is interpreted to find natural groups or clusters.
However, carefully grouping similar images in an unsupervised manner, or simply identifying the unique images, can be quite a challenge.
``clustimage`` works through a multi-step process of carefully pre-processing the images, extracting the features, and evaluating the optimal number of clusters across the feature space.
The optimal number of clusters can be determined using well-known methods such as *silhouette, dbindex, and derivatives*, in combination with clustering methods such as *agglomerative, kmeans, dbscan and hdbscan*.
With ``clustimage`` we aim to determine the most robust clustering by efficiently searching across the parameters and evaluating the clusters.
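Choosing the number of clusters by scoring candidate clusterings can be sketched in a few lines. The example below is a minimal NumPy illustration, not clustimage's implementation: it runs k-means (with k-means++ seeding) for several values of k, scores each result with the silhouette coefficient, and keeps the k with the highest score. The helper names `kmeans`, `silhouette` and `best_k` and the synthetic blob data are hypothetical; the other evaluation methods (dbindex, derivatives) and clustering methods (agglomerative, dbscan, hdbscan) are not shown.

```python
import numpy as np


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; keeps the lowest-inertia of n_init runs."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        # k-means++ seeding: each new center is drawn with probability proportional to D(x)^2
        centers = [X[rng.integers(len(X))]]
        for _ in range(k - 1):
            d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _ in range(n_iter):
            # Assign every sample to its nearest center
            labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
            # Move each center to the mean of its members (an empty cluster keeps its center)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        # Final assignment against the final centers, then the within-cluster sum of squares
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def silhouette(X, labels):
    """Mean silhouette coefficient, s(i) = (b - a) / max(a, b)."""
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster, excluding itself
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


def best_k(X, k_range=range(2, 7)):
    """Score each candidate k and return the one with the highest mean silhouette."""
    scores = {k: silhouette(X, kmeans(X, k)) for k in k_range}
    return max(scores, key=scores.get), scores


# Hypothetical data: three well-separated 2-D blobs of 30 points each
rng = np.random.default_rng(42)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in blob_centers])
k, scores = best_k(X)
print(k, {key: round(val, 3) for key, val in scores.items()})
```

A higher mean silhouette means samples lie closer to their own cluster than to the nearest other cluster, so this search favors compact, well-separated groups.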
Besides clustering of images, the ``clustimage`` model can also be used to find the most similar images for a new unseen sample.
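Finding the most similar images for a new sample boils down to a nearest-neighbour search in the feature space. The sketch below assumes the new sample has already been pre-processed and mapped through the same fitted feature extraction as the training images (that step is omitted); the function `most_similar` and the toy feature vectors are hypothetical and do not represent clustimage's own search routine.

```python
import numpy as np


def most_similar(feat, query, k=3):
    """Return the indices and Euclidean distances of the k rows in `feat` closest to `query`."""
    dists = np.linalg.norm(feat - query, axis=1)  # distance from the query to every sample
    idx = np.argsort(dists)[:k]                   # positions of the k smallest distances
    return idx, dists[idx]


# Hypothetical feature vectors for four images, plus one new sample in the same feature space
feat = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]])
query = np.array([1.0, 1.0])
idx, dist = most_similar(feat, query, k=2)
print(idx)  # [1 3]
```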
A schematic overview is as follows:
<p align="center">
<img src="https://github.com/erdogant/clustimage/blob/main/docs/figs/schematic_overview.png" width="1000" />
</p>
``clustimage`` overcomes the following challenges:
1. Robustly groups similar images.
2. Returns the unique images.
3. Finds highly similar images for a given input image.
``clustimage`` is fun because:
* It does not require a learning process.
* It can group any set of images.
* It can return only the unique images.
* It can find highly similar images given an input image.
* It provides many plots to improve understanding of the feature space and sample-sample relationships.
* It is built on core statistics, such as PCA, HOG and more, and therefore does not rely on heavy dependencies.
* It works out of the box.
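Since PCA is named as one of the building blocks, a minimal NumPy sketch of PCA via the singular value decomposition may help. It returns the samples projected onto the top components and each component's explained-variance ratio, the quantity that `cl.pca.plot()` visualizes later in this README. This is an illustration assuming plain centered PCA, not clustimage's own implementation; the function `pca` and the data are hypothetical, and HOG feature extraction is not shown.

```python
import numpy as np


def pca(X, n_components):
    """PCA via SVD: return the projected samples and each component's explained-variance ratio."""
    Xc = X - X.mean(axis=0)                      # center every feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2 / (len(X) - 1)                  # variance captured by each component
    ratio = var / var.sum()                      # fraction of the total variance
    scores = Xc @ Vt[:n_components].T            # sample coordinates on the top components
    return scores, ratio[:n_components]


# Hypothetical data: 100 samples, 5 features, two of them strongly correlated
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = 3 * X[:, 0] + 0.01 * rng.normal(size=100)
scores, ratio = pca(X, n_components=2)
print(scores.shape, ratio.round(3))
```

Because the singular values come out sorted in descending order, the first component always explains the largest share of the variance.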
### Installation
* Install clustimage from PyPI (recommended). clustimage is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.
* A new environment can be created as follows:
```bash
conda create -n env_clustimage python=3.8
conda activate env_clustimage
```
* Install from PyPI
```bash
pip install -U clustimage
```
#### Import the clustimage package
```python
from clustimage import Clustimage
```
### Example 1: Digit images.
In this example, we use a flattened grayscale image array loaded from sklearn.
The array is NxM, where N is the number of samples and M is the size of the flattened raw RGB/gray image.
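The NxM layout can be illustrated with plain NumPy. This minimal sketch uses a hypothetical stack of three 8x8 grayscale images (the size of the sklearn digits); for RGB images, each row would instead hold height x width x 3 values.

```python
import numpy as np

# Hypothetical stack of N=3 grayscale images of 8x8 pixels, shaped (N, height, width)
images = np.arange(3 * 8 * 8, dtype=float).reshape(3, 8, 8)

# Flatten: one row per image, one column per pixel, giving an (N, M) array with M = 8*8
X = images.reshape(len(images), -1)
print(X.shape)  # (3, 64)

# Any row can be reshaped back into its 8x8 grid, as done for plt.imshow in the example below
first = X[0, :].reshape(8, 8)
assert np.array_equal(first, images[0])
```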
```python
# Load library
import matplotlib.pyplot as plt
from clustimage import Clustimage
# init
cl = Clustimage()
# Load example digit data
X = cl.import_example(data='mnist')
print(X)
# array([[ 0., 0., 5., ..., 0., 0., 0.],
#        [ 0., 0., 0., ..., 10., 0., 0.],
#        [ 0., 0., 0., ..., 16., 9., 0.],
#        ...,
#        [ 0., 0., 0., ..., 9., 0., 0.],
#        [ 0., 0., 0., ..., 4., 0., 0.],
#        [ 0., 0., 6., ..., 6., 0., 0.]])
#
# Each row is an image that can be plotted after reshaping:
plt.imshow(X[0,:].reshape(8,8), cmap='binary')
#
# Preprocessing and feature extraction
results = cl.fit_transform(X)
# Let's examine the results.
print(results.keys())
# ['feat', 'xycoord', 'pathnames', 'filenames', 'labels']
#
# feat : Extracted features
# xycoord : Coordinates of samples in the embedded space.
# filenames : Name of the files
# pathnames : Absolute location of the files
# labels : Cluster labels in the same order as the input
# Get the unique images
unique_samples = cl.unique()
#
print(unique_samples.keys())
# ['labels', 'idx', 'xycoord_center', 'pathnames']
#
# Collect the unique images from the input
X[unique_samples['idx'],:]
```
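`cl.unique()` returns, among other keys, a per-cluster center (`xycoord_center`) and indices into the input (`idx`). One plausible way to pick a representative image per cluster, assumed here for illustration and not necessarily what `cl.unique()` does internally, is to take the sample nearest to each cluster's centroid in the embedded space. The function `representatives` and the toy coordinates are hypothetical.

```python
import numpy as np


def representatives(xycoord, labels):
    """For each cluster label, return the index of the sample nearest to that cluster's centroid."""
    idx = []
    for lab in np.unique(labels):
        members = np.flatnonzero(labels == lab)         # row indices of this cluster
        center = xycoord[members].mean(axis=0)          # centroid in the embedded space
        dists = np.linalg.norm(xycoord[members] - center, axis=1)
        idx.append(int(members[np.argmin(dists)]))
    return np.array(idx)


# Hypothetical 2-D embedding of six samples in two clusters
xycoord = np.array([[0.0, 0.0], [0.2, 0.0], [0.1, 0.05],
                    [5.0, 5.0], [5.5, 5.0], [5.2, 5.1]])
labels = np.array([0, 0, 0, 1, 1, 1])
rep = representatives(xycoord, labels)
print(rep)  # [2 5]
```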
##### Plot the unique images.
```python
cl.plot_unique()
```
<p align="center">
<img src="https://github.com/erdogant/clustimage/blob/main/docs/figs/digits_unique.png" width="300" />
</p>
##### Scatter samples based on the embedded space.
```python
# The scatterplot is colored by cluster label. The cluster labels should match the unique labels.
# Cluster 1 contains digit 4
# Cluster 5 contains digit 2
# etc.
#
# Scatterplot without images
cl.scatter(zoom=None)
# Scatterplot with images
cl.scatter(zoom=4)
cl.scatter(zoom=8, plt_all=True, figsize=(150,100))
```
<p align="center">
<img src="https://github.com/erdogant/clustimage/blob/main/docs/figs/digits_fig2_tsne.png" width="400" />
<img src="https://github.com/erdogant/clustimage/blob/main/docs/figs/digits_fig21_tsne.png" width="400" />
</p>
<p align="center">
<img src="https://github.com/erdogant/clustimage/blob/main/docs/figs/scatter_mnist_all.png" width="400" />
</p>
#### Plot the clustered images
```python
# Plot all images per cluster
cl.plot(cmap='binary')
# Plot only the images in specific clusters
cl.plot(cmap='binary', labels=[1,5])
```
<p align="center">
<img src="https://github.com/erdogant/clustimage/blob/main/docs/figs/digits_cluster1.png" width="400" />
<img src="https://github.com/erdogant/clustimage/blob/main/docs/figs/digits_cluster5.png" width="400" />
</p>
#### Dendrogram
```python
# The dendrogram is based on the high-dimensional feature space.
cl.dendrogram()
```
<p align="center">
<img src="https://github.com/erdogant/clustimage/blob/main/docs/figs/digits_dendrogram.png" width="400" />
</p>
#### Make various other plots
```python
# Plot the explained variance
cl.pca.plot()
# Make scatter plot of PC1 vs PC2
cl.pca.scatter(legend=False)
```