# Contextual Encoders
[![code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![license](https://img.shields.io/badge/license-Apache%202.0-green.svg)](https://opensource.org/licenses/Apache-2.0)
![Python: >= 3.7](https://img.shields.io/badge/python-^3.7-blue)
[![Documentation Status](https://readthedocs.org/projects/contextual-encoders/badge/?version=latest)](https://contextual-encoders.readthedocs.io/en/latest/?badge=latest)
[![Python Tests](https://github.com/StuttgarterDotNet/contextual-encoders/actions/workflows/python.yml/badge.svg?branch=main)](https://github.com/StuttgarterDotNet/contextual-encoders/actions/workflows/python.yml)
Contextual Encoders is a library of [scikit-learn](https://scikit-learn.org/stable) compatible contextual variable encoders.
The documentation can be found here: [ReadTheDocs](https://contextual-encoders.readthedocs.io).
This package uses Poetry ([documentation](https://python-poetry.org/docs/)).
## What are contextual variables?
Contextual variables are numerical or categorical variables that are subject to a certain context or relationship.
An example is the days of the week, which have a hidden graph structure:
<p align="center">
<img src="https://raw.githubusercontent.com/StuttgarterDotNet/contextual-encoders/main/docs/_static/weekdays.svg" alt="">
</p>
When such categorical variables are encoded with a simple strategy such as <em>One-Hot-Encoding</em>, this hidden structure is neglected.
However, when the context can be specified, this additional information can be incorporated into the learning procedure to improve the performance of the learning model.
This is where Contextual Encoders come into play.
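To illustrate the problem, the following sketch (plain scikit-learn, not part of this library) shows that One-Hot-Encoding places every pair of distinct weekdays at exactly the same distance, so the neighbourhood structure of the graph above is lost:
```python
# Minimal sketch using scikit-learn only: one-hot encoding treats every
# pair of distinct weekdays as equally dissimilar.
import numpy as np
from sklearn.preprocessing import OneHotEncoder

days = np.array([["Mon"], ["Tue"], ["Sat"]])
one_hot = OneHotEncoder().fit_transform(days).toarray()

# "Mon" is as far from its graph neighbour "Tue" as it is from "Sat".
print(np.linalg.norm(one_hot[0] - one_hot[1]))  # Mon vs. Tue -> sqrt(2)
print(np.linalg.norm(one_hot[0] - one_hot[2]))  # Mon vs. Sat -> sqrt(2)
```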
## Principle
Encoding contextual variables is split into four steps:
1) Define the context
2) Define the measure
3) Calculate the (dis-) similarity matrix
4) Map the dissimilarity matrix to Euclidean vectors
Step 4 is optional and depends on the ML technique that uses the encoding.
For example, [Agglomerative Clustering](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html)
techniques do not require Euclidean vectors; they can use a dissimilarity matrix directly.
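The following sketch (plain scikit-learn, using a hand-made toy dissimilarity matrix rather than output of this library) shows how step 4 can be skipped by passing a precomputed dissimilarity matrix to Agglomerative Clustering:
```python
# Minimal sketch: clustering on a precomputed dissimilarity matrix,
# skipping the mapping to Euclidean vectors (step 4).
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hand-made toy dissimilarity matrix for four items (symmetric, zero diagonal).
dissimilarity = np.array([
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
])

# metric="precomputed" tells scikit-learn to treat the input as distances
# (older scikit-learn versions use the `affinity` parameter instead).
clustering = AgglomerativeClustering(n_clusters=2, metric="precomputed", linkage="average")
labels = clustering.fit_predict(dissimilarity)
print(labels)  # e.g. [0 0 1 1]
```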
## Basic Usage
The code below demonstrates the basic usage of the library.
Here, a simple dataset with 10 samples of a single categorical feature is used.
```python
from contextual_encoders import ContextualEncoder, GraphContext, PathLengthMeasure
import numpy as np
# Create a sample dataset
x = np.array(["Fri", "Tue", "Fri", "Sat", "Mon", "Tue", "Wed", "Tue", "Fri", "Fri"])
# Step 1: Define the context
day = GraphContext("day")
day.add_concept("Mon", "Tue")
day.add_concept("Tue", "Wed")
day.add_concept("Wed", "Thur")
day.add_concept("Thur", "Fri")
day.add_concept("Fri", "Sat")
day.add_concept("Sat", "Sun")
day.add_concept("Sun", "Mon")
# Step 2: Define the measure
day_measure = PathLengthMeasure(day)
# Step 3+4: Calculate (Dis-) similarity Matrix
# and map to euclidean vectors
encoder = ContextualEncoder(day_measure)
encoded_data = encoder.fit_transform(x)
similarity_matrix = encoder.get_similarity_matrix()
dissimilarity_matrix = encoder.get_dissimilarity_matrix()
```
The output of the code is visualized below.
The graph-based structure can be clearly seen when the Euclidean data points are plotted.
Note that only five points can be seen, because the days "Thur" and "Sun" do not occur in the dataset.
Similarity Matrix | Dissimilarity Matrix | Euclidean Data Points
:-------------------------:|:-------------------------:|:-------------------------:
![](https://github.com/StuttgarterDotNet/contextual-encoders/blob/main/docs/_static/readme_example_similarity_matrix.png?raw=true) | ![](https://github.com/StuttgarterDotNet/contextual-encoders/blob/main/docs/_static/readme_example_dissimilarity_matrix.png?raw=true) | ![](https://github.com/StuttgarterDotNet/contextual-encoders/blob/main/docs/_static/readme_example_euclidean_data_points.png?raw=true)
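A scatter plot like the one on the right can be reproduced with a few lines of matplotlib; the sketch below assumes that `encoded_data` from the example above contains two-dimensional points:
```python
# Sketch for plotting the Euclidean data points from the example above
# (assumes encoded_data is an array with two columns).
import matplotlib.pyplot as plt

plt.scatter(encoded_data[:, 0], encoded_data[:, 1])
for label, point in zip(x, encoded_data):
    plt.annotate(label, point)  # label each point with its weekday
plt.show()
```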
More complicated examples can be found in the [documentation](https://contextual-encoders.readthedocs.io/en/latest/examples.html).
## Notice
The [Preprocessing](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing) module from scikit-learn offers multiple encoders for categorical variables.
These encoders use simple techniques to encode categorical variables into numerical variables.
Additionally, the [Category Encoders](http://contrib.scikit-learn.org/category_encoders) package offers more sophisticated encoders for the same purpose.
This package is meant to be used as an extension to the previous two packages in cases where the context of a numerical or categorical variable can be specified.
This project is currently under development.