<div align="center"><img src="https://gitlab.com/Plasticity/magnitude/raw/master/images/magnitude.png" alt="magnitude" height="50"></div>
## <div align="center">Magnitude: a fast, simple vector embedding utility library<br /><br />[![pipeline status](https://gitlab.com/Plasticity/magnitude/badges/master/pipeline.svg)](https://gitlab.com/Plasticity/magnitude/commits/master) [![Build Status](https://travis-ci.org/plasticityai/magnitude.svg?branch=master)](https://travis-ci.org/plasticityai/magnitude) [![Build status](https://ci.appveyor.com/api/projects/status/72lwh2g7a9ddbnt2/branch/master?svg=true)](https://ci.appveyor.com/project/plasticity-admin/magnitude/branch/master)<br/>[![PyPI version](https://badge.fury.io/py/pymagnitude.svg)](https://pypi.python.org/pypi/pymagnitude/) [![license](https://img.shields.io/github/license/mashape/apistatus.svg?maxAge=2592000)](https://gitlab.com/Plasticity/magnitude/blob/master/LICENSE.txt)<br />[![Python version](https://img.shields.io/pypi/pyversions/pymagnitude.svg)](https://pypi.python.org/pypi/pymagnitude/) [![DOI](https://zenodo.org/badge/122715432.svg)](https://zenodo.org/badge/latestdoi/122715432)</div>
Magnitude is a feature-packed Python package and vector storage file format, developed by [Plasticity](https://www.plasticity.ai/), for using vector embeddings in machine learning models in a fast, efficient, and simple manner. It is primarily intended to be a simpler and faster alternative to [Gensim](https://radimrehurek.com/gensim/), but it can also be used as a generic key-vector store for domains outside NLP.
## Table of Contents
- [Installation](#installation)
- [Motivation](#motivation)
- [Benchmarks and Features](#benchmarks-and-features)
- [Pre-converted Magnitude Formats of Popular Embeddings Models](#pre-converted-magnitude-formats-of-popular-embeddings-models)
- [Using the Library](#using-the-library)
    * [Constructing a Magnitude Object](#constructing-a-magnitude-object)
    * [Querying](#querying)
    * [Basic Out-of-Vocabulary Keys](#basic-out-of-vocabulary-keys)
    * [Advanced Out-of-Vocabulary Keys](#advanced-out-of-vocabulary-keys)
        + [Handling Misspellings and Typos](#handling-misspellings-and-typos)
    * [Concatenation of Multiple Models](#concatenation-of-multiple-models)
    * [Additional Featurization (Parts of Speech, etc.)](#additional-featurization-parts-of-speech-etc)
    * [Using Magnitude with a ML library](#using-magnitude-with-a-ml-library)
        + [Keras](#keras)
        + [PyTorch](#pytorch)
        + [TFLearn](#tflearn)
    * [Utils](#utils)
- [Concurrency and Parallelism](#concurrency-and-parallelism)
- [File Format and Converter](#file-format-and-converter)
- [Other Documentation](#other-documentation)
- [Other Languages](#other-languages)
- [Other Programming Languages](#other-programming-languages)
- [Other Domains](#other-domains)
- [Contributing](#contributing)
- [Other Notable Projects](#other-notable-projects)
- [Citing this Repository](#citing-this-repository)
- [LICENSE and Attribution](#license-and-attribution)
## Installation
You can install this package with `pip`:
```bash
pip install pymagnitude # Python 2.7
pip3 install pymagnitude # Python 3
```
## Motivation
Vector space embedding models have become increasingly common in machine learning and traditionally have been popular for natural language processing applications. A fast, lightweight tool to consume these large vector space embedding models efficiently is lacking.
The Magnitude file format (`.magnitude`) for vector embeddings is intended to be a more efficient, universal vector embedding format. It allows for lazy-loading for faster cold starts in development, LRU memory caching for performance in production, multiple key queries, direct featurization to the inputs for a neural network, and performant similarity calculations. It also offers nice-to-have features for edge cases, such as handling out-of-vocabulary or misspelled keys and concatenating multiple vector models together. It is also intended to work with large vector models that may not fit in memory.
It uses [SQLite](http://www.sqlite.org), a fast, popular embedded database, as its underlying data store. It uses indexes for fast key lookups, and it uses memory mapping, SIMD instructions, and spatial indexing for fast similarity search in the vector space off-disk, with good memory performance even between multiple processes. Moreover, memory maps are cached between runs, so the speed improvements persist even after a process is closed.
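
To make the idea of an indexed SQLite store with an in-memory LRU cache more concrete, here is a minimal sketch using only the Python standard library. It is not Magnitude's actual schema or implementation: the table name `vectors`, the `ToyStore` class, and the `build_store` helper are hypothetical names chosen for illustration, and the vectors are toy 3-dimensional values.

```python
import sqlite3
import struct
from functools import lru_cache

DIM = 3  # toy dimensionality, assumed for this sketch only


def build_store(path=":memory:"):
    """Create a toy key-vector table (hypothetical schema, not Magnitude's)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE vectors (key TEXT, vec BLOB)")
    # An index on the key column keeps single-key lookups fast as the table grows.
    conn.execute("CREATE INDEX idx_vectors_key ON vectors (key)")
    rows = {
        "cat": (0.1, 0.2, 0.3),
        "dog": (0.2, 0.1, 0.4),
    }
    # Each vector is stored as a packed blob of 32-bit floats.
    conn.executemany(
        "INSERT INTO vectors VALUES (?, ?)",
        [(key, struct.pack(f"{DIM}f", *vec)) for key, vec in rows.items()],
    )
    conn.commit()
    return conn


class ToyStore:
    """Looks keys up on disk and caches recent results in memory."""

    def __init__(self, conn, cache_size=1000):
        self._conn = conn
        # Wrapping the disk lookup in an LRU cache lets repeated ("warm")
        # queries for the same key skip SQLite entirely.
        self.query = lru_cache(maxsize=cache_size)(self._lookup)

    def _lookup(self, key):
        row = self._conn.execute(
            "SELECT vec FROM vectors WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None  # unknown key; this sketch does no out-of-vocabulary handling
        return struct.unpack(f"{DIM}f", row[0])
```

Calling `ToyStore(build_store()).query("cat")` hits the database the first time and the cache afterwards, which is the same cold versus warm distinction the benchmarks below measure.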
## Benchmarks and Features
| **Metric** | **Magnitude Light** | **Magnitude Medium** | **Magnitude Heavy** |
| ----------------------------------------------------------------------------------------------------------------------------------------------------- | :-------------------: | :------------------: | :-----------------: |
| Initial load time | **0.7210s** | ━ <sup>1</sup> | ━ <sup>1</sup> |
| Cold single key query | **0.0001s** | ━ <sup>1</sup> | ━ <sup>1</sup> |
| Warm single key query <br /><sup>*(same key as cold query)*</sup> | **0.00004s** | ━ <sup>1</sup> | ━ <sup>1</sup> |
| Cold multiple key query <br /><sup>*(n=25)*</sup> | **0.0442s** | ━ <sup>1</sup> | ━ <sup>1</sup> |
| Warm multiple key query <br /><sup>*(n=25) (same keys as cold query)*</sup> | **0.00004s** | ━ <sup>1</sup> | ━ <sup>1</sup> |
| First `most_similar` search query <br /><sup>*(n=10) (worst case)*</sup> | 247.05s | ━ <sup>1</sup> | ━ <sup>1</sup> |
| First `most_similar` search query <br /><sup>*(n=10) (average case) (w/ disk persistent cache)*</sup> | **1.8217s** | ━ <sup>1</sup> | ━ <sup>1</sup> |
| Subsequent `most_similar` search <br /><sup>*(n=10) (different key than first query)*</sup> | **0.2434s** | ━ <sup>1</sup> | ━ <sup>1</sup> |
| Warm subsequent `most_similar` search <br /><sup>*(n=10) (same key as first query)*</sup> | **0.00004s** | **0.00004s** | **0.00004s** |
| First `most_similar_approx` search query <br /><sup>*(n=10, effort=1.0) (worst case)*</sup> | N/A | N/A | **29.610s** |
| First `most_similar_approx` search query <br /><sup>*(n=10, effort=1.0) (average case) (w/ disk persistent cache)*</sup> | N/A | N/A | **0.9155s** |
| Subsequent `most_similar_approx` search <br /><sup>*(n=10, effort=1.0) (different key than first query)*</sup> | N/A | N/A | **0.1873s** |
| Subsequent `most_similar_approx` search <br /><sup>*(n=10, effort=0.1) (different key than first query)*</sup> | N/A | N/A | **0.0199