[![CircleCI](https://circleci.com/gh/tomgrek/zincbase.svg?style=svg)](https://circleci.com/gh/tomgrek/zincbase)
[![DOI](https://zenodo.org/badge/183831265.svg)](https://zenodo.org/badge/latestdoi/183831265)
[![Documentation Status](https://readthedocs.org/projects/zincbase/badge/?version=latest)](https://zincbase.readthedocs.io/en/latest/?badge=latest)
<img src="https://user-images.githubusercontent.com/2245347/57199440-c45daf00-6f33-11e9-91df-1a6a9cae6fb7.png" width="140" alt="Zincbase logo">
Zincbase is a batteries-included kit for building knowledge bases. It exists to do the following:
* Extract facts (aka triples and rules) from unstructured data/text
* Store and retrieve those facts efficiently
* Build them into a graph
* Provide ways to query the graph, including via bleeding-edge graph neural networks.
Zincbase exists to answer questions like "what is the probability that Tom likes LARPing", or "who likes LARPing", or "classify people into LARPers vs normies":
<img src="https://user-images.githubusercontent.com/2245347/57595488-2dc45b80-74fa-11e9-80f4-dc5c7a5b22de.png" width="320" alt="Example graph for reasoning">
It combines the latest in neural networks with symbolic logic (think expert systems and prolog) and graph search.
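Queries use Prolog-style syntax. In the Quickstart, `Food` is capitalized and comes back as a key in each answer, so it acts as a variable to be filled in. The toy matcher below is a minimal sketch of that idea in plain Python. It assumes the usual Prolog convention that capitalized arguments are variables. It is not Zincbase's implementation, and it ignores rules, multi-goal backtracking and nested terms.

```python
# Toy sketch, NOT Zincbase's implementation: match a Prolog-style goal such as
# eats(tom, Food) against stored facts. Assumes the Prolog convention that
# arguments starting with an uppercase letter are variables.


def parse(term):
    """Split 'pred(a, b)' into ('pred', ['a', 'b'])."""
    pred, _, rest = term.partition('(')
    args = [a.strip() for a in rest.rstrip(')').split(',')]
    return pred.strip(), args


def is_var(arg):
    return arg[:1].isupper()


def query(facts, goal):
    """Yield a dict of variable bindings for every fact that matches goal."""
    g_pred, g_args = parse(goal)
    for fact in facts:
        f_pred, f_args = parse(fact)
        if f_pred != g_pred or len(f_args) != len(g_args):
            continue
        bindings = {}
        for g, f in zip(g_args, f_args):
            if is_var(g):
                # A variable used twice must bind to the same value both times.
                if bindings.setdefault(g, f) != f:
                    break
            elif g != f:
                break  # a constant in the goal does not match the fact
        else:
            yield bindings


facts = ['eats(tom, rice)', 'eats(ana, bread)', 'likes(tom, larping)']
print([b['Food'] for b in query(facts, 'eats(tom, Food)')])    # ['rice']
print([b['Who'] for b in query(facts, 'likes(Who, larping)')])  # ['tom']
```

Answering "who likes LARPing" is the same operation with the variable in a different position. Real engines also chain rules, which this sketch leaves out.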
View full documentation [here](https://zincbase.readthedocs.io).
## Quickstart
```
from zincbase import KB

kb = KB()
kb.store('eats(tom, rice)')
for ans in kb.query('eats(tom, Food)'):
    print(ans['Food'])  # prints 'rice'

# The included assets/countries_s1_train.csv contains triples like:
# (namibia, locatedin, africa)
# (lithuania, neighbor, poland)
# This file is only available if you git clone the repo, not if you pip install.
kb = KB()
kb.from_csv('./assets/countries_s1_train.csv')
kb.build_kg_model(cuda=False, embedding_size=40)
kb.train_kg_model(steps=2000, batch_size=1, verbose=False)
print(kb.estimate_triple_prob('fiji', 'locatedin', 'melanesia'))  # e.g. 0.8467; may vary between training runs
```
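The Validation section benchmarks Zincbase against the RotatE paper (listed in the references), so the knowledge-graph model is presumably RotatE-like. The sketch below gives a rough mental model of what `estimate_triple_prob` computes. In RotatE, each relation is an element-wise rotation in complex space, and a triple is plausible when the rotated head lands near the tail. The toy embeddings, the margin `gamma` and the sigmoid mapping from distance to probability are assumptions for illustration. Zincbase's actual scoring may differ.

```python
import cmath
import math

# Illustrative RotatE-style scoring, NOT Zincbase's code. Embeddings, gamma and
# the sigmoid mapping are assumptions chosen for the example.


def rotate_distance(head, phases, tail):
    """Sum over dimensions of |h_i * e^(i*theta_i) - t_i|."""
    return sum(abs(h * cmath.exp(1j * theta) - t)
               for h, theta, t in zip(head, phases, tail))


def triple_prob(head, phases, tail, gamma=6.0):
    """Squash (gamma - distance) into (0, 1) with a logistic sigmoid.
    gamma is a hypothetical margin hyperparameter."""
    return 1.0 / (1.0 + math.exp(-(gamma - rotate_distance(head, phases, tail))))


# Toy 2-dimensional embeddings (made up, not trained).
fiji = [1 + 0j, 0 + 1j]
locatedin = [math.pi / 2, math.pi]   # rotation angle per dimension
melanesia = [0 + 1j, 0 - 1j]         # exactly fiji rotated by locatedin
africa = [-1 + 0j, 1 + 0j]

print(round(rotate_distance(fiji, locatedin, melanesia), 6))  # 0.0
print(triple_prob(fiji, locatedin, melanesia) > triple_prob(fiji, locatedin, africa))  # True
```

Training adjusts the embeddings and rotation angles so that known triples get small distances and corrupted (negative) triples get large ones.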
# Requirements
* Python 3
* Libraries from requirements.txt
* GPU preferable for large graphs but not required
# Installation
`pip install zincbase`
This won't get you the examples or the assets (except those that are automatically
downloaded as needed, such as the NER model). Advanced users may instead wish to clone the
repository and install its requirements:
```
git clone https://github.com/tomgrek/zincbase.git
cd zincbase
pip install -r requirements.txt
```
_Note:_ The PyTorch requirement may differ depending on your system. On macOS
you might need to `brew install libomp` first.
# Testing
```
python -m doctest zincbase/zincbase.py
python test/test_main.py
python test/test_graph.py
python test/test_lists.py
python test/test_nn_basic.py
python test/test_nn.py
python test/test_neg_examples.py
python test/test_truthiness.py
```
# Validation
"Countries" and "FB15k" datasets are included in this repo.
There is a script that checks whether Zincbase performs at least as well on the
Countries dataset as the original (2019) RotatE paper. From the repo's
root directory:
```
python examples/eval_countries_s3.py
```
It runs the hardest Countries task (S3) and prints the ROC AUC, which should be
about 0.95 to match the paper. It takes about 30 minutes to run on a modern GPU.
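The sketch below shows what the reported ROC AUC measures, using the rank-based (Mann-Whitney) definition. It is the fraction of (true triple, false triple) pairs in which the true triple gets the higher score, with ties counted as half. This is not the repo's own AUC code, and the scores are made up.

```python
# Rank-based ROC AUC, for illustration only (not the repo's implementation).


def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative.
    Ties count as 0.5. O(P * N), which is fine for small examples."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model probabilities for test triples (1 = true triple).
scores = [0.9, 0.8, 0.35, 0.4, 0.1]
labels = [1, 1, 1, 0, 0]
print(roc_auc(scores, labels))  # 0.8333... (5 of 6 pairs ranked correctly)
```

An AUC of 1.0 means every true triple outscores every false one. An AUC of 0.5 is chance level.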
There is also a script to evaluate performance on FB15k: `python examples/fb15k_mrr.py`.
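FB15k results are commonly reported as mean reciprocal rank (MRR). For each test triple, rank the correct entity among all candidates by model score, then average 1/rank. The minimal sketch below uses hypothetical candidate scores and ranks ties pessimistically. It also omits the "filtered" setting common in the literature, which drops other known-true candidates before ranking. The repo's own script may handle these details differently.

```python
# Illustrative MRR computation; scores are hypothetical, filtering is omitted.


def rank_of(answer, candidate_scores):
    """1-based rank of answer when candidates are sorted by descending score.
    Ties are broken pessimistically: equal scores rank above the answer."""
    answer_score = candidate_scores[answer]
    return 1 + sum(1 for entity, score in candidate_scores.items()
                   if entity != answer and score >= answer_score)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


# (correct answer, model scores for every candidate) per test query.
test_queries = [
    ("rice", {"rice": 0.9, "bread": 0.4, "tofu": 0.2}),                 # rank 1
    ("bread", {"rice": 0.8, "bread": 0.6, "tofu": 0.1}),                # rank 2
    ("tofu", {"rice": 0.7, "bread": 0.5, "tofu": 0.5, "kale": 0.9}),    # tie: rank 4
]
ranks = [rank_of(answer, scores) for answer, scores in test_queries]
print(ranks)                        # [1, 2, 4]
print(mean_reciprocal_rank(ranks))  # (1 + 1/2 + 1/4) / 3, about 0.583
```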
## Building documentation
From the docs/ directory: `make html`. If the module structure has changed significantly, first regenerate the API stubs with `sphinx-apidoc -o . ..`
## Building and uploading the PyPI package
From the repo's root dir:
```
python setup.py sdist
twine upload dist/*
```
# TODO
* Add documentation
* to_csv method
* utilize postgres as backend triple store
* The to_csv/from_csv methods do not yet support node attributes.
* Add relation extraction from arbitrary unstructured text
* Add context to triples, interpreted by BERT/ULM/GPT-2 or similar and encoded
as an embedding that is concatenated to the KG embedding.
* Reinforcement learning for graph traversal.
# References & Acknowledgements
* [Theo Trouillon. Complex-Valued Embedding Models for Knowledge Graphs. Machine Learning [cs.LG]. Université Grenoble Alpes, 2017. English. NNT: 2017GREAM048](https://tel.archives-ouvertes.fr/tel-01692327/file/TROUILLON_2017_archivage.pdf)
* [L334: Computational Syntax and Semantics -- Introduction to Prolog, Steve Harlow](http://www-users.york.ac.uk/~sjh1/courses/L334css/complete/complete2li1.html)
* [Open Book Project: Prolog in Python, Chris Meyers](http://www.openbookproject.net/py4fun/prolog/intro.html)
* [Prolog Interpreter in Javascript](https://curiosity-driven.org/prolog-interpreter)
* [RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space, Zhiqing Sun and Zhi-Hong Deng and Jian-Yun Nie and Jian Tang, International Conference on Learning Representations, 2019](https://openreview.net/forum?id=HkgEQnRqYQ)
# Citing
If you use this software, please consider citing:
```
@software{zincbase,
author = {{Tom Grek}},
title = {ZincBase: A state of the art knowledge base},
url = {https://github.com/tomgrek/zincbase},
version = {0.1.1},
date = {2019-05-12}
}
```
# Contributing
See CONTRIBUTING. And please do!