# OpenKE
An Open-source Framework for Knowledge Embedding.
More information is available on our website: [http://openke.thunlp.org/](http://openke.thunlp.org/).
If you use the code, please cite the following [paper](http://aclweb.org/anthology/D18-2024):
```
@inproceedings{han2018openke,
title={OpenKE: An Open Toolkit for Knowledge Embedding},
author={Han, Xu and Cao, Shulin and Lv, Xin and Lin, Yankai and Liu, Zhiyuan and Sun, Maosong and Li, Juanzi},
booktitle={Proceedings of EMNLP},
year={2018}
}
```
## Overview
This is an efficient TensorFlow-based implementation of knowledge representation learning (KRL). Underlying operations such as data preprocessing and negative sampling are implemented in C++; each specific model is implemented in TensorFlow with Python interfaces, providing a convenient platform for running models on GPUs. OpenKE comprises four repositories:
* **OpenKE**: the main project, based on TensorFlow, which provides an optimized and stable framework for knowledge graph embedding models.
* <a href="https://github.com/thunlp/OpenKE/tree/OpenKE-PyTorch">OpenKE-PyTorch</a>: OpenKE implemented with PyTorch, also providing an optimized and stable framework for knowledge graph embedding models.
* <a href="https://github.com/thunlp/TensorFlow-TransX">TensorFlow-TransX</a>: a light and simple version of OpenKE based on TensorFlow, including TransE, TransH, TransR and TransD.
* <a href="https://github.com/thunlp/Fast-TransX">Fast-TransX</a>: efficient, lightweight C++ inference for TransE and its extended models (TransH, TransR, TransD, TranSparse and PTransE), built on the OpenKE framework.
## Installation
1. Install TensorFlow
2. Clone the OpenKE repository:
```bash
$ git clone https://github.com/thunlp/OpenKE
$ cd OpenKE
```
3. Compile C++ files
```bash
$ bash make.sh
```
## Data
* For training, datasets contain three files:
  * train2id.txt: the training file. The first line is the number of training triples; each following line is a triple in the format ***(e1, e2, rel)***, indicating that relation ***rel*** holds between ***e1*** and ***e2***.
    **Note that train2id.txt contains ids from entity2id.txt and relation2id.txt, not the names of the entities and relations. If you use your own datasets, please check the format of your training file; files in the wrong format may cause a segmentation fault.**
  * entity2id.txt: all entities and their corresponding ids, one per line. The first line is the number of entities.
  * relation2id.txt: all relations and their corresponding ids, one per line. The first line is the number of relations.
* For testing, datasets contain two additional files (five files in total):
  * test2id.txt: the test file. The first line is the number of test triples; each following line is a triple in the format ***(e1, e2, rel)***.
  * valid2id.txt: the validation file. The first line is the number of validation triples; each following line is a triple in the format ***(e1, e2, rel)***.
  * type_constrain.txt: the type-constraint file. The first line is the number of relations; the following lines give the type constraints for each relation. For example, the relation with id 1200 has 4 types of head entities (3123, 1034, 58 and 5733) and 4 types of tail entities (12123, 4388, 11087 and 11088). You can generate this file with **n-n.py** in the folder benchmarks/FB15K.
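Since files in the wrong format may cause a segmentation fault, it can help to sanity-check your own data before training. The following is a minimal sketch of such a check; the in-memory lists below stand in for real file contents, and the helper names are illustrative, not part of OpenKE.

```python
# Sketch: sanity-check the count-header layout described above.
def check_count_header(lines):
    """The first line declares how many records follow."""
    declared = int(lines[0].split()[0])
    return declared == sum(1 for ln in lines[1:] if ln.strip())

def check_train_ids(train_lines, num_entities, num_relations):
    """Each (e1, e2, rel) line must use ids in range, not names."""
    for ln in train_lines[1:]:
        if not ln.strip():
            continue
        e1, e2, rel = map(int, ln.split())
        if not (0 <= e1 < num_entities and 0 <= e2 < num_entities
                and 0 <= rel < num_relations):
            return False
    return True

# Toy train2id.txt with 2 triples over 3 entities and 1 relation.
train2id = ["2", "0 1 0", "1 2 0"]
print(check_count_header(train2id))     # True
print(check_train_ids(train2id, 3, 1))  # True
```

Run the same checks against the corresponding headers of entity2id.txt and relation2id.txt to catch out-of-range ids early.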
## Quick Start
### Training
To compute a knowledge graph embedding, first import the datasets and set the configuration parameters for training, then train the model and export the results. For instance, the script example_train_transe.py trains TransE:
```python
import config
import models
import tensorflow as tf
import numpy as np
con = config.Config()
#Input training files from benchmarks/FB15K/ folder.
con.set_in_path("./benchmarks/FB15K/")
con.set_work_threads(4)
con.set_train_times(500)
con.set_nbatches(100)
con.set_alpha(0.001)
con.set_margin(1.0)
con.set_bern(0)
con.set_dimension(50)
con.set_ent_neg_rate(1)
con.set_rel_neg_rate(0)
con.set_opt_method("SGD")
#Models will be exported via tf.Saver() automatically.
con.set_export_files("./res/model.vec.tf", 0)
#Model parameters will be exported to json files automatically.
con.set_out_files("./res/embedding.vec.json")
#Initialize experimental settings.
con.init()
#Set the knowledge embedding model
con.set_model(models.TransE)
#Train the model.
con.run()
```
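The model trained above, TransE, scores a triple by how far the translated head embedding lands from the tail embedding: f(h, r, t) = ||h + r − t||. A small NumPy sketch with toy vectors (not trained embeddings) illustrates the quantity the margin-based loss drives down for true triples:

```python
import numpy as np

# TransE dissimilarity f(h, r, t) = ||h + r - t|| (L1 norm here).
def transe_score(h, r, t, norm=1):
    return np.linalg.norm(h + r - t, ord=norm)

# Toy embeddings chosen so that t = h + r exactly.
h = np.array([1.0, 2.0])
r = np.array([0.5, 0.5])
t = np.array([1.5, 2.5])
print(transe_score(h, r, t))  # 0.0: a "perfect" triple
```

Lower scores mean more plausible triples; corrupted triples should score higher than true ones by at least the margin set via set_margin.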
#### Step 1: Import datasets
```python
con.set_in_path("benchmarks/FB15K/")
```
We import knowledge graphs from the benchmarks/FB15K/ folder. The data consist of the three essential files mentioned above:
* train2id.txt
* entity2id.txt
* relation2id.txt
Validation and test files are used to evaluate the trained model; they are not required for training itself.
```python
con.set_work_threads(8)
```
We can allocate several threads to sample positive and negative cases.
#### Step 2: Set configuration parameters for training
```python
con.set_train_times(500)
con.set_nbatches(100)
con.set_alpha(0.5)
con.set_dimension(200)
con.set_margin(1)
```
We set the essential parameters, including the number of rounds to traverse the data, the learning rate, the number of batches per round, and the dimensions of entity and relation embeddings.
```python
con.set_bern(0)
con.set_ent_neg_rate(1)
con.set_rel_neg_rate(0)
```
For negative sampling, we corrupt entities and relations to construct negative triples. set\_bern(0) uses the traditional uniform sampling method, while set\_bern(1) uses the method from (Wang et al. 2014), denoted "bern".
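The "bern" trick corrupts the head or tail with unequal probability: for each relation it computes tph (average tails per head) and hpt (average heads per tail), then replaces the head with probability tph / (tph + hpt), which reduces false negatives for 1-to-N and N-to-1 relations. A toy Python sketch of the statistic (OpenKE's actual sampling is done in C++):

```python
from collections import defaultdict

# Sketch of the "bern" statistic (Wang et al. 2014).
def bern_head_prob(triples):
    """triples: (e1, e2, rel) id triples; returns rel -> P(corrupt head)."""
    heads = defaultdict(set)   # (rel, head) -> set of tails
    tails = defaultdict(set)   # (rel, tail) -> set of heads
    for h, t, r in triples:
        heads[(r, h)].add(t)
        tails[(r, t)].add(h)
    prob = {}
    for rel in {r for _, _, r in triples}:
        tph = (sum(len(ts) for (r, _), ts in heads.items() if r == rel)
               / sum(1 for (r, _) in heads if r == rel))
        hpt = (sum(len(hs) for (r, _), hs in tails.items() if r == rel)
               / sum(1 for (r, _) in tails if r == rel))
        prob[rel] = tph / (tph + hpt)
    return prob

# One-to-many relation 0: head 0 links to tails 1, 2, 3.
triples = [(0, 1, 0), (0, 2, 0), (0, 3, 0)]
print(bern_head_prob(triples))  # {0: 0.75}: prefer corrupting the head
```

For this 1-to-N relation, replacing the tail would often produce a triple that is actually true, so the head is corrupted three times out of four.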
```python
con.set_opt_method("SGD")
```
We can select a proper gradient descent optimization algorithm to train models.
#### Step 3: Export results
```python
con.set_export_files("./res/model.vec.tf", 0)
con.set_out_files("./res/embedding.vec.json")
```
Models will be exported via tf.Saver() automatically every few rounds. The model parameters will also be exported to a json file when training finishes.
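The exported json can be read back with standard tools. A minimal sketch, assuming the parameters are stored under keys such as "ent_embeddings" and "rel_embeddings" (the usual OpenKE variable names; verify against your own embedding.vec.json). The inline document below stands in for the real file:

```python
import json
import numpy as np

# Stand-in for json.load(open("./res/embedding.vec.json")).
exported = json.loads('''
{"ent_embeddings": [[0.1, 0.2], [0.3, 0.4]],
 "rel_embeddings": [[0.5, 0.6]]}
''')
# Rows are entities/relations, columns are embedding dimensions.
ent = np.array(exported["ent_embeddings"])
rel = np.array(exported["rel_embeddings"])
print(ent.shape, rel.shape)  # (2, 2) (1, 2)
```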
#### Step 4: Train models
```python
con.init()
con.set_model(models.TransE)
con.run()
```
We set the knowledge graph embedding model and start the training process.
### Testing
#### Link Prediction
Link prediction aims to predict the missing h or t for a relation fact triple (h, r, t). In this task, for each position of missing entity, the system is asked to rank a set of candidate entities from the knowledge graph, instead of giving only one best result. For each test triple (h, r, t), we replace the head/tail entity by all entities in the knowledge graph and rank these entities in descending order of the similarity scores calculated by the score function fr. We use the following evaluation metrics:
* ***MR*** : mean rank of correct entities;
* ***MRR***: the average of the reciprocal ranks of correct entities;
* ***Hit@N*** : proportion of correct entities in top-N ranked entities.
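Given the rank of each correct entity (rank 1 = best), the three metrics can be sketched as follows; the ranks below are toy values, not real evaluation output:

```python
# MR: mean rank of correct entities (lower is better).
def mr(ranks):
    return sum(ranks) / len(ranks)

# MRR: mean reciprocal rank (higher is better).
def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

# Hits@N: fraction of correct entities ranked in the top N.
def hits_at(ranks, n):
    return sum(1 for r in ranks if r <= n) / len(ranks)

ranks = [1, 2, 10, 100]
print(mr(ranks))           # 28.25
print(mrr(ranks))          # ~0.4025
print(hits_at(ranks, 10))  # 0.75
```

MRR is dominated by the best-ranked answers, while MR is sensitive to a few very bad ranks; reporting both gives a fuller picture.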
#### Triple Classification
Triple classification aims to judge whether a given triple (h, r, t) is correct or not. This is a binary classification task. For triple classification, we set a relation-specific threshold δr. For a triple (h, r, t), if the dissimilarity score obtained by fr is below δr, the triple is classified as positive, otherwise negative. δr is optimized by maximizing classification accuracy on the validation set.
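The threshold search above amounts to a one-dimensional sweep over validation scores. A hedged sketch with toy numbers (the helper names are illustrative, not OpenKE's API):

```python
# Score below delta -> classified positive, per the rule above.
def accuracy(scores, labels, delta):
    preds = [s < delta for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Sweep candidate cuts (midpoints between sorted scores, plus one cut
# below and above everything) and keep the most accurate one.
def best_threshold(scores, labels):
    s = sorted(scores)
    cands = ([s[0] - 1.0]
             + [(a + b) / 2 for a, b in zip(s, s[1:])]
             + [s[-1] + 1.0])
    return max(cands, key=lambda d: accuracy(scores, labels, d))

scores = [0.2, 0.5, 1.5, 2.0]        # validation dissimilarity scores
labels = [True, True, False, False]  # True = correct triple
delta = best_threshold(scores, labels)
print(delta, accuracy(scores, labels, delta))  # 1.0 1.0
```

Only midpoints between consecutive scores need to be tried, since accuracy changes only when the cut crosses a score.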
#### Predict Head Entity
Given a tail entity and a relation, predict the top k possible head entities. All objects are represented by their ids.
```python
def predict_head_entity(self, t, r, k):
    r'''This method predicts the top k head entities given the tail entity and relation.
    Args:
        t (int): tail entity id
        r (int): relation id
        k (int): top k head entities
    Returns:
        list: k possible head entity ids
    '''
    self.init_link_prediction()
    if self.importName != None:
        self.restore_tensorflow()
    # Score every candidate head against the fixed relation and tail.
    test_h = np.array(range(self.entTotal))
    test_r = np.array([r] * self.entTotal)
    test_t = np.array([t] * self.entTotal)
    res = self.test_step(test_h, test_t, test_r)
    # Lower dissimilarity scores rank first.
    return np.argsort(res)[:k]
```