# OpenKE
An Open-source Framework for Knowledge Embedding.
More information is available on our website: [http://openke.thunlp.org/](http://openke.thunlp.org/).
If you use the code, please cite the following [paper](http://aclweb.org/anthology/D18-2024):
```
@inproceedings{han2018openke,
title={OpenKE: An Open Toolkit for Knowledge Embedding},
author={Han, Xu and Cao, Shulin and Lv, Xin and Lin, Yankai and Liu, Zhiyuan and Sun, Maosong and Li, Juanzi},
booktitle={Proceedings of EMNLP},
year={2018}
}
```
## Overview
This is an efficient TensorFlow-based implementation of knowledge representation learning (KRL). Underlying operations such as data preprocessing and negative sampling are implemented in C++; each specific model is implemented in TensorFlow with Python interfaces, providing a convenient platform for running models on GPUs. OpenKE comprises four repositories:
* **OpenKE**: the main project, based on TensorFlow, which provides an optimized and stable framework for knowledge graph embedding models.
* <a href="https://github.com/thunlp/OpenKE/tree/OpenKE-PyTorch">OpenKE-PyTorch</a>: OpenKE implemented with PyTorch, also providing an optimized and stable framework for knowledge graph embedding models.
* <a href="https://github.com/thunlp/TensorFlow-TransX">TensorFlow-TransX</a>: a light and simple version of OpenKE based on TensorFlow, including TransE, TransH, TransR and TransD.
* <a href="https://github.com/thunlp/Fast-TransX">Fast-TransX</a>: efficient, lightweight C++ inference for TransE and its extended models (TransH, TransR, TransD, TranSparse and PTransE), built on the OpenKE framework.
## Installation
1. Install TensorFlow
2. Clone the OpenKE repository:
```bash
$ git clone https://github.com/thunlp/OpenKE
$ cd OpenKE
```
3. Compile C++ files
```bash
$ bash make.sh
```
## Data
* For training, datasets contain three files:
  * train2id.txt: the training file. The first line is the number of training triples; each following line is a triple in the format ***(e1, e2, rel)***, indicating that relation ***rel*** holds between ***e1*** and ***e2***.
    **Note that train2id.txt contains ids from entity2id.txt and relation2id.txt, not the names of the entities and relations. If you use your own datasets, please check the format of your training file; files in the wrong format may cause a segmentation fault.**
  * entity2id.txt: all entities and their corresponding ids, one per line. The first line is the number of entities.
  * relation2id.txt: all relations and their corresponding ids, one per line. The first line is the number of relations.
* For testing, datasets contain two additional files (five files in total):
  * test2id.txt: the test file. The first line is the number of test triples; each following line is a triple in the format ***(e1, e2, rel)***.
  * valid2id.txt: the validation file. The first line is the number of validation triples; each following line is a triple in the format ***(e1, e2, rel)***.
  * type_constrain.txt: the type-constraint file. The first line is the number of relations; the following lines give the type constraints for each relation. For example, the relation with id 1200 has 4 types of head entities (3123, 1034, 58 and 5733) and 4 types of tail entities (12123, 4388, 11087 and 11088). You can generate this file with **n-n.py** in the folder benchmarks/FB15K.
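Since files in the wrong format may cause a segmentation fault, it can help to sanity-check your own data before training. The following is a minimal sketch of such a check; the in-memory lists below stand in for real file contents, and the helper names are illustrative, not part of OpenKE.

```python
# Sketch: sanity-check the count-header layout described above.
def check_count_header(lines):
    """The first line declares how many records follow."""
    declared = int(lines[0].split()[0])
    return declared == sum(1 for ln in lines[1:] if ln.strip())

def check_train_ids(train_lines, num_entities, num_relations):
    """Each (e1, e2, rel) line must use ids in range, not names."""
    for ln in train_lines[1:]:
        if not ln.strip():
            continue
        e1, e2, rel = map(int, ln.split())
        if not (0 <= e1 < num_entities and 0 <= e2 < num_entities
                and 0 <= rel < num_relations):
            return False
    return True

# Toy train2id.txt with 2 triples over 3 entities and 1 relation.
train2id = ["2", "0 1 0", "1 2 0"]
print(check_count_header(train2id))     # True
print(check_train_ids(train2id, 3, 1))  # True
```

Run the same checks against the corresponding headers of entity2id.txt and relation2id.txt to catch out-of-range ids early.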
## Quick Start
### Training
To compute a knowledge graph embedding, first import the datasets and set the configuration parameters for training, then train the model and export the results. For instance, the script example_train_transe.py trains TransE:
```python
import config
import models
import tensorflow as tf
import numpy as np
con = config.Config()
#Input training files from benchmarks/FB15K/ folder.
con.set_in_path("./benchmarks/FB15K/")
con.set_work_threads(4)
con.set_train_times(500)
con.set_nbatches(100)
con.set_alpha(0.001)
con.set_margin(1.0)
con.set_bern(0)
con.set_dimension(50)
con.set_ent_neg_rate(1)
con.set_rel_neg_rate(0)
con.set_opt_method("SGD")
#Models will be exported via tf.Saver() automatically.
con.set_export_files("./res/model.vec.tf", 0)
#Model parameters will be exported to json files automatically.
con.set_out_files("./res/embedding.vec.json")
#Initialize experimental settings.
con.init()
#Set the knowledge embedding model
con.set_model(models.TransE)
#Train the model.
con.run()
```
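The model trained above, TransE, scores a triple by how far the translated head embedding lands from the tail embedding: f(h, r, t) = ||h + r − t||. A small NumPy sketch with toy vectors (not trained embeddings) illustrates the quantity the margin-based loss drives down for true triples:

```python
import numpy as np

# TransE dissimilarity f(h, r, t) = ||h + r - t|| (L1 norm here).
def transe_score(h, r, t, norm=1):
    return np.linalg.norm(h + r - t, ord=norm)

# Toy embeddings chosen so that t = h + r exactly.
h = np.array([1.0, 2.0])
r = np.array([0.5, 0.5])
t = np.array([1.5, 2.5])
print(transe_score(h, r, t))  # 0.0: a "perfect" triple
```

Lower scores mean more plausible triples; corrupted triples should score higher than true ones by at least the margin set via set_margin.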
#### Step 1: Import datasets
```python
con.set_in_path("benchmarks/FB15K/")
```
We import knowledge graphs from the benchmarks/FB15K/ folder. The data consist of the three essential files mentioned above:
* train2id.txt
* entity2id.txt
* relation2id.txt
Validation and test files are used to evaluate the trained model; they are not required for training itself.
```python
con.set_work_threads(8)
```
We can allocate several threads to sample positive and negative cases.
#### Step 2: Set configuration parameters for training
```python
con.set_train_times(500)
con.set_nbatches(100)
con.set_alpha(0.5)
con.set_dimension(200)
con.set_margin(1)
```
We set the essential parameters, including the number of rounds to traverse the data, the learning rate, the number of batches per round, and the dimensions of entity and relation embeddings.
```python
con.set_bern(0)
con.set_ent_neg_rate(1)
con.set_rel_neg_rate(0)
```
For negative sampling, we corrupt entities and relations to construct negative triples. set\_bern(0) uses the traditional uniform sampling method, while set\_bern(1) uses the method from (Wang et al. 2014), denoted "bern".
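The "bern" trick corrupts the head or tail with unequal probability: for each relation it computes tph (average tails per head) and hpt (average heads per tail), then replaces the head with probability tph / (tph + hpt), which reduces false negatives for 1-to-N and N-to-1 relations. A toy Python sketch of the statistic (OpenKE's actual sampling is done in C++):

```python
from collections import defaultdict

# Sketch of the "bern" statistic (Wang et al. 2014).
def bern_head_prob(triples):
    """triples: (e1, e2, rel) id triples; returns rel -> P(corrupt head)."""
    heads = defaultdict(set)   # (rel, head) -> set of tails
    tails = defaultdict(set)   # (rel, tail) -> set of heads
    for h, t, r in triples:
        heads[(r, h)].add(t)
        tails[(r, t)].add(h)
    prob = {}
    for rel in {r for _, _, r in triples}:
        tph = (sum(len(ts) for (r, _), ts in heads.items() if r == rel)
               / sum(1 for (r, _) in heads if r == rel))
        hpt = (sum(len(hs) for (r, _), hs in tails.items() if r == rel)
               / sum(1 for (r, _) in tails if r == rel))
        prob[rel] = tph / (tph + hpt)
    return prob

# One-to-many relation 0: head 0 links to tails 1, 2, 3.
triples = [(0, 1, 0), (0, 2, 0), (0, 3, 0)]
print(bern_head_prob(triples))  # {0: 0.75}: prefer corrupting the head
```

For this 1-to-N relation, replacing the tail would often produce a triple that is actually true, so the head is corrupted three times out of four.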
```python
con.set_opt_method("SGD")
```
We can select a proper gradient descent optimization algorithm to train models.
#### Step 3: Export results
```python
con.set_export_files("./res/model.vec.tf", 0)
con.set_out_files("./res/embedding.vec.json")
```
Models will be exported via tf.Saver() automatically every few rounds. The model parameters will also be exported to a json file when training finishes.
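The exported json can be read back with standard tools. A minimal sketch, assuming the parameters are stored under keys such as "ent_embeddings" and "rel_embeddings" (the usual OpenKE variable names; verify against your own embedding.vec.json). The inline document below stands in for the real file:

```python
import json
import numpy as np

# Stand-in for json.load(open("./res/embedding.vec.json")).
exported = json.loads('''
{"ent_embeddings": [[0.1, 0.2], [0.3, 0.4]],
 "rel_embeddings": [[0.5, 0.6]]}
''')
# Rows are entities/relations, columns are embedding dimensions.
ent = np.array(exported["ent_embeddings"])
rel = np.array(exported["rel_embeddings"])
print(ent.shape, rel.shape)  # (2, 2) (1, 2)
```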
#### Step 4: Train models
```python
con.init()
con.set_model(models.TransE)
con.run()
```
We set the knowledge graph embedding model and start the training process.
### Testing
#### Link Prediction
Link prediction aims to predict the missing h or t for a relation fact triple (h, r, t). In this task, for each position of missing entity, the system is asked to rank a set of candidate entities from the knowledge graph, instead of giving only one best result. For each test triple (h, r, t), we replace the head/tail entity by all entities in the knowledge graph and rank these entities in descending order of the similarity scores calculated by the score function fr. We use the following evaluation metrics:
* ***MR*** : mean rank of correct entities;
* ***MRR***: the average of the reciprocal ranks of correct entities;
* ***Hit@N*** : proportion of correct entities in top-N ranked entities.
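Given the rank of each correct entity (rank 1 = best), the three metrics can be sketched as follows; the ranks below are toy values, not real evaluation output:

```python
# MR: mean rank of correct entities (lower is better).
def mr(ranks):
    return sum(ranks) / len(ranks)

# MRR: mean reciprocal rank (higher is better).
def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

# Hits@N: fraction of correct entities ranked in the top N.
def hits_at(ranks, n):
    return sum(1 for r in ranks if r <= n) / len(ranks)

ranks = [1, 2, 10, 100]
print(mr(ranks))           # 28.25
print(mrr(ranks))          # ~0.4025
print(hits_at(ranks, 10))  # 0.75
```

MRR is dominated by the best-ranked answers, while MR is sensitive to a few very bad ranks; reporting both gives a fuller picture.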
#### Triple Classification
Triple classification aims to judge whether a given triple (h, r, t) is correct or not. This is a binary classification task. For triple classification, we set a relation-specific threshold δr. For a triple (h, r, t), if the dissimilarity score obtained by fr is below δr, the triple is classified as positive, otherwise negative. δr is optimized by maximizing classification accuracy on the validation set.
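The threshold search above amounts to a one-dimensional sweep over validation scores. A hedged sketch with toy numbers (the helper names are illustrative, not OpenKE's API):

```python
# Score below delta -> classified positive, per the rule above.
def accuracy(scores, labels, delta):
    preds = [s < delta for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Sweep candidate cuts (midpoints between sorted scores, plus one cut
# below and above everything) and keep the most accurate one.
def best_threshold(scores, labels):
    s = sorted(scores)
    cands = ([s[0] - 1.0]
             + [(a + b) / 2 for a, b in zip(s, s[1:])]
             + [s[-1] + 1.0])
    return max(cands, key=lambda d: accuracy(scores, labels, d))

scores = [0.2, 0.5, 1.5, 2.0]        # validation dissimilarity scores
labels = [True, True, False, False]  # True = correct triple
delta = best_threshold(scores, labels)
print(delta, accuracy(scores, labels, delta))  # 1.0 1.0
```

Only midpoints between consecutive scores need to be tried, since accuracy changes only when the cut crosses a score.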
#### Predict Head Entity
Given a tail entity and a relation, predict the top k possible head entities. All objects are represented by their ids.
```python
def predict_head_entity(self, t, r, k):
    r'''This method predicts the top k head entities given the tail entity and relation.
    Args:
        t (int): tail entity id
        r (int): relation id
        k (int): top k head entities
    Returns:
        list: k possible head entity ids
    '''
    self.init_link_prediction()
    if self.importName != None:
        self.restore_tensorflow()
    # Score every candidate head against the fixed relation and tail.
    test_h = np.array(range(self.entTotal))
    test_r = np.array([r] * self.entTotal)
    test_t = np.array([t] * self.entTotal)
    res = self.test_step(test_h, test_t, test_r)
    # Lower dissimilarity scores rank first.
    return np.argsort(res)[:k]
```