# SDM-RDFizer
This project presents the SDM-RDFizer, an interpreter of mapping rules that transforms (un)structured data into RDF knowledge graphs. The current version of the SDM-RDFizer assumes that mapping rules are defined in the [RDF Mapping Language (RML) by Dimou et al.](https://rml.io/specs/rml/). The SDM-RDFizer implements optimized data structures and relational algebra operators that enable efficient execution of RML triples maps, even on large data volumes. It processes data from heterogeneous data sources (CSV, JSON, RDB, XML) and executes each set of RML rules (TriplesMap) in a thread-safe procedure.
![SDM-RDFizer workflow](https://raw.githubusercontent.com/SDM-TIB/SDM-RDFizer/beta/images/architecture.png "SDM-RDFizer workflow")
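To make the idea of rule-based transformation concrete, here is a minimal Python sketch of what a single mapping rule does: it applies a subject template and a predicate-to-column pairing to every row of a CSV source and emits N-Triples, using a set to skip duplicate triples. This illustrates the general technique only and is not the SDM-RDFizer implementation; the data, the `example.org` IRIs, and the names `SUBJECT_TEMPLATE`, `PREDICATE_OBJECT_MAPS`, and `generate_triples` are all hypothetical.

```python
import csv
import io

# Hypothetical input: a tiny CSV source in which one row is duplicated.
CSV_DATA = """id,name
1,Alice
2,Bob
1,Alice
"""

# Hypothetical rule, loosely modelled on an RML TriplesMap:
# a subject template plus (predicate, source column) pairs.
SUBJECT_TEMPLATE = "http://example.org/person/{id}"
PREDICATE_OBJECT_MAPS = [("http://xmlns.com/foaf/0.1/name", "name")]


def generate_triples(csv_text):
    """Return N-Triples lines for each row, skipping duplicate triples."""
    seen = set()  # hash-based duplicate check
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = SUBJECT_TEMPLATE.format(**row)
        for predicate, column in PREDICATE_OBJECT_MAPS:
            triple = f'<{subject}> <{predicate}> "{row[column]}" .'
            if triple not in seen:
                seen.add(triple)
                triples.append(triple)
    return triples


if __name__ == "__main__":
    for line in generate_triples(CSV_DATA):
        print(line)
```

In this sketch the third CSV row repeats the first, so only two triples are produced. A real RML engine reads the rules from a mapping file rather than from constants in the code.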
# The new features presented by SDM-RDFizer version 4.0
Version 4.0 of SDM-RDFizer addresses the efficiency of KG creation in terms of memory storage. It includes a new module called "TriplesMap Planning" (TMP), which defines an optimized evaluation plan for the execution of triples maps. Additionally, version 4.0 extends the previously included module, "TriplesMap Execution" (TME), with a new operator for compressing the data stored in its data structures. These new features are configured through two new parameters in the configuration file, named "large_file" and "ordered".
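As a rough illustration of where the two parameters might sit, the following Python sketch parses an INI-style configuration with the standard `configparser` module. Only the parameter names "large_file" and "ordered" come from the text above; the section names, the other options, the paths, and the example values are assumptions, so consult the project's documentation for the actual layout and accepted values.

```python
import configparser

# Assumed layout: every section and option name other than "large_file"
# and "ordered" is illustrative, and the values are placeholders.
EXAMPLE_CONFIG = """
[default]
main_directory = /path/to/data

[datasets]
number_of_datasets = 1
output_folder = ${default:main_directory}/graph
ordered = yes
large_file = false
"""

# ExtendedInterpolation resolves ${section:option} references.
parser = configparser.ConfigParser(
    interpolation=configparser.ExtendedInterpolation()
)
parser.read_string(EXAMPLE_CONFIG)

settings = parser["datasets"]
ordered = settings.getboolean("ordered")        # "yes" parses as True
large_file = settings.getboolean("large_file")  # "false" parses as False
output_folder = settings["output_folder"]

print(ordered, large_file, output_folder)
```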
We have performed an extensive empirical evaluation of SDM-RDFizer version 4.0 in terms of execution time and memory usage. The experiments compare the impact of data duplicate rates, data size, and the complexity and execution order of the triples maps on two versions of SDM-RDFizer (version 4.0 and version 3.6) and on other existing engines, including [RMLMapper v4.7](https://github.com/RMLio/rmlmapper-java) and [RocketRML](https://github.com/semantifyit/RocketRML). The experiments are performed on two different benchmarks:
- From [SDM-Genomic-datasets](https://figshare.com/articles/dataset/SDM-Genomic-Datasets/14838342/1), datasets including 10k, 100k, and 1M records with 25% and 75% duplicate rates, over six mapping rules with different complexities (1/4 simple object map, 2/5 object reference maps, 2/5 object join maps).
- From [GTFS-Madrid](https://github.com/oeg-upm/gtfs-bench), datasets with scale values of 1-csv, 5-csv, 10-csv, and 50-csv, over two different mapping rules (72 simple object maps and 11 object join maps).
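Of the mapping categories listed above, object join maps tend to be the most expensive, because the object of each triple comes from a second source matched on a join condition. The Python sketch below shows one common way to evaluate such a condition, a hash join that indexes the parent source once and probes it for every child row. It is a generic illustration, not SDM-RDFizer's actual operator; the records, the `example.org` IRIs, and the `hash_join` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical child and parent sources sharing a "sample" column.
genes = [
    {"gene_id": "g1", "sample": "s1"},
    {"gene_id": "g2", "sample": "s2"},
    {"gene_id": "g3", "sample": "s1"},
]
samples = [
    {"sample": "s1", "tissue": "lung"},
    {"sample": "s2", "tissue": "liver"},
]


def hash_join(child_rows, parent_rows, child_key, parent_key):
    """Yield (child, parent) pairs whose join keys are equal."""
    index = defaultdict(list)
    for parent in parent_rows:  # build phase: index the parent source
        index[parent[parent_key]].append(parent)
    for child in child_rows:    # probe phase: one lookup per child row
        for parent in index.get(child[child_key], []):
            yield child, parent


# Each matching pair becomes one triple linking the two resources.
triples = [
    f"<http://example.org/gene/{child['gene_id']}> "
    f"<http://example.org/fromSample> "
    f"<http://example.org/sample/{parent['sample']}> ."
    for child, parent in hash_join(genes, samples, "sample", "sample")
]

for line in triples:
    print(line)
```

Building the index costs one pass over the parent source, after which each child row is matched with a single dictionary lookup instead of a scan over all parent rows.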
The results of these experiments are summarized below:
![Overview of Results (Execution Time Comparison)](https://raw.githubusercontent.com/SDM-TIB/SDM-RDFizer/beta/images/time.png "Execution Time Comparison")
As observed in the figures above, both versions of SDM-RDFizer completed all the testbeds successfully, while the other two engines timed out in some cases. SDM-RDFizer version 3.6 and RocketRML version 1.7.0 are competitive in simple testbeds; however, SDM-RDFizer version 4.0 shows the best performance in all the testbeds.
![Overview of Results (Memory Consumption Comparison)](https://raw.githubusercontent.com/SDM-TIB/SDM-RDFizer/beta/images/memory.png "Memory Consumption Comparison")
As illustrated in the figures above, SDM-RDFizer version 4.0 has a lower peak memory usage than the previous version of SDM-RDFizer.
The results of the execution of SDM-RDFizer have been described in the following research reports:
- Enrique Iglesias, Samaneh Jozashoori, David Chaves-Fraga, Diego Collarana, and Maria-Esther Vidal. 2020. SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. The 29th ACM International Conference on Information and Knowledge Management (CIKM ’20).
- Samaneh Jozashoori, David Chaves-Fraga, Enrique Iglesias, Oscar Corcho, and Maria-Esther Vidal. 2020. FunMap: Efficient Execution of Functional Mappings for Knowledge Graph Creation. The 19th International Semantic Web Conference - Research Track (ISWC 2020).
- Samaneh Jozashoori and Maria-Esther Vidal. MapSDI: A Scaled-up Semantic Data Integration Framework for Knowledge Graph Creation. The 27th International Conference on Cooperative Information Systems (CoopIS 2019).
- David Chaves-Fraga, Kemele M. Endris, Enrique Iglesias, Oscar Corcho, and Maria-Esther Vidal. What are the Parameters that Affect the Construction of a Knowledge Graph?. The 18th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE 2019).
- David Chaves-Fraga, Antón Adolfo, Jhon Toledo, and Oscar Corcho. ONETT: Systematic Knowledge Graph Generation for National Access Points. The 1st International Workshop on Semantics for Transport co-located with SEMANTiCS 2019.
- David Chaves-Fraga, Freddy Priyatna, Andrea Cimmino, Jhon Toledo, Edna Ruckhaus, and Oscar Corcho. GTFS-Madrid-Bench: A benchmark for virtual knowledge graph access in the transport domain. Journal of Web Semantics, 2020.
Additional References:
- Dimou et al. 2014. Dimou, A., Sande, M.V., Colpaert, P., Verborgh, R., Mannens, E., de Walle, R.V.: RML: A generic language for integrated RDF mappings of heterogeneous data. In: Proceedings of the Workshop on Linked Data on the Web co-located with the 23rd International World Wide Web Conference (WWW 2014).
# Projects where the SDM-RDFizer has been used
The SDM-RDFizer is used in the creation of the knowledge graphs of EU H2020 projects and national projects where the Scientific Data Management group participates. These projects include:
- iASiS (http://project-iasis.eu/): big data for precision medicine, based on patient data insights. The iASiS RDF knowledge graph comprises more than 1.2B RDF triples collected from more than 40 heterogeneous sources using over 1300 RML triple maps.
- BigMedilytics (https://www.bigmedilytics.eu/): lung cancer pilot. 800 RML triple maps are used to create the lung cancer knowledge graph from around 25 data sources with 500M RDF triples.
- CLARIFY (https://www.clarify2020.eu/): predict poor health status after specific oncological treatments
- P4-LUCAT (https://www.tib.eu/de/forschung-entwicklung/projektuebersicht/projektsteckbrief/p4-lucat)
- ImProVIT (https://www.tib.eu/de/forschung-entwicklung/projektuebersicht/projektsteckbrief/improvit)
- PLATOON (https://platoon-project.eu/)
- EUvsVirus Hackathon (April 2020) (https://blogs.tib.eu/wp/tib/2020/05/06/how-do-knowledge-graphs-contribute-to-understanding-covid-19-related-treatments/). The team of the Scientific Data Management group used SDM-RDFizer to create the Knowledge4COVID-19 knowledge graph during the hackathon. By June 7th, 2020, this KG comprised 28M RDF triples describing, at a fine-grained level, 63527 COVID-19 scientific publications and COVID-19 related concepts (e.g., 5802 substances, 1.2M drug-drug interactions, and 103 molecular dysfunctions).
The SDM-RDFizer is also used in EU H2020, EIT-Digital and Spanish national projects where the Ontology Engineering Group (Technical University of Madrid) participates. These projects, mainly focused on the transportation and smart cities domain, include:
- H2020 - SPRINT (http://sprint-transport.eu/): performance and scalability to test a semantic architecture for the Interoperability Framework on Transport across Europe.
- EIT-SNAP (https://www.snap-project.eu/): innovation project on the application of semantic technologies for national access points.
- Open Cities (https://ciudades-abiertas.es/): national project on creating common and shared vocabularies for Spanish Cities
- Drugs4Covid (https://drugs4covid.oeg.fi.upm.es/): NLP annotations and metadata from more than 60,000 scientific papers about COVID viruses are integrated in a KG with almost 44M facts (triples). SDM-RDFizer was used for creating this KG.
Other projects where the SDM-RDFizer is also used:
- Virtual Platform for the H2020 European Joint Programme on Rare Disease (http