[![Build Status](https://travis-ci.org/UUDigitalHumanitieslab/tei_reader.svg?branch=master)](https://travis-ci.org/UUDigitalHumanitieslab/tei_reader)
# Python 3 Library for Reading the Text Content and Metadata of TEI P5 (Lite) Files
The library focuses on extracting the main text content from a file and providing the available metadata about the text.
# TL;DR
```bash
pip install tei-reader
```
```python
from tei_reader import TeiReader
reader = TeiReader()
corpora = reader.read_file('example-tei.xml') # or read_string
print(corpora.text)
# show element attributes before the actual element text
print(corpora.tostring(lambda x, text: str(list(a.key + '=' + a.text for a in x.attributes)) + text))
```
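Judging from the lambda above, `tostring` takes a callback that receives each element together with its text and returns the string to emit in its place. A named function can be easier to read than the one-line lambda. The sketch below assumes, as the lambda does, that `element.attributes` yields objects with `key` and `text` properties. The name `with_attributes` and its bracketed output format are hypothetical illustrations, not part of the library.

```python
def with_attributes(element, text):
    # Hypothetical callback for Corpora.tostring: render the element's
    # attributes as key=value pairs (assumes each attribute has .key and
    # .text, as in the lambda above) and place them before the element's text.
    pairs = ", ".join(f"{attr.key}={attr.text}" for attr in element.attributes)
    # Elements without attributes are passed through unchanged.
    return f"[{pairs}] {text}" if pairs else text


# Usage, with `corpora` obtained as in the example above:
# print(corpora.tostring(with_attributes))
```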
# More Explanation
A reader is created with `TeiReader()`. You can then call either `read_file(file_name)` or `read_string(str)`; both return a `Corpora` object with the following properties:

| Property | Description |
| --- | --- |
| `corpora[]` | The sub-corpora contained in this `Corpora`. |
| `documents[]` | The `Document` objects that belong directly to this `Corpora`. |
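Because a `Corpora` can nest further sub-corpora, collecting every document means walking the tree. A minimal sketch, assuming only what the table above states: `corpora` holds the sub-corpora and `documents` holds the documents directly in a corpora, both iterable. The helper name `iter_documents` is hypothetical.

```python
def iter_documents(corpora):
    """Yield every Document in a Corpora, including nested sub-corpora.

    Hypothetical helper: assumes `corpora.documents` and `corpora.corpora`
    are iterables, as described in the README's property table.
    """
    # Documents that belong directly to this corpora come first...
    yield from corpora.documents
    # ...then each sub-corpora is walked recursively (depth-first).
    for sub in corpora.corpora:
        yield from iter_documents(sub)


# Usage, assuming a reader created as in the TL;DR example:
# for document in iter_documents(reader.read_file('example-tei.xml')):
#     print(document.text[:80])
```

Direct documents are yielded before those of sub-corpora, so if a file interleaves the two, this order may differ from the order in the file.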

`Corpora` and `Document` both inherit from `Element`, so every object derived from it provides the following properties:

| Property | Description |
| --- | --- |
| `attributes{}` | The attributes that apply to this element. Nested attributes are returned as well (e.g. `encodingDesc::editorialDecl::normalization`). |
| `text` | The entire text content as a `str`. |
| `divisions[]` | Recursively gets all text divisions in document order. Parts or untagged text that an element contains outside a division are returned in order, wrapped in a `PlaceholderDivision`. |
| `parts[]` | Recursively gets, in document order, the parts that make up the entire text, e.g. passages with emphasis, footnotes, or text marked as foreign. Text without a containing element is returned in order, wrapped in a `PlaceholderPart`. |

`Attribute`, `PlaceholderDivision` and `PlaceholderPart` all support the same properties as `Element`.
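Since every division is an `Element`, its `attributes` can be used to pick out particular divisions. A minimal sketch, assuming each attribute exposes `key` and `text` (as in the TL;DR `tostring` example). The helper name `divisions_with_attribute` and the `type`/`chapter` values in the usage line are hypothetical; the keys actually present depend on the TEI file and on how the library names them.

```python
def divisions_with_attribute(element, key, value=None):
    """Return the divisions of `element` that carry attribute `key`.

    Hypothetical helper: if `value` is given, the attribute's text must also
    equal it. Assumes attributes expose `.key` and `.text`.
    """
    matches = []
    for division in element.divisions:
        for attr in division.attributes:
            if attr.key == key and (value is None or attr.text == value):
                matches.append(division)
                break  # one matching attribute is enough for this division
    return matches


# Usage (hypothetical attribute key and value):
# chapters = divisions_with_attribute(corpora, 'type', 'chapter')
```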
# Upload to PyPI
```bash
python setup.py sdist
twine upload dist/*
```