# sheatless - A Python library for extracting parts from sheet music PDFs
Sheatless is a tool for The Beatless to become sheetless. It is written and maintained by the web committee of the student orchestra The Beatless, and will soon be integrated into [taktlaus.no](https://taktlaus.no/).
# API
## PdfPredictor
```py
import sys
from io import BytesIO


class PdfPredictor():
    def __init__(
        self,
        pdf : BytesIO | bytes,
        instruments=None,
        instruments_file=None,
        instruments_file_format="yaml",
        use_lstm=False,
        tessdata_dir=None,
        log_stream=sys.stdout,
        crop_to_top=True,
        crop_to_left=True,
    ):
        ...

    def parts(self):
        # Yields one dictionary per detected part.
        yield {
            "name": "<part name>",
            "partNumber": "<part number>",
            "instruments": ["<instrument name>", ...],
            "fromPage": "<from page>",
            "toPage": "<to page>",
        }
```
### Arguments for `__init__`:
- `pdf` - PDF file object
- `instruments` (optional) - Dictionary of instruments. Will override any provided instruments file.
- `instruments_file` (optional) - Full path to instruments file or instruments file object. Accepted extensions: .yaml, .yml, .json
- `instruments_file_format` (optional) - Format of instruments_file if it is a file object. Accepted formats: yaml, json
- If neither `instruments_file` nor `instruments` is provided, a default instruments file will be used.
- `use_lstm` (optional) - Use LSTM instead of legacy engine mode.
- `tessdata_dir` (optional) - Full path to tessdata directory. If not provided, the value of the environment variable `TESSDATA_DIR` will be used.
- `log_stream` (optional) - File stream log output will be sent to. Can be set to `None` to disable logging.
- `crop_to_top` (optional) - If set to `True` (default, per the signature above), PDF pages will be cropped to top half.
- `crop_to_left` (optional) - If set to `True` (default), PDF pages will be cropped to left half.
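A minimal sketch of consuming the generator returned by `parts()`. It assumes that `PdfPredictor` can be imported from the top-level `sheatless` package (this README only confirms that for `processUploadedPdf`) and that Tesseract and Poppler are installed. `describe_part` and `describe_pdf` are hypothetical helper names, not part of the library.

```python
def describe_part(part: dict) -> str:
    """Format one dictionary yielded by PdfPredictor.parts() as a single line."""
    instruments = ", ".join(part["instruments"]) or "unknown instrument"
    return f'{part["name"]} ({instruments}): pages {part["fromPage"]}-{part["toPage"]}'


def describe_pdf(pdf_path: str) -> list:
    """Run the predictor on a PDF file on disk and describe every detected part."""
    # Assumption: PdfPredictor is exported at package level.
    # Imported lazily so describe_part() can be used without sheatless installed.
    from sheatless import PdfPredictor

    with open(pdf_path, "rb") as pdf_file:
        pdf_bytes = pdf_file.read()

    # log_stream=None disables logging, as documented above.
    predictor = PdfPredictor(pdf_bytes, log_stream=None)
    return [describe_part(part) for part in predictor.parts()]
```

Reading the file into `bytes` first keeps the example independent of when the predictor actually consumes its input.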
## processUploadedPdf
```python
def processUploadedPdf(pdfPath, imagesDirPath, instruments_file=None, instruments=None, use_lstm=False, tessdata_dir=None):
    ...
    return parts, instrumentsDefaultParts
```
It can be imported with:
```python
from sheatless import processUploadedPdf
```
Arguments:

| Argument           | Optional   | Description                                                                                                                                                              |
| ------------------ | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `pdfPath`          |            | Full path to PDF file.                                                                                                                                                   |
| `imagesDirPath`    |            | Full path to output images.                                                                                                                                              |
| `instruments_file` | (optional) | Full path to instruments file. Accepted formats: YAML (.yaml, .yml), JSON (.json).                                                                                       |
| `instruments`      | (optional) | Dictionary of instruments. Will override any provided instruments file. If neither `instruments_file` nor `instruments` is provided, a default instruments file will be used. |
| `use_lstm`         | (optional) | Use LSTM instead of legacy engine mode.                                                                                                                                  |
| `tessdata_dir`     | (optional) | Full path to tessdata directory. If not provided, the value of the environment variable `TESSDATA_DIR` will be used.                                                     |
Returns:

| Return                    | Description                                                                                                                                       |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| `parts`                   | A list of dictionaries `{ "name": "name", "instruments": ["instrument 1", "instrument 2", ...], "fromPage": i, "toPage": j }` describing each part. |
| `instrumentsDefaultParts` | A dictionary `{ ..., "instrument_i": j, ... }`, where `j` is the index in the `parts` list of the default part for `instrument_i`.                  |
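The relationship between the two return values can be sketched as a lookup: `instrumentsDefaultParts` maps an instrument name to an index into `parts`. The helper name `default_part_for` is hypothetical, and the sample data below is hand-written in the documented shape rather than real output from the library.

```python
def default_part_for(instrument, parts, instruments_default_parts):
    """Return the default part dictionary for an instrument, or None if it has none."""
    index = instruments_default_parts.get(instrument)
    if index is None:
        return None
    return parts[index]


# Hand-written sample data in the documented shape, not real library output.
parts = [
    {"name": "Trumpet 1", "instruments": ["trumpet"], "fromPage": 1, "toPage": 2},
    {"name": "Trumpet 2", "instruments": ["trumpet"], "fromPage": 3, "toPage": 4},
    {"name": "Tuba", "instruments": ["tuba"], "fromPage": 5, "toPage": 5},
]
instrumentsDefaultParts = {"trumpet": 0, "tuba": 2}

tuba_part = default_part_for("tuba", parts, instrumentsDefaultParts)
print(tuba_part["name"], tuba_part["fromPage"], tuba_part["toPage"])
```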
## predict_parts_in_pdf
```py
def predict_parts_in_pdf(
    pdf : BytesIO | bytes,
    instruments=None,
    instruments_file=None,
    instruments_file_format="yaml",
    use_lstm=False,
    tessdata_dir=None,
):
    ...
    return parts, instrumentsDefaultParts
```
### Arguments:
- `pdf` - PDF file object
- `instruments` (optional) - Dictionary of instruments. Will override any provided instruments file.
- `instruments_file` (optional) - Full path to instruments file or instruments file object. Accepted extensions: .yaml, .yml, .json
- `instruments_file_format` (optional) - Format of `instruments_file` if it is a file object. Accepted formats: yaml, json
- If neither `instruments_file` nor `instruments` is provided, a default instruments file will be used.
- `use_lstm` (optional) - Use LSTM instead of legacy engine mode.
- `tessdata_dir` (optional) - Full path to tessdata directory. If not provided, the value of the environment variable `TESSDATA_DIR` will be used.
### Returns:
- `parts` - A list of dictionaries `{ "name": "name", "instruments": ["instrument 1", "instrument 2", ...], "fromPage": i, "toPage": j }` describing each part.
- `instrumentsDefaultParts` - A dictionary `{ ..., "instrument_i": j, ... }`, where `j` is the index in the `parts` list of the default part for `instrument_i`.
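Because a provided `instruments` dictionary overrides any instruments file, a caller may prefer to load the file itself and pass the result in. The following is a caller-side sketch only: `load_instruments` is a hypothetical helper, it does not describe how sheatless parses files internally, and the YAML branch assumes the third-party PyYAML package is installed. The structure of the instruments dictionary is not specified in this README, so the helper returns whatever the file contains.

```python
import json
from pathlib import Path


def load_instruments(path):
    """Load an instruments file into a Python object, choosing the parser by extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in (".json", ".yaml", ".yml"):
        raise ValueError(f"Unsupported instruments file extension: {suffix!r}")

    text = path.read_text(encoding="utf-8")
    if suffix == ".json":
        return json.loads(text)

    # PyYAML is a third-party dependency; it is imported lazily so that
    # JSON files can be loaded without it.
    import yaml
    return yaml.safe_load(text)
```

The loaded object could then be passed as `instruments=load_instruments(path)` to `predict_parts_in_pdf`.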
## predict_parts_in_img
```py
def predict_parts_in_img(img : io.BytesIO | bytes | PIL.Image.Image, instruments, use_lstm=False, tessdata_dir=None) -> typing.Tuple[list, list]:
    ...
    return partNames, instrumentses
```
### Arguments:
- `img` - Image object
- `instruments` - Dictionary of instruments
- `use_lstm` (optional) - Use LSTM instead of legacy engine mode.
- `tessdata_dir` (optional) - Full path to tessdata directory. If not provided, the value of the environment variable `TESSDATA_DIR` will be used.
### Returns:
- `partNames` - A list of part names
- `instrumentses` - A list of lists of instruments, one list for each part
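The two returned lists can be combined into dictionaries similar to those returned by the PDF-level functions. This sketch assumes the lists are index-aligned, so that `instrumentses[i]` belongs to `partNames[i]`; `pair_parts` is a hypothetical helper name.

```python
def pair_parts(part_names, instrumentses):
    """Zip parallel lists of part names and instrument lists into dictionaries."""
    if len(part_names) != len(instrumentses):
        raise ValueError("part_names and instrumentses must have the same length")
    return [
        {"name": name, "instruments": list(instruments)}
        for name, instruments in zip(part_names, instrumentses)
    ]


# Hand-written sample values, not real OCR output.
print(pair_parts(["Trumpet 1", "Tuba"], [["trumpet"], ["tuba"]]))
```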
# Example Docker setup
Sheatless requires Tesseract and Poppler to be installed on the system. An example Docker setup, as well as an example integration of the library, can be found in [sheatless-splitter](https://github.com/sigurdo/sheatless-splitter).
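Outside Docker, a quick preflight check can confirm that both system dependencies are on the `PATH`. This is a sketch under assumptions: `tesseract` is the Tesseract command-line executable and `pdftoppm` is one of the command-line tools shipped with Poppler, but whether sheatless invokes these exact executables is not stated in this README. `find_system_dependencies` is a hypothetical helper name.

```python
import shutil


def find_system_dependencies():
    """Map each expected executable to its full path, or None if it is not on PATH."""
    executables = ("tesseract", "pdftoppm")
    return {name: shutil.which(name) for name in executables}


for name, location in find_system_dependencies().items():
    print(f"{name}: {location or 'NOT FOUND'}")
```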