# pix2tex - LaTeX OCR
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ba_qCGJl29dFQqfBjdqMik3o_EqPE4fr)
The goal of this project is to create a learning-based system that takes an image of a math formula and returns the corresponding LaTeX code.
![header](https://user-images.githubusercontent.com/55287601/109183599-69431f00-778e-11eb-9809-d42b9451e018.png)
## Requirements
### Model
* PyTorch (tested on v1.7.1)
* Python 3.7+ & dependencies (`requirements.txt`)
```
pip install -r requirements.txt
```
### Dataset
In order to render the math in many different fonts, we use XeLaTeX, generate a PDF, and finally convert it to a PNG. For the last step we need some third-party tools (a minimal sketch of the pipeline follows the list):
* [XeLaTeX](https://www.ctan.org/pkg/xetex)
* [ImageMagick](https://imagemagick.org/) with [Ghostscript](https://www.ghostscript.com/index.html) (for converting PDF to PNG)
* [Node.js](https://nodejs.org/) to run [KaTeX](https://github.com/KaTeX/KaTeX) (for normalizing LaTeX code)
* [`de-macro`](https://www.ctan.org/pkg/de-macro) >= 1.4 (only for parsing arXiv papers)
* Python 3.7+ & dependencies (`requirements.txt`)
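As an illustration, the two render steps chain together roughly like this. This is a minimal sketch with made-up file names, not the repository's actual generation script:
```
import subprocess

def render_formula(tex_path: str, png_path: str, dpi: int = 200) -> None:
    """Compile a LaTeX file with XeLaTeX, then rasterize the PDF to a PNG.
    Assumes the .tex file lives in the current working directory."""
    # 1) LaTeX -> PDF; XeLaTeX can load OpenType math fonts for visual variety
    subprocess.run(["xelatex", "-interaction=nonstopmode", tex_path], check=True)
    # 2) PDF -> PNG via ImageMagick (which delegates PDF reading to Ghostscript)
    pdf_path = tex_path.rsplit(".", 1)[0] + ".pdf"
    subprocess.run(["convert", "-density", str(dpi), pdf_path, png_path], check=True)
```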
## Using the model
1. Download/Clone this repository
2. For now you need to install the Python dependencies specified in `requirements.txt` (see [above](#Requirements))
3. Download the `weights.pth` (and optionally `image_resizer.pth`) file from the [Releases](https://github.com/lukas-blecher/LaTeX-OCR/releases/latest)->Assets section and place it in the `checkpoints` directory
Thanks to [@katie-lim](https://github.com/katie-lim), you can use a nice user interface as a quick way to get the model prediction. Just call the GUI with `python gui.py`. From here you can take a screenshot, and the predicted LaTeX code is rendered using [MathJax](https://www.mathjax.org/) and copied to your clipboard.
![demo](https://user-images.githubusercontent.com/55287601/117812740-77b7b780-b262-11eb-81f6-fc19766ae2ae.gif)
If the model is unsure about what's in the image, it might output a different prediction every time you click "Retry". With the `temperature` parameter you can control this behavior (a low temperature will produce the same result every time).
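For intuition, temperature sampling divides the logits by the temperature before the softmax. This is a generic sketch of the mechanism, not the exact decoding code of this repository:
```
import torch

def sample_token(logits: torch.Tensor, temperature: float = 0.2) -> int:
    # Low temperature sharpens the distribution: as it approaches 0 the
    # sample collapses onto the argmax, so repeated "Retry" clicks agree.
    probs = torch.softmax(logits / max(temperature, 1e-6), dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```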
Alternatively, you can use `pix2tex.py`, which offers similar functionality to `gui.py` but as a command-line tool. In this case you don't need to install PyQt5. With this script you can also parse existing images from disk.
**Note:** As of right now, the model works best with images of smaller resolution. Don't zoom in all the way before taking a picture. Double-check the result carefully. You can try to redo the prediction with another resolution if the answer was wrong.
**Update:** I have trained an image classifier on randomly scaled images of the training data to predict the original size.
This model will automatically resize a custom image to best resemble the training data and thus increase performance on images found in the wild. To use this preprocessing step, all you have to do is download the second weights file mentioned above. You should then be able to take bigger (or smaller) images of the formula and still get a satisfying result.
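The idea behind this preprocessing step, sketched with a hypothetical `predict_height` function standing in for the released `image_resizer.pth` classifier:
```
from PIL import Image

def resize_to_training_scale(img: Image.Image, predict_height) -> Image.Image:
    # `predict_height` is a hypothetical stand-in for the resizer model: it
    # guesses the height the formula would have had in the training data.
    target_h = predict_height(img)
    scale = target_h / img.height
    return img.resize((round(img.width * scale), target_h), Image.BILINEAR)
```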
## Training the model [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1MqZSKzSgEnJB9lU7LyPma4bo4J3dnj1E)
1. First we need to combine the images with their ground-truth labels. I wrote a dataset class (which needs further improvement) that saves the relative paths to the images together with the LaTeX code they were rendered with. To generate the dataset pickle file, run
```
python dataset/dataset.py --equations path_to_textfile --images path_to_images --tokenizer dataset/tokenizer.json --out dataset.pkl
```
You can find my generated training data on [Google Drive](https://drive.google.com/drive/folders/13CA4vAmOmD_I_dSbvLp-Lf0s6KiaNfuO) as well (formulae.zip - images, math.txt - labels). Repeat this step for the validation and test data; all use the same label text file.
2. Edit the `data` (and `valdata`) entries in the config file to point to the newly generated `.pkl` files. Change other hyperparameters if you want to. See `settings/config.yaml` for a template.
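For example, the relevant entries might look like this (a hedged excerpt; only `data` and `valdata` are named here, the full set of options is in `settings/config.yaml`):
```
data: dataset/train.pkl     # pickle produced by dataset/dataset.py
valdata: dataset/val.pkl    # same format, generated from the validation images
```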
3. Now, for the actual training, run
```
python train.py --config path_to_config_file
```
If you want to use your own data you might be interested in creating your own tokenizer with
```
python dataset/dataset.py --equations path_to_textfile --vocab-size 8000 --out tokenizer.json
```
Don't forget to update the path to the tokenizer in the config file and set `num_tokens` to your vocabulary size.
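Assuming the generated `tokenizer.json` is a Hugging Face `tokenizers` file (which the `.json` output and the `--vocab-size` flag suggest), you can sanity-check it like this:
```
from tokenizers import Tokenizer  # pip install tokenizers

tok = Tokenizer.from_file("tokenizer.json")
enc = tok.encode(r"\frac{a}{b}")
print(enc.ids)              # token ids the model is trained on
print(tok.decode(enc.ids))  # should roughly round-trip to the input formula
```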
## Model
The model consists of a ViT [[1](#References)] encoder with a ResNet backbone and a Transformer [[2](#References)] decoder.
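A compact, illustrative sketch of this encoder-decoder layout in PyTorch (dimensions, depths, and the conv stem are made up for brevity; positional embeddings are omitted, and this is not the repository's actual implementation):
```
import torch
import torch.nn as nn

class HybridViTEncoder(nn.Module):
    """Conv stem (standing in for the ResNet backbone) + Transformer encoder."""
    def __init__(self, dim=256, depth=4, heads=8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)  # PyTorch >= 1.9
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, img):                       # img: (B, 1, H, W)
        feat = self.backbone(img)                 # (B, dim, H', W')
        tokens = feat.flatten(2).transpose(1, 2)  # (B, H'*W', dim) "patch" tokens
        return self.encoder(tokens)

class LatexDecoder(nn.Module):
    """Autoregressive Transformer decoder over the LaTeX token vocabulary."""
    def __init__(self, vocab=8000, dim=256, depth=4, heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, depth)
        self.out = nn.Linear(dim, vocab)

    def forward(self, tgt_ids, memory):           # tgt_ids: (B, T)
        T = tgt_ids.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.decoder(self.embed(tgt_ids), memory, tgt_mask=causal)
        return self.out(h)                        # (B, T, vocab) logits
```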
### Performance
| BLEU score | normed edit distance |
| ---------- | -------------------- |
| 0.88 | 0.10 |
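For reference, one common definition of the normed edit distance divides the Levenshtein distance between prediction and ground truth by the length of the longer sequence; a generic sketch (not necessarily the repository's exact evaluation code):
```
def normed_edit_distance(pred: str, truth: str) -> float:
    # Single-row dynamic-programming Levenshtein, normalized to [0, 1].
    m, n = len(pred), len(truth)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                             # deletion
                        dp[j - 1] + 1,                         # insertion
                        prev + (pred[i - 1] != truth[j - 1]))  # substitution
            prev = cur
    return dp[n] / max(m, n, 1)
```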
## Data
We need paired data for the network to learn. Luckily there is a lot of LaTeX code on the internet, e.g. on [Wikipedia](https://www.wikipedia.org) or [arXiv](https://www.arxiv.org). We also use the formulae from the [im2latex-100k](https://zenodo.org/record/56198#.V2px0jXT6eA) dataset.
All of it can be found [here](https://drive.google.com/drive/folders/13CA4vAmOmD_I_dSbvLp-Lf0s6KiaNfuO).
### Fonts
Latin Modern Math, GFSNeohellenicMath.otf, Asana Math, XITS Math, Cambria Math
## TODO
- [x] add more evaluation metrics
- [x] create a GUI
- [ ] add beam search
- [ ] support handwritten formulae
- [ ] reduce model size (distillation)
- [ ] find optimal hyperparameters
- [ ] tweak model structure
- [ ] fix data scraping and scrape more data
- [ ] trace the model
## Contribution
Contributions of any kind are welcome.
## Acknowledgment
Code taken and modified from [lucidrains](https://github.com/lucidrains), [rwightman](https://github.com/rwightman/pytorch-image-models), [im2markup](https://github.com/harvardnlp/im2markup), [arxiv_leaks](https://github.com/soskek/arxiv_leaks), [pkra: Mathjax](https://github.com/pkra/MathJax-single-file), [harupy: snipping tool](https://github.com/harupy/snipping-tool)
## References
[1] [An Image is Worth 16x16 Words](https://arxiv.org/abs/2010.11929)
[2] [Attention Is All You Need](https://arxiv.org/abs/1706.03762)