# Image to LaTeX
## Dataset
- http://lstm.seas.harvard.edu/latex/data/
A general-purpose, deep learning-based system to decompile an image into presentational markup. For example, we can infer the LaTeX or HTML source from a rendered image.
<p align="center"><img src="http://lstm.seas.harvard.edu/latex/network.png" width="400"></p>
An example input is a rendered LaTeX formula:
<p align="center"><img src="http://lstm.seas.harvard.edu/latex/results/website/images/119b93a445-orig.png"></p>
The goal is to infer the LaTeX formula that can render such an image:
```
d s _ { 1 1 } ^ { 2 } = d x ^ { + } d x ^ { - } + l _ { p } ^ { 9 } \frac { p _ { - } } { r ^ { 7 } } \delta ( x ^ { - } ) d x ^ { - } d x ^ { - } + d x _ { 1 } ^ { 2 } + \; \cdots \; + d x _ { 9 } ^ { 2 }
```
Our model employs a convolutional network for text and layout recognition in tandem with an attention-based neural machine translation system. The use of attention additionally provides an alignment from the generated markup to the original source image:
<p align="center"><img src="http://lstm.seas.harvard.edu/latex/mathex.png"></p>
See [our website](http://lstm.seas.harvard.edu/latex/) for a complete interactive version of this visualization over the test set. Our paper (http://arxiv.org/pdf/1609.04938v1.pdf) provides more technical details of this model.
What You Get Is What You See: A Visual Markup Decompiler
Yuntian Deng, Anssi Kanervisto, and Alexander M. Rush
http://arxiv.org/pdf/1609.04938v1.pdf
# Prerequisites
Most of the code is written in [Torch](http://torch.ch), with Python for preprocessing.
### Torch
#### Model
The following Lua libraries are required for the main model.
* tds
* class
* nn
* nngraph
* cunn
* cudnn
* cutorch
Note that currently we only support **GPU** since we use cudnn in the CNN part.
#### Preprocess
The following Python libraries are required:
* Pillow
* numpy
Optional: Node.js and KaTeX are used for preprocessing (LaTeX normalization). [Installation](https://nodejs.org/en/)
##### pdflatex [Installation](https://www.tug.org/texlive/)
pdflatex is used for rendering LaTeX during evaluation.
##### ImageMagick convert [Installation](http://www.imagemagick.org/script/index.php)
convert is used for rendering LaTeX during evaluation.
##### Webkit2png [Installation](http://www.paulhammond.org/webkit2png/)
Webkit2png is used for rendering HTML during evaluation.
#### Evaluate
The following Python libraries are required for image-based evaluation:
* python-Levenshtein
* matplotlib
* Distance
```
wget http://lstm.seas.harvard.edu/latex/third_party/Distance-0.1.3.tar.gz
tar zxf Distance-0.1.3.tar.gz
cd distance; sudo python setup.py install
```
##### Perl [Installation](https://www.perl.org/)
Perl is used for evaluating BLEU score.
# Usage
We assume that the working directory is `im2markup` throughout this document.
The task is to convert an image into its presentational markup, so we need to specify a `data_base_dir` storing the images and a `label_path` storing all labels (e.g., LaTeX formulas). We also need to specify a `data_path` listing the training (or test) samples. The format of `data_path` should look like:
```
<img_name1> <label_idx1>
<img_name2> <label_idx2>
<img_name3> <label_idx3>
...
```
where `<label_idx>` denotes the line index of the label (starting from 0).
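For concreteness, here is a minimal sketch of reading such a `data_path` file against `label_path` (the function name and paths are illustrative; this is not the repository's actual loader):
```
import os

def load_samples(data_path, label_path, data_base_dir):
    """Pair each image with its label; <label_idx> is a 0-based line index into label_path."""
    with open(label_path) as f:
        labels = f.read().splitlines()
    samples = []
    with open(data_path) as f:
        for line in f:
            img_name, label_idx = line.split()
            samples.append((os.path.join(data_base_dir, img_name),
                            labels[int(label_idx)]))
    return samples

# e.g., pairs of (image path, LaTeX formula) for the toy data under data/sample/
pairs = load_samples('data/sample/train.lst', 'data/sample/formulas.lst', 'data/sample/images')
```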
## Quick Start (Math-to-LaTeX Toy Example)
To get started, we provide a toy Math-to-LaTeX example. We also have a larger dataset, [im2latex-100k](https://zenodo.org/record/56198#.V2p0KTXT6eA), in the same format but with many more samples.
### Preprocess
**NOTICE**
The dataset consists of an image folder and a formula list file. The formula list file uses Unix newlines, so the file-reading code in `scripts/preprocessing/preprocess_formulas.py` may need to be modified for Python 3.x.
As mentioned on the dataset website, the file should be opened with `open(formula_lst_path, newline='\n')`, but with that change alone a Unicode decode error occurred at line 7489 of the formula file (observed on a Mac Pro). The following workaround fixed it:
- Open the formula file with vim: `vim formula.lst`
- Set the encoding with `:set fileencoding=utf-8` and save the file with `:wq`
- Change the call to `open('formula.lst', newline='\n', encoding='ISO-8859-1')`
- Rerun the script; it should now succeed.
- Any questions? Please create an issue.
The images in the dataset contain a LaTeX formula rendered on a full page. To accelerate training, we need to preprocess the images.
```
python scripts/preprocessing/preprocess_images.py --input-dir data/sample/images --output-dir data/sample/images_processed
```
The above command will crop the formula area, and group images of similar sizes to facilitate batching.
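The repository's script does all of this for you; purely as a sketch of the cropping idea (the size bucketing is omitted, and the padding and non-white threshold are assumptions):
```
from PIL import Image
import numpy as np

def crop_formula(path, pad=8):
    """Crop a rendered page to the bounding box of its non-white pixels."""
    img = Image.open(path).convert('L')
    arr = np.array(img)
    ys, xs = np.where(arr < 255)          # pixels that are not pure white
    if len(xs) == 0:
        return img                        # blank page: nothing to crop
    left, upper = max(int(xs.min()) - pad, 0), max(int(ys.min()) - pad, 0)
    right, lower = min(int(xs.max()) + pad, arr.shape[1]), min(int(ys.max()) + pad, arr.shape[0])
    return img.crop((left, upper, right, lower))
```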
Next, the LaTeX formulas need to be tokenized or normalized.
```
python scripts/preprocessing/preprocess_formulas.py --mode normalize --input-file data/sample/formulas.lst --output-file data/sample/formulas.norm.lst
```
The above command will normalize the formulas. Note that this command will produce some error messages since some formulas cannot be parsed by the KaTeX parser.
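The normalization itself relies on the KaTeX parser (via Node.js); the snippet below is only a naive illustration of what tokenization means here, not the actual normalization logic:
```
import re

# A token is either a backslash command (\frac, \delta, ...), an escaped
# symbol (\;, \{, ...), or a single non-space character.
TOKEN_RE = re.compile(r'\\[a-zA-Z]+|\\.|\S')

def tokenize(formula):
    return TOKEN_RE.findall(formula)

print(' '.join(tokenize(r'd s_{11}^{2} = \frac{p_-}{r^7}')))
# d s _ { 1 1 } ^ { 2 } = \frac { p _ - } { r ^ 7 }
```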
Then we need to prepare the train, validation and test files. We exclude large images from the training and validation sets, and we also ignore formulas with too many tokens or with grammar errors.
```
python scripts/preprocessing/preprocess_filter.py --filter --image-dir data/sample/images_processed --label-path data/sample/formulas.norm.lst --data-path data/sample/train.lst --output-path data/sample/train_filter.lst
```
```
python scripts/preprocessing/preprocess_filter.py --filter --image-dir data/sample/images_processed --label-path data/sample/formulas.norm.lst --data-path data/sample/validate.lst --output-path data/sample/validate_filter.lst
```
```
python scripts/preprocessing/preprocess_filter.py --no-filter --image-dir data/sample/images_processed --label-path data/sample/formulas.norm.lst --data-path data/sample/test.lst --output-path data/sample/test_filter.lst
```
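A minimal sketch of the kind of filtering these commands perform (the thresholds below mirror the training flags used later in this README and are assumptions, not necessarily the script's exact defaults):
```
from PIL import Image
import os

MAX_TOKENS, MAX_WIDTH, MAX_HEIGHT = 150, 500, 160   # assumed thresholds

def keep(img_path, formula):
    """True if the image fits the size limits and the formula is short enough."""
    w, h = Image.open(img_path).size
    return w <= MAX_WIDTH and h <= MAX_HEIGHT and len(formula.split()) <= MAX_TOKENS

def filter_list(data_path, label_path, image_dir, output_path):
    """Copy only the `<img_name> <label_idx>` lines whose sample passes the checks."""
    labels = open(label_path).read().splitlines()
    with open(data_path) as fin, open(output_path, 'w') as fout:
        for line in fin:
            img_name, idx = line.split()
            if keep(os.path.join(image_dir, img_name), labels[int(idx)]):
                fout.write(line)
```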
Finally, we generate the vocabulary from the training set. All tokens occurring at most once will be excluded from the vocabulary.
```
python scripts/preprocessing/generate_latex_vocab.py --data-path data/sample/train_filter.lst --label-path data/sample/formulas.norm.lst --output-file data/sample/latex_vocab.txt
```
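Conceptually, this step just counts tokens over the training formulas and keeps those seen at least twice; a minimal sketch (simplified relative to the repository's script):
```
from collections import Counter

def build_vocab(data_path, label_path, output_file, min_count=2):
    """Write one token per line, keeping tokens seen at least `min_count` times."""
    labels = open(label_path).read().splitlines()
    counts = Counter()
    for line in open(data_path):
        _, idx = line.split()
        counts.update(labels[int(idx)].split())
    with open(output_file, 'w') as f:
        for token, n in counts.most_common():
            if n >= min_count:
                f.write(token + '\n')

build_vocab('data/sample/train_filter.lst',
            'data/sample/formulas.norm.lst',
            'data/sample/latex_vocab.txt')
```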
### Train
For a complete set of parameters, run
```
th src/train.lua -h
```
The most important parameters for training are `data_base_dir`, which specifies where the images live; `data_path`, the training file; `label_path`, the LaTeX formulas; `val_data_path`, the validation file; and `vocab_file`, the vocabulary file with one token per line.
```
th src/train.lua -phase train -gpu_id 1 \
-model_dir model \
-input_feed -prealloc \
-data_base_dir data/sample/images_processed/ \
-data_path data/sample/train_filter.lst \
-val_data_path data/sample/validate_filter.lst \
-label_path data/sample/formulas.norm.lst \
-vocab_file data/sample/latex_vocab.txt \
-max_num_tokens 150 -max_image_width 500 -max_image_height 160 \
-batch_size 20 -beam_size 1
```
In the default setting, the log is written to `log.txt`; it records the training and validation perplexities. `model_dir` specifies where the models should be saved. The default parameters are tuned for the full dataset. To overfit on this toy example, use the flags `-learning_rate 0.05`, `-lr_decay 1.0` and `-num_epochs 30`; after 30 epochs, the training perplexity reaches around 1.1, while the validation perplexity only reaches around 17.
### Test
After training, we can load a model and evaluate it on the test set. We provide a model trained on the [im2latex-100k dataset](https://zenodo.org/record/56198#.V2p0KTXT6eA).
```
mkdir -p model/latex; wget -P model/latex/ http://lstm.seas.harvard.edu/latex/model/latex/final-model
```
Now we can load the model and test on test set. Note that in order to output the predictions, a flag `-visualize` must be set.
```
th src/train.lua -phase test -gpu_id 1 -load_model -model_dir model/latex \
-visualize \
-data_base_dir data/sample/images_processed/ \
-data_path data/sample/test_filter.lst \
-label_path data/sample/formulas.norm.lst \
-batch_size 5 -beam_size 5
```
Decoding options such as the batch and beam size can be adjusted; run `th src/train.lua -h` for the full list of parameters.