# Image to LaTeX
## Dataset
- http://lstm.seas.harvard.edu/latex/data/
A general-purpose, deep learning-based system to decompile an image into presentational markup. For example, we can infer the LaTeX or HTML source from a rendered image.
<p align="center"><img src="http://lstm.seas.harvard.edu/latex/network.png" width="400"></p>
An example input is a rendered LaTeX formula:
<p align="center"><img src="http://lstm.seas.harvard.edu/latex/results/website/images/119b93a445-orig.png"></p>
The goal is to infer the LaTeX formula that can render such an image:
```
d s _ { 1 1 } ^ { 2 } = d x ^ { + } d x ^ { - } + l _ { p } ^ { 9 } \frac { p _ { - } } { r ^ { 7 } } \delta ( x ^ { - } ) d x ^ { - } d x ^ { - } + d x _ { 1 } ^ { 2 } + \; \cdots \; + d x _ { 9 } ^ { 2 }
```
Our model employs a convolutional network for text and layout recognition in tandem with an attention-based neural machine translation system. The use of attention additionally provides an alignment from the generated markup to the original source image:
<p align="center"><img src="http://lstm.seas.harvard.edu/latex/mathex.png"></p>
See [our website](http://lstm.seas.harvard.edu/latex/) for a complete interactive version of this visualization over the test set. Our paper (http://arxiv.org/pdf/1609.04938v1.pdf) provides more technical details of this model.
What You Get Is What You See: A Visual Markup Decompiler
Yuntian Deng, Anssi Kanervisto, and Alexander M. Rush
http://arxiv.org/pdf/1609.04938v1.pdf
# Prerequisites
Most of the code is written in [Torch](http://torch.ch), with Python for preprocessing.
### Torch
#### Model
The following Lua libraries are required for the main model.
* tds
* class
* nn
* nngraph
* cunn
* cudnn
* cutorch
Note that currently we only support **GPU** since we use cudnn in the CNN part.
#### Preprocess
The following Python libraries are required:
* Pillow
* numpy
Optional: Node.js and KaTeX are used for preprocessing (LaTeX normalization). [Installation](https://nodejs.org/en/)
##### pdflatex [Installation](https://www.tug.org/texlive/)
pdflatex is used for rendering LaTeX during evaluation.
##### ImageMagick convert [Installation](http://www.imagemagick.org/script/index.php)
convert is used for rendering LaTeX during evaluation.
##### Webkit2png [Installation](http://www.paulhammond.org/webkit2png/)
Webkit2png is used for rendering HTML during evaluation.
#### Evaluate
The following Python libraries are required for image-based evaluation:
* python-Levenshtein
* matplotlib
* Distance
```
wget http://lstm.seas.harvard.edu/latex/third_party/Distance-0.1.3.tar.gz
tar zxf Distance-0.1.3.tar.gz
cd distance; sudo python setup.py install
```
##### Perl [Installation](https://www.perl.org/)
Perl is used for evaluating BLEU score.
# Usage
We assume that the working directory is `im2markup` throughout this document.
The task is to convert an image into its presentational markup, so we need to specify a `data_base_dir` storing the images and a `label_path` storing all labels (e.g., LaTeX formulas). We also need to specify a `data_path` listing the training (or test) samples. The format of `data_path` should look like:
```
<img_name1> <label_idx1>
<img_name2> <label_idx2>
<img_name3> <label_idx3>
...
```
where `<label_idx>` denotes the line index of the label (starting from 0).
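For concreteness, here is a minimal sketch of reading such a `data_path` file against `label_path` (the function name and paths are illustrative; this is not the repository's actual loader):
```
import os

def load_samples(data_path, label_path, data_base_dir):
    """Pair each image with its label; <label_idx> is a 0-based line index into label_path."""
    with open(label_path) as f:
        labels = f.read().splitlines()
    samples = []
    with open(data_path) as f:
        for line in f:
            img_name, label_idx = line.split()
            samples.append((os.path.join(data_base_dir, img_name),
                            labels[int(label_idx)]))
    return samples

# e.g., pairs of (image path, LaTeX formula) for the toy data under data/sample/
pairs = load_samples('data/sample/train.lst', 'data/sample/formulas.lst', 'data/sample/images')
```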
## Quick Start (Math-to-LaTeX Toy Example)
To get started, we provide a toy Math-to-LaTeX example. We also have a larger dataset, [im2latex-100k](https://zenodo.org/record/56198#.V2p0KTXT6eA), in the same format but with many more samples.
### Preprocess
**NOTICE**
The dataset consists of an image folder and a formula list file. The formula list file uses Unix newlines, so the file-reading code in `scripts/preprocessing/preprocess_formulas.py` may need to be modified for Python 3.x.
As mentioned on the dataset website, the file should be opened with `open(formula_lst_path, newline='\n')`, but with that change alone a Unicode decode error occurred at line 7489 of the formula file (observed on a Mac Pro). The following workaround fixed it:
- Open the formula file with vim: `vim formula.lst`
- Set the encoding with `:set fileencoding=utf-8` and save the file with `:wq`
- Change the call to `open('formula.lst', newline='\n', encoding='ISO-8859-1')`
- Rerun the script; it should now succeed.
- Any questions? Please create an issue.
The images in the dataset contain a LaTeX formula rendered on a full page. To accelerate training, we need to preprocess the images.
```
python scripts/preprocessing/preprocess_images.py --input-dir data/sample/images --output-dir data/sample/images_processed
```
The above command will crop the formula area, and group images of similar sizes to facilitate batching.
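The repository's script does all of this for you; purely as a sketch of the cropping idea (the size bucketing is omitted, and the padding and non-white threshold are assumptions):
```
from PIL import Image
import numpy as np

def crop_formula(path, pad=8):
    """Crop a rendered page to the bounding box of its non-white pixels."""
    img = Image.open(path).convert('L')
    arr = np.array(img)
    ys, xs = np.where(arr < 255)          # pixels that are not pure white
    if len(xs) == 0:
        return img                        # blank page: nothing to crop
    left, upper = max(int(xs.min()) - pad, 0), max(int(ys.min()) - pad, 0)
    right, lower = min(int(xs.max()) + pad, arr.shape[1]), min(int(ys.max()) + pad, arr.shape[0])
    return img.crop((left, upper, right, lower))
```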
Next, the LaTeX formulas need to be tokenized or normalized.
```
python scripts/preprocessing/preprocess_formulas.py --mode normalize --input-file data/sample/formulas.lst --output-file data/sample/formulas.norm.lst
```
The above command will normalize the formulas. Note that this command will produce some error messages since some formulas cannot be parsed by the KaTeX parser.
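The normalization itself relies on the KaTeX parser (via Node.js); the snippet below is only a naive illustration of what tokenization means here, not the actual normalization logic:
```
import re

# A token is either a backslash command (\frac, \delta, ...), an escaped
# symbol (\;, \{, ...), or a single non-space character.
TOKEN_RE = re.compile(r'\\[a-zA-Z]+|\\.|\S')

def tokenize(formula):
    return TOKEN_RE.findall(formula)

print(' '.join(tokenize(r'd s_{11}^{2} = \frac{p_-}{r^7}')))
# d s _ { 1 1 } ^ { 2 } = \frac { p _ - } { r ^ 7 }
```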
Then we need to prepare the train, validation and test files. We exclude large images from the training and validation sets, and we also ignore formulas with too many tokens or with grammar errors.
```
python scripts/preprocessing/preprocess_filter.py --filter --image-dir data/sample/images_processed --label-path data/sample/formulas.norm.lst --data-path data/sample/train.lst --output-path data/sample/train_filter.lst
```
```
python scripts/preprocessing/preprocess_filter.py --filter --image-dir data/sample/images_processed --label-path data/sample/formulas.norm.lst --data-path data/sample/validate.lst --output-path data/sample/validate_filter.lst
```
```
python scripts/preprocessing/preprocess_filter.py --no-filter --image-dir data/sample/images_processed --label-path data/sample/formulas.norm.lst --data-path data/sample/test.lst --output-path data/sample/test_filter.lst
```
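A minimal sketch of the kind of filtering these commands perform (the thresholds below mirror the training flags used later in this README and are assumptions, not necessarily the script's exact defaults):
```
from PIL import Image
import os

MAX_TOKENS, MAX_WIDTH, MAX_HEIGHT = 150, 500, 160   # assumed thresholds

def keep(img_path, formula):
    """True if the image fits the size limits and the formula is short enough."""
    w, h = Image.open(img_path).size
    return w <= MAX_WIDTH and h <= MAX_HEIGHT and len(formula.split()) <= MAX_TOKENS

def filter_list(data_path, label_path, image_dir, output_path):
    """Copy only the `<img_name> <label_idx>` lines whose sample passes the checks."""
    labels = open(label_path).read().splitlines()
    with open(data_path) as fin, open(output_path, 'w') as fout:
        for line in fin:
            img_name, idx = line.split()
            if keep(os.path.join(image_dir, img_name), labels[int(idx)]):
                fout.write(line)
```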
Finally, we generate the vocabulary from the training set. All tokens occurring at most once will be excluded from the vocabulary.
```
python scripts/preprocessing/generate_latex_vocab.py --data-path data/sample/train_filter.lst --label-path data/sample/formulas.norm.lst --output-file data/sample/latex_vocab.txt
```
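Conceptually, this step just counts tokens over the training formulas and keeps those seen at least twice; a minimal sketch (simplified relative to the repository's script):
```
from collections import Counter

def build_vocab(data_path, label_path, output_file, min_count=2):
    """Write one token per line, keeping tokens seen at least `min_count` times."""
    labels = open(label_path).read().splitlines()
    counts = Counter()
    for line in open(data_path):
        _, idx = line.split()
        counts.update(labels[int(idx)].split())
    with open(output_file, 'w') as f:
        for token, n in counts.most_common():
            if n >= min_count:
                f.write(token + '\n')

build_vocab('data/sample/train_filter.lst',
            'data/sample/formulas.norm.lst',
            'data/sample/latex_vocab.txt')
```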
### Train
For a complete set of parameters, run
```
th src/train.lua -h
```
The most important parameters for training are `data_base_dir`, which specifies where the images live; `data_path`, the training file; `label_path`, the LaTeX formulas; `val_data_path`, the validation file; and `vocab_file`, the vocabulary file with one token per line.
```
th src/train.lua -phase train -gpu_id 1 \
-model_dir model \
-input_feed -prealloc \
-data_base_dir data/sample/images_processed/ \
-data_path data/sample/train_filter.lst \
-val_data_path data/sample/validate_filter.lst \
-label_path data/sample/formulas.norm.lst \
-vocab_file data/sample/latex_vocab.txt \
-max_num_tokens 150 -max_image_width 500 -max_image_height 160 \
-batch_size 20 -beam_size 1
```
In the default setting, the log is written to `log.txt`; it records the training and validation perplexities. `model_dir` specifies where the models should be saved. The default parameters are tuned for the full dataset. To overfit on this toy example, use the flags `-learning_rate 0.05`, `-lr_decay 1.0` and `-num_epochs 30`; after 30 epochs, the training perplexity reaches around 1.1, while the validation perplexity only reaches around 17.
### Test
After training, we can load a model and evaluate it on the test set. We provide a model trained on the [im2latex-100k dataset](https://zenodo.org/record/56198#.V2p0KTXT6eA).
```
mkdir -p model/latex; wget -P model/latex/ http://lstm.seas.harvard.edu/latex/model/latex/final-model
```
Now we can load the model and test on test set. Note that in order to output the predictions, a flag `-visualize` must be set.
```
th src/train.lua -phase test -gpu_id 1 -load_model -model_dir model/latex \
-visualize \
-data_base_dir data/sample/images_processed/ \
-data_path data/sample/test_filter.lst \
-label_path data/sample/formulas.norm.lst \
-batch_size 5 -beam_size 5
```
Decoding options such as the batch and beam size can be adjusted; run `th src/train.lua -h` for the full list of parameters.