# vLSTM
Vectorized Long Short-term Memory (LSTM) using Matlab and GPU <br>
It supports both the regular LSTM described [here](http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf) and the multimodal LSTM described [here](http://www.jimmyren.com/papers/AAAI16_Ren.pdf). <br>
If you are interested, visit [here](https://github.com/jimmy-ren/lstm_speaker_naming_aaai16) for details of the experiments described in the multimodal LSTM [paper](http://www.jimmyren.com/papers/AAAI16_Ren.pdf).
## Hardware/software requirements
To run the code, you need an NVIDIA GPU with at least 4GB of GPU memory. The code was tested on Ubuntu 14.04 and Windows 7 using MATLAB 2014b.
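
As a quick pre-flight check, the following sketch asks MATLAB whether it can see a GPU with enough memory. It assumes the Parallel Computing Toolbox is installed, and the `4e9`-byte threshold is only an approximation of the "4GB" figure above. The variable names are illustrative and not part of this repository.

```matlab
% Sketch: confirm MATLAB can see a GPU with enough memory before running the code.
% Assumes the Parallel Computing Toolbox; the threshold approximates "4GB".
min_bytes = 4e9;
gpu_ok = false;
try
    if gpuDeviceCount > 0
        g = gpuDevice();   % queries (and selects) the current GPU device
        fprintf('%s: %.2f GB total memory\n', g.Name, g.TotalMemory / 1e9);
        gpu_ok = g.TotalMemory >= min_bytes;
    end
catch err
    % Reached when the toolbox or a usable driver is missing.
    fprintf('Could not query the GPU: %s\n', err.message);
end
if ~gpu_ok
    fprintf('No GPU with at least %.0f GB of memory was detected.\n', min_bytes / 1e9);
end
```

If `gpu_ok` is false, the training and test scripts are unlikely to run as described.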
## Character level language generation
The task is the same as in the [char-rnn](https://github.com/karpathy/char-rnn) project, which is a good indicator of whether the LSTM implementation is effective.
### Generation using a pre-trained model
Open the `applications/writer` folder but don't enter it. Run `lstm_writer_test.m` and it will start generating text. You can change the starting character in the first few lines of `lstm_writer_val.m`. It currently starts with "I", so a typical generation looks like this: <br>
`I can be the most programmers who would be try to them. But I was anyway that the most professors and press right. It's hard to make them things like the startups that was much their fundraising the founders who was by being worth in the side of a startup would be to be the smart with good as work with an angel round by companies and funding a lot of the partners is that they want to competitive for the top was a strange could be would be a company that was will be described startups in the paper we could probably be were the same thing that they can be some to investors...`
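
For readers new to character-level generation, one common way to pick the next character is to sample it from the model's output distribution. The sketch below shows that single step with inverse-CDF sampling. It is an illustration only: `alphabet` and `p` are hypothetical stand-ins, and this README does not state whether `lstm_writer_val.m` samples in this way.

```matlab
% Sketch: draw one character index from a probability vector (inverse-CDF sampling).
% `alphabet` and `p` are hypothetical stand-ins for the model's character set
% and its softmax output; they are not taken from the repository.
alphabet = 'abc ';
p = [0.5 0.2 0.2 0.1];           % non-negative, sums to 1
cdf = cumsum(p);
cdf(end) = 1;                    % guard against round-off in the last bin
idx = find(rand() <= cdf, 1, 'first');
next_c = alphabet(idx);
fprintf('sampled character: "%s"\n', next_c);
```

Repeating this step and feeding each sampled character back into the model yields a sequence like the one shown above.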
### Data generation and training
Paul Graham's [essays](http://www.paulgraham.com/articles.html) are used in this sample. All the text is stored as a string in `data/writer/all_text.mat`; you may load it manually to inspect the content. The whole text contains about 2 million characters. To generate the training data, run `data/writer/gen_char_data_from_text_2.m`. It generates four .mat files under `data/writer/graham`. Each file contains 10000 character sequences of length 50, so the four files add up to 2 million characters.<br>
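
To make the data layout concrete, here is a minimal sketch of cutting a long string into fixed-length character sequences, together with the character count implied by the figures above. It assumes non-overlapping windows and uses a toy string; the actual `gen_char_data_from_text_2.m` may slice or encode the text differently.

```matlab
% Sketch: cut a long string into fixed-length character sequences.
% Assumption: windows do not overlap; the repository script may differ.
txt = repmat('The quick brown fox. ', 1, 20);   % toy stand-in for the essay text
seq_len = 50;
n_seq = floor(numel(txt) / seq_len);            % drop the incomplete tail
% reshape fills column by column, so transpose to get one sequence per row
seqs = reshape(txt(1:n_seq * seq_len), seq_len, n_seq)';
fprintf('%d sequences of length %d\n', size(seqs, 1), size(seqs, 2));

% Character count implied by the figures above: 4 files x 10000 x 50.
total_chars = 4 * 10000 * 50;
fprintf('total characters: %d\n', total_chars);
```

To see which variables the real text file holds without assuming their names, `whos('-file', 'data/writer/all_text.mat')` lists them.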
Once the data is ready, you may run `lstm_writer_train.m` under `applications/writer` to start the training. During training, intermediate models are saved under `results/writer`. To test one of them, launch another MATLAB instance and run `lstm_writer_test.m` with the newly saved model in place of `writer.mat`.
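
When several intermediate models have accumulated, a small helper like the following can locate the most recent one. This is a sketch that assumes the models are written as `.mat` files directly into `results/writer` and that it is run from the repository root; the file naming used by the training script is not documented here.

```matlab
% Sketch: find the most recently modified .mat file in the results folder.
% Assumption: intermediate models are saved as .mat files in this folder.
model_dir = fullfile('results', 'writer');
files = dir(fullfile(model_dir, '*.mat'));
latest = '';
if ~isempty(files)
    [~, newest] = max([files.datenum]);   % most recent modification time
    latest = fullfile(model_dir, files(newest).name);
    fprintf('Most recent model: %s\n', latest);
else
    fprintf('No .mat files found in %s\n', model_dir);
end
```

Any `.mat` file in that folder is considered, so a pre-trained model stored there takes part in the search as well.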
## Multimodal LSTM for speaker naming
The training procedure for the multimodal speaker naming LSTM, as well as the pre-processed data (which you can use off the shelf), have been released. Please follow the instructions below to perform the training.
### Download data
Please go [here](https://drive.google.com/folderview?id=0B6nl_KFEGWG0QWVJakhRcEUyVDQ&usp=sharing) or [here](http://pan.baidu.com/s/1kV6KbOF) to download all the pre-processed training data and put all the files under `data/speaker-naming/processed_training_data/`, following the existing folder structure inside. <br>
In addition, please go [here](https://drive.google.com/folderview?id=0B6nl_KFEGWG0NkdYcEduc2twQW8&usp=sharing) or [here](http://pan.baidu.com/s/1bpymRHd) to download the pre-processed multimodal validation data and put all the files under `data/speaker-naming/raw_full/`, following the existing folder structure inside. <br>
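
After downloading, a short check like this one can confirm that both folders are where the scripts expect them. It is a sketch meant to be run from the repository root; only the two folder paths come from this README, and the variable names are illustrative.

```matlab
% Sketch: check that the two data folders named above exist and report
% how many entries each one contains. Run from the repository root.
data_dirs = { ...
    fullfile('data', 'speaker-naming', 'processed_training_data'), ...
    fullfile('data', 'speaker-naming', 'raw_full')};
found = false(size(data_dirs));
for k = 1:numel(data_dirs)
    found(k) = exist(data_dirs{k}, 'dir') == 7;   % 7 means "is a folder"
    if found(k)
        entries = dir(data_dirs{k});
        n_items = sum(~ismember({entries.name}, {'.', '..'}));
        fprintf('%s: %d entries\n', data_dirs{k}, n_items);
    else
        fprintf('%s: missing\n', data_dirs{k});
    end
end
```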
### Start training
Once all the data is in place, you may train three types of models: a model that classifies only the face features, a model that classifies only the audio features, and a model that classifies the face+audio multimodal features simultaneously (multimodal LSTM). <br>
To train the face only model, you may run this [script](https://github.com/jimmy-ren/vLSTM/blob/master/applications/speaker-naming/face_only/sn_face_train.m). <br>
To train the audio only model, you may run this [script](https://github.com/jimmy-ren/vLSTM/blob/master/applications/speaker-naming/audio_only/sn_audio_train.m). <br>
To train the face+audio multimodal LSTM model, you may run this [script](https://github.com/jimmy-ren/vLSTM/blob/master/applications/speaker-naming/face_audio/sn_FA_5c_train_v52.m). <br>
You can also test the three models above using the pre-trained models. <br>
Use this [script](https://github.com/jimmy-ren/vLSTM/blob/master/applications/speaker-naming/face_only/test_face_all.m) to test the pre-trained face-only model. <br>
Use this [script](https://github.com/jimmy-ren/vLSTM/blob/master/applications/speaker-naming/audio_only/test_audio_all.m) to test the pre-trained audio-only model. <br>
Use this [script](https://github.com/jimmy-ren/vLSTM/blob/master/applications/speaker-naming/face_audio/test_FA_all_v52.m) to test the pre-trained face+audio multimodal LSTM model. <br>