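Matlab-使用Matlab+GPU实现的矢量化多模态LSTM算法.zip (Matlab: a vectorized multimodal LSTM implemented with Matlab and GPU)

45 files in total: 32 .m, 7 .txt, 5 .mat

Tags: matlab, lstm, artificial intelligence

Points required: 1, 159 views, uploaded 2024-03-09 17:07:57, 35.09MB ZIP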


Package contents (45 files):

- Matlab_使用Matlab+GPU实现的矢量化多模态LSTM算法/
  - data/
    - speaker-naming/
      - processed_training_data/
        - val_audio/
          - note.txt (31B)
        - train_audio/
          - note.txt (31B)
        - train_face/
          - note.txt (31B)
        - val_face/
          - note.txt (31B)
      - raw_full/
        - test/
          - 5classes/
            - info.txt (22B)
    - writer/
      - next_letter.m (576B)
      - next_char.m (476B)
      - gen_char_data_from_text_2.m (2KB)
      - graham/
        - note.txt (31B)
      - all_text.mat (1.17MB)
      - next_word.m (688B)
  - utils/
    - to_gpu.m (59B)
    - softmax.m (263B)
    - relu.m (53B)
    - set_grad_to_zeros_v52.m (1KB)
    - sigmoid.m (65B)
    - deri_sigmoid.m (66B)
    - deri_relu.m (70B)
    - deri_tanh.m (170B)
    - save_weights.m (110B)
  - optimization/
    - adagrad_init.m (1KB)
    - adagrad_update.m (4KB)
  - core/
    - lstm_init_v52.m (7KB)
    - lstm_core_v52.m (13KB)
    - computeNumericalGradient.m (3KB)
    - lstm_verify.m (765B)
  - applications/
    - speaker-naming/
      - face_audio/
        - sn_FA_configure.m (666B)
        - test_FA_all_v52.m (3KB)
        - sn_FA_5c_train_v52.m (8KB)
      - face_only/
        - test_face_all.m (2KB)
        - sn_face_configure.m (659B)
        - sn_face_train.m (5KB)
      - audio_only/
        - sn_audio_train.m (5KB)
        - sn_audio_configure.m (657B)
        - test_audio_all.m (2KB)
    - writer/
      - lstm_writer_val.m (1KB)
      - lstm_writer_configure.m (665B)
      - lstm_writer_test.m (251B)
      - lstm_writer_train.m (2KB)
  - results/
    - speaker-naming/
      - face_audio/
        - pre-train.mat (8.66MB)
      - face_only/
        - pre-train.mat (8.5MB)
      - audio_only/
        - pre-train.mat (7.9MB)
    - writer/
      - writer.mat (9.2MB)
  - README.md (5KB)

# vLSTM

Vectorized Long Short-term Memory (LSTM) using Matlab and GPU

It supports both the regular LSTM described [here](http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf) and the multimodal LSTM described [here](http://www.jimmyren.com/papers/AAAI16_Ren.pdf).

If you are interested, visit [here](https://github.com/jimmy-ren/lstm_speaker_naming_aaai16) for details of the experiments described in the multimodal LSTM [paper](http://www.jimmyren.com/papers/AAAI16_Ren.pdf).

## Hardware/software requirements

To run the code, you need an NVIDIA GPU with at least 4GB of GPU memory. The code was tested on Ubuntu 14.04 and Windows 7 using MATLAB 2014b.

## Character level language generation

The task is the same as that in the [char-rnn](https://github.com/karpathy/char-rnn) project, which is a good indicator of whether the LSTM implementation is effective.

### Generation using a pre-trained model

Open the `applications/writer` folder but don't enter it. Run `lstm_writer_test.m` and it will start to generate text. In the first few lines of `lstm_writer_val.m` you can adjust the starting character. Currently, it starts with "I", so a typical generation looks like this:

`I can be the most programmers who would be try to them. But I was anyway that the most professors and press right. It's hard to make them things like the startups that was much their fundraising the founders who was by being worth in the side of a startup would be to be the smart with good as work with an angel round by companies and funding a lot of the partners is that they want to competitive for the top was a strange could be would be a company that was will be described startups in the paper we could probably be were the same thing that they can be some to investors...`

### Data generation and training

Paul Graham's [essays](http://www.paulgraham.com/articles.html) are used in this sample. All text is stored in `data/writer/all_text.mat` as a string. You may load it manually to see the content.
The whole text contains about 2 million characters. To generate the training data, run `data/writer/gen_char_data_from_text_2.m`. It generates four .mat files under `data/writer/graham`; each file contains 10000 character sequences of length 50, so the four files add up to 2 million characters.

Once the data is ready, you may run `lstm_writer_train.m` under `applications/writer` to start the training. During training, intermediate models are saved under `results/writer`. You may launch another Matlab instance and run `lstm_writer_test.m` with the newly saved model instead of `writer.mat` to test it.

## Multimodal LSTM for speaker naming

The training procedure of the multimodal speaker naming LSTM, as well as the pre-processed data (which you can use off the shelf), has been released. Please follow the instructions below to perform the training.

### Download data

Please go [here](https://drive.google.com/folderview?id=0B6nl_KFEGWG0QWVJakhRcEUyVDQ&usp=sharing) or [here](http://pan.baidu.com/s/1kV6KbOF) to download all the pre-processed training data and put all the files under `data/speaker-naming/processed_training_data/`, following the existing folder structure inside.

In addition, please go [here](https://drive.google.com/folderview?id=0B6nl_KFEGWG0NkdYcEduc2twQW8&usp=sharing) or [here](http://pan.baidu.com/s/1bpymRHd) to download the pre-processed multimodal validation data and put all the files under `data/speaker-naming/raw_full/`, following the existing folder structure inside.

### Start training

Once all the data is in place, you may train three types of models: a model that classifies only the face features, a model that classifies only the audio features, and a model that classifies the face+audio multimodal features simultaneously (the multimodal LSTM).

To train the face-only model, run this [script](https://github.com/jimmy-ren/vLSTM/blob/master/applications/speaker-naming/face_only/sn_face_train.m).
To train the audio-only model, run this [script](https://github.com/jimmy-ren/vLSTM/blob/master/applications/speaker-naming/audio_only/sn_audio_train.m).
To train the face+audio multimodal LSTM model, run this [script](https://github.com/jimmy-ren/vLSTM/blob/master/applications/speaker-naming/face_audio/sn_FA_5c_train_v52.m).

You can also test the three models above using the pre-trained models:

- This [script](https://github.com/jimmy-ren/vLSTM/blob/master/applications/speaker-naming/face_only/test_face_all.m) tests the pre-trained face-only model.
- This [script](https://github.com/jimmy-ren/vLSTM/blob/master/applications/speaker-naming/audio_only/test_audio_all.m) tests the pre-trained audio-only model.
- This [script](https://github.com/jimmy-ren/vLSTM/blob/master/applications/speaker-naming/face_audio/test_FA_all_v52.m) tests the pre-trained face+audio multimodal LSTM model.
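
### Appendix: a sketch of the character data layout

As a rough illustration of the data-generation step in the character-level section (four files, each holding 10000 sequences of length 50), the Python sketch below cuts a text into fixed-length sequences and groups them into per-file batches. It is not a port of `gen_char_data_from_text_2.m`. The function names `chunk_text` and `group_into_files` are hypothetical, the stand-in corpus is synthetic, and the use of consecutive, non-overlapping sequences is an assumption suggested only by the statement that the four files add up to about 2 million characters.

```python
def chunk_text(text, seq_len=50):
    """Split text into consecutive, non-overlapping sequences of seq_len characters.

    Assumption: sequences do not overlap. A trailing remainder shorter
    than seq_len is dropped.
    """
    n_full = len(text) // seq_len
    return [text[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]


def group_into_files(sequences, seqs_per_file=10000):
    """Group sequences into batches, one batch per (hypothetical) output file."""
    return [sequences[i:i + seqs_per_file]
            for i in range(0, len(sequences), seqs_per_file)]


# Synthetic stand-in corpus of exactly 2,000,000 characters.
# The real text is stored in data/writer/all_text.mat and is only
# "about" 2 million characters long.
text = "abcdefghij" * 200_000

sequences = chunk_text(text)          # 2,000,000 / 50 sequences
files = group_into_files(sequences)   # batches of 10000 sequences

print(len(sequences))  # 40000
print(len(files))      # 4
```

With these sizes the arithmetic matches the description: 4 files × 10000 sequences × 50 characters = 2,000,000 characters.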
