A_PyTorch_implementation_of_Speech

46 files in total

py: 26

sh: 8

txt: 2

Uploaded 2024-08-24 23:53:07, 672KB ZIP

A_PyTorch_implementation_of_Speech_Transformer,_an_Speech-Transformer.zip (46 files)

DataXujing-Speech-Transformer-8cd4e0f/
  tools/
    Makefile 612B
    kaldi-io-for-python.tar.gz 562KB
  src/
    __init__.py 0B
    transformer/
      __init__.py 0B
      loss.py 2KB
      module.py 2KB
      decoder.py 10KB
      encoder.py 3KB
      transformer.py 4KB
      attention.py 3KB
      optimizer.py 2KB
    data/
      __init__.py 0B
      data.py 7KB
    utils/
      utils.py 4KB
      __init__.py 0B
      text2token.py 3KB
      mergejson.py 3KB
      scp2json.py 788B
      data2json.sh 2KB
      filt.py 937B
      json2trn.py 2KB
      dump.sh 2KB
    bin/
      recognize.py 3KB
      train.py 7KB
    solver/
      __init__.py 0B
      solver.py 8KB
  egs/
    aishell/
      steps 34B
      utils 34B
      cmd.sh 1KB
      local/
        score.sh 653B
        aishell_data_prep.sh 2KB
      path.sh 693B
      figures/
        train-k0.2-bf15000-shuffle-ls0.1.png 36KB
        train-k0.2-bf15000-shuffle-ls0.1-lr.png 28KB
      run.sh 6KB
      conf/
        fbank.conf 43B
  requirements.txt 6B
  test/
    data/
      train_nodup_sp_units.txt 31KB
      data.json 9KB
    test_data.py 1KB
    path.sh 48B
    test_decode.py 2KB
    learn_visdom.py 741B
    learn_pytorch.py 3KB
  .gitignore 168B
  README.md 3KB

# Speech Transformer: End-to-End ASR with Transformer

A PyTorch implementation of Speech Transformer [1], an end-to-end automatic speech recognition system built on the [Transformer](https://arxiv.org/abs/1706.03762) network, which directly converts acoustic features to a character sequence using a single neural network.

## Install

- Python3 (recommend Anaconda)
- PyTorch 0.4.1+
- [Kaldi](https://github.com/kaldi-asr/kaldi) (just for feature extraction)
- `pip install -r requirements.txt`
- `cd tools; make KALDI=/path/to/kaldi`
- If you want to run `egs/aishell/run.sh`, download the [aishell](http://www.openslr.org/33/) dataset for free.

## Usage

### Quick start

```bash
$ cd egs/aishell
# Modify aishell data path to your path in the beginning of run.sh
$ bash run.sh
```

That's all!

You can change a parameter with `$ bash run.sh --parameter_name parameter_value`, e.g., `$ bash run.sh --stage 3`. The parameter names are defined in `egs/aishell/run.sh` before `. utils/parse_options.sh`.

### Workflow

Workflow of `egs/aishell/run.sh`:

- Stage 0: Data Preparation
- Stage 1: Feature Generation
- Stage 2: Dictionary and Json Data Preparation
- Stage 3: Network Training
- Stage 4: Decoding

### More detail

`egs/aishell/run.sh` provides example usage.

```bash
# Set PATH and PYTHONPATH
$ cd egs/aishell/; . ./path.sh
# Train
$ train.py -h
# Decode
$ recognize.py -h
```

#### How to visualize loss?

If you want to visualize your loss, you can use [visdom](https://github.com/facebookresearch/visdom):

1. Open a new terminal on your remote server (recommend tmux) and run `$ visdom`.
2. Open a new terminal and run `$ bash run.sh --visdom 1 --visdom_id "<any-string>"` or `$ train.py ... --visdom 1 --visdom_id "<any-string>"`.
3. Open your browser and go to `<your-remote-server-ip>:8097`, e.g., `127.0.0.1:8097`.
4. On the visdom page, choose `<any-string>` in `Environment` to see your loss.

![loss](egs/aishell/figures/train-k0.2-bf15000-shuffle-ls0.1.png)

#### How to resume training?

```bash
$ bash run.sh --continue_from <model-path>
```

#### How to solve out of memory?

If it happens during training, try reducing `batch_size`: `$ bash run.sh --batch_size <lower-value>`.

## Results

| Model | CER | Config |
| :---: | :-: | :----: |
| LSTMP | 9.85 | 4x(1024-512). See [kaldi-ktnet1](https://github.com/kaituoxu/kaldi-ktnet1/blob/ktnet1/egs/aishell/s5/local/nnet1/run_4lstm.sh) |
| Listen, Attend and Spell | 13.2 | See [Listen-Attend-Spell](https://github.com/kaituoxu/Listen-Attend-Spell)'s egs/aishell/run.sh |
| SpeechTransformer | 12.8 | See egs/aishell/run.sh |

## Reference

- [1] Yuanyuan Zhao, Jie Li, Xiaorui Wang, and Yan Li. "The SpeechTransformer for Large-scale Mandarin Chinese Speech Recognition." ICASSP 2019.
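
A note on the CER column in the Results table: CER (character error rate) is commonly computed as the character-level edit distance between each reference transcript and its hypothesis, summed over the test set and divided by the total number of reference characters. The sketch below illustrates that definition in plain Python. It is not the repository's scoring script (`egs/aishell/local/score.sh`); the function names `edit_distance` and `cer` are hypothetical, and reporting the result as a percentage is an assumption about how the table's figures are expressed.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level character error rate, in percent (hypothetical helper)."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return 100.0 * errors / total_chars


if __name__ == "__main__":
    refs = ["今天天气很好", "abcd"]
    hyps = ["今天天气真好", "abcd"]
    print(f"CER: {cer(refs, hyps):.2f}%")
```

Because the errors are pooled over the whole corpus before dividing, long utterances weigh more than short ones, which is the usual convention for reported ASR error rates.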
