Near-term plans
• We have decided that the quickest path to “product-izable” models is RNN-T, specifically the transformer-transducer
• This would be, for example: a conformer encoder, an LSTM language model, and possibly an LSTM-based RNN-T decoder (see the sketch after this list)
• We want to enable combining FST and RNNLM language models, to support grammars etc. (a shallow-fusion sketch also follows the list)
• Google’s products use this kind of thing (without the FST part):
https://www.youtube.com/watch?v=eODdowVNPU4 (LTI Colloquium, Tara Sainath)
• On benchmarks like Librispeech, RNN-T in the literature gets results as good as transformer-decoder models (and is more practical)
• We are trying to switch gears rapidly and move to RNN-T models
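To make the model shape above concrete, here is a minimal transducer skeleton in PyTorch. This is a sketch only, not our actual code: the class and parameter names are hypothetical, and the plain-LSTM encoder is a stand-in for the conformer.

```python
import torch
import torch.nn as nn

class MiniTransducer(nn.Module):
    """Minimal RNN-T skeleton: encoder + LSTM predictor + joiner.

    The encoder here is a stand-in; in the plan above it would be
    a conformer. All names and sizes are illustrative only.
    """

    def __init__(self, num_features=80, vocab_size=500, dim=256):
        super().__init__()
        # Stand-in acoustic encoder (a conformer in the real plan).
        self.encoder = nn.LSTM(num_features, dim, num_layers=2,
                               batch_first=True)
        # LSTM-based prediction network (the "RNN-T decoder").
        self.embed = nn.Embedding(vocab_size, dim)
        self.predictor = nn.LSTM(dim, dim, num_layers=1,
                                 batch_first=True)
        # Joiner combines encoder frames with predictor states.
        self.joiner = nn.Sequential(nn.Tanh(),
                                    nn.Linear(dim, vocab_size))

    def forward(self, feats, tokens):
        # feats: (N, T, num_features); tokens: (N, U) label prefixes
        enc, _ = self.encoder(feats)                   # (N, T, dim)
        pred, _ = self.predictor(self.embed(tokens))   # (N, U, dim)
        # Broadcast-add to (N, T, U, dim), then project to logits.
        joint = enc.unsqueeze(2) + pred.unsqueeze(1)
        return self.joiner(joint)               # (N, T, U, vocab_size)
```

The (N, T, U, vocab) logits it produces are what a transducer loss consumes (e.g. torchaudio.functional.rnnt_loss, with the usual blank-symbol handling).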
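For the FST+RNNLM combination, the standard recipe is shallow fusion: during beam search, add weighted LM log-probabilities to the transducer’s score for each candidate token. A minimal sketch, assuming a toy dict-based grammar FST (a real system would use an FST library such as k2 or OpenFst); all names and weights here are illustrative:

```python
import math
from typing import Dict, Tuple

# Toy grammar FST: state -> {token: (next_state, log_prob)}.
GRAMMAR: Dict[int, Dict[str, Tuple[int, float]]] = {
    0: {"turn": (1, math.log(0.5)), "play": (2, math.log(0.5))},
    1: {"on": (3, 0.0)},
    2: {"music": (3, 0.0)},
}

def fused_score(am_logprob: float, rnnlm_logprob: float,
                state: int, token: str,
                fst_w: float = 0.5, rnnlm_w: float = 0.3):
    """Combine acoustic, grammar-FST, and RNNLM scores for one token.

    Returns (total_logprob, next_fst_state).  Tokens the grammar
    disallows get -inf, which prunes them from the beam, and this is
    how the FST part can enforce grammars.
    """
    arcs = GRAMMAR.get(state, {})
    if token not in arcs:
        return float("-inf"), state
    next_state, fst_lp = arcs[token]
    total = am_logprob + fst_w * fst_lp + rnnlm_w * rnnlm_logprob
    return total, next_state

# Usage: score the token "play" from the grammar's start state,
# given an acoustic log-prob of -1.2 and an RNNLM log-prob of -0.7.
score, next_state = fused_score(-1.2, -0.7, 0, "play")
```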