Near-term plans
• We have decided that the quickest path to “product-izable” models is RNN-T, specifically the transformer-transducer
• This would be, for example: a conformer encoder, an LSTM language model, and possibly an LSTM-based RNN-T decoder (see the sketch after this list)
• We want to enable combining FST and RNNLM language models, to support grammars etc. (a shallow-fusion sketch also follows the list)
• Google’s products use this kind of thing (without the FST part):
https://www.youtube.com/watch?v=eODdowVNPU4 (LTI Colloquium, Tara Sainath)
• On benchmarks like Librispeech, RNN-T in the literature gets results as good as transformer-decoder models (and is more practical)
• We are trying to switch gears rapidly and move to RNN-T models
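To make the model shape above concrete, here is a minimal transducer skeleton in PyTorch. This is a sketch only, not our actual code: the class and parameter names are hypothetical, and the plain-LSTM encoder is a stand-in for the conformer.

```python
import torch
import torch.nn as nn

class MiniTransducer(nn.Module):
    """Minimal RNN-T skeleton: encoder + LSTM predictor + joiner.

    The encoder here is a stand-in; in the plan above it would be
    a conformer. All names and sizes are illustrative only.
    """

    def __init__(self, num_features=80, vocab_size=500, dim=256):
        super().__init__()
        # Stand-in acoustic encoder (a conformer in the real plan).
        self.encoder = nn.LSTM(num_features, dim, num_layers=2,
                               batch_first=True)
        # LSTM-based prediction network (the "RNN-T decoder").
        self.embed = nn.Embedding(vocab_size, dim)
        self.predictor = nn.LSTM(dim, dim, num_layers=1,
                                 batch_first=True)
        # Joiner combines encoder frames with predictor states.
        self.joiner = nn.Sequential(nn.Tanh(),
                                    nn.Linear(dim, vocab_size))

    def forward(self, feats, tokens):
        # feats: (N, T, num_features); tokens: (N, U) label prefixes
        enc, _ = self.encoder(feats)                   # (N, T, dim)
        pred, _ = self.predictor(self.embed(tokens))   # (N, U, dim)
        # Broadcast-add to (N, T, U, dim), then project to logits.
        joint = enc.unsqueeze(2) + pred.unsqueeze(1)
        return self.joiner(joint)               # (N, T, U, vocab_size)
```

The (N, T, U, vocab) logits it produces are what a transducer loss consumes (e.g. torchaudio.functional.rnnt_loss, with the usual blank-symbol handling).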
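For the FST+RNNLM combination, the standard recipe is shallow fusion: during beam search, add weighted LM log-probabilities to the transducer’s score for each candidate token. A minimal sketch, assuming a toy dict-based grammar FST (a real system would use an FST library such as k2 or OpenFst); all names and weights here are illustrative:

```python
import math
from typing import Dict, Tuple

# Toy grammar FST: state -> {token: (next_state, log_prob)}.
GRAMMAR: Dict[int, Dict[str, Tuple[int, float]]] = {
    0: {"turn": (1, math.log(0.5)), "play": (2, math.log(0.5))},
    1: {"on": (3, 0.0)},
    2: {"music": (3, 0.0)},
}

def fused_score(am_logprob: float, rnnlm_logprob: float,
                state: int, token: str,
                fst_w: float = 0.5, rnnlm_w: float = 0.3):
    """Combine acoustic, grammar-FST, and RNNLM scores for one token.

    Returns (total_logprob, next_fst_state).  Tokens the grammar
    disallows get -inf, which prunes them from the beam, and this is
    how the FST part can enforce grammars.
    """
    arcs = GRAMMAR.get(state, {})
    if token not in arcs:
        return float("-inf"), state
    next_state, fst_lp = arcs[token]
    total = am_logprob + fst_w * fst_lp + rnnlm_w * rnnlm_logprob
    return total, next_state

# Usage: score the token "play" from the grammar's start state,
# given an acoustic log-prob of -1.2 and an RNNLM log-prob of -0.7.
score, next_state = fused_score(-1.2, -0.7, 0, "play")
```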