ATK, speech recognition software written at Cambridge University (a C++ interface to HTK)
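Cambridge's HTK is probably familiar to most people working on speech. ATK is a multi-threaded C++ wrapper around it, and it can be debugged in VC2005.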
Acknowledgements
The original ATK core was written by Steve Young in 2000/2001 following a student project
undertaken by Khe Chai Sim. In addition to a variety of bug fixes and enhancements, Matt Stuttle
implemented the Linux version in 2004 and restored HTK’s N-best output functionality in 2005. Hui
(KK) Ye implemented and tested the CMLLR support in 2005. The contributions of the EU-sponsored
Talk Project and the CMI-sponsored SCILL project during the years 2004/06 are gratefully
acknowledged.
Release 1.6 marks a significant upgrade with the inclusion of new support for synthesis and
asynchronous audio I/O management. To enable out-of-the-box synthesis for US English, an
implementation of Alan Black’s Flite is included in the ATK distribution.
Matt Stuttle, now at the Toshiba Cambridge Research Lab, continues to assist with Linux issues and he
assisted with the preparation of this release.
SJY June 2007
Contents
1 Overview
1.1 The ATK Library
1.2 Example ATK Configurations
1.3 Building an Application
2 Packets
2.1 Generic Packet Properties
2.1.1 Programming
2.2 Empty Packets
2.2.1 Programming
2.3 String Packets
2.3.1 Programming
2.4 Command Packets
2.4.1 Programming
2.5 WaveData Packets
2.5.1 Programming
2.6 Observation Packets
2.6.1 Programming
2.7 Phrase Packets
2.7.1 Programming
3 Buffers
3.1 Programming
4 Components
4.1 Programming
5 The Monitor
5.1 Programming
5.2 Configuration Variables
6 System Components
6.1 Audio Input (ASource)
6.1.1 Configuration Variables
6.1.2 Run-Time Commands
6.1.3 Programming
6.2 Coder (ACode)
6.2.1 Configuration Variables
6.2.2 Run-Time Commands
6.2.3 Programming
6.3 Recogniser (ARec)
6.3.1 Configuration Variables
6.3.2 Confidence Scoring
6.3.3 N-Gram Language Models
6.3.4 Run-Time Commands
6.3.5 Programming
6.4 Synthesis (ASyn)
6.4.1 Configuration Variables
6.4.2 Run-Time Commands
6.4.3 Programming
7 Resources (ARMan)
7.1 Resources and Resource Groups
7.1.1 Configuration Variables
7.1.2 Programming
7.2 Dictionary (ADict)
7.2.1 Configuration Variables
7.2.2 Programming
7.3 Grammar (AGram)
7.3.1 Configuration Variables
7.3.2 Programming
7.4 N-Gram Language Model (ANGram)
7.4.1 Configuration Variables
7.4.2 Programming
7.5 HMMSet (AHmms)
7.5.1 Configuration Variables
7.5.2 Programming
8 The AIO Asynchronous Input/Output Control Component
8.1 Configuration Variables
8.2 Run-Time Commands
8.3 Programming
9 Using ATK
9.1 Initialisation and Error Reporting
9.2 No-Console Mode
9.3 Resource Incompatibility Issues
9.4 ATKLib and Test Programs
9.5 Using ATK with Windows MFC
9.6 Tuning an ATK Recogniser
10 ATK Application Examples
10.1 AVite - an ATK-based version of HVite
10.2 Simple Spoken Dialogue System (SSDS)
10.3 Asynchronous Spoken Dialogue System (ASDS)
Index
1 Overview
ATK is an API designed to facilitate building experimental applications for HTK. It consists of a C++
layer sitting on top of the standard HTK libraries. This allows novel recognisers built using customised
versions of HTK to be compiled with ATK and then tested in working systems.[1] Like HTK itself, it is
supported on both Linux and Windows (both as a terminal application and an MFC application[2]).
ATK is multi-threaded. It allows a variety of components to be connected together to implement
different architectures and applications. Efficiency is a relatively low priority but the pipeline structure
is designed to reduce latency to a minimum to enable highly responsive systems to be built.
In addition to recognition using HTK models, ATK supports basic speech synthesis in English by
embedding the CMU Flite package.[3]
ATK is a flexible programming environment for building spoken dialog systems and related
applications. The use of ATK requires a reasonable level of competence in C++, experience in
building applications on Linux or Windows, and a basic understanding of speech recognition
technology.
ATK is not designed for newcomers to speech recognition or for novice programmers.
1.1 The ATK Library
The module structure of the ATK Library is shown by the dependency diagram in Figure 1. The
library modules AHTK, AGram, ANGram, ADict, and AHmms provide wrappers around the basic
HTK resources: grammars, n-grams, dictionaries and acoustic models. The modules ARMan and
AResource provide a manager for these resources.
The modules APacket, ABuffer, ATee and AComponent provide the basic types for creating
components and plumbing them together:
Packet: is a chunk of information. Packets are used for transmitting a variety of information between
asynchronously executing components. In particular, packets are used to convey various forms of
user input and output signals (speech, event markers such as mouse clicks, etc.). In these cases,
each packet has a time stamp to define the temporal span to which it relates. The types of data that
a packet can carry include text strings, waveform fragments, coded feature vectors, word labels
and semantic tags.
Buffer: is a FIFO packet queue. Buffers provide the channel for passing packets from one
component to another. Buffers can be of fixed size or unlimited size. Components wishing to
access a buffer can test to see whether the buffer operation would block before committing to the
operation.
Component: is a processing element. Each component is executed within its own individual
thread. Components communicate by passing packets via buffers. In addition, components have a
command interface which can be used to update control parameters during operation and thereby
modify the runtime behaviour of the component.
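
The relationship between the three types can be illustrated with a minimal standalone sketch in standard C++. It does not use the ATK API: `SketchPacket`, `SketchBuffer` and `run_pipeline` are hypothetical names invented for this illustration, the end-of-stream marker is an assumption of the sketch, and the real classes are documented in the Packets, Buffers and Components chapters. The sketch shows a time-stamped packet, a FIFO buffer that can be bounded or unlimited and can be tested for blocking, and a component running in its own thread.

```cpp
#include <cassert>
#include <cctype>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical packet: a payload plus the temporal span it relates to.
struct SketchPacket {
    std::string text;    // payload (a text string in this sketch)
    long start_ms;       // start of the time span
    long end_ms;         // end of the time span
    bool end_of_stream;  // sketch-only marker telling the consumer to stop
};

// Hypothetical FIFO packet queue. A capacity of 0 means unlimited size.
class SketchBuffer {
public:
    explicit SketchBuffer(std::size_t capacity = 0) : capacity_(capacity) {}

    // True if put() would block right now (a bounded buffer that is full).
    bool put_would_block() const {
        std::lock_guard<std::mutex> lock(m_);
        return capacity_ != 0 && q_.size() >= capacity_;
    }
    // True if get() would block right now (the buffer is empty).
    bool get_would_block() const {
        std::lock_guard<std::mutex> lock(m_);
        return q_.empty();
    }
    // Append a packet, waiting while a bounded buffer is full.
    void put(SketchPacket p) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] { return capacity_ == 0 || q_.size() < capacity_; });
        q_.push_back(std::move(p));
        not_empty_.notify_one();
    }
    // Remove the oldest packet, waiting while the buffer is empty.
    SketchPacket get() {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return !q_.empty(); });
        SketchPacket p = std::move(q_.front());
        q_.pop_front();
        not_full_.notify_one();
        return p;
    }

private:
    mutable std::mutex m_;
    std::condition_variable not_full_, not_empty_;
    std::deque<SketchPacket> q_;
    std::size_t capacity_;
};

// Feed words through one "component" thread that upper-cases each payload.
std::vector<std::string> run_pipeline(const std::vector<std::string>& words) {
    SketchBuffer in(2);  // fixed-size input buffer
    SketchBuffer out;    // unlimited output buffer

    // The component: its own thread, reading from `in` and writing to `out`.
    std::thread component([&in, &out] {
        for (;;) {
            SketchPacket p = in.get();
            if (p.end_of_stream) break;
            for (char& c : p.text)
                c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
            out.put(std::move(p));
        }
    });

    long t = 0;
    for (const std::string& w : words) {
        in.put({w, t, t + 100, false});  // each packet spans 100 ms
        t += 100;
    }
    in.put({"", t, t, true});  // ask the component to stop
    component.join();
    assert(in.get_would_block());  // the component consumed every packet

    // Drain the output using the non-blocking test before each read.
    std::vector<std::string> results;
    while (!out.get_would_block()) results.push_back(out.get().text);
    return results;
}
```

Because the input buffer holds at most two packets, the producer stalls in `put()` whenever the component falls behind, which is how a fixed-size buffer limits the amount of queued data. The output buffer is unlimited, so the component never stalls on its output side.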
[1] ATK uses its own version of the standard HTK libraries. These have been extended to support the
extra functionality needed by ATK. In addition, a new HTK module called HThreads provides basic
platform-independent thread support. The ATK HTK libraries cannot be used to compile HTKTools;
however, every attempt is made to keep them consistent with the latest HTK release (currently V3.4).
[2] MFC support is currently rudimentary and largely untested.
[3] See http://www.speech.cs.cmu.edu/flite