Acknowledgements
The original ATK core was written by Steve Young in 2000/2001 following a student project
undertaken by Khe Chai Sim. In addition to a variety of bug fixes and enhancements, Matt Stuttle
implemented the Linux version in 2004 and restored HTK’s N-best output functionality in 2005. Hui
(KK) Ye implemented and tested the CMLLR support in 2005. The contributions of the EU-sponsored
Talk Project and the CMI-sponsored SCILL project during the years 2004/06 are gratefully
acknowledged.
Release 1.6 marks a significant upgrade with the inclusion of new support for synthesis and
asynchronous audio I/O management. To enable out-of-the-box synthesis for US English, an
implementation of Alan Black’s Flite is included in the ATK distribution.
Matt Stuttle, now at the Toshiba Cambridge Research Lab, continues to assist with Linux issues and he
assisted with the preparation of this release.
SJY June 2007
Contents
1 Overview
1.1 The ATK Library
1.2 Example ATK Configurations
1.3 Building an Application
2 Packets
2.1 Generic Packet Properties
2.1.1 Programming
2.2 Empty Packets
2.2.1 Programming
2.3 String Packets
2.3.1 Programming
2.4 Command Packets
2.4.1 Programming
2.5 WaveData Packets
2.5.1 Programming
2.6 Observation Packets
2.6.1 Programming
2.7 Phrase Packets
2.7.1 Programming
3 Buffers
3.1 Programming
4 Components
4.1 Programming
5 The Monitor
5.1 Programming
5.2 Configuration Variables
6 System Components
6.1 Audio Input (ASource)
6.1.1 Configuration Variables
6.1.2 Run-Time Commands
6.1.3 Programming
6.2 Coder (ACode)
6.2.1 Configuration Variables
6.2.2 Run-Time Commands
6.2.3 Programming
6.3 Recogniser (ARec)
6.3.1 Configuration Variables
6.3.2 Confidence Scoring
6.3.3 N-Gram Language Models
6.3.4 Run-Time Commands
6.3.5 Programming
6.4 Synthesis (ASyn)
6.4.1 Configuration Variables
6.4.2 Run-Time Commands
6.4.3 Programming
7 Resources (ARMan)
7.1 Resources and Resource Groups
7.1.1 Configuration Variables
7.1.2 Programming
7.2 Dictionary (ADict)
7.2.1 Configuration Variables
7.2.2 Programming
7.3 Grammar (AGram)
7.3.1 Configuration Variables
7.3.2 Programming
7.4 N-Gram Language Model (ANGram)
7.4.1 Configuration Variables
7.4.2 Programming
7.5 HMMSet (AHmms)
7.5.1 Configuration Variables
7.5.2 Programming
8 The AIO Asynchronous Input/Output Control Component
8.1 Configuration Variables
8.2 Run-Time Commands
8.3 Programming
9 Using ATK
9.1 Initialisation and Error Reporting
9.2 No-Console Mode
9.3 Resource Incompatibility Issues
9.4 ATKLib and Test Programs
9.5 Using ATK with Windows MFC
9.6 Tuning an ATK Recogniser
10 ATK Application Examples
10.1 AVite - an ATK-based version of HVite
10.2 Simple Spoken Dialogue System (SSDS)
10.3 Asynchronous Spoken Dialogue System (ASDS)
Index
1 Overview
ATK is an API designed to facilitate building experimental applications for HTK. It consists of a C++
layer sitting on top of the standard HTK libraries. This allows novel recognisers built using customised
versions of HTK to be compiled with ATK and then tested in working systems.[1] Like HTK itself, it is
supported on both Linux and Windows (both as a terminal application and an MFC application[2]).
ATK is multi-threaded. It allows a variety of components to be connected together to implement
different architectures and applications. Efficiency is a relatively low priority but the pipeline structure
is designed to reduce latency to a minimum to enable highly responsive systems to be built.
In addition to recognition using HTK models, ATK supports basic speech synthesis in English by
embedding the CMU Flite package.[3]
ATK is a flexible programming environment for building spoken dialog systems and related
applications. The use of ATK requires a reasonable level of competence in C++, experience in
building applications on Linux or Windows, and a basic understanding of speech recognition
technology.
ATK is not designed for newcomers to speech recognition or for novice programmers.
1.1 The ATK Library
The module structure of the ATK Library is shown by the dependency diagram in Figure 1. The
library modules AHTK, AGram, ANGram, ADict, and AHmms provide wrappers around the basic
HTK resources: grammars, n-grams, dictionaries and acoustic models. The modules ARMan and
AResource provide a manager for these resources.
The modules APacket, ABuffer, ATee and AComponent provide the basic types for creating
components and plumbing them together:
Packet: is a chunk of information. Packets are used for transmitting a variety of information between
asynchronously executing components. In particular, packets are used to convey various forms of
user input and output signals (speech, event markers such as mouse clicks, etc.). In these cases,
each packet has a time stamp to define the temporal span to which it relates. The types of data that
a packet can carry include text strings, waveform fragments, coded feature vectors, word labels
and semantic tags.
Buffer: is a FIFO packet queue. Buffers provide the channel for passing packets from one
component to another. Buffers can be of fixed size or unlimited size. Components wishing to
access a buffer can test to see whether the buffer operation would block before committing to the
operation.
Component: is a processing element. Each component is executed within its own individual
thread. Components communicate by passing packets via buffers. In addition, components have a
command interface which can be used to update control parameters during operation and thereby
modify the runtime behaviour of the component.
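The relationship between packets, buffers and components can be illustrated with a small standalone sketch. It does not use the ATK classes themselves: SketchPacket, SketchBuffer, UppercaseComponent and RunPipelineSketch are hypothetical names invented for this illustration, built only on the standard C++ thread library, and the "last" end-of-stream flag is an assumption of the sketch rather than an ATK feature. It shows a time-stamped packet, a FIFO buffer that can be fixed-size or unlimited and can be tested for blocking, and a component running in its own thread.

```cpp
#include <cassert>
#include <cctype>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical packet: a payload plus the temporal span it relates to.
struct SketchPacket {
    std::string text;  // payload (text only in this sketch)
    double start_ms;   // start of the time span
    double end_ms;     // end of the time span
    bool last;         // end-of-stream marker (an assumption of this sketch)
};

// Hypothetical FIFO packet queue; capacity 0 means unlimited size.
class SketchBuffer {
public:
    explicit SketchBuffer(std::size_t capacity = 0) : capacity_(capacity) {}

    // True if Put would block at this moment (buffer full).
    bool PutWouldBlock() {
        std::lock_guard<std::mutex> lock(m_);
        return capacity_ != 0 && q_.size() >= capacity_;
    }
    // True if Get would block at this moment (buffer empty).
    bool GetWouldBlock() {
        std::lock_guard<std::mutex> lock(m_);
        return q_.empty();
    }
    // Appends a packet, waiting while a fixed-size buffer is full.
    void Put(SketchPacket p) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] {
            return capacity_ == 0 || q_.size() < capacity_;
        });
        q_.push_back(std::move(p));
        not_empty_.notify_one();
    }
    // Removes the oldest packet, waiting while the buffer is empty.
    SketchPacket Get() {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return !q_.empty(); });
        assert(!q_.empty());
        SketchPacket p = std::move(q_.front());
        q_.pop_front();
        not_full_.notify_one();
        return p;
    }

private:
    std::size_t capacity_;
    std::deque<SketchPacket> q_;
    std::mutex m_;
    std::condition_variable not_full_, not_empty_;
};

// Hypothetical component body: upper-cases each packet's text and
// forwards it, stopping after it has forwarded the end-of-stream packet.
void UppercaseComponent(SketchBuffer& in, SketchBuffer& out) {
    for (;;) {
        SketchPacket p = in.Get();
        for (char& c : p.text)
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        const bool done = p.last;
        out.Put(std::move(p));
        if (done) return;
    }
}

// Sends words through source -> component -> sink; returns what the sink saw.
std::vector<std::string> RunPipelineSketch(const std::vector<std::string>& words) {
    SketchBuffer in(2);  // fixed-size buffer
    SketchBuffer out;    // unlimited buffer
    std::thread worker(UppercaseComponent, std::ref(in), std::ref(out));
    double t = 0.0;
    for (const std::string& w : words) {
        in.Put(SketchPacket{w, t, t + 10.0, false});
        t += 10.0;
    }
    in.Put(SketchPacket{"", t, t, true});
    std::vector<std::string> seen;
    for (;;) {
        SketchPacket p = out.Get();
        if (p.last) break;
        seen.push_back(p.text);
    }
    worker.join();
    return seen;
}
```

Calling RunPipelineSketch with a list of words runs the component in its own thread and returns the processed words in the order they were sent. The output buffer is left unlimited so that the sending loop can never deadlock against the small fixed-size input buffer. The command interface mentioned above is not modelled here.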
[1] ATK uses its own version of the standard HTK libraries. These have been extended to support the
extra functionality needed by ATK. In addition, a new HTK module called HThreads provides basic
platform-independent thread support. The ATK HTK libraries cannot be used to compile HTKTools;
however, every attempt is made to keep them consistent with the latest HTK release (currently V3.4).
[2] MFC support is currently rudimentary and largely untested.
[3] See http://www.speech.cs.cmu.edu/flite