PocketSphinx 5.0.0
==================
This is PocketSphinx, one of Carnegie Mellon University's open source large
vocabulary, speaker-independent continuous speech recognition engines.
Although this was at one point a research system, active development
has largely ceased and it has become very, very far from the state of
the art. I am making a release, because people are nonetheless using
it, and there are a number of historical errors in the build system
and API which needed to be corrected.
The version number is strangely large because there was a "release"
that people are using called 5prealpha, and we will use proper
[semantic versioning](https://semver.org/) from now on.
**Please see the LICENSE file for terms of use.**
Installation
------------
We now use CMake for building, which should give reasonable results
across Linux and Windows. Not certain about Mac OS X because I don't
have one of those. In addition, the audio library, which never really
built or worked correctly on any platform at all, has simply been
removed.
There is no longer any dependency on SphinxBase. There is no
SphinxBase anymore. This is not the SphinxBase you're looking for.
All your SphinxBase are belong to us.
To install the Python module in a virtual environment (replace
`~/ve_pocketsphinx` with the virtual environment you wish to create),
from the top level directory:
```
python3 -m venv ~/ve_pocketsphinx
. ~/ve_pocketsphinx/bin/activate
pip install .
```
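Once `pip install .` has finished, one quick way to confirm that the module landed in the active virtual environment is to ask Python for the installed version. This is only a minimal sketch: it assumes the distribution is registered under the name `pocketsphinx`, and `installed_version` is a helper name made up for this example.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of dist_name, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the package installed by `pip install .` is named "pocketsphinx".
version = installed_version("pocketsphinx")
if version is None:
    print("pocketsphinx is not installed in this environment")
else:
    print("pocketsphinx", version, "is installed")
```

Run it with the virtual environment activated; otherwise it reports on whichever Python happens to be first on your path.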
To install the C library and bindings (assuming you have access to
/usr/local - if not, use `-DCMAKE_INSTALL_PREFIX` to set a different
prefix in the first `cmake` command below):
```
cmake -S . -B build
cmake --build build
cmake --build build --target install
```
Usage
-----
The `pocketsphinx` command-line program reads single-channel 16-bit
PCM audio from standard input or one or more files, and attempts to
recognize speech in it using the default acoustic and language models.
It accepts a large number of options which you probably don't care
about, a *command* which defaults to `live`, and one or more inputs
(except in `align` mode), or `-` to read from standard input.
If you have a single-channel WAV file called "speech.wav" and you want
to recognize speech in it, you can try doing this (the results may not
be wonderful):

    pocketsphinx single speech.wav

If your input is in some other format I suggest converting it with
`sox` as described below.
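If you are not sure whether a WAV file is already single-channel 16-bit PCM, you can inspect its header before deciding whether to convert it. The following is a small sketch using Python's standard `wave` module; `describe_wav` and `is_mono_16bit` are names invented for this example, and the sample rate is only reported, not checked, since the expected rate depends on the model.

```python
import wave


def describe_wav(path):
    """Return (channels, bits_per_sample, sample_rate) read from a WAV header."""
    # wave.open raises wave.Error for files that are not uncompressed PCM WAV.
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate()


def is_mono_16bit(path):
    """True if the file is single-channel 16-bit PCM."""
    channels, bits, _rate = describe_wav(path)
    return channels == 1 and bits == 16


# Example use (assumes a file called speech.wav exists):
#     if not is_mono_16bit("speech.wav"):
#         print("convert it with sox first")
```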
The commands are as follows:
- `help`: Print a long list of those options you don't care about.
- `config`: Dump configuration as JSON to standard output (can be
loaded with the `-config` option).
- `live`: Detect speech segments in each input, run recognition
on them (using those options you don't care about), and write the
results to standard output in line-delimited JSON. I realize this
isn't the prettiest format, but it sure beats XML. Each line
contains a JSON object with these fields, which have short names
to make the lines more readable:
  - `b`: Start time in seconds, from the beginning of the stream
  - `d`: Duration in seconds
  - `p`: Estimated probability of the recognition result, i.e. a
    number between 0 and 1 representing the likelihood of the input
    according to the model
  - `t`: Full text of recognition result
  - `w`: List of segments (usually words), each of which in turn
    contains the `b`, `d`, `p`, and `t` fields, for start, duration,
    probability, and the text of the word. If `-phone_align yes`
    has been passed, then each segment also contains a `w` field
    holding its phone segmentation, in the same format.
- `single`: Recognize each input as a single utterance, and write a
JSON object in the same format described above.
- `align`: Align a single input file (or `-` for standard input) to
  a word sequence, and write a JSON object in the same format
  described above. The first positional argument is the input, and
  all subsequent ones are concatenated to make the text, to avoid
  surprises if you forget to quote it. You are responsible for
  normalizing the text to remove punctuation, uppercase, centipedes,
  etc. For example:

      pocketsphinx align goforward.wav "go forward ten meters"

  By default, only word-level alignment is done. To get phone
  alignments, pass `-phone_align yes` in the flags, e.g.:

      pocketsphinx -phone_align yes align audio.wav $text

  This output is not particularly readable, but you can use
  [jq](https://stedolan.github.io/jq/) to clean it up. For example,
  you can get just the word names and start times like this:

      pocketsphinx align audio.wav $text | jq '.w[]|[.t,.b]'

  Or you could get the phone names and durations like this:

      pocketsphinx -phone_align yes align audio.wav $text | jq '.w[]|.w[]|[.t,.d]'

  There are many, many other possibilities, of course.
- `soxflags`: Return arguments to `sox` which will create the
  appropriate input format. Note that because the `sox`
  command-line is slightly quirky these must always come *after* the
  filename or `-d` (which tells `sox` to read from the microphone).
  You can run live recognition like this:

      sox -d $(pocketsphinx soxflags) | pocketsphinx -

  or decode from a file named "audio.mp3" like this:

      sox audio.mp3 $(pocketsphinx soxflags) | pocketsphinx -
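If you would rather not use `jq`, the same line-delimited JSON can be picked apart in a few lines of Python. The sketch below mirrors the two `jq` filters shown for `align`. The sample line is made up to match the documented field names (`b`, `d`, `p`, `t`, `w`); its numbers are illustrative and not real recognizer output, and the helper names are invented for this example.

```python
import json

# Made-up sample in the documented format; values are illustrative only.
SAMPLE = (
    '{"b": 0.0, "d": 1.2, "p": 0.9, "t": "go forward", "w": ['
    '{"b": 0.1, "d": 0.3, "p": 0.95, "t": "go", "w": ['
    '{"b": 0.1, "d": 0.1, "p": 0.9, "t": "G"}, '
    '{"b": 0.2, "d": 0.2, "p": 0.9, "t": "OW"}]}, '
    '{"b": 0.4, "d": 0.7, "p": 0.92, "t": "forward"}]}'
)


def parse_lines(lines):
    """Parse line-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def word_starts(result):
    """Like jq '.w[]|[.t,.b]': (text, start) for each word segment."""
    return [(seg["t"], seg["b"]) for seg in result.get("w", [])]


def phone_durations(result):
    """Like jq '.w[]|.w[]|[.t,.d]': (text, duration) for each phone segment.

    Words without a nested "w" list (no -phone_align yes) contribute nothing.
    """
    return [
        (phone["t"], phone["d"])
        for seg in result.get("w", [])
        for phone in seg.get("w", [])
    ]


results = parse_lines([SAMPLE])
print(word_starts(results[0]))
print(phone_durations(results[0]))
```

To use it on real output, pass the lines read from the program's standard output to `parse_lines` instead of the sample.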
By default only errors are printed to standard error, but if you want
more information you can pass `-loglevel INFO`. Partial results are
not printed; maybe they will be in the future, but don't hold your
breath.
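Because log messages go to standard error, standard output should contain only the JSON, which makes the program easy to drive from a script. Here is a rough sketch of doing that from Python with `subprocess`. It assumes the `pocketsphinx` program is on your `PATH`; `build_command` and `recognize_single` are hypothetical helper names, not part of any PocketSphinx API.

```python
import json
import subprocess


def build_command(command, inputs, options=()):
    """Build an argv list: options first, then the command, then the inputs."""
    return ["pocketsphinx", *options, command, *inputs]


def recognize_single(path, loglevel=None):
    """Run `pocketsphinx single PATH` and return the parsed JSON result.

    Assumes the pocketsphinx program is on PATH. Raises
    subprocess.CalledProcessError if the program exits with an error.
    """
    options = ["-loglevel", loglevel] if loglevel else []
    argv = build_command("single", [path], options)
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(done.stdout)


# Example use (needs pocketsphinx installed and a file called speech.wav):
#     result = recognize_single("speech.wav", loglevel="INFO")
#     print(result["t"])
```

Passing the arguments as a list avoids shell quoting problems, which matters most for `align`, where the text may contain spaces.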
Programming
-----------
For programming, see the [examples directory](./examples/) for a
number of examples of using the library from C and Python. You can
also read the [documentation for the Python
API](https://pocketsphinx.readthedocs.io) or [the C
API](https://cmusphinx.github.io/doc/pocketsphinx/).
Authors
-------
PocketSphinx is ultimately based on `Sphinx-II`, which in turn was
based on some older systems at Carnegie Mellon University, which were
released as free software under a BSD-like license thanks to the
efforts of Kevin Lenzo. Much of the decoder in particular was written
by Ravishankar Mosur (look for "rkm" in the comments), but various
other people contributed as well; see [the AUTHORS file](./AUTHORS)
for more details.
David Huggins-Daines (the author of this document) is
guilty^H^H^H^H^Hresponsible for creating `PocketSphinx` which added
various speed and memory optimizations, fixed-point computation, JSGF
support, portability to various platforms, and a somewhat coherent
API. He then disappeared for a while.
Nickolay Shmyrev took over maintenance for quite a long time
afterwards, and a lot of code was contributed by Alexander Solovets,
Vyacheslav Klimkov, and others.
Currently this is maintained by David Huggins-Daines again.