IMPROVED SPEAKER SEGMENTATION AND SEGMENTS CLUSTERING USING THE
BAYESIAN INFORMATION CRITERION
Alain Tritschler and Ramesh Gopinath
IBM T. J. Watson Research Center
Yorktown Heights, NY 10598, USA
email:alain@us.ibm.com
ABSTRACT
Detection of speaker, channel and environment changes in a continuous audio stream is important in various applications (e.g., broadcast news, meetings/teleconferences, etc.). Standard schemes for segmentation use a classifier and hence do not generalize to unseen speakers / channels / environments. Recently S. Chen introduced new segmentation and clustering algorithms, using the so-called BIC. This paper presents more accurate and more efficient variants of the BIC scheme for segmentation and clustering. Specifically, the new algorithms improve the speed and accuracy of segmentation and clustering and allow for a real-time implementation of simultaneous transcription, segmentation and speaker tracking.
1. INTRODUCTION
The segmentation of continuous audio is useful as a preprocessor for further classification of the segments for speaker identification/verification, noise rejection, music removal, etc. In automatic transcription applications such a segmentation scheme allows the creation and use of speaker / channel / environment-specific acoustic models for improved transcription accuracy. In several of these applications, clustering of segments from the same speaker / channel / environment is also useful. Segmentation and clustering can be used in conjunction in speaker tracking applications. Together they can be used to increase the amount of adaptation data for unsupervised adaptation of acoustic models in transcription applications. In general they allow specialized processing of the audio for specific speakers / channels / environments. This paper presents improvements (in both speed and accuracy) to algorithms for segmentation and clustering based on the Bayesian Information Criterion (BIC) introduced recently in [1]. These improvements have allowed us to create an application that concurrently segments, transcribes, identifies and tracks speakers in broadcast news audio in real time.
The paper is organized as follows: Section 2 briefly reviews the BIC, which is the key concept used in both the segmentation and clustering algorithms. Section 3 describes the new version of the segmentation algorithm and Section 4 describes improvements to the clustering algorithm. Section 5 describes how these new algorithms are incorporated in a real-time transcription, segmentation, and speaker identification and tracking system for broadcast news.
2. THE BAYESIAN INFORMATION CRITERION
BIC is an asymptotically optimal Bayesian model-selection criterion used to decide which of $p$ parametric models best represents $n$ data samples $x_1, \ldots, x_n$, $x_i \in \mathbb{R}^d$. Each model $M_j$ has a number of parameters, say $k_j$. We assume that the samples $x_i$ are independent.
According to the BIC theory [3], for sufficiently large $n$, the best model of the data is the one which maximizes

$$\mathrm{BIC}_j = \log L_j(x_1, \ldots, x_n) - \frac{1}{2} \lambda k_j \log n \tag{1}$$

with $\lambda = 1$, and where $L_j$ is the maximum likelihood of the data under model $M_j$ (i.e., the likelihood of the data with maximum likelihood values for the $k_j$ parameters of $M_j$).
In the particular case where there are only two models we have a simple test for model selection: choose the model $M_1$ over $M_2$ if $\Delta\mathrm{BIC} = \mathrm{BIC}_1 - \mathrm{BIC}_2$ is positive.
Note that BIC can also be viewed as a penalized maximum likelihood technique [3, 1].
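The two-model test of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `bic` and `prefer_first_model`, the argument names, and the numbers in the usage lines are all hypothetical and not taken from the paper; the penalty weight defaults to 1 as in the text.

```python
import math


def bic(max_log_likelihood, num_params, num_samples, lam=1.0):
    """BIC score: maximised log-likelihood minus a complexity penalty.

    `lam` is the penalty weight; 1.0 matches the value quoted in the text.
    """
    return max_log_likelihood - 0.5 * lam * num_params * math.log(num_samples)


def prefer_first_model(loglik1, k1, loglik2, k2, num_samples, lam=1.0):
    """Return True when BIC_1 - BIC_2 is positive, i.e. model 1 is selected."""
    delta = bic(loglik1, k1, num_samples, lam) - bic(loglik2, k2, num_samples, lam)
    return delta > 0


# Made-up numbers: model 2 fits slightly better but has twice the parameters.
print(prefer_first_model(-1500.0, 5, -1495.0, 10, 1000))
# Made-up numbers: model 2 fits much better, enough to pay for its parameters.
print(prefer_first_model(-1500.0, 5, -1450.0, 10, 1000))
```

In the first call the likelihood gain of 5 is smaller than the extra penalty of about 17.3, so the simpler model wins; in the second call the gain of 50 outweighs it.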
3. SEGMENTATION USING BIC
3.1. BIC for segmentation
In this paper, standard 24-dimensional mel-cepstral feature vectors generated at 10 ms intervals from the continuous audio stream form the data samples (or frames). The audio stream is from a broadcast news source sampled at 16 kHz with 16-bit PCM. The basic problem is to identify all possible frames where there is a segment boundary. Without loss of generality consider a window of consecutive data samples $\{x_1, \ldots, x_n\}$ in which there is at most one segment boundary. In this case the basic question of whether or not there is a segment boundary at frame $i$ can be cast as a model selection problem between the following two models: model $M_1$, where $\{x_1, \ldots, x_n\}$ is drawn from a single full-covariance Gaussian, and model $M_2$, where $\{x_1, \ldots, x_n\}$ is drawn from two full-covariance Gaussians, with $\{x_1, \ldots, x_i\}$ drawn from the first Gaussian and $\{x_{i+1}, \ldots, x_n\}$ drawn from the second Gaussian. Since $x_i \in \mathbb{R}^d$, model $M_1$ has $k_1 = d + \frac{d(d+1)}{2}$ parameters, while model $M_2$ has twice as many parameters ($k_2 = 2k_1$).
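As a quick check of the parameter counts for the 24-dimensional features used here, the arithmetic can be written out as below. The helper name `gaussian_param_count` is hypothetical; the formula is the one stated in the text (a mean vector plus a symmetric covariance matrix).

```python
def gaussian_param_count(d):
    """Free parameters of one full-covariance Gaussian in d dimensions:
    d for the mean plus d(d+1)/2 for the symmetric covariance matrix."""
    return d + d * (d + 1) // 2


d = 24                         # feature dimension used in the paper
k1 = gaussian_param_count(d)   # single-Gaussian model M1
k2 = 2 * k1                    # two-Gaussian model M2
print(k1, k2)                  # prints "324 648"
```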
It is straightforward to show [1] that the $i$th frame is a good candidate for a segment boundary if the expression:

$$\Delta\mathrm{BIC}_i = -\frac{n}{2}\log|\Sigma_w| + \frac{i}{2}\log|\Sigma_f| + \frac{n-i}{2}\log|\Sigma_s| + \frac{1}{2}\lambda\left(d + \frac{d(d+1)}{2}\right)\log n$$
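The expression above can be evaluated directly on a window of frames. The sketch below is a minimal, standard-library-only illustration and not the authors' implementation. It rests on assumptions that the excerpt does not spell out: the three determinants are read as those of the maximum-likelihood covariances of the whole window, of the first $i$ frames, and of the remaining $n - i$ frames, and a negative value is read as favouring the two-Gaussian model, following the sign convention of Section 2. All function names and the synthetic 2-dimensional data are hypothetical; a practical version would use a numerical library for the covariance and log-determinant.

```python
import math
import random


def log_det_covariance(frames):
    """Log-determinant of the maximum-likelihood (1/n) covariance of `frames`."""
    n, d = len(frames), len(frames[0])
    mean = [sum(x[a] for x in frames) / n for a in range(d)]
    cov = [[sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in frames) / n
            for b in range(d)] for a in range(d)]
    # Cholesky factorisation cov = L L^T, so log|cov| = 2 * sum(log L[a][a]).
    chol = [[0.0] * d for _ in range(d)]
    for a in range(d):
        for b in range(a + 1):
            s = cov[a][b] - sum(chol[a][c] * chol[b][c] for c in range(b))
            chol[a][b] = math.sqrt(s) if a == b else s / chol[b][b]
    return 2.0 * sum(math.log(chol[a][a]) for a in range(d))


def delta_bic(frames, i, lam=1.0):
    """Delta-BIC for a hypothesised boundary after the first `i` frames."""
    n, d = len(frames), len(frames[0])
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * math.log(n)
    return (-0.5 * n * log_det_covariance(frames)
            + 0.5 * i * log_det_covariance(frames[:i])
            + 0.5 * (n - i) * log_det_covariance(frames[i:])
            + penalty)


def synthetic_frames(mean, count, rng, d=2):
    """Hypothetical test data: `count` frames from a unit-variance Gaussian."""
    return [[rng.gauss(mean, 1.0) for _ in range(d)] for _ in range(count)]


rng = random.Random(0)
homogeneous = synthetic_frames(0.0, 400, rng)
changed = synthetic_frames(0.0, 200, rng) + synthetic_frames(5.0, 200, rng)

print(delta_bic(homogeneous, 200))  # positive: one Gaussian suffices
print(delta_bic(changed, 200))      # negative: boundary at frame 200 favoured
```

Each half of the window must contain enough frames for its covariance to be non-singular; otherwise the Cholesky step raises a `ValueError`.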