MATLAB VOICEBOX (matlabvoicebox resource, CSDN Library)

182 files in total

.m: 182

Uploaded 2009-05-20 22:35:56 · 275KB ZIP

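**MATLAB VOICEBOX** is a speech-processing toolbox for MATLAB written by Mike Brookes of Imperial College London. Its functions cover audio signal analysis, processing and synthesis in the MATLAB environment, and it is a widely used resource in audio engineering and speech research. The largest files in this package are:

1. **dypsa.m**: implements the DYPSA algorithm, which estimates glottal closure instants (GCIs) in voiced speech and also returns approximate glottal opening instants derived from an assumed closed-phase fraction. As its help text notes, it must be combined with a voiced/unvoiced detector to reject spurious closures in unvoiced and silent regions.
2. **gaussmix.m**: fits a Gaussian mixture model (GMM) to data. A GMM represents a complex probability distribution as a weighted sum of Gaussian components and is widely used for acoustic modelling in speech and speaker recognition.
3. **fxrapt.m**: a pitch (fundamental frequency) tracker based on the RAPT algorithm (Robust Algorithm for Pitch Tracking). Pitch contours are an important feature for both recognition and synthesis.
4. **estnoisem.m**: estimates the background noise power spectrum of a noisy speech signal using the minimum-statistics method. Accurate noise estimates are needed by enhancement algorithms and can improve recognition performance in noise.
5. **readsfs.m**: reads files in the SFS (Speech Filing System) format.
6. **readsph.m**: reads audio files in the NIST SPHERE format, which is used by corpora such as TIMIT.
7. **specsub.m**: speech enhancement by spectral subtraction, a common denoising method that subtracts an estimate of the noise spectrum from the noisy spectrum to recover the clean speech.
8. **activlev.m**: measures the active speech level as defined in ITU-T Recommendation P.56, i.e. the level averaged only over the intervals in which speech is present, so that pauses do not bias the result.
9. **ssubmmse.m**: speech enhancement using a minimum mean square error (MMSE) estimate of the short-time spectral amplitude.
10. **Contents.m**: the toolbox contents file, listing the VOICEBOX functions with a short description of each.

Together these functions give researchers and engineers a broad set of speech-processing tools covering file input, pre-processing, feature extraction, model training and signal synthesis, with applications in speech recognition, speech synthesis, noise suppression and audio analysis.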
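Items 4 and 7 describe a common enhancement chain: estimate the noise power spectrum, then subtract it from the noisy spectrum. The stdlib-only Python sketch below shows the idea on precomputed per-bin power values. It is not the VOICEBOX code: the function names `min_track_noise` and `spectral_subtract` and the `alpha`/`beta` defaults are hypothetical, and the noise estimator is a crude smoothed-minimum tracker that only hints at the minimum-statistics approach of estnoisem.m (it has no bias compensation and no sliding search window).

```python
import math


def min_track_noise(frames_power, smooth=0.8):
    """Crude noise-PSD estimate (hypothetical helper).

    Recursively smooths each frequency bin's power over time and keeps the
    running minimum of the smoothed values. This only illustrates the idea
    behind minimum statistics; it omits bias compensation and windowing.
    """
    smoothed = list(frames_power[0])
    noise = list(frames_power[0])
    for frame in frames_power[1:]:
        for k, p in enumerate(frame):
            smoothed[k] = smooth * smoothed[k] + (1 - smooth) * p
            noise[k] = min(noise[k], smoothed[k])
    return noise


def spectral_subtract(noisy_power, noise_power, alpha=1.0, beta=0.01):
    """Power spectral subtraction (hypothetical helper).

    alpha is an over-subtraction factor and beta a spectral floor, both
    relative to the noisy power. Returns the per-bin gain to apply to the
    noisy spectral magnitude.
    """
    gains = []
    for p, n in zip(noisy_power, noise_power):
        clean = p - alpha * n
        # Floor the estimate instead of letting it go negative or to zero.
        clean = max(clean, beta * p)
        gains.append(math.sqrt(clean / p) if p > 0 else 0.0)
    return gains
```

For example, with noisy power `[4.0, 1.0]` and a noise estimate of `[1.0, 1.0]`, `spectral_subtract` returns gains of about 0.866 and 0.1: the second bin is held at the spectral floor rather than driven to zero. Flooring at a fraction of the noisy power is a standard way to limit the "musical noise" that bare subtraction produces.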

MATLAB VOICE BOX (182 files)

dypsa.m 26KB

gaussmix.m 22KB

fxrapt.m 16KB

estnoisem.m 15KB

readsfs.m 15KB

readsph.m 11KB

specsub.m 10KB

activlev.m 10KB

ssubmmse.m 10KB

Contents.m 9KB

writewav.m 9KB

readwav.m 8KB

voicebox.m 8KB

gaussmixp.m 8KB

readaif.m 8KB

kmeanhar.m 8KB

gaussmixd.m 7KB

gausprod.m 7KB

windows.m 6KB

maxgauss.m 5KB

kmeans.m 5KB

writehtk.m 5KB

readhtk.m 5KB

fram2wav.m 5KB

randvec.m 5KB

lpccovar.m 5KB

melcepst.m 5KB

txalign.m 5KB

gmmlpdf.m 5KB

histndim.m 5KB

distisar.m 5KB

lpcconv.m 4KB

lpcauto.m 4KB

momfilt.m 4KB

findpeaks.m 4KB

distitar.m 4KB

entropy.m 4KB

importsii.m 4KB

distchar.m 4KB

distispf.m 4KB

disteusq.m 4KB

specsubm.m 4KB

distitpf.m 4KB

rotqr2eu.m 3KB

lpcifilt.m 3KB

maxfilt.m 3KB

melbankm.m 3KB

readcnx.m 3KB

distchpf.m 3KB

glotlf.m 3KB

lpcrf2ar.m 3KB

zoomfft.m 3KB

bitsprec.m 3KB

ldatrace.m 3KB

rotro2eu.m 3KB

unixwhich.m 3KB

readau.m 3KB

schmitt.m 3KB

rotro2qr.m 3KB

figbolden.m 3KB

lpcar2am.m 3KB

peak2dquad.m 3KB

lpcar2fm.m 3KB

soundspeed.m 3KB

dlyapsq.m 3KB

rotqr2ro.m 3KB

polygonxline.m 3KB

randfilt.m 3KB

frq2bark.m 3KB

randiscr.m 2KB

frac2bin.m 2KB

frq2erb.m 2KB

dualdiag.m 2KB

irdct.m 2KB

huffman.m 2KB

ewgrpdel.m 2KB

erb2frq.m 2KB

finishat.m 2KB

rotation.m 2KB

midi2frq.m 2KB

lpcrr2am.m 2KB

spgrambw.m 2KB

roteu2qr.m 2KB

frq2mel.m 2KB

mel2frq.m 2KB

choosenk.m 2KB

polygonwind.m 2KB

zerocros.m 2KB

lin2pcma.m 2KB

meansqtf.m 2KB

pcma2lin.m 2KB

rotro2pl.m 2KB

rotqr2mr.m 2KB

enframe.m 2KB

lpcrf2rr.m 2KB

cep2pow.m 2KB

rdct.m 2KB

rotpl2ro.m 2KB

frq2midi.m 2KB

lognmpdf.m 2KB

dypsa.m (preview, truncated):

```matlab
function [gci,goi] = dypsa(s,fs)
%DYPSA   Derive glottal closure instants from speech [gci,goi] = (s,fs)
%   Note: Needs to be combined with a voiced-voiceless detector to eliminate
%   spurious closures in unvoiced and silent regions.
%
%   Inputs:
%   s     is the speech signal
%   fs    is the sampling frequency
%
%   Outputs:
%   gci   is a vector of glottal closure sample numbers
%   goi   is a vector of glottal opening sample numbers derived from
%         an assumed constant closed-phase fraction
%
%   References:
%   [1]  P. A. Naylor, A. Kounoudes, J. Gudnason, and M. Brookes, "Estimation of Glottal Closure
%        Instants in Voiced Speech using the DYPSA Algorithm," IEEE Trans on Speech and Audio
%        Processing, vol. 15, pp. 34-43, Jan. 2007.
%   [2]  M. Brookes, P. A. Naylor, and J. Gudnason, "A Quantitative Assessment of Group Delay Methods
%        for Identifying Glottal Closures in Voiced Speech," IEEE Trans on Speech & Audio Processing,
%        vol. 14, no. 2, pp. 456-466, Mar. 2006.
%   [3]  A. Kounoudes, P. A. Naylor, and M. Brookes, "The DYPSA algorithm for estimation of glottal
%        closure instants in voiced speech," in Proc ICASSP 2002, vol. 1, Orlando, 2002, pp. 349-352.
%   [4]  C. Ma, Y. Kamp, and L. F. Willems, "A Frobenius norm approach to glottal closure detection
%        from the speech signal," IEEE Trans. Speech Audio Processing, vol. 2, pp. 258-265, Apr. 1994.
%   [5]  A. Kounoudes, "Epoch Estimation for Closed-Phase Analysis of Speech," PhD Thesis,
%        Imperial College, 2001.

%   Algorithm Parameters
%       The following parameters are defined in voicebox()
%
%   dy_cpfrac=0.3;      % presumed closed phase fraction of larynx cycle
%   dy_cproj=0.2;       % cost of projected candidate
%   dy_cspurt=-0.45;    % cost of a talkspurt
%   dy_dopsp=1;         % Use phase slope projection (1) or not (0)?
%   dy_ewdly=0.0008;    % window delay for energy cost function term [~ energy peak delay from closure] (sec)
%   dy_ewlen=0.003;     % window length for energy cost function term (sec)
%   dy_ewtaper=0.001;   % taper length for energy cost function window (sec)
%   dy_fwlen=0.00045;   % window length used to smooth group delay (sec)
%   dy_fxmax=500;       % max larynx frequency (Hz)
%   dy_fxmin=50;        % min larynx frequency (Hz)
%   dy_fxminf=60;       % min larynx frequency (Hz) [used for Frobenius norm only]
%   dy_gwlen=0.0030;    % group delay evaluation window length (sec)
%   dy_lpcdur=0.020;    % lpc analysis frame length (sec)
%   dy_lpcn=2;          % lpc additional poles
%   dy_lpcnf=0.001;     % lpc poles per Hz (1/Hz)
%   dy_lpcstep=0.010;   % lpc analysis step (sec)
%   dy_nbest=5;         % Number of NBest paths to keep
%   dy_preemph=50;      % pre-emphasis filter frequency (Hz) (to avoid preemphasis, make this very large)
%   dy_spitch=0.2;      % scale factor for pitch deviation cost
%   dy_wener=0.3;       % DP energy weighting
%   dy_wpitch=0.5;      % DP pitch weighting
%   dy_wslope=0.1;      % DP group delay slope weighting
%   dy_wxcorr=0.8;      % DP cross correlation weighting
%   dy_xwlen=0.01;      % cross-correlation length for waveform similarity (sec)

%   Revision History:
%
%   3.0 - 29 Jun 2006  - Rewrote DP function to improve speed
%   2.6 - 29 Jun 2006  - Tidied up algorithm parameters
%   2.4 - 10 Jun 2006  - Made into a single file and put into VOICEBOX
%   2.3 - 18 Mar 2005  - Removed 4kHz filtering of phase-slope function
%   2.2 - 05 Oct 2004  - dpgci uses the slopes returned from xewgrdel
%                      - gdwav from speech with fs<9000 is not filtered
%                      - Various outputs and inputs of functions have been
%                        removed since now there is no plotting
%   1.0 - 30 Jan 2001  - Initial version [5]

%   Bugs:
%         1. Allow the projections only to extend to the end of the larynx cycle
%         2. Compensate for false pitch period cost at the start of a voicespurt
%         3. Should include energy and phase-slope costs for the first closure of a voicespurt
%         4. Should delete candidates that are too close to the beginning or end of speech for the cost measures;
%            currently this is 200 samples fixed in the main routine but it should adapt to window lengths of
%            cross-correlation, lpc and energy measures.
%         5. Should have an integrated voiced/voiceless detector
%         6. Allow dypsa to be called in chunks for a long speech file
%         7. Do forward & backward passes to allow for gradient descent and/or discriminative training
%         8. Glottal opening approximation does not really belong in this function
%         9. The cross correlation window is asymmetric (and overcomplex) for no very good reason
%        10. Incorporate -0.5 factor into dy_wxcorr and abolish erroneous (nx2-1)/(nx2-2) factor
%        11. Add in talkspurt cost at the beginning rather than the end of a spurt (more efficient)
%        12. Remove qmin>2 condition from voicespurt start detection (DYPSA 2 compatibility) in two places
%        13. Include energy and phase-slope costs at the start of a voicespurt
%        14. Single-closure voicespurt are only allowed if nbest=1 (should always be forbidden)
%        15. Penultimate closure candidate is always accepted
%        16. Final element of gcic, Cfn and Ch is unused
%        17. Needs to cope better with irregular voicing (e.g. creaky voice)
%        18. Should give non-integer GCI positions for improved accuracy
%        19. Remove constraint that first voicespurt cannot begin until qrmax after the first candidate

%      Copyright (C) Tasos Kounoudes, Jon Gudnason, Patrick Naylor and Mike Brookes 2006
%      Version: $Id: dypsa.m,v 3.6 2007/05/04 07:01:38 dmb Exp $
%
%   VOICEBOX is a MATLAB toolbox for speech processing.
%   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%   This program is free software; you can redistribute it and/or modify
%   it under the terms of the GNU General Public License as published by
%   the Free Software Foundation; either version 2 of the License, or
%   (at your option) any later version.
%
%   This program is distributed in the hope that it will be useful,
%   but WITHOUT ANY WARRANTY; without even the implied warranty of
%   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
%   GNU General Public License for more details.
%
%   You can obtain a copy of the GNU General Public License from
%   http://www.gnu.org/copyleft/gpl.html or by writing to
%   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Extract algorithm constants from VOICEBOX

dy_preemph=voicebox('dy_preemph');
dy_lpcstep=voicebox('dy_lpcstep');
dy_lpcdur=voicebox('dy_lpcdur');
dy_dopsp=voicebox('dy_dopsp');          % Use phase slope projection (1) or not (0)?
dy_ewtaper=voicebox('dy_ewtaper');      % Prediction order of FrobNorm method in seconds
dy_ewlen=voicebox('dy_ewlen');          % windowlength of FrobNorm method in seconds
dy_ewdly=voicebox('dy_ewdly');          % shift for asymmetric speech shape at start of voiced cycle
dy_cpfrac=voicebox('dy_cpfrac');        % presumed ratio of larynx cycle that is closed
dy_lpcnf=voicebox('dy_lpcnf');          % lpc poles per Hz (1/Hz)
dy_lpcn=voicebox('dy_lpcn');            % lpc additional poles

lpcord=ceil(fs*dy_lpcnf+dy_lpcn);       % lpc poles

%PreEmphasise input speech
s_used=filter([1 -exp(-2*pi*dy_preemph/fs)],1,s);

% perform LPC analysis, AC method with Hamming windowing
[ar, e, k] = lpcauto(s_used,lpcord,flo
```
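The opening steps of dypsa.m (first-order pre-emphasis via `filter([1 -exp(-2*pi*dy_preemph/fs)],1,s)`, the LPC order `ceil(fs*dy_lpcnf+dy_lpcn)`, and autocorrelation-method LPC with a Hamming window) can be sketched in stdlib-only Python as follows. This is an illustration written for this page, not VOICEBOX code: the names `preemphasis`, `lpc_order` and `autocorr_lpc` are hypothetical, the defaults are taken from the parameter list in the header (`dy_preemph=50`, `dy_lpcnf=0.001`, `dy_lpcn=2`), and `autocorr_lpc` analyses one frame with a Levinson-Durbin recursion rather than reproducing the framing done by `lpcauto`.

```python
import math


def preemphasis(x, fs, f_pre=50.0):
    """y[n] = x[n] - a*x[n-1] with a = exp(-2*pi*f_pre/fs), x[-1] = 0.

    Mirrors filter([1 -exp(-2*pi*dy_preemph/fs)],1,s) in dypsa.m.
    """
    a = math.exp(-2 * math.pi * f_pre / fs)
    y, prev = [], 0.0
    for v in x:
        y.append(v - a * prev)
        prev = v
    return y


def lpc_order(fs, poles_per_hz=0.001, extra=2):
    """LPC order as in dypsa.m: ceil(fs*dy_lpcnf + dy_lpcn)."""
    return math.ceil(fs * poles_per_hz + extra)


def autocorr_lpc(frame, order):
    """Autocorrelation-method LPC on one Hamming-windowed frame.

    Solves the normal equations with the Levinson-Durbin recursion and
    returns (a, err): a[0] = 1 so the inverse filter is A(z) = sum a[i] z^-i,
    and err is the final prediction-error energy.
    """
    n = len(frame)
    w = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    xw = [f * wi for f, wi in zip(frame, w)]
    # Biased autocorrelation r[0..order] of the windowed frame.
    r = [sum(xw[i] * xw[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    if r[0] == 0:
        raise ValueError("all-zero frame")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        a_new = a[:]
        for j in range(1, i):
            a_new[j] = a[j] + k * a[i - j]
        a_new[i] = k
        a = a_new
        err *= 1 - k * k                    # error energy shrinks each order
    return a, err
```

With the default parameters, an 8 kHz signal gets a 10th-order predictor (`ceil(8000*0.001 + 2) = 10`) and a 16 kHz signal an 18th-order one, so the model order grows with the bandwidth that has to be described.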
