Reduced-Rank Spectra and Minimum-Entropy Priors as Consistent and
Reliable Cues for Generalized Sound Recognition
Michael A. Casey
MERL, Cambridge Research Laboratory
casey@merl.com
Abstract
We propose a generalized sound recognition system that uses
reduced-dimension log-spectral features and a minimum-entropy hidden Markov model classifier. The proposed
system addresses the major challenges of generalized sound
recognition—namely, selecting robust acoustic features and
finding models that perform well across diverse sound types.
To test the generality of the methods, we sought sound classes
consisting of time-localized events, sequences, textures and
mixed scenes. In other words, no assumptions on signal
composition were imposed on the corpus.
Comparison between the proposed system and conventional
maximum likelihood training showed that minimum-entropy
models yielded superior performance in a 20-class recognition
experiment. The experiment tested discrimination between
speech, non-speech utterances, environmental sounds, general
sound effects, animal sounds, musical instruments and
commercial music recordings.
1. Introduction
1.1. Generalized Sound Recognition
There are many uses for generalized sound recognition in
audio applications. For example, robust speech/non-speech
classifiers may be used to enhance the performance of
automatic speech recognition systems, and classifiers that
recognize ambient acoustic sources may provide signal-to-
noise ratio estimates for missing-feature methods.
Additionally, audio and video recordings may be indexed and searched using such classifiers, with the model state variables used for fast query-by-example retrieval from large general audio databases.
1.2. Previous Work
With each type of classifier comes the task of finding robust
features that yield classifications with high accuracy on novel
data sets. Previous work on non-speech audio classification
has addressed recognition of audio sources using ad hoc collections of features that are tested and fine-tuned for a specific classification task.
Such audio classification systems generally employ front-end processing to encode salient acoustic information, such as fundamental frequency, attack time and spectral centroid.
These features are often subjected to further analysis to find an optimal set for a given task, such as speech/music discrimination, musical instrument identification and sound effects recognition [1][2][3]. Whilst each of these systems performs satisfactorily in its own right, none generalizes beyond its intended application because of the prior assumptions made about the structure and composition of the input signals. For example, fundamental frequency assumes periodicity and, along with the spectral centroid, also assumes that the observable signal was produced by a single source.
Here, we are concerned with general methods that can be applied uniformly and accurately across diverse source classification tasks; this is the goal of generalized sound recognition (GSR). An acceptable criterion for GSR performance is greater than 90% recognition accuracy for a multi-way classifier tested on novel data.
2. Maximally Informative Features
Machine learning systems are dependent upon the choice of
representation of the input data. A common starting point for
audio analysis is frequency-domain conversion using basis
functions. The complex exponentials used by the Fourier
transform form such a basis and yield complete
representations of spectral magnitude information. The
advantage of this complete spectral basis approach is that no
assumptions are made on signal composition.
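To make this concrete, the following fragment (an illustrative sketch added here, not part of the original system) computes a complete log-magnitude spectral representation in Python with NumPy; the window length and hop size are arbitrary example values:

    import numpy as np

    def log_spectrogram(x, n_fft=512, hop=256):
        # Complete spectral basis: complex exponentials via the FFT;
        # no assumptions are made about the composition of x.
        window = np.hanning(n_fft)
        n_frames = 1 + (len(x) - n_fft) // hop
        frames = np.stack([x[i * hop : i * hop + n_fft] * window
                           for i in range(n_frames)])
        magnitudes = np.abs(np.fft.rfft(frames, axis=1))
        return 20.0 * np.log10(magnitudes + 1e-10)  # dB-scaled log spectra

Each row of the result is one frame of the complete spectral representation discussed above.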
However, this representation has many dimensions and exhibits a high degree of correlation between them. Much of the data is therefore redundant, which increases the effort that must be expended on parameter inference for statistical models. In many cases the redundancy also causes numerical instability during training and adversely affects model performance during recognition.
To understand why such representations are problematic, consider that samples from a higher-dimensional population are more sparsely distributed across each dimension. This sparsity encourages over-fitting of the available data points, thus decreasing the reliability of density estimates. In contrast, a low-dimensional representation of the same population yields a more densely sampled distribution from which parameters can be inferred more accurately.
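This effect can be illustrated with a small numerical experiment (our addition, included for illustration): holding the number of samples fixed, the mean distance from each point to its nearest neighbour grows rapidly with dimension, so local density estimates rest on increasingly sparse evidence.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000  # fixed sample budget
    for d in (2, 8, 64, 512):
        x = rng.standard_normal((n, d))
        sq = (x ** 2).sum(axis=1)
        # Squared pairwise distances via the Gram matrix.
        d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
        np.fill_diagonal(d2, np.inf)  # exclude self-distances
        nn = np.sqrt(np.maximum(d2, 0.0).min(axis=1))
        print(f"d={d:4d}  mean nearest-neighbour distance {nn.mean():.2f}")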
2.1.1. Independent Subspace Analysis
To address the problems of dimensionality and redundancy,
whilst keeping the benefits of complete spectral
representations, we use projection to low-dimensional
subspaces via reduced-rank spectral basis functions. It is
assumed that much of the information in the data occupies a
subspace, or manifold, that is embedded in the larger spectral
data space. A number of methods exist for finding maximally informative subspaces of multivariate data, such as locally linear embedding, non-linear principal components analysis, projection pursuit and independent component analysis. It has been shown that these algorithms form a family of closely related methods.
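The following is a minimal sketch of the reduced-rank projection described above (our illustration, using an SVD for the rank reduction and scikit-learn's FastICA in place of whichever independence criterion a given system adopts; the rank k = 10 is an arbitrary example):

    import numpy as np
    from sklearn.decomposition import FastICA

    def independent_subspace_features(log_spectra, k=10):
        # log_spectra: (n_frames, n_bins) matrix of log-spectral frames.
        mu = log_spectra.mean(axis=0)
        centered = log_spectra - mu
        # Reduced-rank spectral basis: the top-k right singular vectors.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        reduced = centered @ vt[:k].T  # (n_frames, k) subspace projection
        # Rotate the retained subspace toward statistically
        # independent axes.
        ica = FastICA(whiten="unit-variance", random_state=0)
        return ica.fit_transform(reduced), vt[:k], mu

Here the rows of vt[:k] play the role of the reduced-rank spectral basis functions, and the ICA rotation orients the retained subspace toward statistically independent axes.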