Hash Authentication Algorithm of Compressed Domain Speech
Perception Based on MFCC and NMF
HUANG Yi-bo (1,a,*), ZHANG Qiu-yu (2,b), YUAN Zhan-ting (3) and Liu Yang-wei (4)
(1) College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou, China
(2) School of Computer and Communication, Lanzhou University of Technology, Lanzhou, China
(3) College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou, China
(4) School of Computer and Communication, Lanzhou University of Technology, Lanzhou, China
(a) huang_yibo@foxmail.com, (b) zhangqylz@163.com
Keywords: Speech authentication; Perceptual hashing; Compressed domain; MFCC; NMF
Abstract. A perceptual Hash authentication algorithm for compressed-domain speech, based on MDCT
coefficients, is proposed to address the heavy computation and poor real-time performance of
traditional authentication algorithms when applied to compressed speech. First, the algorithm
extracts MDCT coefficients by partly decoding speech in MP3 format. Then, the MDCT coefficients of
each speech frame are processed by a Mel filter bank in the compressed domain, forming a
15-dimensional MFCC coefficient vector. Finally, the perceptual Hash string is generated by the
Hash construction and used to authenticate the speech content. Experimental results show that the
algorithm is strongly robust to content-preserving operations and has good real-time performance.
Introduction
A perceptual Hash function is a one-way mapping, based on the human psychoacoustic model, from a
set of multimedia data to a set of perceptual digests [1]. The approach was first proposed by Ton
Kalker in 2001. Because it is robust to content-preserving operations while still distinguishing
malicious tampering, it has been applied successfully to content integrity authentication of
wideband audio and speech, and has gradually become a hot topic in multimedia information
security. As related research has developed, the computational efficiency of such algorithms has
become sufficient to meet the real-time requirements of authenticating, indexing and identifying
audio in open network environments.
Since parametric speech coding is completely different from audio compression, audio Hashing
algorithms cannot be applied directly to speech. Traditional speech content authentication methods
mostly operate on uncompressed formats, so authentication methods for speech content in the
compressed domain are rare.
To address speech authentication in the compressed domain, Li Mingyu [2] of the Harbin Institute
of Technology proposed a content authentication method for compressed-domain audio. MP3 files are
partly decoded to obtain the MDCT coefficients; after sub-band division, the absolute values of the
coefficients in each sub-band are summed and quantized into a binary perceptual Hash value. Jiao
Yuhua then studied and extended Li Mingyu's algorithm in depth [2,3], improving its security and
its evaluation method. These methods are robust for compressed-domain audio based on perceptual
coding and offer a useful starting point for compressed-domain speech content authentication.
However, the amount of authentication data is large and the deficiency rate is high.
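The sub-band scheme described above can be illustrated with a minimal sketch. It is not the implementation from [2]: the equal-width sub-bands, the default of 16 bands, and the rule of thresholding each sub-band sum against the frame mean are all assumptions made here for illustration, since the text only states that the sums are quantized to binary values. The function name `subband_hash` is hypothetical.

```python
from typing import List, Sequence


def subband_hash(frames: Sequence[Sequence[float]], n_bands: int = 16) -> List[int]:
    """Binary hash from per-sub-band sums of |MDCT| coefficients.

    Assumptions (not taken from the paper): equal-width sub-bands and
    one bit per sub-band, set when the sub-band sum exceeds the frame mean.
    """
    bits: List[int] = []
    for coeffs in frames:
        band_len = len(coeffs) // n_bands
        if band_len == 0:
            raise ValueError("frame has fewer coefficients than sub-bands")
        # Sum of absolute values in each sub-band; any coefficients left
        # over after n_bands * band_len are ignored in this sketch.
        sums = [
            sum(abs(c) for c in coeffs[b * band_len:(b + 1) * band_len])
            for b in range(n_bands)
        ]
        threshold = sum(sums) / n_bands  # assumed quantization rule
        bits.extend(1 if s > threshold else 0 for s in sums)
    return bits
```

Each frame contributes `n_bands` bits, so the hash length grows linearly with the number of frames, which is consistent with the remark that the amount of authentication data is large.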
To meet the efficiency and robustness requirements of speech content authentication in the
compressed domain, this paper proposes a perceptual content authentication algorithm for
compressed-domain speech based on MFCC and NMF. It operates on the MDCT coefficients of MP3 audio
files and combines the characteristics of speech with those of human hearing. Experiments show that
the algorithm is strongly robust to content-preserving operations, reduces the amount of
authentication data, and greatly improves computational efficiency.
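As a rough illustration of the feature-extraction step, the sketch below applies a triangular Mel filter bank to one frame of MDCT coefficients and takes a DCT of the log energies to obtain the 15-dimensional vector mentioned in the abstract. Everything beyond that dimension is an assumption for illustration: the 44.1 kHz sample rate, the 24 filters, the use of squared MDCT coefficients as the spectral estimate, the inclusion of the zeroth cepstral coefficient, and the function names. The NMF and Hash construction stages are not sketched because this section does not specify them.

```python
import math
from typing import List, Sequence


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_coeffs: int, sample_rate: float) -> List[List[float]]:
    """Triangular filters spaced evenly on the Mel scale, sampled at MDCT lines."""
    nyquist = sample_rate / 2.0
    mel_max = hz_to_mel(nyquist)
    # n_filters + 2 edge frequencies, equally spaced in Mel.
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bank: List[List[float]] = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_coeffs):
            f = (k + 0.5) * nyquist / n_coeffs  # centre frequency of MDCT line k
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mdct_mfcc(coeffs: Sequence[float], bank: List[List[float]], n_mfcc: int = 15) -> List[float]:
    """MFCC-style vector from one frame of MDCT coefficients (assumed recipe)."""
    power = [c * c for c in coeffs]  # assumption: squared MDCT value as energy
    log_e = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10) for row in bank]
    n = len(log_e)
    # Unnormalized DCT-II of the log filter-bank energies, first n_mfcc terms.
    return [
        sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / n) for j in range(n))
        for i in range(n_mfcc)
    ]
```

A frame here would be the 576 MDCT lines of one MP3 granule, for example `mdct_mfcc(frame, mel_filterbank(24, 576, 44100.0))`. Working directly on the MDCT lines avoids full decoding to PCM followed by a separate FFT, which is the efficiency argument the paper makes for operating in the compressed domain.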
Applied Mechanics and Materials Vols. 719-720 (2015) pp 1166-1170
© (2015) Trans Tech Publications, Switzerland
doi:10.4028/www.scientific.net/AMM.719-720.1166
Submitted: 2014-10-23; Accepted: 2014-11-10; Online: 2015-01-13
All rights reserved. No part of contents of this paper may be reproduced or transmitted in any form or by any means without the written permission of Trans Tech Publications, www.ttp.net.