Audiovisual Speaker Identification Based on Lip and Speech Modalities

Fatma Chelali and Amar Djeradi

Faculty of Electronics Engineering and Computer Science, University of Science and Technology Houari Boumedienne, Algiers

Abstract: In this article, we present a bimodal speaker identification method that integrates acoustic and visual features, with the two audiovisual streams processed in parallel. We also propose a fusion technique that combines the two modalities to make the final recognition decision. Experiments are conducted on an audiovisual dataset containing the 28 Arabic syllables pronounced by ten speakers. Results show the importance of the visual information, represented by Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT) features, in addition to the audio information, represented by Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Predictive (PLP) coefficients. Furthermore, artificial neural networks such as the Multilayer Perceptron (MLP) and the Radial Basis Function (RBF) network were investigated and tested on this dataset, and they achieved good recognition performance when the acoustic and visual vectors were serially concatenated.

Keywords: Audiovisual speaker recognition, DCT, DWT, PLP, MFCC.
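
The serial concatenation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of the fusion step, so everything here is an assumption made for illustration: the function names `z_normalize` and `fuse_serial` are hypothetical, the per-modality normalization is an added step that the abstract does not describe, and the numbers are made up rather than taken from the dataset.

```python
from statistics import fmean, pstdev


def z_normalize(vec):
    """Scale one feature vector to zero mean and unit variance.

    Assumption: normalizing each modality before fusion is a common
    precaution so that neither stream dominates; the abstract does not
    say whether the authors do this.
    """
    mean = fmean(vec)
    std = pstdev(vec)
    if std == 0.0:
        # A constant vector carries no variation; map it to zeros.
        return [0.0 for _ in vec]
    return [(x - mean) / std for x in vec]


def fuse_serial(acoustic, visual):
    """Feature-level fusion: normalize each modality, then concatenate."""
    return z_normalize(acoustic) + z_normalize(visual)


# Made-up values standing in for one frame of acoustic (e.g. MFCC or PLP)
# and visual (e.g. DCT or DWT) features.
acoustic_frame = [12.1, -3.4, 0.8, 5.6]
visual_frame = [230.0, 41.5, -17.2]

fused = fuse_serial(acoustic_frame, visual_frame)
print(len(fused))  # 7: four acoustic values followed by three visual values
```

In a pipeline of the kind the abstract describes, a fused vector like this would then be passed to a classifier such as an MLP or RBF network.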

Last modified on Wednesday, 24 February 2016 08:31