The IUP Journal of Computer Sciences
Video-Based Person Authentication Using Face and Visual Speech
This paper proposes a facial and visual speech feature extraction method for automatic person authentication in video. The method proposed in Viola and Jones (2006) is used to detect the face region. The face region is processed in the YCbCr color space to determine the locations of the eyes. The system models the non-lip region of the face using a Gaussian distribution, which is then used to locate the center of the mouth. Facial and visual speech features are extracted using multiscale morphological erosion and dilation operations, respectively. The facial features are extracted relative to the locations of the eyes, and the visual speech features are extracted relative to the locations of the eyes and mouth. Auto-Associative Neural Networks (AANNs) and Support Vector Machines (SVMs) are analyzed for person authentication. AANN models are used to capture the distribution of the facial and visual speech features of a subject, while SVMs are used to construct the optimal separating hyperplane for these features. The evidence from the face and visual speech modalities is combined using a weighting rule, and the result is used to accept or reject the identity claim of the subject. The performance of the system is evaluated on the XM2VTS database. The system achieves an Equal Error Rate (EER) of about 0.41% and 0.37% for 50 subjects using the AANN and SVM models, respectively. Finally, the performance of the AANN and SVM models for person authentication is compared; experimental results show that the SVM gives better performance than the AANN model.
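The multiscale morphological idea used for feature extraction can be sketched in a few lines of numpy. The sketch below is illustrative only: the function name, the flat square structuring element, and the landmark-centred windows are assumptions, as the paper's exact structuring elements and sampling geometry are not specified in this excerpt. At each scale, grayscale erosion takes the local minimum and grayscale dilation the local maximum around a facial landmark (e.g., an eye or mouth location).

```python
import numpy as np

def multiscale_features(image, point, num_scales=3):
    """Hypothetical sketch of multiscale morphological feature extraction.

    At scale n, a flat (2n+1) x (2n+1) structuring element centred on the
    landmark is used: the erosion value is the window minimum and the
    dilation value is the window maximum.
    """
    r, c = point
    feats = []
    for n in range(1, num_scales + 1):
        window = image[max(0, r - n):r + n + 1, max(0, c - n):c + n + 1]
        feats.append(window.min())   # grayscale erosion value at scale n
        feats.append(window.max())   # grayscale dilation value at scale n
    return np.array(feats)
```

Concatenating such min/max values over several scales around each detected landmark yields a compact feature vector that is robust to small localization errors, which is consistent with extracting features relative to the eye and mouth locations.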

Automatic person recognition by machine appears to be difficult, whereas human beings do it almost effortlessly. The main reason for this difficulty is that it is not possible to articulate the mechanism humans use. Person recognition can be categorized into person identification and person authentication. The objective of a person identification system is to determine the identity of a test subject from a set of reference subjects; its performance is quantified in terms of the identification (or recognition) rate. A person authentication system, on the other hand, must accept or reject the identity claim of a subject, and its performance is measured in terms of the Equal Error Rate (EER). Person authentication systems make use of one or more biometric modalities, such as speech, face, fingerprint, signature, iris and hand geometry, to accept or reject the identity claim of an individual. In this paper, face and visual speech modalities are used for person authentication. The terms facial features and visual speech features refer to the features extracted from the face and mouth images of a person, respectively.
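The EER metric mentioned above can be illustrated with a short sketch. The function below is a generic illustration, not the paper's evaluation code: given authentication scores for genuine claims and impostor claims, it sweeps a decision threshold and reports the operating point where the False Acceptance Rate (impostors accepted) and the False Rejection Rate (genuine claims rejected) are closest.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Illustrative EER computation (names and inputs are assumptions).

    FAR: fraction of impostor scores at or above the threshold.
    FRR: fraction of genuine scores below the threshold.
    The EER is approximated as the mean of FAR and FRR at the
    threshold where their gap is smallest.
    """
    best_gap, best_eer = 2.0, None
    for t in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        frr = np.mean(genuine < t)     # genuine claims wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

A lower EER means a better trade-off between false acceptances and false rejections; perfectly separable score distributions give an EER of zero.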

A comprehensive survey of still and video-based face recognition techniques can be found in Zhao et al. (2003). Several techniques have been proposed in the literature for still-image-based face recognition, such as Eigenfaces or Principal Component Analysis (PCA) (Turk, 1991), Linear Discriminant Analysis (LDA) (Swets and Weng, 1996; and Belhumeur et al., 1997) and Independent Component Analysis (ICA) (Bartlett et al., 1998). Most video-based face recognition methods apply still-image-based recognition to selected frames (Zhao et al., 2003). The audio-video-based person recognition system described in Choudhury et al. (1999) used Eigenfaces for modeling the face image intensity values. The performance was evaluated for 26 subjects, and the method reported an 88.4% recognition rate for the face modality. The method described in Senior (1999) used Gabor filter responses at 29 feature points on the face for face recognition. The performance was reported for TV broadcast news data (CNN) corresponding to 76 subjects, with a recognition rate of 63.6% for the face modality. The audio-visual speaker recognition method proposed in Chaudhari et al. (2003) used the Discrete Cosine Transform of the mouth region as visual speech features, and reported a recognition rate of 89.1% for the visual speech modality. The audio-visual speaker recognition system described in Kanak et al. (2003) used an Eigenspace of lip images for verification.

Keywords: Multimodal person authentication, Face detection, Eye location, Visual speech, Multiscale morphological dilation and erosion, Auto-associative neural network, Support vector machine.