Paper: ICASSP (2008) “Discriminative Feature Selection for Hidden Markov Models using Segmental Boosting”

April 3rd, 2008 Irfan Essa Posted in Face and Gesture, James Rehg, Numerical Machine Learning, PAMI/ICCV/CVPR/ECCV, Papers, Pei Yin, Thad Starner No Comments »

Pei Yin, Irfan Essa, James Rehg, Thad Starner (2008) “Discriminative Feature Selection for Hidden Markov Models using Segmental Boosting”, ICASSP 2008 - March 30 - April 4, 2008 - Las Vegas, Nevada, U.S.A. (Paper: MLSP-P3.D8, Session: Pattern Recognition and Classification II, Time: Thursday, April 3, 15:30 - 17:30, Topic: Machine Learning for Signal Processing: Learning Theory and Modeling) (PDF|Project Site)

ABSTRACT

icassp08We address the feature selection problem for hidden Markov models (HMMs) in sequence classification. Temporal correlation in sequences often causes difficulty in applying feature selection techniques. Inspired by segmental k-means segmentation (SKS), we propose Segmentally Boosted HMMs (SBHMMs), where the state-optimized features are constructed in a segmental and discriminative manner. The contributions are twofold. First, we introduce a novel feature selection algorithm, where the temporal dynamics are decoupled from the static learning procedure by assuming that the sequential data are piecewise independent and identically distributed. Second, we show that the SBHMM consistently improves traditional HMM recognition in various domains. The reduction of error compared to traditional HMMs ranges from 17% to 70% in American Sign Language recognition, human gait identification, lip reading, and speech recognition.

AddThis Social Bookmark Button

Paper: IEEE CVPR (2007) “Tree-based Classifiers for Bilayer Video Segmentation”

June 17th, 2007 Irfan Essa Posted in Antonio Crimisini, Computational Photography and Video, John Winn, Numerical Machine Learning, Papers, Pei Yin, Research No Comments »

Tree-based Classifiers for Bilayer Video Segmentation (IEEE Explor)

Yin, Pei Criminisi, Antonio Winn, John Essa, Irfan
School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
This paper appears in: Computer Vision and Pattern Recognition, 2007. CVPR ‘07. IEEE Conference on
Publication Date: 17-22 June 2007
On page(s): 1 - 8
Number of Pages: 1 - 8
Location: Minneapolis, MN, USA
ISBN: 1-4244-1180-7
Digital Object Identifier: 10.1109/CVPR.2007.383008
Posted online: 2007-07-16 13:18:42.0

Abstract

This paper presents an algorithm for the automatic segmentation of monocular videos into foreground and background layers. Correct segmentations are produced even in the presence of large background motion with nearly stationary foreground. There are three key contributions. The first is the introduction of a novel motion representation, “motons”, inspired by research in object recognition. Second, we propose learning the segmentation likelihood from the spatial context of motion. The learning is efficiently performed by Random Forests. The third contribution is a general taxonomy of tree-based classifiers, which facilitates theoretical and experimental comparisons of several known classification algorithms, as well as spawning new ones. Diverse visual cues such as motion, motion context, colour, contrast and spatial priors are fused together by means of a Conditional Random Field (CRF) model. Segmentation is then achieved by binary min-cut. Our algorithm requires no initialization. Experiments on many video-chat type sequences demonstrate the effectiveness of our algorithm in a variety of scenes. The segmentation results are comparable to those obtained by stereo systems.

AddThis Social Bookmark Button

Paper: IEEE CVPR (2004) “Asymmetrically boosted HMM for speech reading”

June 2nd, 2004 Irfan Essa Posted in James Rehg, Papers, Pei Yin No Comments »

Asymmetrically boosted HMM for speech reading (IEEEXplore#)

Pei Yin Essa, I. Rehg, J.M.
GVU Center, Georgia Inst. of Technol., Atlanta, GA, USA
This paper appears in: Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on
Publication Date: 27 June-2 July 2004
Volume: 2
On page(s): II-755 - II-761 Vol.2
Number of Pages: 2001
ISSN: 1063-6919
ISBN: 0-7695-2158-4
INSPEC Accession Number:8161546
Digital Object Identifier: 10.1109/CVPR.2004.1315240
Posted online: 2004-07-19 11:09:26.0

Abstract

Speech reading, also known as lip reading, is aimed at extracting visual cues of lip and facial movements to aid in recognition of speech. The main hurdle for speech reading is that visual measurements of lip and facial motion lack information-rich features like the Mel frequency cepstral coefficients (MFCC), widely used in acoustic speech recognition. These MFCC are used with hidden Markov models (HMM) in most speech recognition systems at present. Speech reading could greatly benefit from automatic selection and formation of informative features from measurements in the visual domain. These new features can then be used with HMM to capture the dynamics of lip movement and eventual recognition of lip shapes. Towards this end, we use AdaBoost methods for automatic visual feature formation. Specifically, we design an asymmetric variant of AdaBoost M2 algorithm to deal with the ill-posed multi-class sample distribution inherent in our problem. Our experiments show that the boosted HMM approach outperforms conventional AdaBoost and HMM classifiers. Our primary contributions are in the design of (a) boosted HMM and (b) asymmetric multi-class boosting.

AddThis Social Bookmark Button

Paper: Asilomar Conference (2003) “Boosted audio-visual HMM for speech reading”

November 9th, 2003 Irfan Essa Posted in Face and Gesture, James Rehg, Numerical Machine Learning, Papers, Pei Yin No Comments »

Boosted audio-visual HMM for speech reading (IEEEXplore)

Yin, P. Essa, I. Rehg, J.M.
GVU Center, Georgia Inst. of Technol., Atlanta, GA, USA
This paper appears in: Signals, Systems and Computers, 2003. Conference Record of the Thirty-Seventh Asilomar Conference on
Publication Date: 9-12 Nov. 2003
Volume: 2
On page(s): 2013 - 2018 Vol.2
Number of Pages: 2361
ISSN:
ISBN: 0-7803-8104-1
INSPEC Accession Number:8555396
Digital Object Identifier: 10.1109/ACSSC.2003.1292334
Posted online: 2004-05-04 13:54:35.0
Abstract
We propose a new approach for combining acoustic and visual measurements to aid in recognizing lip shapes of a person speaking. Our method relies on computing the maximum likelihoods of (a) HMM used to model phonemes from the acoustic signal, and (b) HMM used to model visual features motions from video. One significant addition in this work is the dynamic analysis with features selected by AdaBoost, on the basis of their discriminant ability. This form of integration, leading to boosted HMM, permits AdaBoost to find the best features first, and then uses HMM to exploit dynamic information inherent in the signal.

AddThis Social Bookmark Button