Paper: ICASSP (2008) “Discriminative Feature Selection for Hidden Markov Models using Segmental Boosting”

April 3rd, 2008 Irfan Essa Posted in Face and Gesture, James Rehg, Numerical Machine Learning, PAMI/ICCV/CVPR/ECCV, Papers, Pei Yin, Thad Starner

Pei Yin, Irfan Essa, James Rehg, Thad Starner (2008) “Discriminative Feature Selection for Hidden Markov Models using Segmental Boosting”, ICASSP 2008, March 30 to April 4, 2008, Las Vegas, Nevada, USA (Paper: MLSP-P3.D8; Session: Pattern Recognition and Classification II; Time: Thursday, April 3, 15:30-17:30; Topic: Machine Learning for Signal Processing: Learning Theory and Modeling) (PDF | Project Site)

ABSTRACT

We address the feature selection problem for hidden Markov models (HMMs) in sequence classification. Temporal correlation in sequences often makes it difficult to apply feature selection techniques. Inspired by segmental k-means segmentation (SKS), we propose Segmentally Boosted HMMs (SBHMMs), in which state-optimized features are constructed in a segmental and discriminative manner. The contributions are twofold. First, we introduce a novel feature selection algorithm in which the temporal dynamics are decoupled from the static learning procedure by assuming that the sequential data are piecewise independent and identically distributed. Second, we show that the SBHMM consistently improves on traditional HMM recognition across various domains. Compared with traditional HMMs, the error reduction ranges from 17% to 70% in American Sign Language recognition, human gait identification, lip reading, and speech recognition.
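The piecewise i.i.d. assumption is what lets a static learner such as AdaBoost do feature selection inside an HMM: once each frame carries a state label, the frames of one state can be treated as independent samples. The sketch below illustrates only that idea and is not the paper's algorithm. It assumes per-frame state labels are already available (for example from a segmental k-means / Viterbi alignment), uses discrete AdaBoost with decision stumps in a one-state-versus-rest setup, and the helper names `best_stump` and `select_features_for_state` are hypothetical. The paper's actual construction of state-optimized features may differ.

```python
import math

def best_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the decision
    stump with the lowest weighted error. Labels y are in {-1, +1}."""
    best = (float("inf"), 0, 0.0, 1)
    for f in range(len(X[0])):
        for t in sorted(set(x[f] for x in X)):
            for polarity in (1, -1):
                # Stump predicts `polarity` when x[f] >= t, else -polarity.
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (polarity if xi[f] >= t else -polarity) != yi)
                if err < best[0]:
                    best = (err, f, t, polarity)
    return best

def select_features_for_state(frames, states, target_state, rounds=3):
    """Hypothetical helper: frames aligned to `target_state` are positives and
    all other frames are negatives, each treated as an independent sample
    (the piecewise i.i.d. assumption). Runs discrete AdaBoost with stumps and
    returns the chosen (feature, threshold, polarity, alpha) tuples in order."""
    y = [1 if s == target_state else -1 for s in states]
    w = [1.0 / len(frames)] * len(frames)
    selected = []
    for _ in range(rounds):
        err, f, t, polarity = best_stump(frames, y, w)
        if err >= 0.5:
            break  # no stump beats chance; stop boosting
        err = max(err, 1e-10)  # guard against log(1/0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        selected.append((f, t, polarity, alpha))
        if err <= 1e-10:
            break  # frames perfectly separated; more rounds would repeat this stump
        # Up-weight misclassified frames, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * (polarity if xi[f] >= t else -polarity))
             for xi, yi, wi in zip(frames, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return selected
```

Reading off `[f for f, *_ in select_features_for_state(frames, states, 0)]` gives the feature indices chosen for state 0, most discriminative first. Brute-force stump search keeps the sketch short; it scales poorly to high-dimensional features.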

Funding: NSF/SGER (2007) “Persistent, Adaptive, Collaborative Synthespians”

September 15th, 2007 Irfan Essa Posted in Charles Isbell, Numerical Machine Learning

Award #0749181 - SGER Collaborative Research: Persistent, Adaptive, Collaborative Synthespians

ABSTRACT

This project explores methodologies for populating virtual worlds with persistent, adaptive, collaborative, believable synthetic actors, referred to as Synthespians. These methods extend adaptive models of learning and planning to accommodate the complex, dynamic environments of massively multiplayer online games. The intellectual merit includes the development and evaluation of: (1) a behavior development language with discovery, machine learning, and adaptation of behaviors integrated directly into the language, allowing rapid development and deployment of Synthespians; and (2) a framework that lets the actors recognize and discover plans by observing and modeling the activities of other agents. An expected outcome of this research is the ability to author complex virtual worlds with many participants that support intelligent and effective interaction between people and machines.

Broader Impact: Our scientific understanding of how people interact and collaborate will benefit from the ability to simulate complex environments with dynamic and evolving individual and group behaviors. In this project, such environments and behaviors are built and modeled within a gaming context. In the long run, this work will affect the fields of education and entertainment. In addition, the ability to model large collaborative and interactive scenarios will help us understand and model large-scale social dynamics phenomena of interest to sociologists and economists.

Paper: IEEE CVPR (2007) “Tree-based Classifiers for Bilayer Video Segmentation”

June 17th, 2007 Irfan Essa Posted in Antonio Criminisi, Computational Photography and Video, John Winn, Numerical Machine Learning, Papers, Pei Yin, Research

Tree-based Classifiers for Bilayer Video Segmentation (IEEE Xplore)

Yin, Pei; Criminisi, Antonio; Winn, John; Essa, Irfan
School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
This paper appears in: Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on
Publication Date: 17-22 June 2007
On page(s): 1 - 8
Number of Pages: 8
Location: Minneapolis, MN, USA
ISBN: 1-4244-1180-7
Digital Object Identifier: 10.1109/CVPR.2007.383008
Posted online: 2007-07-16 13:18:42.0

Abstract

This paper presents an algorithm for the automatic segmentation of monocular videos into foreground and background layers. Correct segmentations are produced even in the presence of large background motion with nearly stationary foreground. There are three key contributions. The first is the introduction of a novel motion representation, “motons”, inspired by research in object recognition. Second, we propose learning the segmentation likelihood from the spatial context of motion. The learning is efficiently performed by Random Forests. The third contribution is a general taxonomy of tree-based classifiers, which facilitates theoretical and experimental comparisons of several known classification algorithms, as well as spawning new ones. Diverse visual cues such as motion, motion context, colour, contrast and spatial priors are fused together by means of a Conditional Random Field (CRF) model. Segmentation is then achieved by binary min-cut. Our algorithm requires no initialization. Experiments on many video-chat type sequences demonstrate the effectiveness of our algorithm in a variety of scenes. The segmentation results are comparable to those obtained by stereo systems.
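The last step of the pipeline, fusing cues in a CRF and segmenting by binary min-cut, can be illustrated on a toy graph. The sketch below is a minimal, hedged example and not the paper's model. The per-pixel costs `fg_cost` and `bg_cost` are hypothetical stand-ins for the negative log-likelihoods the paper derives from motons, motion context, colour, contrast and spatial priors, and `lam` is an assumed constant Potts smoothness weight. The energy is minimized exactly by an s-t min cut computed with Edmonds-Karp max flow. Dedicated max-flow solvers are typically used for full image grids.

```python
from collections import deque

def min_cut_segment(fg_cost, bg_cost, edges, lam):
    """Label each pixel 1 (foreground) or 0 (background), minimizing
    sum_p cost_p(label_p) + lam * #{(p, q) in edges : label_p != label_q}
    exactly via an s-t min cut (Edmonds-Karp max flow)."""
    n = len(fg_cost)
    S, T = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]  # dense residual capacities
    for p in range(n):
        cap[S][p] += bg_cost[p]  # cut (paid) when p ends on the sink/background side
        cap[p][T] += fg_cost[p]  # cut (paid) when p ends on the source/foreground side
    for p, q in edges:
        cap[p][q] += lam         # Potts penalty, paid once if p and q disagree
        cap[q][p] += lam
    flow = 0.0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = [-1] * (n + 2)
        parent[S] = S
        queue = deque([S])
        while queue and parent[T] == -1:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[T] == -1:
            break  # no augmenting path left: the flow is maximal
        # Bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = T
        while v != S:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = T
        while v != S:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck
    # The final BFS ran to exhaustion, so `parent` marks exactly the pixels
    # still reachable from S; those lie on the source (foreground) side.
    labels = [1 if parent[p] != -1 else 0 for p in range(n)]
    return labels, flow
```

On a five-pixel chain such as `min_cut_segment([1, 1, 3, 1, 9], [9, 9, 2, 9, 1], [(0, 1), (1, 2), (2, 3), (3, 4)], 2.0)`, the smoothness term flips the weakly background pixel 2 to foreground so that it agrees with its neighbours, while with `lam = 0.0` each pixel simply takes its cheaper label. The returned flow equals the minimum energy.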

Paper: Asilomar Conference (2003) “Boosted audio-visual HMM for speech reading”

November 9th, 2003 Irfan Essa Posted in Face and Gesture, James Rehg, Numerical Machine Learning, Papers, Pei Yin

Boosted audio-visual HMM for speech reading (IEEE Xplore)

Yin, P.; Essa, I.; Rehg, J.M.
GVU Center, Georgia Inst. of Technol., Atlanta, GA, USA
This paper appears in: Signals, Systems and Computers, 2003. Conference Record of the Thirty-Seventh Asilomar Conference on
Publication Date: 9-12 Nov. 2003
Volume: 2
On page(s): 2013 - 2018 Vol.2
Number of Pages: 2361
ISSN:
ISBN: 0-7803-8104-1
INSPEC Accession Number: 8555396
Digital Object Identifier: 10.1109/ACSSC.2003.1292334
Posted online: 2004-05-04 13:54:35.0

Abstract

We propose a new approach for combining acoustic and visual measurements to aid in recognizing the lip shapes of a person speaking. Our method relies on computing the maximum likelihoods of (a) an HMM that models phonemes from the acoustic signal, and (b) an HMM that models the motion of visual features from video. One significant addition in this work is dynamic analysis using features that AdaBoost selects on the basis of their discriminative ability. This form of integration, which leads to a boosted HMM, lets AdaBoost find the best features first and then uses the HMM to exploit the dynamic information inherent in the signal.
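Combining an acoustic HMM with a visual HMM comes down to computing each model's likelihood for its own observation stream and merging the scores. The sketch below is a minimal illustration under explicit assumptions, not the paper's method. It assumes discrete observations (for example, vector-quantized audio and boosted visual features), computes log P(O | HMM) with the scaled forward algorithm, and fuses the two streams with a hypothetical weighted sum of log-likelihoods controlled by `w_audio`. The helper names `log_likelihood` and `classify_av` are illustrative, and the paper's exact fusion rule may differ.

```python
import math

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete HMM with
    initial probabilities pi[i], transitions A[i][j], emissions B[i][k]."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    c = sum(alpha)
    log_p = math.log(c)
    alpha = [a / c for a in alpha]  # rescale to avoid numerical underflow
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
        c = sum(alpha)
        log_p += math.log(c)  # sum of log scale factors = log-likelihood
        alpha = [a / c for a in alpha]
    return log_p

def classify_av(audio_obs, visual_obs, models, w_audio=0.5):
    """Hypothetical fusion rule: score each class by a weighted sum of its
    audio-HMM and visual-HMM log-likelihoods; return (best_class, scores).
    `models` maps a class label to ((pi, A, B) audio, (pi, A, B) visual)."""
    scores = {}
    for label, (audio_hmm, visual_hmm) in models.items():
        scores[label] = (w_audio * log_likelihood(audio_obs, *audio_hmm)
                         + (1 - w_audio) * log_likelihood(visual_obs, *visual_hmm))
    return max(scores, key=scores.get), scores
```

Weighting in the log domain lets the less reliable stream be down-weighted, for example in acoustic noise, by lowering `w_audio`. In the boosted setting described above, the visual observations would be built from the AdaBoost-selected features before they reach the HMM.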
