April 3rd, 2008 Irfan Essa Posted in Face and Gesture, James Rehg, Numerical Machine Learning, PAMI/ICCV/CVPR/ECCV, Papers, Pei Yin, Thad Starner No Comments »
Pei Yin, Irfan Essa, James Rehg, Thad Starner (2008) “Discriminative Feature Selection for Hidden Markov Models using Segmental Boosting”, ICASSP 2008 - March 30 - April 4, 2008 - Las Vegas, Nevada, U.S.A. (Paper: MLSP-P3.D8, Session: Pattern Recognition and Classification II, Time: Thursday, April 3, 15:30 - 17:30, Topic: Machine Learning for Signal Processing: Learning Theory and Modeling) (PDF|Project Site)
ABSTRACT
We address the feature selection problem for hidden Markov models (HMMs) in sequence classification. Temporal correlation in sequences often causes difficulty in applying feature selection techniques. Inspired by segmental k-means segmentation (SKS), we propose Segmentally Boosted HMMs (SBHMMs), where the state-optimized features are constructed in a segmental and discriminative manner. The contributions are twofold. First, we introduce a novel feature selection algorithm, where the temporal dynamics are decoupled from the static learning procedure by assuming that the sequential data are piecewise independent and identically distributed. Second, we show that the SBHMM consistently improves traditional HMM recognition in various domains. The reduction of error compared to traditional HMMs ranges from 17% to 70% in American Sign Language recognition, human gait identification, lip reading, and speech recognition.
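The piecewise i.i.d. assumption can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows how, once every frame has been assigned to a state by some earlier segmentation step, the frames can be pooled by state and handed to any static learner. The function name `pool_frames_by_state` and the toy data are hypothetical.

```python
from collections import defaultdict


def pool_frames_by_state(sequences, alignments):
    """Group feature frames by their aligned state label.

    sequences  -- list of sequences, each a list of feature vectors
    alignments -- matching list of per-frame state labels (assumed to
                  come from a prior segmentation step)
    """
    pooled = defaultdict(list)
    for frames, states in zip(sequences, alignments):
        if len(frames) != len(states):
            raise ValueError("each frame needs exactly one state label")
        for frame, state in zip(frames, states):
            # Temporal order is discarded here: frames that share a
            # state are treated as i.i.d. samples of that state.
            pooled[state].append(frame)
    return dict(pooled)


# Toy data (hypothetical): two short sequences of 2-D features.
sequences = [[(0.1, 0.2), (0.2, 0.1), (0.9, 0.8)],
             [(0.0, 0.3), (1.0, 0.7)]]
alignments = [[0, 0, 1], [0, 1]]
pooled = pool_frames_by_state(sequences, alignments)
```

Each entry of `pooled` is then an ordinary labelled sample set, so a static feature selector can be applied per state without modelling the dynamics.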

November 1st, 2007 Irfan Essa Posted in Activity Recognition, Health Systems, Papers, Research No Comments »
N. Padoy, T. Blum, I. Essa, H. Feußner, M.O. Berger, N. Navab. “A Boosted Segmentation Method for Surgical Workflow Analysis”. Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI 2007) (to appear), Brisbane, Australia, Oct. 29 - Nov. 2 2007 (bib)
Abstract
As demands on hospital efficiency increase, there is a stronger need for automatic analysis, recovery, and modification of surgical workflows. Even though most of the previous work has dealt with higher level and hospital-wide workflow including issues like document management, workflow is also an important issue within the surgery room. Its study has a high potential, e.g., for building context-sensitive operating rooms, evaluating and training surgical staff, optimizing surgeries and generating automatic reports. In this paper we propose an approach to segment the surgical workflow into phases based on temporal synchronization of multidimensional state vectors. Our method is evaluated on the example of laparoscopic cholecystectomy with state vectors representing tool usage during the surgeries. The discriminative power of each instrument in regard to each phase is estimated using AdaBoost. A boosted version of the Dynamic Time Warping (DTW) algorithm is used to create a surgical reference model and to segment a newly observed surgery. Full cross-validation on ten surgeries is performed and the method is compared to standard DTW and to Hidden Markov Models.
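As a rough illustration of aligning tool-usage vectors, the sketch below implements classic DTW in plain Python with a per-instrument weighted distance. The weights stand in for discriminative weights such as those AdaBoost might supply; how the paper actually incorporates them is not stated here, so `weighted_hamming` and the toy sequences are assumptions.

```python
def dtw_distance(a, b, dist):
    """Classic dynamic time warping cost between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def weighted_hamming(weights):
    """Return a distance that charges weights[k] when instrument k differs."""
    def dist(u, v):
        return sum(w * (x != y) for w, x, y in zip(weights, u, v))
    return dist


# Toy binary state vectors (hypothetical): (instrument 1 in use, instrument 2 in use)
reference = [(1, 0), (1, 0), (0, 1), (0, 1)]
observed = [(1, 0), (0, 1), (0, 1), (0, 1)]
plain = dtw_distance(reference, observed, weighted_hamming([1.0, 1.0]))
mismatch = dtw_distance([(1, 0)], [(0, 1)], weighted_hamming([2.0, 0.5]))
```

In this toy case the two sequences contain the same phases at different durations, so warping absorbs the timing difference and `plain` is zero, while `mismatch` shows the weights scaling the penalty for each instrument.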

October 29th, 2007 Irfan Essa Posted in A. Dan Fisk, Activity Recognition, Aware Home, Papers, Wendy Rogers No Comments »
FEATURE AT A GLANCE: Technology in the home environment has the potential to support older adults in a variety of ways. We took an interdisciplinary approach (human factors/ergonomics and computer science) to develop a technology “coach” that could support older adults in learning to use a medical device. Our system provided a computer vision system to track the use of a blood glucose meter and provide users with feedback if they made an error. This research could support the development of an in-home personal assistant to coach individuals in a variety of tasks necessary for independent living.
KEYWORDS: home technology, medical devices, support for learning
October 28th, 2007 Irfan Essa Posted in Activity Recognition, Charles Isbell, David Minnen, Papers, Research, Thad Starner No Comments »
D. Minnen, I. Essa, C.L. Isbell, and T. Starner “Detecting Subdimensional Motifs: An Efficient Algorithm for Generalized Multivariate Pattern Discovery” In IEEE Int. Conf. on Data Mining (ICDM) 2007, Omaha, NE, October 28-31, 2007. [PDF]
Abstract
Discovering recurring patterns in time series data is a fundamental problem for temporal data mining. This paper addresses the problem of locating subdimensional motifs in real-valued, multivariate time series, which requires the simultaneous discovery of sets of recurring patterns along with the corresponding relevant dimensions. While many approaches to motif discovery have been developed, most are restricted to categorical data, univariate time series, or multivariate data in which the temporal patterns span all of the dimensions. In this paper, we present an expected linear-time algorithm that addresses a generalization of multivariate pattern discovery in which each motif may span only a subset of the dimensions. To validate our algorithm, we discuss its theoretical properties and empirically evaluate it using several data sets including synthetic data and motion capture data collected by an on-body inertial sensor.

October 15th, 2007 Irfan Essa Posted in Aaron Bobick, Activity Recognition, Aware Home, PAMI/ICCV/CVPR/ECCV, Papers, Raffay Hamid No Comments »
Abstract
Models of activity structure for unconstrained environments are generally not available a priori. Recent representational approaches to this end are limited by their computational complexity and by their ability to capture activity structure only up to some fixed temporal scale. In this work, we propose Suffix Trees as an activity representation to efficiently extract structure of activities by analyzing their constituent event-subsequences over multiple temporal scales. We empirically compare Suffix Trees with some of the previous approaches in terms of feature cardinality, discriminative prowess, noise sensitivity and activity-class discovery. Finally, exploiting properties of Suffix Trees, we present a novel perspective on anomalous subsequences of activities, and propose an algorithm to detect them in linear time. We present comparative results over experimental data collected from a kitchen environment to demonstrate the competence of our proposed framework.
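To make "event-subsequences over multiple temporal scales" concrete, here is a deliberately naive stand-in: it counts every contiguous event subsequence up to a maximum length with a `Counter`. It is not a suffix tree and does not have linear-time behaviour; it only shows the kind of statistics such a structure exposes. The event names are hypothetical.

```python
from collections import Counter


def subsequence_counts(events, max_len):
    """Count all contiguous subsequences of length 1..max_len."""
    counts = Counter()
    n = len(events)
    for start in range(n):
        for length in range(1, max_len + 1):
            if start + length > n:
                break  # no longer subsequence fits from this start
            counts[tuple(events[start:start + length])] += 1
    return counts


# Hypothetical kitchen event stream.
activity = ["fridge", "stove", "sink", "fridge", "stove"]
counts = subsequence_counts(activity, max_len=3)
```

Subsequences that occur often across activities of a class (here `("fridge", "stove")` occurs twice) are candidate features, while ones that never occur in the training data are candidates for anomalies.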

September 15th, 2007 Irfan Essa Posted in Computational Journalism, Nick Diakopoulos, Papers, Research No Comments »
N. Diakopoulos, K. Luther, Y. Medynskiy, I. Essa (2007) The Evolution of Authorship in a Remix Society, ACM Hypertext 2007 Conference, Manchester, UK, September 2007
Abstract
Authorship entails the constrained selection or generation of media and the organization and layout of that media in a larger structure. But authorship is more than just selection and organization; it is a complex construct incorporating concepts of originality, authority, intertextuality, and attribution. In this paper we explore these concepts and ask how they are changing in light of modes of collaborative authorship in remix culture. We present a qualitative case study of an online video remixing site, illustrating how the constraints of that environment are impacting authorial constructs. We discuss users’ self-conceptions as authors, and how values related to authorship are reflected to users through the interface and design of the site’s tools. We also present some implications for the design of online communities for collaborative media creation and remixing.
- N. Diakopoulos, K. Luther, Y. Medynskiy, I. Essa. The Evolution of Authorship in a Remix Society. In Proceedings of Hypertext and Hypermedia. Manchester, UK, September 2007 [PDF]
- N. Diakopoulos, K. Luther, Y. Medynskiy, I. Essa. Remixing Authorship: Reconfiguring the Author in Online Video Remix Culture. Georgia Tech, Technical Report. GIT-IC-07-05. 2007. [PDF]

August 24th, 2007 Irfan Essa Posted in Activity Recognition, Charles Isbell, David Minnen, Papers, Research, Thad Starner No Comments »
Discovering Multivariate Motifs using Subsequence Density Estimation and Greedy Mixture Learning
Abstract
The problem of locating motifs in real-valued, multivariate time series data involves the discovery of sets of recurring patterns embedded in the time series. Each set is composed of several non-overlapping subsequences and constitutes a motif because all of the included subsequences are similar. The ability to automatically discover such motifs allows intelligent systems to form endogenously meaningful representations of their environment through unsupervised sensor analysis. In this paper, we formulate a unifying view of motif discovery as a problem of locating regions of high density in the space of all time series subsequences. Our approach is efficient (sub-quadratic in the length of the data), requires fewer user-specified parameters than previous methods, and naturally allows variable length motif occurrences and nonlinear temporal warping. We evaluate the performance of our approach using four data sets from different domains including on-body inertial sensors and speech.

June 17th, 2007 Irfan Essa Posted in Antonio Criminisi, Computational Photography and Video, John Winn, Numerical Machine Learning, Papers, Pei Yin, Research No Comments »
Tree-based Classifiers for Bilayer Video Segmentation (IEEEXplore)
Yin, Pei; Criminisi, Antonio; Winn, John; Essa, Irfan
School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
This paper appears in: Computer Vision and Pattern Recognition, 2007. CVPR ‘07. IEEE Conference on
Publication Date: 17-22 June 2007
On page(s): 1 - 8
Location: Minneapolis, MN, USA
ISBN: 1-4244-1180-7
Digital Object Identifier: 10.1109/CVPR.2007.383008
Posted online: 2007-07-16 13:18:42.0
Abstract
This paper presents an algorithm for the automatic segmentation of monocular videos into foreground and background layers. Correct segmentations are produced even in the presence of large background motion with nearly stationary foreground. There are three key contributions. The first is the introduction of a novel motion representation, “motons”, inspired by research in object recognition. Second, we propose learning the segmentation likelihood from the spatial context of motion. The learning is efficiently performed by Random Forests. The third contribution is a general taxonomy of tree-based classifiers, which facilitates theoretical and experimental comparisons of several known classification algorithms, as well as spawning new ones. Diverse visual cues such as motion, motion context, colour, contrast and spatial priors are fused together by means of a Conditional Random Field (CRF) model. Segmentation is then achieved by binary min-cut. Our algorithm requires no initialization. Experiments on many video-chat type sequences demonstrate the effectiveness of our algorithm in a variety of scenes. The segmentation results are comparable to those obtained by stereo systems.

April 15th, 2007 Irfan Essa Posted in Audio Analysis, Mitch Parry, Papers, Research No Comments »
Incorporating Phase Information for Source Separation via Spectrogram Factorization
Parry, R.M.; Essa, I.
Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA
This paper appears in: Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Publication Date: 15-20 April 2007
Volume: 2
On page(s): II-661 - II-664
Location: Honolulu, HI
ISSN: 1520-6149
ISBN: 1-4244-0728-1
INSPEC Accession Number:9497202
Digital Object Identifier: 10.1109/ICASSP.2007.366322
Posted online: 2007-06-04 10:15:41.0
Abstract
Spectrogram factorization methods have been proposed for single channel source separation and audio analysis. Typically, the mixture signal is first converted into a time-frequency representation such as the short-time Fourier transform (STFT). The phase information is thrown away and this spectrogram matrix is then factored into the sum of rank-one source spectrograms. This approach incorrectly assumes the mixture spectrogram is the sum of the source spectrograms. In fact, the mixture spectrogram depends on the phase of the source STFTs. We investigate the consequences of this common assumption and introduce an approach that leverages a probabilistic representation of phase to improve the separation results.
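The dependence on phase is easy to check numerically for a single time-frequency bin. The sketch below is only an illustration, with a hypothetical helper `mixture_magnitude`: it adds two complex STFT coefficients of unit magnitude and reports the magnitude of the sum for different relative phases.

```python
import cmath


def mixture_magnitude(mag_a, phase_a, mag_b, phase_b):
    """Magnitude of the sum of two complex coefficients given in polar form."""
    a = cmath.rect(mag_a, phase_a)
    b = cmath.rect(mag_b, phase_b)
    return abs(a + b)


# Two sources with unit magnitude in the same bin, at different relative phases.
in_phase = mixture_magnitude(1.0, 0.0, 1.0, 0.0)
quadrature = mixture_magnitude(1.0, 0.0, 1.0, cmath.pi / 2)
opposed = mixture_magnitude(1.0, 0.0, 1.0, cmath.pi)
```

Only the in-phase case gives a mixture magnitude equal to the sum of the source magnitudes. At a quarter-cycle offset the mixture magnitude is the square root of two, and in opposition the sources cancel almost exactly, so an additive magnitude model can be far off in any bin where the sources overlap.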

October 23rd, 2006 Irfan Essa Posted in AAAI/IJCAI/UAI, Aaron Bobick, Activity Recognition, Aware Home, Papers, Raffay Hamid, Siddhartha Maddi No Comments »
- R. Hamid, S. Maddi, A. Bobick, I. Essa. “Unsupervised Analysis of Activity Sequences Using Event Motifs”, In proceedings of 4th ACM International Workshop on Video Surveillance and Sensor Networks (in conjunction with ACM Multimedia 2006).
Abstract
We present an unsupervised framework to discover characterizations of everyday human activities, and demonstrate how such representations can be used to extract points of interest in event-streams. We begin with the usage of Suffix Trees as an efficient activity-representation to analyze the global structural information of activities, using their local event statistics over the entire continuum of their temporal resolution. Exploiting this representation, we discover characterizing event-subsequences and present their usage in an ensemble-based framework for activity classification. Finally, we propose a method to automatically detect subsequences of events that are locally atypical in a structural sense. Results over extensive data-sets, collected from multiple sensor-rich environments are presented, to show the competence and scalability of the proposed framework.

October 15th, 2006 Irfan Essa Posted in Computational Photography and Video, Nick Diakopoulos, Papers, Research No Comments »
Diakopoulos, N. and Essa, I. (2006). Videotater: an approach for pen-based digital video segmentation and tagging. In Proceedings of the 19th Annual ACM Symposium on User interface Software and Technology (Montreux, Switzerland, October 15 - 18, 2006). UIST ‘06. ACM Press, New York, NY, 221-224. [DOI]
Abstract
The continuous growth of media databases necessitates development of novel visualization and interaction techniques to support management of these collections. We present Videotater, an experimental tool for a Tablet PC that supports the efficient and intuitive navigation, selection, segmentation, and tagging of video. Our veridical representation immediately signals to the user where appropriate segment boundaries should be placed and allows for rapid review and refinement of manually or automatically generated segments. Finally, we explore a distribution of modalities in the interface by using multiple timeline representations, pressure sensing, and a tag painting/erasing metaphor with the pen.

October 14th, 2006 Irfan Essa Posted in Activity Recognition, Charles Isbell, David Minnen, Papers, Research, Thad Starner No Comments »
Discovering Characteristic Actions from On-Body Sensor Data (IEEEXplore)
Minnen, D.; Starner, T.; Essa, I.; Isbell, C.
College of Computing, Georgia Institute of Technology, Atlanta, GA 30332 USA. dminn@cc.gatech.edu
This paper appears in: Wearable Computers, 2006 10th IEEE International Symposium on
Publication Date: Oct. 2006
On page(s): 11 - 18
Location: Montreux, Switzerland
ISSN: 1550-4816
ISBN: 1-4244-0598-x
Digital Object Identifier: 10.1109/ISWC.2006.286337
Posted online: 2007-01-22 09:58:15.0
Abstract
We present an approach to activity discovery, the unsupervised identification and modeling of human actions embedded in a larger sensor stream. Activity discovery can be seen as the inverse of the activity recognition problem. Rather than learn models from hand-labeled sequences, we attempt to discover motifs, sets of similar subsequences within the raw sensor stream, without the benefit of labels or manual segmentation. These motifs are statistically unlikely and thus typically correspond to important or characteristic actions within the activity. The problem of activity discovery differs from typical motif discovery, such as locating protein binding sites, because of the nature of time series data representing human activity. For example, in activity data, motifs will tend to be sparsely distributed, vary in length, and may only exhibit intra-motif similarity after appropriate time warping. In this paper, we motivate the activity discovery problem and present our approach for efficient discovery of meaningful actions from sensor data representing human activity. We empirically evaluate the approach on an exercise data set captured by a wrist-mounted, three-axis inertial sensor. Our algorithm successfully discovers motifs that correspond to the real exercises with a recall rate of 96.3% and overall accuracy of 86.7% over six exercises and 864 occurrences.

September 30th, 2006 Irfan Essa Posted in Computational Photography and Video, Mitch Parry, Papers, Research No Comments »
Angelov, Y., Ramachandran, U., Mackenzie, K., Rehg, J. M., and Essa, I. 2005. “Experiences with optimizing two stream-based applications for cluster execution”. J. Parallel Distrib. Comput. 65, 6 (Jun. 2005), 678-691. [DOI]
Abstract
We explore optimization strategies and resulting performance of two stream-based video applications, video texture and color tracker, on a cluster of SMPs. The two applications are representative of a class of emerging applications, which we call “stream-based applications”, that are sensitive to both latency of individual results and overall throughput. Such applications require non-trivial parallelization techniques in order to improve both latency and throughput, given that the stream data emanates from a limited set of sources (exactly one in the two applications studied) and that the distribution of the data cannot be done a priori. We suggest techniques that address in a coordinated fashion the problems of data distribution and work partitioning. We believe the two problems are related and need to be addressed together. We have parallelized two applications using the Stampede cluster programming system that provides abstractions for implementing time- and throughput-sensitive applications elegantly and efficiently. For the Video Textures application we show that we can achieve a speedup of 24.26 on a 112-processor cluster. For the Color Tracker application, where latency is more crucial, we identify the extent of data parallelism that ensures that the slowest member of the pipeline is no longer the bottleneck for achieving a decent frame rate.
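For context on the reported speedup, parallel efficiency is conventionally defined as speedup divided by processor count. The short calculation below applies that standard definition to the two figures quoted in the abstract; the efficiency value itself is derived here and is not a number reported by the paper.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Figures quoted in the abstract for the Video Textures application.
efficiency = parallel_efficiency(24.26, 112)
print(f"efficiency = {efficiency:.1%}")
```

This works out to roughly 21.7% of ideal linear scaling, which is consistent with the abstract's point that a single stream source makes data distribution a limiting factor.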

June 14th, 2006 Irfan Essa Posted in Aaron Bobick, Activity Recognition, Aware Home, Papers, Research, Yifan Shi No Comments »
Learning Temporal Sequence Model from Partially Labeled Data (IEEEXplore)
Yifan Shi; Bobick, A.; Essa, I.
Georgia Institute of Technology, Atlanta
This paper appears in: Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on
Publication Date: 2006
Volume: 2
On page(s): 1631 - 1638
ISSN: 1063-6919
ISBN: 0-7695-2597-0
Digital Object Identifier: 10.1109/CVPR.2006.174
Posted online: 2006-10-09 11:11:21.0
Abstract
Graphical models are often used to represent and recognize activities. Purely unsupervised methods (such as HMMs) can be trained automatically but yield models whose internal structure - the nodes - is difficult to interpret semantically. Manually constructed networks typically have nodes corresponding to sub-events, but the programming and training of these networks is tedious and requires extensive domain expertise. In this paper, we propose a semi-supervised approach in which a manually structured Propagation Network (a form of DBN) is initialized from a small amount of fully annotated data, and then refined by an EM-based learning method in an unsupervised fashion. During node refinement (the M step) a boosting-based algorithm is employed to train the evidence detectors of individual nodes. Experiments on a variety of data types - vision and inertial measurements - in several tasks demonstrate the ability to learn from as little as one fully annotated example accompanied by a small number of positive but non-annotated training examples. The system is applied to both recognition and anomaly detection tasks.

June 14th, 2006 Irfan Essa Posted in Greg Turk, Modeling and Animation, Papers, Research No Comments »
Element-Free Elastic Models for Volume Fitting and Capture (IEEEXplore)
Jaeil Choi; Szymczak, A.; Turk, G.; Essa, I.
Georgia Institute of Technology
This paper appears in: Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on
Publication Date: 2006
Volume: 2
On page(s): 2245 - 2252
ISSN: 1063-6919
ISBN: 0-7695-2597-0
Digital Object Identifier: 10.1109/CVPR.2006.110
Posted online: 2006-10-09 11:11:24.0
Abstract
We present a new method of fitting an element-free volumetric model to a sequence of deforming surfaces of a moving object. Given a sequence of visual hulls, we iteratively fit an element-free elastic model to the visual hull in order to extract the optimal pose of the captured volume. The fitting of the volumetric model is achieved by minimizing a combination of elastic potential energy, a surface distance measure, and a self-intersection penalty for each frame. A unique aspect of our work is that the model is mesh free - since the model is represented as a point cloud, it is easy to construct, manipulate and update the model as needed. Additionally, linear elasticity with rotation compensation makes it possible to handle local deformations and large rotations of body parts much more efficiently than other volume fitting approaches. Our experimental results for volume fitting and capture in a multi-view camera setting demonstrate the robustness of element-free elastic models against noise and self-occlusions.

May 14th, 2006 Irfan Essa Posted in Audio Analysis, Mitch Parry, Papers, Research No Comments »
Source Detection Using Repetitive Structure (IEEEXplore)
Parry, R.M.; Essa, I.
Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA
This paper appears in: Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
Publication Date: 14-19 May 2006
Volume: 4
On page(s): IV - IV
Location: Toulouse
ISSN: 1520-6149
ISBN: 1-4244-0469-X
INSPEC Accession Number:9154520
Digital Object Identifier: 10.1109/ICASSP.2006.1661163
Posted online: 2006-09-18 09:38:57.0
Abstract
Blind source separation algorithms typically require that the number of sources is known in advance. However, it is often the case that the number of sources changes over time and that the total number is not known. Existing source separation techniques require source number estimation methods to determine how many sources are active within the mixture signals. These methods typically operate on the covariance matrix of mixture recordings and require fewer active sources than mixtures. When sources do not overlap in the time-frequency domain, more sources than mixtures may be detected and then separated. However, separating more sources than mixtures when sources overlap in time and frequency poses a particularly difficult problem. This paper addresses the issue of source detection when more sources than sensors overlap in time and frequency. We show that repetitive structure in the form of time-time correlation matrices can reveal when each source is active.

July 25th, 2005 Irfan Essa Posted in Aaron Bobick, Computational Photography and Video, Nipun Kwatra, Papers, Research, SIGGRAPH/SCA/NPAR/EG, Vivek Kwatra No Comments »
Vivek Kwatra, Irfan Essa, Aaron Bobick, and Nipun Kwatra (2005), “Texture optimization for example-based synthesis” In ACM Transactions on Graphics (TOG) Volume 24 , Issue 3 (July 2005) Proceedings of ACM SIGGRAPH 2005, Pages: 795 - 802, ISSN:0730-0301 (DOI|PDF|Project Site|Video|Talk)
ABSTRACT
We present a novel technique for texture synthesis using optimization. We define a Markov Random Field (MRF)-based similarity metric for measuring the quality of synthesized texture with respect to a given input sample. This allows us to formulate the synthesis problem as minimization of an energy function, which is optimized using an Expectation Maximization (EM)-like algorithm. In contrast to most example-based techniques that do region-growing, ours is a joint optimization approach that progressively refines the entire texture. Additionally, our approach is ideally suited to allow for controllable synthesis of textures. Specifically, we demonstrate controllability by animating image textures using flow fields. We allow for general two-dimensional flow fields that may dynamically change over time. Applications of this technique include dynamic texturing of fluid animations and texture-based flow visualization.

June 22nd, 2005 Irfan Essa Posted in Computational Photography and Video, Papers No Comments »
Video-based nonphotorealistic and expressive illustration of motion (IEEEXplore)
Kim, B.; Essa, I.
GVU Center & Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA, USA
This paper appears in: Computer Graphics International 2005
Publication Date: 22-24 June 2005
On page(s): 32 - 35
Number of Pages: xi+286
ISSN: 1530-1052
ISBN: 0-7803-9330-9
INSPEC Accession Number:8632735
Digital Object Identifier: 10.1109/CGI.2005.1500363
Posted online: 2005-08-29 08:56:32.0
Abstract
We present a semi-automatic approach for adding expressive renderings to images and videos that highlight motions and movement. Our technique relies on motion analysis of video where the motion information from the image sequence is used to add expressive information. The first step in our approach is to extract a moving region of the video by segmenting and then grouping regions of compatible motions. In the second step, a user can interactively choose or refine a grouping region that represents the moving object of interest. In the third and final stage, the user can apply various visual effects such as a temporal-flare, time-lapse, and particle-effects. We have implemented a prototype system that can be used to illustrate and expressively render motions in videos and images, with simple user interaction. Our system can deal with most translational and rotational motions without a need for a fixed background.

June 20th, 2005 Irfan Essa Posted in Activity Recognition, Aware Home, PAMI/ICCV/CVPR/ECCV, Papers, Yan Huang No Comments »
Tracking multiple objects through occlusions (IEEEXplore)
Huang, Y.; Essa, I.
Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA, USA
This paper appears in: Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on
Publication Date: 20-25 June 2005
Volume: 2
On page(s): 1051 - 1058 vol. 2
Number of Pages: 2 vol. (xxxvii+1216)
ISSN: 1063-6919
ISBN: 0-7695-2372-2
INSPEC Accession Number:8633324
Digital Object Identifier: 10.1109/CVPR.2005.350
Posted online: 2005-07-25 08:18:55.0
Abstract
We present an approach for tracking a varying number of objects through both temporally and spatially significant occlusions. Our method builds on the idea of object permanence to reason about occlusions. To this end, tracking is performed at both the region level and the object level. At the region level, a customized genetic algorithm is used to search for optimal region tracks. This limits the scope of object trajectories. At the object level, each object is located based on adaptive appearance models, spatial distributions and inter-occlusion relationships. The proposed architecture is capable of tracking objects even in the presence of long periods of full occlusions. We demonstrate the viability of this approach by experimenting on several videos of a user interacting with a variety of objects on a desktop.

September 30th, 2004 Irfan Essa Posted in Aware Home, Papers, Security No Comments »
June 7th, 2004 Irfan Essa Posted in Computational Photography and Video, James Hays, Non-Photorealism, Papers, SIGGRAPH/SCA/NPAR/EG No Comments »
James Hays and Irfan Essa (2004) “Image and video based painterly animation” In Proceedings of the 3rd international symposium on Non-photorealistic animation and rendering (NPAR 2004), Annecy, France, June 7-9, 2004, pages 113 - 120, ISBN:1-58113-887-3, 2004 (DOI|PDF|Project Web Site).
ABSTRACT
We present techniques for transforming images and videos into painterly animations depicting different artistic styles. Our techniques rely on image and video analysis to compute appearance and motion properties. We also determine and apply motion information from different (user-specified) sources to static and moving images. These properties that encode spatio-temporal variations are then used to render (or paint) effects of selected styles to generate images and videos with a painted look. Painterly animations are generated using a mesh of brush stroke objects with dynamic spatio-temporal properties. Styles govern the behavior of these brush strokes as well as their rendering to a virtual canvas. We present methods for modifying the properties of these brush strokes according to the input images, videos, or motions. Brush stroke color, length, orientation, opacity, and motion are determined and the brush strokes are regenerated to fill the canvas as the video changes. All brush stroke properties are temporally constrained to guarantee temporally coherent non-photorealistic animations.

June 2nd, 2004 Irfan Essa Posted in James Rehg, Papers, Pei Yin No Comments »
Asymmetrically boosted HMM for speech reading (IEEEXplore)
Pei Yin; Essa, I.; Rehg, J.M.
GVU Center, Georgia Inst. of Technol., Atlanta, GA, USA
This paper appears in: Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on
Publication Date: 27 June-2 July 2004
Volume: 2
On page(s): II-755 - II-761 Vol.2
Number of Pages: 2001
ISSN: 1063-6919
ISBN: 0-7695-2158-4
INSPEC Accession Number:8161546
Digital Object Identifier: 10.1109/CVPR.2004.1315240
Posted online: 2004-07-19 11:09:26.0
Abstract
Speech reading, also known as lip reading, is aimed at extracting visual cues of lip and facial movements to aid in recognition of speech. The main hurdle for speech reading is that visual measurements of lip and facial motion lack information-rich features like the Mel frequency cepstral coefficients (MFCC), widely used in acoustic speech recognition. These MFCC are used with hidden Markov models (HMM) in most speech recognition systems at present. Speech reading could greatly benefit from automatic selection and formation of informative features from measurements in the visual domain. These new features can then be used with HMM to capture the dynamics of lip movement and eventual recognition of lip shapes. Towards this end, we use AdaBoost methods for automatic visual feature formation. Specifically, we design an asymmetric variant of the AdaBoost.M2 algorithm to deal with the ill-posed multi-class sample distribution inherent in our problem. Our experiments show that the boosted HMM approach outperforms conventional AdaBoost and HMM classifiers. Our primary contributions are in the design of (a) boosted HMM and (b) asymmetric multi-class boosting.
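For readers unfamiliar with boosting, the sketch below shows one round of the textbook binary (discrete) AdaBoost weight update. It is neither the multi-class AdaBoost.M2 algorithm nor the asymmetric variant designed in the paper; it only illustrates how misclassified samples gain weight between rounds. The function name `adaboost_round` and the toy labels are hypothetical.

```python
import math


def adaboost_round(weights, labels, predictions):
    """One round of binary AdaBoost with labels and predictions in {-1, +1}.

    Returns the weak learner's vote alpha and the renormalised sample weights.
    """
    # Weighted error of the weak learner on the current distribution.
    err = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    if not 0.0 < err < 0.5:
        raise ValueError("weak learner must have weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Correct samples (y * h = +1) shrink, mistakes (y * h = -1) grow.
    updated = [w * math.exp(-alpha * y * h)
               for w, y, h in zip(weights, labels, predictions)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Four equally weighted samples; the weak learner gets the second one wrong.
weights = [0.25, 0.25, 0.25, 0.25]
labels = [1, 1, -1, -1]
predictions = [1, -1, -1, -1]
alpha, new_weights = adaboost_round(weights, labels, predictions)
```

After the update the single misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right. An asymmetric scheme would treat some kinds of error as costlier than others, which this sketch does not attempt.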

June 2nd, 2004 Irfan Essa Posted in Aaron Bobick, Activity Recognition, Aware Home, David Minnen, Papers, Yan Huang, Yifan Shi No Comments »
Propagation networks for recognition of partially ordered sequential action (IEEEXplore)
Yifan Shi; Yan Huang; Minnen, D.; Bobick, A.; Essa, I.
GVU Center, Georgia Inst. of Technol., Atlanta, GA, USA
This paper appears in: Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on
Publication Date: 27 June-2 July 2004
Volume: 2
On page(s): II-862 - II-869 Vol.2
Number of Pages: 2001
ISSN: 1063-6919
ISBN: 0-7695-2158-4
INSPEC Accession Number:8161557
Digital Object Identifier: 10.1109/CVPR.2004.1315255
Posted online: 2004-07-19 11:09:30.0
Abstract
We present propagation networks (P-nets), a novel approach for representing and recognizing sequential activities that include parallel streams of action. We represent each activity using partially ordered intervals. Each interval is restricted by both temporal and logical constraints, including information about its duration and its temporal relationship with other intervals. P-nets associate one node with each temporal interval. Each node is triggered according to a probability density function that depends on the state of its parent nodes. Each node also has an associated observation function that characterizes supporting perceptual evidence. To facilitate real-time analysis, we introduce a particle filter framework to explore the conditional state space. We modify the original condensation algorithm to more efficiently sample a discrete state space (D-condensation). Experiments in the domain of blood glucose monitor calibration demonstrate both the representational power of P-nets and the effectiveness of the D-condensation algorithm.
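To give a feel for particle filtering over a discrete state space, here is a minimal bootstrap particle filter that tracks progress through a left-to-right chain of states. This is a generic sketch, not the D-condensation algorithm from the paper; the transition rule, the observation likelihood values, and all names are assumptions chosen for illustration.

```python
import random
from collections import Counter

def step(state, rng, n_states, p_advance=0.2):
    """Stay in the current state, or advance to the next one with p_advance."""
    if state < n_states - 1 and rng.random() < p_advance:
        return state + 1
    return state

def obs_likelihood(state, obs):
    """Hypothetical sensor model: the observation usually names the true state."""
    return 0.9 if state == obs else 0.05

def particle_filter(observations, n_states, n_particles=500, seed=0):
    """Return the most common particle state after each observation."""
    rng = random.Random(seed)
    particles = [0] * n_particles  # every particle starts in state 0
    estimates = []
    for obs in observations:
        # Predict: move each particle through the transition model.
        particles = [step(s, rng, n_states) for s in particles]
        # Update: weight each particle by how well it explains the observation.
        weights = [obs_likelihood(s, obs) for s in particles]
        # Resample in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        estimates.append(Counter(particles).most_common(1)[0][0])
    return estimates

estimates = particle_filter([0, 0, 1, 1, 2, 2], n_states=3)
print(estimates)
```

A P-net would replace the single chain with one node per interval and condition each node's triggering on its parents; the predict, weight, and resample loop stays the same in spirit.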

November 9th, 2003 Irfan Essa Posted in Face and Gesture, James Rehg, Numerical Machine Learning, Papers, Pei Yin No Comments »
Boosted audio-visual HMM for speech reading (IEEEXplore)
Yin, P.; Essa, I.; Rehg, J.M.
GVU Center, Georgia Inst. of Technol., Atlanta, GA, USA
This paper appears in: Signals, Systems and Computers, 2003. Conference Record of the Thirty-Seventh Asilomar Conference on
Publication Date: 9-12 Nov. 2003
Volume: 2
On page(s): 2013 - 2018 Vol.2
Number of Pages: 2361
ISSN:
ISBN: 0-7803-8104-1
INSPEC Accession Number: 8555396
Digital Object Identifier: 10.1109/ACSSC.2003.1292334
Posted online: 2004-05-04 13:54:35.0
Abstract
We propose a new approach for combining acoustic and visual measurements to aid in recognizing the lip shapes of a person speaking. Our method relies on computing the maximum likelihoods of (a) HMMs that model phonemes from the acoustic signal and (b) HMMs that model the motion of visual features from video. One significant addition in this work is the dynamic analysis with features selected by AdaBoost on the basis of their discriminative ability. This form of integration, leading to the boosted HMM, permits AdaBoost to find the best features first and then uses the HMM to exploit the dynamic information inherent in the signal.
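The maximum-likelihood step can be illustrated with the standard scaled forward algorithm: score an observation sequence under one HMM per class and pick the class with the highest log-likelihood. The sketch below uses discrete observation symbols and made-up model parameters; it does not show how the paper combines the acoustic and visual streams.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete-output HMM (scaled forward pass)."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [emit[s][obs[t]] *
                     sum(alpha[r] * trans[r][s] for r in range(n_states))
                     for s in range(n_states)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik

def classify(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Hypothetical two-state models: "a" mostly emits symbol 0, "b" mostly symbol 1.
trans = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "a": ([0.5, 0.5], trans, [[0.9, 0.1], [0.8, 0.2]]),
    "b": ([0.5, 0.5], trans, [[0.1, 0.9], [0.2, 0.8]]),
}
label = classify([0, 0, 0, 1, 0], models)
print(label)
```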

October 20th, 2003 Irfan Essa Posted in Collaborators, Computational Journalism, Dick Lipton, Jim Xu, Papers, Research No Comments »
Mandatory human participation: a new authentication scheme for building secure systems (IEEEXplore)
Xu, J.; Lipton, R.; Essa, I.; Sung, M.; Zhu, Y.
In Proceedings of the 12th International Conference on Computer Communications and Networks (ICCCN 2003), 20-22 Oct. 2003, pp. 547 - 552, ISSN: 1095-2055, ISBN: 0-7803-7945-4, DOI: 10.1109/ICCCN.2003.1284222
Abstract
Mandatory human participation (MHP) is a novel authentication scheme that asks the question “are you human?” (instead of “who are you?”) and, upon the correct answer to this question, can prove a principal to be a human being instead of a computer program. MHP helps solve old and new problems in computer security that existing security measures cannot address properly, including password (or PIN) guessing attacks and application-level denial of service. A key component of this “are you human?” authentication process is a character morphing algorithm that transforms a character string into its graphical form in such a way that a human being won’t have any problem recognizing the original string, while a computer program (e.g., an optical character recognition program) will not be able to decipher it or make a correct guess with nonnegligible probability. The basic idea of the MHP scheme is to ask an agent to recognize the string before its login attempts or transaction requests can be honored. Here a protocol is needed to send a puzzle to an agent, check whether the answer supplied by the agent is correct, and, most importantly, make sure that the agent cannot cheat in the process. A number of system and security issues that relate to the protocol need to be addressed for the protocol to be secure, efficient, robust, and user-friendly. The MHP scheme contributes to the foundations of computer security by faithfully implementing novel security semantics, “human,” which existing cryptographic measures cannot express accurately. As many real-world security applications involve the interaction between a human and a computer, which naturally contains “human” as a part of its protocol semantics, we believe that the MHP scheme will find many new applications in the future.
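The puzzle exchange described in the abstract (send a puzzle, check the answer, prevent cheating) can be sketched as a one-shot challenge-response. The class below is a hypothetical illustration, not the paper's protocol: the name PuzzleServer is invented, the character morphing step is omitted, and the puzzle text is returned in the clear only so the example can run end to end.

```python
import hmac
import secrets
import string

class PuzzleServer:
    """Issues random-string puzzles and accepts at most one answer per puzzle."""

    def __init__(self):
        self._pending = {}  # puzzle id -> expected answer

    def issue(self, length=6):
        """Create a puzzle and return (puzzle_id, text)."""
        text = "".join(secrets.choice(string.ascii_uppercase) for _ in range(length))
        puzzle_id = secrets.token_hex(8)
        self._pending[puzzle_id] = text
        # Assumption: a real deployment would send only a morphed image of
        # `text` to the agent, never the string itself.
        return puzzle_id, text

    def verify(self, puzzle_id, answer):
        """Check an answer; the puzzle is consumed whether or not it is correct."""
        expected = self._pending.pop(puzzle_id, None)
        if expected is None:
            return False  # unknown, already used, or replayed puzzle id
        # Constant-time comparison avoids leaking how many characters matched.
        return hmac.compare_digest(expected.encode("utf-8"), answer.encode("utf-8"))

server = PuzzleServer()
puzzle_id, text = server.issue()
first_try = server.verify(puzzle_id, text)   # correct answer, first use
replay = server.verify(puzzle_id, text)      # same puzzle id reused
print(first_try, replay)
```

Consuming the puzzle on the first attempt is one simple way to stop an agent from guessing repeatedly against the same puzzle; a login or transaction request would be honored only after verify returns True.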

July 25th, 2003 Irfan Essa Posted in Aaron Bobick, Arno Schödl, Computational Photography and Video, Greg Turk, Papers, SIGGRAPH/SCA/NPAR/EG, Vivek Kwatra No Comments »
Vivek Kwatra, Arno Schödl, Irfan Essa, Greg Turk, Aaron Bobick (2003), “Graphcut textures: image and video synthesis using graph cuts” In ACM Transactions on Graphics (TOG), Volume 22 , Issue 3, Proceedings of ACM SIGGRAPH 2003, Pages: 277 - 286, July 2003, ISSN:0730-0301. (DOI|Paper| SIGGRAPH Video (160 MB, 50 MB) |