August 19th, 2008 Irfan Essa Posted in DVFX, Frank Dellaert No Comments »
I am very pleased that my colleague (and friend) Professor Frank Dellaert has taken over my DVFX class, which I have been teaching since 1999 (see site here). It is clear already that this new edition of the DVFX class will be even more exciting than the previous editions. Can’t wait to see the final videos. Check out the info on the class at CS 4480 DVFX, Fall 08.
August 7th, 2008 Irfan Essa Posted in Computational Journalism, Nick Diakopoulos No Comments »
Audio Puzzler Alpha (ONLINE DEMO)
By Nick Diakopoulos (My PhD Student)
Audio Puzzler is a new kind of puzzle game based on unauthored content found online. The audio for the puzzles is taken from popular or interesting video clips from different genres such as news, documentary, or television. The audio puzzler is the type of game that harnesses people’s play to also provide valuable data which enriches the content played with. This is in the same vein as the ESPGame, the Listen Game, and PhotoPlay, which are all games which gather data in the process of game play. But while the data collected by these other games is useful for machine learning, the data collected with audio puzzler is immediately valuable as a transcription of the speech in the video. A similar effort (but in a much grander domain) is the Fold It project which seeks to harness playtime to solve protein folding problems. Much more detailed information about the evaluation of the technology will be forthcoming in a paper to be published at ACM Multimedia in October.

April 3rd, 2008 Irfan Essa Posted in Face and Gesture, James Rehg, Numerical Machine Learning, PAMI/ICCV/CVPR/ECCV, Papers, Pei Yin, Thad Starner No Comments »
Pei Yin, Irfan Essa, James Rehg, Thad Starner (2008) “Discriminative Feature Selection for Hidden Markov Models using Segmental Boosting”, ICASSP 2008 - March 30 - April 4, 2008 - Las Vegas, Nevada, U.S.A. (Paper: MLSP-P3.D8, Session: Pattern Recognition and Classification II, Time: Thursday, April 3, 15:30 - 17:30, Topic: Machine Learning for Signal Processing: Learning Theory and Modeling) (PDF|Project Site)
ABSTRACT
We address the feature selection problem for hidden Markov models (HMMs) in sequence classification. Temporal correlation in sequences often causes difficulty in applying feature selection techniques. Inspired by segmental k-means segmentation (SKS), we propose Segmentally Boosted HMMs (SBHMMs), where the state-optimized features are constructed in a segmental and discriminative manner. The contributions are twofold. First, we introduce a novel feature selection algorithm, where the temporal dynamics are decoupled from the static learning procedure by assuming that the sequential data are piecewise independent and identically distributed. Second, we show that the SBHMM consistently improves traditional HMM recognition in various domains. The reduction of error compared to traditional HMMs ranges from 17% to 70% in American Sign Language recognition, human gait identification, lip reading, and speech recognition.

March 30th, 2008 Irfan Essa Posted in Events, Greg Turk, SIGGRAPH/SCA/NPAR/EG No Comments »
The ACM SIGGRAPH 2008 Papers Committee Meeting was held at GA Tech in Atlanta, March 29-30, under the leadership of Greg Turk. Following is a picture of all of us at work, with our sigs, as a note of thanks for Greg.

Original Photo by myself, this version with sigs by Fredo Durand.
February 15th, 2008 Irfan Essa Posted in Events, Nick Diakopoulos No Comments »
Working with Brad Stenger (Wired), Nick Diakopoulos (GA Tech), and Sergio Goldenberg (GA Tech), we are organizing a Symposium on computation+journalism to bring together computationalists, internet/media experts, and journalists for a series of panels, presentations, and discussions around how computing technologies are affecting (and changing) journalism practices. We have over 180 people registered and it promises to be a great first-of-its-kind event. This event is being hosted by the GVU Center at Georgia Tech.
October 29th, 2007 Irfan Essa Posted in A. Dan Fisk, Activity Recognition, Aware Home, Papers, Wendy Rogers No Comments »
FEATURE AT A GLANCE: Technology in the home environment has the potential to support older adults in a variety of ways. We took an interdisciplinary approach (human factors/ergonomics and computer science) to develop a technology “coach” that could support older adults in learning to use a medical device. Our system provided a computer vision system to track the use of a blood glucose meter and provide users with feedback if they made an error. This research could support the development of an in-home personal assistant to coach individuals in a variety of tasks necessary for independent living.
KEYWORDS: home technology, medical devices, support for learning
October 28th, 2007 Irfan Essa Posted in Activity Recognition, Charles Isbell, David Minnen, Papers, Research, Thad Starner No Comments »
D. Minnen, I. Essa, C.L. Isbell, and T. Starner “Detecting Subdimensional Motifs: An Efficient Algorithm for Generalized Multivariate Pattern Discovery” In IEEE Int. Conf. on Data Mining (ICDM) 2007, Omaha, NE, October 28-31, 2007. [PDF]
Abstract
Discovering recurring patterns in time series data is a fundamental problem for temporal data mining. This paper addresses the problem of locating subdimensional motifs in real-valued, multivariate time series, which requires the simultaneous discovery of sets of recurring patterns along with the corresponding relevant dimensions. While many approaches to motif discovery have been developed, most are restricted to categorical data, univariate time series, or multivariate data in which the temporal patterns span all of the dimensions. In this paper, we present an expected linear-time algorithm that addresses a generalization of multivariate pattern discovery in which each motif may span only a subset of the dimensions. To validate our algorithm, we discuss its theoretical properties and empirically evaluate it using several data sets including synthetic data and motion capture data collected by an on-body inertial sensor.

October 15th, 2007 Irfan Essa Posted in Aaron Bobick, Activity Recognition, Aware Home, PAMI/ICCV/CVPR/ECCV, Papers, Raffay Hamid No Comments »
Abstract
Models of activity structure for unconstrained environments are generally not available a priori. Recent representational approaches to this end are limited by their computational complexity, and ability to capture activity structure only up to some fixed temporal scale. In this work, we propose Suffix Trees as an activity representation to efficiently extract structure of activities by analyzing their constituent event-subsequences over multiple temporal scales. We empirically compare Suffix Trees with some of the previous approaches in terms of feature cardinality, discriminative prowess, noise sensitivity and activity-class discovery. Finally, exploiting properties of Suffix Trees, we present a novel perspective on anomalous subsequences of activities, and propose an algorithm to detect them in linear-time. We present comparative results over experimental data, collected from a kitchen environment to demonstrate the competence of our proposed framework.
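To make the idea of event-subsequences concrete, here is a minimal sketch. It is not the paper's algorithm: a plain dictionary of contiguous subsequence counts stands in for the Suffix Tree (so it runs in quadratic rather than linear time), and the function names, the event labels, and the "never seen in training" test for anomalies are all assumptions made for illustration.

```python
from collections import Counter

def subsequence_counts(events, max_len):
    """Count every contiguous event subsequence of length 1..max_len.

    A suffix tree indexes the same subsequences far more compactly;
    this dictionary version is only meant to show what is being counted.
    """
    counts = Counter()
    n = len(events)
    for start in range(n):
        for length in range(1, max_len + 1):
            if start + length <= n:
                counts[tuple(events[start:start + length])] += 1
    return counts

def unseen_subsequences(model, events, length):
    """Return the length-`length` subsequences of `events` absent from `model`.

    Treating "absent from the training counts" as "anomalous" is a
    simplification assumed here, not the criterion used in the paper.
    """
    found = []
    for start in range(len(events) - length + 1):
        sub = tuple(events[start:start + length])
        if sub not in model:
            found.append(sub)
    return found

if __name__ == "__main__":
    # Hypothetical kitchen event stream used only for this example.
    train = ["open", "pour", "close", "open", "pour"]
    model = subsequence_counts(train, max_len=2)
    print(model[("open", "pour")])
    print(unseen_subsequences(model, ["open", "close", "pour"], length=2))
```

Counting over several values of `max_len` corresponds loosely to analyzing the sequence at multiple temporal scales.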

October 9th, 2007 Irfan Essa Posted in Audio Analysis, Mitch Parry, PhD, Thesis No Comments »
Mitch Parry (2007), Separation and Analysis of Multichannel Signals PhD Thesis [PDF], Georgia Institute of Technology, College of Computing, Atlanta, GA. (Advisor: Irfan Essa)
Abstract
This thesis examines a large and growing class of digital signals that capture the combined effect of multiple underlying factors. In order to better understand these signals, we would like to separate and analyze the underlying factors independently. Although source separation applies to a wide variety of signals, this thesis focuses on separating individual instruments from a musical recording. In particular, we propose novel algorithms for separating instrument recordings given only their mixture. When the number of source signals does not exceed the number of mixture signals, we focus on a subclass of source separation algorithms based on joint diagonalization. Each approach leverages a different form of source structure. We introduce repetitive structure as an alternative that leverages unique repetition patterns in music and compare its performance against the other techniques.
When the number of source signals exceeds the number of mixtures (i.e., the underdetermined problem), we focus on spectrogram factorization techniques for source separation. We extend single-channel techniques to utilize the additional spatial information in multichannel recordings, and use phase information to improve the estimation of the underlying components.

September 15th, 2007 Irfan Essa Posted in Computational Journalism, Nick Diakopoulos, Papers, Research No Comments »
N. Diakopoulos, K. Luther, Y. Medynskiy, I. Essa (2007) The Evolution of Authorship in a Remix Society, ACM Hypertext 2007 Conference, Manchester, UK, September 2007
Abstract
Authorship entails the constrained selection or generation of media and the organization and layout of that media in a larger structure. But authorship is more than just selection and organization; it is a complex construct incorporating concepts of originality, authority, intertextuality, and attribution. In this paper we explore these concepts and ask how they are changing in light of modes of collaborative authorship in remix culture. We present a qualitative case study of an online video remixing site, illustrating how the constraints of that environment are impacting authorial constructs. We discuss users’ self-conceptions as authors, and how values related to authorship are reflected to users through the interface and design of the site’s tools. We also present some implications for the design of online communities for collaborative media creation and remixing.
- N. Diakopoulos, K. Luther, Y. Medynskiy, I. Essa. The Evolution of Authorship in a Remix Society. In Proceedings of Hypertext and Hypermedia. Manchester, UK, September 2007. [PDF]
- N. Diakopoulos, K. Luther, Y. Medynskiy, I. Essa. Remixing Authorship: Reconfiguring the Author in Online Video Remix Culture. Georgia Tech, Technical Report. GIT-IC-07-05. 2007. [PDF]

September 15th, 2007 Irfan Essa Posted in Charles Isbell, Numerical Machine Learning No Comments »
Award#0749181 - SGER Collaborative Research: Persistent, Adaptive, Collaborative Synthespians
ABSTRACT
This project explores the development of methodologies for populating worlds with persistent, adaptive, collaborative, believable synthetic actors, referred to as Synthespians. These methods are extensions of adaptive models of learning and planning to accommodate the complex, dynamic environments in massive multi-player online games. The intellectual merit includes the development and evaluation of: 1. A behavior development language, with discovery, machine learning, and adaptation of behaviors directly integrated into the language, allowing for the rapid development and deployment of Synthespians. 2. A framework for the actors to recognize and discover plans by observing and modeling the activities of the other agents. An expected outcome of this research is the ability to author complex virtual worlds with many participants that support intelligent and effective interaction between people and machines. Broader Impact: A scientific understanding of how we interact with each other and collaborate will benefit from our ability to simulate complex environments with dynamic and evolving individual and group behaviors. In this project, building and modeling such environments and behaviors is done within a gaming context. This work will in the long run affect and change the fields of education and entertainment. In addition, being able to model large collaborative and interactive scenarios will also help us understand and model large social dynamics phenomena of interest to sociologists and economists.

August 24th, 2007 Irfan Essa Posted in Activity Recognition, Charles Isbell, David Minnen, Papers, Research, Thad Starner No Comments »
Discovering Multivariate Motifs using Subsequence Density Estimation and Greedy Mixture Learning
Abstract
The problem of locating motifs in real-valued, multivariate time series data involves the discovery of sets of recurring patterns embedded in the time series. Each set is composed of several non-overlapping subsequences and constitutes a motif because all of the included subsequences are similar. The ability to automatically discover such motifs allows intelligent systems to form endogenously meaningful representations of their environment through unsupervised sensor analysis. In this paper, we formulate a unifying view of motif discovery as a problem of locating regions of high density in the space of all time series subsequences. Our approach is efficient (sub-quadratic in the length of the data), requires fewer user-specified parameters than previous methods, and naturally allows variable length motif occurrences and nonlinear temporal warping. We evaluate the performance of our approach using four data sets from different domains including on-body inertial sensors and speech.

June 17th, 2007 Irfan Essa Posted in Antonio Criminisi, Computational Photography and Video, John Winn, Numerical Machine Learning, Papers, Pei Yin, Research No Comments »
Tree-based Classifiers for Bilayer Video Segmentation (IEEEXplore)
Yin, Pei; Criminisi, Antonio; Winn, John; Essa, Irfan
School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
This paper appears in: Computer Vision and Pattern Recognition, 2007. CVPR ‘07. IEEE Conference on
Publication Date: 17-22 June 2007
On page(s): 1 - 8
Number of Pages: 1 - 8
Location: Minneapolis, MN, USA
ISBN: 1-4244-1180-7
Digital Object Identifier: 10.1109/CVPR.2007.383008
Posted online: 2007-07-16 13:18:42.0
Abstract
This paper presents an algorithm for the automatic segmentation of monocular videos into foreground and background layers. Correct segmentations are produced even in the presence of large background motion with nearly stationary foreground. There are three key contributions. The first is the introduction of a novel motion representation, “motons”, inspired by research in object recognition. Second, we propose learning the segmentation likelihood from the spatial context of motion. The learning is efficiently performed by Random Forests. The third contribution is a general taxonomy of tree-based classifiers, which facilitates theoretical and experimental comparisons of several known classification algorithms, as well as spawning new ones. Diverse visual cues such as motion, motion context, colour, contrast and spatial priors are fused together by means of a Conditional Random Field (CRF) model. Segmentation is then achieved by binary min-cut. Our algorithm requires no initialization. Experiments on many video-chat type sequences demonstrate the effectiveness of our algorithm in a variety of scenes. The segmentation results are comparable to those obtained by stereo systems.

April 15th, 2007 Irfan Essa Posted in Audio Analysis, Mitch Parry, Papers, Research No Comments »
Incorporating Phase Information for Source Separation via Spectrogram Factorization
Parry, R.M.; Essa, I.
Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA
This paper appears in: Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Publication Date: 15-20 April 2007
Volume: 2
On page(s): II-661 - II-664
Number of Pages: II-661 - II-664
Location: Honolulu, HI
ISSN: 1520-6149
ISBN: 1-4244-0728-1
INSPEC Accession Number:9497202
Digital Object Identifier: 10.1109/ICASSP.2007.366322
Posted online: 2007-06-04 10:15:41.0
Abstract
Spectrogram factorization methods have been proposed for single channel source separation and audio analysis. Typically, the mixture signal is first converted into a time-frequency representation such as the short-time Fourier transform (STFT). The phase information is thrown away and this spectrogram matrix is then factored into the sum of rank-one source spectrograms. This approach incorrectly assumes the mixture spectrogram is the sum of the source spectrograms. In fact, the mixture spectrogram depends on the phase of the source STFTs. We investigate the consequences of this common assumption and introduce an approach that leverages a probabilistic representation of phase to improve the separation results.
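A small numerical illustration of why the additivity assumption mentioned in the abstract fails. This is a minimal sketch rather than the paper's method: the single time-frequency bin, the two unit-magnitude sources, and the function name `mixture_magnitude` are assumptions chosen for illustration.

```python
import cmath
import math

def mixture_magnitude(mag_a, mag_b, phase_diff):
    """Magnitude of one STFT bin when two sources are summed.

    The sources are complex numbers with magnitudes mag_a and mag_b and a
    relative phase of phase_diff radians; the mixture is their complex sum.
    """
    a = cmath.rect(mag_a, 0.0)
    b = cmath.rect(mag_b, phase_diff)
    return abs(a + b)

if __name__ == "__main__":
    # The sum of the source magnitudes is always 2.0 here, but the
    # mixture magnitude varies with the relative phase.
    for degrees in (0, 90, 180):
        m = mixture_magnitude(1.0, 1.0, math.radians(degrees))
        print(f"phase difference {degrees:3d} deg: mixture magnitude {m:.3f}")
```

The magnitudes add only when the sources are in phase; at a 180 degree difference two equal sources cancel, which is the dependence on phase that a magnitude-only factorization ignores.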

October 23rd, 2006 Irfan Essa Posted in AAAI/IJCAI/UAI, Aaron Bobick, Activity Recognition, Aware Home, Papers, Raffay Hamid, Siddhartha Maddi No Comments »
- R. Hamid, S. Maddi, A. Bobick, I. Essa. “Unsupervised Analysis of Activity Sequences Using Event Motifs”, In proceedings of 4th ACM International Workshop on Video Surveillance and Sensor Networks (in conjunction with ACM Multimedia 2006).
Abstract
We present an unsupervised framework to discover characterizations of everyday human activities, and demonstrate how such representations can be used to extract points of interest in event-streams. We begin with the usage of Suffix Trees as an efficient activity-representation to analyze the global structural information of activities, using their local event statistics over the entire continuum of their temporal resolution. Exploiting this representation, we discover characterizing event-subsequences and present their usage in an ensemble-based framework for activity classification. Finally, we propose a method to automatically detect subsequences of events that are locally atypical in a structural sense. Results over extensive data-sets, collected from multiple sensor-rich environments are presented, to show the competence and scalability of the proposed framework.

October 15th, 2006 Irfan Essa Posted in Computational Photography and Video, Nick Diakopoulos, Papers, Research No Comments »
Diakopoulos, N. and Essa, I. (2006). Videotater: an approach for pen-based digital video segmentation and tagging. In Proceedings of the 19th Annual ACM Symposium on User interface Software and Technology (Montreux, Switzerland, October 15 - 18, 2006). UIST ‘06. ACM Press, New York, NY, 221-224. [DOI]
Abstract
The continuous growth of media databases necessitates development of novel visualization and interaction techniques to support management of these collections. We present Videotater, an experimental tool for a Tablet PC that supports the efficient and intuitive navigation, selection, segmentation, and tagging of video. Our veridical representation immediately signals to the user where appropriate segment boundaries should be placed and allows for rapid review and refinement of manually or automatically generated segments. Finally, we explore a distribution of modalities in the interface by using multiple timeline representations, pressure sensing, and a tag painting/erasing metaphor with the pen.

October 14th, 2006 Irfan Essa Posted in Activity Recognition, Charles Isbell, David Minnen, Papers, Research, Thad Starner No Comments »
Discovering Characteristic Actions from On-Body Sensor Data (IEEEXplore)
Minnen, D.; Starner, T.; Essa, I.; Isbell, C.
College of Computing, Georgia Institute of Technology, Atlanta, GA 30332 USA. dminn@cc.gatech.edu
This paper appears in: Wearable Computers, 2006 10th IEEE International Symposium on
Publication Date: Oct. 2006
On page(s): 11 - 18
Number of Pages: 11 - 18
Location: Montreux, Switzerland
ISSN: 1550-4816
ISBN: 1-4244-0598-x
Digital Object Identifier: 10.1109/ISWC.2006.286337
Posted online: 2007-01-22 09:58:15.0
Abstract
We present an approach to activity discovery, the unsupervised identification and modeling of human actions embedded in a larger sensor stream. Activity discovery can be seen as the inverse of the activity recognition problem. Rather than learn models from hand-labeled sequences, we attempt to discover motifs, sets of similar subsequences within the raw sensor stream, without the benefit of labels or manual segmentation. These motifs are statistically unlikely and thus typically correspond to important or characteristic actions within the activity. The problem of activity discovery differs from typical motif discovery, such as locating protein binding sites, because of the nature of time series data representing human activity. For example, in activity data, motifs will tend to be sparsely distributed, vary in length, and may only exhibit intra-motif similarity after appropriate time warping. In this paper, we motivate the activity discovery problem and present our approach for efficient discovery of meaningful actions from sensor data representing human activity. We empirically evaluate the approach on an exercise data set captured by a wrist-mounted, three-axis inertial sensor. Our algorithm successfully discovers motifs that correspond to the real exercises with a recall rate of 96.3% and overall accuracy of 86.7% over six exercises and 864 occurrences.

September 30th, 2006 Irfan Essa Posted in Computational Photography and Video, Mitch Parry, Papers, Research No Comments »
Experiences with optimizing two stream-based applications for cluster execution
Angelov, Y., Ramachandran, U., Mackenzie, K., Rehg, J. M., and Essa, I. 2005. “Experiences with optimizing two stream-based applications for cluster execution”. J. Parallel Distrib. Comput. 65, 6 (Jun. 2005), 678-691. [DOI]
Abstract
We explore optimization strategies and resulting performance of two stream-based video applications, video texture and color tracker, on a cluster of SMPs. The two applications are representative of a class of emerging applications, which we call “stream-based applications”, that are sensitive to both latency of individual results and overall throughput. Such applications require non-trivial parallelization techniques in order to improve both latency and throughput, given that the stream data emanates from a limited set of sources (exactly one in the two applications studied) and that the distribution of the data cannot be done a priori. We suggest techniques that address in a coordinated fashion the problems of data distribution and work partitioning. We believe the two problems are related and need to be addressed together. We have parallelized two applications using the Stampede cluster programming system that provides abstractions for implementing time- and throughput-sensitive applications elegantly and efficiently. For the Video Textures application we show that we can achieve a speedup of 24.26 on a 112 processor cluster. For the Color Tracker application, where latency is more crucial, we identify the extent of data parallelism that ensures that the slowest member of the pipeline is no longer the bottleneck for achieving a decent frame rate.
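For readers who want to put the reported speedup in context, the short sketch below computes parallel efficiency, the standard ratio of speedup to processor count. The metric and the function name `parallel_efficiency` are additions for illustration; the abstract itself reports only the speedup and the cluster size.

```python
def parallel_efficiency(speedup, processors):
    """Parallel efficiency: achieved speedup divided by processor count."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors

if __name__ == "__main__":
    # Figures quoted in the abstract: speedup 24.26 on 112 processors.
    eff = parallel_efficiency(24.26, 112)
    print(f"efficiency = {eff:.3f}")  # about 0.217, i.e. roughly 22%
```

An efficiency well below 1 is consistent with the abstract's point that the stream data emanates from a single source and cannot be distributed a priori.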

June 14th, 2006 Irfan Essa Posted in Aaron Bobick, Activity Recognition, Aware Home, Papers, Research, Yifan Shi No Comments »
Learning Temporal Sequence Model from Partially Labeled Data (IEEEXplore)
Yifan Shi; Bobick, A.; Essa, I.
Georgia Institute Of Technology, Atlanta
This paper appears in: Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on
Publication Date: 2006
Volume: 2
On page(s): 1631 - 1638
ISSN: 1063-6919
ISBN: 0-7695-2597-0
Digital Object Identifier: 10.1109/CVPR.2006.174
Posted online: 2006-10-09 11:11:21.0
Abstract
Graphical models are often used to represent and recognize activities. Purely unsupervised methods (such as HMMs) can be trained automatically but yield models whose internal structure - the nodes - is difficult to interpret semantically. Manually constructed networks typically have nodes corresponding to sub-events, but the programming and training of these networks is tedious and requires extensive domain expertise. In this paper, we propose a semi-supervised approach in which a manually structured Propagation Network (a form of a DBN) is initialized from a small amount of fully annotated data, and then refined by an EM-based learning method in an unsupervised fashion. During node refinement (the M step) a boosting-based algorithm is employed to train the evidence detectors of individual nodes. Experiments on a variety of data types - vision and inertial measurements - in several tasks demonstrate the ability to learn from as little as one fully annotated example accompanied by a small number of positive but non-annotated training examples. The system is applied to both recognition and anomaly detection tasks.

June 14th, 2006 Irfan Essa Posted in Greg Turk, Modeling and Animation, Papers, Research No Comments »
Element-Free Elastic Models for Volume Fitting and Capture (IEEEXplore)
Jaeil Choi; Szymczak, A.; Turk, G.; Essa, I.
Georgia Institute of Technology
This paper appears in: Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on
Publication Date: 2006
Volume: 2
On page(s): 2245 - 2252
ISSN: 1063-6919
ISBN: 0-7695-2597-0
Digital Object Identifier: 10.1109/CVPR.2006.110
Posted online: 2006-10-09 11:11:24.0
Abstract
We present a new method of fitting an element-free volumetric model to a sequence of deforming surfaces of a moving object. Given a sequence of visual hulls, we iteratively fit an element-free elastic model to the visual hull in order to extract the optimal pose of the captured volume. The fitting of the volumetric model is achieved by minimizing a combination of elastic potential energy, a surface distance measure, and a self-intersection penalty for each frame. A unique aspect of our work is that the model is mesh free - since the model is represented as a point cloud, it is easy to construct, manipulate and update the model as needed. Additionally, linear elasticity with rotation compensation makes it possible to handle local deformations and large rotations of body parts much more efficiently than other volume fitting approaches. Our experimental results for volume fitting and capture in a multi-view camera setting demonstrate the robustness of element-free elastic models against noise and self-occlusions.

May 14th, 2006 Irfan Essa Posted in Audio Analysis, Mitch Parry, Papers, Research No Comments »
Source Detection Using Repetitive Structure (IEEEXplore)
Parry, R.M.; Essa, I.
Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA
This paper appears in: Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
Publication Date: 14-19 May 2006
Volume: 4
On page(s): IV - IV
Number of Pages: IV - IV
Location: Toulouse
ISSN: 1520-6149
ISBN: 1-4244-0469-X
INSPEC Accession Number:9154520
Digital Object Identifier: 10.1109/ICASSP.2006.1661163
Posted online: 2006-09-18 09:38:57.0
Abstract
Blind source separation algorithms typically require that the number of sources is known in advance. However, it is often the case that the number of sources changes over time and that the total number is not known. Existing source separation techniques require source number estimation methods to determine how many sources are active within the mixture signals. These methods typically operate on the covariance matrix of mixture recordings and require fewer active sources than mixtures. When sources do not overlap in the time-frequency domain, more sources than mixtures may be detected and then separated. However, separating more sources than mixtures when sources overlap in time and frequency poses a particularly difficult problem. This paper addresses the issue of source detection when more sources than sensors overlap in time and frequency. We show that repetitive structure in the form of time-time correlation matrices can reveal when each source is active.
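A time-time matrix compares every analysis frame of a recording with every other frame, so repeated material shows up as high values away from the diagonal. The sketch below is only an illustration of that general idea, not the construction used in the paper: cosine similarity between per-frame feature vectors, the function name, and the toy frames are all assumptions.

```python
import math

def time_time_similarity(frames):
    """Return S where S[i][j] is the cosine similarity of frames i and j.

    `frames` is a list of equal-length feature vectors, one per time step.
    All-zero (silent) frames are given similarity 0 with every frame.
    """
    norms = [math.sqrt(sum(v * v for v in frame)) for frame in frames]
    n = len(frames)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(frames[i], frames[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim

if __name__ == "__main__":
    # Hypothetical two-band frames: a pattern that repeats every two steps.
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    for row in time_time_similarity(frames):
        print(["%.1f" % value for value in row])
```

In this toy input, frames 0 and 2 match exactly while frames 0 and 1 do not, so the repetition period can be read off the matrix.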

July 25th, 2005 Irfan Essa Posted in Aaron Bobick, Computational Photography and Video, Nipun Kwatra, Papers, Research, SIGGRAPH/SCA/NPAR/EG, Vivek Kwatra No Comments »
Vivek Kwatra, Irfan Essa, Aaron Bobick, and Nipun Kwatra (2005), “Texture optimization for example-based synthesis” In ACM Transactions on Graphics (TOG) Volume 24 , Issue 3 (July 2005) Proceedings of ACM SIGGRAPH 2005, Pages: 795 - 802, ISSN:0730-0301 (DOI|PDF|Project Site|Video|Talk)
ABSTRACT
We present a novel technique for texture synthesis using optimization. We define a Markov Random Field (MRF)-based similarity metric for measuring the quality of synthesized texture with respect to a given input sample. This allows us to formulate the synthesis problem as minimization of an energy function, which is optimized using an Expectation Maximization (EM)-like algorithm. In contrast to most example-based techniques that do region-growing, ours is a joint optimization approach that progressively refines the entire texture. Additionally, our approach is ideally suited to allow for controllable synthesis of textures. Specifically, we demonstrate controllability by animating image textures using flow fields. We allow for general two-dimensional flow fields that may dynamically change over time. Applications of this technique include dynamic texturing of fluid animations and texture-based flow visualization.
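The alternating structure described in the abstract can be illustrated with a toy one-dimensional analogue. This is a simplified sketch under stated assumptions, not the paper's algorithm: it works on a list of numbers instead of an image, uses a plain sum of squared differences as the energy, and the function name and parameters (`patch`, `step`, `iters`) are hypothetical.

```python
def synthesize_1d(exemplar, out_len, patch=4, step=2, iters=5):
    """Toy 1D analogue of optimization-based texture synthesis.

    Alternates two steps: (1) match each output neighborhood to its nearest
    exemplar patch, (2) set each output sample to the mean of the matched
    patches that overlap it, which minimizes the squared-difference energy
    for fixed matches. Returns the output and its final energy.
    """
    if not (0 < step <= patch <= min(len(exemplar), out_len)):
        raise ValueError("need 0 < step <= patch <= len(exemplar), out_len")

    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # All candidate patches from the input sample.
    cands = [exemplar[i:i + patch] for i in range(len(exemplar) - patch + 1)]
    # Output neighborhoods on a regular grid, plus one flush with the end.
    starts = list(range(0, out_len - patch + 1, step))
    if starts[-1] != out_len - patch:
        starts.append(out_len - patch)
    # Initialize the output by tiling the exemplar.
    x = [float(exemplar[i % len(exemplar)]) for i in range(out_len)]
    energy = 0.0
    for _ in range(iters):
        # Step 1: nearest exemplar patch for each output neighborhood.
        matches = [min(cands, key=lambda c: dist(c, x[s:s + patch]))
                   for s in starts]
        # Step 2: average the overlapping matched patches.
        acc = [0.0] * out_len
        cnt = [0] * out_len
        for s, m in zip(starts, matches):
            for k in range(patch):
                acc[s + k] += m[k]
                cnt[s + k] += 1
        x = [a / c for a, c in zip(acc, cnt)]
        energy = sum(dist(m, x[s:s + patch]) for s, m in zip(starts, matches))
    return x, energy

if __name__ == "__main__":
    out, e = synthesize_1d([0, 1, 3, 1, 0, 2, 3, 2], out_len=13)
    print([round(v, 2) for v in out], round(e, 3))
```

Because each step can only lower the energy or leave it unchanged, the whole output is refined progressively, in contrast to region-growing methods that fix samples one at a time.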

July 19th, 2005 Irfan Essa Posted in Computational Photography and Video, PhD, Thesis, Vivek Kwatra No Comments »
Vivek Kwatra (2005), “Example-based Rendering of Textural Phenomena” PhD Thesis, Georgia Institute of Technology, College of Computing (Advisors: Aaron Bobick, Irfan Essa) [URI], 19-Jul-2005
Abstract
This thesis explores synthesis by example as a paradigm for rendering real-world phenomena. In particular, phenomena that can be visually described as texture are considered. We exploit, for synthesis, the self-repeating nature of the visual elements constituting these texture exemplars. Techniques for unconstrained as well as constrained/controllable synthesis of both image and video textures are presented. For unconstrained synthesis, we present two robust techniques that can perform spatio-temporal extension, editing, and merging of image as well as video textures. In one of these techniques, large patches of input texture are automatically aligned and seamlessly stitched with each other to generate realistic looking images and videos. The second technique is based on iterative optimization of a global energy function that measures the quality of the synthesized texture with respect to the given input exemplar. We also present a technique for controllable texture synthesis. In particular, it allows for generation of motion-controlled texture animations that follow a specified flow field. Animations synthesized in this fashion maintain the structural properties like local shape, size, and orientation of the input texture even as they move according to the specified flow. We cast this problem into an optimization framework that tries to simultaneously satisfy the two (potentially competing) objectives of similarity to the input texture and consistency with the flow field. This optimization is a simple extension of the approach used for unconstrained texture synthesis. A general framework for example-based synthesis and rendering is also presented. This framework provides a design space for constructing example-based rendering algorithms. The goal of such algorithms would be to use texture exemplars to render animations for which certain behavioral characteristics need to be controlled.
Our motion-controlled texture synthesis technique is an instantiation of this framework where the characteristic being controlled is motion represented as a flow field.

June 20th, 2005 Irfan Essa Posted in Activity Recognition, Aware Home, PAMI/ICCV/CVPR/ECCV, Papers, Yan Huang No Comments »
Tracking multiple objects through occlusions (IEEEXplore)
Huang, Y.; Essa, I.
Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA, USA
This paper appears in: Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on
Publication Date: 20-25 June 2005
Volume: 2
On page(s): 1051 - 1058 vol. 2
Number of Pages: 2 vol. (xxxvii 1216)
ISSN: 1063-6919
ISBN: 0-7695-2372-2
INSPEC Accession Number: 8633324
Digital Object Identifier: 10.1109/CVPR.2005.350
Posted online: 2005-07-25 08:18:55.0
Abstract
We present an approach for tracking a varying number of objects through both temporally and spatially significant occlusions. Our method builds on the idea of object permanence to reason about occlusions. To this end, tracking is performed at both the region level and the object level. At the region level, a customized genetic algorithm is used to search for optimal region tracks. This limits the scope of object trajectories. At the object level, each object is located based on adaptive appearance models, spatial distributions and inter-occlusion relationships. The proposed architecture is capable of tracking objects even in the presence of long periods of full occlusion. We demonstrate the viability of this approach by experimenting on several videos of a user interacting with a variety of objects on a desktop.

December 9th, 2004 Irfan Essa Posted in Computational Photography and Video, Drew Steedly, PhD, Thesis No Comments »
Drew Steedly (2004) “Rigid Partitioning Techniques for Efficiently Generating 3D Reconstructions from Images” PhD Thesis, Georgia Institute of Technology, College of Computing. (Advisor: Irfan Essa) [PDF] [URI]
Abstract

This thesis explores efficient techniques for generating 3D reconstructions from imagery. Non-linear optimization is one of the core techniques used when computing a reconstruction and is a computational bottleneck for large sets of images. Since non-linear optimization requires a good initialization to avoid getting stuck in local minima, robust systems for generating reconstructions from images build up the reconstruction incrementally. A hierarchical approach is to split up the images into small subsets, reconstruct each subset independently and then hierarchically merge the subsets. Rigidly locking together portions of the reconstructions reduces the number of parameters needed to represent them when merging, thereby lowering the computational cost of the optimization. We present two techniques that involve optimizing with parts of the reconstruction rigidly locked together. In the first, we start by rigidly grouping the cameras and scene features from each of the reconstructions being merged into separate groups. Cameras and scene features are then incrementally unlocked and optimized until the reconstruction is close to the minimum energy. This technique is most effective when the influence of the new measurements is restricted to a small set of parameters. Measurements that stitch together weakly coupled portions of the reconstruction, though, tend to cause deformations in the low error modes of the reconstruction and cannot be efficiently incorporated with the previous technique. To address this, we present a spectral technique for clustering the tightly coupled portions of a reconstruction into rigid groups. Reconstructions partitioned in this manner can closely mimic the poorly conditioned, low error modes, and therefore efficiently incorporate measurements that stitch together weakly coupled portions of the reconstruction. We explain how this technique can be used to scalably and efficiently generate reconstructions from large sets of images.
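The idea of clustering tightly coupled portions into groups can be illustrated with a generic spectral bisection. This is a minimal sketch, not the algorithm from the thesis: it assumes the coupling between cameras and scene features has already been summarized as a symmetric weight matrix (the `coupling` values below are made up), and it splits the nodes by the sign of the Fiedler vector of the graph Laplacian.

```python
import numpy as np


def spectral_bipartition(weights):
    """Split nodes into two groups using the sign of the Fiedler vector.

    `weights` is a symmetric, non-negative matrix of pairwise coupling
    strengths (an assumed input; the thesis derives coupling differently).
    """
    w = np.asarray(weights, dtype=float)
    degree = np.diag(w.sum(axis=1))
    laplacian = degree - w
    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so column 1 is the eigenvector of the second-smallest eigenvalue.
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]
    return fiedler >= 0


# Hypothetical example: two tightly coupled triples joined by one weak link.
coupling = np.array([
    [0.0, 5.0, 5.0, 0.1, 0.0, 0.0],
    [5.0, 0.0, 5.0, 0.0, 0.0, 0.0],
    [5.0, 5.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 5.0, 5.0],
    [0.0, 0.0, 0.0, 5.0, 0.0, 5.0],
    [0.0, 0.0, 0.0, 5.0, 5.0, 0.0],
])
groups = spectral_bipartition(coupling)
print(groups)
```

Nodes that land in the same group would be candidates for being locked together rigidly; which group is labeled True depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful.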

June 7th, 2004 Irfan Essa Posted in Computational Photography and Video, James Hays, Non-Photorealism, Papers, SIGGRAPH/SCA/NPAR/EG No Comments »
James Hays and Irfan Essa (2004) “Image and video based painterly animation” In Proceedings of the 3rd International Symposium on Non-Photorealistic Animation and Rendering (NPAR 2004), Annecy, France, June 7-9, 2004, pages 113-120, ISBN: 1-58113-887-3 (DOI|PDF|Project Web Site).
ABSTRACT
We present techniques for transforming images and videos into painterly animations depicting different artistic styles. Our techniques rely on image and video analysis to compute appearance and motion properties. We also determine and apply motion information from different (user-specified) sources to static and moving images. These properties that encode spatio-temporal variations are then used to render (or paint) effects of selected styles to generate images and videos with a painted look. Painterly animations are generated using a mesh of brush stroke objects with dynamic spatio-temporal properties. Styles govern the behavior of these brush strokes as well as their rendering to a virtual canvas. We present methods for modifying the properties of these brush strokes according to the input images, videos, or motions. Brush stroke color, length, orientation, opacity, and motion are determined and the brush strokes are regenerated to fill the canvas as the video changes. All brush stroke properties are temporally constrained to guarantee temporally coherent non-photorealistic animations.
