Paper: MICCAI (2007) “A Boosted Segmentation Method for Surgical Workflow Analysis”

November 1st, 2007 Irfan Essa Posted in Activity Recognition, Health Systems, Papers, Research No Comments »

N. Padoy, T. Blum, I. Essa, H. Feußner, M.O. Berger, N. Navab. “A Boosted Segmentation Method for Surgical Workflow Analysis”. In Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI 2007) (to appear), Brisbane, Australia, Oct. 29 - Nov. 2, 2007. (bib)

Abstract

As demands on hospital efficiency increase, there is a stronger need for automatic analysis, recovery, and modification of surgical workflows. Even though most of the previous work has dealt with higher level and hospital-wide workflow including issues like document management, workflow is also an important issue within the surgery room. Its study has a high potential, e.g., for building context-sensitive operating rooms, evaluating and training surgical staff, optimizing surgeries and generating automatic reports. In this paper we propose an approach to segment the surgical workflow into phases based on temporal synchronization of multidimensional state vectors. Our method is evaluated on the example of laparoscopic cholecystectomy with state vectors representing tool usage during the surgeries. The discriminative power of each instrument in regard to each phase is estimated using AdaBoost. A boosted version of the Dynamic Time Warping (DTW) algorithm is used to create a surgical reference model and to segment a newly observed surgery. Full cross-validation on ten surgeries is performed and the method is compared to standard DTW and to Hidden Markov Models.
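To make the alignment step in the abstract concrete, here is a minimal sketch of Dynamic Time Warping over tool-usage state vectors. It is not the authors' implementation: the per-dimension `weights` argument is only an assumed stand-in for instrument weights that something like AdaBoost might supply, and the function names, the L1 distance, and the label-transfer helper are all hypothetical choices made for illustration. With uniform weights it reduces to standard DTW.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def weighted_distance(a: Vector, b: Vector, weights: Sequence[float]) -> float:
    """Weighted L1 distance between two state vectors of equal length."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


def dtw_align(
    reference: List[Vector],
    observed: List[Vector],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the DTW cost and the warping path as (reference, observed) index pairs."""
    if not reference or not observed:
        raise ValueError("both sequences must be non-empty")
    if weights is None:
        # Assumption: uniform weights, which gives standard (unboosted) DTW.
        weights = [1.0] * len(reference[0])

    n, m = len(reference), len(observed)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of reference[:i] with observed[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = weighted_distance(reference[i - 1], observed[j - 1], weights)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end of both sequences to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def transfer_labels(path, reference_labels, observed_length):
    """Give each observed time step the phase label of a reference step aligned to it."""
    labels = [None] * observed_length
    for ref_idx, obs_idx in path:
        labels[obs_idx] = reference_labels[ref_idx]
    return labels
```

Each vector holds one entry per instrument (1 if in use, 0 otherwise). Once a labeled reference surgery is aligned to a new one, `transfer_labels` carries the phase labels across the warping path, which is one simple way to read "segment a newly observed surgery".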

Paper: Ergonomics in Design (2007), “Designing a Technology Coach”

October 29th, 2007 Irfan Essa Posted in A. Dan Fisk, Activity Recognition, Aware Home, Papers, Wendy Rogers No Comments »

FEATURE AT A GLANCE: Technology in the home environment has the potential to support older adults in a variety of ways. We took an interdisciplinary approach (human factors/ergonomics and computer science) to develop a technology “coach” that could support older adults in learning to use a medical device. Our system used computer vision to track the use of a blood glucose meter and gave users feedback when they made an error. This research could support the development of an in-home personal assistant to coach individuals in a variety of tasks necessary for independent living.

KEYWORDS: home technology, medical devices, support for learning

Paper: IEEE Data Mining Conference 2007 “Detecting Subdimensional Motifs: An Efficient Algorithm for Generalized Multivariate Pattern Discovery”

October 28th, 2007 Irfan Essa Posted in Activity Recognition, Charles Isbell, David Minnen, Papers, Research, Thad Starner No Comments »

D. Minnen, I. Essa, C.L. Isbell, and T. Starner “Detecting Subdimensional Motifs: An Efficient Algorithm for Generalized Multivariate Pattern Discovery” In IEEE Int. Conf. on Data Mining (ICDM) 2007, Omaha, NE, October 28-31, 2007. [PDF]

Abstract

Discovering recurring patterns in time series data is a fundamental problem for temporal data mining. This paper addresses the problem of locating subdimensional motifs in real-valued, multivariate time series, which requires the simultaneous discovery of sets of recurring patterns along with the corresponding relevant dimensions. While many approaches to motif discovery have been developed, most are restricted to categorical data, univariate time series, or multivariate data in which the temporal patterns span all of the dimensions. In this paper, we present an expected linear-time algorithm that addresses a generalization of multivariate pattern discovery in which each motif may span only a subset of the dimensions. To validate our algorithm, we discuss its theoretical properties and empirically evaluate it using several data sets including synthetic data and motion capture data collected by an on-body inertial sensor.

Paper: ICCV 2007, “Structure from Statistics - Unsupervised Activity Analysis using Suffix Trees”

October 15th, 2007 Irfan Essa Posted in Aaron Bobick, Activity Recognition, Aware Home, PAMI/ICCV/CVPR/ECCV, Papers, Raffay Hamid No Comments »

Abstract

Models of activity structure for unconstrained environments are generally not available a priori. Recent representational approaches to this end are limited by their computational complexity, and ability to capture activity structure only up to some fixed temporal scale. In this work, we propose Suffix Trees as an activity representation to efficiently extract structure of activities by analyzing their constituent event-subsequences over multiple temporal scales. We empirically compare Suffix Trees with some of the previous approaches in terms of feature cardinality, discriminative prowess, noise sensitivity and activity-class discovery. Finally, exploiting properties of Suffix Trees, we present a novel perspective on anomalous subsequences of activities, and propose an algorithm to detect them in linear-time. We present comparative results over experimental data, collected from a kitchen environment to demonstrate the competence of our proposed framework.
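As a rough illustration of analyzing event-subsequences over multiple temporal scales, the sketch below counts every contiguous subsequence up to a maximum length and flags subsequences that a reference set of counts has never seen. It deliberately swaps the Suffix Tree for a plain dictionary of counts, so it does not have the linear-time behavior described in the abstract. The function names, the `min_count` threshold, and the kitchen-style event names used in the usage note are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def subsequence_counts(events: Sequence[Hashable], max_length: int) -> Counter:
    """Count every contiguous event subsequence of length 1..max_length."""
    counts: Counter = Counter()
    for length in range(1, max_length + 1):
        for start in range(len(events) - length + 1):
            counts[tuple(events[start:start + length])] += 1
    return counts


def rare_subsequences(
    events: Sequence[Hashable],
    reference_counts: Counter,
    length: int,
    min_count: int = 1,
) -> List[Tuple[int, tuple]]:
    """Return (start index, subsequence) pairs seen fewer than min_count times in the reference."""
    flagged = []
    for start in range(len(events) - length + 1):
        sub = tuple(events[start:start + length])
        if reference_counts[sub] < min_count:
            flagged.append((start, sub))
    return flagged
```

For example, counting over `["open", "pour", "stir", "open", "pour"]` with `max_length=3` records the pair `("open", "pour")` twice, and a new sequence `["open", "stir", "pour"]` would have both of its length-2 subsequences flagged because neither occurs in the reference counts.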

Paper: AAAI 2007: “Discovering Multivariate Motifs using Subsequence Density Estimation and Greedy Mixture Learning”

August 24th, 2007 Irfan Essa Posted in Activity Recognition, Charles Isbell, David Minnen, Papers, Research, Thad Starner No Comments »

Discovering Multivariate Motifs using Subsequence Density Estimation and Greedy Mixture Learning

Abstract

The problem of locating motifs in real-valued, multivariate time series data involves the discovery of sets of recurring patterns embedded in the time series. Each set is composed of several non-overlapping subsequences and constitutes a motif because all of the included subsequences are similar. The ability to automatically discover such motifs allows intelligent systems to form endogenously meaningful representations of their environment through unsupervised sensor analysis. In this paper, we formulate a unifying view of motif discovery as a problem of locating regions of high density in the space of all time series subsequences. Our approach is efficient (sub-quadratic in the length of the data), requires fewer user-specified parameters than previous methods, and naturally allows variable length motif occurrences and nonlinear temporal warping. We evaluate the performance of our approach using four data sets from different domains including on-body inertial sensors and speech.
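The idea of treating motifs as high-density regions in the space of subsequences can be sketched with a brute-force nearest-neighbor search. This is only an illustration under strong assumptions: it handles a univariate series with fixed-length windows, uses Euclidean distance, and is quadratic in the length of the data, whereas the abstract describes a sub-quadratic method for multivariate data with variable-length occurrences and temporal warping. All function names here are hypothetical.

```python
import math
from typing import List, Sequence


def windows(series: Sequence[float], length: int) -> List[Sequence[float]]:
    """All contiguous subsequences of the given length."""
    return [series[i:i + length] for i in range(len(series) - length + 1)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_distances(series: Sequence[float], length: int) -> List[float]:
    """Distance from each window to its closest non-overlapping window.

    A small distance means the window sits in a dense region of subsequence
    space, which makes it a candidate motif occurrence.
    """
    subs = windows(series, length)
    result = []
    for i, a in enumerate(subs):
        best = math.inf
        for j, b in enumerate(subs):
            if abs(i - j) >= length:  # skip overlapping (trivial) matches
                best = min(best, euclidean(a, b))
        result.append(best)
    return result


def densest_window(series: Sequence[float], length: int) -> int:
    """Start index of the first window with the smallest nearest-neighbor distance."""
    dists = nearest_neighbor_distances(series, length)
    return min(range(len(dists)), key=dists.__getitem__)
```

Excluding overlapping windows matters: without that check, every window's nearest neighbor would be the one shifted by a single sample, and the density estimate would say nothing about recurrence.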

Paper: ACM IWVSSN (2006) “Unsupervised Analysis of Activity Sequences Using Event Motifs”

October 23rd, 2006 Irfan Essa Posted in AAAI/IJCAI/UAI, Aaron Bobick, Activity Recognition, Aware Home, Papers, Raffay Hamid, Siddhartha Maddi No Comments »

  • R. Hamid, S. Maddi, A. Bobick, I. Essa. “Unsupervised Analysis of Activity Sequences Using Event Motifs”, In Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks (in conjunction with ACM Multimedia 2006).

Abstract

We present an unsupervised framework to discover characterizations of everyday human activities, and demonstrate how such representations can be used to extract points of interest in event-streams. We begin with the usage of Suffix Trees as an efficient activity-representation to analyze the global structural information of activities, using their local event statistics over the entire continuum of their temporal resolution. Exploiting this representation, we discover characterizing event-subsequences and present their usage in an ensemble-based framework for activity classification. Finally, we propose a method to automatically detect subsequences of events that are locally atypical in a structural sense. Results over extensive data-sets, collected from multiple sensor-rich environments are presented, to show the competence and scalability of the proposed framework.

Paper: IEEE ISWC (2006) “Discovering Characteristic Actions from On-Body Sensor Data”

October 14th, 2006 Irfan Essa Posted in Activity Recognition, Charles Isbell, David Minnen, Papers, Research, Thad Starner No Comments »

Discovering Characteristic Actions from On-Body Sensor Data (IEEEXplore)

Minnen, D.; Starner, T.; Essa, I.; Isbell, C.
College of Computing, Georgia Institute of Technology, Atlanta, GA 30332 USA. dminn@cc.gatech.edu
This paper appears in: Wearable Computers, 2006 10th IEEE International Symposium on
Publication Date: Oct. 2006
On page(s): 11 - 18
Location: Montreux, Switzerland
ISSN: 1550-4816
ISBN: 1-4244-0598-x
Digital Object Identifier: 10.1109/ISWC.2006.286337
Posted online: 2007-01-22 09:58:15.0

Abstract

We present an approach to activity discovery, the unsupervised identification and modeling of human actions embedded in a larger sensor stream. Activity discovery can be seen as the inverse of the activity recognition problem. Rather than learn models from hand-labeled sequences, we attempt to discover motifs, sets of similar subsequences within the raw sensor stream, without the benefit of labels or manual segmentation. These motifs are statistically unlikely and thus typically correspond to important or characteristic actions within the activity. The problem of activity discovery differs from typical motif discovery, such as locating protein binding sites, because of the nature of time series data representing human activity. For example, in activity data, motifs will tend to be sparsely distributed, vary in length, and may only exhibit intra-motif similarity after appropriate time warping. In this paper, we motivate the activity discovery problem and present our approach for efficient discovery of meaningful actions from sensor data representing human activity. We empirically evaluate the approach on an exercise data set captured by a wrist-mounted, three-axis inertial sensor. Our algorithm successfully discovers motifs that correspond to the real exercises with a recall rate of 96.3% and overall accuracy of 86.7% over six exercises and 864 occurrences.

Paper: IEEE CVPR (2006) “Learning Temporal Sequence Model from Partially Labeled Data”

June 14th, 2006 Irfan Essa Posted in Aaron Bobick, Activity Recognition, Aware Home, Papers, Research, Yifan Shi No Comments »

Learning Temporal Sequence Model from Partially Labeled Data (IEEEXplore)

Yifan Shi; Bobick, A.; Essa, I.
Georgia Institute of Technology, Atlanta
This paper appears in: Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on
Publication Date: 2006
Volume: 2
On page(s): 1631 - 1638
ISSN: 1063-6919
ISBN: 0-7695-2597-0
Digital Object Identifier: 10.1109/CVPR.2006.174
Posted online: 2006-10-09 11:11:21.0

Abstract

Graphical models are often used to represent and recognize activities. Purely unsupervised methods (such as HMMs) can be trained automatically but yield models whose internal structure - the nodes - is difficult to interpret semantically. Manually constructed networks typically have nodes corresponding to sub-events, but the programming and training of these networks is tedious and requires extensive domain expertise. In this paper, we propose a semi-supervised approach in which a manually structured Propagation Network (a form of DBN) is initialized from a small amount of fully annotated data, and then refined by an EM-based learning method in an unsupervised fashion. During node refinement (the M step), a boosting-based algorithm is employed to train the evidence detectors of individual nodes. Experiments on a variety of data types - vision and inertial measurements - in several tasks demonstrate the ability to learn from as little as one fully annotated example accompanied by a small number of positive but non-annotated training examples. The system is applied to both recognition and anomaly detection tasks.

Paper: IEEE CVPR (2005) “Tracking multiple objects through occlusions”

June 20th, 2005 Irfan Essa Posted in Activity Recognition, Aware Home, PAMI/ICCV/CVPR/ECCV, Papers, Yan Huang No Comments »

Tracking multiple objects through occlusions (IEEEXplore)

Huang, Y.; Essa, I.
Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA, USA
This paper appears in: Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on
Publication Date: 20-25 June 2005
Volume: 2
On page(s): 1051 - 1058 vol. 2
Number of Pages: 2 vol. (xxxvii+1216)
ISSN: 1063-6919
ISBN: 0-7695-2372-2
INSPEC Accession Number: 8633324
Digital Object Identifier: 10.1109/CVPR.2005.350
Posted online: 2005-07-25 08:18:55.0

Abstract

We present an approach for tracking a varying number of objects through both temporally and spatially significant occlusions. Our method builds on the idea of object permanence to reason about occlusions. To this end, tracking is performed at both the region level and the object level. At the region level, a customized genetic algorithm is used to search for optimal region tracks. This limits the scope of object trajectories. At the object level, each object is located based on adaptive appearance models, spatial distributions and inter-occlusion relationships. The proposed architecture is capable of tracking objects even in the presence of long periods of full occlusions. We demonstrate the viability of this approach by experimenting on several videos of a user interacting with a variety of objects on a desktop.

Talk at USC’s IRIS (2004): “Temporal Reasoning from Video to Temporal Synthesis of Video”

October 30th, 2004 Irfan Essa Posted in Activity Recognition, Aware Home, Computational Photography and Video, Presentations No Comments »

Temporal Reasoning from Video to Temporal Synthesis of Video

Abstract

In this talk, I will present some ongoing work on extracting spatio-temporal cues from video for both synthesis of novel video sequences, and recognition of complex activities. I will start off with some of our earlier work on Video Textures, where repeating information is extracted to generate extended sequences of videos. I will then describe some of our extensions to this approach that allow for controlled generation of animations of video sprites. We have developed various learning and optimization techniques that allow for video-based animations of photo-realistic characters. Then I will describe our new approach for image and video synthesis that builds on optimal patch-based copying of samples. I will show how our method allows for iterative refinement and extends to synthesis of both images and video from very limited samples. In the next part of my talk, I will describe how a similar analysis of video can be used to recognize what a person is doing in a scene. Such an analysis of video, aimed at recognition, requires more contextual information about the environment. I will show how we leverage contextual information shared between actions and objects to recognize what is happening in complex environments. I will also show that by adding some form of grammar (we use Stochastic Context Free Grammar) we can recognize very complex, multi-tasked activities.

If time permits, I will describe (very briefly) the Aware Home project at Georgia Tech, which is one primary area of ongoing and future research for me and my group. Further information on my work with videos is available from my webpage at http://www.cc.gatech.edu/~irfan

Paper: IEEE CVPR (2004) “Propagation networks for recognition of partially ordered sequential action”

June 2nd, 2004 Irfan Essa Posted in Aaron Bobick, Activity Recognition, Aware Home, David Minnen, Papers, Yan Huang, Yifan Shi No Comments »

Propagation networks for recognition of partially ordered sequential action (IEEEXplore)

Yifan Shi; Yan Huang; Minnen, D.; Bobick, A.; Essa, I.
GVU Center, Georgia Inst. of Technol., Atlanta, GA, USA
This paper appears in: Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on
Publication Date: 27 June-2 July 2004
Volume: 2
On page(s): II-862 - II-869 Vol.2
Number of Pages: 2001
ISSN: 1063-6919
ISBN: 0-7695-2158-4
INSPEC Accession Number: 8161557
Digital Object Identifier: 10.1109/CVPR.2004.1315255
Posted online: 2004-07-19 11:09:30.0

Abstract

We present propagation networks (P-nets), a novel approach for representing and recognizing sequential activities that include parallel streams of action. We represent each activity using partially ordered intervals. Each interval is restricted by both temporal and logical constraints, including information about its duration and its temporal relationship with other intervals. P-nets associate one node with each temporal interval. Each node is triggered according to a probability density function that depends on the state of its parent nodes. Each node also has an associated observation function that characterizes supporting perceptual evidence. To facilitate real-time analysis, we introduce a particle filter framework to explore the conditional state space. We modify the original condensation algorithm to more efficiently sample a discrete state space (D-condensation). Experiments in the domain of blood glucose monitor calibration demonstrate both the representational power of P-nets and the effectiveness of the D-condensation algorithm.
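For readers unfamiliar with particle filtering over a discrete state space, here is a generic bootstrap particle filter step. It is not the D-condensation algorithm from the paper and does not model P-net intervals or their constraints. The transition table, the likelihood callback, and the state names in the usage note are all assumptions chosen for illustration.

```python
import random
from typing import Callable, Dict, List


def particle_filter_step(
    particles: List[str],
    transition: Dict[str, Dict[str, float]],
    likelihood: Callable[[str, str], float],
    observation: str,
    rng: random.Random,
) -> List[str]:
    """Advance a set of discrete-state particles by one time step."""
    # Predict: sample each particle's next state from its transition distribution.
    predicted = []
    for state in particles:
        next_states = list(transition[state])
        probs = [transition[state][s] for s in next_states]
        predicted.append(rng.choices(next_states, weights=probs)[0])

    # Weight: score each predicted state against the observation.
    weights = [likelihood(state, observation) for state in predicted]
    if sum(weights) == 0:
        # No particle explains the observation; keep the predictions unchanged.
        return predicted

    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))
```

With hypothetical states `"waiting"` and `"measuring"`, a transition table in which `"waiting"` moves to `"measuring"` half the time, and a likelihood that only accepts the observed state, one step on an observation of `"measuring"` concentrates the particles on that state. The fraction of particles in each state then serves as an approximate posterior.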

Thesis: Gabriel Brostow’s PhD (2004): “Novel Skeletal Representation for Articulated Creatures”

April 9th, 2004 Irfan Essa Posted in Activity Recognition, Gabriel Brostow, Modeling and Animation, Research, Thesis No Comments »

Gabriel Brostow (2004), “Novel Skeletal Representation for Articulated Creatures” PhD Thesis, Georgia Institute of Technology, College of Computing. (Advisor: Irfan Essa) [PDF] [URI]

Abstract

This research examines an approach for capturing 3D surface and structural data of moving articulated creatures. Given the task of non-invasively and automatically capturing such data, a methodology and the associated experiments are presented that apply to multi-view videos of the subject's motion. Our thesis states: A functional structure and the time-varying surface of an articulated creature subject are contained in a sequence of its 3D data. A functional structure is one example of the possible arrangements of internal mechanisms (kinematic joints, springs, etc.) that is capable of performing the motions observed in the input data. Volumetric structures are frequently used as shape descriptors for 3D data. The capture of such data is being facilitated by developments in multi-view video and range scanning, extending to subjects that are alive and moving.

In this research, we examine vision-based modeling and the related representation of moving articulated creatures using Spines. We define a Spine as a branching axial structure representing the shape and topology of a 3D object's limbs, and capturing the limbs' correspondence and motion over time. The Spine concept builds on skeletal representations often used to describe the internal structure of an articulated object and the significant protrusions. Our representation of a Spine provides for enhancements over a 3D skeleton. These enhancements form temporally consistent limb hierarchies that contain correspondence information about real motion data. We present a practical implementation that approximates a Spine's joint probability function to reconstruct Spines for synthetic and real subjects that move.

In general, our approach combines the objectives of generalized cylinders, 3D scanning, and markerless motion capture to generate baseline models from real puppets, animals, and human subjects.

Talk: Invited Speaker at CMU’s Robotics Institute (2002): “Temporal Reasoning from Video to Temporal Synthesis of Video”

February 12th, 2002 Irfan Essa Posted in Activity Recognition, Aware Home, Computational Photography and Video, Presentations No Comments »

Temporal Reasoning from Video to Temporal Synthesis of Video

Abstract

In this talk, I will present some ongoing work on extracting spatio-temporal cues from video for both synthesis of novel video sequences, and recognition of complex activities. First I will discuss (in brief) our work on Video Textures, where repeating information is extracted to generate extended sequences of videos. I will then describe some of our extensions to this approach that allow for controlled generation of animations of video sprites. We have developed various learning and optimization techniques that allow for video-based animations of photo-realistic characters. Then I will describe our new approach for image and video synthesis that builds on optimal patch-based copying of samples. I will show how our method allows for iterative refinement and extends to synthesis of both images and video from very limited samples. In the next part of my talk, I will describe how a similar analysis of video can be used to recognize what a person is doing in a scene. Such an analysis of video, aimed at recognition, requires more contextual information about the environment. I will show how we leverage contextual information shared between actions and objects to recognize what is happening in complex environments. I will also show that by adding some form of grammar (we use Stochastic Context Free Grammar) we can recognize very complex, multi-tasked activities. Finally, I will describe (very briefly) the Aware Home project at Georgia Tech, which is one primary area of ongoing and future research for me and my group. Further information on my work with videos is available from my webpage at http://www.cc.gatech.edu/~irfan

Funding: NSF (1998) Experimental Software Systems “Automated Understanding of Captured Experience”

September 1st, 1998 Irfan Essa Posted in Activity Recognition, Audio Analysis, Aware Home, Funding, Gregory Abowd, Intelligent Environments No Comments »

Award#9806822 - Experimental Software Systems: Automated Understanding of Captured Experience
ABSTRACT

Essa, Irfan A.; Abowd, Gregory D.
Georgia Institute of Technology

The objective of this research is to reduce substantially the human input necessary for creating and accessing large collections of multimedia, particularly multimedia created by capturing what is happening in an environment. The existing software system which is being used as the starting point for this investigation is Classroom 2000, a system designed to capture what happens in classrooms, meetings, and offices. Classroom 2000 integrates and synchronizes multiple streams of captured text, images, handwritten annotations, audio, and video. In a sense, it automates note-taking for a lecture or meeting. The research challenge is to make sense of this flood of captured data. The project explores how the output of Classroom 2000 can be automatically structured, segmented, indexed, and linked. Machine learning and statistical approaches to language are used to attempt to understand the captured data. Techniques from computational perception are used to try to find structure in the captured data. An important component of this research is the experimental analysis of the software system being built. The expectation is that this research will have a dramatic impact on how humans work and learn, as technology aids humans by capturing and making accessible what happens in an environment.

Paper: IEEE PAMI (1996) “Task-specific gesture analysis in real-time using interpolated views”

December 14th, 1996 Irfan Essa Posted in Activity Recognition, Face and Gesture, PAMI/ICCV/CVPR/ECCV, Papers, Research, Sandy Pentland No Comments »

Darrell, T.J.; Essa, I.A.; Pentland, A.P., “Task-specific gesture analysis in real-time using interpolated views”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 12, pp. 1236-1242, Dec. 1996
URL: [ieeexplore.ieee.org] [DOI]

Abstract

Hand and face gestures are modeled using an appearance-based approach in which patterns are represented as a vector of similarity scores to a set of view models defined in space and time. These view models are learned from examples using unsupervised clustering techniques. A supervised learning paradigm is then used to interpolate view scores into a task-dependent coordinate system appropriate for recognition and control tasks. We apply this analysis to the problem of context-specific gesture interpolation and recognition, and demonstrate real-time systems which perform these tasks.
