Paper in CVPR (2010): “Motion Field to Predict Play Evolution in Dynamic Sport Scenes”

June 13th, 2010 Irfan Essa Posted in Activity Recognition, Jessica Hodgins, Kihwan Kim, Matthias Grundmann, PAMI/ICCV/CVPR/ECCV, Papers, Sports Visualization No Comments »

Kihwan Kim, Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins, Irfan Essa (2010) “Motion Field to Predict Play Evolution in Dynamic Sport Scenes” in Proceedings of IEEE Computer Vision and Pattern Recognition Conference (CVPR), San Francisco, CA, USA, June 2010 [PDF][Website][DOI][Video (Youtube)].

Abstract

Videos of multi-player team sports provide a challenging domain for dynamic scene analysis. Player actions and interactions are complex as they are driven by many factors, such as the short-term goals of the individual player, the overall team strategy, the rules of the sport, and the current context of the game. We show that constrained multi-agent events can be analyzed and even predicted from video. Such analysis requires estimating the global movements of all players in the scene at any time, and is needed for modeling and predicting how the multi-agent play evolves over time on the field. To this end, we propose a novel approach to detect the locations of where the play evolution will proceed, e.g. where interesting events will occur, by tracking player positions and movements over time. We start by extracting the ground level sparse movement of players in each time-step, and then generate a dense motion field. Using this field we detect locations where the motion converges, implying positions towards which the play is evolving. We evaluate our approach by analyzing videos of a variety of complex soccer plays.
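The pipeline in the abstract (sparse player motion, then a dense motion field, then points where the motion converges) can be illustrated with a small sketch. This is not the paper's implementation: the Gaussian-weighted interpolation, the use of minimum divergence as the convergence test, and the names `dense_motion_field`, `convergence_point`, and `sigma` are all assumptions made for illustration.

```python
import numpy as np

def dense_motion_field(positions, velocities, grid_x, grid_y, sigma=5.0):
    """Spread sparse ground-plane velocities over a grid with Gaussian weights.

    Assumed interpolation scheme, chosen only for illustration.
    Returns an array of shape (len(grid_y), len(grid_x), 2).
    """
    gx, gy = np.meshgrid(grid_x, grid_y)
    field = np.zeros(gx.shape + (2,))
    weight_sum = np.zeros(gx.shape)
    for (px, py), v in zip(positions, velocities):
        w = np.exp(-((gx - px) ** 2 + (gy - py) ** 2) / (2.0 * sigma ** 2))
        field += w[..., None] * np.asarray(v, dtype=float)
        weight_sum += w
    return field / np.maximum(weight_sum, 1e-12)[..., None]

def convergence_point(field, grid_x, grid_y):
    """Return the grid location where the field's divergence is most negative."""
    du_dx = np.gradient(field[..., 0], grid_x, axis=1)
    dv_dy = np.gradient(field[..., 1], grid_y, axis=0)
    divergence = du_dx + dv_dy
    iy, ix = np.unravel_index(np.argmin(divergence), divergence.shape)
    return float(grid_x[ix]), float(grid_y[iy])

# Four hypothetical players at the corners of a square, all moving inward.
positions = [(10, 10), (30, 10), (10, 30), (30, 30)]
velocities = [(1, 1), (-1, 1), (1, -1), (-1, -1)]
grid = np.arange(0.0, 41.0)
field = dense_motion_field(positions, velocities, grid, grid)
print(convergence_point(field, grid, grid))
```

In this toy setup the most strongly converging location is the center of the square, which is where all four players are heading.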

CVPR 2010 Paper on Play Evolution


Paper in CVPR (2010): “Discontinuous Seam-Carving for Video Retargeting”

June 13th, 2010 Irfan Essa Posted in Computational Photography and Video, Matthias Grundmann, PAMI/ICCV/CVPR/ECCV, Papers, Vivek Kwatra No Comments »

Matthias Grundmann, Vivek Kwatra, Mei Han, Irfan Essa (2010) “Discontinuous Seam-Carving for Video Retargeting” in Proceedings of IEEE Computer Vision and Pattern Recognition Conference (CVPR), San Francisco, CA, USA, June 2010 [PDF][Website][DOI][Video (Youtube)].

Abstract

We introduce a new algorithm for video retargeting that uses discontinuous seam-carving in both space and time for resizing videos. Our algorithm relies on a novel appearance-based temporal coherence formulation that allows for frame-by-frame processing and results in temporally discontinuous seams, as opposed to geometrically smooth and continuous seams. This formulation optimizes the difference in appearance of the resultant retargeted frame to the optimal temporally coherent one, and allows for carving around fast moving salient regions.

Additionally, we generalize the idea of appearance-based coherence to the spatial domain by introducing piece-wise spatial seams. Our spatial coherence measure minimizes the change in gradients during retargeting, which preserves spatial detail better than minimization of color difference alone. We also show that per-frame saliency (gradient-based or feature-based) does not always produce desirable retargeting results and propose a novel automatically computed measure of spatio-temporal saliency. As needed, a user may also augment the saliency by interactive region-brushing. Our retargeting algorithm processes the video sequentially, making it conducive for streaming applications.
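For readers new to seam carving, here is a minimal sketch of the classic single-image baseline: a dynamic program that finds one connected, minimum-energy vertical seam and removes it. This is the continuous-seam formulation that the abstract contrasts itself with, not the discontinuous, temporally coherent method of the paper. The function names and the toy energy map are assumptions.

```python
import numpy as np

def min_vertical_seam(energy):
    """Return, per row, the column of the minimum-energy connected vertical seam."""
    h, w = energy.shape
    cost = energy.astype(float).copy()
    for r in range(1, h):
        # Cheapest of the three neighbours in the row above (borders padded with inf).
        left = np.r_[np.inf, cost[r - 1, :-1]]
        right = np.r_[cost[r - 1, 1:], np.inf]
        cost[r] += np.minimum(np.minimum(left, cost[r - 1]), right)
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for r in range(h - 2, -1, -1):
        # Backtrack: stay within one column of the seam pixel in the row below.
        c = seam[r + 1]
        lo, hi = max(c - 1, 0), min(c + 2, w)
        seam[r] = lo + int(np.argmin(cost[r, lo:hi]))
    return seam

def remove_seam(image, seam):
    """Delete one pixel per row, shrinking the image width by one."""
    h, w = image.shape[:2]
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return image[mask].reshape(h, w - 1, *image.shape[2:])

energy = np.array([[5, 1, 5],
                   [5, 5, 1],
                   [5, 1, 5]])
seam = min_vertical_seam(energy)
print(seam, remove_seam(energy, seam).shape)
```

The connectivity constraint in the backtracking step (each seam pixel lies within one column of its neighbour) is the restriction that a discontinuous formulation relaxes.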

Examples from our CVPR 2010 Paper


Paper in CVPR (2010): “Efficient Hierarchical Graph-Based Video Segmentation”

June 13th, 2010 Irfan Essa Posted in Computational Photography and Video, Matthias Grundmann, PAMI/ICCV/CVPR/ECCV, Vivek Kwatra No Comments »

Matthias Grundmann, Vivek Kwatra, Mei Han, Irfan Essa (2010) “Efficient Hierarchical Graph-Based Video Segmentation” in Proceedings of IEEE Computer Vision and Pattern Recognition Conference (CVPR), San Francisco, CA, USA, June 2010 [PDF][Website][DOI][Video (Youtube)].

Abstract

We present an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. We begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a “region graph” over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high quality segmentations, which are temporally coherent with stable region boundaries, and allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow to guide temporal connections in the initial graph.

We also propose two novel approaches to improve the scalability of our technique: (a) a parallel out-of-core algorithm that can process volumes much larger than an in-core algorithm, and (b) a clip-based processing algorithm that divides the video into overlapping clips in time, and segments them successively while enforcing consistency.

We demonstrate hierarchical segmentations on video shots as long as 40 seconds, and even support a streaming mode for arbitrarily long videos, albeit without the ability to process them hierarchically.
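The first step of the abstract, over-segmenting a volumetric video graph by appearance, can be sketched with a generic greedy graph merge on a 6-connected voxel grid. The merge rule below follows the widely used criterion from graph-based image segmentation (merge when the edge weight is no larger than each region's internal variation plus `k / size`); treating it as the rule used here is an assumption, and the sketch omits the region-graph hierarchy, optical flow, and out-of-core processing entirely. The names `segment_volume` and `k` are hypothetical.

```python
import numpy as np

def segment_volume(video, k=1.0):
    """Greedy graph-based over-segmentation of a (T, H, W) grayscale volume."""
    shape, n = video.shape, video.size
    flat = video.ravel().astype(float)
    idx = np.arange(n).reshape(shape)
    src, dst = [], []
    for axis in range(video.ndim):
        # Connect every voxel to its next neighbour along time, y and x.
        moved = np.moveaxis(idx, axis, 0)
        src.append(moved[:-1].ravel())
        dst.append(moved[1:].ravel())
    src, dst = np.concatenate(src), np.concatenate(dst)
    weight = np.abs(flat[src] - flat[dst])

    parent = list(range(n))
    size = [1] * n
    internal = [0.0] * n  # largest edge weight merged into each region so far

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for e in np.argsort(weight, kind="stable"):
        a, b = find(int(src[e])), find(int(dst[e]))
        if a == b:
            continue
        w = float(weight[e])
        if w <= min(internal[a] + k / size[a], internal[b] + k / size[b]):
            parent[b] = a
            size[a] += size[b]
            internal[a] = w
    return np.array([find(i) for i in range(n)]).reshape(shape)

# Toy volume: two frames whose left half is dark and right half is bright.
video = np.zeros((2, 2, 4))
video[..., 2:] = 10.0
labels = segment_volume(video)
print(len(np.unique(labels)))
```

Edges are processed in order of increasing weight, so uniform regions merge first and the strong dark-to-bright boundary is never crossed in this toy volume.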

VideoSegmentation Teaser


Paper in CVPR (2010): “Player Localization Using Multiple Static Cameras for Sports Visualization”

June 13th, 2010 Irfan Essa Posted in Activity Recognition, Jessica Hodgins, Kihwan Kim, Matthias Grundmann, Numerical Machine Learning, PAMI/ICCV/CVPR/ECCV, Raffay Hamid, Sports Visualization No Comments »

Raffay Hamid, Ram Krishan Kumar, Matthias Grundmann, Kihwan Kim, Irfan Essa, Jessica Hodgins (2010), “Player Localization Using Multiple Static Cameras for Sports Visualization” In Proceedings of IEEE Computer Vision and Pattern Recognition Conference (CVPR), San Francisco, CA, USA, June 2010 [PDF][Website][DOI][Video (Youtube)].

Abstract

We present a novel approach for robust localization of multiple people observed using multiple cameras. We use this location information to generate sports visualizations, which include displaying a virtual offside line in soccer games, and showing players’ positions and motion patterns. Our main contribution is the modeling and analysis for the problem of fusing corresponding players’ positional information as finding minimum weight K-length cycles in complete K-partite graphs. To this end, we use a dynamic programming based approach that varies over a continuum of being maximally to minimally greedy in terms of the number of paths explored at each iteration. We present an end-to-end sports visualization framework that employs our proposed algorithm-class. We demonstrate the robustness of our framework by testing it on 60,000 frames of soccer footage captured over 5 different illumination conditions, play types, and team attire.
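To make the graph formulation concrete, the sketch below treats each camera as one partition of a K-partite graph, each ground-plane detection as a node, and the distance between detections from different cameras as an edge weight. It then finds the minimum-weight K-length cycle by exhaustive search. The exhaustive search stands in for the dynamic programming approach described in the abstract and is only feasible for tiny inputs; the fixed camera order around the cycle, the Euclidean edge weight, and the name `best_cycle` are assumptions.

```python
from itertools import product
from math import dist

def best_cycle(detections_per_camera):
    """Pick one detection per camera so the closed tour through them is shortest.

    detections_per_camera: list of K lists of (x, y) ground-plane points.
    Returns (chosen_points, cycle_cost). Brute force, for illustration only.
    """
    best_choice, best_cost = None, float("inf")
    for choice in product(*detections_per_camera):
        k = len(choice)
        # Sum the edge weights around the cycle camera 0 -> 1 -> ... -> K-1 -> 0.
        cost = sum(dist(choice[i], choice[(i + 1) % k]) for i in range(k))
        if cost < best_cost:
            best_choice, best_cost = choice, cost
    return best_choice, best_cost

# Three hypothetical cameras, each seeing the same two players with small errors.
cameras = [
    [(0.0, 0.0), (10.0, 10.0)],
    [(0.1, 0.0), (10.0, 10.2)],
    [(0.0, 0.1), (9.9, 10.0)],
]
choice, cost = best_cycle(cameras)
print(choice, round(cost, 4))
```

A low-cost cycle means the K cameras agree on one location, so the chosen detections can be fused into a single player position.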

Teaser Image from CVPR 2010 paper


CVPR 2010: Accepted Papers

April 1st, 2010 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Jessica Hodgins, Kihwan Kim, Matthias Grundmann, PAMI/ICCV/CVPR/ECCV, Papers, Vivek Kwatra No Comments »

We have the following 4 papers accepted for publication in IEEE CVPR 2010. More details, with links, are forthcoming.
  • Matthias Grundmann, Vivek Kwatra, Mei Han, and Irfan Essa (2010) “Discontinuous Seam-Carving for Video Retargeting” (a GA Tech, Google Collaboration)
  • Matthias Grundmann, Vivek Kwatra, Mei Han, and Irfan Essa (2010) “Efficient Hierarchical Graph-Based Video Segmentation” (a GA Tech, Google Collaboration)
  • Kihwan Kim, Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins, and Irfan Essa (2010) “Motion Fields to Predict Play Evolution in Dynamic Sport Scenes” (a GA Tech, Disney Collaboration)
  • Raffay Hamid, Ramkrishan Kumar, Matthias Grundmann, Kihwan Kim, Irfan Essa, and Jessica Hodgins (2010) “Player Localization Using Multiple Static Cameras for Sports Visualization” (a GA Tech, Disney Collaboration)

Paper in Advanced Robotics (2009): “Human Action Recognition Using Global Point Feature Histograms and Action Shapes”

October 29th, 2009 Irfan Essa Posted in Activity Recognition, Franzi Meier, Intelligent Environments, Michael Beetz, Papers No Comments »

Radu Bogdan Rusu, Jan Bandouch, Franziska Meier, Irfan Essa and Michael Beetz (2009) “Human Action Recognition Using Global Point Feature Histograms and Action Shapes”, in Journal of Advanced Robotics, volume 23, pages 1873–1908, Koninklijke Brill NV, Leiden and The Robotics Society of Japan, 2009. [ DOI | PDF]

Abstract

This paper investigates the recognition of human actions from three-dimensional (3-D) point clouds that encode the motions of people acting in sensor-distributed indoor environments. Data streams are time sequences of silhouettes extracted from cameras in the environment. From the 2-D silhouette contours we generate space–time streams by continuously aligning and stacking the contours along the time axis as third spatial dimension. The space–time stream of an observation sequence is segmented into parts corresponding to subactions using a pattern matching technique based on suffix trees and interval scheduling. Then, the segmented space–time shapes are processed by treating the shapes as 3-D point clouds and estimating global point feature histograms for them. The resultant models are clustered using statistical analysis and our experimental results indicate that the presented methods robustly derive different action classes. This holds despite large intra-class variance in the recorded datasets due to performances from different persons at different time intervals.

© Koninklijke Brill NV, Leiden and The Robotics Society of Japan, 2009

Overview of the approach.


Keywords: Action recognition, point cloud, global features, action segmentation


Paper ISMAR 2009 (IEEE International Symposium on Mixed and Augmented Reality): “Augmenting Aerial Earth Maps with Dynamic Information”

October 20th, 2009 Irfan Essa Posted in Computational Journalism, Computational Photography and Video, Kihwan Kim, Modeling and Animation, Papers No Comments »

Kihwan Kim, Sangmin Oh, Jeonggyu Lee and Irfan Essa (2009), “Augmenting Aerial Earth Maps with Dynamic Information,” In Proceedings of IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Orlando, FL, USA, October 2009 [Project Site, Video (AVI/DiVX), Video (Youtube), Paper (pdf)].

Abstract

We introduce methods for augmenting aerial visualizations of Earth (from tools such as Google Earth or Microsoft Virtual Earth) with dynamic information obtained from videos. Our goal is to make Augmented Earth Maps that visualize the live broadcast of dynamic sceneries within a city. We propose different approaches to analyze videos of pedestrians and cars, under differing conditions and then augment Aerial Earth Maps (AEMs) with live and dynamic information. We also analyze natural phenomena (clouds) and project information from these to the AEMs to add to the visual reality.


In the News (2009): CNN.com “Augmenting Earth Maps”

October 13th, 2009 Irfan Essa Posted in In The News, Kihwan Kim No Comments »

Video – Breaking News Videos from CNN.com.

Check out the media coverage of our new paper to appear in ISMAR 2009, in October.

Also see

  • “Latest videos makes Google Earth cities bustle” New Scientist (Sep 30, 2009 Issue)
  • “Video: Google Earth animated with real time human and vehicular traffic” Engadget (Sep 30, 2009)

Paper in Artificial Intelligence (2009): “A novel sequence representation for unsupervised analysis of human activities”

September 20th, 2009 Irfan Essa Posted in AAAI/IJCAI/UAI, Aaron Bobick, Activity Recognition, Charles Isbell, Papers, Raffay Hamid, Siddhartha Maddi No Comments »

Raffay Hamid, Siddhartha Maddi, Amos Johnson, Aaron Bobick, Irfan Essa and Charles Isbell (2009) “A novel sequence representation for unsupervised analysis of human activities” in Artificial Intelligence, Volume 173, Issue 14, September 2009, Pages 1221-1244. [PDF][DOI][Science Direct]

Abstract

Formalizing computational models for everyday human activities remains an open challenge. Many previous approaches towards this end assume prior knowledge about the structure of activities, using which explicitly defined models are learned in a completely supervised manner. For a majority of everyday environments however, the structure of the in situ activities is generally not known a priori. In this paper we investigate knowledge representations and manipulation techniques that facilitate learning of human activities in a minimally supervised manner. The key contribution of this work is the idea that global structural information of human activities can be encoded using a subset of their local event subsequences, and that this encoding is sufficient for activity-class discovery and classification.

In particular, we investigate modeling activity sequences in terms of their constituent subsequences that we call event n-grams. Exploiting this representation, we propose a computational framework to automatically discover the various activity-classes taking place in an environment. We model these activity-classes as maximally similar activity-cliques in a completely connected graph of activities, and describe how to discover them efficiently. Moreover, we propose methods for finding characterizations of these discovered classes from a holistic as well as a by-parts perspective. Using such characterizations, we present a method to classify a new activity to one of the discovered activity-classes, and to automatically detect whether it is anomalous with respect to the general characteristics of its membership class. Our results show the efficacy of our approach in a variety of everyday environments.
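The event n-gram representation lends itself to a short illustration. The sketch below counts the contiguous length-n event subsequences of an activity and compares two activities by the overlap of their n-gram histograms. The overlap score used here (a weighted Jaccard ratio) is an assumption chosen for simplicity and is not the similarity measure defined in the paper; the function names and toy event labels are likewise hypothetical.

```python
from collections import Counter

def event_ngrams(events, n=3):
    """Histogram of contiguous length-n event subsequences of one activity."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))

def similarity(a, b, n=3):
    """Overlap of two n-gram histograms in [0, 1] (assumed weighted Jaccard score)."""
    ha, hb = event_ngrams(a, n), event_ngrams(b, n)
    shared = sum((ha & hb).values())  # per-n-gram minimum counts
    total = sum((ha | hb).values())   # per-n-gram maximum counts
    return shared / total if total else 1.0

# Two hypothetical activities written as sequences of event labels.
activity_1 = ["a", "b", "c", "a", "b"]
activity_2 = ["a", "b", "c", "b"]
print(event_ngrams(activity_1, n=2))
print(similarity(activity_1, activity_2, n=2))
```

With pairwise scores like this, activities become nodes of a fully connected weighted graph, which is the structure in which the abstract looks for maximally similar cliques.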

Keywords: Temporal reasoning; Scene analysis; Computer vision

Hamid et al AIJ Paper


Paper (2009) In IEEE Transactions on Visualization and CG “Fluid Simulation with Articulated Bodies”

June 10th, 2009 Irfan Essa Posted in Greg Turk, Modeling and Animation, Nipun Kwatra No Comments »

Nipun Kwatra, Chris Wojtan, Mark Carlson, Irfan A. Essa, Peter J. Mucha, Greg Turk (2009), “Fluid Simulation with Articulated Bodies”, IEEE Transactions on Visualization and Computer Graphics, 10 Jun. 2009. IEEE Computer Society Digital Library. IEEE Computer Society. [DOI | PDF (see copyright) | Video | Website]

Abstract

We present an algorithm for creating realistic animations of characters that are swimming through fluids. Our approach combines dynamic simulation with data-driven kinematic motions (motion capture data) to produce realistic animation in a fluid. The interaction of the articulated body with the fluid is performed by incorporating joint constraints with rigid animation and by extending a solid/fluid coupling method to handle articulated chains. Our solver takes as input the current state of the simulation and calculates the angular and linear accelerations of the connected bodies needed to match a particular motion sequence for the articulated body. These accelerations are used to estimate the forces and torques that are then applied to each joint. Based on this approach, we demonstrate simulated swimming results for a variety of different strokes, including crawl, backstroke, breaststroke and butterfly. The ability to have articulated bodies interact with fluids also allows us to generate simulations of simple water creatures that are driven by simple controllers.



Paper (2009) ACM CHI: “Videolyzer: Quality Analysis of Online Informational Video for Bloggers and Journalists”

March 4th, 2009 Irfan Essa Posted in ACM UIST/CHI, Computational Journalism, Computational Photography and Video, Nick Diakopoulos No Comments »

N. Diakopoulos, S. Goldenberg, I. Essa (2009). “Videolyzer: Quality Analysis of Online Informational Video for Bloggers and Journalists.” ACM Conference on Human Factors in Computing Systems (CHI). April, 2009. [PDF] [Project Site] [Video (CHI 2009 – Digital Life New World – CHI 2009 Advance Program)]

Abstract

Screen Shot of Videolyzer

Tools to aid people in making sense of the information quality of online informational video are essential for media consumers seeking to be well informed. Our application, Videolyzer, addresses the information quality problem in video by allowing politically motivated bloggers or journalists to analyze, collect, and share criticisms of the information quality of online political videos. Our interface innovates by providing a fine-grained and tightly coupled interaction paradigm between the timeline, the time-synced transcript, and annotations. We also incorporate automatic textual and video content analysis to suggest areas of interest for further assessment by a person. We present an evaluation of Videolyzer looking at the user experience, usefulness, and behavior around the novel features of the UI as well as report on the collaborative dynamic of the discourse generated with the tool.

Paper (2009) In ACM Symposium on Interactive 3D Graphics “Human Video Textures”

March 1st, 2009 Irfan Essa Posted in ACM SIGGRAPH, Computational Photography and Video, James Rehg, Matt Flagg, Modeling and Animation, Papers, Sing Bing Kang No Comments »

Matthew Flagg, Atsushi Nakazawa, Qiushuang Zhang, Sing Bing Kang, Young Kee Ryu, Irfan Essa, James M. Rehg (2009), “Human Video Textures” In Proceedings of the ACM Symposium on Interactive 3D Graphics and Games 2009 (I3D ’09), Boston, MA, February 27-March 1 (Fri-Sun), 2009 [PDF (see Copyright) | Video in DiVX | Website]

Abstract

This paper describes a data-driven approach for generating photorealistic animations of human motion. Each animation sequence follows a user-choreographed path and plays continuously by seamlessly transitioning between different segments of the captured data. To produce these animations, we capitalize on the complementary characteristics of motion capture data and video. We customize our capture system to record motion capture data that are synchronized with our video source. Candidate transition points in video clips are identified using a new similarity metric based on 3-D marker trajectories and their 2-D projections into video. Once the transitions have been identified, a video-based motion graph is constructed. We further exploit hybrid motion and video data to ensure that the transitions are seamless when generating animations. Motion capture marker projections serve as control points for segmentation of layers and nonrigid transformation of regions. This allows warping and blending to generate seamless in-between frames for animation. We show a series of choreographed animations of walks and martial arts scenes as validation of our approach.

Example Image from Project

Human Video Textures (Output Rendered as a Collage!)


Paper (2009): ICASSP “Learning Basic Units in American Sign Language using Discriminative Segmental Feature Selection”

February 4th, 2009 Irfan Essa Posted in Face and Gesture, Funding, ICASSP, James Rehg, NSF (0205507), Numerical Machine Learning, Pei Yin, Thad Starner No Comments »

Pei Yin, Thad Starner, Harley Hamilton, Irfan Essa, James M. Rehg (2009), “Learning Basic Units in American Sign Language using Discriminative Segmental Feature Selection” in IEEE Conference on Acoustics, Speech, and Signal Processing 2009 (ICASSP 2009). Session: Spoken Language Understanding I, Tuesday, April 21, 11:00 – 13:00, Taipei, Taiwan.

ABSTRACT

The natural language for most deaf signers in the United States is American Sign Language (ASL). ASL has internal structure like spoken languages, and ASL linguists have introduced several phonemic models. The study of ASL phonemes is not only interesting to linguists, but also useful for scalability in recognition by machines. Since machine perception is different from human perception, this paper learns the basic units for ASL directly from data. Compared with previous studies, our approach computes a set of data-driven units (fenemes) discriminatively from the results of segmental feature selection. The learning iterates the following two steps: first apply discriminative feature selection segmentally to the signs, and then tie the most similar temporal segments to re-train. Intuitively, the sign parts indistinguishable to machines are merged to form basic units, which we call ASL fenemes. Experiments on publicly available ASL recognition data show that the extracted data-driven fenemes are meaningful, and recognition using those fenemes achieves improved accuracy at reduced model complexity.


Paper: ICPR (2008) “3D Shape Context and Distance Transform for Action Recognition”

December 8th, 2008 Irfan Essa Posted in Activity Recognition, Aware Home, Face and Gesture, Franzi Meier, Matthias Grundmann, PAMI/ICCV/CVPR/ECCV, Papers 1 Comment »

M. Grundmann, F. Meier, and I. Essa (2008) “3D Shape Context and Distance Transform for Action Recognition”, In Proceedings of International Conference on Pattern Recognition (ICPR) 2008, Tampa, FL. [Project Page | DOI | PDF]

ABSTRACT

We propose the use of 3D (2D+time) Shape Context to recognize the spatial and temporal details inherent in human actions. We represent an action in a video sequence by a 3D point cloud extracted by sampling 2D silhouettes over time. A non-uniform sampling method is introduced that gives preference to fast moving body parts using a Euclidean 3D Distance Transform. Actions are then classified by matching the extracted point clouds. Our proposed approach is based on a global matching and does not require specific training to learn the model. We test the approach thoroughly on two publicly available datasets and compare to several state-of-the-art methods. The achieved classification accuracy is on par with or superior to the best results reported to date.


Disney Research, Pittsburgh

October 23rd, 2008 Irfan Essa Posted in Jessica Hodgins No Comments »

This academic year, I am spending some time working with the newly formed Disney Research, Pittsburgh (directed by Jessica Hodgins), located next to CMU. The press release announcing this lab is here (Carnegie Mellon SCS Press Release). I am also hanging out with folks at the CMU Robotics Institute and have started some new collaborations. So now, depending on when, you can find me either in Atlanta (at GA Tech) or in Pittsburgh (at Disney Lab or CMU) [OR on an airplane between Pittsburgh and Atlanta].


Paper: ACM Multimedia (2008) “Audio Puzzler: Piecing Together Time-Stamped Speech Transcripts with a Puzzle Game”

October 18th, 2008 Irfan Essa Posted in ACM MM, Computational Journalism, Multimedia, Nick Diakopoulos, Papers No Comments »

N. Diakopoulos, K. Luther, I. Essa (2008), “Audio Puzzler: Piecing Together Time-Stamped Speech Transcripts with a Puzzle Game.” In Proceedings of ACM International Conference on Multimedia 2008. Vancouver, BC, Canada [Project Link]

ABSTRACT

We have developed an audio-based casual puzzle game which produces a time-stamped transcription of spoken audio as a by-product of play. Our evaluation of the game indicates that it is both fun and challenging. The transcripts generated using the game are more accurate than those produced using a standard automatic transcription system and the time-stamps of words are within several hundred milliseconds of ground truth.


Research: Videolyzer (Online DEMO, try it out!)

October 15th, 2008 Irfan Essa Posted in Collaborators, Computational Journalism, Nick Diakopoulos, Projects No Comments »

An Online DEMO of Videolyzer, a project by my PhD Student, Nick Diakopoulos.

Videolyzer is a tool designed to help journalists and bloggers collect, organize, and present information about the quality (i.e. validity, reliability, etc.) of online videos. It makes it possible to evaluate and make sense of things like comments, claims, and sources as they relate to the video. Users can comment and annotate pieces of the video (called “anchors”) to provide a more fine-grained description of the information in the video. The interface also incorporates a tightly integrated transcript of what’s spoken in the video to make it easier to navigate the dense information there. Finally, Videolyzer allows for collaboration among many people. Users can build off of each other’s annotations and rate each other in a form of distributed vetting and peer-evaluation.


Paper: ISWC (2008) “Localization and 3D Reconstruction of Urban Scenes Using GPS”

September 28th, 2008 Irfan Essa Posted in ISWC, Kihwan Kim, Mobile Computing, Papers, Thad Starner No Comments »

Kihwan Kim, Jay Summet, Thad Starner, Daniel Ashbrook, Mrunal Kapade and Irfan Essa  (2008) “Localization and 3D Reconstruction of Urban Scenes Using GPS” In Proceedings of IEEE Symposium on Wearable Computing (ISWC) 2008 (To Appear). [PDF]

ABSTRACT


Using off-the-shelf Global Positioning System (GPS) units, we reconstruct buildings in 3D by exploiting the reduction in signal-to-noise ratio (SNR) that occurs when the buildings obstruct the line-of-sight between the moving units and the orbiting satellites. We measure the size and height of skyscrapers and automatically construct a density map representing the location of multiple buildings in an urban landscape. If deployed on a large scale, via a cellular service provider’s GPS-enabled mobile phones or GPS-tracked delivery vehicles, the system could provide an inexpensive means of continuously creating and updating 3D maps of urban environments.


Paper: Pragmatic Web (2008) “An Annotation Model for Making Sense of Information Quality in Online Videos”

September 28th, 2008 Irfan Essa Posted in Computational Journalism, Multimedia, Nick Diakopoulos, Papers No Comments »

N. Diakopoulos, I. Essa. (2008) “An Annotation Model for Making Sense of Information Quality in Online Videos.” Proceedings of the International Conference on the Pragmatic Web. 28–30 Sept. 2008, Uppsala, Sweden (To Appear)

ABSTRACT

Making sense of the information quality of online media including things such as the accuracy and validity of claims and the reliability of sources is essential for people to be well-informed. We are developing Videolyzer to address the challenge of information quality sense-making by allowing motivated individuals to analyze, collect, share, and respond to criticisms of the information quality of online political videos and their transcripts. In this paper specifically we present a model of how the annotation ontology and collaborative dynamics embedded in Videolyzer can enhance information quality.


Funding (2007): NSF “Web on Demand – Bridging the Gap Between Social Networks and Ad Hoc Networking”

September 1st, 2008 Irfan Essa Posted in Computational Journalism, Kishore Ramachandran, Mobile Computing No Comments »

Award#0834545 – CSR-DMSS, SM: Web on Demand – Bridging the Gap Between Social Networks and Ad Hoc Networking

Investigator(s): Umakishore Ramachandran, (Principal Investigator), Irfan Essa (Co-Principal Investigator)

Dates: September 1, 2008 – August 31, 2009 (Estimated)

Abstract

From the western world to the third world, the use of handheld devices (cellphones, PDAs) has proliferated. The world of users is becoming both wireless and mobile. Web 2.0 has ushered in an age wherein the web is viewed as a provider of services and not just a repository of documents and/or information. Despite this advance, the web remains just that, a single web with an inherent assumption that a powerful computing and communication infrastructure supports it. Couldn’t mobile wireless devices in close proximity form a web of their own? This is the vision behind this project, the Web on Demand (WoD). WoD aims at bridging the gap between social networks and ad hoc networking. In other words, it aims to rethink the system software stack all the way from application to networking that would allow the creation and management of social networks without any assumption of infrastructure support. The core of the research is to develop software technologies for mobile devices that would allow the dynamic creation of thematic ad hoc overlay networks, empowering (a) mobile people with similar interests (e.g., weather forecast), (b) friends and family (e.g., in a theme park), and (c) participants in mission critical applications (e.g., search and rescue) to stay connected. WoD complements the World Wide Web (WWW) and leverages it when it is available, such as exploiting the ambient computing infrastructure to enhance user experience, and managing the dynamic creation of User Generated Content (UGC) by mobile users. The vision behind this project is to democratize access to services that are currently offered through the WWW. In this sense, the results from this research can have far-reaching technological and societal consequences. Most importantly, the research will help breed a new class of computer scientists who are connected with societal causes in addition to advancing technology.


Teaching: CS 4480 DVFX, Fall 08 “viral edition”

August 19th, 2008 Irfan Essa Posted in DVFX, Frank Dellaert No Comments »

I am very pleased that my colleague (and friend) Professor Frank Dellaert has taken over the DVFX class that I have been teaching since 1999 (see site here). It is clear already that this new edition of the DVFX class will be even more exciting than the previous editions. Can’t wait to see the final videos. Check out the info on the class at CS 4480 DVFX, Fall 08.


Research: Audio Puzzler Alpha

August 7th, 2008 Irfan Essa Posted in Computational Journalism, Nick Diakopoulos No Comments »

Audio Puzzler Alpha (ONLINE DEMO)

By Nick Diakopoulos (My PhD Student)

Audio Puzzler is a new kind of puzzle game based on unauthored content found online. The audio for the puzzles is taken from popular or interesting video clips from different genres such as news, documentary, or television. The audio puzzler is the type of game that harnesses people’s play to also provide valuable data which enriches the content played with. This is in the same vein as the ESPGame, the Listen Game, and PhotoPlay, which are all games which gather data in the process of game play. But while the data collected by these other games is useful for machine learning, the data collected with audio puzzler is immediately valuable as a transcription of the speech in the video. A similar effort (but in a much grander domain) is the Fold It project which seeks to harness playtime to solve protein folding problems. Much more detailed information about the evaluation of the technology will be forthcoming in a paper to be published at ACM Multimedia in October.


Thesis Raffay Hamid PhD (2008): “A Computational Framework For Unsupervised Analysis of Everyday Human Activities”

June 18th, 2008 Irfan Essa Posted in Aaron Bobick, Activity Recognition, Numerical Machine Learning, PhD, Raffay Hamid No Comments »

M. Raffay Hamid PhD (2008), “A Computational Framework For Unsupervised Analysis of Everyday Human Activities”, PhD Thesis, Georgia Institute of Technology, College of Computing, Atlanta, GA. (Advisors: Aaron Bobick & Irfan Essa)

Abstract

In order to make computers proactive and assistive, we must enable them to perceive, learn, and predict what is happening in their surroundings. This presents us with the challenge of formalizing computational models of everyday human activities. For a majority of environments, the structure of the in situ activities is generally not known a priori. This thesis therefore investigates knowledge representations and manipulation techniques that can facilitate learning of such everyday human activities in a minimally supervised manner. 

A key step towards this end is finding appropriate representations for human activities. We posit that if we chose to describe activities as finite sequences of an appropriate set of events, then the global structure of these activities can be uniquely encoded using their local event sub-sequences. With this perspective at hand, we particularly investigate representations that characterize activities in terms of their fixed and variable length event subsequences. We comparatively analyze these representations in terms of their representational scope, feature cardinality and noise sensitivity.

Exploiting such representations, we propose a computational framework to discover the various activity-classes taking place in an environment. We model these activity-classes as maximally similar activity-cliques in a completely connected graph of activities, and describe how to discover them efficiently. Moreover, we propose methods for finding concise characterizations of these discovered activity-classes, both from a holistic as well as a by-parts perspective. Using such characterizations, we present an incremental method to classify a new activity instance to one of the discovered activity-classes, and to automatically detect if it is anomalous with respect to the general characteristics of its membership class. Our results show the efficacy of our framework in a variety of everyday environments.
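
As a rough illustration of describing activities by their fixed-length event subsequences, the sketch below counts event n-grams for each activity and compares two activities by how much n-gram mass they share. The event names, the helper functions, and the similarity measure are assumptions made for this example; they are not taken from the thesis.

```python
from collections import Counter


def ngram_histogram(events, n):
    """Count every contiguous length-n event subsequence in one activity."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))


def histogram_similarity(h1, h2):
    """Hypothetical similarity: shared n-gram mass over combined n-gram mass."""
    shared = sum((h1 & h2).values())   # element-wise minimum of counts
    total = sum((h1 | h2).values())    # element-wise maximum of counts
    return shared / total if total else 1.0


# Made-up kitchen activities, each a finite sequence of events.
activity_a = ["enter", "fridge", "stove", "sink", "exit"]
activity_b = ["enter", "fridge", "stove", "table", "exit"]
activity_c = ["enter", "sink", "exit"]

hist_a = ngram_histogram(activity_a, 2)
hist_b = ngram_histogram(activity_b, 2)
hist_c = ngram_histogram(activity_c, 2)

sim_ab = histogram_similarity(hist_a, hist_b)
sim_ac = histogram_similarity(hist_a, hist_c)
print(sim_ab, sim_ac)
```

Activities whose pairwise similarities are all high could then be grouped as a candidate activity-class; how the thesis actually scores and extracts such cliques is described in the thesis itself.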

Thesis David Minnen PhD (2008): “Unsupervised Discovery of Activity Primitives from Multivariate Sensor Data”

June 18th, 2008 Irfan Essa Posted in Activity Recognition, David Minnen, PhD, Thad Starner No Comments »

David Minnen PhD (2008): “Unsupervised Discovery of Activity Primitives from Multivariate Sensor Data”, Georgia Institute of Technology, College of Computing, Atlanta, GA. (Advisors: Thad Starner & Irfan Essa)

Abstract

This research addresses the problem of temporal pattern discovery in real-valued, multivariate sensor data. Several algorithms were developed, and subsequent evaluation demonstrates that they can efficiently and accurately discover unknown recurring patterns in time series data taken from many different domains. Different data representations and motif models were investigated in order to design an algorithm with an improved balance between run-time and detection accuracy. The different data representations are used to quickly filter large data sets in order to detect potential patterns that form the basis of a more detailed analysis. The representations include global discretization, which can be efficiently analyzed using a suffix tree, local discretization with a corresponding random projection algorithm for locating similar pairs of subsequences, and a density-based detection method that operates on the original, real-valued data. In addition, a new variation of the multivariate motif discovery problem is proposed in which each pattern may span only a subset of the input features. An algorithm that can efficiently discover such “subdimensional” patterns was developed and evaluated. The discovery algorithms are evaluated by measuring the detection accuracy of discovered patterns relative to a set of expected patterns for each data set. The data sets used for evaluation are drawn from a variety of domains including speech, on-body inertial sensors, music, American Sign Language video, and GPS tracks.

Paper: ICASSP (2008) “Discriminative Feature Selection for Hidden Markov Models using Segmental Boosting”

April 3rd, 2008 Irfan Essa Posted in Face and Gesture, Funding, James Rehg, NSF (0205507), Numerical Machine Learning, PAMI/ICCV/CVPR/ECCV, Papers, Pei Yin, Thad Starner No Comments »

Pei Yin, Irfan Essa, James Rehg, Thad Starner (2008) “Discriminative Feature Selection for Hidden Markov Models using Segmental Boosting”, ICASSP 2008 – March 30 – April 4, 2008 – Las Vegas, Nevada, U.S.A. (Paper: MLSP-P3.D8, Session: Pattern Recognition and Classification II, Time: Thursday, April 3, 15:30 – 17:30, Topic: Machine Learning for Signal Processing: Learning Theory and Modeling) (PDF|Project Site)

ABSTRACT

We address the feature selection problem for hidden Markov models (HMMs) in sequence classification. Temporal correlation in sequences often causes difficulty in applying feature selection techniques. Inspired by segmental k-means segmentation (SKS), we propose Segmentally Boosted HMMs (SBHMMs), where the state-optimized features are constructed in a segmental and discriminative manner. The contributions are twofold. First, we introduce a novel feature selection algorithm, where the temporal dynamics are decoupled from the static learning procedure by assuming that the sequential data are piecewise independent and identically distributed. Second, we show that the SBHMM consistently improves traditional HMM recognition in various domains. The reduction of error compared to traditional HMMs ranges from 17% to 70% in American Sign Language recognition, human gait identification, lip reading, and speech recognition.

Event: SIGGRAPH PC Meeting at GA Tech

March 30th, 2008 Irfan Essa Posted in Events, Greg Turk, SIGGRAPH/SCA/NPAR/EG No Comments »

The ACM SIGGRAPH 2008 Papers Committee Meeting was held at GA Tech in Atlanta, March 29-30, under the leadership of Greg Turk. Following is a picture of all of us at work, with our sigs, as a note of thanks for Greg.

20080330-at-17h07m27-mg-9450bw.jpg

Original Photo by myself, this version with sigs by Fredo Durand.

Event: Journalism 3G The Future of Technology in the Field

February 23rd, 2008 Irfan Essa Posted in Computational Journalism, Events, Nick Diakopoulos No Comments »

Journalism 3G: The Future of Technology in the Field (A Symposium on Computation and Journalism) was a huge success.

  • We had over 230 registered attendees. Thanks to all participants, panelists, and speakers.
  • Use our Social Network (http://cj.crowdvine.com/) to continue the conversation.
  • Join the FACEBOOK group (http://git.facebook.com/group.php?gid=18427444784)
  • Use the tag “CnJ” on all blog posts and photo/video posts on the web, so we can collect them
  • Videos of the event are now available here.

20080223_0351-0355-pano-200p.jpg

Event: Symposium on computation+journalism (Feb 22-23, 2008, Atlanta, GA)

February 15th, 2008 Irfan Essa Posted in Events, Nick Diakopoulos No Comments »

Working with Brad Stenger (Wired), Nick Diakopoulos (GA Tech), and Sergio Goldenberg (GA Tech), we are organizing a Symposium on computation+journalism to bring together computationalists, internet/media experts, and journalists for a series of panels, presentations, and discussions on how computing technologies are affecting (and changing) journalism practices. We have over 180 people registered and it promises to be a great first-of-its-kind event. This event is being hosted by the GVU Center at Georgia Tech.

Paper: Ergonomics in Design (2007), “Designing a Technology Coach”

October 29th, 2007 Irfan Essa Posted in A. Dan Fisk, Activity Recognition, Aware Home, Papers, Wendy Rogers No Comments »

FEATURE AT A GLANCE: Technology in the home environment has the potential to support older adults in a variety of ways. We took an interdisciplinary approach (human factors/ergonomics and computer science) to develop a technology “coach” that could support older adults in learning to use a medical device. Our system provided a computer vision system to track the use of a blood glucose meter and provide users with feedback if they made an error. This research could support the development of an in-home personal assistant to coach individuals in a variety of tasks necessary for independent living.

KEYWORDS: home technology, medical devices, support for learning

Paper: IEEE Data Mining Conference 2007 “Detecting Subdimensional Motifs: An Efficient Algorithm for Generalized Multivariate Pattern Discovery”

October 28th, 2007 Irfan Essa Posted in Activity Recognition, Charles Isbell, David Minnen, Papers, Research, Thad Starner No Comments »

D. Minnen, I. Essa, C.L. Isbell, and T. Starner “Detecting Subdimensional Motifs: An Efficient Algorithm for Generalized Multivariate Pattern Discovery” In IEEE Int. Conf. on Data Mining (ICDM) 2007, Omaha, NE, October 28-31, 2007. [PDF]

Abstract

Discovering recurring patterns in time series data is a fundamental problem for temporal data mining. This paper addresses the problem of locating subdimensional motifs in real-valued, multivariate time series, which requires the simultaneous discovery of sets of recurring patterns along with the corresponding relevant dimensions. While many approaches to motif discovery have been developed, most are restricted to categorical data, univariate time series, or multivariate data in which the temporal patterns span all of the dimensions. In this paper, we present an expected linear-time algorithm that addresses a generalization of multivariate pattern discovery in which each motif may span only a subset of the dimensions. To validate our algorithm, we discuss its theoretical properties and empirically evaluate it using several data sets including synthetic data and motion capture data collected by an on-body inertial sensor.

Poster: ACM UIST (2007) “NARC: The News Article Revision Comparator.”

October 26th, 2007 Irfan Essa Posted in ACM UIST/CHI, Computational Journalism, Nick Diakopoulos No Comments »

A. St. Clair, M. Fong, N. Diakopoulos, I. Essa. (2007) “NARC: The News Article Revision Comparator.” In Proceedings addendum of User Interface Software Technology (UIST). Newport, Rhode Island, October 2007 [Abstract] [Poster]

ABSTRACT

Currency of information in news consumption is an important facet of information quality which involves both the journalist providing updated information and the consumer being aware of updates and changes to the news stream. We are addressing information quality and currency in online news articles from the viewpoint of news consumption with the intent of reducing the consumption effort involved in getting the most up-to-date information on a breaking news story. The goal of this research is thus to develop a web-based user interface which (1) allows users to easily and quickly see updates to news articles online and (2) blends into existing consumption patterns by integrating into news websites. We have built NARC to address these issues by providing an integrated interface which allows users to quickly perceive changes to news articles using an inline text visualization.

Paper: ICCV 2007, “Structure from Statistics – Unsupervised Activity Analysis using Suffix Trees”

October 15th, 2007 Irfan Essa Posted in Aaron Bobick, Activity Recognition, Aware Home, PAMI/ICCV/CVPR/ECCV, Papers, Raffay Hamid No Comments »

Abstract

Models of activity structure for unconstrained environments are generally not available a priori. Recent representational approaches to this end are limited by their computational complexity, and ability to capture activity structure only up to some fixed temporal scale. In this work, we propose Suffix Trees as an activity representation to efficiently extract structure of activities by analyzing their constituent event-subsequences over multiple temporal scales. We empirically compare Suffix Trees with some of the previous approaches in terms of feature cardinality, discriminative prowess, noise sensitivity and activity-class discovery. Finally, exploiting properties of Suffix Trees, we present a novel perspective on anomalous subsequences of activities, and propose an algorithm to detect them in linear-time. We present comparative results over experimental data, collected from a kitchen environment to demonstrate the competence of our proposed framework.
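
To make the idea of counting event subsequences at multiple temporal scales concrete, here is a minimal sketch. It swaps in a naive suffix trie (quadratic to build) rather than the linear-time suffix tree the paper relies on, and the class name, method names, and toy event sequence are all assumptions for this example.

```python
class SuffixTrie:
    """Naive suffix trie over event symbols.

    Building it is quadratic in the sequence length; a true suffix tree
    can be built in linear time. This is only meant to show the queries.
    """

    def __init__(self, events):
        self.root = {}
        for start in range(len(events)):
            node = self.root
            for symbol in events[start:]:
                entry = node.setdefault(symbol, {"count": 0, "children": {}})
                entry["count"] += 1
                node = entry["children"]

    def count(self, pattern):
        """Return how often the contiguous pattern occurs in the sequence."""
        node, entry = self.root, None
        for symbol in pattern:
            entry = node.get(symbol)
            if entry is None:
                return 0
            node = entry["children"]
        return entry["count"] if entry else 0


# Toy event stream: each letter stands for one detected event.
trie = SuffixTrie(list("abcabcab"))
print(trie.count("ab"), trie.count("abc"), trie.count("ca"), trie.count("d"))
```

Because one structure answers count queries for subsequences of any length, the same index can be reused for short and long event patterns; a subsequence with an unusually low count relative to its sub-patterns is one plausible, simplified notion of an anomalous subsequence.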

Thesis: Mitch Parry PhD (2007), “Separation and Analysis of Multichannel Signals”

October 9th, 2007 Irfan Essa Posted in Audio Analysis, Funding, Mitch Parry, NSF (0205507), PhD, Thesis No Comments »

Mitch Parry (2007), “Separation and Analysis of Multichannel Signals”, PhD Thesis [PDF], Georgia Institute of Technology, College of Computing, Atlanta, GA. (Advisor: Irfan Essa)

Abstract

This thesis examines a large and growing class of digital signals that capture the combined effect of multiple underlying factors. In order to better understand these signals, we would like to separate and analyze the underlying factors independently. Although source separation applies to a wide variety of signals, this thesis focuses on separating individual instruments from a musical recording. In particular, we propose novel algorithms for separating instrument recordings given only their mixture. When the number of source signals does not exceed the number of mixture signals, we focus on a subclass of source separation algorithms based on joint diagonalization. Each approach leverages a different form of source structure. We introduce repetitive structure as an alternative that leverages unique repetition patterns in music and compare its performance against the other techniques.

When the number of source signals exceeds the number of mixtures (i.e., the underdetermined problem), we focus on spectrogram factorization techniques for source separation. We extend single-channel techniques to utilize the additional spatial information in multichannel recordings, and use phase information to improve the estimation of the underlying components.

Paper: ACM HyperText (2007) “The Evolution of Authorship in a Remix Society”

September 15th, 2007 Irfan Essa Posted in Computational Journalism, Nick Diakopoulos, Papers, Research No Comments »

N. Diakopoulos, K. Luther, Y. Medynskiy, I. Essa (2007) “The Evolution of Authorship in a Remix Society”, ACM Hypertext 2007 Conference, Manchester, UK, September 2007

Abstract

Authorship entails the constrained selection or generation of media and the organization and layout of that media in a larger structure. But authorship is more than just selection and organization; it is a complex construct incorporating concepts of originality, authority, intertextuality, and attribution. In this paper we explore these concepts and ask how they are changing in light of modes of collaborative authorship in remix culture. We present a qualitative case study of an online video remixing site, illustrating how the constraints of that environment are impacting authorial constructs. We discuss users’ self-conceptions as authors, and how values related to authorship are reflected to users through the interface and design of the site’s tools. We also present some implications for the design of online communities for collaborative media creation and remixing.

  • N. Diakopoulos, K. Luther, Y. Medynskiy, I. Essa. The Evolution of Authorship in a Remix Society. In Proceedings of Hypertext and Hypermedia. Manchester, UK, September 2007. [PDF]
  • N. Diakopoulos, K. Luther, Y. Medynskiy, I. Essa. Remixing Authorship: Reconfiguring the Author in Online Video Remix Culture. Georgia Tech, Technical Report. GIT-IC-07-05. 2007. [PDF]

Funding: NSF/SGER (2007) “Persistent, Adaptive, Collaborative Synthespians”

September 15th, 2007 Irfan Essa Posted in Charles Isbell, Numerical Machine Learning No Comments »

Award#0749181 – SGER Collaborative Research: Persistent, Adaptive, Collaborative Synthespians
ABSTRACT

This project explores the development of methodologies for populating worlds with persistent, adaptive, collaborative, believable synthetic actors, referred to as Synthespians. These methods are extensions of adaptive models of learning and planning to accommodate the complex, dynamic environments in massive multi-player online games. The intellectual merit includes the development and evaluation of: 1. A behavior development language, with discovery, machine learning, and adaptation of behaviors directly integrated into the language, allowing for the rapid development and deployment of Synthespians. 2. A framework for the actors to recognize and discover plans by observing and modeling the activities of the other agents. An expected outcome of this research is the ability to author complex virtual worlds with many participants that support intelligent and effective interaction between people and machines. Broader Impact: A scientific understanding of how we interact with each other and collaborate will benefit from our ability to simulate complex environments with dynamic and evolving individual and group behaviors. In this project, building and modeling such environments and behaviors is done within a gaming context. This work will in the long run affect and change the fields of education and entertainment. In addition, being able to model large collaborative and interactive scenarios will also help us understand and model large-scale social dynamics phenomena of interest to sociologists and economists.

Paper: AAAI 2007: “Discovering Multivariate Motifs using Subsequence Density Estimation and Greedy Mixture Learning”

August 24th, 2007 Irfan Essa Posted in Activity Recognition, Charles Isbell, David Minnen, Papers, Research, Thad Starner No Comments »

Discovering Multivariate Motifs using Subsequence Density Estimation and Greedy Mixture Learning

Abstract

The problem of locating motifs in real-valued, multivariate time series data involves the discovery of sets of recurring patterns embedded in the time series. Each set is composed of several non-overlapping subsequences and constitutes a motif because all of the included subsequences are similar. The ability to automatically discover such motifs allows intelligent systems to form endogenously meaningful representations of their environment through unsupervised sensor analysis. In this paper, we formulate a unifying view of motif discovery as a problem of locating regions of high density in the space of all time series subsequences. Our approach is efficient (sub-quadratic in the length of the data), requires fewer user-specified parameters than previous methods, and naturally allows variable length motif occurrences and nonlinear temporal warping. We evaluate the performance of our approach using four data sets from different domains including on-body inertial sensors and speech.

Paper: IEEE CVPR (2007) “Tree-based Classifiers for Bilayer Video Segmentation”

June 17th, 2007 Irfan Essa Posted in Antonio Crimisini, Computational Photography and Video, Funding, John Winn, NSF (0205507), Numerical Machine Learning, Papers, Pei Yin, Research No Comments »

Yin, Pei; Criminisi, Antonio; Winn, John; Essa, Irfan (2007), “Tree-based Classifiers for Bilayer Video Segmentation”, In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2007. CVPR ’07, 17-22 June 2007, page(s): 1 – 8, Location: Minneapolis, MN, USA, ISBN: 1-4244-1180-7, Digital Object Identifier: 10.1109/CVPR.2007.383008

Abstract

This paper presents an algorithm for the automatic segmentation of monocular videos into foreground and background layers. Correct segmentations are produced even in the presence of large background motion with nearly stationary foreground. There are three key contributions. The first is the introduction of a novel motion representation, “motons”, inspired by research in object recognition. Second, we propose learning the segmentation likelihood from the spatial context of motion. The learning is efficiently performed by Random Forests. The third contribution is a general taxonomy of tree-based classifiers, which facilitates theoretical and experimental comparisons of several known classification algorithms, as well as spawning new ones. Diverse visual cues such as motion, motion context, colour, contrast and spatial priors are fused together by means of a Conditional Random Field (CRF) model. Segmentation is then achieved by binary min-cut. Our algorithm requires no initialization. Experiments on many video-chat type sequences demonstrate the effectiveness of our algorithm in a variety of scenes. The segmentation results are comparable to those obtained by stereo systems.

Paper: IEEE ICASSP (2007) “Incorporating Phase Information for Source Separation via Spectrogram Factorization”

April 15th, 2007 Irfan Essa Posted in Audio Analysis, Funding, Mitch Parry, NSF (0205507), Papers, Research No Comments »

Parry, R.M. and Essa, I. (2007) “Incorporating Phase Information for Source Separation via Spectrogram Factorization.” In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. 15-20 April 2007, Volume: 2, page(s): II-661 – II-66, Honolulu, HI, ISSN: 1520-6149, ISBN: 1-4244-0728-1, INSPEC Accession Number: 9497202, Digital Object Identifier: 10.1109/ICASSP.2007.366322

Abstract

Spectrogram factorization methods have been proposed for single channel source separation and audio analysis. Typically, the mixture signal is first converted into a time-frequency representation such as the short-time Fourier transform (STFT). The phase information is thrown away and this spectrogram matrix is then factored into the sum of rank-one source spectrograms. This approach incorrectly assumes the mixture spectrogram is the sum of the source spectrograms. In fact, the mixture spectrogram depends on the phase of the source STFTs. We investigate the consequences of this common assumption and introduce an approach that leverages a probabilistic representation of phase to improve the separation results.
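
The dependence on phase can be seen with a single time-frequency bin. The sketch below is only an illustration under simplifying assumptions: two sources of unit magnitude at one bin, with a hypothetical helper `mixture_magnitude` that adds their complex coefficients. It does not reproduce the probabilistic phase model proposed in the paper.

```python
import cmath
import math


def mixture_magnitude(mag_a, mag_b, phase_difference):
    """Magnitude of the sum of two complex STFT coefficients at one bin."""
    coeff_a = cmath.rect(mag_a, 0.0)
    coeff_b = cmath.rect(mag_b, phase_difference)
    return abs(coeff_a + coeff_b)


# Both sources have magnitude 1 at this bin, so the additive assumption
# would always predict a mixture magnitude of 2.
in_phase = mixture_magnitude(1.0, 1.0, 0.0)
quadrature = mixture_magnitude(1.0, 1.0, math.pi / 2)
opposite = mixture_magnitude(1.0, 1.0, math.pi)

for label, value in [("in phase", in_phase),
                     ("90 degrees apart", quadrature),
                     ("opposite phase", opposite)]:
    print(f"{label}: mixture magnitude = {value:.3f} (additive model says 2.000)")
```

Only the in-phase case matches the additive model; with opposite phases the two sources cancel almost completely even though each has a non-zero magnitude spectrogram.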

Paper: ACM IWVSSN (2006) “Unsupervised Analysis of Activity Sequences Using Event Motifs”

October 23rd, 2006 Irfan Essa Posted in AAAI/IJCAI/UAI, Aaron Bobick, Activity Recognition, Aware Home, Papers, Raffay Hamid, Siddhartha Maddi No Comments »

  • R. Hamid, S. Maddi, A. Bobick, I. Essa. “Unsupervised Analysis of Activity Sequences Using Event Motifs”, In proceedings of 4th ACM International Workshop on Video Surveillance and Sensor Networks (in conjunction with ACM Multimedia 2006).

Abstract

We present an unsupervised framework to discover characterizations of everyday human activities, and demonstrate how such representations can be used to extract points of interest in event-streams. We begin with the usage of Suffix Trees as an efficient activity-representation to analyze the global structural information of activities, using their local event statistics over the entire continuum of their temporal resolution. Exploiting this representation, we discover characterizing event-subsequences and present their usage in an ensemble-based framework for activity classification. Finally, we propose a method to automatically detect subsequences of events that are locally atypical in a structural sense. Results over extensive data-sets, collected from multiple sensor-rich environments are presented, to show the competence and scalability of the proposed framework.

Paper in ACM Multimedia (2006): “Interactive mosaic generation for video navigation”

October 22nd, 2006 Irfan Essa Posted in ACM MM, Computational Photography and Video, Gregory Abowd, Kihwan Kim, Multimedia, Papers No Comments »

K. Kim, I. Essa, and G. Abowd (2006) “Interactive mosaic generation for video navigation.” in Proceedings of the 14th annual ACM international conference on Multimedia, pages 655-658, 2006. [Project Page | DOI | PDF]

Abstract

Navigation through large multimedia collections that include videos and images still remains cumbersome. In this paper, we introduce a novel method to visualize and navigate through the collection by creating a mosaic image that visually represents the compilation. This image is generated by a labeling-based layout algorithm using various sizes of sample tile images from the collection. Each tile represents both the photographs and video files representing scenes selected by matching algorithms. This generated mosaic image provides a new way for thematic video and visually summarizes the videos. Users can generate these mosaics with some predefined themes and layouts, or base it on the results of their queries. Our approach supports automatic generation of these layouts by using meta-information such as color, time-line and existence of faces or manually generated annotated information from existing systems (e.g., the Family Video Archive).

Interactive Video Mosaic

Paper: ACM UIST (2006) “Videotater: an approach for pen-based digital video segmentation and tagging”

October 15th, 2006 Irfan Essa Posted in Computational Photography and Video, Nick Diakopoulos, Papers, Research No Comments »

Diakopoulos, N. and Essa, I. (2006). Videotater: an approach for pen-based digital video segmentation and tagging. In Proceedings of the 19th Annual ACM Symposium on User interface Software and Technology (Montreux, Switzerland, October 15 – 18, 2006). UIST ’06. ACM Press, New York, NY, 221-224. [DOI]

Abstract

The continuous growth of media databases necessitates development of novel visualization and interaction techniques to support management of these collections. We present Videotater, an experimental tool for a Tablet PC that supports the efficient and intuitive navigation, selection, segmentation, and tagging of video. Our veridical representation immediately signals to the user where appropriate segment boundaries should be placed and allows for rapid review and refinement of manually or automatically generated segments. Finally, we explore a distribution of modalities in the interface by using multiple timeline representations, pressure sensing, and a tag painting/erasing metaphor with the pen.

Paper: IEEE ISWC (2006) “Discovering Characteristic Actions from On-Body Sensor Data”

October 14th, 2006 Irfan Essa Posted in Activity Recognition, Charles Isbell, David Minnen, Papers, Research, Thad Starner No Comments »

Discovering Characteristic Actions from On-Body Sensor Data (IEEEXplore)

Minnen, D., Starner, T., Essa, I., Isbell, C.
College of Computing, Georgia Institute of Technology, Atlanta, GA 30332 USA. dminn@cc.gatech.edu
This paper appears in: Wearable Computers, 2006 10th IEEE International Symposium on
Publication Date: Oct. 2006
On page(s): 11 – 18
Number of Pages: 8
Location: Montreux, Switzerland
ISSN: 1550-4816
ISBN: 1-4244-0598-x
Digital Object Identifier: 10.1109/ISWC.2006.286337
Posted online: 2007-01-22 09:58:15.0

Abstract

We present an approach to activity discovery, the unsupervised identification and modeling of human actions embedded in a larger sensor stream. Activity discovery can be seen as the inverse of the activity recognition problem. Rather than learn models from hand-labeled sequences, we attempt to discover motifs, sets of similar subsequences within the raw sensor stream, without the benefit of labels or manual segmentation. These motifs are statistically unlikely and thus typically correspond to important or characteristic actions within the activity. The problem of activity discovery differs from typical motif discovery, such as locating protein binding sites, because of the nature of time series data representing human activity. For example, in activity data, motifs will tend to be sparsely distributed, vary in length, and may only exhibit intra-motif similarity after appropriate time warping. In this paper, we motivate the activity discovery problem and present our approach for efficient discovery of meaningful actions from sensor data representing human activity. We empirically evaluate the approach on an exercise data set captured by a wrist-mounted, three-axis inertial sensor. Our algorithm successfully discovers motifs that correspond to the real exercises with a recall rate of 96.3% and overall accuracy of 86.7% over six exercises and 864 occurrences.
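
The remark about time warping can be illustrated with classic dynamic time warping (DTW), a standard way to compare sequences performed at different speeds. This is a generic textbook sketch, not the discovery algorithm from the paper, and the sample sequences are invented for the example.

```python
def dtw_distance(x, y):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(x), len(y)
    # cost[i][j] = best alignment cost of x[:i] against y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # stretch y
                                    cost[i][j - 1],      # stretch x
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# The same made-up "exercise" performed slowly and quickly, plus an unrelated signal.
slow_repetition = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
fast_repetition = [0, 1, 2, 1, 0]
unrelated = [2, 2, 2, 2, 2]

same_motion = dtw_distance(slow_repetition, fast_repetition)
different_motion = dtw_distance(fast_repetition, unrelated)
print(same_motion, different_motion)
```

The slow and fast repetitions differ in length, yet warping aligns them with zero cost, while the unrelated signal stays far away. A plain sample-by-sample comparison would not even be defined for sequences of different lengths.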

Paper: J. Parallel Distrib. Computing (2005): “Experiences with optimizing two stream-based applications for cluster execution”

September 30th, 2006 Irfan Essa Posted in Computational Photography and Video, James Rehg, Kishore Ramachandran, Papers, Research No Comments »

Angelov, Y., Ramachandran, U., Mackenzie, K., Rehg, J. M., and Essa, I. 2005. “Experiences with optimizing two stream-based applications for cluster execution”. J. Parallel Distrib. Comput. 65, 6 (Jun. 2005), 678-691. [DOI]

Abstract

We explore optimization strategies and resulting performance of two stream-based video applications, video texture and color tracker, on a cluster of SMPs. The two applications are representative of a class of emerging applications, which we call “stream-based applications”, that are sensitive to both latency of individual results and overall throughput. Such applications require non-trivial parallelization techniques in order to improve both latency and throughput, given that the stream data emanates from a limited set of sources (exactly one in the two applications studied) and that the distribution of the data cannot be done a priori. We suggest techniques that address in a coordinated fashion the problems of data distribution and work partitioning. We believe the two problems are related and need to be addressed together. We have parallelized two applications using the Stampede cluster programming system that provides abstractions for implementing time- and throughput-sensitive applications elegantly and efficiently. For the Video Textures application we show that we can achieve a speedup of 24.26 on a 112-processor cluster. For the Color Tracker application, where latency is more crucial, we identify the extent of data parallelism that ensures that the slowest member of the pipeline is no longer the bottleneck for achieving a decent frame rate.
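
For readers who want to put the reported speedup in context, the short calculation below derives parallel efficiency (speedup divided by processor count). Efficiency is a standard derived metric and is not a figure reported in the abstract; only the speedup of 24.26 and the 112 processors come from the text above.

```python
def parallel_efficiency(speedup, processors):
    """Standard definition: fraction of ideal linear speedup actually achieved."""
    return speedup / processors


# Figures quoted in the abstract for the Video Textures application.
video_textures_speedup = 24.26
cluster_processors = 112

efficiency = parallel_efficiency(video_textures_speedup, cluster_processors)
print(f"Parallel efficiency: {efficiency:.1%}")
```

An efficiency well below 100% is consistent with the abstract's point that the stream data comes from a single source and cannot be distributed ahead of time.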

Home | Journalism 3G: The Future of Technology in the Field

September 17th, 2006 Irfan Essa Posted in Brad Stenger, Computational Journalism, Nick Diakopoulos, Projects No Comments »

Computation and Journalism (Journalism 3G)
Between the advent of the printing press and the rise of the Internet, more than 500 years passed without another technological advancement that significantly empowered the voice of the people and changed the nature of journalism. Now, with the rise of blogs, digital video and citizen journalists, computing technologies continue to usher in monumental change – affecting the field of journalism right down to its core. Who’s ready for this? How is the field adapting? And what are the implications for journalistic integrity?

Paper: IEEE CVPR (2006) “Learning Temporal Sequence Model from Partially Labeled Data”

June 14th, 2006 Irfan Essa Posted in Aaron Bobick, Activity Recognition, Aware Home, Papers, Research, Yifan Shi No Comments »

Yifan Shi, Bobick, A., Essa, I. (2006), “Learning Temporal Sequence Model from Partially Labeled Data”, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006, Volume: 2, page(s): 1631 – 1638, ISSN: 1063-6919, ISBN: 0-7695-2597-0, Digital Object Identifier: 10.1109/CVPR.2006.174 [IEEEXplore]

Abstract

Graphical models are often used to represent and recognize activities. Purely unsupervised methods (such as HMMs) can be trained automatically but yield models whose internal structure – the nodes – is difficult to interpret semantically. Manually constructed networks typically have nodes corresponding to sub-events, but the programming and training of these networks is tedious and requires extensive domain expertise. In this paper, we propose a semi-supervised approach in which a manually structured Propagation Network (a form of a DBN) is initialized from a small amount of fully annotated data, and then refined by an EM-based learning method in an unsupervised fashion. During node refinement (the M step) a boosting-based algorithm is employed to train the evidence detectors of individual nodes. Experiments on a variety of data types – vision and inertial measurements – in several tasks demonstrate the ability to learn from as little as one fully annotated example accompanied by a small number of positive but non-annotated training examples. The system is applied to both recognition and anomaly detection tasks.

Paper: IEEE CVPR (2006) “Element-Free Elastic Models for Volume Fitting and Capture”

June 14th, 2006 Irfan Essa Posted in Greg Turk, Modeling and Animation, Papers, Research No Comments »

Element-Free Elastic Models for Volume Fitting and Capture (IEEEXplore)

Jaeil Choi, A. Szymczak, G. Turk, I. Essa
Georgia Institute of Technology
This paper appears in: Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on
Publication Date: 2006
Volume: 2
On page(s): 2245 – 2252
ISSN: 1063-6919
ISBN: 0-7695-2597-0
Digital Object Identifier: 10.1109/CVPR.2006.110
Posted online: 2006-10-09 11:11:24.0

Abstract

We present a new method of fitting an element-free volumetric model to a sequence of deforming surfaces of a moving object. Given a sequence of visual hulls, we iteratively fit an element-free elastic model to the visual hull in order to extract the optimal pose of the captured volume. The fitting of the volumetric model is achieved by minimizing a combination of elastic potential energy, a surface distance measure, and a self-intersection penalty for each frame. A unique aspect of our work is that the model is mesh-free – since the model is represented as a point cloud, it is easy to construct, manipulate and update the model as needed. Additionally, linear elasticity with rotation compensation makes it possible to handle local deformations and large rotations of body parts much more efficiently than other volume fitting approaches. Our experimental results for volume fitting and capture in a multi-view camera setting demonstrate the robustness of element-free elastic models against noise and self-occlusions.

Paper: IEEE ICASSP (2006) “Source Detection Using Repetitive Structure”

May 14th, 2006 Irfan Essa Posted in Audio Analysis, Funding, Mitch Parry, NSF (0205507), Papers, Research No Comments »

Parry, R.M. and Essa, I. (2006) “Source Detection Using Repetitive Structure (IEEEXplore).” Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2006. ICASSP 2006, Publication Date: 14-19 May 2006, Volume: 4, page(s): IV – IV, Location: Toulouse, ISSN: 1520-6149, ISBN: 1-4244-0469-X, INSPEC Accession Number: 9154520, Digital Object Identifier: 10.1109/ICASSP.2006.1661163

Abstract

Blind source separation algorithms typically require that the number of sources is known in advance. However, it is often the case that the number of sources changes over time and that the total number is not known. Existing source separation techniques require source number estimation methods to determine how many sources are active within the mixture signals. These methods typically operate on the covariance matrix of mixture recordings and require fewer active sources than mixtures. When sources do not overlap in the time-frequency domain, more sources than mixtures may be detected and then separated. However, separating more sources than mixtures when sources overlap in time and frequency poses a particularly difficult problem. This paper addresses the issue of source detection when more sources than sensors overlap in time and frequency. We show that repetitive structure in the form of time-time correlation matrices can reveal when each source is active.
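To give a feel for what a time-time correlation matrix exposes, here is a minimal sketch. It assumes each time frame is a short magnitude-spectrum vector and uses cosine similarity between frames as the correlation measure; both choices, along with the function name and the toy data, are illustrative assumptions and not the formulation used in the paper.

```python
import math


def time_time_correlation(frames):
    """Return C where C[i][j] is the cosine similarity of frames i and j."""
    norms = [math.sqrt(sum(v * v for v in f)) for f in frames]
    n = len(frames)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = norms[i] * norms[j]
            if denom > 0:
                dot = sum(a * b for a, b in zip(frames[i], frames[j]))
                C[i][j] = dot / denom
    return C


# Hypothetical toy data: source A occupies bins 0-1, source B bins 2-3,
# and the two sources alternate in time.
frames = [
    [1.0, 0.5, 0.0, 0.0],  # t=0: A active
    [0.0, 0.0, 1.0, 0.5],  # t=1: B active
    [1.0, 0.5, 0.0, 0.0],  # t=2: A active again
    [0.0, 0.0, 1.0, 0.5],  # t=3: B active again
]
C = time_time_correlation(frames)

# Frames where the same source repeats correlate strongly (C[0][2] is about 1),
# while frames from different sources do not (C[0][1] is 0).
for row in C:
    print(" ".join(f"{v:.2f}" for v in row))
```

In this toy case the off-diagonal blocks of high correlation mark the times at which the same source recurs, which is the kind of repetitive structure the abstract refers to.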

Paper: ACM SIGGRAPH (2005) “Texture optimization for example-based synthesis”

July 25th, 2005 Irfan Essa Posted in ACM SIGGRAPH, Aaron Bobick, Computational Photography and Video, Nipun Kwatra, Papers, Research, Vivek Kwatra No Comments »

Vivek Kwatra, Irfan Essa, Aaron Bobick, and Nipun Kwatra (2005), “Texture optimization for example-based synthesis” In ACM Transactions on Graphics (TOG) Volume 24, Issue 3 (July 2005) Proceedings of ACM SIGGRAPH 2005, Pages: 795 – 802, ISSN: 0730-0301 (DOI|PDF|Project Site|Video|Talk)

ABSTRACT

We present a novel technique for texture synthesis using optimization. We define a Markov Random Field (MRF)-based similarity metric for measuring the quality of synthesized texture with respect to a given input sample. This allows us to formulate the synthesis problem as minimization of an energy function, which is optimized using an Expectation Maximization (EM)-like algorithm. In contrast to most example-based techniques that do region-growing, ours is a joint optimization approach that progressively refines the entire texture. Additionally, our approach is ideally suited to allow for controllable synthesis of textures. Specifically, we demonstrate controllability by animating image textures using flow fields. We allow for general two-dimensional flow fields that may dynamically change over time. Applications of this technique include dynamic texturing of fluid animations and texture-based flow visualization.
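The alternating structure of such an energy minimization can be sketched on a toy 1D signal. The example below assumes a plain quadratic energy (the sum of squared differences between each output neighborhood and its closest exemplar neighborhood) and alternates between re-matching neighborhoods and averaging the overlapping matches. It is a simplified illustration under those assumptions, not the authors' implementation; all names, window sizes, and data are hypothetical.

```python
def patches(signal, w):
    """All contiguous windows of length w in the signal."""
    return [signal[i:i + w] for i in range(len(signal) - w + 1)]


def nearest(patch, candidates):
    """Candidate window with the smallest squared distance to patch."""
    return min(candidates,
               key=lambda z: sum((a - b) ** 2 for a, b in zip(patch, z)))


def energy(x, matches, starts, w):
    """Sum of squared differences between output windows and their matches."""
    return sum((x[s + k] - z[k]) ** 2
               for s, z in zip(starts, matches) for k in range(w))


def synthesize(exemplar, x, w=4, step=2, iters=8):
    ex_patches = patches(exemplar, w)
    starts = list(range(0, len(x) - w + 1, step))
    history = []
    for _ in range(iters):
        # Step 1: with x fixed, pick the closest exemplar window per neighborhood.
        matches = [nearest(x[s:s + w], ex_patches) for s in starts]
        # Step 2: with matches fixed, the quadratic energy is minimized by
        # averaging all matched values that overlap each output sample.
        total = [0.0] * len(x)
        count = [0] * len(x)
        for s, z in zip(starts, matches):
            for k in range(w):
                total[s + k] += z[k]
                count[s + k] += 1
        x = [t / c if c else v for t, c, v in zip(total, count, x)]
        history.append(energy(x, matches, starts, w))
    return x, history


exemplar = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
initial = [0.2, 0.9, 0.1, 0.4, 0.6, 0.8, 0.3, 0.7, 0.5, 0.0]
result, history = synthesize(exemplar, initial)
print([round(e, 4) for e in history])
```

Because each step can only lower (or keep) the energy given the other step's output, the recorded energies never increase, and every sample is refined jointly rather than grown region by region.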

Thesis: Vivek Kwatra’s PhD Thesis (2005) “Example-based Rendering of Textural Phenomena”

July 19th, 2005 Irfan Essa Posted in Computational Photography and Video, PhD, Thesis, Vivek Kwatra No Comments »

Vivek Kwatra (2005), “Example-based Rendering of Textural Phenomena” PhD Thesis, Georgia Institute of Technology, College of Computing (Advisors: Aaron Bobick, Irfan Essa) [URI], 19-Jul-2005

Abstract

This thesis explores synthesis by example as a paradigm for rendering real-world phenomena. In particular, phenomena that can be visually described as texture are considered. We exploit, for synthesis, the self-repeating nature of the visual elements constituting these texture exemplars. Techniques for unconstrained as well as constrained/controllable synthesis of both image and video textures are presented. For unconstrained synthesis, we present two robust techniques that can perform spatio-temporal extension, editing, and merging of image as well as video textures. In one of these techniques, large patches of input texture are automatically aligned and seamlessly stitched with each other to generate realistic-looking images and videos. The second technique is based on iterative optimization of a global energy function that measures the quality of the synthesized texture with respect to the given input exemplar. We also present a technique for controllable texture synthesis. In particular, it allows for generation of motion-controlled texture animations that follow a specified flow field. Animations synthesized in this fashion maintain the structural properties like local shape, size, and orientation of the input texture even as they move according to the specified flow. We cast this problem into an optimization framework that tries to simultaneously satisfy the two (potentially competing) objectives of similarity to the input texture and consistency with the flow field. This optimization is a simple extension of the approach used for unconstrained texture synthesis. A general framework for example-based synthesis and rendering is also presented. This framework provides a design space for constructing example-based rendering algorithms. The goal of such algorithms would be to use texture exemplars to render animations for which certain behavioral characteristics need to be controlled. Our motion-controlled texture synthesis technique is an instantiation of this framework where the characteristic being controlled is motion represented as a flow field.

Paper: IEEE CVPR (2005) “Tracking multiple objects through occlusions”

June 20th, 2005 Irfan Essa Posted in Activity Recognition, Aware Home, PAMI/ICCV/CVPR/ECCV, Papers, Yan Huang No Comments »

Huang, Y. and Essa, I. (2005) “Tracking multiple objects through occlusions”, In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005 (CVPR 2005), 20-25 June 2005, Volume: 2, page(s): 1051 – 1058, ISSN: 1063-6919, ISBN: 0-7695-2372-2, INSPEC Accession Number: 8633324, DOI: 10.1109/CVPR.2005.350 [IEEEXplore#]

ABSTRACT

We present an approach for tracking a varying number of objects through both temporally and spatially significant occlusions. Our method builds on the idea of object permanence to reason about occlusions. To this end, tracking is performed at both the region level and the object level. At the region level, a customized genetic algorithm is used to search for optimal region tracks. This limits the scope of object trajectories. At the object level, each object is located based on adaptive appearance models, spatial distributions and inter-occlusion relationships. The proposed architecture is capable of tracking objects even in the presence of long periods of full occlusions. We demonstrate the viability of this approach by experimenting on several videos of a user interacting with a variety of objects on a desktop.
