Videos of multi-player team sports provide a challenging domain for dynamic scene analysis. Player actions and interactions are complex as they are driven by many factors, such as the short-term goals of the individual player, the overall team strategy, the rules of the sport, and the current context of the game. We show that constrained multi-agent events can be analyzed and even predicted from video. Such analysis requires estimating the global movements of all players in the scene at any time, and is needed for modeling and predicting how the multi-agent play evolves over time on the field. To this end, we propose a novel approach to detect the locations of where the play evolution will proceed, e.g. where interesting events will occur, by tracking player positions and movements over time. We start by extracting the ground level sparse movement of players in each time-step, and then generate a dense motion field. Using this field we detect locations where the motion converges, implying positions towards which the play is evolving. We evaluate our approach by analyzing videos of a variety of complex soccer plays.
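The pipeline in this abstract (sparse player motion, then a dense motion field, then points of convergence) can be illustrated with a small sketch. This is not the paper's method. It assumes inverse-distance weighting for the interpolation and uses negative divergence as the convergence measure. The function names `dense_field` and `convergence_point` are hypothetical.

```python
def dense_field(players, width, height, eps=1e-6):
    """Interpolate sparse player velocities onto a width x height grid.

    players: list of (x, y, vx, vy) on the ground plane.
    Inverse-distance weighting is an assumption made for this sketch.
    """
    u = [[0.0] * width for _ in range(height)]
    v = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            wsum = usum = vsum = 0.0
            for px, py, vx, vy in players:
                w = 1.0 / ((x - px) ** 2 + (y - py) ** 2 + eps)
                wsum += w
                usum += w * vx
                vsum += w * vy
            u[y][x] = usum / wsum
            v[y][x] = vsum / wsum
    return u, v


def convergence_point(u, v):
    """Return the interior grid cell with the most negative divergence."""
    height, width = len(u), len(u[0])
    best, best_div = None, float("inf")
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate du/dx + dv/dy.
            div = (u[y][x + 1] - u[y][x - 1]) / 2.0 + (v[y + 1][x] - v[y - 1][x]) / 2.0
            if div < best_div:
                best, best_div = (x, y), div
    return best, best_div
```

For example, four players placed around the middle of an 11 x 11 grid and all moving inward should yield a convergence point near the center, with a negative divergence value.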
We introduce a new algorithm for video retargeting that uses discontinuous seam-carving in both space and time for resizing videos. Our algorithm relies on a novel appearance-based temporal coherence formulation that allows for frame-by-frame processing and results in temporally discontinuous seams, as opposed to geometrically smooth and continuous seams. This formulation optimizes the difference in appearance of the resultant retargeted frame to the optimal temporally coherent one, and allows for carving around fast moving salient regions.
Additionally, we generalize the idea of appearance-based coherence to the spatial domain by introducing piece-wise spatial seams. Our spatial coherence measure minimizes the change in gradients during retargeting, which preserves spatial detail better than minimization of color difference alone. We also show that per-frame saliency (gradient-based or feature-based) does not always produce desirable retargeting results and propose a novel automatically computed measure of spatio-temporal saliency. As needed, a user may also augment the saliency by interactive region-brushing. Our retargeting algorithm processes the video sequentially, making it conducive for streaming applications.
We present an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. We begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a “region graph” over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high quality segmentations, which are temporally coherent with stable region boundaries, and allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow to guide temporal connections in the initial graph.
We also propose two novel approaches to improve the scalability of our technique: (a) a parallel out-of-core algorithm that can process volumes much larger than an in-core algorithm, and (b) a clip-based processing algorithm that divides the video into overlapping clips in time, and segments them successively while enforcing consistency.
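As background for the seam-carving idea, here is a minimal sketch of the classic single-frame version: a connected vertical seam found by dynamic programming over a per-pixel energy grid. It is the standard continuous-seam baseline, not the discontinuous, temporally coherent formulation described in the abstract. The energy values are assumed to be given, and the function names are hypothetical.

```python
def min_vertical_seam(energy):
    """Return one column index per row for the lowest-energy connected seam."""
    rows, cols = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    # Forward pass: cheapest way to reach each pixel from the row above.
    for r in range(1, rows):
        for c in range(cols):
            lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
            cost[r][c] += min(cost[r - 1][lo:hi + 1])
    # Backward pass: trace the seam from the cheapest bottom pixel upward.
    c = min(range(cols), key=lambda j: cost[-1][j])
    seam = [c]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
        c = min(range(lo, hi + 1), key=lambda j: cost[r - 1][j])
        seam.append(c)
    seam.reverse()
    return seam


def remove_seam(frame, seam):
    """Delete one pixel per row, shrinking the frame width by one."""
    return [row[:c] + row[c + 1:] for row, c in zip(frame, seam)]
```

Applying `remove_seam` repeatedly narrows a frame one column at a time. The abstract's contribution lies in how seams are chosen across frames and in allowing them to be piece-wise, which this sketch does not attempt.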
We demonstrate hierarchical segmentations on video shots as long as 40 seconds, and even support a streaming mode for arbitrarily long videos, albeit without the ability to process them hierarchically.
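The over-segmentation step can be sketched as a greedy region merge over a weighted graph using union-find. The merge rule below follows the well-known Felzenszwalb-Huttenlocher criterion; whether the paper uses exactly this rule is an assumption of this sketch, and the parameter `k` and the function name `segment` are hypothetical.

```python
def segment(num_nodes, edges, k=1.0):
    """Greedy graph-based segmentation.

    edges: list of (weight, a, b), where weight is an appearance difference.
    Returns a list mapping each node to the representative of its region.
    """
    parent = list(range(num_nodes))
    size = [1] * num_nodes
    internal = [0.0] * num_nodes  # largest edge weight merged inside a region

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for w, a, b in sorted(edges):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        # Merge only if the edge is no heavier than both regions tolerate.
        if w <= min(internal[ra] + k / size[ra], internal[rb] + k / size[rb]):
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = w  # edges arrive in ascending order
    return [find(i) for i in range(num_nodes)]
```

In a video setting the nodes would be voxels connected in space and time. A hierarchy in the spirit of the abstract could be built by treating the resulting regions as nodes of a new graph and running the same procedure again with a larger `k`.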
We present a novel approach for robust localization of multiple people observed using multiple cameras. We use this location information to generate sports visualizations, which include displaying a virtual offside line in soccer games, and showing players’ positions and motion patterns. Our main contribution is the modeling and analysis for the problem of fusing corresponding players’ positional information as finding minimum weight K-length cycles in complete K-partite graphs. To this end, we use a dynamic programming based approach that varies over a continuum of being maximally to minimally greedy in terms of the number of paths explored at each iteration. We present an end-to-end sports visualization framework that employs our proposed algorithm-class. We demonstrate the robustness of our framework by testing it on 60,000 frames of soccer footage captured over 5 different illumination conditions, play types, and team attire.
We have the following 4 papers accepted for publication at IEEE CVPR 2010. More details, with links, are forthcoming.
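The fusion problem can be illustrated with a small sketch. Each camera contributes one partition of candidate ground-plane positions, and the goal is to pick one candidate per camera so that the closed cycle through them has minimum total length. The code below assumes a fixed camera order, Euclidean edge weights, and a hypothetical `beam` parameter that limits how many partial paths survive each step (1 is maximally greedy, `None` is exact for the fixed order). It is a sketch of the idea, not the paper's algorithm.

```python
import math


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def best_cycle(partitions, beam=None):
    """Pick one point per partition minimizing the closed-cycle length.

    partitions: list of K lists of (x, y) points, one list per camera.
    Returns (total_weight, chosen_indices).
    """
    best_cost, best_idx = float("inf"), None
    for s, start in enumerate(partitions[0]):
        # states: last chosen index -> (cost so far, chosen indices)
        states = {s: (0.0, [s])}
        prev_pts = partitions[0]
        for pts in partitions[1:]:
            nxt = {}
            for j, p in enumerate(pts):
                cost, idx = min(
                    ((c + dist(prev_pts[i], p), path) for i, (c, path) in states.items()),
                    key=lambda t: t[0],
                )
                nxt[j] = (cost, idx + [j])
            if beam is not None:
                # Greedier settings keep fewer partial paths.
                nxt = dict(sorted(nxt.items(), key=lambda kv: kv[1][0])[:beam])
            states, prev_pts = nxt, pts
        for j, (cost, idx) in states.items():
            total = cost + dist(prev_pts[j], start)  # close the cycle
            if total < best_cost:
                best_cost, best_idx = total, idx
    return best_cost, best_idx
```

To localize several players, one could run this repeatedly and remove the chosen detections after each pass. That outer loop is likewise an assumption, not something the abstract specifies.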
Matthias Grundmann, Vivek Kwatra, Mei Han, and Irfan Essa (2010) “Discontinuous Seam-Carving for Video Retargeting” (a GA Tech, Google Collaboration)
Matthias Grundmann, Vivek Kwatra, Mei Han, and Irfan Essa (2010) “Efficient Hierarchical Graph-Based Video Segmentation” (a GA Tech, Google Collaboration)
Kihwan Kim, Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins, and Irfan Essa (2010) “Motion Fields to Predict Play Evolution in Dynamic Sport Scenes” (a GA Tech, Disney Collaboration)
Raffay Hamid, Ramkrishan Kumar, Matthias Grundmann, Kihwan Kim, Irfan Essa, and Jessica Hodgins (2010) “Player Localization Using Multiple Static Cameras for Sports Visualization” (a GA Tech, Disney Collaboration)
Check out the list of final projects for this term’s (Spring 2010) class on Computational Journalism. Final reports expected in last week of April. Stay tuned.
This class is aimed at understanding the computational and technological advancements in the area of journalism. The primary focus is on the study of technologies for developing new tools for (a) sense-making from diverse news information sources, (b) the impact of more and cheaper networked sensors, (c) collaborative human models for information aggregation and sense-making, (d) mashups and the use of programming in journalism, (e) the impact of mobile computing and data gathering, (f) computational approaches to information quality, (g) data mining for personalization and aggregation, and (h) citizen journalism.
Although Computing, Society and Professionalism is a required course for CS majors, it is not a typical computer science course. Rather than dealing with the technical content of computing, it addresses the effects of computing on individuals, organizations, and society, and on what your responsibilities are as a computing professional in light of those impacts. The topic is a very broad one and one that you will have to deal with almost every day of your professional life. The issues are sometimes as intellectually deep as some of the greatest philosophical writings in history – and sometimes as shallow as a report on the evening TV news. This course can do little more than introduce you to the topics, but, if successful, will change the way you view the technology with which you work.
Radu Bogdan Rusu, Jan Bandouch, Franziska Meier, Irfan Essa and Michael Beetz (2009) “Human Action Recognition Using Global Point Feature Histograms and Action Shapes”, in Journal of Advanced Robotics, volume 23, pages 1873–1908, Koninklijke Brill NV, Leiden and The Robotics Society of Japan, 2009. [ DOI | PDF]
Abstract
This paper investigates the recognition of human actions from three-dimensional (3-D) point clouds that encode the motions of people acting in sensor-distributed indoor environments. Data streams are time sequences of silhouettes extracted from cameras in the environment. From the 2-D silhouette contours we generate space–time streams by continuously aligning and stacking the contours along the time axis as third spatial dimension. The space–time stream of an observation sequence is segmented into parts corresponding to subactions using a pattern matching technique based on suffix trees and interval scheduling. Then, the segmented space–time shapes are processed by treating the shapes as 3-D point clouds and estimating global point feature histograms for them. The resultant models are clustered using statistical analysis and our experimental results indicate that the presented methods robustly derive different action classes. This holds despite large intra-class variance in the recorded datasets due to performances from different persons at different time intervals.
Kihwan Kim, Sangmin Oh, Jeonggyu Lee and Irfan Essa (2009), “Augmenting Aerial Earth Maps with Dynamic Information,” In Proceedings of IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Orlando, FL, USA, October 2009 [Project Site, Video (AVI/DiVX), Video (Youtube), Paper (pdf)].
Abstract
We introduce methods for augmenting aerial visualizations of Earth (from tools such as Google Earth or Microsoft Virtual Earth) with dynamic information obtained from videos. Our goal is to make Augmented Earth Maps that visualize the live broadcast of dynamic sceneries within a city. We propose different approaches to analyze videos of pedestrians and cars, under differing conditions and then augment Aerial Earth Maps (AEMs) with live and dynamic information. We also analyze natural phenomena (clouds) and project information from these to the AEMs to add visual realism.
I am co-organizing the First IEEE Workshop on Computer Vision for Humanoids in conjunction with the ICCV Conference in Kyoto, Japan. The workshop will be held on September 27, 2009 (9:30am – 6:00pm).
The goal of this workshop is to bring together experts from the fields of computer vision and robotics that are working on humanoid robots with vision as one of the primary modalities. Topics of interest include, but are not limited to:
Visual Learning in Robots
Human Robot Interaction
Grasping and Manipulation
Learning by Demonstration
Task Learning for Robots
Activity Recognition and Discovery for Robots
Humanoid Navigation in Real Environments
Vision Devices and Systems for Robot Applications
Application of Humanoid Robots (Indoor/Outdoor, Entertainment)
This is the first attempt at a workshop that crosses from Humanoids Research to Computer Vision Research.
The workshop includes six invited talks as well as an open poster session, where all participants are expected to present a poster describing their recent work.
Location: Kyoto University, Faculty of Engineering Bldg.#3, 2F, Room W201, in conjunction with ICCV. (See http://www.iccv2009.org/workshops/index.html).
For schedule, abstracts and other information, see the workshop website. More information about ICCV at http://www.iccv2009.org/.
Invited Speakers and Organizers after the Workshop
Raffay Hamid, Siddhartha Maddi, Amos Johnson, Aaron Bobick, Irfan Essa and Charles Isbell (2009) “A novel sequence representation for unsupervised analysis of human activities” in Artificial Intelligence, Volume 173, Issue 14, September 2009, Pages 1221-1244. [PDF][DOI][Science Direct]
Abstract
Formalizing computational models for everyday human activities remains an open challenge. Many previous approaches towards this end assume prior knowledge about the structure of activities, using which explicitly defined models are learned in a completely supervised manner. For a majority of everyday environments however, the structure of the in situ activities is generally not known a priori. In this paper we investigate knowledge representations and manipulation techniques that facilitate learning of human activities in a minimally supervised manner. The key contribution of this work is the idea that global structural information of human activities can be encoded using a subset of their local event subsequences, and that this encoding is sufficient for activity-class discovery and classification.
In particular, we investigate modeling activity sequences in terms of their constituent subsequences that we call event n-grams. Exploiting this representation, we propose a computational framework to automatically discover the various activity-classes taking place in an environment. We model these activity-classes as maximally similar activity-cliques in a completely connected graph of activities, and describe how to discover them efficiently. Moreover, we propose methods for finding characterizations of these discovered classes from a holistic as well as a by-parts perspective. Using such characterizations, we present a method to classify a new activity to one of the discovered activity-classes, and to automatically detect whether it is anomalous with respect to the general characteristics of its membership class. Our results show the efficacy of our approach in a variety of everyday environments.
Keywords: Temporal reasoning; Scene analysis; Computer vision
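The event n-gram representation described in the abstract can be sketched in a few lines: an activity is a sequence of discrete events, and it is summarized by the counts of its length-n contiguous subsequences. The similarity measure below (histogram intersection normalized by the larger count) is an assumption chosen for illustration, not the measure used in the paper, and the event names in the usage note are invented.

```python
from collections import Counter


def event_ngrams(events, n=3):
    """Count the contiguous length-n subsequences of an event sequence."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))


def similarity(a, b, n=3):
    """Histogram-intersection similarity in [0, 1] between two activities."""
    ca, cb = event_ngrams(a, n), event_ngrams(b, n)
    total = max(sum(ca.values()), sum(cb.values()))
    if total == 0:
        return 0.0
    # Counter & Counter keeps the minimum count of each shared n-gram.
    return sum((ca & cb).values()) / total
```

With hypothetical events such as `["enter", "pick", "scan", "pay", "exit"]`, two activities that share most of their local orderings score close to 1. Pairwise scores of this kind could then feed a graph of activities in which densely connected groups suggest activity-classes.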
Nipun Kwatra, Chris Wojtan, Mark Carlson, Irfan A. Essa, Peter J. Mucha, Greg Turk (2009), “Fluid Simulation with Articulated Bodies”, IEEE Transactions on Visualization and Computer Graphics, 10 Jun. 2009. IEEE Computer Society Digital Library. IEEE Computer Society. [DOI | PDF (see copyright) | Video | Website]
Abstract
We present an algorithm for creating realistic animations of characters that are swimming through fluids. Our approach combines dynamic simulation with data-driven kinematic motions (motion capture data) to produce realistic animation in a fluid. The interaction of the articulated body with the fluid is performed by incorporating joint constraints with rigid animation and by extending a solid/fluid coupling method to handle articulated chains. Our solver takes as input the current state of the simulation and calculates the angular and linear accelerations of the connected bodies needed to match a particular motion sequence for the articulated body. These accelerations are used to estimate the forces and torques that are then applied to each joint. Based on this approach, we demonstrate simulated swimming results for a variety of different strokes, including crawl, backstroke, breaststroke and butterfly. The ability to have articulated bodies interact with fluids also allows us to generate simulations of simple water creatures that are driven by simple controllers.
“At the Georgia Institute of Technology in Atlanta, a three-year-old program in “computational journalism” helps computer-science majors study how journalists gather, organize and utilize information, then take these workflows and see how technology can make the processes easier.”
I will present a variety of temporal models of video that we have been studying (and developing) for analysis and synthesis of video. For synthesis of videos, we have been developing representations that support example-based re-synthesis and spatio-temporal re-targeting. These approaches build on graph-based methods and we present techniques for similarity metrics for video, segmentation in video, and merging of different video streams. I will showcase a series of examples of these approaches applied to generate new videos.
For analysis of videos, we have developed a series of representations to observe and model activities in videos. Building on low-level measures of movement and motion in videos, we have incorporated higher-level temporal generative models to represent and recognize observed activities. I will discuss the strengths of a variety of State-based, Markovian, Grammar-based and Network-based representations that we have employed for recognizing activities from video. I will also discuss approaches for unsupervised discovery and recognition of activities.
Time permitting, I will describe some new efforts that move towards understanding mobile imaging and video, and video authoring and video on the web. Within these I will discuss issues of collaborative imaging, collective authoring, ad-hoc sensor networks, and peer production with images and videos. Using these concepts to focus the conversation, I will discuss how all of these issues are impacting the field of Journalism and Reporting and how we have started on a new interdisciplinary research and education effort we call Computational Journalism.
Our consumption of images (photography/video) continues to grow with the pervasiveness of computing (networking, mobile and media) technologies into our daily lives. Everyone now has a mobile camera, and digital image capture, processing, and sharing has become ubiquitous in our society. This has led to a significant impact on how we (a) create novel scenes, (b) share our experiences with images, and (c) interact with large amounts of images and videos from many sources. In this talk, I will start with a brief overview of a series of ongoing efforts in the analysis of images and videos for rendering novel scenes, interacting with images/videos and collaboratively authoring new content. I will describe some work on video-based rendering and synthesizing novel videos (and scenes) and highlight the technical contributions being made in areas of Computational Photography and Video.
Using these sets of efforts as a foundation I will showcase where things are headed in terms of user generated content, media sharing, annotation, and reuse with large scale networks. In essence, everybody is a content producer, distributor, and consumer. I will describe some new efforts that move towards understanding mobile imaging and video, and also discuss issues of collaborative imaging, collective authoring, ad-hoc sensor networks, and peer production with images and videos. Using these concepts I will discuss how all of these issues are impacting the field of Journalism and Reporting and how we have started on a new interdisciplinary research and education effort we call Computational Journalism. The concept of Computational Journalism includes more than just imaging, and relates to media and information in general and is aimed at the study of how we remain informed in this connected world. I will outline this new field and relate it back to imaging, with examples from some of our recent work in this new area.
Tools to aid people in making sense of the information quality of online informational video are essential for media consumers seeking to be well informed. Our application, Videolyzer, addresses the information quality problem in video by allowing politically motivated bloggers or journalists to analyze, collect, and share criticisms of the information quality of online political videos. Our interface innovates by providing a fine-grained and tightly coupled interaction paradigm between the timeline, the time-synced transcript, and annotations. We also incorporate automatic textual and video content analysis to suggest areas of interest for further assessment by a person. We present an evaluation of Videolyzer looking at the user experience, usefulness, and behavior around the novel features of the UI as well as report on the collaborative dynamic of the discourse generated with the tool.
This paper describes a data-driven approach for generating photorealistic animations of human motion. Each animation sequence follows a user-choreographed path and plays continuously by seamlessly transitioning between different segments of the captured data. To produce these animations, we capitalize on the complementary characteristics of motion capture data and video. We customize our capture system to record motion capture data that are synchronized with our video source. Candidate transition points in video clips are identified using a new similarity metric based on 3-D marker trajectories and their 2-D projections into video. Once the transitions have been identified, a video-based motion graph is constructed. We further exploit hybrid motion and video data to ensure that the transitions are seamless when generating animations. Motion capture marker projections serve as control points for segmentation of layers and nonrigid transformation of regions. This allows warping and blending to generate seamless in-between frames for animation. We show a series of choreographed animations of walks and martial arts scenes as validation of our approach.
Human Video Textures (Output Rendered as a Collage!)
Pei Yin, Thad Starner, Harley Hamilton, Irfan Essa, James M. Rehg (2009), “Learning Basic Units in American Sign Language using Discriminative Segmental Feature Selection” in IEEE Conference on Acoustics, Speech, and Signal Processing 2009 (ICASSP 2009). Session: Spoken Language Understanding I, Tuesday, April 21, 11:00 – 13:00, Taipei, Taiwan.
ABSTRACT
The natural language for most deaf signers in the United States is American Sign Language (ASL). ASL has internal structure like spoken languages, and ASL linguists have introduced several phonemic models. The study of ASL phonemes is not only interesting to linguists, but also useful for scalability in recognition by machines. Since machine perception is different from human perception, this paper learns the basic units for ASL directly from data. Compared with previous studies, our approach computes a set of data-driven units (fenemes) discriminatively from the results of segmental feature selection. The learning iterates the following two steps: first apply discriminative feature selection segmentally to the signs, and then tie the most similar temporal segments to re-train. Intuitively, the sign parts indistinguishable to machines are merged to form basic units, which we call ASL fenemes. Experiments on publicly available ASL recognition data show that the extracted data-driven fenemes are meaningful, and recognition using those fenemes achieves improved accuracy at reduced model complexity.
Computation & Journalism: The Impact of Technology on Journalism, Information Quality, and Civic Literacy
Irfan Essa
Georgia Institute of Technology School of Interactive Computing, GVU and RIM Centers
Fundamentally, journalism is the process of collecting news information and disseminating that information with a layer of contextualization and understanding provided by journalists in the form of a news story. Recent advances in computational technology are rapidly affecting how news is gathered, reported, and distributed, and how stories are authored and told. New technologies for aggregating, visualizing, summarizing, consuming, and collaborating on news are becoming increasingly popular. These advances are challenging the traditional practices of journalism and directly affecting the future of news production and consumption. Both computation and journalism share a deep interest in information and the value it provides to society, and they are deeply involved in the future of storytelling in various contexts, especially current events. This requires us to consider how both Computation and Journalism can help each other.
In this talk, I will present a vision for a new area of research and education that brings together the fields of computation and journalism to enhance both disciplines and supports the creation of a “Computationalist-Journalist,” a new kind of participant in the public conversation. I will start by describing how imaging, video, and media production and consumption have changed with technology and then how similar technologies can be used for Journalism and related Civic Literacy issues. I will describe new technologies that have changed the landscape of both Computation and Journalism and use these developments to showcase where we are headed with both Computation and Journalism, and how technologists and journalists can work together to create new computing tools that further the aims of journalism.
“If you pay passing attention to the media landscape, you know that most mainstream news outlets have had their business models undermined by the digital revolution. As their general-interest monopolies have been pillaged by niche online competitors, traditional news organizations have lost revenue and cachet, laying off journalists in waves that have grown into tsunamis. This process has created dire prospects for the future of investigative reporting, often seen as the most costly of journalistic forms.”
It goes on to mention Computational Journalism, our efforts at GA Tech, Duke University’s recent efforts in this space, and a few others.
We propose the use of 3D (2D+time) Shape Context to recognize the spatial and temporal details inherent in human actions. We represent an action in a video sequence by a 3D point cloud extracted by sampling 2D silhouettes over time. A non-uniform sampling method is introduced that gives preference to fast moving body parts using a Euclidean 3D Distance Transform. Actions are then classified by matching the extracted point clouds. Our proposed approach is based on a global matching and does not require specific training to learn the model. We test the approach thoroughly on two publicly available datasets and compare to several state-of-the-art methods. The achieved classification accuracy is on par with or superior to the best results reported to date.
This academic year, I am spending some time working with the newly formed Disney Research, Pittsburgh (directed by Jessica Hodgins), located next to CMU. The press release announcing this lab is here (Carnegie Mellon SCS Press Release). I am also hanging out with folks at the CMU Robotics Institute and have started some new collaborations. So now, depending on when, you can find me either in Atlanta (at GA Tech) or in Pittsburgh (at Disney Lab or CMU) [OR on an airplane between Pittsburgh and Atlanta].
We have developed an audio-based casual puzzle game which produces a time-stamped transcription of spoken audio as a by-product of play. Our evaluation of the game indicates that it is both fun and challenging. The transcripts generated using the game are more accurate than those produced using a standard automatic transcription system and the time-stamps of words are within several hundred milliseconds of ground truth.
Videolyzer is a tool designed to help journalists and bloggers collect, organize, and present information about the quality (i.e. validity, reliability, etc.) of online videos. It makes it possible to evaluate and make sense of things like comments, claims, and sources as they relate to the video. Users can comment and annotate pieces of the video (called “anchors”) to provide a more fine-grained description of the information in the video. The interface also incorporates a tightly integrated transcript of what’s spoken in the video to make it easier to navigate the dense information there. Finally, Videolyzer allows for collaboration among many people. Users can build off of each other’s annotations and rate each other in a form of distributed vetting and peer-evaluation.
Kihwan Kim, Jay Summet, Thad Starner, Daniel Ashbrook, Mrunal Kapade and Irfan Essa (2008) “Localization and 3D Reconstruction of Urban Scenes Using GPS” In Proceedings of IEEE Symposium on Wearable Computing (ISWC) 2008 (To Appear). [PDF]
ABSTRACT
Using off-the-shelf Global Positioning System (GPS) units, we reconstruct buildings in 3D by exploiting the reduction in signal to noise ratio (SNR) that occurs when the buildings obstruct the line-of-sight between the moving units and the orbiting satellites. We measure the size and height of skyscrapers as well as automatically constructing a density map representing the location of multiple buildings in an urban landscape. If deployed on a large scale, via a cellular service provider’s GPS-enabled mobile phones or GPS-tracked delivery vehicles, the system could provide an inexpensive means of continuously creating and updating 3D maps of urban environments.
N. Diakopoulos, I. Essa. (2008) “An Annotation Model for Making Sense of Information Quality in Online Videos.” Proceedings of the International Conference on the Pragmatic Web. 28–30 Sept. 2008, Uppsala, Sweden (To Appear)
ABSTRACT
Making sense of the information quality of online media including things such as the accuracy and validity of claims and the reliability of sources is essential for people to be well-informed. We are developing Videolyzer to address the challenge of information quality sense-making by allowing motivated individuals to analyze, collect, share, and respond to criticisms of the information quality of online political videos and their transcripts. In this paper specifically we present a model of how the annotation ontology and collaborative dynamics embedded in Videolyzer can enhance information quality.
From Computational Photography and Video to Computational Journalism
Abstract
Digital image capture, processing, and sharing has become pervasive in our society. This has had significant impact on how we create novel scenes, how we share our experiences, and how we interact with images and videos. In this talk, I will present an overview of a series of ongoing efforts in the analysis of images and videos for rendering novel scenes. First I will discuss (in brief) our work on Video Textures, where repeating information is extracted to generate extended sequences of videos. I will also describe some of our extensions to this approach that allow for controlled generation of animations of video sprites. We have developed various learning and optimization techniques that allow for video-based animations of photo-realistic characters. Using these sets of approaches as a foundation, I will then show how new images and videos can be generated. I will show examples of Photorealistic and Non-photorealistic Renderings of Scenes (Videos and Images) and how these methods support the media reuse culture, so common these days with user generated content. I will then describe some of our new efforts that move towards understanding mobile imaging and video, and also discuss issues of collaborative imaging and authoring, ad-hoc sensor networks, and peer production with images and videos, leading to new concepts of how computation has impacted journalism. Time permitting, I will also share some of our efforts on video annotation and how we have taken some of these new concepts of video analysis to classrooms.
Investigator(s): Umakishore Ramachandran, (Principal Investigator), Irfan Essa (Co-Principal Investigator)
Dates: September 1, 2008 – August 31, 2009 (Estimated)
Abstract
From the western world to the third world, the use of handheld devices (cellphones, PDAs) has proliferated. The world of users is becoming both wireless and mobile. Web 2.0 has ushered in an age wherein the web is viewed as a provider of services and not just a repository of documents and/or information. Despite this advance, the web remains just that, a single web with an inherent assumption that a powerful computing and communication infrastructure supports it. Couldn’t mobile wireless devices in close proximity form a web of their own? This is the vision behind this project, the Web on Demand (WoD). WoD aims at bridging the gap between social networks and ad hoc networking. In other words, it aims to rethink the system software stack all the way from application to networking that would allow the creation and management of social networks without any assumption of infrastructure support. The core of the research is to develop software technologies for mobile devices that would allow the dynamic creation of thematic ad hoc overlay networks empowering (a) mobile people with similar interests (e.g., weather forecast), (b) friends and family (e.g., in a theme park), and (c) participants in mission critical applications (e.g., search and rescue), stay connected. WoD complements the World Wide Web (WWW) and leverages it when it is available, such as exploiting the ambient computing infrastructure to enhance user experience, and managing the dynamic creation of User Generated Content (UGC) by mobile users. The vision behind this project is to democratize access to services that are currently offered through WWW. In this sense, the results from this research can have far-reaching technological and societal consequences. Most importantly, the research will help breed a new class of computer scientists who are connected with societal causes in addition to advancing technology.
I am very pleased that my colleague (and friend) Professor Frank Dellaert has taken over my DVFX class that I have been teaching since 1999 (see site here). It is clear already that this new edition of the DVFX class will be even more exciting than the previous editions. Can’t wait to see the final videos. Check out the info on the class at CS 4480 DVFX, Fall 08.
Date and Time: Wednesday, 13 August 2008 | 1:45 pm – 5:30 pm
Location: Room 502 A, Los Angeles Convention Center, Los Angeles, CA, USA
Fundamentally, journalism is the process of collecting news information and disseminating that information with a layer of contextualization and understanding provided by journalists in the form of a news story. Recent advances in computational technology are rapidly affecting how news is gathered, reported, and distributed, and how stories are authored and told. New technologies for aggregating, visualizing, summarizing, consuming, and collaborating on news are becoming increasingly popular. They are challenging the traditional practices of journalism and directly affecting the future of news production and consumption. Computation and journalism share a deep interest in information and the value it provides to society, and they are deeply involved in the future of storytelling in various contexts, especially current events. This class summarizes how these new technologies affect journalism, both at the core of the journalism discipline and in its practice and business. Topics include: the technologies that have empowered citizen journalism and related citizen media production and authoring; mobile and sensing technologies that allow journalism to become ubiquitous and pervasive; the changes in photo, video, and broadcast journalism; and how web, online, and science journalism are changing the basic processes of reporting. Instructors focus especially on areas of special interest to the SIGGRAPH community: photography and video, large-scale information visualization, and social networking.
Audio Puzzler is a new kind of puzzle game based on unauthored content found online. The audio for the puzzles is taken from popular or interesting video clips from different genres such as news, documentary, or television. The audio puzzler is the type of game that harnesses people’s play to also provide valuable data which enriches the content played with. This is in the same vein as the ESPGame, the Listen Game, and PhotoPlay, which are all games which gather data in the process of game play. But while the data collected by these other games is useful for machine learning, the data collected with audio puzzler is immediately valuable as a transcription of the speech in the video. A similar effort (but in a much grander domain) is the Fold It project which seeks to harness playtime to solve protein folding problems. Much more detailed information about the evaluation of the technology will be forthcoming in a paper to be published at ACM Multimedia in October.
In order to make computers proactive and assistive, we must enable them to perceive, learn, and predict what is happening in their surroundings. This presents us with the challenge of formalizing computational models of everyday human activities. For a majority of environments, the structure of the in situ activities is generally not known a priori. This thesis therefore investigates knowledge representations and manipulation techniques that can facilitate learning of such everyday human activities in a minimally supervised manner.
A key step towards this end is finding appropriate representations for human activities. We posit that if we choose to describe activities as finite sequences of an appropriate set of events, then the global structure of these activities can be uniquely encoded using their local event sub-sequences. With this perspective at hand, we particularly investigate representations that characterize activities in terms of their fixed and variable length event subsequences. We comparatively analyze these representations in terms of their representational scope, feature cardinality and noise sensitivity.
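As a rough illustration of the fixed-length case, the sketch below counts contiguous event n-grams in an activity and compares two activities by the overlap of their n-gram histograms. The function names, the event labels, and the overlap measure are all assumptions made for this example; the thesis itself may use a different similarity measure.

```python
from collections import Counter


def event_ngrams(events, n):
    """Count every contiguous length-n event subsequence in one activity."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))


def ngram_similarity(a, b, n=3):
    """Histogram overlap of two activities' n-grams, in [0, 1].

    Hypothetical measure for illustration: shared counts divided by
    the counts in the union of both histograms.
    """
    ca, cb = event_ngrams(a, n), event_ngrams(b, n)
    shared = sum((ca & cb).values())  # element-wise minimum of counts
    total = sum((ca | cb).values())   # element-wise maximum of counts
    return shared / total if total else 1.0


# Two made-up activities that differ only in their final event.
first = ["enter", "pick", "scan", "exit"]
second = ["enter", "pick", "scan", "pay"]
print(ngram_similarity(first, second, n=3))
```

A smaller n makes the representation more tolerant of noise but less specific about global structure, which is the kind of trade-off the comparison above refers to.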
Exploiting such representations, we propose a computational framework to discover the various activity-classes taking place in an environment. We model these activity-classes as maximally similar activity-cliques in a completely connected graph of activities, and describe how to discover them efficiently. Moreover, we propose methods for finding concise characterizations of these discovered activity-classes, from both a holistic and a by-parts perspective. Using such characterizations, we present an incremental method to classify a new activity instance to one of the discovered activity-classes, and to automatically detect if it is anomalous with respect to the general characteristics of its membership class. Our results show the efficacy of our framework in a variety of everyday environments.
This research addresses the problem of temporal pattern discovery in real-valued, multivariate sensor data. Several algorithms were developed, and subsequent evaluation demonstrates that they can efficiently and accurately discover unknown recurring patterns in time series data taken from many different domains. Different data representations and motif models were investigated in order to design an algorithm with an improved balance between run-time and detection accuracy. The different data representations are used to quickly filter large data sets in order to detect potential patterns that form the basis of a more detailed analysis. The representations include global discretization, which can be efficiently analyzed using a suffix tree, local discretization with a corresponding random projection algorithm for locating similar pairs of subsequences, and a density-based detection method that operates on the original, real-valued data. In addition, a new variation of the multivariate motif discovery problem is proposed in which each pattern may span only a subset of the input features. An algorithm that can efficiently discover such “subdimensional” patterns was developed and evaluated. The discovery algorithms are evaluated by measuring the detection accuracy of discovered patterns relative to a set of expected patterns for each data set. The data sets used for evaluation are drawn from a variety of domains including speech, on-body inertial sensors, music, American Sign Language video, and GPS tracks.
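To make the idea of filtering by global discretization concrete, here is a minimal sketch. It z-normalizes a univariate series, maps each value to one of three symbols, and then reports fixed-length symbol words that recur at non-overlapping positions. The breakpoints, the three-letter alphabet, and the function names are assumptions for illustration, and a plain dictionary of words stands in for the suffix tree mentioned above.

```python
from collections import defaultdict
import statistics


def discretize(series, breakpoints=(-0.43, 0.43), alphabet="abc"):
    """Z-normalize a series and map each value to a symbol.

    The breakpoints are assumed values that split a standard normal
    distribution into three roughly equiprobable bins.
    """
    mean = statistics.fmean(series)
    std = statistics.pstdev(series) or 1.0  # guard against constant series
    symbols = []
    for x in series:
        z = (x - mean) / std
        symbols.append(alphabet[sum(z > b for b in breakpoints)])
    return "".join(symbols)


def repeated_words(symbols, length):
    """Return words of the given length with two non-overlapping occurrences.

    Maps each such word to the list of its start positions. A dictionary
    is used here as a simple stand-in for a suffix tree.
    """
    positions = defaultdict(list)
    for i in range(len(symbols) - length + 1):
        positions[symbols[i:i + length]].append(i)
    return {w: p for w, p in positions.items() if p[-1] - p[0] >= length}


# A made-up signal containing the same bump twice.
signal = [0, 0, 5, 9, 5, 0, 0, 0, 5, 9, 5, 0, 0]
candidates = repeated_words(discretize(signal), length=4)
```

Candidate positions found this way would only be a coarse filter; the more detailed analysis on the original real-valued data still has to confirm each match.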
The Project for Excellence in Journalism has done an amazing report on “The State of the News Media 2008” with reference to American journalism. It is a very interesting read, and particularly relevant to our efforts on Computation and Journalism.
Pei Yin, Irfan Essa, James Rehg, Thad Starner (2008) “Discriminative Feature Selection for Hidden Markov Models using Segmental Boosting”, ICASSP 2008 – March 30 – April 4, 2008 – Las Vegas, Nevada, U.S.A. (Paper: MLSP-P3.D8, Session: Pattern Recognition and Classification II, Time: Thursday, April 3, 15:30 – 17:30, Topic: Machine Learning for Signal Processing: Learning Theory and Modeling) (PDF|Project Site)
ABSTRACT
We address the feature selection problem for hidden Markov models (HMMs) in sequence classification. Temporal correlation in sequences often causes difficulty in applying feature selection techniques. Inspired by segmental k-means segmentation (SKS), we propose Segmentally Boosted HMMs (SBHMMs), where the state-optimized features are constructed in a segmental and discriminative manner. The contributions are twofold. First, we introduce a novel feature selection algorithm, where the temporal dynamics are decoupled from the static learning procedure by assuming that the sequential data are piecewise independent and identically distributed. Second, we show that the SBHMM consistently improves traditional HMM recognition in various domains. The reduction of error compared to traditional HMMs ranges from 17% to 70% in American Sign Language recognition, human gait identification, lip reading, and speech recognition.
ACM SIGGRAPH 2008 Papers Committee Meeting was held at GA Tech in Atlanta, March 29-30, under the leadership of Greg Turk. Following is a picture of all of us at work, with our sigs, as a note of thanks for Greg.
Original Photo by myself, this version with sigs by Fredo Durand.
Fundamentally, journalism is aimed at collecting news information and disseminating that information with a layer of contextualization and understanding provided by journalists. Recent advances in computational technology are rapidly affecting how news information is gathered, reported and distributed. Furthermore, new avenues for aggregating, visualizing, summarizing, consuming, and collaborating on news are increasingly becoming popular and challenging traditional practices of Journalism. Following the success of text search, image and video search questions are now poised to make a bigger impact on journalism and other related fields. Computation and Journalism share a deep-rooted interest in Information, and the value it provides to society. The concept of Information Quality, the measure of the value that the information provides to the user of that information, brings these two disciplines together. In computing and information sciences, information quality is used to describe the degree of excellence in communicating knowledge or intelligence and is composed of different facets such as accuracy, reliability, comprehensiveness, currency, and validity. In journalism, where the conveyance of quality information is paramount, principles such as accuracy, fairness, thoroughness, and transparency guide journalists in communicating quality information. Traditionally, journalism has also entailed an ethos of working on the side of the citizenry to provide them with quality information they need to make informed decisions in the process of their daily lives. However, the plethora of un-vetted blogs, podcasts, videos and other online media, generated by users or by corporations with subjective biases, has led to a significant compromise in information quality. Collaborative knowledge generation (Wikipedia) and citizen journalism are showing new ways in which information and (global) news can be shared.
However, as the Web and the Internet continue to grow and as computing technologies pervade the planet, a thorough study of the process of journalism and the deep computational aspects of such processes needs to be undertaken. To this end, the PI’s research group at Georgia Institute of Technology is interested in understanding how computational advances impact the field of journalism. The long-term aim is to make novel contributions by developing computational technologies to better support the goals of journalism. To launch this effort, they are organizing a Symposium on Computation + Journalism at GA Tech, in Atlanta, GA, February 22-23, 2008. The goal of this symposium is to bring together stakeholders from all aspects of Journalism, Media, and Computation. Participants in panels, presentations and breakout groups will discuss these issues and create a roadmap towards answering these questions that bring together computation and journalism.
Working with Brad Stenger (Wired), Nick Diakopoulos (GA Tech), and Sergio Goldenberg (GA Tech), we are organizing a Symposium on computation+journalism, to bring together computationalists, internet/media experts, and journalists for a series of panels, presentations, and discussion around how computing technologies are affecting (and changing) journalism practices. We have over 180 people registered and it promises to be a great first-of-its-kind event. This event is being hosted by the GVU Center at Georgia Tech.
We received around 60 submissions to this track and expect a few NECTAR (new scientific and technical advances in research) submissions too (due Feb 18, 2008). The primary track submissions are in the process of review.
Abstract Submission Deadline: January 25, 2008 *DONE*
Paper Submission Deadline: January 30, 2008 *DONE*
Author Notification Deadline: April 1, 2008
Spring 2008 Term at GA Tech begins Monday 1/7/2008. It will be a busy term with the following activities, in addition to my research-related activities.
As demands on hospital efficiency increase, there is a stronger need for automatic analysis, recovery, and modification of surgical workflows. Even though most of the previous work has dealt with higher level and hospital-wide workflow including issues like document management, workflow is also an important issue within the surgery room. Its study has a high potential, e.g., for building context-sensitive operating rooms, evaluating and training surgical staff, optimizing surgeries and generating automatic reports. In this paper we propose an approach to segment the surgical workflow into phases based on temporal synchronization of multidimensional state vectors. Our method is evaluated on the example of laparoscopic cholecystectomy with state vectors representing tool usage during the surgeries. The discriminative power of each instrument in regard to each phase is estimated using AdaBoost. A boosted version of the Dynamic Time Warping (DTW) algorithm is used to create a surgical reference model and to segment a newly observed surgery. Full cross-validation on ten surgeries is performed and the method is compared to standard DTW and to Hidden Markov Models.
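For readers unfamiliar with the baseline, the following sketch shows standard, unweighted DTW aligning a reference surgery to a newly observed one and carrying the reference phase labels across the alignment. It is not the boosted variant proposed in the paper, which additionally weights instruments using AdaBoost. The binary tool-usage vectors, the Hamming distance, the phase names, and the function names are assumptions made for this example.

```python
def dtw(ref, obs, dist):
    """Classic DTW: return (total cost, warping path) for two sequences."""
    n, m = len(ref), len(obs)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(ref[i - 1], obs[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def hamming(u, v):
    """Number of instruments whose on/off state differs between two time steps."""
    return sum(a != b for a, b in zip(u, v))


def transfer_phases(ref_phases, path, obs_len):
    """Label each observed time step with the phase of its aligned reference step."""
    labels = [None] * obs_len
    for i, j in path:
        labels[j] = ref_phases[i]
    return labels


# Hypothetical data: two instruments, each state vector is (tool_1_in_use, tool_2_in_use).
reference = [(1, 0), (1, 0), (0, 1)]
reference_phases = ["A", "A", "B"]
observed = [(1, 0), (1, 0), (1, 0), (0, 1), (0, 1)]

total, path = dtw(reference, observed, hamming)
labels = transfer_phases(reference_phases, path, len(observed))
```

In a weighted variant, the per-instrument mismatch inside the distance function would be scaled by a learned weight for the current phase instead of counting every instrument equally.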
FEATURE AT A GLANCE: Technology in the home environment has the potential to support older adults in a variety of ways. We took an interdisciplinary approach (human factors/ergonomics and computer science) to develop a technology “coach” that could support older adults in learning to use a medical device. Our system provided a computer vision system to track the use of a blood glucose meter and provide users with feedback if they made an error. This research could support the development of an in-home personal assistant to coach individuals in a variety of tasks necessary for independent living.
KEYWORDS: home technology, medical devices, support for learning