
Video Stabilization on YouTube

May 6th, 2012 Irfan Essa Posted in Computational Photography and Video, Google, In The News, Matthias Grundmann, Vivek Kwatra

Here is an excerpt from a Google Research Blog post on our video stabilization on YouTube, which has now been improved further.

One thing we have been working on within Research at Google is developing methods for making casual videos look more professional, thereby providing users with a better viewing experience. Professional videos have several characteristics that differentiate them from casually shot videos. For example, in order to tell a story, cinematographers carefully control lighting and exposure and use specialized equipment to plan camera movement.

We have developed a technique that mimics professional camera moves and applies them to videos recorded by handheld devices. Cinematographers use specialized equipment such as tripods and dollies to plan their camera paths and hold them steady. In contrast, think of a video you shot using a mobile phone camera. How steady was your hand and were you able to anticipate an interesting moment and smoothly pan the camera to capture that moment? To bridge these differences, we propose an algorithm that automatically determines the best camera path and recasts the video as if it were filmed using stabilization equipment.

Via Video Stabilization on YouTube.


Paper in IEEE ICCP 2012: “Calibration-Free Rolling Shutter Removal”

April 28th, 2012 Irfan Essa Posted in Computational Photography and Video, Daniel Castro, ICCP, Matthias Grundmann, Vivek Kwatra

Calibration-Free Rolling Shutter Removal

  • M. Grundmann, V. Kwatra, D. Castro, and I. Essa (2012), “Calibration-Free Rolling Shutter Removal,” in Proceedings of IEEE Conference on Computational Photography (ICCP), 2012. [PDF] [WEBSITE] [VIDEO] [BLOG] [BIBTEX]
    @inproceedings{2012-Grundmann-CRSR,
      Author = {Matthias Grundmann and Vivek Kwatra and Daniel Castro and Irfan Essa},
      Blog = {http://prof.irfanessa.com/2012/04/28/paper-iccp12/},
      Booktitle = {Proceedings of IEEE Conference on Computational Photography (ICCP)},
      Date-Added = {2012-04-09 22:40:38 +0000},
      Date-Modified = {2012-04-30 22:18:03 +0000},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2012-Grundmann-CRSR.pdf},
      Publisher = {IEEE Computer Society},
      Title = {Calibration-Free Rolling Shutter Removal},
      Url = {http://www.cc.gatech.edu/cpl/projects/rollingshutter/},
      Video = {http://www.youtube.com/watch?v=_Pr_fpbAok8},
      Year = {2012},
      Bdsk-Url-1 = {http://www.cc.gatech.edu/cpl/projects/rollingshutter/}}

Abstract

We present a novel algorithm for efficient removal of rolling shutter distortions in uncalibrated streaming videos. Our proposed method is calibration-free, as it does not need any knowledge of the camera used, nor does it require calibration using specially recorded calibration sequences. Our algorithm can perform rolling shutter removal under varying focal lengths, as in videos from CMOS cameras equipped with an optical zoom. We evaluate our approach across a broad range of cameras and video sequences, demonstrating robustness, scalability, and repeatability. We also conducted a user study, which demonstrates a preference for the output of our algorithm over other state-of-the-art methods. Our algorithm is computationally efficient, easy to parallelize, and robust to challenging artifacts introduced by various cameras with differing technologies.

Presented at IEEE International Conference on Computational Photography, Seattle, WA, April 27-29, 2012. Winner of BEST PAPER AWARD.



Paper in IEEE CVPR 2012: “Detecting Regions of Interest in Dynamic Scenes with Camera Motions”

April 9th, 2012 Irfan Essa Posted in Activity Recognition, Kihwan Kim, Numerical Machine Learning, PAMI/ICCV/CVPR/ECCV, Papers, PERSEAS, Visual Surveillance

Detecting Regions of Interest in Dynamic Scenes with Camera Motions

  • K. Kim, D. Lee, and I. Essa (2012), “Detecting Regions of Interest in Dynamic Scenes with Camera Motions,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012. [PDF] [WEBSITE] [VIDEO] [BLOG] [BIBTEX]
    @inproceedings{2012-Kim-DRIDSWCM,
      Author = {Kihwan Kim and Dongreyol Lee and Irfan Essa},
      Blog = {http://prof.irfanessa.com/2012/04/09/paper-cvpr2012/},
      Booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      Date-Added = {2012-04-09 22:37:06 +0000},
      Date-Modified = {2012-04-30 22:26:13 +0000},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2012-Kim-DRIDSWCM.pdf},
      Publisher = {IEEE Computer Society},
      Title = {Detecting Regions of Interest in Dynamic Scenes with Camera Motions},
      Url = {http://www.cc.gatech.edu/cpl/projects/roi/},
      Video = {http://www.youtube.com/watch?v=19BMwDMCSp8},
      Year = {2012},
      Bdsk-Url-1 = {http://www.cc.gatech.edu/cpl/projects/roi/}}

Abstract

We present a method to detect the regions of interest in moving camera views of dynamic scenes with multiple moving objects. We start by extracting a global motion tendency that reflects the scene context by tracking movements of objects in the scene. We then use Gaussian process regression to represent the extracted motion tendency as a stochastic vector field. The generated stochastic field is robust to noise and can handle a video from an uncalibrated moving camera. We use the stochastic field for predicting important future regions of interest as the scene evolves dynamically.

We evaluate our approach on a variety of videos of team sports and compare the detected regions of interest to the camera motion generated by actual camera operators. Our experimental results demonstrate that our approach is computationally efficient and provides better predictions than previously proposed RBF-based approaches.
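
As a rough illustration of the Gaussian process regression step, the sketch below fits a dense vector field, together with a per-location variance, from a few sparse motion samples. It is a minimal sketch and not the paper's implementation: the squared-exponential kernel, the independent treatment of the two velocity components, and the names sq_exp_kernel, gpr_vector_field, length_scale, and noise are all assumptions made for the example.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between two sets of 2D points (unit prior variance)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gpr_vector_field(X, V, Xq, length_scale=1.0, noise=1e-2):
    """Posterior mean and variance of a vector field at query points.

    X  : (n, 2) positions where motion was observed
    V  : (n, 2) observed motion vectors (each component modeled independently)
    Xq : (m, 2) query positions
    Returns (mean, var) with shapes (m, 2) and (m,).
    """
    X, V, Xq = (np.asarray(a, dtype=float) for a in (X, V, Xq))
    K = sq_exp_kernel(X, X, length_scale) + noise * np.eye(len(X))
    Ks = sq_exp_kernel(Xq, X, length_scale)
    L = np.linalg.cholesky(K)                      # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, V))
    mean = Ks @ alpha                              # predicted motion vectors
    v = np.linalg.solve(L, Ks.T)
    var = 1.0 - (v ** 2).sum(axis=0)               # prior variance minus explained part
    return mean, var
```

Querying a regular grid of positions instead of a handful of points yields the dense field, and the variance indicates where that field is actually supported by observations.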

Presented at: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2012, Providence, RI, June 16-21, 2012


Award (2012): Best Computer Vision Paper Award by Google Research

March 22nd, 2012 Irfan Essa Posted in Computational Photography and Video, Matthias Grundmann, Papers, Vivek Kwatra

The following paper of ours was just named the Excellent Paper for 2011 in Computer Vision by Google Research.

  • M. Grundmann, V. Kwatra, and I. Essa (2011), “Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011. [PDF] [WEBSITE] [VIDEO] [DEMO] [BLOG] [BIBTEX]
    @inproceedings{2011-Grundmann-AVSWROCP,
      Author = {M. Grundmann and V. Kwatra and I. Essa},
      Blog = {http://prof.irfanessa.com/2011/06/19/videostabilization/},
      Booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      Date-Modified = {2011-12-08 22:13:20 +0000},
      Demo = {http://www.youtube.com/watch?v=0MiY-PNy-GU},
      Month = {June},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2011-Grundmann-AVSWROCP.pdf},
      Publisher = {IEEE Computer Society},
      Title = {Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths},
      Url = {http://www.cc.gatech.edu/cpl/projects/videostabilization/},
      Video = {http://www.youtube.com/watch?v=i5keG1Y810U},
      Year = {2011},
      Bdsk-Url-1 = {http://www.cc.gatech.edu/cpl/projects/videostabilization/}}

Casually shot videos captured by handheld or mobile cameras suffer from a significant amount of shake. Existing in-camera stabilization methods dampen high-frequency jitter but do not suppress low-frequency movements and bounces, such as those observed in videos captured by a walking person. In contrast, most professionally shot videos consist of carefully designed camera configurations, use specialized equipment such as tripods or camera dollies, and employ ease-in and ease-out for transitions. Our stabilization technique automatically converts casual shaky footage into more pleasant, professional-looking videos by mimicking these cinematographic principles. The original, shaky camera path is divided into a set of segments, each approximated by either constant, linear, or parabolic motion, using an algorithm based on robust L1 optimization. The stabilizer has been part of the YouTube Editor (youtube.com/editor) since March 2011.
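
To make the idea concrete, here is a minimal one-dimensional sketch of path smoothing with an L1 penalty on the first, second, and third differences of the camera path, which favors segments of constant, linear, and parabolic motion respectively. It is not the paper's formulation: instead of solving a linear program with crop-window constraints, it uses a quadratic fidelity term and iteratively reweighted least squares (IRLS) to approximate the L1 objective. The weights, the fidelity parameter, and the function names are illustrative assumptions.

```python
import numpy as np

def diff_matrix(n, order):
    """Finite-difference operator of the given order for a length-n signal."""
    D = np.eye(n)
    for _ in range(order):
        D = D[1:] - D[:-1]
    return D

def smooth_path_l1(path, weights=(1.0, 5.0, 10.0), fidelity=1.0, iters=50, eps=1e-4):
    """Approximately minimize
        fidelity/2 * ||p - c||^2 + sum_k weights[k] * ||D^(k+1) p||_1
    for the original path c, using IRLS (an assumption, not the paper's solver).
    """
    c = np.asarray(path, dtype=float)
    n = len(c)
    # Stack the weighted first, second, and third difference operators.
    D = np.vstack([w * diff_matrix(n, k + 1) for k, w in enumerate(weights)])
    p = c.copy()
    for _ in range(iters):
        r = D @ p
        w = 1.0 / np.maximum(np.abs(r), eps)   # |r| is approximated by r^2 / |r_prev|
        A = fidelity * np.eye(n) + D.T @ (w[:, None] * D)
        p = np.linalg.solve(A, fidelity * c)
    return p
```

For example, calling smooth_path_l1 on a slow pan with frame-to-frame jitter returns a path whose jitter is largely removed while the overall pan is kept. A stabilizer would then warp each frame by the difference between the original and the smoothed path.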

via Research Blog.


Teaching: Spring 2012

January 11th, 2012 Irfan Essa Posted in CnJ, Computational Journalism, Computational Photography and Video, DVFX

In Spring 2012, I am teaching 2 classes.

Advanced Computational Photography (CS 8803 PHO) [with Grant Schindler]

This is an advanced topics class in Computational Photography that builds on my intro class and explores the technical aspects of pictures, and more precisely the capture and depiction of reality on a 2D medium. The scientific, perceptual, and artistic principles behind image-making will be emphasized. Topics include the relationship between pictorial techniques and the human visual system; intrinsic limitations of 2D representations and their possible compensations; and technical issues involving depiction. Technical aspects of image capture and rendering, and exploration of how such a medium can be used to its maximum potential, will be examined. Students are strongly encouraged (not required) to bring their digital cameras and a laptop to facilitate experiments. The class will explore recent, state-of-the-art papers in Computational Photography from leading conferences and journals in the area, and students will do projects on a variety of topics.

Computation + Journalism (CS 4464 / CS 6465)

This class is aimed at understanding the computational and technological advancements in the area of journalism. The primary focus is on the study of technologies for developing new tools for (a) sense-making from diverse news information sources, (b) the impact of more and cheaper networked sensors, (c) collaborative human models for information aggregation and sense-making, (d) mashups and the use of programming in journalism, (e) the impact of mobile computing and data gathering, (f) computational approaches to information quality, (g) data mining for personalization and aggregation, and (h) citizen journalism. The complete schedule and other information will be on the t-square site, available only to students taking the class.


Kihwan Kim’s Thesis Defense (2011): “Spatio-temporal Data Interpolation for Dynamic Scene Analysis”

December 6th, 2011 Irfan Essa Posted in Computational Photography and Video, Kihwan Kim, Modeling and Animation, Multimedia, PhD, Security, Visual Surveillance, WWW

Spatio-temporal Data Interpolation for Dynamic Scene Analysis

Kihwan Kim, PhD Candidate

School of Interactive Computing, College of Computing, Georgia Institute of Technology

Date: Tuesday, December 6, 2011

Time: 1:00 pm – 3:00 pm EST

Location: Technology Square Research Building (TSRB) Room 223

Abstract

Analysis and visualization of dynamic scenes is often constrained by the amount of spatio-temporal information available from the environment. In most scenarios, we have to account for incomplete information and sparse motion data, requiring us to employ interpolation and approximation methods to fill in the missing information. Scattered data interpolation and approximation techniques have been widely used for solving the problem of completing surfaces and images with incomplete input data. We introduce approaches for such data interpolation and approximation from limited sensors into the domain of analyzing and visualizing dynamic scenes. Data from dynamic scenes is subject to constraints due to the spatial layout of the scene and/or the configurations of video cameras in use. Such constraints include: (1) sparsely available cameras observing the scene, (2) limited field of view provided by the cameras in use, (3) incomplete motion at a specific moment, and (4) varying frame rates due to different exposures and resolutions.

In this thesis, we establish these forms of incompleteness in the scene as spatio-temporal uncertainties and propose solutions for resolving the uncertainties by applying scattered data approximation to a spatio-temporal domain.

The main contributions of this research are as follows: First, we provide an efficient framework to visualize large-scale dynamic scenes from distributed static videos. Second, we adapt Radial Basis Function (RBF) interpolation to the spatio-temporal domain to generate global motion tendency. The tendency, represented by a dense flow field, is used to optimally pan and tilt a video camera. Third, we propose a method to represent motion trajectories using stochastic vector fields. Gaussian Process Regression (GPR) is used to generate a dense vector field and the certainty of each vector in the field. The generated stochastic fields are used for recognizing motion patterns under varying frame rates and incompleteness of the input videos. Fourth, we show that the stochastic representation of vector fields can also be used for modeling global tendency to detect the regions of interest in dynamic scenes with camera motion. We evaluate and demonstrate our approaches in several applications for visualizing virtual cities, automating sports broadcasting, and recognizing traffic patterns in surveillance videos.
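
The RBF interpolation mentioned in the second contribution can be illustrated with a small sketch that turns scattered motion vectors, sampled at (x, y, t) positions, into a flow field that can be evaluated anywhere. This is a generic Gaussian RBF interpolant written for illustration only: the Gaussian basis, the shape parameter epsilon, and the function names are assumptions, and the thesis may use a different basis function or additional terms.

```python
import numpy as np

def gaussian_rbf(A, B, epsilon):
    """Gaussian basis values between two sets of (x, y, t) points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-(epsilon ** 2) * d2)

def rbf_flow_field(samples, vectors, query, epsilon=1.0):
    """Interpolate scattered motion vectors at arbitrary query points.

    samples : (n, 3) spatio-temporal sample positions (x, y, t)
    vectors : (n, 2) motion vectors observed at those positions
    query   : (m, 3) positions where the flow should be evaluated
    Returns an (m, 2) array of interpolated motion vectors.
    """
    samples, vectors, query = (np.asarray(a, dtype=float)
                               for a in (samples, vectors, query))
    # Solve for one weight per sample and per vector component so that
    # the interpolant reproduces the observations exactly.
    weights = np.linalg.solve(gaussian_rbf(samples, samples, epsilon), vectors)
    return gaussian_rbf(query, samples, epsilon) @ weights
```

Evaluating rbf_flow_field on a regular grid of (x, y) positions at a fixed time t gives a dense flow field for that moment. Unlike the Gaussian process sketch, this interpolant passes exactly through the samples and provides no measure of certainty.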

Committee:

  • Prof. Irfan Essa (Advisor, School of Interactive Computing, Georgia Institute of Technology)
  • Prof. James M. Rehg (School of Interactive Computing, Georgia Institute of Technology)
  • Prof. Thad Starner (School of Interactive Computing, Georgia Institute of Technology)
  • Prof. Greg Turk (School of Interactive Computing, Georgia Institute of Technology)
  • Prof. Jessica K. Hodgins (Robotics Institute, Carnegie Mellon University, and Disney Research Pittsburgh)

Event: CnJ Panel at Georgia Tech’s Future Media Fest 2011 | Computation + Journalism

November 15th, 2011 Irfan Essa Posted in Computational Journalism, Eric Gilbert, Events

Computational Journalism is defined as the application of computation to the activities of journalism, such as gathering, organizing, communicating, and disseminating information, while upholding values of journalism such as accuracy and verifiability. Journalists are increasingly adopting the growing range of open-source tools and embracing different styles of journalism. Explore how newsrooms are opening, what new tools are being created, and how to use those tools most effectively.

Panelists:

Topics of discussion will include (but will not be limited to):

  • What is Computational Journalism?
  • What impact has Computation / Information Technology / Networking Technology had on Journalism?
  • What is the newsroom of the future? How has the newsroom changed?
  • How has investigative journalism changed with new technologies?
  • How has social networking changed how we gather, distribute, and share news (and information)?
  • What are the economic / financial models that need to be explored to support (and sustain) journalism?
  • What is the role of an Editor in the new journalism model?
  • What should we be teaching the next generation of journalists?

via CnJ Panel at Georgia Tech’s Future Media Fest 2011 | Computation + Journalism.


Paper in ICCV 2011: “Gaussian Process Regression Flow for Analysis of Motion Trajectories”

October 28th, 2011 Irfan Essa Posted in Activity Recognition, DARPA, Kihwan Kim, PAMI/ICCV/CVPR/ECCV, Papers

Gaussian Process Regression Flow for Analysis of Motion Trajectories

  • Kim, Lee, and Essa (2011), “Gaussian Process Regression Flow for Analysis of Motion Trajectories,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2011. [PDF] [WEBSITE] [VIDEO] [BIBTEX]
    @inproceedings{Kim2011-GPRF,
      Author = {K. Kim and D. Lee and I. Essa},
      Booktitle = {Proceedings of IEEE International Conference on Computer Vision (ICCV)},
      Month = {November},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2011-Kim-GPRFAMT.pdf},
      Publisher = {IEEE Computer Society},
      Title = {Gaussian Process Regression Flow for Analysis of Motion Trajectories},
      Url = {http://www.cc.gatech.edu/cpl/projects/gprf/},
      Video = {http://www.youtube.com/watch?v=UtLr37hDQz0},
      Year = {2011}}

Abstract

Analysis and recognition of motions and activities of objects in videos require effective representations for analysis and matching of motion trajectories. In this paper, we introduce a new representation specifically aimed at matching motion trajectories. We model a trajectory as a continuous dense flow field from a sparse set of vector sequences using Gaussian Process Regression. Furthermore, we introduce a random sampling strategy for learning stable classes of motions from limited data.

Our representation allows for incrementally predicting possible paths and detecting anomalous events from online trajectories. This representation also supports matching of complex motions with acceleration changes and pauses or stops within a trajectory. We use the proposed approach for classifying and predicting motion trajectories in traffic monitoring domains and test on several data sets. We show that our approach works well on various types of complete and incomplete trajectories from a variety of video data sets with different frame rates.


In the News (2011): “Shake it like an Instagram picture — Online Video News”

September 15th, 2011 Irfan Essa Posted in Collaborators, Computational Photography and Video, Google, In The News, Matthias Grundmann, Vivek Kwatra, WWW

Our work, as described in the following paper, is now showcased on YouTube.

  • M. Grundmann, V. Kwatra, and I. Essa (2011), “Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011. [PDF] [WEBSITE] [VIDEO] [DEMO] [BLOG] [BIBTEX]
    @inproceedings{2011-Grundmann-AVSWROCP,
      Author = {M. Grundmann and V. Kwatra and I. Essa},
      Blog = {http://prof.irfanessa.com/2011/06/19/videostabilization/},
      Booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      Date-Modified = {2011-12-08 22:13:20 +0000},
      Demo = {http://www.youtube.com/watch?v=0MiY-PNy-GU},
      Month = {June},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2011-Grundmann-AVSWROCP.pdf},
      Publisher = {IEEE Computer Society},
      Title = {Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths},
      Url = {http://www.cc.gatech.edu/cpl/projects/videostabilization/},
      Video = {http://www.youtube.com/watch?v=i5keG1Y810U},
      Year = {2011},
      Bdsk-Url-1 = {http://www.cc.gatech.edu/cpl/projects/videostabilization/}}

YouTube effects: Shake it like an Instagram picture

via YouTube effects: Shake it like an Instagram picture — Online Video News.

YouTube users can now apply a number of Instagram-like effects to their videos, giving them a cartoonish or Lomo-like look with the click of a button. The effects are part of a new editing feature that also includes cropping and advanced image stabilization.

Taking the shaking out of video uploads should go a long way towards making some of the amateur footage captured on mobile phones more watchable, but it can also be resource-intensive — which is why Google’s engineers invented an entirely new approach toward image stabilization.

The new editing functionality will be part of YouTube’s video page, where a new “Edit video” button will offer access to filters and other editing functionality. This type of post-processing is separate from YouTube’s video editor, which allows users to produce new videos based on existing clips.


Funding (2011) NSF (1146352) “EAGER: Linguistic Task Transfer for Humans and Cyber Systems”

September 1st, 2011 Irfan Essa Posted in Activity Recognition, Mike Stilman, NSF, Robotics

EAGER: Linguistic Task Transfer for Humans and Cyber Systems (Mike Stilman, Irfan Essa) NSF/RI

This project, investigating formal languages as a general methodology for task transfer between distinct cyber-physical systems such as humans and robots, aims to expand the science of cyber physical systems by developing Motion Grammars that will enable task transfer between distinct systems.

Formal languages are tools for encoding, describing and transferring structured knowledge. In natural language, the latter process is called communication. Similarly, we will develop a formal language through which arbitrary cyber-physical systems communicate tasks via structured actions. This investigation of Motion Grammars will contribute to the science of human cognition and the engineering of cyber-physical algorithms. By observing human activities during manipulation we will develop a novel class of hybrid control algorithms based on linguistic representations of task execution. These algorithms will broaden the capabilities of man-made systems and provide the infrastructure for motion transfer between humans, robots and broader systems in a generic context. Furthermore, the representation in a rigorous grammatical context will enable formal verification and validation in future work.

Broader Impacts: The proposed research has direct applications to new solutions for manufacturing, medical treatments such as surgery, logistics and food processing. In turn, each of these areas has a significant impact on the efficiency and convenience of our daily lives. The PIs serve as coordinators of graduate/undergraduate programs and mentors to community schools. In order to guarantee that women and minorities have a significant role in the research, the PIs will annually invite K-12 students from Atlanta schools with primarily African American populations to the laboratories. One-day robot classes will be conducted that engage students in the excitement of hands-on science by interactively using lab equipment to transfer their manipulation skills to a robot arm.

Via Award#1146352 – EAGER: Linguistic Task Transfer for Humans and Cyber Systems.
