Thesis: Mitch Parry PhD (2007), “Separation and Analysis of Multichannel Signals”

October 9th, 2007 Irfan Essa Posted in Audio Analysis, Mitch Parry, PhD, Thesis

Mitch Parry (2007), “Separation and Analysis of Multichannel Signals” PhD Thesis [PDF], Georgia Institute of Technology, College of Computing, Atlanta, GA. (Advisor: Irfan Essa)

Abstract

This thesis examines a large and growing class of digital signals that capture the combined effect of multiple underlying factors. In order to better understand these signals, we would like to separate and analyze the underlying factors independently. Although source separation applies to a wide variety of signals, this thesis focuses on separating individual instruments from a musical recording. In particular, we propose novel algorithms for separating instrument recordings given only their mixture. When the number of source signals does not exceed the number of mixture signals, we focus on a subclass of source separation algorithms based on joint diagonalization. Each approach leverages a different form of source structure. We introduce repetitive structure as an alternative that leverages unique repetition patterns in music and compare its performance against the other techniques.

When the number of source signals exceeds the number of mixtures (i.e., the underdetermined problem), we focus on spectrogram factorization techniques for source separation. We extend single-channel techniques to utilize the additional spatial information in multichannel recordings, and use phase information to improve the estimation of the underlying components.
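Spectrogram factorization is often introduced through non-negative matrix factorization (NMF). As a rough illustration only, and not the multichannel, phase-aware method developed in the thesis, here is a minimal single-channel sketch that assumes Lee and Seung's multiplicative updates for the squared Euclidean cost. A magnitude spectrogram V (frequency × time) is approximated as W·H, where the columns of W act as spectral templates and the rows of H as their activations over time. The function names and the toy data are hypothetical, and plain lists are used so the sketch needs only the standard library.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(row) for row in zip(*m)]


def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Approximate a non-negative magnitude spectrogram V (freq x time) as W H.

    Uses Lee-Seung multiplicative updates for the squared Euclidean cost.
    Columns of W are spectral templates; rows of H are their activations.
    """
    rng = random.Random(seed)
    n_freq, n_time = len(v), len(v[0])
    # Strictly positive random start: multiplicative updates keep entries >= 0.
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_freq)]
    h = [[rng.random() + eps for _ in range(n_time)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_time)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_freq)]
    return w, h


def reconstruction_error(v, w, h):
    """Squared Euclidean distance between V and its approximation W H."""
    approx = matmul(w, h)
    return sum((v[i][j] - approx[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))
```

On a toy 4 × 5 "spectrogram" built as the sum of two notes with non-overlapping frequency content, a rank-2 factorization should drive the reconstruction error close to zero. Real recordings would additionally need an STFT front end and a rule for assigning components to instruments, both of which are outside this sketch.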

Thesis: Vivek Kwatra’s PhD Thesis (2005) “Example-based Rendering of Textural Phenomena”

July 19th, 2005 Irfan Essa Posted in Computational Photography and Video, PhD, Thesis, Vivek Kwatra

Vivek Kwatra (2005), “Example-based Rendering of Textural Phenomena” PhD Thesis, Georgia Institute of Technology, College of Computing. (Advisors: Aaron Bobick, Irfan Essa) [URI], 19-Jul-2005

Abstract

This thesis explores synthesis by example as a paradigm for rendering real-world phenomena. In particular, phenomena that can be visually described as texture are considered. We exploit, for synthesis, the self-repeating nature of the visual elements constituting these texture exemplars. Techniques for unconstrained as well as constrained/controllable synthesis of both image and video textures are presented. For unconstrained synthesis, we present two robust techniques that can perform spatio-temporal extension, editing, and merging of image as well as video textures. In the first technique, large patches of input texture are automatically aligned and seamlessly stitched with each other to generate realistic-looking images and videos. The second technique is based on iterative optimization of a global energy function that measures the quality of the synthesized texture with respect to the given input exemplar. We also present a technique for controllable texture synthesis. In particular, it allows for the generation of motion-controlled texture animations that follow a specified flow field. Animations synthesized in this fashion maintain structural properties of the input texture, such as local shape, size, and orientation, even as they move according to the specified flow. We cast this problem into an optimization framework that tries to simultaneously satisfy the two (potentially competing) objectives of similarity to the input texture and consistency with the flow field. This optimization is a simple extension of the approach used for unconstrained texture synthesis. A general framework for example-based synthesis and rendering is also presented. This framework provides a design space for constructing example-based rendering algorithms. The goal of such algorithms is to use texture exemplars to render animations in which certain behavioral characteristics need to be controlled. Our motion-controlled texture synthesis technique is an instantiation of this framework where the characteristic being controlled is motion represented as a flow field.

Thesis: Drew Steedly PhD (2004): “Rigid Partitioning Techniques for Efficiently Generating 3D Reconstructions from Images”

December 9th, 2004 Irfan Essa Posted in Computational Photography and Video, Drew Steedly, PhD, Thesis

Drew Steedly (2004), “Rigid Partitioning Techniques for Efficiently Generating 3D Reconstructions from Images” PhD Thesis, Georgia Institute of Technology, College of Computing. (Advisor: Irfan Essa) [PDF] [URI]

Abstract

This thesis explores efficient techniques for generating 3D reconstructions from imagery. Non-linear optimization is one of the core techniques used when computing a reconstruction and is a computational bottleneck for large sets of images. Since non-linear optimization requires a good initialization to avoid getting stuck in local minima, robust systems for generating reconstructions from images build up the reconstruction incrementally. A hierarchical approach is to split up the images into small subsets, reconstruct each subset independently and then hierarchically merge the subsets. Rigidly locking together portions of the reconstructions reduces the number of parameters needed to represent them when merging, thereby lowering the computational cost of the optimization. We present two techniques that involve optimizing with parts of the reconstruction rigidly locked together. In the first, we start by rigidly grouping the cameras and scene features from each of the reconstructions being merged into separate groups. Cameras and scene features are then incrementally unlocked and optimized until the reconstruction is close to the minimum energy. This technique is most effective when the influence of the new measurements is restricted to a small set of parameters. Measurements that stitch together weakly coupled portions of the reconstruction, though, tend to cause deformations in the low error modes of the reconstruction and cannot be efficiently incorporated with the previous technique. To address this, we present a spectral technique for clustering the tightly coupled portions of a reconstruction into rigid groups. Reconstructions partitioned in this manner can closely mimic the poorly conditioned, low error modes, and therefore efficiently incorporate measurements that stitch together weakly coupled portions of the reconstruction. We explain how this technique can be used to scalably and efficiently generate reconstructions from large sets of images.
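To give a feel for spectral clustering into rigid groups, here is a generic spectral bipartition sketch. It assumes a hypothetical coupling graph whose nodes are cameras or scene features and whose edge weights say how strongly two of them are tied together by shared measurements; the thesis's own formulation, which targets the low-error modes of the reconstruction, is not reproduced here. Nodes are split by the sign of the Fiedler vector (the eigenvector of the graph Laplacian with the second-smallest eigenvalue), computed with plain power iteration so the example needs only the standard library. The function name and edge format are illustrative assumptions.

```python
def fiedler_partition(n, edges, iterations=500):
    """Split nodes 0..n-1 into two groups by the sign of the Fiedler vector
    of the weighted graph Laplacian L = D - A.

    edges: dict mapping (i, j) pairs to non-negative coupling weights
    (a hypothetical stand-in for how strongly two parameters are linked).
    Assumes the graph is connected.
    """
    adj = [[0.0] * n for _ in range(n)]
    for (i, j), w in edges.items():
        adj[i][j] += w
        adj[j][i] += w
    degree = [sum(row) for row in adj]
    # Shift so M = c*I - L is positive semidefinite (c >= largest eigenvalue of L).
    # The smallest eigenvalues of L then become the largest of M.
    c = 2.0 * max(degree)

    def apply_m(x):
        # (c*I - L) x = c*x - D x + A x
        return [c * x[i] - degree[i] * x[i] + sum(adj[i][k] * x[k] for k in range(n))
                for i in range(n)]

    def deflate_and_normalize(x):
        # Remove the constant eigenvector (eigenvalue 0 of L), then rescale.
        mean = sum(x) / n
        x = [v - mean for v in x]
        norm = sum(v * v for v in x) ** 0.5
        return [v / norm for v in x]

    # Deterministic, non-constant starting vector.
    x = deflate_and_normalize([float(i) for i in range(n)])
    for _ in range(iterations):
        x = deflate_and_normalize(apply_m(x))
    group_a = {i for i in range(n) if x[i] < 0}
    group_b = set(range(n)) - group_a
    return group_a, group_b
```

For two tightly coupled triangles joined by one weak edge, the split falls along the weak edge, which is the intuition behind locking each tightly coupled part together and letting the weak link absorb the deformation. Larger problems would use a sparse eigensolver and recursive splitting rather than this dense loop.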

Thesis: Gabriel Brostow’s PhD (2004): “Novel Skeletal Representation for Articulated Creatures”

April 9th, 2004 Irfan Essa Posted in Activity Recognition, Gabriel Brostow, Modeling and Animation, Research, Thesis

Gabriel Brostow (2004), “Novel Skeletal Representation for Articulated Creatures” PhD Thesis, Georgia Institute of Technology, College of Computing. (Advisor: Irfan Essa) [PDF] [URI]

Abstract

This research examines an approach for capturing 3D surface and structural data of moving articulated creatures. Given the task of capturing such data non-invasively and automatically, we present a methodology and associated experiments that apply to multi-view videos of the subject’s motion. Our thesis states: A functional structure and the time-varying surface of an articulated creature subject are contained in a sequence of its 3D data. A functional structure is one example of the possible arrangements of internal mechanisms (kinematic joints, springs, etc.) that is capable of performing the motions observed in the input data. Volumetric structures are frequently used as shape descriptors for 3D data. The capture of such data is being facilitated by developments in multi-view video and range scanning, extending to subjects that are alive and moving. In this research, we examine vision-based modeling and the related representation of moving articulated creatures using Spines. We define a Spine as a branching axial structure representing the shape and topology of a 3D object’s limbs, and capturing the limbs’ correspondence and motion over time. The Spine concept builds on skeletal representations often used to describe the internal structure of an articulated object and its significant protrusions. Our representation of a Spine provides enhancements over a 3D skeleton. These enhancements form temporally consistent limb hierarchies that contain correspondence information about real motion data. We present a practical implementation that approximates a Spine’s joint probability function to reconstruct Spines for synthetic and real subjects that move. In general, our approach combines the objectives of generalized cylinders, 3D scanning, and markerless motion capture to generate baseline models from real puppets, animals, and human subjects.

Thesis: Antonio Haro’s PhD (2003): “Example based processing for image and video synthesis”

November 6th, 2003 Irfan Essa Posted in Antonio Haro, Computational Photography and Video, PhD, Research, Thesis

Antonio Haro (2003) “Example based processing for image and video synthesis” PhD Thesis, Georgia Institute of Technology, College of Computing, Atlanta, GA, [URI] [PDF] (Advisor: Irfan Essa)

Abstract

The example-based processing problem can be expressed as: “Given an example of an image or video before and after processing, apply a similar processing to a new image or video.” Our thesis is that there are some problems where a single general algorithm can be used to create a variety of outputs, solely by presenting examples of what is desired to the algorithm. This is valuable if the algorithm to produce the output is non-obvious, e.g., an algorithm to emulate an example painting’s style. We limit our investigations to example-based processing of images, video, and 3D models, as these data types are easy to acquire and experiment with.

We first represent this problem as a sampling problem influenced by texture synthesis, where the idea is to form feature vectors representative of the data and then sample them coherently to synthesize a plausible output for the new image or video. Grounding the problem in this manner is useful because both problems involve learning the structure of training data, under some assumptions, in order to sample it properly. We then reduce the problem to a labeling problem to perform example-based processing in a more generalized and principled manner than earlier techniques. This allows us to estimate what the output should be in a different way, by approximating the optimal (and possibly unknown) solution.
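The simplest version of the “before and after” idea can be sketched with nearest-neighbor lookup: describe every position by a small window of values (a feature vector), find the best-matching position in the unprocessed example, and copy the processed value from there. This is a deliberately minimal 1D sketch under that assumption, with hypothetical function names; it omits the coherent sampling and the labeling formulation described above, and it works on short lists of numbers rather than images or video.

```python
def neighborhood(signal, i, radius):
    """Feature vector: values in a window around position i, clamped at the borders."""
    n = len(signal)
    return [signal[min(max(i + d, 0), n - 1)] for d in range(-radius, radius + 1)]


def analogy_1d(a, a_prime, b, radius=1):
    """Return b_prime such that a : a_prime :: b : b_prime (roughly).

    a       -- example signal before processing
    a_prime -- the same signal after processing (same length as a)
    b       -- new signal to process
    For each position of b, copy the processed value from the position of a
    whose neighborhood is closest in squared Euclidean distance.
    """
    features_a = [neighborhood(a, j, radius) for j in range(len(a))]
    b_prime = []
    for i in range(len(b)):
        fb = neighborhood(b, i, radius)
        best = min(range(len(a)),
                   key=lambda j: sum((x - y) ** 2 for x, y in zip(features_a[j], fb)))
        b_prime.append(a_prime[best])
    return b_prime
```

For example, if a = [0, 0, 1, 1, 0, 0] and a_prime marks its rising edge, [0, 0, 1, 0, 0, 0], then analogy_1d(a, a_prime, [0, 1, 1, 0, 0, 1, 1, 0]) returns [0, 1, 0, 0, 0, 1, 0, 0]: the edge-marking “filter” carries over to the new signal without ever being written down explicitly.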

Thesis: Irfan Essa’s PhD Thesis (1994): “Analysis, interpretation and synthesis of facial expressions”

August 30th, 1994 Irfan Essa Posted in Face and Gesture, Thesis

Irfan Essa (1994), “Analysis, interpretation and synthesis of facial expressions”, PhD Thesis, MIT, Cambridge, MA, USA. (Advisor: Alex (Sandy) Pentland)


Irfan Essa’s PhD Thesis

Thesis: Irfan Essa’s MS Thesis (1990): “Contact detection, collision forces and friction for physically based virtual world modeling”

May 3rd, 1990 Irfan Essa Posted in Masters, Modeling and Animation, Thesis

Irfan Essa (1990), “Contact detection, collision forces and friction for physically based virtual world modeling” MS Thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.