Thesis: Mitch Parry PhD (2007), “Separation and Analysis of Multichannel Signals”

October 9th, 2007 Irfan Essa Posted in Audio Analysis, Mitch Parry, PhD, Thesis

Mitch Parry (2007), “Separation and Analysis of Multichannel Signals”, PhD Thesis [PDF], Georgia Institute of Technology, College of Computing, Atlanta, GA. (Advisor: Irfan Essa)

Abstract

This thesis examines a large and growing class of digital signals that capture the combined effect of multiple underlying factors. In order to better understand these signals, we would like to separate and analyze the underlying factors independently. Although source separation applies to a wide variety of signals, this thesis focuses on separating individual instruments from a musical recording. In particular, we propose novel algorithms for separating instrument recordings given only their mixture. When the number of source signals does not exceed the number of mixture signals, we focus on a subclass of source separation algorithms based on joint diagonalization. Each approach leverages a different form of source structure. We introduce repetitive structure as an alternative that leverages unique repetition patterns in music and compare its performance against the other techniques.

When the number of source signals exceeds the number of mixtures (i.e., the underdetermined problem), we focus on spectrogram factorization techniques for source separation. We extend single-channel techniques to utilize the additional spatial information in multichannel recordings, and use phase information to improve the estimation of the underlying components.
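
To make the idea of spectrogram factorization concrete, here is a minimal sketch of non-negative matrix factorization (NMF) with the standard multiplicative updates for squared error. It is a generic single-channel illustration, not the multichannel or phase-aware algorithms developed in the thesis. The toy matrices, the function names, and the choice of squared-error cost are all assumptions made for this example.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius distance between V and its approximation WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-12):
    """Factor nonnegative V (freq x time) into W (freq x rank) and H (rank x time)."""
    rng = random.Random(seed)
    n_freq, n_time = len(v), len(v[0])
    # Strictly positive random initialization.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_freq)]
    h = [[rng.random() + 0.1 for _ in range(n_time)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][t] * num[k][t] / (den[k][t] + eps) for t in range(n_time)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[f][k] * num[f][k] / (den[f][k] + eps) for k in range(rank)]
             for f in range(n_freq)]
    return w, h

# Toy "magnitude spectrogram": two spectral shapes, each with its own gain over time.
spectra = [[1.0, 0.0], [0.5, 0.2], [0.0, 1.0], [0.1, 0.6]]      # 4 bins x 2 components
gains = [[1.0, 0.0, 1.0, 0.5, 0.0], [0.0, 1.0, 1.0, 0.0, 0.5]]  # 2 components x 5 frames
V = matmul(spectra, gains)

W, H = nmf(V, rank=2)
error = frobenius_error(V, W, H)
print("squared reconstruction error:", error)
```

Each column of W acts as a spectral shape and each row of H as its activation over time, so every component contributes one rank-one spectrogram to the sum.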


Paper: IEEE ICASSP (2007) “Incorporating Phase Information for Source Separation via Spectrogram Factorization”

April 15th, 2007 Irfan Essa Posted in Audio Analysis, Mitch Parry, Papers, Research

Incorporating Phase Information for Source Separation via Spectrogram Factorization

Parry, R.M.; Essa, I.
Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA
This paper appears in: Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Publication Date: 15-20 April 2007
Volume: 2
On page(s): II-661 - II-664
Number of Pages: 4
Location: Honolulu, HI
ISSN: 1520-6149
ISBN: 1-4244-0728-1
INSPEC Accession Number: 9497202
Digital Object Identifier: 10.1109/ICASSP.2007.366322
Posted online: 2007-06-04 10:15:41.0

Abstract

Spectrogram factorization methods have been proposed for single channel source separation and audio analysis. Typically, the mixture signal is first converted into a time-frequency representation such as the short-time Fourier transform (STFT). The phase information is thrown away and this spectrogram matrix is then factored into the sum of rank-one source spectrograms. This approach incorrectly assumes the mixture spectrogram is the sum of the source spectrograms. In fact, the mixture spectrogram depends on the phase of the source STFTs. We investigate the consequences of this common assumption and introduce an approach that leverages a probabilistic representation of phase to improve the separation results.
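
The dependence on phase can be checked on a single time-frequency bin. For two source STFT values S1 and S2, the mixture power is |S1 + S2|^2 = |S1|^2 + |S2|^2 + 2 |S1| |S2| cos(phase1 - phase2), so the additive assumption holds only when the cross term vanishes. The sketch below uses made-up magnitudes, and the uniformly distributed phase in the last step is an assumption chosen for illustration, not necessarily the probabilistic model used in the paper.

```python
import cmath
import math
import random

def mixture_power(mag1, phase1, mag2, phase2):
    """Power of one time-frequency bin of the mixture STFT X = S1 + S2."""
    s1 = cmath.rect(mag1, phase1)
    s2 = cmath.rect(mag2, phase2)
    return abs(s1 + s2) ** 2

# Hypothetical source magnitudes for a single bin.
mag1, mag2 = 1.0, 0.5

# What the "sum of source spectrograms" assumption predicts.
additive_power = mag1 ** 2 + mag2 ** 2

# The true mixture power changes with the phase difference.
in_phase = mixture_power(mag1, 0.0, mag2, 0.0)              # constructive: above additive
quadrature = mixture_power(mag1, 0.0, mag2, math.pi / 2)    # cross term vanishes
out_of_phase = mixture_power(mag1, 0.0, mag2, math.pi)      # destructive: below additive

# Averaged over a uniformly random phase difference (an assumption),
# the cross term cancels and the additive value is recovered on average.
rng = random.Random(0)
trials = 20000
average_power = sum(
    mixture_power(mag1, 0.0, mag2, rng.uniform(-math.pi, math.pi))
    for _ in range(trials)
) / trials

print(additive_power, in_phase, quadrature, out_of_phase, average_power)
```

Under that assumption the additive model is correct only in expectation; any individual bin can deviate from it substantially.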


Paper: J. Parallel Distrib. Computing (2005): “Experiences with optimizing two stream-based applications for cluster execution”

September 30th, 2006 Irfan Essa Posted in Computational Photography and Video, Mitch Parry, Papers, Research

Experiences with optimizing two stream-based applications for cluster execution

Angelov, Y., Ramachandran, U., Mackenzie, K., Rehg, J. M., and Essa, I. 2005. “Experiences with optimizing two stream-based applications for cluster execution”. J. Parallel Distrib. Comput. 65, 6 (Jun. 2005), 678-691. [DOI]

Abstract

We explore optimization strategies and resulting performance of two stream-based video applications, video texture and color tracker, on a cluster of SMPs. The two applications are representative of a class of emerging applications, which we call “stream-based applications”, that are sensitive to both latency of individual results and overall throughput. Such applications require non-trivial parallelization techniques in order to improve both latency and throughput, given that the stream data emanates from a limited set of sources (exactly one in the two applications studied) and that the distribution of the data cannot be done a priori. We suggest techniques that address in a coordinated fashion the problems of data distribution and work partitioning. We believe the two problems are related and need to be addressed together. We have parallelized two applications using the Stampede cluster programming system that provides abstractions for implementing time- and throughput-sensitive applications elegantly and efficiently. For the Video Textures application we show that we can achieve a speedup of 24.26 on a 112-processor cluster. For the Color Tracker application, where latency is more crucial, we identify the extent of data parallelism that ensures that the slowest member of the pipeline is no longer the bottleneck for achieving a decent frame rate.
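
Two pieces of arithmetic behind these results can be sketched briefly. The first computes the parallel efficiency implied by the reported speedup. The second estimates how many data-parallel copies of the slowest pipeline stage are needed before another stage becomes the bottleneck. The stage names and timings are hypothetical, and the model assumes ideal splitting with no data distribution overhead, which the paper itself treats as a real cost.

```python
import math

def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors

def replicas_needed(stage_seconds, target_period):
    """Smallest number of data-parallel copies of a stage whose effective
    per-frame time (stage_seconds / copies) does not exceed target_period."""
    return math.ceil(stage_seconds / target_period)

# Figures reported in the abstract for the Video Textures application.
efficiency = parallel_efficiency(24.26, 112)

# Hypothetical per-frame stage times (seconds) for a three-stage pipeline.
stage_seconds = {"capture": 0.03, "analyze": 0.12, "detect": 0.05}

# Without replication, throughput is set by the slowest stage.
slowest = max(stage_seconds, key=stage_seconds.get)
fps_before = 1.0 / stage_seconds[slowest]

# Replicate the slowest stage until the next slowest stage dominates.
target_period = max(t for name, t in stage_seconds.items() if name != slowest)
copies = replicas_needed(stage_seconds[slowest], target_period)

effective = dict(stage_seconds)
effective[slowest] = stage_seconds[slowest] / copies
bottleneck_after = max(effective, key=effective.get)
fps_after = 1.0 / effective[bottleneck_after]

print(f"efficiency: {efficiency:.3f}")
print(f"{slowest} x{copies}: {fps_before:.1f} fps -> {fps_after:.1f} fps")
```

Adding copies beyond that count would not raise the frame rate in this model, because a different stage then limits throughput.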


Paper: IEEE ICASSP (2006) “Source Detection Using Repetitive Structure”

May 14th, 2006 Irfan Essa Posted in Audio Analysis, Mitch Parry, Papers, Research

Source Detection Using Repetitive Structure (IEEEXplore)

Parry, R.M.; Essa, I.
Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA
This paper appears in: Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
Publication Date: 14-19 May 2006
Volume: 4
On page(s): IV - IV
Number of Pages: IV - IV
Location: Toulouse
ISSN: 1520-6149
ISBN: 1-4244-0469-X
INSPEC Accession Number: 9154520
Digital Object Identifier: 10.1109/ICASSP.2006.1661163
Posted online: 2006-09-18 09:38:57.0

Abstract

Blind source separation algorithms typically require that the number of sources is known in advance. However, it is often the case that the number of sources changes over time and that the total number is not known. Existing source separation techniques require source number estimation methods to determine how many sources are active within the mixture signals. These methods typically operate on the covariance matrix of mixture recordings and require fewer active sources than mixtures. When sources do not overlap in the time-frequency domain, more sources than mixtures may be detected and then separated. However, separating more sources than mixtures when sources overlap in time and frequency poses a particularly difficult problem. This paper addresses the issue of source detection when more sources than sensors overlap in time and frequency. We show that repetitive structure in the form of time-time correlation matrices can reveal when each source is active.
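
As a rough illustration of what a time-time correlation matrix captures, the sketch below correlates every pair of frames in a short sequence, so that repeated material shows up as large off-diagonal entries. This is a generic self-similarity computation under assumed toy data. The abstract does not spell out the paper's detection procedure, and the frame vectors and function name here are hypothetical.

```python
import math

def time_time_correlation(frames):
    """Return the matrix of normalized correlations between all pairs of frames.

    frames is a list of equal-length feature vectors, one per time frame.
    Entry [t1][t2] is the correlation coefficient between frames t1 and t2.
    """
    def centered(frame):
        mean = sum(frame) / len(frame)
        return [x - mean for x in frame]

    def correlation(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        return num / den if den > 0 else 0.0

    c = [centered(frame) for frame in frames]
    return [[correlation(a, b) for b in c] for a in c]

# Toy sequence: pattern p recurs at frames 0, 2, 4 and pattern q at frames 1, 3.
p = [1.0, 0.0, 0.5, 0.0]
q = [0.0, 1.0, 0.0, 0.5]
frames = [p, q, p, q, p]

C = time_time_correlation(frames)
for row in C:
    print(" ".join(f"{value:+.2f}" for value in row))
```

Frames that contain the same repeated pattern correlate strongly with each other, which is the kind of structure a detector could use to tell when a given source is active.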
