Gordon Wichern | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Gordon Wichern is active.

Explore More

Publication

Featured researches published by Gordon Wichern.

IEEE Transactions on Audio, Speech, and Language Processing | 2010

Segmentation, Indexing, and Retrieval for Environmental and Natural Sounds

Gordon Wichern; Jiachen Xue; Harvey D. Thornburg; Brandon Mechtley; Andreas Spanias

We propose a method for characterizing sound activity in fixed spaces through segmentation, indexing, and retrieval of continuous audio recordings. Regarding segmentation, we present a dynamic Bayesian network (DBN) that jointly infers onsets and end times of the most prominent sound events in the space, along with an extension of the algorithm for covering large spaces with distributed microphone arrays. Each segmented sound event is indexed with a hidden Markov model (HMM) that models the distribution of example-based queries that a user would employ to retrieve the event (or similar events). In order to increase the efficiency of the retrieval search, we recursively apply a modified spectral clustering algorithm to group similar sound events based on the distance between their corresponding HMMs. We then conduct a formal user study to obtain the relevancy decisions necessary for evaluation of our retrieval algorithm on both automatically and manually segmented sound clips. Furthermore, our segmentation and retrieval algorithms are shown to be effective in both quiet indoor and noisy outdoor recording conditions.

international conference on acoustics, speech, and signal processing | 2008

Fast query by example of environmental sounds via robust and efficient cluster-based indexing

Jiachen Xue; Gordon Wichern; Harvey D. Thornburg; Andreas Spanias

There has been much recent progress in the technical infrastructure necessary to continuously characterize and archive all sounds, or more precisely auditory streams, that occur within a given space or human life. Efficient and intuitive access, however, remains a considerable challenge. In specifically musical domains, i.e., melody retrieval, query-by-example (QBE) has found considerable success in accessing music that matches a specific query. We propose an extension of the QBE paradigm to the broad class of natural and environmental sounds, which occur frequently in continuous recordings. We explore several cluster-based indexing approaches, namely non-negative matrix factorization (NMF) and spectral clustering to efficiently organize and quickly retrieve archived audio using the QBE paradigm. Experiments on a test database compare the performance of the different clustering algorithms in terms of recall, precision, and computational complexity. Initial results indicate significant improvements over both exhaustive search schemes and traditional K- means clustering, and excellent overall performance in the example-based retrieval of environmental sounds.

content based multimedia indexing | 2007

Robust Multi-Features Segmentation and Indexing for Natural Sound Environments

Gordon Wichern; Harvey D. Thornburg; Brandon Mechtley; Alex Fink; Kai Tu; Andreas Spanias

Creating an audio database from continuous long-term recordings, allows for sounds to not only be linked by the time and place in which they were recorded, but also to sounds with similar acoustic characteristics. Of paramount importance in this application is the accurate segmentation of sound events, enabling realistic navigation of these recordings. We first propose a novel feature set of specific relevance to environmental sounds, and then develop a Bayesian framework for sound segmentation, which fuses dynamics across multiple features. This probabilistic model possesses the ability to account for non-instantaneous sound onsets and absent or delayed responses among individual features, providing flexibility in defining exactly what constitutes a sound event. Example recordings demonstrate the diversity of our feature set, and the utility of our probabilistic segmentation model in extracting sound events from both indoor and outdoor environments.

workshop on applications of signal processing to audio and acoustics | 2009

Unifying semantic and content-based approaches for retrieval of environmental sounds

Gordon Wichern; Harvey D. Thornburg; Andreas Spanias

Creating a database of user-contributed recordings allows sounds to be linked not only by the semantic tags and labels applied to them, but also to other sounds with similar acoustic characteristics. Of paramount importance in navigating these databases are the problems of retrieving similar sounds using text or sound-based queries, and automatically annotating unlabeled sounds. We propose an integrated system, which can be used for text-based retrieval of unlabeled audio, content-based query-by-example, and automatic annotation. To this end, we introduce an ontological framework where sounds are connected to each other based on a measure of perceptual similarity, while words and sounds are connected by optimizing link weights given user preference data. Results on a freely available database of environmental sounds contributed and labeled by non-expert users, demonstrate effective average precision scores for both the text-based retrieval and annotation tasks.

workshop on applications of signal processing to audio and acoustics | 2007

Distortion-Aware Query-by-Example for Environmental Sounds

Gordon Wichern; Jiachen Xue; Harvey D. Thornburg; Andreas Spanias

There has been much recent progress in the technical infrastructure necessary to continuously characterize and archive all sounds that occur within a given space or human life. Efficient and intuitive access, however, remains a considerable challenge. In other domains, i.e., melody retrieval, query-by-example (QBE) has found considerable success in accessing music that matches a specific query. We propose an extension of the QBE paradigm to the broad class of natural and environmental sounds. These sounds occur frequently in continuous recordings, and are often difficult for humans to imitate. We utilize a probabilistic QBE scheme that is flexible in the presence of time, level, and scale distortions along with a clustering approach to efficiently organize and retrieve the archived audio. Experiments on a test database demonstrate accurate retrieval of archived sounds, whose relevance to example queries is determined by human users.

international conference on acoustics, speech, and signal processing | 2010

Combining semantic, social, and acoustic similarity for retrieval of environmental sounds

Brandon Mechtley; Gordon Wichern; Harvey D. Thornburg; Andreas Spanias

Recent work in audio information retrieval has demonstrated the effectiveness of combining semantic information, such as descriptive, tags with acoustic content. However, these methods largely ignore the possibility of tag queries that do not yet exist in the database and the possibility of similar terms. In this work, we propose a network structure integrating similarity between semantic tags, content-based similarity between environmental audio recordings, and the collective sound descriptions provided by a user community. We then demonstrate the effectiveness of our approach by comparing the use of existing similarity measures for incorporating new vocabulary into an audio annotation and retrieval system.

Digital Signal Processing | 2013

Noise adaptive optimization of matrix initialization for frequency-domain independent component analysis

Makoto Yamada; Gordon Wichern; Kazunobu Kondo; Masashi Sugiyama; Hiroshi Sawada

Initializing an unmixing matrix is an important problem in source separation since an objective function to be optimized is typically non-convex. In this paper, we consider the problem of two-source signal separation from a two-microphone array located on a mobile device, where a point source such as a speech signal is placed in front of the array, while no information is available about another interference signal. We propose a simple and computationally efficient method for estimating the geometry and source type (a point or diffuse) of the interference signal, which allows us to adaptively choose a suitable unmixing matrix initialization scheme. Our proposed method, noise adaptive optimization of matrix initialization (NAOMI), is shown to be effective through source separation simulations.

frontiers in education conference | 2010

Audio content-based feature extraction algorithms using J-DSP for arts, media and engineering courses

Mohit Shah; Gordon Wichern; Andreas Spanias; Harvey D. Thornburg

J-DSP is a java-based object-oriented online programming environment developed at Arizona State University for education and research. This paper presents a collection of interactive Java modules for the purpose of introducing undergraduate and graduate students to feature extraction in music and audio signals. These tools enable online simulations of different algorithms that are being used in applications related to content-based audio classification and Music Information Retrieval (MIR). The simulation software is accompanied by a series of computer experiments and exercises that can be used to provide hands-on training. Specific functions that have been developed include modules used widely such as Pitch Detection, Tonality, Harmonicity, Spectral Centroid and the Mel-Frequency Cepstral Coefficients (MFCC). This effort is part of a combined research and curriculum program funded by NSF CCLI that aims towards exposing students to advanced multidisciplinary concepts and research in signal processing.

international conference on acoustics, speech, and signal processing | 2010

Automatic audio tagging using covariate shift adaptation

Gordon Wichern; Makoto Yamada; Harvey D. Thornburg; Masashi Sugiyama; Andreas Spanias

Automatically annotating or tagging unlabeled audio files has several applications, such as database organization and recommender systems. We are interested in the case where the system is trained using clean high-quality audio files, butmost of the files that need to be automatically tagged during the test phase are heavily compressed and noisy, for instance if they were captured on a mobile device. In this situation we assume the audio files follow a covariate shift model in the acoustic feature space, i.e., the feature distributions are different in the training and test phases, but the conditional distribution of labels given features remains unchanged. Our method uses a specially designed audio similarity measure as input to a set of weighted logistic regressors, which attempt to alleviate the influence of covariate shift. Results on a freely available database of sound files contributed and labeled by non-expert users, demonstrate effective automatic tagging performance.

Eurasip Journal on Audio, Speech, and Music Processing | 2010

An ontological framework for retrieving environmental sounds using semantics and acoustic content

Gordon Wichern; Brandon Mechtley; Alex Fink; Harvey D. Thornburg; Andreas Spanias

Organizing a database of user-contributed environmental sound recordings allows sound files to be linked not only by the semantic tags and labels applied to them, but also to other sounds with similar acoustic characteristics. Of paramount importance in navigating these databases are the problems of retrieving similar sounds using text- or sound-based queries, and automatically annotating unlabeled sounds. We propose an integrated system, which can be used for text-based retrieval of unlabeled audio, content-based query-by-example, and automatic annotation of unlabeled sound files. To this end, we introduce an ontological framework where sounds are connected to each other based on the similarity between acoustic features specifically adapted to environmental sounds, while semantic tags and sounds are connected through link weights that are optimized based on user-provided tags. Furthermore, tags are linked to each other through a measure of semantic similarity, which allows for efficient incorporation of out-of-vocabulary tags, that is, tags that do not yet exist in the database. Results on two freely available databases of environmental sounds contributed and labeled by nonexpert users demonstrate effective recall, precision, and average precision scores for both the text-based retrieval and annotation tasks.

Explore More