
Publication


Featured research published by Elaine Chew.


ACM SIGMM Workshop on Experiential Telepresence | 2003

From remote media immersion to Distributed Immersive Performance

Alexander A. Sawchuk; Elaine Chew; Roger Zimmermann; Christos Papadopoulos; Chris Kyriakakis

We present the architecture, technology and experimental applications of a real-time, multi-site, interactive and collaborative environment called Distributed Immersive Performance (DIP). The objective of DIP is to develop the technology for live, interactive musical performances in which the participants - subsets of musicians, the conductor and the audience - are in different physical locations and are interconnected by very high fidelity multichannel audio and video links. DIP is a specific realization of broader immersive technology - the creation of the complete aural and visual ambience that places a person or a group of people in a virtual space where they can experience events occurring at a remote site or communicate naturally regardless of their location. The DIP experimental system has interaction sites and servers in different locations on the USC campus and at several partner sites, including the New World Symphony of Miami Beach, FL. The sites have different types of equipment to test the effects of video and audio fidelity on the ease of use and functionality for different applications. Many sites have high-definition (HD) video or digital video (DV) quality images projected onto wide screen wall displays completely integrated with an immersive audio reproduction system for a seamless, fully three-dimensional aural environment with the correct spatial sound localization for participants. The system is capable of storage and playback of the many streams of synchronized audio and video data (immersidata), and utilizes novel protocols for the low-latency, seamless, synchronized real-time delivery of immersidata over local area networks and wide-area networks such as Internet2. We discuss several recent interactive experiments using the system and many technical challenges common to the DIP scenario and a broader range of applications. These challenges include: (1) low-latency continuous media (CM) stream transmission, synchronization, and data loss management; (2) low-latency, real-time video and multichannel immersive audio acquisition and rendering; (3) real-time continuous media stream recording, storage, and playback; (4) human-factors studies covering psychophysical, perceptual, artistic, and performance evaluation; and (5) robust integration of all these technical areas into a seamless presentation to the participants.
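The first challenge above, low-latency stream transport with loss management, can be illustrated with a minimal sketch. The packet layout, port number, and use of UDP below are illustrative assumptions, not the DIP project's actual protocol:

```python
# Minimal sketch of low-latency stream transport with loss detection.
# This is NOT the DIP protocol itself; packet layout, port, and payload
# handling are illustrative assumptions.
import socket
import struct
import time

HEADER = struct.Struct("!Id")   # (sequence number, capture timestamp)

def send_stream(frames, host="127.0.0.1", port=9999):
    """Send raw media frames over UDP, each tagged with a sequence
    number and capture time so the receiver can detect loss and jitter."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for seq, frame in enumerate(frames):
        packet = HEADER.pack(seq, time.time()) + frame
        sock.sendto(packet, (host, port))

def receive_stream(port=9999):
    """Receive packets, reporting gaps (lost packets) and one-way delay."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    expected = 0
    while True:
        data, _ = sock.recvfrom(65536)
        seq, sent_at = HEADER.unpack(data[:HEADER.size])
        if seq > expected:
            print(f"lost packets {expected}..{seq - 1}")
        print(f"seq {seq}: delay ~{(time.time() - sent_at) * 1000:.1f} ms")
        expected = seq + 1
```

Sequence numbers let the receiver report gaps immediately rather than wait for retransmission, the usual trade-off for live performance traffic where late data is useless.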


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Challenging Uncertainty in Query by Humming Systems: A Fingerprinting Approach

Erdem Unal; Elaine Chew; Panayiotis G. Georgiou; Shrikanth Narayanan

Robust data retrieval in the presence of uncertainty is a challenging problem in multimedia information retrieval. In query-by-humming (QBH) systems, uncertainty can arise in query formulation due to user-dependent variability, such as incorrectly hummed notes, and in query transcription due to machine-based errors, such as insertions and deletions. We propose a fingerprinting (FP) algorithm for representing salient melodic information so as to better compare potentially noisy voice queries with target melodies in a database. The FP technique is employed in the QBH system back end; a hidden Markov model (HMM) front end segments and transcribes the hummed audio input into a symbolic representation. The performance of the FP search algorithm is compared to the conventional edit distance (ED) technique. Our retrieval database is built on 1500 MIDI files and evaluated using 400 hummed samples from 80 people with different musical backgrounds. A melody retrieval accuracy of 88% is demonstrated for humming samples from musically trained subjects, and 70% for samples from untrained subjects, for the FP algorithm. In contrast, the widely used ED method achieves 86% and 62% accuracy rates, respectively, for the same samples, thus suggesting that the proposed FP technique is more robust under uncertainty, particularly for queries by musically untrained users.
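The paper's fingerprinting scheme is not reproduced here, but the conventional edit-distance (ED) baseline it is compared against is standard and easy to sketch. Applying it to pitch intervals rather than absolute pitches (an assumption for illustration) makes queries transposition-invariant:

```python
# Sketch of the edit-distance baseline on pitch-interval sequences,
# so transposed hummed queries still match their targets.
def edit_distance(a, b):
    """Classic Levenshtein distance between two symbol sequences,
    tolerating the insertions/deletions typical of hummed queries."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def intervals(midi_pitches):
    """Key-invariant representation: successive pitch differences."""
    return [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]

# A query transposed up a tone with the last note hummed sharp
# still scores close to the target (distance 1).
target = [60, 62, 64, 65, 67]   # C D E F G
query = [62, 64, 66, 67, 70]    # up a tone, last note wrong
print(edit_distance(intervals(target), intervals(query)))  # -> 1
```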


International Conference on Multimedia and Expo | 2005

Polyphonic Audio Key Finding Using the Spiral Array CEG Algorithm

Ching-Hua Chuan; Elaine Chew

Key finding is an integral step in content-based music indexing and retrieval. In this paper, we present an O(n) real-time algorithm for determining key from polyphonic audio. We use the standard Fast Fourier Transform with a local maximum detection scheme to extract pitches and pitch strengths from polyphonic audio. Next, we use Chew's Spiral Array Center of Effect Generator (CEG) algorithm to determine the key from pitch strength information. We test the proposed system using Mozart's symphonies. The test data is audio generated from MIDI source. The algorithm achieves a maximum correct key recognition rate of 96% within the first fifteen seconds, and exceeds 90% within the first three seconds. Starting from the extracted pitch strength information, we compare the CEG algorithm's performance to the classic Krumhansl-Schmuckler (K-S) probe tone profile method and Temperley's modified version of the K-S method. Correct key recognition rates for the K-S and modified K-S methods remain under 50% in the first three seconds, with maximum values of 80% and 87%, respectively, within the first fifteen seconds for the same test set. The CEG method consistently scores higher throughout the fifteen-second selections.
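A rough sketch of the center-of-effect idea follows. The helix parameterization uses the commonly cited form of the Spiral Array (radius 1, rise per fifth sqrt(2/15)), but the key reference points and weights below are simplifications assumed for illustration; the published model derives key representations hierarchically from chord representations:

```python
# Sketch of Spiral Array / CEG key finding, NOT the exact published
# parameterization. Pitches sit on a helix indexed by position along the
# line of fifths; the weighted centroid of sounding pitches (the "center
# of effect") is compared against a reference point per key. Each key's
# point is approximated here by its tonic triad centroid with assumed
# weights favoring the tonic.
import math

H = math.sqrt(2.0 / 15.0)   # rise per fifth (r = 1)

def spiral(k):
    """Position of fifths-index k on the spiral (C = 0, G = 1, F = -1...)."""
    return (math.sin(k * math.pi / 2), math.cos(k * math.pi / 2), k * H)

def centroid(points_weights):
    total = sum(w for _, w in points_weights)
    return tuple(sum(p[i] * w for p, w in points_weights) / total
                 for i in range(3))

NAMES = ["F", "C", "G", "D", "A", "E", "B"]  # one-octave slice of fifths

def key_point(k):
    """Assumed major-key reference: tonic (k), fifth (k+1), third (k+4)."""
    return centroid([(spiral(k), 0.6), (spiral(k + 1), 0.2),
                     (spiral(k + 4), 0.2)])

def find_key(pitch_strengths):
    """pitch_strengths: {fifths_index: strength} from the FFT front end."""
    ce = centroid([(spiral(k), w) for k, w in pitch_strengths.items()])
    return min(range(-1, 6), key=lambda k: math.dist(ce, key_point(k)))

# Strong C, G, E with a weak D lands nearest the C major point.
k = find_key({0: 1.0, 1: 0.9, 4: 0.8, 2: 0.2})
print(NAMES[k + 1])  # -> "C"
```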


Computer Music Journal | 2005

Real-Time Pitch Spelling Using the Spiral Array

Elaine Chew; Yun-Ching Chen

This article describes and presents a real-time bootstrapping algorithm for pitch spelling based on the Spiral Array Model (Chew 2000). Pitch spelling is the process of assigning pitch names, consistent with the key context, to numeric representations of pitch such as MIDI or pitch-class numbers. The Spiral Array Model is a spatial model for representing pitch relations in the tonal system, and has been shown to be an effective tool for tracking evolving key contexts (Chew 2001, 2002). Our pitch-spelling method consists of a two-part process: determining context-defining windows, and assigning pitch names using the Spiral Array. The method assigns the appropriate pitch names without having to first ascertain the key. The Spiral Array Model clusters together closely related pitches and summarizes note content as spatial points in the interior of the structure. These interior points, called centers of effect (CEs), approximate and track the key context for the purpose of pitch spelling. The appropriate letter name is assigned to each pitch through a nearest-neighbor search in the Spiral Array space. The algorithms use windows of varying sizes to determine local and long-term tonal contexts using the Spiral Array Model.
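The windowing idea can be sketched as two sliding windows, each summarized by its own center of effect; the window lengths below are assumptions for illustration, not the published settings:

```python
# Sketch of the windowed-context idea: a short window tracks the local
# key context and a long window the broader one, each summarized by a
# center of effect (CE) in the spiral space. Window lengths are assumed.
import math
from collections import deque

H = math.sqrt(2.0 / 15.0)

def spiral(k):
    return (math.sin(k * math.pi / 2), math.cos(k * math.pi / 2), k * H)

class ContextTracker:
    def __init__(self, local_len=8, global_len=64):
        self.local = deque(maxlen=local_len)     # recent fifths indices
        self.global_ = deque(maxlen=global_len)  # longer-term context

    def add(self, fifths_index):
        self.local.append(fifths_index)
        self.global_.append(fifths_index)

    @staticmethod
    def _ce(window):
        # Assumes at least one note has been added.
        pts = [spiral(k) for k in window]
        return tuple(sum(p[i] for p in pts) / len(pts) for i in range(3))

    def centers(self):
        """Return (local CE, long-term CE) used as spelling context."""
        return self._ce(self.local), self._ce(self.global_)
```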


Computer Music Modeling and Retrieval | 2004

Separating voices in polyphonic music: a contig mapping approach

Elaine Chew; Xiaodan Wu

Voice separation is a critical component of music information retrieval, music analysis, and automated transcription systems. We present a contig mapping approach to voice separation based on perceptual principles. The algorithm runs in O(n²) time, uses only pitch height and event boundaries, and requires no user-defined parameters. The method segments a piece into contigs according to voice count, then reconnects fragments in adjacent contigs using a shortest-distance strategy. The order of connection is by distance from maximal voice contigs, where the voice ordering is known. This contig-mapping algorithm has been implemented in VoSA, a Java-based voice separation analyzer. The algorithm performed well when applied to J. S. Bach's Two- and Three-Part Inventions and the forty-eight fugues from the Well-Tempered Clavier. We report an overall average fragment consistency of 99.75%, a correct fragment connection rate of 94.50%, and an average voice consistency of 88.98%, metrics which we propose for measuring voice separation performance.
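The reconnection step can be sketched for the equal-voice-count case: if fragments on either side of a contig boundary are sorted by pitch, connecting them in order minimizes the total pitch distance across the boundary. This is an assumed simplification of the paper's procedure, which also orders connections outward from maximal voice contigs:

```python
# Sketch of the fragment-connection step at one contig boundary.
def connect(left_fragments, right_fragments):
    """Each fragment is a list of MIDI pitches; returns merged voices.
    Assumes both contigs contain the same number of voices. Matching
    sorted fragments in order minimizes total pitch-height distance
    between the connected ends."""
    left = sorted(left_fragments, key=lambda f: f[-1])   # by last pitch
    right = sorted(right_fragments, key=lambda f: f[0])  # by first pitch
    return [l + r for l, r in zip(left, right)]

# Two voices crossing a boundary: the upper line stays upper.
print(connect([[60, 62], [72, 71]], [[69, 67], [64, 65]]))
# -> [[60, 62, 64, 65], [72, 71, 69, 67]]
```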


ACM Transactions on Multimedia Computing, Communications, and Applications | 2008

Distributed musical performances: Architecture and stream management

Roger Zimmermann; Elaine Chew; Sakire Arslan Ay; Moses Pawar

An increasing number of novel applications produce a rich set of different data types that need to be managed efficiently and coherently. In this article we present our experience with designing and implementing a data management infrastructure for a distributed immersive performance (DIP) application. The DIP project investigates a versatile framework for the capture, recording, and replay of video, audio, and MIDI (Musical Instrument Digital Interface) streams in an interactive environment for collaborative music performance. We are focusing on two classes of data streams that are generated within this environment. The first category consists of high-resolution isochronous media streams, namely audio and video. The second class comprises MIDI data produced by electronic instruments. MIDI event sequences are alphanumeric in nature and fall into the category of the data streams that have been of interest to data management researchers in recent years. We present our data management architecture, which provides a repository for all DIP data. Streams of both categories need to be acquired, transmitted, stored, and replayed in real time. Data items are correlated across different streams with temporal indices. The audio and video streams are managed in our own High-performance Data Recording Architecture (HYDRA), which integrates multistream recording and retrieval in a consistent manner. This paper reports on the practical issues and challenges that we encountered during the design, implementation and experimental phases of our prototype. We also present some analysis results and discuss future extensions for the architecture.
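The cross-stream temporal indexing can be sketched with a single shared clock tagging every recorded item, whatever its stream type; the in-memory heap below merely stands in for the HYDRA repository:

```python
# Sketch of cross-stream temporal indexing: every recorded item (video
# frame, audio block, MIDI event) is tagged with one common session clock
# so replay can interleave all streams in time order. The storage layer
# is not reproduced; a heap stands in for it.
import heapq
import time

class Recorder:
    def __init__(self):
        self.t0 = time.monotonic()   # shared session clock
        self.items = []              # stand-in for the stream repository

    def record(self, stream_id, payload):
        elapsed = time.monotonic() - self.t0
        heapq.heappush(self.items, (elapsed, stream_id, payload))

    def replay(self):
        """Yield items from all streams in global time order."""
        while self.items:
            yield heapq.heappop(self.items)

rec = Recorder()
rec.record("midi", ("note_on", 60))
rec.record("audio", b"\x00\x01")
for t, sid, item in rec.replay():
    print(f"{t:.6f}s [{sid}] {item!r}")
```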


Archive | 2003

Mapping Midi to the Spiral Array: Disambiguating Pitch Spellings

Elaine Chew; Yun-Ching Chen

The problem of assigning appropriate pitch spellings is one of the most fundamental problems in the analysis of digital music information. We present an algorithm for finding the optimal spelling based on the Spiral Array model, a geometric model embodying the relations of tonality. The algorithm does not require the key context to be determined. Instead, it uses a center of effect (c.e.), an interior point in the Spiral Array model, as a proxy for the key context. Plausible pitch spellings are measured against this c.e., and the optimal pitch is selected using the nearest-neighbor criterion. Two examples are given from Beethoven's Sonata Op. 109 to illustrate the algorithm. The algorithm is implemented, and the results are used in MuSA, a music visualization system based on the Spiral Array. We present and analyze computational results from test runs on MIDI files of two movements from Beethoven's Piano Sonatas Op. 79 and Op. 109.
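The nearest-neighbor selection can be sketched directly: each MIDI pitch class admits a small set of plausible spellings at different positions along the line of fifths, and the spelling whose spiral position lies closest to the current c.e. wins. The candidate table and the stand-in c.e. below are assumptions for illustration:

```python
# Sketch of nearest-neighbor pitch spelling against a center of effect.
# MIDI pitch class 1 may be spelled C-sharp (fifths index 7) or D-flat
# (fifths index -5); the spelling nearest the current c.e. is chosen.
import math

H = math.sqrt(2.0 / 15.0)

def spiral(k):
    return (math.sin(k * math.pi / 2), math.cos(k * math.pi / 2), k * H)

# Candidate spellings per MIDI pitch class, as (name, fifths index).
CANDIDATES = {
    1: [("C#", 7), ("Db", -5)],
    6: [("F#", 6), ("Gb", -6)],
    # ... remaining pitch classes omitted for brevity
}

def spell(pitch_class, ce):
    """Pick the spelling nearest the current center of effect."""
    return min(CANDIDATES[pitch_class],
               key=lambda nk: math.dist(ce, spiral(nk[1])))[0]

# In a sharp-side context, pitch class 6 spells as F-sharp.
ce_sharp_side = spiral(5)        # crude stand-in for a B-major context
print(spell(6, ce_sharp_side))   # -> "F#"
```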


New Interfaces for Musical Expression | 2007

Visual feedback in performer-machine interaction for musical improvisation

Alexandre R. J. François; Elaine Chew; Dennis Thurmond

This paper describes the design of Mimi, a multi-modal interactive musical improvisation system that explores the potential and powerful impact of visual feedback in performer-machine interaction. Mimi is a performer-centric tool designed for use in performance and teaching. Its key and novel component is its visual interface, designed to provide the performer with instantaneous and continuous information on the state of the system. For human improvisation, in which context and planning are paramount, the relevant state of the system extends to the near future and recent past. Mimi's visual interface allows for a peculiar blend of the raw reflex typically associated with improvisation and the preparation and timing more closely affiliated with score-based reading. Mimi is not only an effective improvisation partner; it has also proven itself to be an invaluable platform through which to interrogate the mental models necessary for successful improvisation.


INFORMS Journal on Computing | 2006

Slicing It All Ways: Mathematical Models for Tonal Induction, Approximation, and Segmentation Using the Spiral Array

Elaine Chew

This paper presents the spiral array model and its associated algorithms for tonal induction, approximation, and segmentation. The spiral array is a geometric model for tonality that clusters perceptually similar tonal entities. The model summarizes music information as interior points inside an array of spirals. Distances in the spiral array space are used to quantify tonal similarity. The paper traces the evolution, and presents general forms, of the existing algorithms for key finding, pitch spelling, and segmentation, and proposes a new O(n) algorithm, Argus, for tonal segmentation. The proposed algorithm computes a value that quantifies the discrepancy between the local contexts in the future and past at each point in time. Discrepancy values exceeding control thresholds are shown to mark segmentation boundaries of the test set that concur with expert analyses. A number of window sizes and threshold settings are investigated. The algorithm is demonstrated using Edward MacDowell's To a Wild Rose and tested on Franz Schubert's Allegretto from Moment Musical D. 780 No. 6 and Thema from Impromptu D. 935 No. 4. The algorithm accurately locates tonal boundaries in all three case studies.
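A minimal sketch of the discrepancy computation follows; the window size, threshold, and toy note stream are assumptions for illustration, not the settings investigated in the paper:

```python
# Sketch of the Argus idea: at each note index, compare the center of
# effect (CE) of a window looking back with that of a window looking
# forward; a large distance between the two signals a tonal boundary.
import math

H = math.sqrt(2.0 / 15.0)

def spiral(k):
    return (math.sin(k * math.pi / 2), math.cos(k * math.pi / 2), k * H)

def ce(window):
    pts = [spiral(k) for k in window]
    return tuple(sum(p[i] for p in pts) / len(pts) for i in range(3))

def boundaries(fifths_indices, w=8, threshold=1.0):
    """Return note indices where past and future contexts diverge."""
    marks = []
    for t in range(w, len(fifths_indices) - w):
        past = ce(fifths_indices[t - w:t])
        future = ce(fifths_indices[t:t + w])
        if math.dist(past, future) > threshold:
            marks.append(t)
    return marks

# Sixteen notes around C major, then sixteen around E major:
stream = [0, 1, 4, 0, 1, 4, 0, 1] * 2 + [4, 5, 8, 4, 5, 8, 4, 5] * 2
print(boundaries(stream))  # -> indices clustered around 16, the change point
```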


Multimedia Signal Processing | 2007

Statistical Modeling and Retrieval of Polyphonic Music

Erdem Unal; Panayiotis G. Georgiou; Shrikanth Narayanan; Elaine Chew

In this article, we propose a solution to the problem of query by example for polyphonic music audio. We first present a generic mid-level representation for audio queries. Unlike previous efforts in the literature, the proposed representation does not depend on the different spectral characteristics of different musical instruments or on the accurate location of note onsets and offsets. This is achieved by first mapping the short-term frequency spectrum of consecutive audio frames to the musical space (the spiral array) and defining a tonal identity with respect to the center of effect generated by the spectral weights of the musical notes. We then use the resulting one-dimensional text representations of the audio to create n-gram statistical sequence models to track the tonal characteristics and the behavior of the pieces. After performing appropriate smoothing, we build a collection of melodic n-gram models for testing. Using perplexity-based scoring, we test the likelihood of a sequence of lexical chords (an audio query) given each model in the database collection. Initial results show that variations of the input piece appear in the top 5 results 81% of the time for whole-melody inputs, within a database of 500 polyphonic melodies. We also tested the retrieval engine on short audio clips: using 25-second segments, variations of the input piece are among the top 5 results 75% of the time.
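The scoring stage can be sketched with an ordinary smoothed bigram model. The chord symbols below are invented stand-ins for the spiral-array-derived lexical chords, and add-one smoothing is an assumption (the paper says only "appropriate smoothing"):

```python
# Sketch of n-gram scoring over lexical-chord sequences: train a
# smoothed bigram model per database piece, then rank pieces by the
# perplexity each model assigns to the query (lower is better).
import math
from collections import Counter

def train_bigram(sequence, vocab):
    """Add-one-smoothed bigram probabilities P(b | a)."""
    pairs = Counter(zip(sequence, sequence[1:]))
    unigrams = Counter(sequence[:-1])
    V = len(vocab)
    return lambda a, b: (pairs[(a, b)] + 1) / (unigrams[a] + V)

def perplexity(model, query):
    logp = sum(math.log(model(a, b)) for a, b in zip(query, query[1:]))
    return math.exp(-logp / (len(query) - 1))

vocab = ["C", "F", "G", "Am"]
database = {
    "piece1": train_bigram(["C", "F", "G", "C", "Am", "F", "G", "C"], vocab),
    "piece2": train_bigram(["Am", "G", "F", "G", "Am", "Am", "G", "F"], vocab),
}
query = ["C", "F", "G", "C"]
# Rank pieces by ascending perplexity: the best match comes first.
print(sorted(database, key=lambda name: perplexity(database[name], query)))
# -> ['piece1', 'piece2']
```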

Collaboration


Dive into Elaine Chew's collaborations.

Top Co-Authors

Alexandre R. J. François, University of Southern California

Ching-Hua Chuan, University of North Florida

Erdem Unal, University of Southern California

Shrikanth Narayanan, University of Southern California

Luwei Yang, Queen Mary University of London

Jordan B. L. Smith, National Institute of Advanced Industrial Science and Technology

Isaac Schankler, University of Southern California

Panayiotis G. Georgiou, University of Southern California

Katerina Kosta, Queen Mary University of London

Oscar F. Bandtlow, Queen Mary University of London