Michael A. Casey | Researchain

Publication

Featured research published by Michael A. Casey.

Proceedings of the IEEE | 2008

Content-Based Music Information Retrieval: Current Directions and Future Challenges

Michael A. Casey; Remco C. Veltkamp; Masataka Goto; Marc Leman; Christophe Rhodes; Malcolm Slaney

The steep rise in music downloading over CD sales has created a major shift in the music industry away from physical media formats and towards online products and services. Music is one of the most popular types of online information and there are now hundreds of music streaming and download services operating on the World-Wide Web. Some of the music collections available are approaching the scale of ten million tracks and this has posed a major challenge for searching, retrieving, and organizing music content. Research efforts in music information retrieval have involved experts from music perception, cognition, musicology, engineering, and computer science engaged in truly interdisciplinary activity that has resulted in many proposed algorithmic and methodological solutions to music search using content-based methods. This paper outlines the problems of content-based music information retrieval and explores the state-of-the-art methods using audio cues (e.g., query by humming, audio fingerprinting, content-based music retrieval) and other cues (e.g., music notation and symbolic representation), and identifies some of the major challenges for the coming years.

IEEE Signal Processing Magazine | 2008

Locality-Sensitive Hashing for Finding Nearest Neighbors [Lecture Notes]

Malcolm Slaney; Michael A. Casey

This lecture note describes a technique known as locality-sensitive hashing (LSH) that allows one to quickly find similar entries in large databases. This approach belongs to a novel and interesting class of algorithms that are known as randomized algorithms. A randomized algorithm does not guarantee an exact answer but instead provides a high probability guarantee that it will return the correct answer or one close to it. By investing additional computational effort, the probability can be pushed as high as desired.
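
The trade-off described above can be illustrated with a small sketch. It assumes the random-hyperplane (sign of projection) hash family for cosine similarity, which is one common LSH scheme and not necessarily the one used in the lecture note. The names `make_hash` and `LSHIndex` and the parameters `n_bits` and `n_tables` are hypothetical. Adding more tables is the "additional computational effort" that raises the chance of a true neighbor sharing a bucket with the query.

```python
import random


def make_hash(dim, n_bits, rng):
    """Build one hash function from n_bits random hyperplanes (assumed scheme)."""
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def h(v):
        # Each bit records which side of a hyperplane the vector falls on.
        return tuple(
            int(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0.0) for p in planes
        )

    return h


class LSHIndex:
    """Hypothetical multi-table index: similar vectors tend to share buckets."""

    def __init__(self, dim, n_bits=8, n_tables=10, seed=0):
        rng = random.Random(seed)
        self.hashes = [make_hash(dim, n_bits, rng) for _ in range(n_tables)]
        self.tables = [dict() for _ in range(n_tables)]
        self.points = []

    def add(self, v):
        """Store a vector and return its integer id."""
        idx = len(self.points)
        self.points.append(v)
        for h, table in zip(self.hashes, self.tables):
            table.setdefault(h(v), []).append(idx)
        return idx

    def candidates(self, q):
        """Return ids that collide with q in at least one table."""
        found = set()
        for h, table in zip(self.hashes, self.tables):
            found.update(table.get(h(q), []))
        return found


index = LSHIndex(dim=3)
first = index.add([1.0, 2.0, 3.0])
index.add([-3.0, 0.5, -1.0])
hits = index.candidates([2.0, 4.0, 6.0])  # same direction as the first vector
```

The candidate set is only a shortlist: a real system would still compute exact distances to the candidates, and a true neighbor can be missed when it collides with the query in no table.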

IEEE Transactions on Circuits and Systems for Video Technology | 2001

MPEG-7 sound-recognition tools

Michael A. Casey

The MPEG-7 sound-recognition descriptors and description schemes consist of tools for indexing audio media using probabilistic sound models. The descriptors provide containers for category labels, as well as data structures for quantitative information about sound content. We describe the normative tools, as well as informative methods, for automatic description extraction and sound matching.

IEEE Transactions on Audio, Speech, and Language Processing | 2008

Analysis of Minimum Distances in High-Dimensional Musical Spaces

Michael A. Casey; Christophe Rhodes; Malcolm Slaney

We propose an automatic method for measuring content-based music similarity, enhancing the current generation of music search engines and recommender systems. Many previous approaches to track similarity require brute-force, pair-wise processing between all audio features in a database and therefore are not practical for large collections. However, in an Internet-connected world, where users have access to millions of musical tracks, efficiency is crucial. Our approach uses features extracted from unlabeled audio data and near-neighbor retrieval using a distance threshold, determined by analysis, to solve a range of retrieval tasks. The tasks require temporal features, analogous to the technique of shingling used for text retrieval. To measure similarity, we count pairs of audio shingles, between a query and target track, that are below a distance threshold. The distribution of between-shingle distances is different for each database; therefore, we present an analysis of the distribution of minimum distances between shingles and a method for estimating a distance threshold for optimal retrieval performance. The method is compatible with locality-sensitive hashing (LSH), allowing implementation with retrieval times several orders of magnitude faster than those using exhaustive distance computations. We evaluate the performance of our proposed method on three contrasting music similarity tasks: retrieval of mis-attributed recordings (fingerprint), retrieval of the same work performed by different artists (cover songs), and retrieval of edited and sampled versions of a query track by remix artists (remixes). Our method achieves near-perfect performance in the first two tasks and 75% precision at 70% recall in the third task. Each task was performed on a test database comprising 4.5 million audio shingles.
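
As a rough illustration of the counting step in this abstract, the sketch below builds shingles by concatenating consecutive feature frames and counts query/target shingle pairs closer than a threshold. It is an assumption-laden toy: the function names `shingles` and `count_close_pairs`, the Euclidean distance, and the exhaustive pairwise loop are all choices made here for clarity, and the threshold is passed in by hand instead of being estimated from the distance distribution as the paper proposes.

```python
import math


def shingles(frames, width):
    """Concatenate `width` consecutive feature frames into one vector per position."""
    return [
        [x for frame in frames[i:i + width] for x in frame]
        for i in range(len(frames) - width + 1)
    ]


def count_close_pairs(query, target, width, threshold):
    """Count (query shingle, target shingle) pairs whose distance is below threshold.

    Exhaustive comparison, shown only for clarity; an LSH index would replace
    the inner loop in a large-scale setting.
    """
    q = shingles(query, width)
    t = shingles(target, width)
    return sum(1 for a in q for b in t if math.dist(a, b) < threshold)


# Toy one-dimensional "features" for two short tracks (made-up values).
track_a = [[0.0], [1.0], [2.0]]
track_b = [[0.0], [1.0], [2.0]]
score = count_close_pairs(track_a, track_b, width=2, threshold=0.5)
```

A higher count suggests that more passages of the query recur in the target; tracks shorter than `width` frames simply yield no shingles and a count of zero.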

Organised Sound | 2001

General sound classification and similarity in MPEG-7

Michael A. Casey

We introduce a system for generalised sound classification and similarity using a machine-learning framework. Applications of the system include automatic classification of environmental sounds, musical instruments, music genre and human speakers. In addition to classification, the system may also be used for computing similarity metrics between a target sound and other sounds in a database. We discuss the use of hidden Markov models for representing the temporal evolution of audio spectra and present results of testing the system on classification and retrieval tasks. The system has been incorporated into the MPEG-7 international standard for multimedia content description and is therefore publicly available in the form of a set of standardised interfaces and software reference tools for developers and researchers.
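
The classification idea can be sketched as follows: train one hidden Markov model per sound class, then assign a new sound to the class whose model gives it the highest likelihood. The code below is a simplified sketch under stated assumptions, not the MPEG-7 reference software. It assumes discrete observation symbols (real spectral features are continuous), and the functions `log_likelihood` and `classify` and the toy models "up" and "down" are hypothetical.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete-output HMM."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate state probabilities one step, then weight by the emission.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p


def classify(obs, models):
    """Pick the class whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Two toy classes (made-up parameters): both move from state 0 to state 1,
# but "up" emits symbol 0 then symbol 1, while "down" emits them in reverse.
start = [1.0, 0.0]
trans = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "up": (start, trans, [[0.9, 0.1], [0.1, 0.9]]),
    "down": (start, trans, [[0.1, 0.9], [0.9, 0.1]]),
}
label = classify([0, 0, 1, 1], models)
```

Because the states are visited in order, the model captures how the observations evolve over time, which a bag of unordered frames would not.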

International Conference on Acoustics, Speech, and Signal Processing | 2006

Extraction of High-Level Musical Structure From Audio Data and Its Application to Thumbnail Generation

Mark Levy; Mark B. Sandler; Michael A. Casey

A method for segmenting musical audio with a hierarchical timbre model is introduced. New evidence is presented to show that music segmentation can be recast as clustering of timbre features, and a new clustering algorithm is described. A prototype thumbnail-generating application is described and evaluated. Experimental results are given, including comparison of machine and human segmentations.

Proceedings of the National Academy of Sciences of the United States of America | 2013

Music and movement share a dynamic structure that supports universal expressions of emotion

Beau Sievers; Larry Polansky; Michael A. Casey; Thalia Wheatley

Music moves us. Its kinetic power is the foundation of human behaviors as diverse as dance, romance, lullabies, and the military march. Despite its significance, the music-movement relationship is poorly understood. We present an empirical method for testing whether music and movement share a common structure that affords equivalent and universal emotional expressions. Our method uses a computer program that can generate matching examples of music and movement from a single set of features: rate, jitter (regularity of rate), direction, step size, and dissonance/visual spikiness. We applied our method in two experiments, one in the United States and another in an isolated tribal village in Cambodia. These experiments revealed three things: (i) each emotion was represented by a unique combination of features, (ii) each combination expressed the same emotion in both music and movement, and (iii) this common structure between music and movement was evident within and across cultures.

Presence: Teleoperators & Virtual Environments | 1997

Diamond Park and Spline: Social virtual reality with 3D animation, spoken interaction, and runtime extendability

Richard C. Waters; David B. Anderson; John W. Barrus; David C. Brogan; Michael A. Casey; Stephan G. Mckeown; T. Nitta; Ilene B. Sterns; William S. Yerazunis

Diamond Park is a social virtual reality system in which multiple geographically separated users can speak to each other and participate in joint activities. The central theme of the park is cycling. Human visitors to the park are represented by 3D animated avatars and can explore a square mile of 3D terrain. In addition to human visitors, the park hosts a number of computer simulations, including tour buses and autonomous animated figures. Diamond Park is implemented using a software platform called Spline, which makes it easy to build virtual worlds where multiple people interact with each other and with computer simulations in a 3D visual and audio environment. Spline performs all the processing necessary to maintain a distributed, modifiable, and extendable model of a virtual world that is shared between the participants. For more information visit http://www.merl.com.

International Conference on Acoustics, Speech, and Signal Processing | 2006

The Importance of Sequences in Musical Similarity

Michael A. Casey; Malcolm Slaney

This paper demonstrates the importance of temporal sequences for passage-level music information retrieval. A number of audio analysis problems are solved successfully by using models that throw away the temporal sequence data. This paper suggests that we do not have this luxury when we consider a more difficult problem: finding musically similar passages within a narrow range of musical styles or within a single musical piece. Our results demonstrate a significant improvement in performance for audio similarity measures using temporal sequences of features, and we show that quantizing the features to string-based representations also performs well, thus admitting efficient implementations based on string matching.
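
A small sketch can make the contrast concrete. It quantizes each feature frame to the nearest entry of a codebook, producing a string, and then compares two passages by their longest common substring. The codebook values, the function names `quantize` and `longest_common_substring`, and the choice of longest common substring as the matching measure are assumptions for illustration; the abstract does not specify the string-matching method used.

```python
import math
from collections import Counter

SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def quantize(frames, codebook):
    """Map each frame to the letter of its nearest codebook vector (max 26 entries)."""
    return "".join(
        SYMBOLS[min(range(len(codebook)), key=lambda k: math.dist(f, codebook[k]))]
        for f in frames
    )


def longest_common_substring(a, b):
    """Length of the longest run of symbols shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


codebook = [[0.0], [1.0], [2.0]]  # made-up one-dimensional codewords
rising = quantize([[0.1], [0.9], [2.2]], codebook)
falling = quantize([[2.2], [0.9], [0.1]], codebook)

same_bag = Counter(rising) == Counter(falling)  # order-free view: identical
shared_run = longest_common_substring(rising, falling)  # sequence view: barely overlap
```

The two passages contain exactly the same symbols, so a model that discards order cannot tell them apart, while the sequence-aware measure finds only a one-symbol overlap.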

IEEE Spectrum | 1997

The sound dimension

David B. Anderson; Michael A. Casey

Although the spotlight of virtual reality research has been on providing views of simulated scenes and objects, some researchers have chosen to study how to fool other senses (hearing, touch, and even smell) into perceiving what is not there. They have good reason: the virtual environments that are best at stimulating multiple senses are also best at evoking a feeling of presence and immersion. Next to sight, hearing is the sense on which people rely the most. So sounds, too, can play an extremely critical role in a distributed virtual environment (DVE). The virtual reality (VR) experience is more satisfying when sound adds to or reinforces other DVE information. The paper discusses the variety of sound in VR systems and considers the selection of software and hardware for these uses of audio in DVE systems.
