Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Perry R. Cook is active.

Publication


Featured research published by Perry R. Cook.


IEEE Transactions on Speech and Audio Processing | 2002

Musical genre classification of audio signals

George Tzanetakis; Perry R. Cook

Musical genres are categorical labels created by humans to characterize pieces of music. A musical genre is characterized by the common characteristics shared by its members. These characteristics typically are related to the instrumentation, rhythmic structure, and harmonic content of the music. Genre hierarchies are commonly used to structure the large collections of music available on the Web. Currently, musical genre annotation is performed manually. Automatic musical genre classification can assist or replace the human user in this process and would be a valuable addition to music information retrieval systems. In addition, automatic musical genre classification provides a framework for developing and evaluating features for any type of content-based analysis of musical signals. In this paper, the automatic classification of audio signals into a hierarchy of musical genres is explored. More specifically, three feature sets for representing timbral texture, rhythmic content and pitch content are proposed. The performance and relative importance of the proposed features are investigated by training statistical pattern recognition classifiers using real-world audio collections. Both whole-file and real-time frame-based classification schemes are described. Using the proposed feature sets, a classification accuracy of 61% for ten musical genres is achieved. This result is comparable to results reported for human musical genre classification.
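To make the feature-and-classifier pipeline concrete, here is a minimal sketch of the idea, assuming a mono signal array plus NumPy and scikit-learn; spectral centroid and rolloff stand in for the paper's timbral-texture feature set, and GaussianNB stands in for the statistical pattern recognition classifiers.

```python
# Sketch of timbral-texture features + a statistical classifier for genre labels.
# Assumptions: `signal` is a mono float array, `sr` its sample rate; scikit-learn
# stands in for the statistical pattern recognition classifiers used in the paper.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def frame_features(signal, sr, frame=1024, hop=512):
    """Per-frame spectral centroid and rolloff, summarized over a texture window."""
    feats = []
    for start in range(0, len(signal) - frame, hop):
        x = signal[start:start + frame] * np.hanning(frame)
        mag = np.abs(np.fft.rfft(x))
        freqs = np.fft.rfftfreq(frame, 1.0 / sr)
        total = mag.sum() + 1e-12
        centroid = (freqs * mag).sum() / total
        idx = min(np.searchsorted(np.cumsum(mag), 0.85 * total), len(freqs) - 1)
        feats.append([centroid, freqs[idx]])
    feats = np.array(feats)
    # Texture-window statistics: mean and standard deviation of the frame features.
    return np.concatenate([feats.mean(axis=0), feats.std(axis=0)])

# Training: one feature vector per labelled excerpt, then whole-file classification.
# X = np.vstack([frame_features(s, sr) for s in excerpts]); y = genre_labels
# clf = GaussianNB().fit(X, y); prediction = clf.predict([frame_features(new_clip, sr)])
```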


Organised Sound | 1999

MARSYAS: a framework for audio analysis

George Tzanetakis; Perry R. Cook

Existing audio tools handle the increasing amount of computer audio data inadequately. The typical tape-recorder paradigm for audio interfaces is inflexible and time consuming, especially for large data sets. On the other hand, completely automatic audio analysis and annotation is impossible using current techniques. Alternative solutions are semi-automatic user interfaces that let users interact with sound in flexible ways based on content. This approach offers significant advantages over manual browsing, annotation and retrieval. Furthermore, it can be implemented using existing techniques for audio content analysis in restricted domains. This paper describes MARSYAS, a framework for experimenting with, evaluating, and integrating such techniques. As a test for the architecture, some recently proposed techniques have been implemented and tested. In addition, a new method for temporal segmentation based on audio texture is described. This method is combined with audio analysis techniques and used for hierarchical browsing, classification and annotation of audio files.
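As an illustration of the framework idea (not the actual MARSYAS API), the sketch below shows a plug-in style pipeline in Python where each analysis module maps audio frames to feature vectors; all class and function names are hypothetical.

```python
# Sketch of a plug-in style analysis framework in the spirit of MARSYAS;
# class and method names here are illustrative, not the actual MARSYAS API.
import numpy as np

class Extractor:
    """One analysis module: maps a frame of samples to a feature vector."""
    def extract(self, frame: np.ndarray) -> np.ndarray:
        raise NotImplementedError

class RMS(Extractor):
    def extract(self, frame):
        return np.array([np.sqrt(np.mean(frame ** 2))])

class ZeroCrossings(Extractor):
    def extract(self, frame):
        return np.array([np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0])

def analyze(signal, extractors, frame=1024, hop=512):
    """Run every registered extractor over each frame and stack the results."""
    rows = []
    for start in range(0, len(signal) - frame, hop):
        x = signal[start:start + frame]
        rows.append(np.concatenate([e.extract(x) for e in extractors]))
    return np.array(rows)   # shape: (num_frames, num_features)
```

New analysis techniques can then be integrated by adding extractor classes, without changing the browsing or annotation code that consumes the feature matrix.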


Attention Perception & Psychophysics | 1996

Memory for musical tempo: Additional evidence that auditory memory is absolute

Daniel J. Levitin; Perry R. Cook

We report evidence that long-term memory retains absolute (accurate) features of perceptual events. Specifically, we show that memory for music seems to preserve the absolute tempo of the musical performance. In Experiment 1, 46 subjects sang two different popular songs from memory, and their tempos were compared with recorded versions of the songs. Seventy-two percent of the productions on two consecutive trials came within 8% of the actual tempo, demonstrating accuracy near the perceptual threshold (JND) for tempo. In Experiment 2, a control experiment, we found that folk songs lacking a tempo standard generally have a large variability in tempo; this counters arguments that memory for the tempo of remembered songs is driven by articulatory constraints. The relevance of the present findings to theories of perceptual memory and memory for music is discussed.
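For readers checking the numbers, here is a minimal sketch of the 8% tolerance criterion used to score a sung production against the recorded tempo; the BPM values are made-up examples.

```python
# Worked example of the 8% tolerance criterion used to score sung productions.
def within_tolerance(produced_bpm, reference_bpm, tolerance=0.08):
    """True if the produced tempo deviates from the recording by at most 8%."""
    return abs(produced_bpm - reference_bpm) / reference_bpm <= tolerance

print(within_tolerance(104.0, 100.0))  # True: 4% fast
print(within_tolerance(91.0, 100.0))   # False: 9% slow
```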


IEEE Computer Graphics and Applications | 2000

Building and using a scalable display wall system

Kai Li; Han Wu Chen; Yuqun Chen; Douglas W. Clark; Perry R. Cook; Stefanos N. Damianakis; Georg Essl; Adam Finkelstein; Thomas A. Funkhouser; T. Housel; Allison W. Klein; Zhiyan Liu; Emil Praun; Jaswinder Pal Singh; B. Shedd; J. Pal; George Tzanetakis; J. Zheng

Princeton's scalable display wall project explores building and using a large-format display with commodity components. The prototype system has been operational since March 1998. Our goal is to construct a collaborative space that fully exploits a large-format display system with immersive sound and natural user interfaces. Our prototype system is built with low-cost commodity components: a cluster of PCs, PC graphics accelerators, consumer video and sound equipment, and portable presentation projectors. This approach has the advantages of low cost and of tracking technology well, as high-volume commodity components typically have better price-performance ratios and improve at faster rates than special-purpose hardware. We report our early experiences in building and using the display wall system. In particular, we describe our approach to research challenges in several specific research areas, including seamless tiling, parallel rendering, parallel data visualization, parallel MPEG decoding, layered multiresolution video input, multichannel immersive sound, user interfaces, application tools, and content creation.
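A minimal sketch of the tiled-rendering idea behind seamless tiling: each cluster node renders only the sub-rectangle of the global display that its projector covers. The grid size and resolutions below are illustrative, not the Princeton wall's actual configuration.

```python
# Illustrative sketch of tiled rendering: each PC in the cluster renders only the
# sub-rectangle of the global display that its projector covers. Grid size and
# resolutions are made-up examples, not the Princeton wall's actual configuration.
def tile_viewport(col, row, cols=4, rows=2, wall_w=4096, wall_h=1536):
    """Pixel rectangle (x, y, w, h) of the global image owned by tile (col, row)."""
    w, h = wall_w // cols, wall_h // rows
    return (col * w, row * h, w, h)

# The node responsible for tile (2, 1) renders this region of the global frame:
print(tile_viewport(2, 1))  # (2048, 768, 1024, 768)
```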


international conference on computer graphics and interactive techniques | 2001

Synthesizing sounds from physically based motion

James F. O'Brien; Perry R. Cook; Georg Essl

This paper describes a technique for approximating sounds that are generated by the motions of solid objects. The technique builds on previous work in the field of physically based animation that uses deformable models to simulate the behavior of the solid objects. As the motions of the objects are computed, their surfaces are analyzed to determine how the motion will induce acoustic pressure waves in the surrounding medium. Our technique computes the propagation of those waves to the listener and then uses the results to generate sounds corresponding to the behavior of the simulated objects.
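A heavily simplified Python sketch of the idea: treat each surface element of the simulated object as a small acoustic source whose contribution is delayed by propagation time and attenuated by distance, then sum the contributions at the listener. This is a crude point-source approximation for illustration, not the paper's actual pressure-wave computation; all names are hypothetical.

```python
# Very simplified sketch: surface elements of a simulated object act as small
# acoustic sources; their normal accelerations are delayed by propagation time and
# attenuated by distance, then summed at the listener. A crude point-source
# approximation, not the paper's actual pressure-wave computation.
import numpy as np

def listener_signal(accels, positions, areas, listener, sr=44100, c=343.0):
    """
    accels:    (num_elements, num_steps) normal acceleration of each surface element
    positions: (num_elements, 3) element centers; areas: (num_elements,) element areas
    listener:  (3,) listener position; returns the summed pressure-like signal.
    """
    num_elements, num_steps = accels.shape
    dists = np.linalg.norm(positions - listener, axis=1)   # assumed nonzero
    delays = np.round(dists / c * sr).astype(int)
    out = np.zeros(num_steps + delays.max())
    for i in range(num_elements):
        out[delays[i]:delays[i] + num_steps] += areas[i] * accels[i] / dists[i]
    return out
```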


workshop on applications of signal processing to audio and acoustics | 1999

Multifeature audio segmentation for browsing and annotation

George Tzanetakis; Perry R. Cook

Indexing and content-based retrieval are necessary to handle the large amounts of audio and multimedia data that are becoming available on the Web and elsewhere. Since manual indexing using existing audio editors is extremely time consuming, a number of automatic content analysis systems have been proposed. Most of these systems rely on speech recognition techniques to create text indices. On the other hand, very few systems have been proposed for automatic indexing of music and general audio. Typically these systems rely on classification and similarity-retrieval techniques and work in restricted audio domains. A somewhat different, more general approach for fast indexing of arbitrary audio data is the use of segmentation based on multiple temporal features combined with automatic or semi-automatic annotation. In this paper, a general methodology for audio segmentation is proposed. A number of experiments were performed to evaluate the proposed methodology and compare different segmentation schemes. Finally, a prototype audio browsing and annotation tool based on segmentation combined with existing classification techniques was implemented.
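A minimal sketch of the segmentation idea: given per-frame features (from any feature extractor), compute a novelty curve from the distance between adjacent feature windows and take its peaks as candidate segment boundaries. The window size and peak rule are illustrative, not the paper's exact scheme.

```python
# Sketch of multifeature segmentation: frame features -> novelty curve from the
# distance between adjacent feature windows -> peaks become candidate boundaries.
# The feature set and peak rule are illustrative, not the paper's exact scheme.
import numpy as np

def novelty_curve(features, win=20):
    """features: (num_frames, num_feats). Distance between windows before/after each frame."""
    curve = np.zeros(len(features))
    for t in range(win, len(features) - win):
        before = features[t - win:t].mean(axis=0)
        after = features[t:t + win].mean(axis=0)
        curve[t] = np.linalg.norm(after - before)
    return curve

def boundaries(curve, threshold=None):
    """Frames that are local maxima of the novelty curve above a threshold."""
    if threshold is None:
        threshold = curve.mean() + curve.std()
    return [t for t in range(1, len(curve) - 1)
            if curve[t] > curve[t - 1] and curve[t] >= curve[t + 1] and curve[t] > threshold]
```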


Journal of the Acoustical Society of America | 1999

Music, cognition, and computerized sound: an introduction to psychoacoustics

Perry R. Cook

The MIT Press, Cambridge, 1999. xi+372 pp.+CD-ROM (formatted for Macintosh, for Windows 95, 98, and NT, and for Unix). Price: 60.000 hardcover.


new interfaces for musical expression | 2004

On-the-fly programming: using code as an expressive musical instrument

Ge Wang; Perry R. Cook

On-the-fly programming is a style of programming in which the programmer/performer/composer augments and modifies the program while it is running, without stopping or restarting, in order to assert expressive, programmable control at runtime. Because of the fundamental powers of programming languages, we believe the technical and aesthetic aspects of on-the-fly programming are worth exploring. In this paper, we present a formalized framework for on-the-fly programming, based on the ChucK synthesis language, which supports a truly concurrent audio programming model with sample-synchronous timing, and a highly on-the-fly style of programming. We first provide a well-defined notion of on-the-fly programming. We then address four fundamental issues that confront the on-the-fly programmer: timing, modularity, conciseness, and flexibility. Using the features and properties of ChucK, we show how it solves many of these issues. In this new model, we show that (1) concurrency provides natural modularity for on-the-fly programming, (2) the timing mechanism in ChucK guarantees on-the-fly precision and consistency, (3) the ChucK syntax improves conciseness, and (4) the overall system is a useful framework for exploring on-the-fly programming. Finally, we discuss the aesthetics of on-the-fly performance.
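To illustrate the sample-synchronous concurrency the paper describes, here is a Python sketch (not ChucK syntax): each "shred" is a generator that yields how much logical time to advance, and a scheduler interleaves the shreds deterministically at sample precision. The oscillator and all names are stand-ins, not ChucK's actual unit generators.

```python
# Python sketch of ChucK-style sample-synchronous concurrency: each "shred" is a
# generator that yields how many samples of logical time to advance; the scheduler
# interleaves shreds deterministically at sample precision. Illustrative only --
# this is not ChucK syntax, and the oscillator is a stand-in unit generator.
import heapq, math

def blip(freq, sr=44100):
    """A shred: emit one sine sample, then wait exactly one sample of logical time."""
    phase = 0.0
    while True:
        yield 1, math.sin(phase)                  # (samples to wait, output sample)
        phase += 2 * math.pi * freq / sr

def run(shreds, num_samples):
    """Advance logical time sample by sample, waking each shred exactly when due."""
    out = [0.0] * num_samples
    queue = [(0, i, s) for i, s in enumerate(shreds)]   # (wake time, id, shred)
    heapq.heapify(queue)
    while queue and queue[0][0] < num_samples:
        now, i, shred = heapq.heappop(queue)
        wait, sample = next(shred)
        out[now] += sample
        heapq.heappush(queue, (now + wait, i, shred))
    return out

mix = run([blip(440.0), blip(660.0)], num_samples=44100)  # two concurrent shreds, 1 second
```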


Journal of New Music Research | 2003

Pitch Histograms in Audio and Symbolic Music Information Retrieval

George Tzanetakis; Andrey Ermolinskyi; Perry R. Cook

In order to represent musical content, pitch and timing information is utilized in the majority of existing work in Symbolic Music Information Retrieval (MIR). Symbolic representations such as MIDI allow the easy calculation of such information and its manipulation. In contrast, most of the existing work in Audio MIR uses timbral and beat information, which can be calculated using automatic computer audition techniques. In this paper, Pitch Histograms are defined and proposed as a way to represent the pitch content of music signals both in symbolic and audio form. This representation is evaluated in the context of automatic musical genre classification. A multiple-pitch detection algorithm for polyphonic signals is used to calculate Pitch Histograms for audio signals. In order to evaluate the extent and significance of errors resulting from the automatic multiple-pitch detection, automatic musical genre classification results from symbolic and audio data are compared. The comparison indicates that Pitch Histograms provide valuable information for musical genre classification. The results obtained for both symbolic and audio cases indicate that although pitch errors degrade classification performance for the audio case, Pitch Histograms can be effectively used for classification in both cases.
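A minimal sketch of a folded Pitch Histogram, assuming pitches are already available as MIDI note numbers (from a symbolic file or a multiple-pitch detector): fold to 12 pitch classes, accumulate, and normalize.

```python
# Sketch of a folded Pitch Histogram: detected pitches (here as MIDI note numbers,
# e.g. from a symbolic file or a multiple-pitch detector) are folded into 12 pitch
# classes and accumulated, optionally weighted by amplitude. Illustrative only.
import numpy as np

def folded_pitch_histogram(midi_notes, weights=None):
    """midi_notes: list of MIDI note numbers; returns a normalized 12-bin histogram."""
    hist = np.zeros(12)
    weights = np.ones(len(midi_notes)) if weights is None else np.asarray(weights)
    for note, w in zip(midi_notes, weights):
        hist[int(round(note)) % 12] += w
    total = hist.sum()
    return hist / total if total > 0 else hist

# Example: a C major arpeggio concentrates mass in pitch classes C, E, and G.
print(folded_pitch_histogram([60, 64, 67, 72]))  # C4, E4, G4, C5
```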


IEEE Computer Graphics and Applications | 2005

Tools and applications for large-scale display walls

Grant Wallace; Peng Bi; Han Chen; Yuqun Chen; Douglas W. Clark; Perry R. Cook; Adam Finkelstein; Thomas A. Funkhouser; Anoop Gupta; Matthew A. Hibbs; Kai Li; Zhiyan Liu; Rudrajit Samanta; Rahul Sukthankar; Olga G. Troyanskaya


Collaboration


Dive into Perry R. Cook's collaborations.

Top Co-Authors

Georg Essl

University of Michigan

Ajay Kapur

California Institute of the Arts

Xiaojuan Ma

Hong Kong University of Science and Technology
