Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where George Tzanetakis is active.

Publication


Featured research published by George Tzanetakis.


IEEE Transactions on Speech and Audio Processing | 2002

Musical genre classification of audio signals

George Tzanetakis; Perry R. Cook

Musical genres are categorical labels created by humans to characterize pieces of music. A musical genre is characterized by the common characteristics shared by its members, typically related to the instrumentation, rhythmic structure, and harmonic content of the music. Genre hierarchies are commonly used to structure the large collections of music available on the Web. Currently, musical genre annotation is performed manually. Automatic musical genre classification can assist or replace the human user in this process and would be a valuable addition to music information retrieval systems. In addition, automatic musical genre classification provides a framework for developing and evaluating features for any type of content-based analysis of musical signals. In this paper, the automatic classification of audio signals into a hierarchy of musical genres is explored. More specifically, three feature sets for representing timbral texture, rhythmic content, and pitch content are proposed. The performance and relative importance of the proposed features are investigated by training statistical pattern recognition classifiers using real-world audio collections. Both whole-file and real-time frame-based classification schemes are described. Using the proposed feature sets, classification accuracy of 61% for ten musical genres is achieved. This result is comparable to results reported for human musical genre classification.
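
As a rough illustration of the pipeline this abstract describes (short-time features summarized per track, then a statistical classifier), here is a minimal sketch assuming librosa and scikit-learn; the specific features and the RBF-SVM are illustrative stand-ins, not the paper's exact configuration.

```python
# Sketch: timbral-texture features summarized per track, then a classifier.
# File paths and labels are placeholders, not the paper's code.
import numpy as np
import librosa
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def timbral_features(path, sr=22050):
    """Mean and std of short-time timbral descriptors over one track."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    frames = np.vstack([
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13),
        librosa.feature.spectral_centroid(y=y, sr=sr),
        librosa.feature.spectral_rolloff(y=y, sr=sr),
        librosa.feature.zero_crossing_rate(y),
    ])
    return np.hstack([frames.mean(axis=1), frames.std(axis=1)])

# X = np.stack([timbral_features(p) for p in track_paths])  # y = genre labels
# clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
# print(cross_val_score(clf, X, y, cv=10).mean())
```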


Organised Sound | 1999

MARSYAS: a framework for audio analysis

George Tzanetakis; Perry R. Cook

Existing audio tools handle the increasing amount of computer audio data inadequately. The typical tape-recorder paradigm for audio interfaces is inflexible and time consuming, especially for large data sets. On the other hand, completely automatic audio analysis and annotation is impossible using current techniques. Alternative solutions are semi-automatic user interfaces that let users interact with sound in flexible ways based on content. This approach offers significant advantages over manual browsing, annotation and retrieval. Furthermore, it can be implemented using existing techniques for audio content analysis in restricted domains. This paper describes MARSYAS, a framework for experimenting, evaluating and integrating such techniques. As a test for the architecture, some recently proposed techniques have been implemented and tested. In addition, a new method for temporal segmentation based on audio texture is described. This method is combined with audio analysis techniques and used for hierarchical browsing, classification and annotation of audio files.
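
MARSYAS itself is a C++ framework; the toy sketch below only illustrates the composable dataflow idea behind such a framework (chains of processing blocks mapping audio to features), not the MARSYAS API.

```python
# Toy illustration of a composable analysis pipeline: each block maps an
# array to an array, and Series chains them in order. Concept only; this
# mimics none of MARSYAS's actual classes.
import numpy as np

class Series:
    """Run processing blocks in sequence, like a dataflow chain."""
    def __init__(self, *blocks):
        self.blocks = blocks

    def __call__(self, x):
        for block in self.blocks:
            x = block(x)
        return x

def frame(x, size=512, hop=256):
    """Slice a signal into overlapping frames (one frame per row)."""
    n = 1 + max(0, (len(x) - size) // hop)
    return np.stack([x[i * hop : i * hop + size] for i in range(n)])

def window(frames):
    return frames * np.hanning(frames.shape[1])

def spectrum(frames):
    return np.abs(np.fft.rfft(frames, axis=1))

def centroid(spectra):
    """Per-frame spectral centroid, in FFT bins."""
    bins = np.arange(spectra.shape[1])
    return (spectra * bins).sum(axis=1) / (spectra.sum(axis=1) + 1e-12)

analyzer = Series(frame, window, spectrum, centroid)
print(analyzer(np.random.randn(22050))[:5])  # centroids of first 5 frames
```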


IEEE Workshop on Applications of Signal Processing to Audio and Acoustics | 2003

Polyphonic audio matching and alignment for music retrieval

Ning Hu; Roger B. Dannenberg; George Tzanetakis

We describe a method that aligns polyphonic audio recordings of music to symbolic score information in standard MIDI files without the difficult process of polyphonic transcription. By using this method, we can search through a MIDI database to find the MIDI file corresponding to a polyphonic audio recording.
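
The abstract does not spell out the representation, but a common way to realize audio-to-MIDI matching without transcription is to compare chroma vectors computed from the audio against chroma derived from the MIDI note events, then align them with dynamic time warping (DTW). A sketch under that assumption, using librosa; the MIDI chroma builder is a simplified placeholder.

```python
# Compare audio chroma against MIDI-derived chroma and align with DTW.
import numpy as np
import librosa

def audio_chroma(path, sr=22050, hop=512):
    y, _ = librosa.load(path, sr=sr, mono=True)
    return librosa.feature.chroma_stft(y=y, sr=sr, hop_length=hop)

def midi_chroma(notes, n_frames, frame_dur):
    """notes: (onset_sec, dur_sec, midi_pitch) triples -> toy chromagram."""
    C = np.zeros((12, n_frames))
    for onset, dur, pitch in notes:
        a = int(onset / frame_dur)
        b = min(int((onset + dur) / frame_dur) + 1, n_frames)
        C[pitch % 12, a:b] += 1.0
    return librosa.util.normalize(C, axis=0)

# DTW over cosine distances yields a frame alignment path; the total path
# cost can rank candidate MIDI files against a query recording.
# A = audio_chroma("query.wav")
# B = midi_chroma(parsed_notes, n_frames=A.shape[1], frame_dur=512 / 22050)
# D, path = librosa.sequence.dtw(X=A, Y=B, metric="cosine")
```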


IEEE Computer Graphics and Applications | 2000

Building and using a scalable display wall system

Kai Li; Han Wu Chen; Yuqun Chen; Douglas W. Clark; Perry R. Cook; Stefanos N. Damianakis; Georg Essl; Adam Finkelstein; Thomas A. Funkhouser; T. Housel; Allison W. Klein; Zhiyan Liu; Emil Praun; Jaswinder Pal Singh; B. Shedd; George Tzanetakis; J. Zheng

Princeton's scalable display wall project explores building and using a large-format display with commodity components. The prototype system has been operational since March 1998. Our goal is to construct a collaborative space that fully exploits a large-format display system with immersive sound and natural user interfaces. Our prototype system is built with low-cost commodity components: a cluster of PCs, PC graphics accelerators, consumer video and sound equipment, and portable presentation projectors. This approach has the advantages of low cost and of tracking technology improvements well, as high-volume commodity components typically have better price/performance ratios and improve at faster rates than special-purpose hardware. We report our early experiences in building and using the display wall system. In particular, we describe our approach to research challenges in several specific research areas, including seamless tiling, parallel rendering, parallel data visualization, parallel MPEG decoding, layered multiresolution video input, multichannel immersive sound, user interfaces, application tools, and content creation.
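
As a toy illustration of one listed challenge, seamless tiling, the sketch below computes per-projector viewport rectangles with an overlap region for edge blending; the resolutions and overlap are illustrative numbers, not the project's actual configuration.

```python
# Carve a large virtual framebuffer into per-projector viewports with a
# small overlap region for edge blending. Numbers below are illustrative.
def tile_viewports(wall_w, wall_h, cols, rows, overlap):
    """Return (x, y, w, h) for each projector in a cols x rows array."""
    # wall width = cols * tile_w - (cols - 1) * overlap, solved for tile_w
    tile_w = (wall_w + (cols - 1) * overlap) // cols
    tile_h = (wall_h + (rows - 1) * overlap) // rows
    views = []
    for r in range(rows):
        for c in range(cols):
            x = c * (tile_w - overlap)
            y = r * (tile_h - overlap)
            views.append((x, y, tile_w, tile_h))
    return views

# A hypothetical 4x2 array of 1024x768 projectors with a 64-pixel blend:
for v in tile_viewports(4096 - 3 * 64, 1536 - 64, 4, 2, 64):
    print(v)   # each tile is exactly 1024x768
```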


Affective Computing and Intelligent Interaction | 2005

Gesture-Based affective computing on motion capture data

Asha Kapur; Ajay Kapur; Naznin Virji-Babul; George Tzanetakis; Peter F. Driessen

This paper presents research using full-body skeletal movements, captured with video-based sensor technology developed by Vicon Motion Systems, to train a machine to identify different human emotions. The Vicon system uses a series of six cameras to capture lightweight markers placed on various points of the body in 3D space, and digitizes movement into x, y, and z displacement data. Gestural data from five subjects were collected depicting four emotions: sadness, joy, anger, and fear. Experimental results with different machine learning techniques show that automatic classification of this data ranges from 84% to 92% accuracy, depending on how it is calculated. To put these automatic classification results into perspective, a user study on the human perception of the same data was conducted, with an average classification accuracy of 93%.
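
A minimal sketch of the kind of pipeline described: summarize each gesture's x/y/z marker trajectories into simple statistics and feed them to a standard classifier. The feature choices below are illustrative assumptions, not the paper's exact features.

```python
# Summarize marker displacement data per gesture, then classify emotion.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def gesture_features(markers):
    """markers: array of shape (frames, n_markers, 3) with x/y/z positions."""
    vel = np.diff(markers, axis=0)              # frame-to-frame displacement
    speed = np.linalg.norm(vel, axis=2)         # per-marker speed
    return np.hstack([
        speed.mean(axis=0), speed.std(axis=0),  # how fast, how variable
        markers.std(axis=0).ravel(),            # spatial extent of movement
    ])

# X = np.stack([gesture_features(g) for g in gestures])  # y = emotion labels
# clf = make_pipeline(StandardScaler(), SVC())
# print(cross_val_score(clf, X, y, cv=5).mean())
```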


IEEE Workshop on Applications of Signal Processing to Audio and Acoustics | 1999

Multifeature audio segmentation for browsing and annotation

George Tzanetakis; Perry R. Cook

Indexing and content-based retrieval are necessary to handle the large amounts of audio and multimedia data that are becoming available on the Web and elsewhere. Since manual indexing using existing audio editors is extremely time consuming, a number of automatic content analysis systems have been proposed. Most of these systems rely on speech recognition techniques to create text indices. On the other hand, very few systems have been proposed for automatic indexing of music and general audio. Typically these systems rely on classification and similarity-retrieval techniques and work in restricted audio domains. A somewhat different, more general approach for fast indexing of arbitrary audio data is the use of segmentation based on multiple temporal features combined with automatic or semi-automatic annotation. In this paper, a general methodology for audio segmentation is proposed. A number of experiments were performed to evaluate the proposed methodology and compare different segmentation schemes. Finally, a prototype audio browsing and annotation tool based on segmentation combined with existing classification techniques was implemented.
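
One plausible realization of multi-feature segmentation, sketched with librosa: track several short-time features and place boundaries at peaks of the frame-to-frame feature distance (a novelty curve). The feature set and peak-picking parameters are illustrative, not the paper's exact scheme.

```python
# Segment boundaries from peaks in a multi-feature novelty curve.
import numpy as np
import librosa

def novelty_boundaries(path, sr=22050, hop=512, delta=0.15):
    y, _ = librosa.load(path, sr=sr, mono=True)
    feats = np.vstack([
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop),
        librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=hop),
        librosa.feature.rms(y=y, hop_length=hop),
    ])
    # normalize each feature row, then measure change between frames
    feats = (feats - feats.mean(axis=1, keepdims=True)) \
        / (feats.std(axis=1, keepdims=True) + 1e-12)
    novelty = np.linalg.norm(np.diff(feats, axis=1), axis=0)
    novelty /= novelty.max() + 1e-12
    peaks = librosa.util.peak_pick(novelty, pre_max=10, post_max=10,
                                   pre_avg=20, post_avg=20,
                                   delta=delta, wait=20)
    return librosa.frames_to_time(peaks, sr=sr, hop_length=hop)
```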


IEEE International Conference on Acoustics, Speech, and Signal Processing | 2000

Sound analysis using MPEG compressed audio

George Tzanetakis; Perry R. Cook

There is a huge amount of audio data available that is compressed using the MPEG audio compression standard. Sound analysis is based on the computation of short-time feature vectors that describe the instantaneous spectral content of the sound. An interesting possibility is the calculation of features directly from compressed data. Since the bulk of the feature calculation is performed during the encoding stage, this process has a significant performance advantage when the available data is compressed. Combining decoding and analysis in one stage is also very important for audio streaming applications. In this paper, we describe the calculation of features directly from MPEG audio compressed data. Two of the basic processes of analyzing sound are segmentation and classification. To illustrate the effectiveness of the calculated features, we have implemented two case studies: a general audio segmentation algorithm and a music/speech classifier. Experimental data is provided to show that the results obtained are comparable with sound analysis algorithms working directly with audio samples.
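
The key observation is that MPEG audio encoding already splits the signal into 32 subbands, so spectral-shape features can be computed from subband magnitudes without full decoding. A sketch under the assumption that a partial decoder exposes a hypothetical (frames, 32) array of subband magnitudes:

```python
# Spectral-shape features from MPEG subband magnitudes. `subband_mags`
# is a hypothetical (frames, 32) array from a partial decoder.
import numpy as np

def compressed_domain_features(subband_mags):
    mags = np.asarray(subband_mags, dtype=float)
    bands = np.arange(1, mags.shape[1] + 1)
    total = mags.sum(axis=1) + 1e-12
    centroid = (mags * bands).sum(axis=1) / total        # spectral centroid
    cumulative = np.cumsum(mags, axis=1) / total[:, None]
    rolloff = (cumulative < 0.85).sum(axis=1)            # 85% rolloff band
    flux = np.r_[0.0, np.linalg.norm(np.diff(mags, axis=0), axis=1)]
    return np.column_stack([centroid, rolloff, flux])
```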


ACM Multimedia | 2009

Improving automatic music tag annotation using stacked generalization of probabilistic SVM outputs

Steven R. Ness; Anthony Theocharis; George Tzanetakis; Luis Gustavo Martins

Music listeners frequently use words to describe music. Personalized music recommendation systems such as Last.fm and Pandora rely on manual annotations (tags) as a mechanism for querying and navigating large music collections. A well-known issue in such recommendation systems is the cold-start problem: it is not possible to recommend new songs/tracks until those songs/tracks have been manually annotated. Automatic tag annotation based on content analysis is a potential solution to this problem and has recently been gaining attention. We describe how stacked generalization can be used to improve the performance of a state-of-the-art automatic tag annotation system for music based on audio content analysis and report results on two publicly available datasets.
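
A minimal sketch of stacked generalization for multi-tag annotation with scikit-learn: stage-1 SVMs output a probability per tag, and stage-2 SVMs re-predict each tag from the full vector of stage-1 probabilities, exploiting tag co-occurrence. This shows the general idea, not the paper's exact system; out-of-fold stage-1 predictions keep training labels from leaking into stage 2.

```python
# Two-stage (stacked) probabilistic SVM tagger.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

def fit_stacked_taggers(X, Y, cv=5):
    """X: (n, d) audio features; Y: (n, n_tags) binary tag matrix."""
    stage1 = [SVC(probability=True).fit(X, Y[:, t])
              for t in range(Y.shape[1])]
    # out-of-fold probabilities for every tag form the stage-2 input
    P = np.column_stack([
        cross_val_predict(SVC(probability=True), X, Y[:, t],
                          cv=cv, method="predict_proba")[:, 1]
        for t in range(Y.shape[1])
    ])
    stage2 = [SVC(probability=True).fit(P, Y[:, t])
              for t in range(Y.shape[1])]
    return stage1, stage2

def predict_tags(stage1, stage2, X_new):
    P = np.column_stack([m.predict_proba(X_new)[:, 1] for m in stage1])
    return np.column_stack([m.predict_proba(P)[:, 1] for m in stage2])
```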


IEEE Workshop on Applications of Signal Processing to Audio and Acoustics | 2003

Factors in automatic musical genre classification of audio signals

Tao Li; George Tzanetakis

Automatic musical genre classification is an important tool for organizing the large collections of music that are becoming available to the average user. In addition, it provides a structured way of evaluating musical content features that does not require extensive user studies. The paper provides a detailed comparative analysis of various factors affecting automatic classification performance, such as choice of features and classifiers. Using recent machine learning techniques, such as support vector machines, we improve on previously published results using identical data collections and features.
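
The kind of factor comparison described can be set up by evaluating several standard classifiers on the same feature matrix with identical cross-validation folds, as in this scikit-learn sketch; the candidate classifiers are common defaults, not necessarily the exact set studied.

```python
# Compare classifiers on one feature matrix with shared CV folds.
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.svm import SVC, LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

def compare_classifiers(X, y, seed=0):
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    candidates = {
        "rbf-svm": SVC(kernel="rbf"),
        "linear-svm": LinearSVC(),
        "5-nn": KNeighborsClassifier(n_neighbors=5),
        "gaussian-nb": GaussianNB(),
    }
    for name, clf in candidates.items():
        scores = cross_val_score(clf, X, y, cv=cv)
        print(f"{name:12s} {scores.mean():.3f} +/- {scores.std():.3f}")
```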


Journal of New Music Research | 2003

Pitch Histograms in Audio and Symbolic Music Information Retrieval

George Tzanetakis; Andrey Ermolinskyi; Perry R. Cook

In order to represent musical content, pitch and timing information is utilized in the majority of existing work in Symbolic Music Information Retrieval (MIR). Symbolic representations such as MIDI allow the easy calculation of such information and its manipulation. In contrast, most of the existing work in Audio MIR uses timbral and beat information, which can be calculated using automatic computer audition techniques. In this paper, Pitch Histograms are defined and proposed as a way to represent the pitch content of music signals both in symbolic and audio form. This representation is evaluated in the context of automatic musical genre classification. A multiple-pitch detection algorithm for polyphonic signals is used to calculate Pitch Histograms for audio signals. In order to evaluate the extent and significance of errors resulting from the automatic multiple-pitch detection, automatic musical genre classification results from symbolic and audio data are compared. The comparison indicates that Pitch Histograms provide valuable information for musical genre classification. The results obtained for both symbolic and audio cases indicate that although pitch errors degrade classification performance for the audio case, Pitch Histograms can be effectively used for classification in both cases.
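
A minimal numpy sketch of the two histogram variants the paper defines: an unfolded histogram over MIDI note numbers (preserving octave information) and a folded histogram over the 12 pitch classes (capturing tonal content). Note lists would come from MIDI parsing or, for audio, multiple-pitch detection.

```python
# Unfolded (note-number) and folded (pitch-class) histograms.
import numpy as np

def pitch_histograms(midi_notes):
    """midi_notes: iterable of MIDI note numbers (0-127)."""
    notes = np.asarray(list(midi_notes), dtype=int)
    unfolded = np.bincount(notes, minlength=128)[:128]
    folded = np.bincount(notes % 12, minlength=12)
    # normalize so histograms are comparable across pieces of any length
    return unfolded / max(notes.size, 1), folded / max(notes.size, 1)

# e.g. a C-major arpeggio concentrates folded mass on pitch classes 0, 4, 7:
u, f = pitch_histograms([60, 64, 67, 72, 76, 79])
print(np.nonzero(f)[0])   # -> [0 4 7]
```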

Collaboration


Dive into George Tzanetakis's collaborations.

Top Co-Authors

Ajay Kapur
California Institute of the Arts

Shawn Trail
University of Victoria