
Publication


Featured research published by Michael I. Mandel.


International Conference on Music Information Retrieval (ISMIR) | 2005

Song-Level Features and Support Vector Machines for Music Classification

Michael I. Mandel; Daniel P. W. Ellis

Searching and organizing growing digital music collections requires automatic classification of music. This paper describes a new system, tested on the task of artist identification, that uses support vector machines to classify songs based on features calculated over their entire lengths. Since support vector machines are exemplar-based classifiers, training on and classifying entire songs instead of short-time features makes intuitive sense. On a dataset of 1200 pop songs performed by 18 artists, we show that this classifier outperforms similar classifiers that use only SVMs or song-level features. We also show that the KL divergence between single Gaussians and Mahalanobis distance between MFCC statistics vectors perform comparably when classifiers are trained and tested on separate albums, but KL divergence outperforms Mahalanobis distance when trained and tested on songs from the same albums.
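The song-level distance described above is easy to sketch numerically. Below is a minimal illustration of the symmetric KL divergence between single Gaussians fit to each song's MFCC frames; the function names and the suggested exponential kernelization are illustrative assumptions, not the paper's exact code:

```python
import numpy as np

def gaussian_kl(m1, S1, m2, S2):
    """KL(N(m1,S1) || N(m2,S2)) for full-covariance Gaussians."""
    d = m1.shape[0]
    S2inv = np.linalg.inv(S2)
    diff = m2 - m1
    return 0.5 * (np.trace(S2inv @ S1)
                  + diff @ S2inv @ diff
                  - d
                  + np.log(np.linalg.det(S2) / np.linalg.det(S1)))

def song_distance(mfccs_a, mfccs_b):
    """Symmetric KL divergence between single Gaussians fit to each
    song's MFCC frames.

    mfccs_*: (n_frames, n_coeffs) array of MFCC vectors for a whole song.
    """
    ma, Sa = mfccs_a.mean(0), np.cov(mfccs_a, rowvar=False)
    mb, Sb = mfccs_b.mean(0), np.cov(mfccs_b, rowvar=False)
    return gaussian_kl(ma, Sa, mb, Sb) + gaussian_kl(mb, Sb, ma, Sa)
```

A distance like this can be turned into an SVM kernel, for example via k = exp(-gamma * d), which is one common way to feed such distances to an exemplar-based classifier.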


Journal of New Music Research | 2008

A Web-Based Game for Collecting Music Metadata

Michael I. Mandel; Daniel P. W. Ellis

We have designed a web-based game, MajorMiner, that makes collecting descriptions of musical excerpts fun, easy, useful, and objective. Participants describe 10 second clips of songs and score points when their descriptions match those of other participants. The rules were designed to encourage players to be thorough and the clip length was chosen to make judgments objective and specific. To analyse the data, we measured the degree to which binary classifiers could be trained to spot popular tags. We also compared the performance of clip classifiers trained with MajorMiner's tag data to those trained with social tag data from a popular website. On the top 25 tags from each source, MajorMiner's tags were classified correctly 67.2% of the time, while the social tags were classified correctly 62.6% of the time.
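The match-based scoring the rules describe can be sketched as follows. This is a hypothetical simplification: MajorMiner's actual rules award more points to the first player to use a tag that is later verified, which this sketch does not model.

```python
from collections import Counter

def clip_scores(tags_by_player):
    """Hypothetical match-based scoring: a player earns one point for
    each tag that at least one other player also applied to the clip.

    tags_by_player: dict mapping player name -> set of tags for one clip.
    Returns dict mapping player name -> score.
    """
    counts = Counter(t for tags in tags_by_player.values() for t in tags)
    return {player: sum(1 for t in tags if counts[t] >= 2)
            for player, tags in tags_by_player.items()}
```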


Multimedia Systems | 2006

Support vector machine active learning for music retrieval

Michael I. Mandel; Graham E. Poliner; Daniel P. W. Ellis

Searching and organizing growing digital music collections requires a computational model of music similarity. This paper describes a system for performing flexible music similarity queries using SVM active learning. We evaluated the success of our system by classifying 1210 pop songs according to mood and style (from an online music guide) and by the performing artist. In comparing a number of representations for songs, we found the statistics of mel-frequency cepstral coefficients to perform best in precision-at-20 comparisons. We also show that by choosing training examples intelligently, active learning requires half as many labeled examples to achieve the same accuracy as a standard scheme.
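The "choosing training examples intelligently" step above is classic uncertainty sampling: query the unlabeled song whose feature vector lies closest to the current SVM decision boundary. A minimal sketch, using a tiny Pegasos-style linear SVM as a stand-in for the SVMs in the paper (training details and function names are illustrative):

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Tiny Pegasos-style stochastic sub-gradient linear SVM.
    Labels y must be in {-1, +1}."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (X[i] @ w) < 1:        # margin violated
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:                             # margin satisfied: shrink only
                w = (1 - eta * lam) * w
    return w

def most_uncertain(w, X_pool):
    """Active-learning query: index of the pooled example closest to the
    decision boundary (smallest absolute margin)."""
    return int(np.argmin(np.abs(X_pool @ w)))
```

In an active-learning loop, the returned example is labeled by the user, added to the training set, and the SVM is retrained, which is how fewer labels can reach the same accuracy as random sampling.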


Proceedings of the IEEE | 2008

Active Learning for Interactive Multimedia Retrieval

Thomas S. Huang; Charlie K. Dagli; Shyamsundar Rajaram; Edward Y. Chang; Michael I. Mandel; Graham E. Poliner; Daniel P. W. Ellis

As the first decade of the 21st century comes to a close, growth in multimedia delivery infrastructure and public demand for applications built on this backbone are converging like never before. The push towards reaching truly interactive multimedia technologies becomes stronger as our media consumption paradigms continue to change. In this paper, we profile a technology leading the way in this revolution: active learning. Active learning is a strategy that helps alleviate challenges inherent in multimedia information retrieval through user interaction. We show how active learning is ideally suited for the multimedia information retrieval problem by giving an overview of the paradigm and component technologies used with special attention given to the application scenarios in which these technologies are useful. Finally, we give insight into the future of this growing field and how it fits into the larger context of multimedia information retrieval.


International Conference on Music Information Retrieval (ISMIR) | 2008

Multiple-Instance Learning for Music Information Retrieval

Michael I. Mandel; Daniel P. W. Ellis

Multiple-instance learning algorithms train classifiers from lightly supervised data, i.e. labeled collections of items, rather than labeled items. We compare the multiple-instance learners mi-SVM and MILES on the task of classifying 10-second song clips. These classifiers are trained on tags at the track, album, and artist levels, or granularities, that have been derived from tags at the clip granularity, allowing us to test the effectiveness of the learners at recovering the clip labeling in the training set and predicting the clip labeling for a held-out test set. We find that mi-SVM is better than a control at the recovery task on training clips, with an average classification accuracy as high as 87% over 43 tags; on test clips, it is comparable to the control with an average classification accuracy of up to 68%. MILES performed adequately on the recovery task, but poorly on the test clips.
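The derivation of coarser-granularity labels can be sketched directly: under the standard multiple-instance assumption, a track, album, or artist "bag" is positive for a tag iff at least one clip in it is. This helper is an illustrative assumption, not the paper's code:

```python
def bag_labels(clip_tags, clip_to_bag):
    """Derive bag-level (e.g. album-granularity) tags from clip tags
    under the standard multiple-instance assumption: a bag carries a tag
    iff at least one of its clips does.

    clip_tags: dict clip_id -> set of tags
    clip_to_bag: dict clip_id -> bag_id (e.g. album id)
    Returns dict bag_id -> set of tags.
    """
    bags = {}
    for clip, tags in clip_tags.items():
        bags.setdefault(clip_to_bag[clip], set()).update(tags)
    return bags
```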


Neural Information Processing Systems (NIPS) | 2006

An EM Algorithm for Localizing Multiple Sound Sources in Reverberant Environments

Michael I. Mandel; Daniel P. W. Ellis; Tony Jebara

We present a method for localizing and separating sound sources in stereo recordings that is robust to reverberation and does not make any assumptions about the source statistics. The method consists of a probabilistic model of binaural multi-source recordings and an expectation maximization algorithm for finding the maximum likelihood parameters of that model. These parameters include distributions over delays and assignments of time-frequency regions to sources. We evaluate this method against two comparable algorithms on simulations of simultaneous speech from two or three sources. Our method outperforms the others in anechoic conditions and performs as well as the better of the two in the presence of reverberation.
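The E-step/M-step structure described above can be illustrated with a toy 1-D analogue: model per-time-frequency delay estimates as a mixture of Gaussians, one per source, soft-assign points to sources, then re-estimate each source's delay. This is only a sketch of the EM pattern; the actual model operates on full binaural spectrograms with richer parameters:

```python
import numpy as np

def em_delays(tau, n_src, n_iter=50):
    """Toy 1-D EM over delay estimates tau (one per time-frequency
    point), fit as a mixture of n_src Gaussians.
    Returns (sorted source delays, responsibilities)."""
    mu = np.quantile(tau, np.linspace(0.1, 0.9, n_src))  # spread inits
    sig = np.full(n_src, tau.std() + 1e-6)
    pi = np.full(n_src, 1.0 / n_src)
    for _ in range(n_iter):
        # E-step: responsibility of each source for each point
        ll = (-0.5 * ((tau[:, None] - mu) / sig) ** 2
              - np.log(sig) + np.log(pi))
        ll -= ll.max(axis=1, keepdims=True)
        resp = np.exp(ll)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update each source's delay, spread, and prior
        nk = resp.sum(axis=0) + 1e-12
        mu = resp.T @ tau / nk
        sig = np.sqrt((resp * (tau[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
        pi = nk / len(tau)
    return np.sort(mu), resp
```

The responsibilities play the role of the time-frequency-to-source assignments in the paper: after convergence they can be used as soft masks to separate the mixture.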


IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2008

Cross-correlation of beat-synchronous representations for music similarity

Daniel P. W. Ellis; Courtenay Valentine Cotton; Michael I. Mandel

Systems to predict human judgments of music similarity directly from the audio have generally been based on the global statistics of spectral feature vectors i.e. collapsing any large-scale temporal structure in the data. Based on our work in identifying alternative (cover) versions of pieces, we investigate using direct correlation of beat-synchronous representations of music audio to find segments that are similar not only in feature statistics, but in the relative positioning of those features in tempo-normalized time. Given a large enough search database, good matches by this metric should have very high perceived similarity to query items. We evaluate our system through a listening test in which subjects rated system-generated matches as similar or not similar, and compared results to a more conventional timbral and rhythmic similarity baseline, and to random selections.
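The core operation, correlating beat-synchronous feature matrices over beat lags, can be sketched as follows. The normalization and lag range here are illustrative choices, not the paper's exact settings:

```python
import numpy as np

def beat_sync_xcorr(A, B, max_lag=8):
    """Cross-correlation of two beat-synchronous feature matrices
    (n_features x n_beats) over a range of beat lags, after global
    mean/variance normalization. A high peak means the two pieces match
    beat-for-beat at that relative offset."""
    A = (A - A.mean()) / (A.std() + 1e-9)
    B = (B - B.mean()) / (B.std() + 1e-9)
    scores = []
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = A[:, lag:], B[:, :B.shape[1] - lag]
        else:
            a, b = A[:, :A.shape[1] + lag], B[:, -lag:]
        n = min(a.shape[1], b.shape[1])
        scores.append((a[:, :n] * b[:, :n]).sum() / a[:, :n].size)
    return np.array(scores)
```

Because the features are sampled once per beat, the correlation is computed in tempo-normalized time, which is what lets structurally similar passages line up even at different absolute tempos.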


IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) | 2007

EM Localization and Separation using Interaural Level and Phase Cues

Michael I. Mandel; Daniel P. W. Ellis

We describe a system for localizing and separating multiple sound sources from a reverberant two-channel recording. It consists of a probabilistic model of interaural level and phase differences and an EM algorithm for finding the maximum likelihood parameters of this model. By assigning points in the interaural spectrogram probabilistically to sources with the best-fitting parameters and then estimating the parameters of the sources from the points assigned to them, the system is able to separate and localize more sound sources than there are available channels. It is also able to estimate frequency-dependent level differences of sources in a mixture that correspond well to those measured in isolation. In experiments in simulated anechoic and reverberant environments, the proposed system improved the signal-to-noise ratio of target sources by 2.7 and 3.4 dB more than two comparable algorithms on average.
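The interaural cues the model is built on are straightforward to compute from a two-channel STFT. A minimal sketch (the paper's model additionally fits frequency-dependent parameters and performs the probabilistic assignment described above):

```python
import numpy as np

def interaural_cues(L, R, eps=1e-12):
    """Per-time-frequency interaural cues from complex STFTs of the left
    and right channels: level difference in dB and phase difference
    in radians, wrapped to (-pi, pi]."""
    ratio = (L + eps) / (R + eps)      # eps guards against silent bins
    ild = 20 * np.log10(np.abs(ratio) + eps)
    ipd = np.angle(ratio)
    return ild, ipd
```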


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Evaluating Source Separation Algorithms With Reverberant Speech

Michael I. Mandel; Scott Bressler; Barbara G. Shinn-Cunningham; Daniel P. W. Ellis

This paper examines the performance of several source separation systems on a speech separation task for which human intelligibility has previously been measured. For anechoic mixtures, automatic speech recognition (ASR) performance on the separated signals is quite similar to human performance. In reverberation, however, while signal separation has some benefit for ASR, the results are still far below those of human listeners facing the same task. Performing this same experiment with a number of oracle masks created with a priori knowledge of the separated sources motivates a new objective measure of separation performance, the Direct-path, Early echo, and Reverberation of the Target and Masker (DERTM), which is closely related to the ASR results. This measure indicates that while the non-oracle algorithms successfully reject the direct-path signal from the masking source, they reject less of its reverberation, explaining the disappointing ASR performance.
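The decomposition underlying a measure like DERTM starts from splitting a room impulse response's energy into direct-path, early-echo, and late-reverberation parts for both the target and the masker. The sketch below uses common convention boundaries (a few ms around the direct arrival, 50 ms for early echoes); these are assumptions, not the paper's exact DERTM definition:

```python
import numpy as np

def rir_energy_split(h, fs, direct_ms=5.0, early_ms=50.0):
    """Energy of a room impulse response h (sampled at fs Hz) split into
    (direct-path, early-echo, late-reverberation) segments, with segment
    boundaries measured from the direct arrival (the peak of |h|)."""
    peak = int(np.argmax(np.abs(h)))
    d_end = peak + int(direct_ms * fs / 1000)
    e_end = peak + int(early_ms * fs / 1000)
    en = h ** 2
    return en[:d_end].sum(), en[d_end:e_end].sum(), en[e_end:].sum()
```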


ACM Multimedia | 2011

Contextual tag inference

Michael I. Mandel; Razvan Pascanu; Douglas Eck; Yoshua Bengio; Luca Maria Aiello; Rossano Schifanella; Filippo Menczer

This article examines the use of two kinds of context to improve the results of content-based music taggers: the relationships between tags and between the clips of songs that are tagged. We show that users agree more on tags applied to clips temporally “closer” to one another; that conditional restricted Boltzmann machine models of tags can more accurately predict related tags when they take context into account; and that when training data is “smoothed” using context, support vector machines can better rank these clips according to the original, unsmoothed tags and do this more accurately than three standard multi-label classifiers.
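The "smoothing" of training data with tag context can be sketched with a simple co-occurrence model: mix each clip's tag vector with what related tags predict. This is a much simpler stand-in for the conditional restricted Boltzmann machines used in the paper, offered only to illustrate the idea:

```python
import numpy as np

def smooth_tags(T, alpha=0.3):
    """Smooth a binary clip-by-tag matrix T with tag co-occurrence:
    each clip's tag vector is mixed (weight alpha) with scores that its
    present tags propagate to co-occurring tags.

    T: (n_clips, n_tags) binary matrix. Returns a real-valued matrix.
    """
    C = (T.T @ T).astype(float)                 # tag co-occurrence counts
    np.fill_diagonal(C, 0)                      # ignore self-co-occurrence
    C = C / (C.sum(axis=1, keepdims=True) + 1e-9)  # row-normalize
    return (1 - alpha) * T + alpha * T @ C
```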

Collaboration


Michael I. Mandel's top co-authors:

Douglas Eck (Université de Montréal)
Yoshua Bengio (Université de Montréal)
Hugo Larochelle (Université de Sherbrooke)
Edith Law (University of Waterloo)