Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Matthias Mauch is active.

Publication


Featured research published by Matthias Mauch.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Simultaneous Estimation of Chords and Musical Context From Audio

Matthias Mauch; Simon Dixon

Chord labels provide a concise description of musical harmony. In pop and jazz music, a sequence of chord labels is often the only written record of a song, and forms the basis of so-called lead sheets. We devise a fully automatic method to simultaneously estimate from an audio waveform the chord sequence including bass notes, the metric positions of chords, and the key. The core of the method is a six-layered dynamic Bayesian network, in which the four hidden source layers jointly model metric position, key, chord, and bass pitch class, while the two observed layers model low-level audio features corresponding to bass and treble tonal content. Using 109 different chords our method provides substantially more harmonic detail than previous approaches while maintaining a high level of accuracy. We show that with 71% correctly classified chords our method significantly exceeds the state of the art when tested against manually annotated ground truth transcriptions on the 176 audio tracks from the MIREX 2008 Chord Detection Task. We introduce a measure of segmentation quality and show that bass and meter modeling are especially beneficial for obtaining the correct level of granularity.
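
To give a rough sense of scale for such a layered model: the joint hidden state is the product of the four source layers, so inference must reason over their combinations per frame. The domain sizes below are toy values for illustration only (the actual model uses 109 chords, 24 keys, and so on):

```python
import itertools

# Toy domains standing in for the four hidden source layers; the sizes
# are illustrative, not those of the actual model.
metric_positions = range(4)
keys = ["C major", "G major"]
chords = ["C", "F", "G"]
bass_pitch_classes = ["C", "E", "G"]

# The joint hidden state of a dynamic Bayesian network is the product of
# its source layers; decoding searches over this joint space per frame.
joint_states = list(itertools.product(
    metric_positions, keys, chords, bass_pitch_classes))
```

Even with these tiny toy domains the joint space has 72 states, which is why factored modelling and efficient decoding matter at realistic sizes.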


International Conference on Acoustics, Speech, and Signal Processing | 2014

PYIN: A fundamental frequency estimator using probabilistic threshold distributions

Matthias Mauch; Simon Dixon

We propose the Probabilistic YIN (PYIN) algorithm, a modification of the well-known YIN algorithm for fundamental frequency (F0) estimation. Conventional YIN is a simple yet effective method for frame-wise monophonic F0 estimation and remains one of the most popular methods in this domain. In order to eliminate short-term errors, outputs of frequency estimators are usually post-processed resulting in a smoother pitch track. One shortcoming of YIN is that such post-processing cannot fall back on alternative interpretations of the signal because the method outputs precisely one estimate per frame. To address this problem we modify YIN to output multiple pitch candidates with associated probabilities (PYIN Stage 1). These probabilities arise naturally from a prior distribution on the YIN threshold parameter. We use these probabilities as observations in a hidden Markov model, which is Viterbi-decoded to produce an improved pitch track (PYIN Stage 2). We demonstrate that the combination of Stages 1 and 2 raises recall and precision substantially. The additional computational complexity of PYIN over YIN is low. We make the method freely available online as an open source C++ library for Vamp hosts.
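
The Stage-2 idea of smoothing per-frame candidates with a Viterbi-decoded HMM can be sketched as follows. The candidate probabilities and transition values here are toy numbers, not PYIN's, and the three states stand in for pitch candidates:

```python
import numpy as np

def viterbi(log_obs, log_trans, log_init):
    """Most likely state path given per-frame log-probabilities."""
    n_frames, n_states = log_obs.shape
    delta = log_init + log_obs[0]
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        scores = delta[:, None] + log_trans          # scores[i, j]: from i to j
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(n_states)] + log_obs[t]
    path = np.zeros(n_frames, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(n_frames - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

# Toy example: 3 pitch candidates per frame; frame 2 has a spurious best
# candidate (e.g. an octave error) that smoothing should override.
obs = np.array([[0.7, 0.2, 0.1],
                [0.6, 0.3, 0.1],
                [0.2, 0.2, 0.6],   # short-term error
                [0.7, 0.2, 0.1]])
trans = np.array([[0.90, 0.09, 0.01],
                  [0.09, 0.90, 0.01],
                  [0.05, 0.05, 0.90]])
init = np.full(3, 1 / 3)
path = viterbi(np.log(obs), np.log(trans), np.log(init))
```

Frame-wise argmax would pick the erroneous candidate at frame 2, whereas the decoded path stays on the consistent candidate, which is exactly the benefit of keeping multiple candidates per frame.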


Royal Society Open Science | 2015

The evolution of popular music: USA 1960–2010

Matthias Mauch; Robert M. MacCallum; Mark Levy; Armand M. Leroi

In modern societies, cultural change seems ceaseless. The flux of fashion is especially obvious for popular music. While much has been written about the origin and evolution of pop, most claims about its history are anecdotal rather than scientific in nature. To rectify this, we investigate the US Billboard Hot 100 between 1960 and 2010. Using music information retrieval and text-mining tools, we analyse the musical properties of approximately 17 000 recordings that appeared in the charts and demonstrate quantitative trends in their harmonic and timbral properties. We then use these properties to produce an audio-based classification of musical styles and study the evolution of musical diversity and disparity, testing, and rejecting, several classical theories of cultural change. Finally, we investigate whether pop musical evolution has been gradual or punctuated. We show that, although pop music has evolved continuously, it did so with particular rapidity during three stylistic ‘revolutions’ around 1964, 1983 and 1991. We conclude by discussing how our study points the way to a quantitative science of cultural change.

We investigate the association between musical chords and lyrics by analyzing a large dataset of user-contributed guitar tablatures. Motivated by the idea that the emotional content of chords is reflected in the words used in corresponding lyrics, we analyze associations between lyrics and chord categories. We also examine the usage patterns of chords and lyrics in different musical genres, historical eras, and geographical regions. Our overall results confirm a previously known association between Major chords and positive valence. We also report a wide variation in this association across regions, genres, and eras. Our results suggest the possible existence of different emotional associations for other types of chords.


International Conference on Acoustics, Speech, and Signal Processing | 2010

High precision frequency estimation for harpsichord tuning classification

Dan Tidhar; Matthias Mauch; Simon Dixon

We present a novel music signal processing task of classifying the tuning of a harpsichord from audio recordings of standard musical works. We report the results of a classification experiment involving six different temperaments, using real harpsichord recordings as well as synthesised audio data. We introduce the concept of conservative transcription, and show that existing high-precision pitch estimation techniques are sufficient for our task if combined with conservative transcription. In particular, using the CQIFFT algorithm with conservative transcription and removal of short duration notes, we are able to distinguish between 6 different temperaments of harpsichord recordings with 96% accuracy (100% for synthetic data).
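
High-precision frequency estimation of this kind refines the coarse resolution of FFT bins. A common textbook technique for this, not necessarily the CQIFFT variant used in the paper, is parabolic interpolation of the log-magnitude spectrum around a peak:

```python
import numpy as np

fs, n = 44100, 4096
true_f = 440.7                               # deliberately not on a bin centre
t = np.arange(n) / fs
x = np.sin(2 * np.pi * true_f * t) * np.hanning(n)

mag = np.abs(np.fft.rfft(x))
k = int(np.argmax(mag))
# Fit a parabola through the log-magnitude of the peak bin and its
# neighbours; its vertex gives a sub-bin frequency offset.
a, b, c = np.log(mag[k - 1 : k + 2])
offset = 0.5 * (a - c) / (a - 2 * b + c)
f_est = (k + offset) * fs / n
```

The bin spacing here is about 10.8 Hz, yet the interpolated estimate lands well within 1 Hz of the true frequency, illustrating why sub-bin refinement is a prerequisite for distinguishing temperaments that differ by only a few cents.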


Computer Music Modeling and Retrieval | 2010

Probabilistic and logic-based modelling of harmony

Simon Dixon; Matthias Mauch; Amélie Anglade

Many computational models of music fail to capture essential aspects of the high-level musical structure and context, and this limits their usefulness, particularly for musically informed users. We describe two recent approaches to modelling musical harmony, using a probabilistic and a logic-based framework respectively, which attempt to reduce the gap between computational models and human understanding of music. The first is a chord transcription system which uses a high-level model of musical context in which chord, key, metrical position, bass note, chroma features and repetition structure are integrated in a Bayesian framework, achieving state-of-the-art performance. The second approach uses inductive logic programming to learn logical descriptions of harmonic sequences which characterise particular styles or genres. Each approach brings us one step closer to modelling music in the way it is conceptualised by musicians.


Journal of the Acoustical Society of America | 2014

Intonation in unaccompanied singing: Accuracy, drift, and a model of reference pitch memory

Matthias Mauch; Klaus Frieler; Simon Dixon

This paper presents a study on intonation and intonation drift in unaccompanied singing, and proposes a simple model of reference pitch memory that accounts for many of the effects observed. Singing experiments were conducted with 24 singers of varying ability under three conditions (Normal, Masked, Imagined). Over the duration of a recording, ∼50 s, a median absolute intonation drift of 11 cents was observed. While smaller than the median note error (19 cents), drift was significant in 22% of recordings. Drift magnitude did not correlate with other measures of singing accuracy, singing experience, or the presence of conditions tested. Furthermore, it is shown that neither a static intonation memory model nor a memoryless interval-based intonation model can account for the accuracy and drift behavior observed. The proposed causal model provides a better explanation as it treats the reference pitch as a changing latent variable.
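
The notion of the reference pitch as a changing latent variable can be illustrated with a toy generative sketch. The drift and noise magnitudes below are illustrative placeholders; the paper's model is fitted to recorded data:

```python
import numpy as np

def simulate_singing(intervals_cents, drift_sd, noise_sd, rng):
    """Each sung note = drifting latent reference + intended interval + noise.

    The reference pitch follows a random walk, so overall intonation can
    drift even when individual intervals are produced accurately.
    """
    n = len(intervals_cents)
    reference = np.cumsum(rng.normal(0.0, drift_sd, size=n))
    noise = rng.normal(0.0, noise_sd, size=n)
    return reference + np.asarray(intervals_cents, dtype=float) + noise

rng = np.random.default_rng(0)
notes = simulate_singing([0, 200, 400, 200, 0],
                         drift_sd=3.0, noise_sd=19.0, rng=rng)
```

Setting both standard deviations to zero recovers the intended intervals exactly, which separates the two error sources the study distinguishes: per-note inaccuracy versus slow drift of the reference.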


Journal of the Acoustical Society of America | 2012

Estimation of harpsichord inharmonicity and temperament from musical recordings

Simon Dixon; Matthias Mauch; Dan Tidhar

The inharmonicity of vibrating strings can easily be estimated from recordings of isolated tones. Likewise, the tuning system (temperament) of a keyboard instrument can be ascertained from isolated tones by estimating the fundamental frequencies corresponding to each key of the instrument. This paper addresses a more difficult problem: the automatic estimation of the inharmonicity and temperament of a harpsichord given only a recording of an unknown musical work. An initial conservative transcription is used to generate a list of note candidates, and high-precision frequency estimation techniques and robust statistics are employed to estimate the inharmonicity and fundamental frequency of each note. These estimates are then matched to a set of known keyboard temperaments, allowing for variation in the tuning reference frequency, in order to obtain the temperament used in the recording. Results indicate that it is possible to obtain inharmonicity estimates and to classify keyboard temperament automatically from audio recordings of standard musical works, to the extent of accurately (96%) distinguishing between six different temperaments commonly used in harpsichord recordings. Although there is an interaction between inharmonicity and temperament, this is shown to be minor relative to the tuning accuracy.
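
String inharmonicity is commonly modelled with a single stiffness coefficient B, under which the k-th partial is progressively sharpened relative to an exact harmonic. The values below are illustrative, not measurements from the paper:

```python
import numpy as np

def partial_freqs(f0, B, n_partials):
    """Stiff-string model: partial k sits at k * f0 * sqrt(1 + B * k**2),
    so a positive inharmonicity B sharpens high partials progressively."""
    k = np.arange(1, n_partials + 1)
    return k * f0 * np.sqrt(1.0 + B * k**2)

harmonic = partial_freqs(100.0, 0.0, 10)    # B = 0: exact harmonics
stiff = partial_freqs(100.0, 5e-4, 10)      # small B: sharpened partials
```

Estimating inharmonicity from a recording amounts to inverting this relation: given measured partial frequencies for a note, find the B and f0 that best explain them, which is where the robust statistics mentioned above come in.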


IEEE Transactions on Audio, Speech, and Language Processing | 2014

Sequential complexity as a descriptor for musical similarity

Peter Foster; Matthias Mauch; Simon Dixon

We propose string compressibility as a descriptor of temporal structure in audio, for the purpose of determining musical similarity. Our descriptors are based on computing track-wise compression rates of quantized audio features, using multiple temporal resolutions and quantization granularities. To verify that our descriptors capture musically relevant information, we incorporate our descriptors into similarity rating prediction and song year prediction tasks. We base our evaluation on a dataset of 15 500 track excerpts of Western popular music, for which we obtain 7 800 web-sourced pairwise similarity ratings. To assess the agreement among similarity ratings, we perform an evaluation under controlled conditions, obtaining a rank correlation of 0.33 between intersected sets of ratings. Combined with bag-of-features descriptors, we obtain performance gains of 31.1% and 10.9% for similarity rating prediction and song year prediction. For both tasks, analysis of selected descriptors reveals that representing features at multiple time scales benefits prediction accuracy.
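
The core descriptor, a track-wise compression rate of a quantized feature sequence, can be sketched with a general-purpose compressor. This is a minimal stdlib illustration of the idea, not the paper's exact feature pipeline:

```python
import random
import zlib

def compression_rate(symbols):
    """Compressed size / raw size of a quantized feature sequence.

    Lower values indicate more temporal structure (repetition), which is
    the property used here as a similarity descriptor.
    """
    raw = bytes(symbols)
    return len(zlib.compress(raw, 9)) / len(raw)

repetitive = [0, 1, 2, 3] * 250                   # strongly repeating "track"
rng = random.Random(0)
noisy = [rng.randrange(16) for _ in range(1000)]  # structureless sequence
```

A repetitive sequence compresses to a small fraction of its raw size while a structureless one barely compresses, so the rate acts as a proxy for sequential complexity; computing it at several temporal resolutions and quantization granularities yields the multi-scale descriptors described above.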


International Conference on Acoustics, Speech, and Signal Processing | 2015

Modelling the decay of piano sounds

Tian Cheng; Simon Dixon; Matthias Mauch

We investigate piano acoustics and compare the theoretical temporal decay of individual partials to recordings of real-world piano notes from the RWC Music Database. We first describe the theory behind double decay and beats, known phenomena caused by the interaction between strings and soundboard. Then we fit the decay of the first 30 partials to a standard linear model and two physically-motivated non-linear models that take into account the coupling of strings and soundboard. We show that the use of non-linear models provides a better fit to the data. We use these estimated decay rates to parameterise the characteristic decay response (decay rates along frequencies) of the piano under investigation. The results also show that dynamics have no significant effect on the decay rate.
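
The limitation of a linear (constant dB/s) decay model can be seen on a synthetic double-decay envelope. The decay rates and amplitudes below are illustrative, not fitted values from the paper:

```python
import numpy as np

t = np.linspace(0.0, 2.0, 200)
# Synthetic "double decay": two coupled modes decaying at different rates,
# as arises from string-soundboard interaction.
envelope = 1.0 * np.exp(-6.0 * t) + 0.05 * np.exp(-1.0 * t)
level_db = 20.0 * np.log10(envelope)

# A straight line in dB (a single exponential) cannot follow the knee
# where the slowly decaying mode takes over.
slope, intercept = np.polyfit(t, level_db, 1)
residual_db = level_db - (slope * t + intercept)
```

The linear fit leaves residuals of several dB around the knee, which is the qualitative reason the non-linear coupled-decay models fit real piano notes better.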


European Signal Processing Conference | 2015

Improving piano note tracking by HMM smoothing

Tian Cheng; Simon Dixon; Matthias Mauch

In this paper we improve piano note tracking using a hidden Markov model (HMM). We first transcribe piano music based on a non-negative matrix factorisation (NMF) method. For each note four templates are trained to represent the different stages of piano sounds: silence, attack, decay and release. Then a four-state HMM is employed to track notes on the gains of each pitch. We increase the likelihood of staying in silence for low pitches and set a minimum duration to reduce short false-positive notes. For quickly repeated notes, we allow the note state to transition from decay directly back to attack. Experiments on 30 piano pieces from the MAPS dataset show promising results for both frame-wise and note-wise transcription.
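
The four-state topology can be written down directly as a transition matrix. The probability values below are illustrative placeholders, not the trained values from the paper:

```python
import numpy as np

# Per-pitch note states: 0 = silence, 1 = attack, 2 = decay, 3 = release.
# All probabilities are illustrative, chosen only to show the topology.
A = np.array([
    [0.98, 0.02, 0.00, 0.00],  # silence persists -> fewer short false positives
    [0.00, 0.60, 0.40, 0.00],  # attack moves on to decay
    [0.00, 0.05, 0.85, 0.10],  # decay -> attack permits quickly repeated notes
    [0.30, 0.00, 0.00, 0.70],  # release falls back towards silence
])
```

The non-zero decay-to-attack entry is the structural change that lets the tracker restart a note without passing through silence, matching the repeated-note behaviour described above.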

Collaboration


Dive into Matthias Mauch's collaborations.

Top Co-Authors

Simon Dixon, Queen Mary University of London
Tian Cheng, Queen Mary University of London
Masataka Goto, National Institute of Advanced Industrial Science and Technology
Dan Tidhar, University of Cambridge
Hiromasa Fujihara, National Institute of Advanced Industrial Science and Technology
Tomoyasu Nakano, National Institute of Advanced Industrial Science and Technology
Jun Ogata, National Institute of Advanced Industrial Science and Technology