Wilbert Heeringa
University of Groningen
Publication
Featured research published by Wilbert Heeringa.
Language Variation and Change | 2004
Charlotte Gooskens; Wilbert Heeringa
The Levenshtein dialect distance method has proven to be a successful method for measuring phonetic distances between Dutch dialects. The aim of the present investigation is to validate the Levenshtein dialect distance with perceptual data from a language area other than the Dutch, namely Norway. We calculate the correlation between the Levenshtein distances and the distances between 15 Norwegian dialects as judged by Norwegian listeners. We carry out this analysis to see the degree to which the average Levenshtein distances correspond to the psychoacoustic perception of the speakers of the dialects. The present article reports on part of a study supported by NWO, the Netherlands Organization for Scientific Research. We are grateful for the permission from Kristian Skarbo and Jorn Almberg to use their material and for the help of Jorn Almberg during the whole investigation. We thank Saakje van Dellen for her obliging help with the data entry and Peter Kleiweg for letting us use the programs that he developed for the visualization of the maps and dendrograms in this article. Finally, we would like to thank John Nerbonne for valuable comments and for correcting our English.
Language Variation and Change | 2001
Wilbert Heeringa; John Nerbonne
The organising concept behind dialect variation is still seen predominantly as realized by the areas within which similar varieties are spoken. The opposing view, that dialects are organised in a continuum without sharp boundaries, is likewise popular. This paper introduces a new element into this traditional discussion: the opportunity to view dialectal differences in the aggregate. We employ a dialectometric technique which provides an additive measure of pronunciation difference, the (aggregate) pronunciation distance. This allows us to determine how much of the linguistic variation we find is accounted for by geography: between 65% and 81% in our sample of 27 Dutch towns and villages, a fact which lends credence to the continuum view. The borders of well-established dialect areas nonetheless show large deviations from the expected aggregate pronunciation distance. We pay particular attention to a puzzle about the subjective perception of continua introduced by Chambers and Trudgill, who consider a traveller walking in a straight line and noticing successive small changes as he walks from village to village, but seldom, if ever, large differences. This sounds like a justification of the continuum view, but there is an added twist: might the traveller be misled by the perspective of most recent memory? We shall use the Chambers-Trudgill puzzle to organise this paper at several points.
Proceedings of the Workshop on Linguistic Distances | 2006
Wilbert Heeringa; Peter Kleiweg; Charlotte Gooskens; John Nerbonne
We examine various string distance measures for suitability in modeling dialect distance, especially its perception. We find measures superior which do not normalize for word length, but which are sensitive to order. We likewise find evidence for the superiority of measures which incorporate a sensitivity to phonological context, realized in the form of n-grams, although we cannot identify which form of context (bigram, trigram, etc.) is best. However, we find no clear benefit in using gradual as opposed to binary segmental difference when calculating sequence distances.
International Journal of Humanities and Arts Computing | 2008
Charlotte Gooskens; Wilbert Heeringa
In the present investigation, the intelligibility of 17 Scandinavian language varieties and standard Danish was assessed among young Danes from Copenhagen. In addition, distances between standard Danish and each of the 17 varieties were measured at the lexical level and at different phonetic levels. In order to determine how well these linguistic levels can predict intelligibility, we correlated the intelligibility scores with the linguistic distances and we carried out a number of regression analyses. The results show that for this particular set of closely related language varieties phonetic distance is a better predictor of intelligibility than lexical distance. Consonant substitutions, vowel insertions and vowel shortenings contribute significantly to the prediction of intelligibility.
Studies in Classification, Data Analysis, and Knowledge Organization | 2008
John Nerbonne; Peter Kleiweg; Wilbert Heeringa; Franz Manni
Dialectometry produces aggregate DISTANCE MATRICES in which a distance is specified for each pair of sites. By projecting groups obtained by clustering onto geography one compares results with traditional dialectology, which produced maps partitioned into implicitly non-overlapping DIALECT AREAS. The importance of dialect areas has been challenged by proponents of CONTINUA, but they too need to compare their findings to older literature, expressed in terms of areas.
Literary and Linguistic Computing | 2006
Franz Manni; Wilbert Heeringa; John Nerbonne
Since the early studies by Sokal (1988) and Cavalli-Sforza et al. (1989), there has been an increasing interest in depicting the history of human migrations by comparing genetic and linguistic differences that mirror different aspects of human history. Most of the literature concerns continental or macroregional patterns of variation, while regional and microregional scales were investigated less successfully. In this article we concentrate on the Netherlands, an area of only
24th Annual Conference of the German-Classification-Society | 2002
Wilbert Heeringa; John Nerbonne; Peter Kleiweg
The range of dialectometric methods suggests the need for validation work. We propose a gold standard, based on the consensual classification of a well-studied area. Fidelity to the gold standard is assessed via matrix overlap measures (Rand and Fowlkes/Mallows). Word-based techniques in which varieties are compared to each other directly emerge as superior.
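The two overlap measures named above can be illustrated with a minimal sketch. It uses the standard pair-counting definitions of the Rand index and the Fowlkes/Mallows index; the function names and the toy labels are assumptions for illustration, and the paper may use a different variant or implementation.

```python
from itertools import combinations
from math import sqrt


def pair_counts(labels_a, labels_b):
    """Count site pairs by whether each classification groups them together."""
    both_same = a_only = b_only = both_diff = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both_same += 1
        elif same_a:
            a_only += 1
        elif same_b:
            b_only += 1
        else:
            both_diff += 1
    return both_same, a_only, b_only, both_diff


def rand_index(labels_a, labels_b):
    """Fraction of site pairs on which the two classifications agree."""
    both_same, a_only, b_only, both_diff = pair_counts(labels_a, labels_b)
    total = both_same + a_only + b_only + both_diff
    return (both_same + both_diff) / total


def fowlkes_mallows(labels_a, labels_b):
    """Geometric mean of pairwise precision and recall."""
    both_same, a_only, b_only, _ = pair_counts(labels_a, labels_b)
    denominator = sqrt((both_same + a_only) * (both_same + b_only))
    return both_same / denominator if denominator else 0.0


# Hypothetical example: a gold-standard grouping of four sites and a clustering result.
gold = [0, 0, 1, 1]
clustered = [1, 1, 0, 0]  # same partition, different label names
print(rand_index(gold, clustered), fowlkes_mallows(gold, clustered))
```

Both measures compare partitions rather than label names, so a clustering that recovers the gold-standard groups scores 1.0 regardless of how the groups are numbered.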
Computers and The Humanities | 2003
Wilbert Heeringa; Charlotte Gooskens
Gooskens (2003) described an experiment which determined linguistic distances between 15 Norwegian dialects as perceived by Norwegian listeners. The results are compared to Levenshtein distances, calculated on the basis of transcriptions (of the words) of the same recordings as used in the perception experiment. The Levenshtein distance is equal to the sum of the weights of the insertions, deletions and substitutions needed to change one pronunciation into another. The success of the method depends on the reliability of the transcriber. The aim of this paper is to find an acoustic distance measure between dialects which approximates the perceptual distance measure. We use and compare different representations of the acoustic signal: Barkfilter spectrograms, cochleagrams and formant tracks. We now apply the Levenshtein algorithm to spectra or formant value bundles instead of transcription segments. From these acoustic representations we got the best results using the formant track representation. However, the transcription-based Levenshtein distances correlate still more closely. In the acoustic signal the speaker-dependent influence is kept to some extent, while a transcriber abstracts from voice quality. Using more samples per dialect word (instead of only one as in our research) should improve the accuracy of the measurements.
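As a minimal sketch of the definition given in this abstract, the function below computes a weighted Levenshtein distance between two sequences of segments by dynamic programming. The function name, the uniform default weights and the example words are assumptions for illustration; the paper's own segment weights and acoustic variants are not reproduced here.

```python
def levenshtein(source, target, ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
    """Minimum total weight of insertions, deletions and substitutions
    needed to turn `source` into `target` (both sequences of segments)."""
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j segments of `target`.
    previous = [j * ins_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        current = [i * del_cost]
        for j, t in enumerate(target, start=1):
            substitution = previous[j - 1] + (0.0 if s == t else sub_cost)
            deletion = previous[j] + del_cost
            insertion = current[j - 1] + ins_cost
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# Hypothetical example: two pronunciations written as lists of segments.
print(levenshtein(list("melk"), list("molke")))
```

A graded variant would replace the binary `s == t` test with a segment-to-segment distance, which is roughly what applying the algorithm to spectra or formant bundles amounts to.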
Human Biology | 2008
Franz Manni; Wilbert Heeringa; Bruno Toupance; John Nerbonne
Our focus in this paper is the analysis of surnames, which have been proven to be reliable genetic markers because in patrilineal systems they are transmitted along generations virtually unchanged, similarly to a genetic locus on the Y chromosome. We compare the distribution of surnames to the distribution of dialect pronunciations, which are clearly culturally transmitted. Because surnames, at the time of their introduction, were words subject to the same linguistic processes that otherwise result in dialect differences, one might expect their geographic distribution to be correlated with dialect pronunciation differences. In this paper we concentrate on the Netherlands, an area of only 40,000 km², where two official languages are spoken, Dutch and Frisian. We analyze 19,910 different surnames, sampled in 226 locations, and 125 different words, whose pronunciation was recorded in 252 sites. We find that, once the collinear effects of geography on both surname and cultural transmission are taken into account, there is no statistically significant association between the two, suggesting that surnames cannot be taken as a proxy for dialect variation, even though they can be safely used as a proxy for Y-chromosome genetic variation. We find the results historically and geographically insightful, hopefully leading to a deeper understanding of the role that local migrations and cultural diffusion play in surname and dialect diversity.
Speech Communication | 2009
Wilbert Heeringa; Keith Johnson; Charlotte Gooskens
Levenshtein distance has become a popular tool for measuring linguistic dialect distances, and has been applied to Irish Gaelic, Dutch, German and other dialect groups. The method, in the current state of the art, depends upon phonetic transcriptions: even when acoustic differences are used, the number of segments in the transcriptions is used for speech rate normalization. The goal of this paper is to find a fully acoustic measure which approximates the quality of semi-acoustic measures that rely on tagged speech. We use a set of 15 Norwegian dialect recordings and test the hypothesis that the use of the acoustic signal only, without transcriptions, is sufficient for obtaining results which largely agree with both traditional Norwegian dialectology and the perception of the speakers themselves. We use formant trajectories and consider both the Hertz and the Bark scale. We experiment with an approach in which z-scores per frame are used instead of the original frequency values. Besides formant tracks, we also consider zero crossing rates: the number of times per interval that the amplitude waveform crosses the zero line. The zero crossing rate is sensitive to the difference between voiced and unvoiced speech sections. When using the fully acoustic measure on the basis of the combined representation with normalized frequency values, we obtained results comparable with the results obtained with the semi-acoustic measure. We applied cluster analysis and multidimensional scaling to distances obtained with this method and found results which largely agree with both the results of traditional Norwegian dialectology and the perception of the speakers. When scaling to three dimensions, we found the first dimension responsible for gender differences. However, when leaving out this dimension, dialect-specific information is lost as well.
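The zero crossing rate described in this abstract can be sketched as follows. This is a minimal illustration, assuming non-overlapping frames of a fixed number of samples and counting a crossing whenever two adjacent samples within a frame differ in sign; the function name and framing choices are hypothetical and not taken from the paper.

```python
def zero_crossing_counts(samples, frame_length):
    """Count sign changes of the waveform within each non-overlapping frame.

    Crossings between the last sample of one frame and the first sample of
    the next are not counted, and a trailing partial frame is ignored.
    """
    counts = []
    for start in range(0, len(samples) - frame_length + 1, frame_length):
        frame = samples[start:start + frame_length]
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        counts.append(crossings)
    return counts


# Hypothetical signal: a rapidly alternating frame followed by a flat one.
print(zero_crossing_counts([1, -1, 1, -1, 1, 1, 1, 1], frame_length=4))
```

Noisy, unvoiced stretches tend to change sign far more often than voiced ones, which is why the count per interval helps separate the two.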