Publication


Featured research published by William A. Kretzschmar.


Archive | 2009

The linguistics of speech

William A. Kretzschmar

Introduction; 1. The contemporary marketplace of ideas about language; 2. Saussure; 3. Evidence from linguistic survey research: basic description; 4. Statistical evidence from linguistic survey research; 5. Evidence from corpus linguistics; 6. Speech as a complex system; 7. Speech perception; 8. Speech models and applications.


International Journal of Geographic Information Systems | 1993

Spatial analysis of linguistic data with GIS functions

Jay Lee; William A. Kretzschmar

During the 1980s techniques for analysis of geographical patterns have been refined to the point that they may be applied to data from many fields. Quantitative spatial analysis and existing functions available in geographical information systems (GIS) enable computerized implementations of these spatial analysis methods. This paper describes the application of quantitative spatial analysis and GIS functions to analysis of language data, using the extensive files of the Linguistic Atlas of the Middle and South Atlantic States (LAMSAS). A brief review of recent developments in the use of quantitative and statistical methods for analysing linguistic data is also included.


Language | 1996

Introduction to quantitative analysis of linguistic survey data: an atlas by the numbers

William A. Kretzschmar; Edgar W. Schneider

The Research Design of a Linguistic Atlas; The Application of Statistical Tests to LAMSAS; From Atlas to Database Structure; The Statistical Analysis of LAMSAS Data; Model Analyses of LAMSAS Data.


Journal of English Linguistics | 2006

Collaboration on Corpora for Regional and Social Analysis

William A. Kretzschmar; Jean Anderson; Joan C. Beal; Karen P. Corrigan; Lisa Lena Opas-Hänninen; Bartlomiej Plichta

Compilers of corpora that document regional and social languages and varieties of languages have different needs and goals, and yet we also face common problems, and we should have an interest in collaboration. In this paper, we set forth our intention to begin such a collaboration. We begin by exploring the parameters of our various corpora. We then explore issues of access and analysis, whether public or private, whether for general audiences or for specialists. Finally, we assert that it is indeed possible, practical, and desirable for us to apply common methods to our common problems, and we propose specific recommendations.


Literary and Linguistic Computing | 2010

Library collaboration with large digital humanities projects

William A. Kretzschmar; William Gray Potter

The sustainability of digital humanities research projects is a pressing issue for humanities computing. Currently, even well-established large digital projects like the Linguistic Atlas Project (LAP) are at future risk because funding and other resources are contingent on grant funding or faculty status of the director, neither of which will necessarily be available to maintain the project over time. The mission of the university library, however, includes archiving and dissemination, now increasingly of digital materials as well as traditional paper. Collaboration with the university library is the only realistic option for long-term sustainability of digital humanities projects in the current environment. Unlike paper collections, which only require secure storage, digital projects also require the means of adaptation to new electronic media and operating environments. Even data storage requires that materials from digital projects be included in library media refresh cycles, which will include transfer of old data to new media as technology develops. Projects like LAP should provide resources to assist the library in starting the project archive, including staff time, and funding for equipment. Project metadata must be provided and, to the extent possible, integrated with library systems and finding aids. Project staff will also need to maintain a Web presence and tools developed for the project. Such cooperation leads toward the development of a digital institutional repository, in which research results and tools may be maintained in the library, not just for the humanities but across many disciplines.


Journal of English Linguistics | 1989

LAMSAS goes SASsy: Statistical Methods and Linguistic Atlas Data

Edgar W. Schneider; William A. Kretzschmar

This paper provides a brief overview of our recent consideration of the applicability of statistical testing procedures to Linguistic Atlas data in general and to LAMSAS in particular, and will present some preliminary results of our work. Before going into practical details, however, we feel it is important to discuss very briefly some general and theoretical problems in applying statistical machinery to Linguistic Atlas data, some restrictions that the nature of the data may impose upon possible analyses. We can then provide an account of the state of the computerization of LAMSAS, and finally offer a few examples of submitting LAMSAS data to analytical procedures in the R:Base database program and the SAS statistical package. Linguistic atlas data are revealing only insofar as they are representative of the speech of individuals beyond the immediate sample. The same assumption is required in applying statistical methods: a sample is analyzed not for its


Literary and Linguistic Computing | 2013

Scaled measurement of geographic and social speech data

William A. Kretzschmar; Brendan Kretzschmar; Irene M. Brockman

One of the principal signs that speech is a complex system is the nonlinear arrangement of frequencies of variants in linguistic survey data. When the counts are charted by frequency, they form an asymptotic hyperbolic curve (A-curve) at every scale of analysis. The shape of the curve is sensitive to sample size: a small sample is unlikely to show an A-curve. So, too, categorization: too large a number of categories makes the data appear linear because of the small number of tokens in each category, while allowing too few categories, such as the two data points from binary categories, also gives us a line, not a curve. The A-curve can only be observed when the number of categories into which the data are sorted lies between these two extremes. Common practice in dialectology and sociolinguistics has been to establish a small number of possible categories such as phonemes for pronunciation, or to notice only the few most frequently occurring variants and to ignore the rest. Such methods cannot address the underlying complexity of the data. In this essay, we discuss the Gini coefficient, used in economics, as a means to measure optimal nonlinearity. In an experiment where pronunciation data from survey research on the American English vowel system are analyzed in various subsamples, we demonstrate that A-curves do exist in the data in all cases, and we establish parameters for the interaction of sample size and number of categories in the design of valid and reliable experiments.
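
As a rough illustration of the kind of measurement this abstract describes, the sketch below computes a Gini coefficient over the frequency counts of variants. The function name `gini` and the sample counts are hypothetical, and the formula is the standard one for sorted data from economics; it is not necessarily the exact procedure the authors used.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 means a perfectly even spread)."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero count")
    # Standard formula for ascending-sorted data, with ranks i starting at 1:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical variant counts for one survey item: a few common variants
# followed by a long tail of rare ones (the A-curve shape).
a_curve = [120, 45, 20, 9, 5, 3, 2, 1, 1, 1]
# Hypothetical evenly distributed counts, for contrast.
even = [20] * 10

print(round(gini(a_curve), 3))
print(round(gini(even), 3))
```

Under these assumed counts, the strongly skewed distribution yields a coefficient much closer to 1 than the even one, which is the sense in which the coefficient can serve as a single-number summary of nonlinearity.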


Literary and Linguistic Computing | 2006

Art and Science in Computational Dialectology

William A. Kretzschmar

Aristotle long ago divided kinds of study into technē and epistēmē, which we can roughly translate into the modern terms ‘art’ and ‘science’. It is certainly the case that computational dialectologists do well with the Art (technē), in our technical construction and execution of statistical experiments, and we have two different prominent models to choose from, each one corresponding to a mode of scientific discovery, either to deductive or to inductive scientific procedure. But that in itself should not be the whole story. The Science (epistēmē) of computational dialectology lies in the creation of arguments from our statistical results that are appropriate to the scientific procedure that motivates us. It is not so clear that computational dialectologists have done so well with their Science. What do the results of the technical work really mean? In what way are they associated with particular choices of linguistic theory? Is it the case that, after all of our technical hard work, we find only what we are looking for? In this paper, I will suggest that an appropriate use of the technical results of computational dialectology requires that practitioners take a more subtle approach to the theory that motivates the study in the first place, especially to the relationship between perception and production of language.


Journal of English Linguistics | 2004

Looking for the Smoking Gun: Principled Sampling in Creating the Tobacco Industry Documents Corpus

William A. Kretzschmar; Clayton Darwin; Cati Brown; Donald L. Rubin; Douglas Biber

As a result of litigation over the past decade, major tobacco companies were compelled to make public a broad range of previously confidential documents. We have created a series of corpora from the tobacco industry documents (TIDs) for three purposes: (1) to establish baseline descriptions of various linguistic features of this unique set of texts; (2) to identify TIDs in which rhetorical manipulation (“deception”) may have occurred and to estimate the extent and prevalence of manipulation; (3) to analyze manipulation in order to classify it and develop means to identify similar manipulation in other industry document sets. Our three-part corpus creation strategy employed rigorous sampling methods. First, we drew a limited sample from the largest collection of TIDs, to determine a representative classification of text types and to estimate their proportions within the overall body of texts. Then, we created a reference corpus (500,000+ words) constituting a stratified random sample of all TIDs, whether or not they exhibit manipulation. Finally, we compiled a corpus of texts presumed to exhibit rhetorical manipulation. We assumed that multiple drafts of a text or versions of a text prepared for different audiences constituted rhetorical manipulation. This article presents our experience with the sampling methods utilized in this corpus-building process and our findings regarding text types comprising the reference corpus.


Journal of English Linguistics | 1996

Mapping with Numbers

Deanna Light; William A. Kretzschmar

Light (1992) presented a paper at the Modern Language Association (MLA) Present-Day English session titled “Quantitative Analysis of Areal Linguistic Data.” In that paper, she described a new statistical method for the analysis of data from the Linguistic Atlas of the Middle and South Atlantic States (LAMSAS). Since that time, we have made several refinements in the method, and have run a considerable number of files from the LAMSAS corpus through the statistical procedure. We would like to share some of those refinements and results.

Collaboration


Dive into William A. Kretzschmar's collaborations.

Top Co-Authors

Robert A. Cloutier

Tennessee Technological University
