Publication


Featured research published by Greg Kochanski.


Journal of the Acoustical Society of America | 2005

Loudness predicts prominence: Fundamental frequency lends little

Greg Kochanski; Esther Grabe; John Coleman; Burton S. Rosner

We explored a database covering seven dialects of British and Irish English and three different styles of speech to find acoustic correlates of prominence. We built classifiers, trained the classifiers on human prominence/nonprominence judgments, and then evaluated how well they behaved. The classifiers operate on 452 ms windows centered on syllables, using different acoustic measures. By comparing the performance of classifiers based on different measures, we can learn how prominence is expressed in speech. Contrary to textbooks and common assumption, fundamental frequency (f0) played a minor role in distinguishing prominent syllables from the rest of the utterance. Instead, speakers primarily marked prominence with patterns of loudness and duration. Two other acoustic measures that we examined also played a minor role, comparable to f0. All dialects and speaking styles studied here share a common definition of prominence. The result is robust to differences in labeling practice and the dialect of the labeler.


The Astrophysical Journal | 1998

Detailed Mass Map of CL 0024+1654 from Strong Lensing

J. Anthony Tyson; Greg Kochanski; Ian P. Dell'Antonio

We construct a high-resolution mass map of the cluster CL 0024+1654 ($z = 0.39$), based on a parametric inversion of the associated gravitational lens. The lens creates eight well-resolved subimages of a background galaxy, seen in deep imaging with the Hubble Space Telescope. Excluding mass concentrations centered on visible galaxies, more than 98% of the remaining mass is represented by a smooth concentration of dark matter centered near the brightest cluster galaxies, with a $35\,h^{-1}$ kpc soft core. The asymmetry in the mass distribution is less than 3% inside a $107\,h^{-1}$ kpc radius. The dark matter distribution we observe in CL 0024 is far more smooth, symmetric, and nonsingular than in typical simulated clusters in either $\Omega = 1$ or $\Omega = 0.3$ cold dark matter cosmologies. Integrated to a $107\,h^{-1}$ kpc radius, the rest-frame mass-to-light ratio is $M/L_V = (276 \pm 40)\,h\,(M/L)_{V,\odot}$.


Speech Communication | 2003

Prosody modeling with soft templates

Greg Kochanski; Chilin Shih

This paper describes a novel prosody generation model. We intend it to broadly support many linguistic theories and multiple languages, for the model imposes no restriction on accent categories and shapes. This capability is crucial to the next generation of text-to-speech systems that will need to synthesize intonation variations for different speech acts, emotions, and styles of speech. The system supports mark-up tags that are mathematically defined and generate f0 deterministically. Underlying the tags is an articulatory model of accent interaction which balances physiological and communication constraints. We specify the model by way of an algorithm for calculating the pitch, and by way of examples. The model allows localized, linguistically reasonable tags, and is suitable for a data-driven fitting process.


Journal of the Acoustical Society of America | 2011

Rhythm measures and dimensions of durational variation in speech

Anastassia Loukina; Greg Kochanski; Burton S. Rosner; Elinor Keane; Chilin Shih

Patterns of durational variation were examined by applying 15 previously published rhythm measures to a large corpus of speech from five languages. In order to achieve consistent segmentation across all languages, an automatic speech recognition system was developed to divide the waveforms into consonantal and vocalic regions. The resulting duration measurements rest strictly on acoustic criteria. Machine classification showed that rhythm measures could separate languages at rates above chance. Within-language variability in rhythm measures, however, was large and comparable to that between languages. Therefore, different languages could not be identified reliably from single paragraphs. In experiments separating pairs of languages, a rhythm measure that was relatively successful at separating one pair often performed very poorly on another pair: there was no broadly successful rhythm measure. Separation of all five languages at once required a combination of three rhythm measures. Many triplets were about equally effective, but the confusion patterns between languages varied with the choice of rhythm measures.
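To make "rhythm measure" concrete, here is a minimal sketch of three commonly cited measures computed from a consonantal/vocalic segmentation: %V (share of total duration that is vocalic), ΔC (standard deviation of consonantal interval durations) and nPVI (normalized pairwise variability index). The abstract does not say which 15 measures were used or how they were implemented, so treat these definitions, the function names and the `(label, duration)` input format as illustrative assumptions rather than the paper's code.

```python
from statistics import pstdev

# Hypothetical input format: a list of (label, duration_in_seconds) pairs,
# where label is "V" (vocalic interval) or "C" (consonantal interval).

def percent_v(intervals):
    """%V: percentage of total duration that falls in vocalic intervals."""
    total = sum(d for _, d in intervals)
    vocalic = sum(d for label, d in intervals if label == "V")
    return 100.0 * vocalic / total

def delta_c(intervals):
    """Delta-C: standard deviation (population form here) of consonantal durations."""
    return pstdev([d for label, d in intervals if label == "C"])

def npvi(durations):
    """nPVI: mean normalized difference between successive interval durations, x100."""
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100.0 * sum(terms) / len(terms)

# Usage on a made-up segmentation (not data from the paper):
intervals = [("C", 0.08), ("V", 0.12), ("C", 0.10), ("V", 0.06), ("C", 0.05), ("V", 0.15)]
vowels = [d for label, d in intervals if label == "V"]
print(percent_v(intervals), delta_c(intervals), npvi(vowels))
```

Each function returns a single number per stretch of speech; a classifier like the one described above would take a small vector of such numbers per paragraph as its features.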


Speech Communication | 2003

Quantitative measurement of prosodic strength in Mandarin

Greg Kochanski; Chilin Shih; Hongyan Jing

We describe models of Mandarin prosody that allow us to make quantitative measurements of prosodic strengths. These models use Stem-ML, which is a phenomenological model of the muscle dynamics and planning process that controls the tension of the vocal folds, and therefore the pitch of speech. Because Stem-ML describes the interactions between nearby tones, we were able to capture surface tonal variations using a highly constrained model with only one template for each lexical tone category, and a single prosodic strength per word. The model accurately reproduces the intonation of the speaker, capturing 87% of the variance of f0 with these strength parameters. The result reveals alternating metrical patterns in words, and shows that the speaker marks a hierarchy of boundaries by controlling the prosodic strength of words. The strengths we obtain are also correlated with syllable duration, mutual information and part-of-speech.


Language and Speech | 2007

Connecting Intonation Labels to Mathematical Descriptions of Fundamental Frequency

Esther Grabe; Greg Kochanski; John Coleman

The mathematical models of intonation used in speech technology are often inaccessible to linguists. By the same token, phonological descriptions of intonation are rarely used by speech technologists, as they cannot be implemented directly in applications. Consequently, these research communities do not benefit much from each other's insights. In this paper, we explore the interface between the disciplines, in search of bridges between intonational phonology and speech technology. In a corpus of speech data from seven dialects of English, we hand-labeled over 700 sentences and identified seven nuclear accent types. Then we fitted a third-order polynomial to the fundamental frequency (F0) contour in the region around the accent mark. The polynomial captures the local shape (time-dependence) of F0 in a few numbers, in our case, four coefficients. The coefficients were subjected to statistical analysis. Nineteen of the 21 pairs of accent types differed significantly in one or more coefficients. Our approach bridges the gap between intonational phonology and speech technology. It provides quantitative, empirically testable models of intonation labels that can be implemented in applications.
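As an illustration of how a third-order polynomial reduces a local F0 contour to four numbers, the sketch below fits a cubic by least squares to the voiced frames within a window around an accent mark. The window half-width (`half_width=0.25` s), the rescaling of time to [-1, 1], the treatment of unvoiced frames as `f0 <= 0`, and the plain power basis from `numpy.polyfit` are all assumptions for illustration; the abstract does not specify the window, the time normalization, or the polynomial basis the authors used.

```python
import numpy as np

def f0_shape_coefficients(t, f0, accent_time, half_width=0.25):
    """Fit a cubic to F0 within +/- half_width seconds of an accent mark.

    Time is rescaled to [-1, 1] so coefficients are comparable across accents.
    Returns 4 coefficients, constant term first (c0, c1, c2, c3).
    """
    t = np.asarray(t, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    # Keep only voiced frames (f0 > 0) that fall inside the analysis window.
    mask = (np.abs(t - accent_time) <= half_width) & (f0 > 0)
    if mask.sum() < 4:
        raise ValueError("need at least 4 voiced frames to fit a cubic")
    x = (t[mask] - accent_time) / half_width
    coeffs = np.polyfit(x, f0[mask], deg=3)  # returned highest power first
    return coeffs[::-1]

# Usage on a synthetic contour (hypothetical data, 5 ms frames):
t = np.linspace(0.0, 1.0, 201)
x = (t - 0.5) / 0.25
f0 = 200 + 10 * x - 5 * x**2 + 2 * x**3
print(f0_shape_coefficients(t, f0, accent_time=0.5))
```

Rescaling time keeps the coefficients on a common scale regardless of where the accent falls in the utterance, which is what makes them suitable inputs for the kind of pairwise statistical comparison between accent types described above.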


Journal of the Acoustical Society of America | 2008

What marks the beat of speech

Greg Kochanski; Christina Orphanidou

Which acoustic properties of the speech signal differ between rhythmically prominent syllables and non-prominent ones? A production experiment was conducted to identify these acoustic properties. Subjects read out repetitive text to a metronome, trying to match stressed syllables to its beat. The analysis searched for the function of the speech signal that best predicts the timing of the metronome ticks. The most important factor in this function is found to be the contrast in loudness between a syllable and its neighbors. The prominence of a syllable can be deduced from the specific loudness in an (approximately) 360 ms window centered on the syllable in question relative to an (approximately) 800-ms-wide symmetric window.
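A simplified version of the loudness-contrast idea can be written as the mean loudness in a narrow window centered on a syllable divided by the mean in a wider symmetric window around it. The sketch below assumes a per-frame loudness contour (for example in sones) at a fixed frame rate, rectangular windows, and a ratio rather than a difference; the window widths default to the approximate 360 ms and 800 ms values quoted above, but the function itself is a hypothetical illustration, not the predictor fitted in the paper.

```python
def loudness_contrast(loudness, frame_s, center_s, inner_s=0.36, outer_s=0.80):
    """Ratio of mean loudness in an inner window to that in a wider outer window.

    loudness: per-frame loudness values sampled every frame_s seconds.
    center_s: syllable center in seconds. Both windows are symmetric about it
    and are clipped at the edges of the contour.
    """
    def window_mean(width):
        lo = max(0, round((center_s - width / 2) / frame_s))
        hi = min(len(loudness), round((center_s + width / 2) / frame_s) + 1)
        segment = loudness[lo:hi]
        return sum(segment) / len(segment)

    return window_mean(inner_s) / window_mean(outer_s)

# Usage on a made-up contour: 1 s at 10 ms frames, with a louder stretch
# from 0.40 s to 0.60 s standing in for a prominent syllable.
contour = [1.0] * 100
for i in range(40, 61):
    contour[i] = 3.0
print(loudness_contrast(contour, frame_s=0.01, center_s=0.5))  # > 1: louder than its neighbors
```

A value above 1 means the syllable is louder than its surroundings; using a ratio makes the measure insensitive to overall recording level, though that choice is an assumption here.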


International Journal of Speech Technology | 2003

Hierarchical Structure and Word Strength Prediction of Mandarin Prosody

Greg Kochanski; Chilin Shih; Hongyan Jing

We use Stem-ML to build an automatic learning system for Mandarin prosody that allows us to make quantitative measurements of prosodic strengths. Stem-ML is a phenomenological model of the muscle dynamics and planning process that controls the tension of the vocal folds. Because Stem-ML describes the interactions between nearby tones or accents, we were able to use a highly constrained model with only one accent template for each lexical tone category, and a single prosodic strength per word. The model accurately reproduces the intonation of the speaker, capturing 87% of the variance of the speech's fundamental frequency, f0. The result reveals strong alternating metrical patterns in words, and suggests that the speaker uses word strength to mark a hierarchy of sentence, clause, phrase, and word boundaries.


Surface Science | 1992

STM measurements of photovoltage on Si(111) and Si(111):Ge

Greg Kochanski; R.F. Bell

Photovoltages were measured with a tunneling microscope for Si(111)7 × 7 and Si(111):Ge5 × 5 surfaces. Over 100 Å images, the photovoltages were uniform within 10 mV, despite missing surface atoms or adsorbed residual gas atoms. Cross-correlation between photovoltage and topograph images shows that the atomically varying part of the photovoltage is smaller than 0.1 mV. The average photovoltages were large and consistent with the band structure of the surfaces. We discuss error sources in photovoltage measurements and differences with previous publications.


Literary and Linguistic Computing | 2007

Tools for Searching, Annotation and Analysis of Speech, Music, Film and Video—A Survey

Alan Marsden; Adrian Mackenzie; Adam T. Lindsay; Harriet Nock; John Coleman; Greg Kochanski

This article examines the actual and potential use of software tools in research in the arts and humanities focusing on audiovisual (AV) materials such as recorded speech, music, video and film. The quantity of such materials available to researchers is massive and rapidly expanding. Researchers need to locate the material of interest in the vast quantity available, and to organize and process the material once collected. Locating and organizing often depend on metadata and tags to describe the actual content, but standards for metadata for AV materials are not widely adopted. Content-based search is becoming possible for speech, but is still beyond the horizon for music, and even more distant for video. Copyright protection hampers research with AV materials, and Digital Rights Management (DRM) systems threaten to prevent research altogether. Once material has been located and accessed, much research proceeds by annotation, for which many tools exist. Many researchers make some kind of transcription of materials, and would value tools to automate this process. Such tools exist for speech, though with important limits to their accuracy and applicability. For music and video, researchers can make use of visualizations. A better understanding (in general terms) by researchers of the processes carried out by computer software and of the limitations of its results would lead to more effective use of Information and Communications Technology (ICT).
