
Publication


Featured research published by Fred Cummins.


Neural Computation | 2000

Learning to Forget: Continual Prediction with LSTM

Felix A. Gers; Jürgen Schmidhuber; Fred Cummins

Long short-term memory (LSTM; Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset. Without resets, the state may grow indefinitely and eventually cause the network to break down. Our remedy is a novel, adaptive forget gate that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources. We review illustrative benchmark problems on which standard LSTM outperforms other RNN algorithms. All algorithms (including LSTM) fail to solve continual versions of these problems. LSTM with forget gates, however, easily solves them, and in an elegant way.
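
The state-growth problem described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a single scalar cell, an input gate fixed at 1, and a forget gate held at a constant value, whereas the paper's gate is adaptive and learned. The names `cell_step`, `run`, and `forget_bias` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, the usual squashing function for gate activations."""
    return 1.0 / (1.0 + math.exp(-x))


def cell_step(c_prev, x, forget_gate=None, input_gate=1.0):
    """One update of a single cell state.

    forget_gate=None mimics the original LSTM, where the old state is
    carried over unchanged and new input is only ever added to it.
    Otherwise the old state is scaled by a gate activation in (0, 1).
    """
    f = 1.0 if forget_gate is None else forget_gate
    return f * c_prev + input_gate * math.tanh(x)


def run(stream, forget_bias=None):
    """Feed a continual, unsegmented stream through the cell."""
    c = 0.0
    for x in stream:
        # Assumption: a constant gate; in the paper it is learned and input-dependent.
        f = None if forget_bias is None else sigmoid(forget_bias)
        c = cell_step(c, x, forget_gate=f)
    return c


stream = [1.0] * 1000                        # never reset, never segmented
no_forget = run(stream)                      # grows linearly with stream length
with_forget = run(stream, forget_bias=2.0)   # settles at a bounded value
```

Under these assumptions the ungated state keeps accumulating `tanh(1.0)` at every step, while the gated state converges to `tanh(1.0) / (1 - f)` with `f = sigmoid(2.0)`. A learned gate could additionally drive `f` towards 0 to reset the cell at subsequence boundaries.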


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Speaker Identification Using Instantaneous Frequencies

Marco Grimaldi; Fred Cummins

This paper presents an experimental evaluation of different features for use in speaker identification. The features are tested using speech data provided by the CHAINS corpus, in a closed-set speaker identification task. The main objective of the paper is to present a novel parametrization of speech that is based on the AM-FM representation of the speech signal and to assess the utility of these features in the context of speaker identification. In order to explore the extent to which different instantaneous frequencies due to the presence of formants and harmonics in the speech signal may predict a speaker's identity, this work evaluates three different decompositions of the speech signal within the same AM-FM framework: a first setup has been used previously for formant tracking, a second setup is designed to enhance familiar resonances below 4000 Hz, and a third setup is designed to approximate the bandwidth scaling of the filters conventionally used in the extraction of mel-frequency cepstral coefficients (MFCCs). From each of the proposed setups, parameters are extracted and used in a closed-set text-independent speaker identification task. The performance of the new featural representation is compared with results obtained adopting MFCC and RASTA-PLP features in the context of a generic Gaussian mixture model (GMM) classification system. In evaluating the novel features, we look selectively at information for speaker identification contained in the frequency range 0-4000 Hz and 4000-8000 Hz, as the instantaneous frequencies revealed by the AM-FM approach suggest the presence of structures not well known from conventional spectrographic analyses. The new parametrization is as accurate as conventional MFCC parameters within the same reference system when trained and tested on modally voiced speech that is mismatched in both channel and style. When the testing material is whispered speech, the new parameters provide better results than any of the other features tested, although they remain far from ideal in this limiting case.
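
The closed-set decision rule behind such a system can be sketched as follows: score a test utterance against one model per enrolled speaker and pick the best-scoring speaker. This is a simplification under stated assumptions, not the paper's system. Each speaker is modelled by a single diagonal-covariance Gaussian (a one-component stand-in for a full GMM), and the two-dimensional feature frames, the speaker names, and the function names are all invented for illustration.

```python
import math


def fit_diag_gaussian(frames):
    """Estimate per-dimension mean and variance from a list of feature frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so that a degenerate dimension cannot divide by zero.
    var = [
        max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
        for d in range(dim)
    ]
    return mean, var


def log_likelihood(frames, model):
    """Sum of per-frame log densities under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total


def identify(frames, models):
    """Closed-set rule: the answer is always one of the enrolled speakers."""
    return max(models, key=lambda name: log_likelihood(frames, models[name]))


# Invented toy frames; real features would be MFCC or AM-FM parameters.
train = {
    "speaker_a": [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [1.1, 2.2]],
    "speaker_b": [[3.0, 0.5], [3.2, 0.4], [2.9, 0.7], [3.1, 0.6]],
}
models = {name: fit_diag_gaussian(frames) for name, frames in train.items()}

test_utterance = [[1.05, 1.95], [1.15, 2.05]]
predicted = identify(test_utterance, models)
```

Because the task is closed-set, `identify` never rejects an utterance; comparing feature sets then amounts to swapping the frame representation while keeping this classifier fixed.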


Journal of Phonetics | 2009

Rhythm as entrainment: The case of synchronous speech

Fred Cummins

One view of rhythm, not conventionally adopted in speech research, is that it constitutes an affordance for movement. We test this view in an experimental situation in which speakers speak in synchrony with one another. After first establishing that speakers can synchronize with specific recordings, we present two experiments in which the information in the model speech is systematically reduced, allowing an evaluation of the respective roles of the amplitude envelope, the fundamental frequency and intelligibility in synchronization among speakers. Results demonstrate that synchronization is affected by several factors working together. The amplitude envelope, the pitch contour and the spectral qualities of the signal each contribute to synchronization. Intelligibility is not found to be absolutely necessary to support synchronization. This provides initial support for a dynamic account of synchronization among speakers based on the continuous flow of information between them.


Language and Cognitive Processes | 2011

The temporal relation between beat gestures and speech

Thomas Leonard; Fred Cummins

The temporal relation between beat gestures and accompanying speech is examined in two experiments. In the first, we find that subjects are very quick to spot altered timing between gesture and speech if the gesture is later than normal, but are considerably less sensitive to alterations that result in an earlier gesture. This suggests an asymmetry in the expectation on the part of listeners/watchers and raises immediate questions about which elements within the speech are being perceived as linked to which elements in the gestural series. We therefore examine the variability between several kinematic landmarks in a beat gesture, and three potential anchor points in the accompanying speech. We find the least variable relationship obtains between the point of maximum extension of the gesture and the accompanying pitch accent. Together, these findings contribute to our understanding of both the production and perception of beat gestures along with speech, and support an account of speech communication as a strongly embodied activity.
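
The variability analysis mentioned in this abstract can be pictured as measuring, for every pairing of a gesture landmark with a speech anchor, how much the time offset between them varies across tokens, then selecting the pairing with the smallest spread. The sketch below is one plausible way to do that, not the authors' procedure. The landmark names other than maximum extension and pitch accent are assumptions, and the times are invented solely so the code has something to run on; they are not data from the study.

```python
from itertools import product
from statistics import pstdev

# Invented event times in seconds, one entry per gesture token (NOT study data).
gesture = {
    "gesture_onset": [0.10, 0.32, 0.18, 0.41],
    "max_extension": [0.52, 0.61, 0.55, 0.70],
}
speech = {
    "vowel_onset": [0.40, 0.58, 0.35, 0.66],
    "pitch_accent": [0.50, 0.60, 0.53, 0.69],
}


def offset_spread(landmark_times, anchor_times):
    """Standard deviation of the landmark-minus-anchor offset across tokens."""
    offsets = [g - s for g, s in zip(landmark_times, anchor_times)]
    return pstdev(offsets)


# Spread for every (gesture landmark, speech anchor) pairing.
spreads = {
    (g, s): offset_spread(gesture[g], speech[s])
    for g, s in product(gesture, speech)
}

# The pairing with the most stable temporal relationship.
tightest_pair = min(spreads, key=spreads.get)
```

A small spread means the two events keep a stable distance in time even if that distance is not zero, which is why the measure uses the standard deviation of the offset rather than its mean.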


Phonetica | 2009

Rhythm as an affordance for the entrainment of movement

Fred Cummins

A general account of rhythm in human behaviour is provided, according to which rhythm inheres in the affordance that a signal provides for the entrainment of movement on the part of a perceiver. This generic account is supported by an explication of the central concepts of affordance and entrainment. When viewed in this light, rhythm appears as the correct explanandum to account for coordinated behaviour in a wide variety of situations, including such core senses as dance and the production of music. Speech may appear to be only marginally rhythmical under such an account, but several experimental studies reveal that speech, too, has the potential to entrain movement.


North American Chapter of the Association for Computational Linguistics | 2004

UI on the Fly: generating a multimodal user interface

David Reitter; Erin Marie Panttaja; Fred Cummins

UI on the Fly is a system that dynamically presents coordinated multimodal content through natural language and a small-screen graphical user interface. It adapts to the user's preferences and situation. Multimodal Functional Unification Grammar (MUG) is a unification-based formalism that uses rules to generate content that is coordinated across several communication modes. Faithful variants are scored with a heuristic function.


Language and Cognitive Processes | 2012

Gaze and blinking in dyadic conversation: A study in coordinated behaviour among individuals

Fred Cummins

Face-to-face conversation necessarily involves a great deal of bodily movement beyond that required for speaking. We seek to understand the systematic variation of such para-linguistic activity as a function of the ebb and flow of conversation. Gaze and blinking in dyadic conversation are examined, along with their relation to speech turn. Eight pairs provide 15 minutes of conversation each, including five participants who partake in two dyads each. This facilitates a thorough examination of the rich covariation of gaze and blinking both within an individual and as a function of the dyad. Many aspects of systematic variation are found to be relatively invariant within the individual, but individuals display large qualitative differences, one from the other.


Acoustics Research Letters Online (ARLO) | 2002

On synchronous speech

Fred Cummins

Synchronous speech is speech elicited by asking speakers to read a text in synchrony. The present study investigates the timing characteristics of speech obtained under such circumstances. In a main experiment, subjects read a text alone, with a recording of another speaker, or with another live speaker. The last condition produces a much higher degree of synchrony, even at the left edges of phrases following a pause. Subjects display a high level of agreement in pause placement in the synchronous condition, but add pauses idiosyncratically when reading alone. A small second experiment fails to uncover the informational basis of this synchrony, because some subjects can achieve similar synchrony with a recording of synchronous speech, whereas others appear to require a live speaker. Speech that has been modified in this manner is of immediate interest because it seems to express speakers' attempts to produce maximally predictable speech.


Frontiers in Human Neuroscience | 2011

Periodic and aperiodic synchronization in skilled action

Fred Cummins

Synchronized action is considered as a manifestation of shared skill. Most synchronized behaviors in humans and other animals are based on periodic repetition. Aperiodic synchronization of complex action is found in the experimental task of synchronous speaking, in which naive subjects read a common text in lock step. The demonstration of synchronized behavior without a periodic basis is presented as a challenge for theoretical understanding. A unified treatment of periodic and aperiodic synchronization is suggested by replacing the sequential processing model of cognitivist approaches with the more local notion of a task-specific sensorimotor coordination. On this view, skilled action is the imposition of constraints on the co-variation of movement and sensory flux such that the boundary conditions that define the skill are met. This non-cognitivist approach originates in the work of John Dewey. It allows a unification of the treatment of sensorimotor synchronization in simple rhythmic behavior and in complex skilled behavior and it suggests that skill sharing is a uniquely human trait of considerable import.


Psychological Review | 2010

Embodied task dynamics

Juraj Simko; Fred Cummins

Movement science faces the challenge of reconciling parallel sequences of discrete behavioral goals with observed fluid, context-sensitive motion. This challenge arises with a vengeance in the speech domain, in which gestural primitives play the role of discrete goals. The task dynamic framework has proved effective in modeling the manner in which the gestural primitives of articulatory phonology can result in smooth, biologically plausible movement of model articulators. We present a variant of the task dynamic model with one significant innovation: Tasks are not abstract and context free but are embodied and tied to specific effectors. An advantage of this approach is that it allows the definition of a parametric cost function that can be optimized. Optimization generates gestural scores in which the relative timing of gestures is fully specified. We demonstrate that movements generated in an optimal manner are phonetically plausible. Highly nuanced movement trajectories are emergent based on relatively simple optimality criteria. This addresses a long-standing need within this theoretical framework and provides a rich modeling foundation for subsequent work.

Collaboration


Fred Cummins's most frequent collaborators.

Top Co-Authors

Robert F. Port, Indiana University Bloomington
Felix A. Gers, Dalle Molle Institute for Artificial Intelligence Research
Juraj Simko, University College Dublin
Nuala Brady, University College Dublin
Stuart Jackson, University College Dublin
Jürgen Schmidhuber, Dalle Molle Institute for Artificial Intelligence Research
Marco Grimaldi, University College Dublin
Sophie K. Scott, University College London
Anna Esposito, Seconda Università degli Studi di Napoli
Georgios Papadelis, Aristotle University of Thessaloniki