Christian Kroos
University of Western Sydney
Publication
Featured research published by Christian Kroos.
Journal of Cognitive Neuroscience | 2004
Jeffery A. Jones; Kevin G. Munhall; Christian Kroos; Akiko Callan; Eric Vatikiotis-Bateson
Perception of speech is improved when presentation of the audio signal is accompanied by concordant visual speech gesture information. This enhancement is most prevalent when the audio signal is degraded. One potential means by which the brain affords perceptual enhancement is thought to be through the integration of concordant information from multiple sensory channels in a common site of convergence, multisensory integration (MSI) sites. Some studies have identified potential sites in the superior temporal gyrus/sulcus (STG/S) that are responsive to multisensory information from the auditory speech signal and visual speech movement. One limitation of these studies is that they do not control for activity resulting from attentional modulation cued by such things as visual information signaling the onsets and offsets of the acoustic speech signal, as well as activity resulting from MSI of properties of the auditory speech signal with aspects of gross visual motion that are not specific to place of articulation information. This fMRI experiment uses spatial wavelet bandpass filtered Japanese sentences presented with background multispeaker audio noise to discern brain activity reflecting MSI induced by auditory and visual correspondence of place of articulation information that controls for activity resulting from the above-mentioned factors. The experiment consists of a low-frequency (LF) filtered condition containing gross visual motion of the lips, jaw, and head without specific place of articulation information, a midfrequency (MF) filtered condition containing place of articulation information, and an unfiltered (UF) condition. Sites of MSI selectively induced by auditory and visual correspondence of place of articulation information were determined by the presence of activity for both the MF and UF conditions relative to the LF condition. Based on these criteria, sites of MSI were found predominantly in the left middle temporal gyrus (MTG), and the left STG/S (including the auditory cortex). By controlling for additional factors that could also induce greater activity resulting from visual motion information, this study identifies potential MSI sites that we believe are involved with improved speech perception intelligibility.
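To make the filtering manipulation concrete, here is a minimal sketch of spatial band-pass filtering a single grey-scale video frame with a 2-D wavelet transform. It assumes PyWavelets and NumPy and uses illustrative level choices; it is not the authors' stimulus-preparation pipeline.

```python
# A minimal sketch (not the authors' pipeline) of spatial band-pass filtering
# a grey-scale video frame with a 2-D wavelet transform, assuming PyWavelets
# and NumPy. Level choices are illustrative only.
import numpy as np
import pywt

def wavelet_bandpass(frame, keep_levels, wavelet="db4", max_level=5):
    """Reconstruct a frame from a chosen subset of wavelet detail levels.

    frame       : 2-D float array (grey-scale image)
    keep_levels : set of detail-level indices to retain (1 = finest)
    """
    coeffs = pywt.wavedec2(frame, wavelet, level=max_level)
    # coeffs[0] is the coarse approximation; coeffs[i] (i >= 1) holds the
    # detail sub-bands, ordered from coarsest to finest.
    filtered = [np.zeros_like(coeffs[0])]          # drop the approximation (low frequencies)
    for i, detail in enumerate(coeffs[1:], start=1):
        level = max_level - i + 1                  # convert list index to detail level
        if level in keep_levels:
            filtered.append(detail)
        else:
            filtered.append(tuple(np.zeros_like(d) for d in detail))
    return pywt.waverec2(filtered, wavelet)

# Hypothetical usage: a "mid-frequency" condition keeping intermediate scales.
frame = np.random.rand(256, 256)                   # stand-in for a video frame
mf_frame = wavelet_bandpass(frame, keep_levels={2, 3})
```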
Cognitive Brain Research | 2001
Akiko Callan; Christian Kroos; Eric Vatikiotis-Bateson
In this single-sweep electroencephalographic case study, independent component analysis (ICA) was used to investigate multimodal processes underlying the enhancement of speech intelligibility in noise (for monosyllabic English words) gained by viewing facial motion concordant with the audio speech signal. Wavelet analysis of the single-sweep IC activation waveforms revealed increased high-frequency energy for two ICs underlying the visual enhancement effect. For one IC, current source density analysis localized activity mainly to the superior temporal gyrus, consistent with principles of multimodal integration. For the other IC, activity was distributed across multiple cortical areas, perhaps reflecting global mappings underlying the visual enhancement effect.
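As an illustration of the decomposition step (not the study's actual pipeline), the sketch below runs FastICA from scikit-learn on a synthetic single-sweep, multi-channel EEG array; the channel count and sweep length are assumptions.

```python
# A minimal sketch of decomposing one multi-channel EEG sweep into independent
# components with FastICA (scikit-learn). Shapes and channel count are assumed.
import numpy as np
from sklearn.decomposition import FastICA

n_channels, n_samples = 64, 2000                 # hypothetical montage and sweep length
eeg = np.random.randn(n_samples, n_channels)     # stand-in for one sweep (samples x channels)

ica = FastICA(n_components=20, random_state=0, max_iter=1000)
activations = ica.fit_transform(eeg)             # (n_samples, n_components) IC time courses
mixing = ica.mixing_                             # (n_channels, n_components) scalp projections

# Each column of `mixing` is a component's scalp topography; each column of
# `activations` is the corresponding waveform, which could then be submitted
# to a wavelet (time-frequency) analysis as described in the abstract.
```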
Attention, Perception, & Psychophysics | 2004
Kevin G. Munhall; Christian Kroos; G. Jozan; Eric Vatikiotis-Bateson
Spatial frequency band-pass and low-pass filtered images of a talker were used in an audiovisual speech-in-noise task. Three experiments tested subjects’ use of information contained in the different filter bands with center frequencies ranging from 2.7 to 44.1 cycles/face (c/face). Experiment 1 demonstrated that information from a broad range of spatial frequencies enhanced auditory intelligibility. The frequency bands differed in the degree of enhancement, with a peak being observed in a mid-range band (11-c/face center frequency). Experiment 2 showed that this pattern was not influenced by viewing distance and, thus, that the results are best interpreted in object spatial frequency, rather than in retinal coordinates. Experiment 3 showed that low-pass filtered images could produce performance equivalent to that obtained with unfiltered images. These experiments are consistent with the hypothesis that high spatial resolution information is not necessary for audiovisual speech perception and that a limited range of the spatial frequency spectrum is sufficient.
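The notion of filtering at a centre frequency expressed in cycles/face can be sketched with a simple Fourier-domain band-pass. The Gaussian bandwidth, image size, and the assumption that the face spans the whole frame are illustrative; this is not the filtering used in the experiments.

```python
# A minimal sketch of band-pass filtering an image around a centre frequency
# in cycles/face using NumPy's 2-D FFT. Parameters are assumptions.
import numpy as np

def bandpass_cycles_per_face(img, centre_cpf, bandwidth_cpf):
    h, w = img.shape
    fy = np.fft.fftfreq(h) * h                 # vertical frequency in cycles/image
    fx = np.fft.fftfreq(w) * w                 # horizontal frequency in cycles/image
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    # Gaussian annulus centred on the requested radial frequency
    mask = np.exp(-((radius - centre_cpf) ** 2) / (2.0 * bandwidth_cpf ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * mask))

face = np.random.rand(256, 256)                # stand-in for a face image filling the frame
mid_band = bandpass_cycles_per_face(face, centre_cpf=11.0, bandwidth_cpf=4.0)
```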
Journal of Phonetics | 2010
Mark Antoniou; Catherine T. Best; Michael D. Tyler; Christian Kroos
The way that bilinguals produce phones in each of their languages provides a window into the nature of the bilingual phonological space. For stop consonants, if early sequential bilinguals, whose languages differ in voice onset time (VOT) distinctions, produce native-like VOTs in each of their languages, it would imply that they have developed separate first and second language phones, that is, language-specific phonetic realisations for stop-voicing distinctions. Given the ambiguous phonological status of Greek voiced stops, which has been debated but not investigated experimentally, Greek-English bilinguals can offer a unique perspective on this issue. We first recorded the speech of Greek and Australian-English monolinguals to observe native VOTs in each language for /p, t, b, d/ in word-initial and word-medial (post-vocalic and post-nasal) positions. We then recorded fluent, early Greek-Australian-English bilinguals in either a Greek or English language context; all communication occurred in only one language. The bilinguals in the Greek context were indistinguishable from the Greek monolinguals, whereas the bilinguals in the English context matched the VOTs of the Australian-English monolinguals in initial position, but showed some modest differences from them in the phonetically more complex medial positions. We interpret these results as evidence that bilingual speakers possess phonetic categories for voiced versus voiceless stops that are specific to each language, but are influenced by positional context differently in their second than in their first language.
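For readers unfamiliar with the measure, the sketch below shows how VOT can be computed and compared across groups once the burst release and voicing onset have been located (for example, from hand-labelled annotations); all landmark times and tokens are invented for illustration.

```python
# A minimal sketch of computing and comparing voice onset time (VOT) from
# pre-marked landmarks. All values below are invented for illustration.
import numpy as np

def vot_ms(burst_time_s, voicing_onset_s):
    """VOT in milliseconds; negative values indicate prevoicing."""
    return (voicing_onset_s - burst_time_s) * 1000.0

# Hypothetical tokens of word-initial /t/: (burst, voicing onset) in seconds
english_tokens = [(0.120, 0.185), (0.300, 0.372)]   # long-lag VOTs
greek_tokens   = [(0.110, 0.128), (0.250, 0.262)]   # short-lag VOTs

english_vots = [vot_ms(b, v) for b, v in english_tokens]
greek_vots   = [vot_ms(b, v) for b, v in greek_tokens]
print(np.mean(english_vots), np.mean(greek_vots))    # group mean VOTs in ms
```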
Journal of Phonetics | 2002
Christian Kroos; Takaaki Kuratate; Eric Vatikiotis-Bateson
In this paper, we describe and evaluate a noninvasive method of measuring face motion during speech production. Reliable measures are extracted from standard video sequences using an image analysis process that takes advantage of important constraints on face structure and motion. Measures are made by deforming the surface of an ellipsoidal mesh fit to the face image at successive degrees of spatial resolution using a two-dimensional wavelet transform of the image frames. Reliability of the measures is evaluated through comparison with 3D marker data recorded for the same utterances and speaker; perceptual evaluation of the talking-head animations created from the measures is underway.
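One simple way such a reliability comparison can be quantified is by correlating a video-derived motion measure with the corresponding marker trajectory. The sketch below uses synthetic signals and is only a stand-in for the paper's evaluation.

```python
# A minimal sketch of comparing a video-derived track with a marker track
# recorded for the same utterance. The signals below are synthetic stand-ins.
import numpy as np

def compare_tracks(video_track, marker_track):
    """Return Pearson correlation and RMS difference between two 1-D tracks."""
    video_track = np.asarray(video_track, dtype=float)
    marker_track = np.asarray(marker_track, dtype=float)
    r = np.corrcoef(video_track, marker_track)[0, 1]
    rms = np.sqrt(np.mean((video_track - marker_track) ** 2))
    return r, rms

t = np.linspace(0, 1, 200)
marker = np.sin(2 * np.pi * 3 * t)                  # e.g. vertical jaw displacement
video = marker + 0.05 * np.random.randn(t.size)     # video estimate with added noise
print(compare_tracks(video, marker))
```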
Applied Psycholinguistics | 2012
Rikke L. Bundgaard-Nielsen; Catherine T. Best; Christian Kroos; Michael D. Tyler
This paper tests the predictions of the vocabulary-tuning model of second language (L2) rephonologization in the domain of L2 segmental production. This model proposes a facilitating effect of adults’ L2 vocabulary expansion on L2 perception and production and suggests that early improvements in L2 segmental production may be positively associated with an expanding L2 vocabulary. The model was tested in a study of the L2 vowel intelligibility of adult Japanese learners of Australian English, who differed only in the size of their L2 vocabularies. The results support the predicted association between L2 vocabulary size and L2 vowel intelligibility and the prediction that early-phase L2 vocabulary expansion leads to improved L2 production.
Leonardo | 2012
Christian Kroos; Damith C. Herath; Stelarc
Robotic embodiments of artificial agents seem to reinstate a body-mind dualism as a consequence of their technical implementation, but could this supposition be a misconception? The authors present their artistic, scientific and engineering work on a robotic installation, the Articulated Head, and its perception-action control system, the Thinking Head Attention Model and Behavioral System (THAMBS). The authors propose that agency emerges from the interplay of the robot's behavior and the environment and that, in the system's interaction with humans, it is attributed to the robot to the same degree as it is grounded in the robot's actions: agency cannot be instilled; it needs to be evoked.
Human-Robot Interaction | 2010
Christian Kroos; Damith C. Herath; Stelarc
The Articulated Head (AH) is an artistic installation that consists of an LCD monitor mounted on an industrial robot arm (Fanuc LR Mate 200iC) displaying the head of a virtual human. It was conceived as the next step in the evolution of Embodied Conversational Agents (ECAs), transcending virtual reality into the physical space shared with the human interlocutor. Recently, an attention module has been added as part of a behavioural control system for non-verbal interaction between robot/ECA and human. Unstructured incoming perceptual information (currently originating from a custom acoustic localisation algorithm and commercial people-tracking software) is narrowed down to the most salient aspects, allowing the generation of a single motor response. The requirements of the current task determine what 'salient' means at any point in time; that is, the rules and associated thresholds and weights of the attention system are modified by the requirements of the current task, while the task itself is specified by the central control system depending on the overall state of the AH with respect to the ongoing interaction. The attention system determines a single attended event using a winner-takes-all strategy and relays it to the central control system. It also directly generates a motor goal and forwards it to the motor system. The video shows how the robot's attention system drives its behaviour (1) when there is no stimulus over an extended period of time, (2) when a person moves within its visual field, and (3) when a sudden loud auditory event attracts attention during an ongoing visually based interaction (auditory-visual attention conflict). The subtitles are direct mappings from numeric descriptions of the central control system's internal states to slightly more entertaining English sentences.
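The winner-takes-all selection described above can be sketched as follows; this is an assumed illustration, not THAMBS source code, and the event fields, weights, and thresholds are invented.

```python
# A minimal sketch (assumed, not THAMBS source code) of winner-takes-all
# selection over incoming perceptual events: each event gets a task-dependent
# weight and threshold, and the single most salient event above threshold wins.
from dataclasses import dataclass

@dataclass
class PerceptualEvent:
    modality: str      # e.g. "acoustic" or "visual"
    salience: float    # raw salience from the sensing module
    position: tuple    # where the event was localised

# Task-dependent parameters that a central controller might update
weights = {"acoustic": 1.5, "visual": 1.0}
thresholds = {"acoustic": 0.3, "visual": 0.2}

def select_attended(events):
    """Return the winning event, or None if nothing exceeds its threshold."""
    candidates = [
        (weights[e.modality] * e.salience, e)
        for e in events
        if e.salience >= thresholds[e.modality]
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]   # winner takes all

events = [PerceptualEvent("visual", 0.4, (0.2, 1.1)),
          PerceptualEvent("acoustic", 0.6, (-0.5, 0.9))]
attended = select_attended(events)        # here the loud acoustic event wins
```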
Perception | 2010
Jeesun Kim; Christian Kroos; Chris Davis
Parsing of information from the world into objects and events occurs in both the visual and auditory modalities. It has been suggested that visual and auditory scene perceptions involve similar principles of perceptual organisation. We investigated here cross-modal scene perception by determining whether an auditory stimulus could facilitate visual object segregation. Specifically, we examined whether the presentation of matched auditory speech would facilitate the detection of a point-light talking face amid point-light distractors. An adaptive staircase procedure (3-up–1-down rule) was used to estimate the 79% correct threshold in a two-alternative forced-choice procedure. To determine if different degrees of speech motion would show auditory influence of different sizes, two speech modes were tested (in quiet and Lombard speech). A facilitatory auditory effect on talking-face detection was found; the size of this effect did not differ between the different speech modes.
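The adaptive staircase logic can be sketched as below: difficulty increases after three consecutive correct responses and decreases after any error, which converges on roughly 79% correct in a two-alternative task. The step size, starting level, and simulated observer are assumptions, not the study's exact parameters.

```python
# A minimal sketch of an adaptive staircase in the spirit of the procedure
# described above. Step size, start level and the simulated observer are assumed.
import random

def run_staircase(prob_correct_at, start=5, step=1, n_trials=80):
    """Track a difficulty level; `prob_correct_at(level)` simulates an observer."""
    level, correct_streak, history = start, 0, []
    for _ in range(n_trials):
        correct = random.random() < prob_correct_at(level)
        history.append(level)
        if correct:
            correct_streak += 1
            if correct_streak == 3:          # three correct in a row -> harder
                level += step
                correct_streak = 0
        else:                                # one error -> easier
            level = max(0, level - step)
            correct_streak = 0
    return history                           # reversal averages estimate the threshold

# Hypothetical observer whose accuracy falls as difficulty (e.g. number of
# point-light distractors) increases; 0.5 is chance in a two-alternative task.
trace = run_staircase(lambda lvl: max(0.5, 1.0 - 0.05 * lvl))
```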
Journal of the Acoustical Society of America | 1996
Christian Kroos; Philip Hoole; Barbara Kühnert; Hans‐G. Tillmann
Recently [Hoole et al., Proc. ICSLP 94, 1, 53–56 (1994)] a comparison was made with EMMA of the kinematic properties of German tense and lax vowels over changes in speech rate. They appeared not to differ in the internal organization of the elementary CV and VC movements, but the lax vowels showed tighter serial coupling of CV to VC movement. These results were expected given the strong phonological tradition (on a phonetically elusive substrate) of accounting for the differences between these vowels at the level of word prosody (especially in the link between vowel and following consonant). Nevertheless, important questions remain open: First, analysis of the velocity profiles of the CV and VC movements had been based on a parameter (ratio of peak to average velocity) that may not capture all relevant differences. Second, it was unclear whether tense–lax differences are equally clear‐cut for all vowel subcategories (e.g., high, low, rounded, unrounded) and also for different consonantal contexts. The more refined and extensive analyses now carried out will be used to assess the well‐foundedness of the earlier preliminary conclusions and to place the German results within the wider perspective of improved understanding of vowel‐system typologies. [Work supported by German Research Council.]
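The velocity-profile parameter mentioned above, the ratio of peak to average velocity over a single movement, can be sketched as follows; the trajectory and sampling rate are synthetic stand-ins for real EMMA data.

```python
# A minimal sketch of the ratio of peak to average velocity for one articulator
# movement. The trajectory and sampling rate below are synthetic assumptions.
import numpy as np

def peak_to_average_velocity_ratio(position, fs):
    """position: 1-D array of articulator positions; fs: sampling rate in Hz."""
    velocity = np.abs(np.gradient(position, 1.0 / fs))   # unsigned speed
    return velocity.max() / velocity.mean()

fs = 250.0                                       # assumed sampling rate in Hz
t = np.arange(0, 0.2, 1.0 / fs)
movement = 0.5 * (1 - np.cos(np.pi * t / t[-1]))  # smooth CV-like opening gesture
print(peak_to_average_velocity_ratio(movement, fs))  # ~1.57 for a cosine velocity profile
```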