Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Santeri Yrttiaho is active.

Publication


Featured research published by Santeri Yrttiaho.


Journal of the Acoustical Society of America | 2009

Closed phase covariance analysis based on constrained linear prediction for glottal inverse filtering

Paavo Alku; Carlo Magi; Santeri Yrttiaho; Tom Bäckström; Brad H. Story

Closed phase (CP) covariance analysis is a widely used glottal inverse filtering method based on the estimation of the vocal tract during the glottal CP. Since the length of the CP is typically short, the vocal tract computation with linear prediction (LP) is vulnerable to the position of the covariance frame. The present study proposes a modification of the CP algorithm based on two changes. First, and most importantly, the computation of the vocal tract model is changed from conventional LP into a form where a constraint is imposed on the dc gain of the inverse filter during filter optimization. With this constraint, LP analysis is more likely to yield vocal tract models that are justified by the source-filter theory; that is, they show complex conjugate roots in the formant regions rather than unrealistic resonances at low frequencies. Second, the new CP method utilizes a minimum phase inverse filter. The method was evaluated using synthetic vowels produced by physical modeling and natural speech. The results show that the algorithm improves the performance of CP-type inverse filtering and its robustness with respect to the covariance frame position.
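As an illustration of the constrained linear prediction idea described in the abstract, the sketch below estimates covariance-method LP coefficients while forcing the dc gain of the inverse filter A(z) to a fixed value via a Lagrange multiplier. The constraint value, model order, and helper names are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch: covariance-method LP with a dc-gain constraint on the
# inverse filter A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
# The default constraint value `dc_gain` is an assumption for illustration.
import numpy as np

def constrained_covariance_lp(x, p, dc_gain=1.0):
    """Return inverse-filter coefficients [1, a_1, ..., a_p] estimated over
    the (closed-phase) frame x, with A(1) constrained to `dc_gain`."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Covariance-method data matrix: column k holds the samples delayed by k.
    X = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])
    y = x[p:n]
    R = X.T @ X
    r = X.T @ y
    Rinv = np.linalg.inv(R)
    ones = np.ones(p)
    # Unconstrained solution of: minimize ||y + X a||^2.
    a_unc = -Rinv @ r
    # Lagrange correction enforcing A(1) = 1 + sum(a) = dc_gain.
    lam = (dc_gain - 1.0 - ones @ a_unc) / (ones @ Rinv @ ones)
    a = a_unc + lam * (Rinv @ ones)
    return np.concatenate(([1.0], a))

# Usage: filter the closed-phase frame with A(z) to obtain the glottal residual.
# frame = ...  # closed-phase segment of a vowel
# A = constrained_covariance_lp(frame, p=10)
# residual = np.convolve(frame, A)[:len(frame)]
```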


Journal of the Acoustical Society of America | 2010

The neural code for interaural time difference in human auditory cortex

Nelli H. Salminen; Hannu Tiitinen; Santeri Yrttiaho; Patrick J. C. May

A magnetoencephalography study was conducted to reveal the neural code of interaural time difference (ITD) in the human cortex. Widely used crosscorrelator models predict that the code consists of narrow receptive fields distributed to all ITDs. The present findings are, however, more in line with a neural code formed by two opponent neural populations: one tuned to the left and the other to the right hemifield. The results are consistent with models of ITD extraction in the auditory brainstem of small mammals and, therefore, suggest that similar computational principles underlie human sound source localization.
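The two coding schemes contrasted in the abstract can be sketched as follows: a bank of channels narrowly tuned to many best ITDs (a cross-correlator-style code) versus two broadly tuned opponent channels preferring the left and right hemifields. The tuning shapes and parameters below are illustrative assumptions only.

```python
# Illustrative contrast of the two ITD coding schemes discussed above.
# All tuning widths and best ITDs are assumptions made for illustration.
import numpy as np

def crosscorrelator_code(itd_us, best_itds_us, sigma_us=100.0):
    """Population response of many narrowly tuned channels (Gaussian tuning)."""
    best = np.asarray(best_itds_us, dtype=float)
    return np.exp(-0.5 * ((itd_us - best) / sigma_us) ** 2)

def opponent_code(itd_us, slope_us=200.0):
    """Two broad hemifield channels with sigmoidal tuning; the ITD is read
    out from the difference of their activities."""
    right = 1.0 / (1.0 + np.exp(-itd_us / slope_us))  # prefers right-leading ITDs
    left = 1.0 - right                                 # mirror-image channel
    return np.array([left, right])

itd = 300.0  # microseconds, right-leading
print(crosscorrelator_code(itd, best_itds_us=np.linspace(-700, 700, 15)))
print(opponent_code(itd))
```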


Journal of the Acoustical Society of America | 2008

Cortical sensitivity to periodicity of speech sounds

Santeri Yrttiaho; Hannu Tiitinen; Patrick J. C. May; Sakari Leino; Paavo Alku

Previous non-invasive brain research has reported auditory cortical sensitivity to periodicity as reflected by larger and more anterior responses to periodic than to aperiodic vowels. The current study investigated whether there is a lower fundamental frequency (F0) limit for this effect. Auditory evoked fields (AEFs) elicited by natural-sounding 400 ms periodic and aperiodic vowel stimuli were measured with magnetoencephalography. Vowel F0 ranged from normal male speech (113 Hz) to exceptionally low values (9 Hz). Both the auditory N1m and sustained fields were larger in amplitude for periodic than for aperiodic vowels. The AEF sources for periodic vowels were also anterior to those for the aperiodic vowels. Importantly, the AEF amplitudes and locations were unaffected by the F0 decrement of the periodic vowels. However, the N1m latency increased monotonically as F0 was decreased down to 19 Hz, below which this trend broke down. Also, a cascade of transient N1m-like responses was observed in the lowest F0 condition. Thus, the auditory system seems capable of extracting the periodicity even from very low F0 vowels. The behavior of the N1m latency and the emergence of a response cascade at very low F0 values may reflect the lower limit of pitch perception.


Neuroreport | 2007

The right-hemispheric auditory cortex in humans is sensitive to degraded speech sounds

Lassi A. Liikkanen; Hannu Tiitinen; Paavo Alku; Sakari Leino; Santeri Yrttiaho; Patrick J. C. May

We investigated how degraded speech sounds activate the auditory cortices of the left and right hemispheres. To degrade the stimuli, we introduced uniform scalar quantization, a controlled and replicable manipulation not previously used in cognitive neuroscience. Three Finnish vowels (/a/, /e/, and /u/) were used as stimuli for 10 participants in magnetoencephalography recordings. Compared with the original vowel sounds, the degraded sounds increased the amplitude of the right-hemispheric N1m without affecting its latency, whereas the amplitude and latency of the N1m in the left hemisphere remained unaffected. Although the participants were able to identify the stimuli correctly, increased degradation led to increased reaction times, which correlated positively with the N1m amplitude. Thus, the auditory cortex of the right hemisphere might be particularly involved in processing degraded speech, possibly compensating for the poor signal quality by increasing its activity.
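Uniform scalar quantization, the degradation used here, can be sketched in a few lines: the waveform is mapped onto a fixed number of evenly spaced amplitude levels, with fewer bits yielding coarser resolution. The bit depth and normalization below are illustrative choices.

```python
# A minimal sketch of uniform scalar quantization as a controlled way to
# degrade a speech waveform. Bit depth and signal range are assumptions.
import numpy as np

def uniform_quantize(x, n_bits):
    """Quantize a signal in [-1, 1] onto 2**n_bits uniformly spaced levels."""
    levels = 2 ** n_bits
    step = 2.0 / levels
    # Round each sample to the nearest reconstruction level (mid-tread quantizer).
    return np.clip(np.round(np.asarray(x, dtype=float) / step) * step, -1.0, 1.0)

# Usage: fewer bits -> coarser amplitude resolution -> more degraded vowel.
# vowel = ...                       # waveform normalized to [-1, 1]
# degraded = uniform_quantize(vowel, n_bits=2)
```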


Journal of the Acoustical Society of America | 2013

Detection of shouted speech in noise: Human and machine

Jouni Pohjalainen; Tuomo Raitio; Santeri Yrttiaho; Paavo Alku

High vocal effort has characteristic acoustic effects on speech. This study focuses on the utilization of this information by human listeners and a machine-based detection system in the task of detecting shouted speech in the presence of noise. Both female and male speakers read Finnish sentences using normal and shouted voice in controlled conditions, with the sound pressure level recorded. The speech material was artificially corrupted by noise and supplemented with pure noise. The human performance level was statistically evaluated by a listening test, where the subjects labeled noisy samples according to whether shouting was heard or not. A Bayesian detection system was constructed and statistically evaluated. Its performance was compared against that of human listeners, substituting different spectrum analysis methods in the feature extraction stage. Using features capable of taking into account the spectral fine structure (i.e., the fundamental frequency and its harmonics), the machine reached the detection level of humans even in the noisiest conditions. In the listening test, male listeners detected shouted speech significantly better than female listeners, especially with speakers making a smaller vocal effort increase for shouting.
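A machine detector of the general kind described above can be sketched as a two-class Bayesian classifier: one Gaussian mixture model per class (normal vs. shouted) trained on spectral features, with the decision made by comparing log-likelihoods weighted by class priors. The feature pipeline and mixture size below are assumptions, not the exact system evaluated in the study.

```python
# A minimal sketch of a Bayesian two-class detector for shouted vs. normal
# speech based on per-class Gaussian mixture models over spectral features.
# Feature extraction, mixture size, and priors are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_detector(feats_normal, feats_shout, n_components=8):
    """feats_*: arrays of shape (n_frames, feature_dim)."""
    gmm_n = GaussianMixture(n_components, covariance_type="diag").fit(feats_normal)
    gmm_s = GaussianMixture(n_components, covariance_type="diag").fit(feats_shout)
    return gmm_n, gmm_s

def detect_shout(feats_utterance, gmm_n, gmm_s, prior_shout=0.5):
    """Return True if the utterance is classified as shouted speech."""
    # Average frame log-likelihood under each class model plus the log prior.
    ll_s = gmm_s.score(feats_utterance) + np.log(prior_shout)
    ll_n = gmm_n.score(feats_utterance) + np.log(1.0 - prior_shout)
    return ll_s > ll_n
```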


Journal of the Acoustical Society of America | 2009

Representation of the vocal roughness of aperiodic speech sounds in the auditory cortex

Santeri Yrttiaho; Paavo Alku; Patrick J. C. May; Hannu Tiitinen

Aperiodicity of speech alters voice quality. The current study investigated the relationship between vowel aperiodicity and human auditory cortical N1m and sustained field (SF) responses with magnetoencephalography. Behavioral estimates of vocal roughness perception were also collected. Stimulus aperiodicity was experimentally varied by increasing vocal jitter with techniques that model the mechanisms of natural speech production. N1m and SF responses for vowels with high vocal jitter were reduced in amplitude as compared to those elicited by vowels of normal vocal periodicity. Behavioral results indicated that the ratings of vocal roughness increased up to the highest jitter values. Based on these findings, the representation of vocal jitter in the auditory cortex is suggested to be formed on the basis of reduced activity in periodicity-sensitive neural populations.
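The aperiodicity manipulation can be illustrated with a simple jittered excitation: each fundamental period of a pulse train is perturbed by a random fraction of the nominal period. The uniform perturbation model below is an assumption for illustration; the study itself used techniques modeling natural speech production.

```python
# A minimal sketch of increasing vocal jitter by perturbing each fundamental
# period of an impulse train. The jitter model is an illustrative assumption.
import numpy as np

def jittered_pulse_train(f0=113.0, fs=16000, dur=0.4, jitter=0.05, seed=0):
    """Return an impulse train whose periods are perturbed by up to +/- `jitter`
    (as a fraction of the nominal period T0 = 1/f0)."""
    rng = np.random.default_rng(seed)
    t0 = fs / f0                       # nominal period in samples
    n_total = int(dur * fs)
    train = np.zeros(n_total)
    pos = 0.0
    while pos < n_total:
        train[int(pos)] = 1.0
        pos += t0 * (1.0 + jitter * rng.uniform(-1.0, 1.0))
    return train

# Usage: convolve with a glottal pulse and vocal tract filter to obtain a
# vowel-like sound whose perceived roughness grows with `jitter`.
```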


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Bandwidth Extension of Telephone Speech to Low Frequencies Using Sinusoidal Synthesis and a Gaussian Mixture Model

Hannu Pulakka; Ulpu Remes; Santeri Yrttiaho; Kalle J. Palomäki; Mikko Kurimo; Paavo Alku

The quality of narrowband telephone speech is degraded by the limited audio bandwidth. This paper describes a method that extends the bandwidth of telephone speech to the frequency range 0-300 Hz. The method generates the lowest harmonics of voiced speech using sinusoidal synthesis. The energy in the extension band is estimated from spectral features using a Gaussian mixture model. The amplitudes and phases of the synthesized sinusoidal components are adjusted based on the amplitudes and phases of the narrowband input speech, which provides adaptivity to varying input bandwidth characteristics. The proposed method was evaluated with listening tests in combination with another bandwidth extension method for the frequency range 4-8 kHz. While the low-frequency bandwidth extension was not found to improve perceived quality, the method reduced dissimilarity with wideband speech.
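The low-band synthesis step can be sketched as follows: the harmonics of a voiced frame that fall below 300 Hz are generated as a sum of sinusoids and scaled to a target extension-band energy (which the actual method predicts with a Gaussian mixture model from spectral features). Phase alignment to the narrowband input is omitted here, and the parameter choices are illustrative.

```python
# A minimal sketch of sinusoidal synthesis for the 0-300 Hz extension band.
# The F0 estimate and target energy are treated as given inputs; in the real
# method the energy is estimated by a GMM and phases track the narrowband input.
import numpy as np

def synthesize_low_band(f0, target_energy, fs=8000, frame_len=160, fmax=300.0):
    """Synthesize the harmonics of f0 below `fmax` Hz over one frame,
    scaled so their total energy equals `target_energy`."""
    n = np.arange(frame_len)
    harmonics = [k * f0 for k in range(1, int(fmax // f0) + 1)]
    if not harmonics:
        return np.zeros(frame_len)
    # Equal amplitudes per harmonic; scale the sum to the target energy.
    frame = sum(np.cos(2 * np.pi * f * n / fs) for f in harmonics)
    energy = np.sum(frame ** 2)
    return frame * np.sqrt(target_energy / energy) if energy > 0 else frame

# Usage: add the synthesized low band to the narrowband telephone frame.
# extended = narrowband_frame + synthesize_low_band(f0=110.0, target_energy=0.3)
```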


Journal of the Acoustical Society of America | 2012

Signal-to-noise ratio adaptive post-filtering method for intelligibility enhancement of telephone speech

Emma Jokinen; Santeri Yrttiaho; Hannu Pulakka; Martti Vainio; Paavo Alku

Post-filtering can be utilized to improve the quality and intelligibility of telephone speech. Previous studies have shown that energy reallocation with a high-pass type filter works effectively in improving the intelligibility of speech in difficult noise conditions. The present study introduces a signal-to-noise ratio adaptive post-filtering method that utilizes energy reallocation to transfer energy from the first formant to higher frequencies. The proposed method adapts to the level of the background noise so that, in favorable noise conditions, the post-filter has a flat frequency response and the effect of the post-filtering is increased as the level of the ambient noise increases. The performance of the proposed method is compared with a similar post-filtering algorithm and unprocessed speech in subjective listening tests which evaluate both intelligibility and listener preference. The results indicate that both of the post-filtering methods maintain the quality of speech in negligible noise conditions and are able to provide intelligibility improvement over unprocessed speech in adverse noise conditions. Furthermore, the proposed post-filtering algorithm performs better than the other post-filtering method under evaluation in moderate to difficult noise conditions, where intelligibility improvement is mostly required.
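The adaptive behaviour described above can be sketched with a first-order pre-emphasis filter whose high-pass tilt grows as the estimated noise level rises and stays flat in quiet conditions. The SNR-to-coefficient mapping and the SNR limits below are illustrative assumptions, not the published design.

```python
# A minimal sketch of SNR-adaptive post-filtering: a pre-emphasis-style tilt
# that transfers energy toward higher frequencies as the ambient SNR drops.
# The mapping from SNR to filter coefficient is an illustrative assumption.
import numpy as np

def snr_adaptive_postfilter(speech, snr_db, snr_good=20.0, snr_bad=0.0, beta_max=0.9):
    """Apply H(z) = 1 - beta * z^-1, with beta interpolated from 0 (flat
    response at high SNR) to `beta_max` (strong tilt at low SNR)."""
    speech = np.asarray(speech, dtype=float)
    # Map SNR to [0, 1]: 0 at or above snr_good, 1 at or below snr_bad.
    w = np.clip((snr_good - snr_db) / (snr_good - snr_bad), 0.0, 1.0)
    beta = beta_max * w
    filtered = np.append(speech[0], speech[1:] - beta * speech[:-1])
    # Restore the original energy so only the spectral balance changes.
    gain = np.sqrt(np.sum(speech ** 2) / max(np.sum(filtered ** 2), 1e-12))
    return gain * filtered

# In quiet conditions (snr_db >= 20) the filter is flat; in loud ambient noise
# the first-formant region is attenuated relative to higher frequencies.
```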


Journal of the Acoustical Society of America | 2010

Temporal integration of vowel periodicity in the auditory cortex

Santeri Yrttiaho; Hannu Tiitinen; Paavo Alku; Ismo Miettinen; Patrick J. C. May

Cortical sensitivity to the periodicity of speech sounds has been evidenced by larger, more anterior responses to periodic than to aperiodic vowels in several non-invasive studies of the human brain. The current study investigated the temporal integration underlying the cortical sensitivity to speech periodicity by studying the increase in periodicity-specific cortical activation with growing stimulus duration. Periodicity-specific activation was estimated from magnetoencephalography as the differences between the N1m responses elicited by periodic and aperiodic vowel stimuli. The duration of the vowel stimuli with a fundamental frequency (F0=106 Hz) representative of typical male speech was varied in units corresponding to the vowel fundamental period (9.4 ms) and ranged from one to ten units. Cortical sensitivity to speech periodicity, as reflected by larger and more anterior responses to periodic than to aperiodic stimuli, was observed when stimulus duration was 3 cycles or more. Further, for stimulus durations of 5 cycles and above, response latency was shorter for the periodic than for the aperiodic stimuli. Together the current results define a temporal window of integration for the periodicity of speech sounds in the F0 range of typical male speech. The length of this window is 3-5 cycles, or 30-50 ms.


NeuroImage | 2012

Cortical processing of degraded speech sounds: Effects of distortion type and continuity

Ismo Miettinen; Paavo Alku; Santeri Yrttiaho; Patrick J. C. May; Hannu Tiitinen

Human speech perception is highly resilient to acoustic distortions. In addition to distortions from external sound sources, degradation of the acoustic structure of the sound itself can substantially reduce the intelligibility of speech. The degradation of the internal structure of speech happens, for example, when the digital representation of the signal is impoverished by reducing its amplitude resolution. Further, the perception of speech is also influenced by whether the distortion is transient, coinciding with speech, or is heard continuously in the background. However, the complex effects of the acoustic structure and continuity of the distortion on the cortical processing of degraded speech are unclear. In the present magnetoencephalography study, we investigated how the cortical processing of degraded speech sounds as measured through the auditory N1m response is affected by variation of both the distortion type (internal, external) and the continuity of distortion (transient, continuous). We found that when the distortion was continuous, the N1m was significantly delayed, regardless of the type of distortion. The N1m amplitude, in turn, was affected only when speech sounds were degraded with transient internal distortion, which resulted in larger response amplitudes. The results suggest that external and internal distortions of speech result in divergent patterns of activity in the auditory cortex, and that the effects are modulated by the temporal continuity of the distortion.

Collaboration


Dive into Santeri Yrttiaho's collaborations.

Top Co-Authors

Carlo Magi

Helsinki University of Technology
