Okko Räsänen
Aalto University
Publications
Featured research published by Okko Räsänen.
Computer Speech & Language | 2015
Jouni Pohjalainen; Okko Räsänen; Serdar Kadioglu
This study focuses on feature selection in paralinguistic analysis and presents recently developed supervised and unsupervised methods for feature subset selection and feature ranking. Using the standard k-nearest-neighbors (kNN) rule as the classification algorithm, the feature selection methods are evaluated individually and in different combinations in seven paralinguistic speaker trait classification tasks. In each analyzed data set, the overall number of features greatly exceeds the number of data points available for training and evaluation, making a well-generalizing feature selection process extremely difficult. The performance of feature sets on the feature selection data is observed to be a poor indicator of their performance on unseen data. The studied feature selection methods clearly outperform a standard greedy hill-climbing selection algorithm by being more robust against overfitting. When the selection methods are suitably combined with each other, performance in the classification task can be further improved. In general, it is shown that automatic feature selection in paralinguistic analysis can reduce the overall number of features to a fraction of the original feature set size while still achieving performance comparable to, or even better than, that of baseline support vector machine or random forest classifiers using the full feature set. The most typically selected features for the recognition of speaker likability, intelligibility, and five personality traits are also reported.
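The basic pipeline described above, ranking features, keeping a small subset, and classifying with a plain kNN rule, can be sketched as follows. This is an illustrative toy using a simple Fisher-score ranking and synthetic data, not the paper's actual selection methods or corpora:

```python
import numpy as np

def fisher_scores(X, y):
    """Rank features by a simple Fisher criterion:
    between-class mean separation over within-class variance."""
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / (den + 1e-12)

def knn_predict(X_train, y_train, X_test, k=3):
    """Plain k-nearest-neighbour classification with Euclidean distance."""
    preds = []
    for x in X_test:
        nn = y_train[np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]]
        vals, counts = np.unique(nn, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

# Toy data: 2 informative features buried among 50 noise dimensions,
# with far fewer samples than features (as in the paper's setting).
rng = np.random.default_rng(0)
y = np.tile([0, 1], 30)
X = rng.normal(size=(60, 52))
X[:, 0] += 3 * y
X[:, 1] -= 3 * y
top = np.argsort(fisher_scores(X, y))[::-1][:2]   # keep best-ranked features
acc = (knn_predict(X[:40, top], y[:40], X[40:, top]) == y[40:]).mean()
```

Reducing 52 features to 2 before classification mirrors the abstract's point: a small, well-chosen subset can match or beat a full-feature baseline while being far less prone to overfitting.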
Speech Communication | 2012
Okko Räsänen
This work reviews a number of existing computational studies focused on the question of how spoken language can be learned from continuous speech in the absence of linguistically or phonetically motivated background knowledge, a situation faced by human infants when they first attempt to learn their native language. Specifically, the focus is on how phonetic categories and word-like units can be acquired purely on the basis of the statistical structure of speech signals, possibly aided by some articulatory or visual constraints. The outcomes and shortcomings of the existing work are compared against findings from experimental and theoretical studies. Finally, some of the open questions and possible future research directions related to computational models of language acquisition are discussed.
Cognition | 2011
Okko Räsänen
Word segmentation from continuous speech is a difficult task that is faced by human infants when they start to learn their native language. Several studies indicate that infants might use several different cues to solve this problem, including intonation, linguistic stress, and transitional probabilities between subsequent speech sounds. In this work, a computational model for word segmentation and learning of primitive lexical items from continuous speech is presented. The model does not utilize any a priori linguistic or phonemic knowledge such as phones, phonemes or articulatory gestures, but computes transitional probabilities between atomic acoustic events in order to detect recurring patterns in speech. Experiments with the model show that word segmentation is possible without any knowledge of linguistically relevant structures, and that the learned ungrounded word models show a relatively high selectivity towards specific words or frequently co-occurring combinations of short words.
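The transitional-probability idea can be illustrated with a minimal sketch on an artificial syllable stream. Note the assumptions: the model in the paper operates on automatically learned atomic acoustic events, whereas here known syllables and a toy three-word lexicon stand in for them, and boundaries are placed at local dips in transition probability:

```python
import random
from collections import Counter

def transition_probs(seq):
    """Estimate P(next unit | current unit) from a discrete sequence."""
    pair_counts = Counter(zip(seq, seq[1:]))
    unit_counts = Counter(seq[:-1])
    return {(a, b): c / unit_counts[a] for (a, b), c in pair_counts.items()}

def segment(seq, tp):
    """Hypothesize a word boundary wherever the transition probability
    dips to a local minimum (low predictability between words)."""
    probs = [tp[(a, b)] for a, b in zip(seq, seq[1:])]
    words, start = [], 0
    for i in range(1, len(probs) - 1):
        if probs[i] < probs[i - 1] and probs[i] < probs[i + 1]:
            words.append(tuple(seq[start:i + 1]))
            start = i + 1
    words.append(tuple(seq[start:]))
    return words

# Artificial "speech": three two-syllable words concatenated at random,
# so within-word transitions are predictable and between-word ones are not.
random.seed(1)
lexicon = [("go", "la"), ("ti", "bu"), ("da", "po")]
stream = [syl for _ in range(200) for syl in random.choice(lexicon)]
words = segment(stream, transition_probs(stream))
```

Because within-word transitions here are fully predictable while between-word transitions are not, every dip marks a true boundary and the stream is segmented back into its 200 lexical tokens.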
Pattern Recognition | 2012
Okko Räsänen; Unto K. Laine
An efficient method for weakly supervised pattern discovery and recognition from discrete categorical sequences is introduced. The method utilizes two parallel sources of data: categorical sequences carrying some temporal or spatial information and a set of labeled, but not exactly aligned, contextual events related to the sequences. From these inputs the method builds associative models able to describe systematically co-occurring structures in the input streams. The learned models, based on transitional probabilities of events observed at several different time lags, inherently segment and classify novel sequences into contextual categories. Learning and recognition processes are purely incremental and computationally cheap, making the approach suitable for on-line learning tasks. The capabilities of the algorithm are demonstrated in a keyword learning task from continuous infant-directed speech and a continuous speech recognition task operating at varying noise levels.
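A minimal sketch of the associative, multi-lag idea follows. The symbols, labels, and scoring here are hypothetical simplifications; the actual method's model structure and decoding are richer, but the core of counting label co-occurrences for symbol pairs at several lags, incrementally and cheaply, is the same:

```python
from collections import defaultdict

class LagAssociator:
    """Toy weakly supervised associator: count which context label
    co-occurs with each symbol pair observed at several time lags,
    then classify a novel sequence by summing normalized evidence.
    Learning is purely incremental."""
    def __init__(self, lags=(1, 2, 3)):
        self.lags = lags
        self.counts = defaultdict(lambda: defaultdict(float))

    def train(self, seq, label):
        for lag in self.lags:
            for a, b in zip(seq, seq[lag:]):
                self.counts[(lag, a, b)][label] += 1.0

    def classify(self, seq):
        score = defaultdict(float)
        for lag in self.lags:
            for a, b in zip(seq, seq[lag:]):
                votes = self.counts.get((lag, a, b))
                if votes:
                    total = sum(votes.values())
                    for label, n in votes.items():
                        score[label] += n / total   # normalized evidence
        return max(score, key=score.get)

# Two labeled training sequences, then classification of novel fragments.
model = LagAssociator()
model.train("abcabcabc", "speech")
model.train("xyzxyzxyz", "music")
```

Since the labels need only be attached to whole sequences rather than aligned to positions, the supervision is weak in exactly the sense the abstract describes.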
NeuroImage | 2013
Okko Räsänen; Marjo Metsäranta; Sampsa Vanhatalo
In neonatal EEG assessment, interhemispheric synchrony refers to the co-occurrence of activity bouts during quiet sleep or burst suppression, and it has been widely considered a key component in assessing background activity. However, no objective measures have been published for quantifying it, and all conventionally used visual criteria suffer from significant ambiguities. Our present study aimed to develop such a quantitative measure of (a)synchrony, called the activation synchrony index (ASI). We developed the ASI paradigm based on testing the statistical independence of two quantized amplitude envelopes of wideband-filtered signals in which higher frequencies had been pre-emphasized. The core parameter settings of the ASI paradigm were defined using a smaller EEG dataset, and the final paradigm was tested on a visually classified dataset of EEG records from 33 fullterm and 25 preterm babies showing varying grades of asynchrony. Our findings show that ASI could distinguish all EEG recordings with normal synchrony from those with modest or severe asynchrony at the individual level, and there was a highly significant correlation (p<0.001) between ASI and the visually assessed grade of asynchrony. In addition, we showed that i) ASI is stable in recordings lasting several hours, such as typical neonatal brain monitoring, ii) ASI values are sensitive to sleep stage, and iii) they correlate with age in the preterm babies. Comparison of ASI to three other candidate paradigms demonstrated a significant competitive advantage. Finally, ASI was found to be remarkably resistant to common artefacts, as tested by adding significant levels of real EEG artefact from noisy recordings. An objective and reliable measure of (a)synchrony may open novel avenues for using ASI as a putative early functional biomarker of the neonatal brain, as well as for building proper automated classifiers of neonatal EEG background.
Notably, the signature of synchrony of this kind, the temporal coincidence of activity bouts, is a common feature in biological signals, suggesting that ASI may also hold promise as a useful paradigm for assessing temporal synchrony in other biosignals, such as muscle activity or movements.
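The independence-testing idea behind an ASI-style measure can be sketched as follows. This is only an illustration of the principle: two amplitude envelopes are quantized and their statistical dependence is measured by mutual information (near zero for independent channels, larger for synchronous ones). The published ASI uses a specific quantization, pre-emphasis, and index formulation not reproduced here, and the signals below are synthetic:

```python
import numpy as np

def activation_synchrony(env1, env2, bins=4):
    """Quantize two amplitude envelopes into quartile bins and return
    their mutual information (in nats) as a crude dependence measure."""
    edges1 = np.quantile(env1, np.linspace(0, 1, bins + 1)[1:-1])
    edges2 = np.quantile(env2, np.linspace(0, 1, bins + 1)[1:-1])
    q1, q2 = np.digitize(env1, edges1), np.digitize(env2, edges2)
    joint = np.zeros((bins, bins))
    for a, b in zip(q1, q2):
        joint[a, b] += 1
    joint /= joint.sum()
    p1, p2 = joint.sum(axis=1), joint.sum(axis=0)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / np.outer(p1, p2)[mask])).sum())

# Synthetic envelopes: shared burst timing vs. shuffled (same statistics,
# no temporal alignment), mimicking synchronous vs. asynchronous channels.
rng = np.random.default_rng(0)
bursts = (rng.random(2000) < 0.1).astype(float)
sync_a = bursts + 0.1 * rng.normal(size=2000)
sync_b = bursts + 0.1 * rng.normal(size=2000)
async_b = rng.permutation(sync_b)
mi_sync = activation_synchrony(sync_a, sync_b)
mi_async = activation_synchrony(sync_a, async_b)
```

The shuffled channel has identical marginal statistics, so the gap between `mi_sync` and `mi_async` comes entirely from the temporal co-occurrence of the bursts, which is the quantity the index is meant to capture.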
Archive | 2011
Okko Räsänen; Unto K. Laine; Toomas Altosaar
Automated segmentation of speech into phone-sized units has been a subject of study for over 30 years, as it plays a central role in many speech processing and automatic speech recognition (ASR) applications. While segmentation by hand is relatively precise, it is also extremely laborious and tedious. This is one reason why automated methods are widely utilized. For example, phonetic analysis of speech (Mermelstein, 1975), audio content classification (Zhang & Kuo, 1999), and word recognition (Antal, 2004) utilize segmentation for dividing continuous audio signals into discrete, non-overlapping units in order to provide structural descriptions for the different parts of a processed signal. In the field of automatic segmentation of speech, the best results have so far been achieved with semi-automatic hidden Markov models (HMMs) that require prior training (see, e.g., Makhoul & Schwartz, 1994). Algorithms using additional linguistic information like phonetic annotation during the segmentation process are often also effective (e.g., Hemert, 1991). The use of these types of algorithms is well justified for several different purposes, but extensive training may not always be possible, nor may adequately rich descriptions of speech material be available, for instance, in real-time applications. Training of the algorithms also imposes limitations on the material that can be segmented effectively, with the results being highly dependent on, e.g., the language and vocabulary of the training and target material. Therefore, several researchers have concurrently worked on blind speech segmentation methods that do not require any external or prior knowledge regarding the speech to be segmented (Almpanidis & Kotropoulos, 2008; Aversano et al., 2001; Cherniz et al., 2007; Esposito & Aversano, 2005; Estevan et al., 2007; Sharma & Mammone, 1996).
These so-called blind segmentation algorithms have many potential applications in the field of speech processing that are complementary to supervised segmentation, since they do not need to be trained extensively on carefully prepared speech material. As an important property, blind algorithms do not necessarily make assumptions about the underlying signal conditions, whereas in trained algorithms possible mismatches between training data and processed input cause problems and errors in segmentation, e.g., due to changes in background noise conditions or microphone properties. Blind methods also provide a valuable tool for investigating speech at a basic level, such as in phonetic research; they are language-independent, and they can be used as a processing step in self-learning agents attempting to make sense of sensory input where externally supplied linguistic knowledge cannot be used (e.g., Rasanen & Driesen, 2009; Rasanen et al., 2008).
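One common family of blind approaches detects boundaries at peaks in frame-to-frame spectral change, with no training data or phonetic labels at all. A minimal sketch of that idea, on synthetic two-dimensional "features" rather than real spectra (the threshold and feature construction are assumptions for illustration):

```python
import numpy as np

def blind_boundaries(features, threshold=1.0):
    """Hypothesize a segment boundary wherever the frame-to-frame
    feature change peaks above a threshold. No prior knowledge of
    the signal content is used."""
    d = np.linalg.norm(np.diff(features, axis=0), axis=1)
    return [i + 1 for i in range(1, len(d) - 1)
            if d[i] > d[i - 1] and d[i] > d[i + 1] and d[i] > threshold]

# Synthetic "speech": three steady 30-frame segments with abrupt
# spectral changes at frames 30 and 60, plus mild frame noise.
rng = np.random.default_rng(0)
segs = [np.tile(c, (30, 1)) for c in ([0.0, 0.0], [3.0, 1.0], [1.0, 4.0])]
feats = np.concatenate(segs) + 0.05 * rng.normal(size=(90, 2))
bounds = blind_boundaries(feats)
```

On this clean toy signal the detector recovers exactly the two true change points; real speech, of course, has far more gradual transitions, which is what makes the blind segmentation literature cited above non-trivial.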
Neuroscience | 2016
Ninah Koolen; Anneleen Dereymaeker; Okko Räsänen; Katrien Jansen; Jan Vervisch; Vladimir Matic; Gunnar Naulaers; M. De Vos; S. Van Huffel; Sampsa Vanhatalo
Highlights
• We study the early development of the cortical activation synchrony index (ASI).
• Cortical activations become increasingly synchronized during the last trimester.
• Interhemispheric synchrony increases more than intrahemispheric synchrony.
• Our EEG metric, ASI, can be directly translated to experimental animal studies.
• ASI holds promise as an early functional biomarker of brain networks.
IEEE Transactions on Neural Networks | 2016
Okko Räsänen; Jukka Saarinen
Modeling and prediction of temporal sequences is central to many signal processing and machine learning applications. Prediction based on sequence history is typically performed using parametric models such as fixed-order Markov chains (n-grams), approximations of high-order Markov processes such as mixed-order Markov models or mixtures of lagged bigram models, or other machine learning techniques. This paper presents a method for sequence prediction based on sparse hyperdimensional coding of the sequence structure and describes how higher-order temporal structure can be utilized in sparse coding in a balanced manner. The method is purely incremental, allowing real-time online learning and prediction with limited computational resources. Experiments with the prediction of mobile phone use patterns, including the prediction of the next launched application, the next GPS location of the user, and the next artist played with the phone's media player, reveal that the proposed method is able to capture the relevant variable-order structure from the sequences. In comparison with n-grams and mixed-order Markov models, the sparse hyperdimensional predictor clearly outperforms its peers in terms of unweighted average recall, and it achieves the same level of weighted average recall as the mixed-order Markov chain but without the batch training that the mixed-order model requires.
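The general flavor of hyperdimensional sequence coding can be sketched with a toy predictor. This is an illustration of the coding idea, not the paper's exact scheme or sparsity mechanism: each symbol receives a random bipolar hypervector, the recent history is encoded by giving each lag a different cyclic shift, and an associative memory accumulates history codes per next symbol. Training is one-pass and incremental, matching the abstract's emphasis on online learning:

```python
import numpy as np

D = 2000                            # hypervector dimensionality
rng = np.random.default_rng(0)

class HDPredictor:
    """Toy hyperdimensional next-symbol predictor."""
    def __init__(self, order=2):
        self.order = order
        self.item, self.assoc = {}, {}

    def _hv(self, s):
        if s not in self.item:                       # random bipolar code
            self.item[s] = rng.choice([-1.0, 1.0], size=D)
        return self.item[s]

    def _context(self, history):
        ctx = np.zeros(D)
        for lag, s in enumerate(reversed(history[-self.order:]), start=1):
            ctx += np.roll(self._hv(s), lag)         # lag-specific shift
        return ctx

    def update(self, history, nxt):
        """Incrementally bind the current context to the observed next symbol."""
        self.assoc.setdefault(nxt, np.zeros(D))
        self.assoc[nxt] += self._context(history)

    def predict(self, history):
        ctx = self._context(history)
        return max(self.assoc, key=lambda s: float(self.assoc[s] @ ctx))

# One-pass online training on a repeating pattern, then prediction.
seq = list("abc" * 30)
model = HDPredictor(order=2)
for i in range(model.order, len(seq)):
    model.update(seq[:i], seq[i])
```

Because near-orthogonal random vectors barely interfere in high dimensions, the accumulated memory retrieves the correct continuation by a simple dot-product comparison, with fixed memory cost regardless of sequence length.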
Psychological Review | 2015
Okko Räsänen; Heikki Rasilo
Human infants learn meanings for spoken words in complex interactions with other people, but the exact learning mechanisms are unknown. One widely studied candidate mechanism is cross-situational learning (XSL). In XSL, word meanings are learned by accumulating statistical information about spoken words and co-occurring objects or events, allowing the learner to overcome referential uncertainty after sufficient experience with individually ambiguous scenarios. Existing models in this area have mainly assumed that the learner is capable of segmenting words from speech before grounding them in their referential meaning, while segmentation itself has been treated relatively independently of meaning acquisition. In this article, we argue that XSL is not just a mechanism for word-to-meaning mapping, but that it provides strong cues for proto-lexical word segmentation. If a learner directly solves the correspondence problem between continuous speech input and the contextual referents being talked about, segmentation of the input into word-like units emerges as a by-product of the learning. We present a theoretical model for the joint acquisition of proto-lexical segments and their meanings without assuming a priori knowledge of the language. We also investigate the behavior of the model using a computational implementation that makes use of transition-probability-based statistical learning. Results from simulations show that the model is not only capable of replicating behavioral data on word learning in artificial languages, but also shows effective learning of word segments and their meanings from continuous speech. Moreover, when augmented with a simple familiarity preference during learning, the model shows a good fit to human behavioral data in XSL tasks.
These results support the idea of simultaneous segmentation and meaning acquisition and show that comprehensive models of early word segmentation should take referential word meanings into account.
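The statistical core of cross-situational learning can be shown in a few lines. This sketch assumes already-segmented words and discrete referents purely to illustrate the co-occurrence principle; the model discussed above works directly on continuous speech, where segmentation emerges jointly with meaning:

```python
from collections import defaultdict

class XSLLearner:
    """Minimal cross-situational learner: accumulate word-referent
    co-occurrence counts over ambiguous scenes, then map each word
    to its most strongly associated referent."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, words, referents):
        # Every word is paired with every visible referent: within one
        # scene, the true mapping is ambiguous.
        for w in words:
            for r in referents:
                self.counts[w][r] += 1

    def meaning(self, word):
        assoc = self.counts[word]
        return max(assoc, key=assoc.get)

learner = XSLLearner()
# Each scene pairs two spoken words with two visible objects, so no
# single scene disambiguates any word.
scenes = [(["dog", "ball"], ["DOG", "BALL"]),
          (["dog", "cup"], ["DOG", "CUP"]),
          (["ball", "cup"], ["BALL", "CUP"])]
for words, refs in scenes:
    learner.observe(words, refs)
```

No individual scene identifies any word's referent, yet the correct mappings dominate the accumulated counts after only three scenes, which is the referential-uncertainty resolution the abstract describes.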
Frontiers in Human Neuroscience | 2014
Ninah Koolen; Anneleen Dereymaeker; Okko Räsänen; Katrien Jansen; Jan Vervisch; Vladimir Matic; Maarten De Vos; Sabine Van Huffel; Gunnar Naulaers; Sampsa Vanhatalo
A key feature of normal neonatal EEG at term age is interhemispheric synchrony (IHS), which refers to the temporal coincidence of bursting across hemispheres during trace alternant EEG activity. The assessment of IHS in both clinical and scientific work relies on visual, qualitative EEG assessment without clearly quantifiable definitions. A quantitative measure, the activation synchrony index (ASI), was recently shown to perform well compared to visual assessments. The present study set out to test whether IHS is stable enough for clinical use, and whether it could serve as an objective feature of EEG normality. We analyzed 31 neonatal EEG recordings that had been clinically classified as normal (n = 14) or abnormal (n = 17) using holistic, conventional visual criteria including amplitude, focal differences, qualitative synchrony, and focal abnormalities. We selected 20-min epochs of discontinuous background pattern. ASI values were computed separately for different channel-pair combinations and window lengths to identify the settings giving optimal intraindividual ASI stability. Finally, ROC curves were computed to find trade-offs related to compromised data lengths, a common challenge in neonatal EEG studies. Using the average of four consecutive 2.5-min epochs in the centro-occipital bipolar derivations gave ASI estimates that very accurately distinguished babies clinically classified as normal vs. abnormal. It was even possible to draw a cut-off limit (ASI ≈ 3.6) that correctly classified the EEGs in 97% of all cases. Finally, we showed that shortening the EEG segments from 20 to 5 min leads to increased variability in ASI-based classification. Our findings support the prior literature in showing that IHS is an important feature of normal neonatal brain function. We show that ASI may provide diagnostic value even at the individual level, which strongly supports its use in prospective clinical studies on neonatal EEG as well as in the feature set of upcoming EEG classifiers.
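The cut-off analysis described above amounts to scanning candidate thresholds on a scalar index and keeping the one that best separates the clinically labelled groups. A simple stand-in for that step (the group sizes match the abstract, but the ASI values below are synthetic, and the real study used ROC analysis rather than raw accuracy):

```python
import numpy as np

def best_cutoff(values, labels):
    """Scan candidate thresholds on a scalar index and return the one
    that maximizes classification accuracy against binary labels
    (True = normal, predicted when the index is at or above the cut-off)."""
    best_t, best_acc = None, -1.0
    for t in np.unique(values):
        acc = float(np.mean((values >= t) == labels))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Hypothetical index values: "normal" recordings score higher on average.
rng = np.random.default_rng(0)
normal = rng.normal(6.0, 1.5, size=14)
abnormal = rng.normal(2.0, 1.0, size=17)
values = np.concatenate([normal, abnormal])
labels = np.concatenate([np.ones(14, bool), np.zeros(17, bool)])
cutoff, accuracy = best_cutoff(values, labels)
```

An in-sample cut-off like this is optimistic; the study's point about shorter EEG segments increasing classification variability is exactly the kind of robustness check such a threshold needs before clinical use.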