
Santiago Fernández

Dalle Molle Institute for Artificial Intelligence Research


Publication

Featured research published by Santiago Fernández.

international conference on machine learning | 2006

Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks

Alex Graves; Santiago Fernández; Faustino J. Gomez; Jürgen Schmidhuber

Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. In speech recognition, for example, an acoustic signal is transcribed into words or sub-word units. Recurrent neural networks (RNNs) are powerful sequence learners that would seem well suited to such tasks. However, because they require pre-segmented training data, and post-processing to transform their outputs into label sequences, their applicability has so far been limited. This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. An experiment on the TIMIT speech corpus demonstrates its advantages over both a baseline HMM and a hybrid HMM-RNN.

international conference on artificial neural networks | 2007

Multi-dimensional recurrent neural networks

Alex Graves; Santiago Fernández; Jürgen Schmidhuber

Recurrent neural networks (RNNs) have proved effective at one dimensional sequence learning tasks, such as speech and online handwriting recognition. Some of the properties that make RNNs suitable for such tasks, for example robustness to input warping, and the ability to access contextual information, are also desirable in multi-dimensional domains. However, there has so far been no direct way of applying RNNs to data with more than one spatio-temporal dimension. This paper introduces multi-dimensional recurrent neural networks, thereby extending the potential applicability of RNNs to vision, video processing, medical imaging and many other areas, while avoiding the scaling problems that have plagued other multi-dimensional models. Experimental results are provided for two image segmentation tasks.

international conference on artificial neural networks | 2007

An application of recurrent neural networks to discriminative keyword spotting

Santiago Fernández; Alex Graves; Jürgen Schmidhuber

The goal of keyword spotting is to detect the presence of specific spoken words in unconstrained speech. The majority of keyword spotting systems are based on generative hidden Markov models and lack discriminative capabilities. However, discriminative keyword spotting systems are currently based on frame-level posterior probabilities of sub-word units. This paper presents a discriminative keyword spotting system based on recurrent neural networks only, that uses information from long time spans to estimate word-level posterior probabilities. In a keyword spotting task on a large database of unconstrained speech the system achieved a keyword spotting accuracy of 84.5%.

international conference on artificial neural networks | 2005

Bidirectional LSTM networks for improved phoneme classification and recognition

Alex Graves; Santiago Fernández; Jürgen Schmidhuber

In this paper, we carry out two experiments on the TIMIT speech corpus with bidirectional and unidirectional Long Short Term Memory (LSTM) networks. In the first experiment (framewise phoneme classification) we find that bidirectional LSTM outperforms both unidirectional LSTM and conventional Recurrent Neural Networks (RNNs). In the second (phoneme recognition) we find that a hybrid BLSTM-HMM system improves on an equivalent traditional HMM system, as well as unidirectional LSTM-HMM.

Thermochimica Acta | 2003

Microcalorimetric determination of the cell specific heat rate in soils: relationship with the soil microbial population and biophysic significance

Nieves Barros; Sergio Feijóo; Santiago Fernández

Microcalorimetry was applied to study the basal respiration in several soils collected in Galicia (Northwest Spain) and in the Brazilian Amazon. The microbial activity was recorded microcalorimetrically as power–time lines during 24 h. The soil mass specific heat rate JQ/S and the cell specific heat rate JQ/N were calculated, and compared to the microbial population of the soil samples and to the number of microorganisms per organic carbon. Results showed an inverse hyperbolic relation between JQ/N and the number of microorganisms in the samples, and between JQ/N and the number of microorganisms per organic carbon. The microcalorimetric indexes of microbial activity were affected by some other soil properties, such as the percentage of carbon and nitrogen and the C/N ratio, as well as by the introduction of agriculture, which affected the microbial population. We believe that the cell specific heat rate can be considered an index of the efficiency of energy utilization by soil microorganisms, similar to the specific respiration activity. Its negative correlation with the microbial density could be attributed to changes in the strategy of energy utilization by microorganisms in soils.

Journal of the Acoustical Society of America | 1998

Context effects in the auditory identification of Spanish fricatives /f/ and /θ/: Hyper and Hypospeech

Sergio Feijóo; Santiago Fernández; Ramón Balsa

Twenty‐eight subjects heard 40 Spanish words in which the initial fricative was /f/ or /θ/, combined with vowels /e/ and /u/. Ten words were used for each particular combination (2 fricatives × 2 vowels × 10 words). Two forms of speech (Hypo and Hyperspeech) and four conditions were considered: (1) isolated fricative segment; (2) fricative segment + 51.2 ms of the following vowel; (3) fricative + whole following vowel; (4) whole word. The statistical analysis showed that, despite their differences in production and acoustic characteristics, isolated fricative segments were equally recognized in Hypo and Hyperspeech (cond. 1). Including the vowel (conditions 2 and 3) significantly improved recognition of both fricatives for both forms of speech, except for the combination /f/+/e/: while fricative identification improves slightly in Hyperspeech, in Hypospeech recognition decreases with respect to cond. 1. For this particular combination, an acceptable recognition rate is only achieved in the whole word con...

international conference on acoustics, speech, and signal processing | 2000

Perceptual effects of coarticulation in fricatives

Santiago Fernández; Sergio Feijóo; Ramón Balsa; Nieves Barros

The perceptual interaction between the consonant and the vowel in fricative+vowel syllables is evaluated. A set of conflicting cue stimuli was used to measure the relative importance of: (a) the influence of the vowel on the preceding consonant, and (b) the influence of the fricative on the following vowel. It is concluded that the perceptual interaction between the consonant and the vowel in fricative-vowel syllables cannot be explained only by the coarticulatory influence of the consonant or vowel on adjacent segments. The influence of the vowel on the preceding fricative is perceptually irrelevant, while the influence of the fricative on the following vowel is more important for the identification of /θ/ and /f/ than for /s/ and /ʃ/. In fact, it is perceptually irrelevant for /ʃ/ and only slightly important for /s/. In addition, the influence of the fricative on the following vowel depends notably on the particular vowel.

Journal of the Acoustical Society of America | 1999

Influence of frequency range in the perceptual recognition of fricatives

Sergio Feijóo; Santiago Fernández; Ramón Balsa

The objective of this paper is to study the importance of various frequency bands for the identification of fricatives. Tokens were CV syllables formed by the combination of the Galician fricatives /θ, f, s, ʃ/ and the vowels /a, e, i, o, u/, which were pronounced in Hyperspeech form by a man and a woman. Tokens were sampled at 32 kHz and low‐pass filtered with cutoff frequencies of 11, 8, 5.5, 4, and 3 kHz. Thus, the total number of tokens was 240 = 4 fricatives × 5 vowels × 2 sexes × 6 frequencies. Thirty‐seven listeners carried out the perceptual experiments in two conditions: (1) whole fricative noise plus 100 ms of the following vowel, and (2) whole fricative noise. The results of the perceptual experiments show that as the cutoff frequency is lowered, (a) /s/ tends to be recognized as /θ/ in both conditions; (b) the fricative noise of /θ/ tends to be recognized as /f/, and (c) recognition of /f/ and /ʃ/ is affected to a lesser extent. Results suggest the importance of low‐frequency energy in the characterization...

Journal of the Acoustical Society of America | 2002

A speech perception test for children in classrooms

Sergio Feijóo; Santiago Fernández; José Manuel Álvarez

The combined effects of excessive ambient noise and reverberation in classrooms interfere with speech recognition and tend to degrade the learning process of young children. This paper reports a detailed analysis of a speech recognition test carried out with two different populations of children, aged 8–9 and 10–11. Unlike English, Spanish has few minimal pairs that can be used for phoneme recognition in a closed-set manner. The test consisted of a series of two‐syllable nonsense words formed by the combination of all possible syllables in Spanish. The test was administered to the children as a dictation task in which they had to write down the words spoken by their female teacher. The test was administered in two blocks on different days, and later repeated to analyze its consistency. The rationale for this procedure was (a) the test should reproduce normal academic situations, (b) all phonological and lexical context effects should be avoided, (c) errors in both words and phonemes should be scored to unveil an...

Journal of the Acoustical Society of America | 2002

Temporal integration of acoustic cues in fricative perception

Santiago Fernández; Sergio Feijóo

An important issue in speech perception is to determine how the components of a syllable interact to enhance perception of both consonant and vowel. To date the mechanism underlying that integration has not yet been discovered. Different approaches to the temporal integration between fricative and vowel in a set of natural syllables were explored. Two hypotheses were considered: (a) the two segments are evaluated separately and then combined into a single percept; (b) both cues are evaluated jointly. To test those hypotheses several computational models were considered. If the F and V segments are evaluated separately, two statistical functions are available: an “OR” function corresponding to the perceptual hypothesis predicting that only one of the segments determines the identity of the fricative; an “AND” function corresponding to the perceptual hypothesis predicting the use of both cues. The hypothesis of the joint evaluation of both cues was tested using the whole FV segment. Their performances w...
