Luciano Fissore
CSELT
Publication
Featured research published by Luciano Fissore.
international conference on acoustics, speech, and signal processing | 1995
Pietro Laface; Claudio Vair; Luciano Fissore
The paper presents a fast segmental Viterbi algorithm, a new search strategy that is particularly effective for very large vocabulary word recognition. It performs a tree-based, time-synchronous, left-to-right beam search that develops time-dependent acoustic and phonetic hypotheses. At any given time, it activates a sub-word unit associated with an arc of a lexical tree only if that time is likely to be the boundary between the current and the next unit. This new technique, tested with a vocabulary of 188892 directory entries, achieves the same results as the Viterbi algorithm with a 35% speedup. Results are also presented for a 718-word, speaker-independent continuous speech recognition task.
IEEE Transactions on Acoustics, Speech, and Signal Processing | 1989
Luciano Fissore; Pietro Laface; Giorgio Micca; Roberto Pieraccini
A large-vocabulary isolated-word recognition system based on the hypothesize-and-test paradigm is described. Word preselection is achieved by segmenting and classifying the input signal in terms of broad phonetic classes. A lattice of phonetic segments is generated and organized as a graph. Word hypothesization is obtained by matching this graph against the models of all vocabulary words, where a word model is itself a phonetic representation in the form of a graph. A modified dynamic programming matching procedure gives an efficient solution to this graph-to-graph matching problem. Hidden Markov models (HMMs) of subword units provide more detailed knowledge in the verification step. The word candidates generated by the previous step are represented as sequences of diphone-like subword units, and the Viterbi algorithm is used for evaluating their likelihood. Lexical knowledge is organized in a tree structure where the initial common subsequences of word descriptions are shared, and a beam-search strategy extends only the most promising paths. The results show that a complexity reduction of about 73% can be achieved by using the two-pass approach with respect to the direct approach, while the recognition accuracy remains comparable.
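The Viterbi scoring with beam pruning mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single left-to-right HMM with shared self-loop and forward transition log-probabilities, and the function name `viterbi_beam`, its parameters, and the toy emission values are all hypothetical.

```python
import math
from typing import List

NEG_INF = float("-inf")


def viterbi_beam(log_emit: List[List[float]],
                 log_self: float,
                 log_next: float,
                 beam: float) -> float:
    """Best log-score of a left-to-right path that ends in the last state.

    log_emit[t][s] is the log-likelihood of frame t in state s.
    States scoring more than `beam` below the current best are deactivated
    before each time step (beam pruning).
    """
    n_states = len(log_emit[0])
    scores = [NEG_INF] * n_states
    scores[0] = log_emit[0][0]  # every path starts in the first state
    for t in range(1, len(log_emit)):
        threshold = max(scores) - beam
        active = [x if x >= threshold else NEG_INF for x in scores]
        new_scores = [NEG_INF] * n_states
        for s in range(n_states):
            stay = active[s] + log_self
            move = active[s - 1] + log_next if s > 0 else NEG_INF
            new_scores[s] = max(stay, move) + log_emit[t][s]
        scores = new_scores
    return scores[-1]


# Toy data (made up): 3 frames, 2 states, all transitions with probability 0.5.
EMIT = [[math.log(0.9), math.log(0.1)],
        [math.log(0.5), math.log(0.5)],
        [math.log(0.1), math.log(0.9)]]
exact = viterbi_beam(EMIT, math.log(0.5), math.log(0.5), beam=math.inf)
pruned = viterbi_beam(EMIT, math.log(0.5), math.log(0.5), beam=0.0)
```

With an infinite beam the function reduces to the plain Viterbi recursion. A finite beam trades a possible loss in the final score for fewer active states per frame.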
international conference on acoustics, speech, and signal processing | 1984
M. Cravero; Luciano Fissore; Roberto Pieraccini; Carlo Scagliola
This paper describes a connected speech recognition system based on Markov models. The performance of this system was analyzed and compared with that of a system which uses prototypes of words instead of Markov models. Some preliminary results are reported with reference to the recognition of connected digits.
international conference on acoustics speech and signal processing | 1988
Luciano Fissore; Egidio P. Giachin; Pietro Laface; Giorgio Micca; R. Pieraccini; Claudio Rullent
A continuous speech recognition and understanding system is presented that accepts queries about a restricted geographical domain, expressed in free but syntactically correct natural language, with a lexicon of the order of one thousand words. A lattice of word candidates hypothesized by the speaker-dependent recognition level is the interface to an understanding module that performs the syntactic and semantic analysis. The recognition subsystem generates word hypotheses by exploiting hidden Markov models of sub-word units. Bottom-up constraints are also introduced to restrict the set of candidate words. The understanding module determines the most likely sequence of words and represents its meaning in a parse tree suitable for accessing a database. It makes use of a modified caseframe analysis driven by the likelihood scores of the word hypotheses. The results of a set of experiments performed on 150 sentences collected from one speaker are given.
international conference on acoustics speech and signal processing | 1988
Luciano Fissore; Pietro Laface; Giorgio Micca; R. Pieraccini
Recently, a two-step strategy for large vocabulary isolated word recognition has been successfully tested. The first step consists of the hypothesization of a reduced set of word candidates on the basis of broad bottom-up features, while the second one is the verification of the hypotheses using more detailed phonetic knowledge. This paper deals with the extension of this strategy to continuous speech. A tight integration between the two steps, rather than a hierarchical approach, has been investigated. The hypothesization and the verification modules are implemented as processes running in parallel. Both processes represent lexical knowledge by a tree. Each node of the hypothesization tree is labeled by one of 6 broad phonetic classes. The nodes of the verification tree are, instead, the states of sub-word HMMs. The two processes cooperate to detect word hypotheses along the sentence.
international conference on acoustics, speech, and signal processing | 1997
Luciano Fissore; Pietro Laface; Franco Ravera
Isolated word speech recognizers with fixed vocabularies are often used to provide vocal services through the telephone line. The paper illustrates a simple postprocessing approach that allows the hypotheses produced by a hidden Markov model recognizer to be rescored taking into account the global temporal structure of the pronounced words. Our approach does not directly rely on state/word duration modeling. It models, instead, the global time variations of the spectral features of each word and their correlation in time: two important perceptual cues that are only partially exploited by standard HMMs. This method has been evaluated using three isolated word, speaker-independent systems with vocabularies of different size and complexity. We show that, with minimal overhead, the recognition performance improves not only for small vocabulary recognition systems, such as the isolated digit one or the one recognizing 26 Italian spelling names, but also for a system with a 475 city name vocabulary included in a vocal service that provides information about the main railway connections.
international conference on acoustics, speech, and signal processing | 1991
Luciano Fissore; Pietro Laface; G. Micca
Attention is given to a comparison of the performance of discrete and continuous density hidden Markov models (DDHMMs and CDHMMs) on a 786-word E-mail inquiry task performed by the speaker-independent word recognition component of a speech understanding system. This comparison between DDHMMs and CDHMMs has also been carried out by training speaker-dependent models. The authors also present the results of a set of experiments carried out with the aim of automatically selecting a suitable set of subword unit models by a clustering procedure. The recognizer gives word accuracy (WA) rates of 67.8% and 75.3% by using DDHMMs and CDHMMs, respectively, without any linguistic constraints. On the same task, WA rates of 87.1% and 85.9% have been obtained in the speaker-dependent mode.
Speech Communication | 1988
Luciano Fissore; Giorgio Micca; R. Pieraccini; P. Laface
A large vocabulary isolated word recognition system based on a two-pass strategy is described: word hypothesization and verification. Word preselection is achieved by segmenting and classifying the input signal in terms of 6 broad phonetic classes. To reduce storage and computational costs, lexical knowledge is organized in a tree structure where the initial common subsequences of word descriptions are shared, and a beam-search Dynamic Programming algorithm extends only the most promising paths. In the second pass, word verification, a detailed representation of the phonemic structure of word candidates is used for estimating the most likely words. Each word candidate is modeled by a graph of subword Hidden Markov Models. Again, a tree structure of the whole word subset is built online for an efficient implementation of a beam-search Viterbi algorithm that estimates the likelihood of the candidates. The results show that a complexity reduction of about 73% can be achieved by using the two-pass approach with respect to the direct approach, while the recognition accuracy remains comparable.
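The storage saving from sharing initial common subsequences can be seen in a small sketch of a lexical tree (a trie over sub-word unit labels). This is only an illustration under assumptions: the class `LexNode`, the helper names, and the toy lexicon with its letter-like unit labels are hypothetical and do not come from the paper.

```python
from typing import Dict, List


class LexNode:
    """One node of a lexical tree; children are keyed by sub-word unit label."""

    def __init__(self) -> None:
        self.children: Dict[str, "LexNode"] = {}
        self.words: List[str] = []  # words whose description ends at this node


def build_lexical_tree(lexicon: Dict[str, List[str]]) -> LexNode:
    """Insert every word so that words with a common prefix share nodes."""
    root = LexNode()
    for word, units in lexicon.items():
        node = root
        for unit in units:
            node = node.children.setdefault(unit, LexNode())
        node.words.append(word)
    return root


def count_nodes(node: LexNode) -> int:
    """Number of unit nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Hypothetical toy lexicon with made-up unit labels.
TOY_LEXICON = {
    "roma": ["r", "o", "m", "a"],
    "romano": ["r", "o", "m", "a", "n", "o"],
    "rovigo": ["r", "o", "v", "i", "g", "o"],
    "torino": ["t", "o", "r", "i", "n", "o"],
}

tree = build_lexical_tree(TOY_LEXICON)
flat_units = sum(len(units) for units in TOY_LEXICON.values())  # no sharing
shared_units = count_nodes(tree)                                # with sharing
```

In this toy case the flat representation needs 22 unit instances and the tree needs 16. A search over the tree evaluates each shared prefix once instead of once per word.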
international conference on acoustics, speech, and signal processing | 1992
Luciano Fissore; Pietro Laface; P. Ruscitti
The authors describe the development of a speaker-independent isolated word recognizer for a voice dialing application operating in the car environment. Speaker-dependent and speaker-independent approaches are addressed and compared. Simple continuous hidden Markov models (HMMs) are used for speaker-dependent recognition; multiple codebook discrete and continuous HMMs are trained on speaker-independent reference data derived from a large database of speech collected inside several cars under a wide variety of driving conditions and from a large number of speakers from different Italian regions. By training two separate models for each word (one for male and one for female speakers), each a 12-state continuous density whole-word HMM with eight diagonal covariance Gaussians per state, and performing a beam-search Viterbi decoding, a recognition rate of 99% has been obtained (65 errors out of 6423 words).
conference of the international speech communication association | 1992
Paolo Baggia; Luciano Fissore; Elisabetta Gerbino; Egidio P. Giachin; Claudio Rullent
A parser for continuous speech has to deal with lattices where the word hypotheses of the correct sentence are not usually perfectly aligned and short function words may be missing. To cope with these problems, a two-way interaction between the recognition module and the parser, called feedback verification procedure (FVP), has been investigated. The parser generates many solutions, which are fed back to the recognizer; the recognizer realigns them against the acoustic data, finds the missing function words among the given candidates, and assigns them a new score. The best scoring solution is finally selected by the parser. Results on a 787-word, speaker-independent, telephone-bandwidth continuous speech recognition task are presented.