
Publication


Featured research published by Giorgio Micca.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1989

Lexical access to large vocabularies for speech recognition

Luciano Fissore; Pietro Laface; Giorgio Micca; Roberto Pieraccini

A large-vocabulary isolated-word recognition system based on the hypothesize-and-test paradigm is described. Word preselection is achieved by segmenting and classifying the input signal in terms of broad phonetic classes. A lattice of phonetic segments is generated and organized as a graph. Word hypothesization is obtained by matching this graph against the models of all vocabulary words, where a word model is itself a phonetic representation made in terms of a graph. A modified dynamic programming matching procedure gives an efficient solution to this graph-to-graph matching problem. Hidden Markov models (HMMs) of subword units are used as a more detailed knowledge in the verification step. The word candidates generated by the previous step are represented as sequences of diphone-like subword units, and the Viterbi algorithm is used for evaluating their likelihood. Lexical knowledge is organized in a tree structure where the initial common subsequences of word descriptions are shared, and a beam-search strategy carries on the most promising paths only. The results show that a complexity reduction of about 73% can be achieved by using the two-pass approach with respect to the direct approach, while the recognition accuracy remains comparable.
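The tree-structured lexicon described above can be sketched as a phonetic prefix tree: words whose transcriptions share an initial subsequence share nodes, so the matching cost of a common prefix is paid only once. A minimal sketch in Python; the broad-class labels and example transcriptions below are hypothetical, not taken from the paper.

```python
# Sketch of a tree-structured (prefix-tree) lexicon: words whose phonetic
# transcriptions share an initial subsequence share the corresponding nodes,
# so the matching cost of a common prefix is computed only once.

class LexicalTreeNode:
    def __init__(self, phone=None):
        self.phone = phone      # broad phonetic class labelling this node
        self.children = {}      # phone label -> LexicalTreeNode
        self.word = None        # word ending at this node, if any

def build_lexical_tree(lexicon):
    """lexicon: dict mapping word -> sequence of phonetic class labels."""
    root = LexicalTreeNode()
    for word, phones in lexicon.items():
        node = root
        for p in phones:
            node = node.children.setdefault(p, LexicalTreeNode(p))
        node.word = word
    return root

def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children.values())

# Hypothetical broad-class transcriptions (P = plosive, V = vowel,
# F = fricative, N = nasal) for three Italian words:
lexicon = {
    "casa":  ["P", "V", "F", "V"],
    "cassa": ["P", "V", "F", "F", "V"],
    "cane":  ["P", "V", "N", "V"],
}
tree = build_lexical_tree(lexicon)
# The shared P-V(-F) prefixes give 9 nodes (including the root) instead of
# the 13 phones a flat lexicon would store.
print(count_nodes(tree))  # → 9
```

During search, one dynamic programming score per tree node covers every word below that node, which is where the storage and computation savings come from.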


International Conference on Acoustics, Speech, and Signal Processing | 1988

Experimental results on large-vocabulary continuous speech recognition and understanding

Luciano Fissore; Egidio P. Giachin; Pietro Laface; Giorgio Micca; R. Pieraccini; Claudio Rullent

A continuous speech recognition and understanding system is presented that accepts queries about a restricted geographical domain, expressed in free but syntactically correct natural language, with a lexicon on the order of one thousand words. A lattice of word candidates hypothesized by the speaker-dependent recognition level is the interface to an understanding module that performs the syntactic and semantic analysis. The recognition subsystem generates word hypotheses by exploiting hidden Markov models of subword units. Bottom-up constraints are also introduced to restrict the set of candidate words. The understanding module determines the most likely sequence of words and represents its meaning as a parse tree suitable for accessing a database. It makes use of a modified caseframe analysis driven by the word hypothesis likelihood scores. The results of a set of experiments performed on 150 sentences collected from one speaker are given.


International Conference on Acoustics, Speech, and Signal Processing | 1989

On the use of neural networks for speaker independent isolated word recognition

P. Demichelis; L. Fissore; Pietro Laface; Giorgio Micca; Elio Piccolo

The authors present results obtained by applying the connectionist approach of multilayer perceptrons (MLPs) to three tasks of practical interest: classification of speech in terms of broad phonetic classes, speaker-independent recognition of yes/no answers over the dialed-up telephone line, and speaker-independent recognition of isolated digits over the telephone line. The first task assesses the capability of a simple MLP to generate nonlinear decision surfaces that discriminate among six broad phonetic classes. The MLP performance is actually comparable to that obtained by a hierarchical polynomial classifier. The second task deals with the sequential nature of speech. As short words like SI/NO do not pose significant time-alignment problems, the effects of different parts of the signal are taken into account by means of hidden units. A 98% recognition rate is achieved. For the third task, digit recognition, where word length varies over a large range, a nonlinear time alignment is used that is performed through trace segmentation.
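Trace segmentation, the nonlinear time alignment mentioned for the digit task, resamples the feature-vector trajectory at points equally spaced along its cumulative spectral change, so steady-state stretches are compressed and utterances of different lengths map onto a fixed number of segments. A rough sketch; the function name and frame format are assumptions, not the paper's implementation.

```python
# Sketch of trace segmentation: resample the feature-vector trajectory at
# points equally spaced along its cumulative spectral change ("trace"), so
# every utterance maps onto a fixed number of segments regardless of length.
import numpy as np

def trace_segmentation(frames, n_segments):
    """frames: (T, D) array of feature vectors -> (n_segments, D) array."""
    frames = np.asarray(frames, dtype=float)
    # cumulative Euclidean distance travelled along the trajectory
    steps = np.linalg.norm(np.diff(frames, axis=0), axis=1)
    trace = np.concatenate([[0.0], np.cumsum(steps)])
    # pick the frames reached at n_segments equally spaced arc lengths
    targets = np.linspace(0.0, trace[-1], n_segments)
    idx = np.clip(np.searchsorted(trace, targets), 0, len(frames) - 1)
    return frames[idx]
```

Identical consecutive frames contribute nothing to the trace, so a long steady vowel occupies no more segments than a short one, which is precisely the nonlinearity of the alignment.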


International Conference on Acoustics, Speech, and Signal Processing | 1988

Very large vocabulary isolated utterance recognition: a comparison between one pass and two pass strategies

L. Fissore; Pietro Laface; Giorgio Micca; R. Pieraccini

A system for recognizing isolated utterances belonging to a very large vocabulary is presented that follows a two-pass strategy. The first step, hypothesization, consists of selecting a subset of word candidates, starting from the segmentation of speech into six broad phonetic classes. This module is implemented through a dynamic programming algorithm working in a three-dimensional space. The search is performed on a tree representing a coarse description of the lexicon. The second step is the search for the best N candidates according to a maximum-likelihood criterion. Each word candidate is represented by a graph of subword hidden Markov models, and a tree structure of the whole word subset is built online for an efficient implementation of the Viterbi algorithm. A comparison with a direct approach that does not use the hypothesization module shows that the two-pass approach has the same performance with an 80% reduction in computational complexity.
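The maximum-likelihood scoring in the second pass rests on the Viterbi algorithm. A minimal log-domain version for a left-to-right discrete HMM is shown below as an illustration only: the paper's candidates are graphs of subword units, not a single chain.

```python
# Minimal log-domain Viterbi for a left-to-right discrete HMM, illustrating
# the maximum-likelihood scoring of one word candidate. Sketch only; the
# paper's word models are graphs of subword-unit HMMs.
import math

def viterbi_log_likelihood(obs, log_trans, log_emit):
    """obs: symbol indices; log_trans[i][j], log_emit[j][o] are log probs."""
    n = len(log_trans)
    # start in state 0, as in a left-to-right model
    delta = [log_emit[0][obs[0]]] + [-math.inf] * (n - 1)
    for o in obs[1:]:
        delta = [
            max(delta[i] + log_trans[i][j] for i in range(n)) + log_emit[j][o]
            for j in range(n)
        ]
    return delta[-1]  # best-path log likelihood ending in the final state
```

Ranking the word candidates by this score yields the best-N list that the second pass passes on.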


International Conference on Acoustics, Speech, and Signal Processing | 1988

Interaction between fast lexical access and word verification in large vocabulary continuous speech recognition

Luciano Fissore; Pietro Laface; Giorgio Micca; R. Pieraccini

Recently, a two-step strategy for large-vocabulary isolated word recognition has been tested successfully. The first step consists of hypothesizing a reduced set of word candidates on the basis of broad bottom-up features, while the second is the verification of the hypotheses using more detailed phonetic knowledge. This paper deals with its extension to continuous speech. A tight integration between the two steps, rather than a hierarchical approach, has been investigated. The hypothesization and verification modules are implemented as processes running in parallel. Both processes represent lexical knowledge by a tree. Each node of the hypothesization tree is labeled by one of six broad phonetic classes. The nodes of the verification tree are, instead, the states of subword HMMs. The two processes cooperate to detect word hypotheses along the sentence.


International Conference on Acoustics, Speech, and Signal Processing | 1989

A word hypothesizer for a large vocabulary continuous speech understanding system

L. Fissore; Pietro Laface; Giorgio Micca; R. Pieraccini

A continuous-speech recognition and understanding system for a thousand-word vocabulary has been designed and implemented. It is able to answer queries put to a geographical database in natural Italian language. A discussion is presented of the recognition component of the system. It can produce a word lattice that is then processed by a syntactic-semantic component. In addition, a linguistic decoder exploiting word-pair constraints has been investigated. Its results have been compared to those obtained by similar approaches reported in the literature. The system relies on word preselection through lexical access by means of broad phonetic classes and on hidden Markov modeling of subword units. The improvements to the basic approach are presented and system performance is given. Average word accuracy and correct sentence recognition obtained for speaker-dependent tests performed by two speakers pronouncing 214 sentences are 94.5% and 89.3%, respectively. The perplexity of the word-pair language model is 25.
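The quoted word-pair perplexity can be made concrete: if each word merely licenses a set of successors, taken as equiprobable, the test-set perplexity reduces to the geometric mean of the branching factors. The sketch below uses a hypothetical grammar, not the paper's.

```python
# Illustration of word-pair perplexity: with each word licensing a set of
# equiprobable successors, the test-set perplexity is the geometric mean of
# the branching factors. Grammar and sentences below are hypothetical.
import math

def word_pair_perplexity(successors, sentences):
    """successors: word -> list of allowed next words; sentences: word lists."""
    log_sum, n_pairs = 0.0, 0
    for sent in sentences:
        for prev, _word in zip(sent, sent[1:]):
            log_sum += math.log(len(successors[prev]))
            n_pairs += 1
    return math.exp(log_sum / n_pairs)
```

A perplexity of 25 thus means the decoder faces, on average, an effective choice of 25 words at each step, far fewer than the thousand-word vocabulary.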


Speech Communication | 1988

Strategies for lexical access to very large vocabularies

Luciano Fissore; Giorgio Micca; R. Pieraccini; P. Laface

A large-vocabulary isolated word recognition system based on a two-pass strategy is described: word hypothesization and verification. Word preselection is achieved by segmenting and classifying the input signal in terms of six broad phonetic classes. To reduce storage and computational costs, lexical knowledge is organized in a tree structure where the initial common subsequences of word descriptions are shared, and a beam-search dynamic programming algorithm carries on the most promising paths only. In the second pass, word verification, a detailed representation of the phonemic structure of word candidates is used for estimating the most likely words. Each word candidate is modeled by a graph of subword hidden Markov models. Again, a tree structure of the whole word subset is built online for an efficient implementation of a beam-search Viterbi algorithm that estimates the likelihood of the candidates. The results show that a complexity reduction of about 73% can be achieved by using the two-pass approach with respect to the direct approach, while the recognition accuracy remains comparable.
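The beam-search strategy used in both passes comes down to one pruning rule: after each frame, keep only the hypotheses whose score lies within a fixed beam of the current best. A minimal sketch with illustrative names:

```python
# One beam-pruning step: retain only hypotheses whose log score lies within
# `beam` of the current best; the rest of the search space is abandoned.
def beam_prune(hypotheses, beam):
    """hypotheses: dict mapping tree node -> log score; returns survivors."""
    best = max(hypotheses.values())
    return {node: s for node, s in hypotheses.items() if s >= best - beam}
```

A wider beam trades computation for a lower risk of pruning the correct path before it can recover.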


international conference on acoustics, speech, and signal processing | 1987

Experimental results on a large lexicon access task

Pietro Laface; Giorgio Micca; R. Pieraccini

In this paper the problem of lexical access to large vocabularies by means of a coarse phonetic description of words is addressed. A generate-and-test approach is used: first a set of word candidates is extracted from the lexicon by means of a broad phonetic description of the input utterance; then a more detailed stochastic model of each word in this set, based on subword phonetic units, is obtained, and the likelihood of the candidate words is estimated using the Viterbi algorithm. Results of the application of the method to a large-vocabulary isolated word recognition task are given. The candidate lists produced in the generation phase include the correct word 98 times out of 100; their average size is on the order of 50 items for a 1011-word lexicon, and they do not exceed 300 items for a 13748-word lexicon.
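The two figures evaluated in the generation phase, how often the correct word survives preselection and how large the candidate lists are, can be computed with a small helper; the data used in the example are hypothetical.

```python
# Bookkeeping for generate-and-test evaluation: inclusion rate of the correct
# word in the candidate lists, plus the lists' average size.
def preselection_stats(candidate_lists, correct_words):
    n = len(candidate_lists)
    hits = sum(w in cands for cands, w in zip(candidate_lists, correct_words))
    avg_size = sum(len(cands) for cands in candidate_lists) / n
    return hits / n, avg_size
```

The inclusion rate bounds the final accuracy from above: a word pruned here can never be recovered by the verification pass.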


Speech Communication | 2000

Multilinguality in voice activated information services: the P502 EURESCOM project

J. Azevedo; N. Beires; Francis Charpentier; M. Farrell; D. Johnston; E. LeFlour; Giorgio Micca; S. Militello; K. Schroeder

The paper describes the multilingual system developed within the framework of the P502 EURESCOM project. The system provides information about major telephone services available in the UK, Germany, France, Italy and Portugal in five languages. We present the results of a number of experiments carried out in the five countries, aiming to answer some fundamental questions concerned with the exploitation of a multilingual service. Both technological and interface design issues have been investigated and several alternatives have been tested. We compared speech recognition accuracy and successful transaction completion rates of GSM and PSTN networks, and evaluated cross-country and cross-language effects. Using a new methodological approach to assessment, a powerful predictive model was developed. This model allowed users' subjective ratings to be predicted from objective measurements. The results showed that an average transaction success rate of more than 92% was obtained when speech recognisers exhibiting good word recognition accuracy were coupled to suitable dialogue interfaces in the IVR system.


Archive | 1990

The Recognition Algorithms

Luciano Fissore; Alfred Kaltenmeier; Pietro Laface; Giorgio Micca; R. Pieraccini

Subtask 2.1 of the P26 project was devoted to the study of the problems related to the development of the front-end of a speech understanding system. In the early stages of the project it was decided to separate the front-end, referred to in the following as the recognition module, from the understanding module, which deals with syntax and semantics. This decision was made taking into account several considerations of a mainly practical nature: the research groups working on Subtask 2.1 were at their first experience with speech understanding systems, and their background was mainly in developing systems for small-vocabulary isolated and connected word recognition. Approaching the speech understanding problem required a strong effort both in knowledge acquisition and in software development. For instance, methodologies for dealing with phonetic transcriptions of lexical items had to be developed from scratch. More important was the lack of any practical feeling for the problem. Nobody knew (and very few in the world did at that time) what performance could realistically be achieved using a 1000-word vocabulary with a system based on subword unit modeling, and hence which integrated strategy should be planned to attain a reasonably good understanding of the spoken sentences. The choice of a two-module system with a one-way interaction seemed the most appropriate for starting to acquire the proper knowledge of the problem. Besides, as the people working on the two modules belonged to different groups and used different techniques as well as different programming languages (stochastic modeling and FORTRAN for the recognition group, knowledge-based parsing and LISP for the understanding group), the best solution appeared to be one in which the development of the two modules did not suffer from unavoidable mutual time dependencies.
