Publication


Featured research published by Nobuyasu Itoh.


Pattern Recognition | 1990

A spelling correction method and its application to an OCR system

Hiroyasu Takahashi; Nobuyasu Itoh; Tomio Amano; Akio Yamashita

This paper describes a method of spelling correction consisting of two steps: selection of candidate words, and approximate string matching between the input word and each candidate word. Each word is classified and multi-indexed according to combinations of a constant number of characters in the word. Candidate words are selected fast and accurately, regardless of error types, as long as the number of errors is below a threshold. We applied this method to the post-processing of a printed alphanumeric OCR on a personal computer, thus making our OCR more reliable and user-friendly.
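A minimal sketch of the two-step idea, assuming that the index keys are combinations of k distinct characters of each word and that candidates are ranked by Levenshtein distance; the paper's actual key scheme and matching algorithm may differ. As long as enough correct characters survive the OCR errors, the misrecognized word still shares at least one key with its correct form.

```python
from itertools import combinations

def index_keys(word, k=3):
    # Combinations of k distinct characters serve as index keys here;
    # the paper's exact key scheme is not specified, so this choice is illustrative.
    chars = sorted(set(word.lower()))
    if len(chars) <= k:
        return {"".join(chars)}
    return {"".join(c) for c in combinations(chars, k)}

def build_index(lexicon, k=3):
    index = {}
    for w in lexicon:
        for key in index_keys(w, k):
            index.setdefault(key, set()).add(w)
    return index

def edit_distance(a, b):
    # Standard Levenshtein distance over characters.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def correct(word, index, k=3, max_errors=2):
    # Step 1: fast candidate selection via shared index keys.
    candidates = set()
    for key in index_keys(word, k):
        candidates |= index.get(key, set())
    # Step 2: approximate string matching against each candidate.
    scored = sorted((edit_distance(word.lower(), c.lower()), c) for c in candidates)
    return [c for d, c in scored if d <= max_errors]
```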


Journal of the Acoustical Society of America | 2005

Symbol insertion apparatus and method

Masafumi Nishimura; Nobuyasu Itoh; Shinsuke Mori

An apparatus and method are provided for the insertion of punctuation marks into appropriate positions in a sentence. An acoustic processor processes input utterances to extract voice data and transforms the data into a feature vector. When automatic insertion of punctuation marks is not performed, a language decoder processes the feature vector using only a general-purpose language model, and inserts a comma at a location marked in the voice data by the entry “ten,” for example, which is clearly a location at which a comma should be inserted. When automatic punctuation insertion is performed, the language decoder employs both the general-purpose language model and the punctuation mark language model to identify an unvoiced pause location for the insertion of a punctuation mark, such as a comma.
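A rough sketch of the decision at a pause: the two language models are interpolated and a comma is inserted when the combined score favors it. The `general_lm` and `punct_lm` callables, the interpolation weight, and whole-sequence scoring are assumptions for illustration, not the patent's actual decoder.

```python
def insert_commas(words, pause_after, general_lm, punct_lm, weight=0.5):
    # general_lm and punct_lm are assumed to return a log-probability for a
    # word sequence; pause_after is the set of word indices followed by an
    # unvoiced pause detected by the acoustic front end.
    def score(seq):
        return (1 - weight) * general_lm(seq) + weight * punct_lm(seq)

    out = []
    for i, w in enumerate(words):
        out.append(w)
        if i in pause_after:
            rest = words[i + 1:]
            if score(out + [","] + rest) > score(out + rest):
                out.append(",")
    return out
```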


IEEE Computer | 1992

DRS: a workstation-based document recognition system for text entry

Tomio Amano; Akio Yamashita; Nobuyasu Itoh; Yoshinao Kobayashi; Shin Katoh; Kazuharu Toyokawa; Hiroyasu Takahashi

Document recognition system (DRS), a workstation-based prototype document analysis system that uses optical character recognition (OCR), is described. The system provides functions for image capture, block segmentation, page structure analysis, and character recognition with contextual postprocessing, as well as a user interface for error correction. All the functions except image capture and character recognition have been implemented by means of software for the Japanese edition of OS/2.


International Conference on Acoustics, Speech, and Signal Processing | 2009

Acoustically discriminative training for language models

Gakuto Kurata; Nobuyasu Itoh; Masafumi Nishimura

This paper introduces discriminative training for language models (LMs) by leveraging phoneme similarities estimated from an acoustic model. To train an LM discriminatively, we need the correct word sequences and the recognized results that Automatic Speech Recognition (ASR) produces by processing the utterances of those correct word sequences. However, sufficient utterances are not always available. We propose to generate the probable N-best lists, which the ASR may produce, directly from the correct word sequences by leveraging the phoneme similarities. We call this process the “Pseudo-ASR”. We train the LM discriminatively by comparing the correct word sequences with the corresponding N-best lists from the Pseudo-ASR. Experiments with real-life data from a Japanese call center showed that the LM trained with the proposed method improved the accuracy of the ASR.
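A simplified sketch of the Pseudo-ASR step, assuming word-level confusion scores have already been derived from the acoustic model's phoneme similarities (a `confusable` table mapping each word to likely misrecognitions with log-probability-style scores); the paper works at the phoneme level, so this is only an approximation. The resulting synthetic N-best lists then stand in for real ASR output in the discriminative training step.

```python
import heapq

def pseudo_asr_nbest(ref_words, confusable, n=10):
    # Beam over (total log confusion score, partial hypothesis); the reference
    # word itself is kept with score 0.0 so the correct sequence stays in the list.
    beam = [(0.0, [])]
    for w in ref_words:
        options = [(w, 0.0)] + confusable.get(w, [])
        expanded = [(score + s, prefix + [alt])
                    for score, prefix in beam
                    for alt, s in options]
        beam = heapq.nlargest(n, expanded, key=lambda x: x[0])
    # Already ordered best-first by heapq.nlargest.
    return beam

# Example with a tiny hypothetical confusion table:
# pseudo_asr_nbest(["write", "nine"], {"write": [("right", -0.7)]}, n=3)
```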


International Conference on Acoustics, Speech, and Signal Processing | 2007

Unsupervised Lexicon Acquisition from Speech and Text

Gakuto Kurata; Shinsuke Mori; Nobuyasu Itoh; Masafumi Nishimura

When introducing a large vocabulary continuous speech recognition (LVCSR) system into a specific domain, it is preferable to add the necessary domain-specific words and their correct pronunciations selectively to the lexicon, especially in areas where the LVCSR system must be updated frequently with new words. In this paper, we propose an unsupervised method of word acquisition for Japanese, in which no spaces exist between words. In our method, by taking advantage of speech from the target domain, we select the domain-specific words from among an enormous number of word candidates extracted from the raw corpora. The experiments showed that the acquired lexicon was of good quality and that it contributed to the performance of the LVCSR system for the target domain.
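One way to picture the selection step, assuming word candidates are frequent substrings of the unsegmented raw text and that the target-domain speech has been decoded into phoneme strings; both the frequency-based candidate extraction and the `pronounce` lookup are placeholders for the paper's actual statistics, not its method.

```python
from collections import Counter

def candidate_words(raw_lines, max_len=6, min_count=5):
    # Enumerate frequent substrings of the unsegmented Japanese text as
    # word candidates (a simplification of the actual extraction step).
    counts = Counter()
    for line in raw_lines:
        for i in range(len(line)):
            for j in range(i + 1, min(i + max_len, len(line)) + 1):
                counts[line[i:j]] += 1
    return {w for w, c in counts.items() if c >= min_count}

def select_domain_words(candidates, phoneme_strings, pronounce):
    # Keep candidates whose assumed pronunciation actually occurs in the
    # phoneme sequences recognized from the target-domain speech.
    return {w for w in candidates
            if any(pronounce(w) in p for p in phoneme_strings)}
```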


International Conference on Acoustics, Speech, and Signal Processing | 2012

N-best entropy based data selection for acoustic modeling

Nobuyasu Itoh; Tara N. Sainath; Dan Ning Jiang; Jie Zhou; Bhuvana Ramabhadran

This paper presents a strategy for efficiently selecting informative data from large corpora of untranscribed speech. Confidence-based selection methods (i.e., selecting the utterances we are least confident about) have been a popular approach, but they consider only the top hypothesis when selecting utterances and tend to select outliers, and therefore do not always improve overall recognition accuracy. Alternatively, we propose a method for selecting data by looking at competing hypotheses, computing the entropy of the N-best hypotheses decoded by the baseline acoustic model. In addition, we address the issue of outliers by calculating how representative a specific utterance is of all other unselected utterances via a tf-idf score. Experiments show that N-best entropy based selection (a 5.8% relative improvement on a 400-hour corpus) outperformed the conventional confidence-based and lattice-entropy-based selection strategies, and that tf-idf based representativeness improved the model further (6.2% relative). A comparison with random selection is also presented. Finally, the impact of model size is discussed.
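A small sketch of the selection criterion, assuming each utterance carries the acoustic-model scores of its N-best hypotheses and a precomputed tf-idf representativeness value; combining the two by multiplication is an illustrative choice rather than the paper's exact formula.

```python
import math

def nbest_entropy(scores):
    # Normalize N-best scores (assumed to be in the log domain) into a
    # posterior and compute its entropy; high entropy = the recognizer is unsure.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps)

def select_utterances(utterances, budget):
    # utterances: list of (nbest_scores, representativeness) pairs; keep the
    # most uncertain yet representative utterances for transcription.
    ranked = sorted(utterances,
                    key=lambda u: nbest_entropy(u[0]) * u[1],
                    reverse=True)
    return ranked[:budget]
```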


Speech Communication | 2012

Acoustically discriminative language model training with pseudo-hypothesis

Gakuto Kurata; Abhinav Sethy; Bhuvana Ramabhadran; Ariya Rastrow; Nobuyasu Itoh; Masafumi Nishimura

Recently proposed methods for discriminative language modeling require alternate hypotheses in the form of lattices or N-best lists. These are usually generated by an Automatic Speech Recognition (ASR) system on the same speech data used to train the system. This requirement restricts the scope of these methods to corpora where both the acoustic material and the corresponding true transcripts are available. Typically, the text data available for language model (LM) training is an order of magnitude larger than manually transcribed speech. This paper provides a general framework to take advantage of this volume of textual data in the discriminative training of language models. We propose to generate probable N-best lists directly from the text material, which resemble the N-best lists produced by an ASR system by incorporating phonetic confusability estimated from the acoustic model of the ASR system. We present experiments with Japanese spontaneous lecture speech data, which demonstrate that discriminative LM training with the proposed framework is effective and provides modest gains in ASR accuracy.
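A minimal sketch of discriminative LM training against the generated N-best lists, using a structured-perceptron style update over bigram features; this perceptron recipe is a common choice for discriminative language modeling and is not necessarily the authors' exact training criterion.

```python
from collections import Counter

def bigrams(words):
    return Counter(zip(words, words[1:]))

def perceptron_update(weights, reference, pseudo_nbest, lr=1.0):
    # Score each pseudo hypothesis with the current n-gram weights and
    # compare the best-scoring one against the reference transcript.
    def score(words):
        return sum(weights.get(g, 0.0) * c for g, c in bigrams(words).items())

    best_hyp = max(pseudo_nbest, key=score)
    if best_hyp == reference:
        return weights
    # Reward reference n-grams, penalize those of the competing hypothesis.
    for g, c in bigrams(reference).items():
        weights[g] = weights.get(g, 0.0) + lr * c
    for g, c in bigrams(best_hyp).items():
        weights[g] = weights.get(g, 0.0) - lr * c
    return weights
```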


International Conference on Acoustics, Speech, and Signal Processing | 2011

Training of error-corrective model for ASR without using audio data

Gakuto Kurata; Nobuyasu Itoh; Masafumi Nishimura

This paper introduces a method to train an error-corrective model for Automatic Speech Recognition (ASR) without using audio data. In existing techniques, it is assumed that sufficient audio data of the target application is available and negative samples can be prepared by having ASR recognize this audio data. However, this assumption is not always true. We propose generating probable N-best lists, which the ASR may produce, directly from the text data of the target application by taking phoneme similarity into consideration. We call this process “Pseudo-ASR”. We conduct discriminative reranking with the error-corrective model by regarding the text data as positive samples and the N-best lists from the Pseudo-ASR as negative samples. Experiments with Japanese call center data showed that discriminative reranking based on the Pseudo-ASR improved the accuracy of the ASR.
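At decoding time the trained error-corrective model reranks the real ASR N-best list; a sketch is below, where `weights` is an n-gram weight table such as one learned from the text (positive) and Pseudo-ASR (negative) samples, and `alpha` is an assumed interpolation factor rather than a value from the paper.

```python
from collections import Counter

def rerank(nbest, weights, alpha=1.0):
    # nbest: list of (hypothesis_words, asr_score) pairs from the real recognizer.
    def corrective(words):
        grams = Counter(zip(words, words[1:]))
        return sum(weights.get(g, 0.0) * c for g, c in grams.items())

    return max(nbest, key=lambda h: h[1] + alpha * corrective(h[0]))[0]
```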


International Conference on Computational Linguistics | 2000

A stochastic parser based on a structural word prediction model

Shinsuke Mori; Masafumi Nishimura; Nobuyasu Itoh; Shiho Ogino; Hideo Watanabe

In this paper, we present a stochastic language model using dependency. This model considers a sentence as a word sequence and predicts each word from left to right. The history at each step of prediction is a sequence of partial parse trees covering the preceding words. The model first predicts which of the partial parse trees have a dependency relation with the next word, and then predicts the next word from only those trees. Our model is a generative stochastic model, so it can be used not only as a parser but also as the language model of a speech recognizer. In our experiment, we prepared about 1,000 syntactically annotated Japanese sentences extracted from a financial newspaper and estimated the parameters of our model. We built a parser based on our model and tested it on approximately 100 sentences of the same newspaper. The accuracy of the dependency relations was 89.9%, the highest accuracy level obtained by Japanese stochastic parsers.
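A compact way to read the two-step prediction, assuming the gold dependency structure is available for scoring and reducing each partial parse tree to its head word; `p_select` and `p_word` stand in for the model's two conditional distributions and are placeholders, not the paper's parameterization.

```python
import math

def sentence_log_prob(words, attachments, p_select, p_word):
    # attachments[i]: indices of the current partial parse trees that become
    # dependents of words[i]; trees are represented only by their head words.
    trees, logp = [], 0.0
    for w, deps in zip(words, attachments):
        # Step 1: which partial trees have a dependency relation with the next word.
        logp += math.log(p_select(deps, trees))
        # Step 2: the next word, conditioned only on the selected trees.
        logp += math.log(p_word(w, [trees[i] for i in deps]))
        # The selected trees are attached under w, which heads a new partial tree.
        trees = [t for i, t in enumerate(trees) if i not in deps] + [w]
    return logp
```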


IBM Journal of Research and Development | 1996

A document recognition system and its applications

Akio Yamashita; Tomio Amano; Yuki Hirayama; Nobuyasu Itoh; Shin Katoh; Takashi Mano; Kazuharu Toyokawa

This paper describes a document entry system called the Document Recognition System (DRS), which facilitates the conversion of printed documents into electronic form. DRS was developed on a personal computer (PC) with an adapter card for recognizing more than 3000 Kanji characters. It provides a flexible framework for object-oriented management of data and processing modules. The framework allows the user to change the combination of processing modules and to select pipelining (parallel processing) or sequential processing. DRS includes processing modules for layout analysis functions such as blob detection, block segmentation, and model matching, and for character recognition functions such as Kanji character recognition, Japanese postprocessing, postprocessing by a user, and error correction through a user interface. The character recognition functions on the card and the other processing-related recognition functions on the PC work cooperatively in the proposed framework. Within the basic framework, we have customized DRS for practical applications. Examples of successful applications (entry into a text database, creation of an electronic catalog, entry of family registration data, and entry of tag data in a manufacturing process) provide evidence of the processing accuracy and robustness of the framework.
