Masato Tokuhisa | Researchain

Publication

Featured research published by Masato Tokuhisa.

soft computing | 2015

Order estimation of Japanese paragraphs by supervised machine learning and various textual features

Masaki Murata; Satoshi Ito; Masato Tokuhisa; Qing Ma

In this paper, we propose a method to estimate the order of paragraphs by supervised machine learning. We use a support vector machine (SVM) for supervised machine learning. The estimation of paragraph order is useful for sentence generation and sentence correction. The proposed method obtained a high accuracy (0.84) in the order estimation experiments of the first two paragraphs of an article. In addition, it obtained a higher accuracy than the baseline method in the experiments using two paragraphs of an article. We performed feature analysis and we found that adnominals, conjunctions, and dates were effective for the order estimation of the first two paragraphs, and the ratio of new words and the similarity between the preceding paragraphs and an estimated paragraph were effective for the order estimation of all pairs of paragraphs.

Frontiers in Physiology | 2016

Genetic Network Inference Using Hierarchical Structure

Shuhei Kimura; Masato Tokuhisa; Mariko Okada-Hatakeyama

Many methods for inferring genetic networks have been proposed, but the regulations they infer often include false-positives. Several researchers have attempted to reduce these erroneous regulations by proposing the use of a priori knowledge about the properties of genetic networks such as their sparseness, scale-free structure, and so on. This study focuses on another piece of a priori knowledge, namely, that biochemical networks exhibit hierarchical structures. Based on this idea, we propose an inference approach that uses the hierarchical structure in a target genetic network. To obtain a reasonable hierarchical structure, the first step of the proposed approach is to infer multiple genetic networks from the observed gene expression data. We take this step using an existing method that combines a genetic network inference method with a bootstrap method. The next step is to extract a hierarchical structure from the inferred networks that is consistent with most of the networks. Third, we use the hierarchical structure obtained to assign confidence values to all candidate regulations. Numerical experiments are also performed to demonstrate the effectiveness of using the hierarchical structure in the genetic network inference. The improvement accomplished by the use of the hierarchical structure is small. However, the hierarchical structure could be used to improve the performances of many existing inference methods.
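
The bootstrap step described above can be illustrated with a minimal sketch. It assumes each bootstrap run yields a set of directed regulations and scores a regulation by the fraction of runs in which it appears. This is a generic bootstrap aggregation, not the hierarchical-structure extraction or the confidence assignment of the paper, neither of which the abstract specifies; the function name `edge_frequencies` and the gene labels are hypothetical.

```python
from collections import Counter


def edge_frequencies(networks):
    """Return, for each directed edge, the fraction of networks containing it.

    `networks` is a list of edge sets, one per bootstrap run (an assumption
    made for this sketch; the paper's data structures are not given).
    """
    counts = Counter()
    for edges in networks:
        # set() guards against an edge being listed twice in one run
        counts.update(set(edges))
    return {edge: count / len(networks) for edge, count in counts.items()}


# Four hypothetical bootstrap networks over genes g1..g3
networks = [
    {("g1", "g2"), ("g2", "g3")},
    {("g1", "g2"), ("g3", "g1")},
    {("g1", "g2"), ("g2", "g3")},
    {("g1", "g2")},
]
freq = edge_frequencies(networks)
```

A regulation found in every run, such as g1 to g2 here, receives the highest frequency; the paper then goes further by using a hierarchical structure that is consistent with most of the inferred networks.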

international conference on the computer processing of oriental languages | 2006

Pattern dictionary development based on non-compositional language model for Japanese compound and complex sentences

Satoru Ikehara; Masato Tokuhisa; Jin’ichi Murakami; Masashi Saraki; Masahiro Miyazaki; Naoshi Ikeda

A large-scale sentence pattern dictionary (SP-dictionary) for Japanese compound and complex sentences has been developed. The dictionary has been compiled based on the non-compositional language model. Sentences with 2 or 3 predicates are extracted from a Japanese-to-English parallel corpus of 1 million sentences, and the compositional constituents contained within them are generalized to produce an SP-dictionary containing a total of 215,000 pattern pairs. In evaluation tests, the SP-dictionary achieved a syntactic coverage of 92% and a semantic coverage of 70%.

international conference on knowledge based and intelligent information and engineering systems | 2006

Construction and evaluation of text-dialog corpus with emotion tags focusing on facial expression in comics

Masato Tokuhisa; Jin’ichi Murakami; Satoru Ikehara

Large-scale text-dialog corpora with emotion tags are required to generate a knowledge base for emotional reasoning from text. Annotating emotion tags is known to suffer from problems with instability. These are caused by the lack of non-linguistic expressions (e.g. speech and facial expressions) in the text dialog. We aimed to construct a stable, usable text-dialog corpus with emotion tags. We first focused on facial expression in comics. Some comics contain many text dialogs that are similar to everyday conversation, and it is worth analyzing their text. We therefore extracted 29,538 sentences from 10 comic books and annotated face tags and emotion tags. Two annotators independently placed “temporary face/emotion tags” on stories and then decided what the “correct face/emotion tags” were by discussing them with each other. They acquired 16,635 correct emotion tags as a result. We evaluated the stability and usability of the corpus. We evaluated the correspondence between temporary and correct tags to assess stability, and found precision was 83.8% and recall was 78.8%. These were higher than for annotation without facial expressions (precision = 56.2%, recall = 51.5%). We extracted emotional suffix expressions from the corpus using a probabilistic method to evaluate usability. We could thus construct a text-dialog corpus with emotion tags and confirm its stability and usability.
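
The stability evaluation above compares temporary tags with correct tags. A minimal sketch of such a comparison follows, assuming that precision is the share of temporary tags that also appear as correct tags and recall is the share of correct tags that were already present as temporary tags. The abstract does not give the exact definitions, so both are assumptions, and the function name and sample tags are hypothetical.

```python
from collections import Counter


def precision_recall(temporary, correct):
    """Compare two lists of (sentence_id, tag) pairs as multisets."""
    temp = Counter(temporary)
    gold = Counter(correct)
    # Multiset intersection counts the tags present in both annotations
    matched = sum((temp & gold).values())
    precision = matched / sum(temp.values()) if temp else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    return precision, recall


# Hypothetical annotations: two of the temporary tags survive the discussion
temporary = [(1, "joy"), (2, "anger"), (3, "sadness"), (4, "joy")]
correct = [(1, "joy"), (2, "anger"), (3, "fear"), (5, "joy"), (6, "surprise")]
p, r = precision_recall(temporary, correct)
```

With these toy inputs, 2 of 4 temporary tags match (precision 0.5) and 2 of 5 correct tags are covered (recall 0.4).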

International Conference of the Pacific Association for Computational Linguistics | 2015

Machine Translation Method Based on Non-compositional Semantics (Word-Level Sentence-Pattern-Based MT)

Jun Sakata; Jin’ichi Murakami; Masato Tokuhisa; Masaki Murata

To overcome the limitations of conventional machine translation methods, Ikehara et al. proposed a machine translation scheme based on non-compositional semantics. This machine translation scheme requires many sentence patterns which can preserve the semantics of the expression structure. To use this machine translation scheme for Japanese-English machine translation, a compound and complex sentence pattern dictionary, called “ToribankSPD”, has been developed. This dictionary has three levels of sentence patterns: “word-level”, “phrase-level”, and “clause-level”. In this paper, according to the machine translation scheme based on non-compositional semantics, we implemented the Japanese-English sentence-pattern-based machine translation method using the word-level sentence patterns of ToribankSPD. In our experiments, the pattern matching rate was low (about 10%). However, 72 out of 100 evaluated sentences used the sentence patterns that had an appropriate expression structure, and the translation accuracy of 55 sentences was high.

soft computing | 2014

Construction of concept network from large numbers of texts for information examination using TF-IDF and deletion of unrelated words

Yuta Doen; Masaki Murata; Ryuta Otake; Masato Tokuhisa; Qing Ma

We propose new methods to construct a network that describes information about the relations of things that are related to a certain keyword from electronic texts. The proposed method has two characteristics (TF-IDF and deletion of unrelated words). We extract related words using a term frequency-inverse document frequency (TF-IDF)-based method. Using TF-IDF, we extract only important words. We use TF-IDF as a weight for an edge in a network. We also delete unrelated words in the network. When expanding a network and adding words, unrelated words are likely to be added. The proposed system deletes such unrelated words using two methods, the topic-restricted and topic-related methods. We have experimentally confirmed that the proposed TF-IDF-based related word extraction method obtains better results than a method that uses conditional probabilities to extract related words. We also conducted experiments to verify the effectiveness of deleting unrelated words. We found that the topic-restricted method could delete most unrelated words and maintain approximately 0.8 of the related words from the original network. The topic-related method can delete some unrelated words and maintain most related words from the original network.
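
The idea of using TF-IDF both to select related words and to weight network edges can be sketched as follows. This is a generic TF-IDF computation over pre-tokenized documents, written under stated assumptions: term frequency is counted only in documents that mention the keyword, and IDF is the log of total documents over document frequency. The paper's exact formulation, and its topic-restricted and topic-related deletion methods, are not described in the abstract and are not reproduced here. The function name `tfidf_edges` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_edges(keyword, documents, top_k=3):
    """Return (keyword, word, weight) edges for the top_k TF-IDF words."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each word
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    # Term frequency, counted only in documents that mention the keyword
    tf = Counter()
    for doc in documents:
        if keyword in doc:
            tf.update(word for word in doc if word != keyword)
    scores = {w: tf[w] * math.log(n_docs / doc_freq[w]) for w in tf}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    # Words that occur in every document score 0 and are dropped
    return [(keyword, w, s) for w, s in ranked[:top_k] if s > 0]


documents = [
    ["earthquake", "tsunami", "the", "coast"],
    ["earthquake", "magnitude", "the", "tsunami"],
    ["election", "the", "vote"],
    ["the", "weather", "coast"],
    ["the", "election", "result"],
]
edges = tfidf_edges("earthquake", documents)
related = [word for _, word, _ in edges]
```

In this toy corpus the ubiquitous word "the" gets a weight of zero and is excluded, which mirrors the claim that TF-IDF keeps only important words.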

soft computing | 2014

Order estimation of Japanese paragraphs by supervised machine learning

Satoshi Ito; Masaki Murata; Masato Tokuhisa; Qing Ma

In this paper, we propose a method to estimate the order of paragraphs by supervised machine learning. We use a support vector machine (SVM) for supervised machine learning. The estimation of paragraph order is useful for sentence generation and sentence correction. The proposed method obtained a high accuracy (0.86) in the order estimation experiments of the first two paragraphs of an article and achieved the same accuracy as manual estimation. In addition, it obtained a higher accuracy than the baseline methods in the experiments using two paragraphs of an article. We performed feature analysis and we found that adnominals, conjunctions, and dates were effective for the order estimation of the first two paragraphs, and the ratio of new words and the similarity between the preceding paragraphs and an estimated paragraph were effective for the order estimation of all pairs of paragraphs. Moreover, we compared the order estimation of sentences and paragraphs and clarified differences. For the order estimation of the first two paragraphs, paragraph order estimation would be easier than sentence order estimation because paragraphs have more information than sentences. For the order estimation of all pairs of paragraphs, paragraph order estimation would be more difficult than sentence order estimation because a story may conclude in a paragraph.
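
One of the features named above, the ratio of new words, can be sketched in a few lines. The definition used here is an assumption: the share of tokens in the candidate paragraph that do not occur in any preceding paragraph. The abstract does not define the feature precisely. Whitespace splitting is only a stand-in, since Japanese text would need a morphological analyzer for tokenization, and the function name is hypothetical.

```python
def new_word_ratio(preceding_paragraphs, candidate):
    """Share of tokens in `candidate` unseen in the preceding paragraphs."""
    seen = set()
    for paragraph in preceding_paragraphs:
        seen.update(paragraph.lower().split())
    words = candidate.lower().split()
    if not words:
        return 0.0
    new_words = sum(1 for word in words if word not in seen)
    return new_words / len(words)


# "dog" and "down" are new, "the" and "sat" were already seen
ratio = new_word_ratio(["the cat sat"], "the dog sat down")
```

A value like this would be one entry in the feature vector passed to the SVM, alongside the other features the abstract lists.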

international conference natural language processing | 2011

Automatic extraction of historical transition in researchers and research topics

Sanako Hori; Masaki Murata; Masato Tokuhisa; Qing Ma

It is necessary for a researcher to know historical transition in researchers and research topics. Although Web search engines can be used for obtaining such information, collecting the information across a long time period is difficult and laborious. Thus, we proposed a method for automatically extracting historical transition in researchers and research topics by using co-occurrence information. We used an original method in which a concept that co-occurred more often with a certain concept X near the time when the concept X was generated was more likely to be the root of the concept X. We compared our method with the previous method proposed by Kawanaka et al., where transition information on concepts was automatically extracted by analyzing tags that describe concepts in social bookmarks, and we confirmed that our method outperformed their method. The accuracies of the extracted transition information in researchers and research topics in our method were 0.66 and 0.65 respectively, whereas the corresponding accuracies in Kawanaka et al.'s method were 0.25 and 0.61.

international conference on computational linguistics | 2008

Non-Compositional Language Model and Pattern Dictionary Development for Japanese Compound and Complex Sentences

Satoru Ikehara; Masato Tokuhisa; Jin’ichi Murakami

To realize high quality machine translation, we proposed a Non-Compositional Language Model, and developed a sentence pattern dictionary of 226,800 pattern pairs for Japanese compound and complex sentences consisting of 2 or 3 clauses. In pattern generation from a parallel corpus, Compositional Constituents that could be generalized were 74% of independent words, 24% of phrases and only 15% of clauses. This means that in Japanese-to-English MT, most of the translation results as shown in the parallel corpus could not be obtained by methods based on Compositional Semantics. This dictionary achieved a syntactic coverage of 98% and a semantic coverage of 78%. It will substantially improve translation quality.

computational intelligence in bioinformatics and computational biology | 2017

Inference of genetic networks from time-series of gene expression levels using random forests

Shuhei Kimura; Masato Tokuhisa; Mariko Okada-Hatakeyama

Huynh-Thu and colleagues first introduced the random forest into the field of genetic network inference. Their method, GENIE3, has performed well on genetic network inference problems. However, GENIE3 was designed only for analyzing static expression data that were measured under steady-state conditions. In order to infer genetic networks from time-series of gene expression data, this study proposes a new method based on the random forest. The proposed method can analyze both static and time-series data. When inferring a genetic network only from steady-state gene expression data, however, the proposed method is equivalent to GENIE3. Therefore, the proposed method can be seen as an extension of GENIE3. Through numerical experiments, we showed that the proposed method outperformed the existing inference methods on all 5 of the artificial genetic network inference problems.
