Noriko Tomuro
DePaul University
Publication
Featured research published by Noriko Tomuro.
Computer Assisted Radiology and Surgery | 2012
Yu Zhang; Noriko Tomuro; Jacob D. Furst; Daniela Stan Raicu
Purpose: Classification of a suspicious mass (region of interest, ROI) in a mammogram as malignant or benign may be achieved using mass shape features. An ensemble system was built for this purpose and tested. Methods: Multiple contours were generated from a single ROI using various parameter settings of the image enhancement functions for the segmentation. For each segmented contour, the mass shape features were computed. For classification, the dataset was partitioned into four subsets based on patient age (young/old) and ROI size (large/small). We built an ensemble learning system consisting of four single classifiers, where each classifier is a specialist trained specifically for one of the subsets. Each specialist is also the best-performing classifier for its subset, selected from several candidate classifiers in preliminary experiments. In this scheme, the final diagnosis (malignant or benign) of an instance is the classification produced by the classifier trained for the subset to which the instance belongs. Results: The Digital Database for Screening Mammography (DDSM) from the University of South Florida was used to test the ensemble system for classification of masses, which achieved 72% overall accuracy. This ensemble of specialist classifiers performed better than classification with a single classifier (56%). Conclusion: An ensemble classifier for mammography-detected masses may provide superior performance to any single classifier in distinguishing benign from malignant cases.
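The routing scheme described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cut-off values `AGE_CUTOFF` and `SIZE_CUTOFF`, the feature names, and the classifier interface are all assumptions, since the abstract does not state them.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cut-offs; the abstract does not give the actual thresholds.
AGE_CUTOFF = 50       # patient age in years (assumed)
SIZE_CUTOFF = 1000    # ROI size in pixels (assumed)

Features = Dict[str, float]
Classifier = Callable[[Features], str]
SubsetKey = Tuple[str, str]


def subset_key(age: float, roi_size: float) -> SubsetKey:
    """Map an instance to one of the four (age, size) subsets."""
    age_group = "old" if age >= AGE_CUTOFF else "young"
    size_group = "large" if roi_size >= SIZE_CUTOFF else "small"
    return (age_group, size_group)


def classify(instance: Features,
             specialists: Dict[SubsetKey, Classifier]) -> str:
    """Return the label produced by the specialist for the instance's subset."""
    key = subset_key(instance["age"], instance["roi_size"])
    return specialists[key](instance)
```

In use, `specialists` would hold one trained model per subset, for example `specialists[("old", "large")]`; only that one model votes on an instance, so this is a selection scheme rather than a majority vote.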
North American Chapter of the Association for Computational Linguistics | 2001
Noriko Tomuro
This paper describes a lexicon organized around systematic polysemy: a set of word senses that are related in systematic and predictable ways. The lexicon is derived by a fully automatic extraction method which utilizes a clustering technique called tree-cut. We compare our lexicon to WordNet cousins, and the inter-annotator disagreement observed between WordNet Semcor and DSO corpora.
Proceedings of SPIE | 2010
Yu Zhang; Noriko Tomuro; Jacob D. Furst; Daniela Stan Raicu
This paper presents a novel, edge-based segmentation method for identifying the mass contour (boundary) of a suspicious mass region (region of interest, ROI) in a mammogram. The method first applies a contrast stretching function to adjust the image contrast, then uses a filtering function to reduce image noise. Next, for each pixel in an ROI, the energy descriptor (one of the Haralick descriptors) is computed from the co-occurrence matrix of the pixel, yielding the energy texture image of the ROI. Edges are then detected in the energy texture image, and the mass region is identified from the closed-path edges. Finally, the boundary of the identified mass region is used as the contour of the segmented mass. We applied our method to ROI-marked mammogram images from the Digital Database for Screening Mammography (DDSM). Preliminary results show that the contours detected by our method outline the shape and boundary of a mass much more closely than the ROI markings made by radiologists.
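As a rough illustration of the energy descriptor, the sketch below computes the Haralick energy (angular second moment, the sum of squared co-occurrence probabilities) for one small window of gray levels. The function name, the non-symmetric co-occurrence matrix, and the default offset of one pixel to the right are assumptions; the abstract does not specify the window size, offsets, or gray-level quantization used. Note that some libraries define "energy" as the square root of this value.

```python
from collections import Counter
from typing import Sequence, Tuple


def glcm_energy(patch: Sequence[Sequence[int]],
                offset: Tuple[int, int] = (0, 1)) -> float:
    """Angular second moment of the co-occurrence matrix of `patch`.

    `patch` is a 2-D window of integer gray levels; `offset` is the
    (row, column) displacement between the two pixels of each pair.
    """
    d_row, d_col = offset
    rows, cols = len(patch), len(patch[0])
    counts: Counter = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count the ordered gray-level pair (reference, neighbor).
                counts[(patch[r][c], patch[r2][c2])] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Normalize counts to probabilities and sum their squares.
    return sum((n / total) ** 2 for n in counts.values())
```

Evaluating this function on a window centered at each pixel would give one energy value per pixel, which is one way to obtain an energy texture image. Uniform windows score close to 1, while windows with many distinct gray-level pairs score close to 0.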
International Conference on Semantic Computing | 2007
Noriko Tomuro; Steven L. Lytinen; Kyoko Kanzaki; I. Hitoshi
This paper presents a new clustering algorithm called DSCBC, which is designed to automatically discover word senses for polysemous words. DSCBC is an extension of CBC clustering (P. Pantel and D. Lin, 2002) and incorporates feature domain similarity: the similarity between the features themselves, obtained a priori from sources external to the dataset at hand. When polysemous words are clustered, words that have similar sense patterns are often grouped together, producing polysemous clusters: clusters in which features from several different domains are mixed. By incorporating feature domain similarity in clustering, DSCBC produces monosemous clusters, thereby discovering individual senses of polysemous words. In this work, we apply the algorithm to English adjectives and compare the discovered senses against WordNet. The results show significant improvements by our algorithm over other clustering algorithms, including CBC.
Simulation & Gaming | 2012
José Pablo Zagal; Noriko Tomuro; Andriy Shepitsen
Natural language processing (NLP) is a field of computer science and linguistics devoted to creating computer systems that use human (natural) language as input and/or output. The authors propose that NLP can also be used for game studies research. In this article, the authors provide an overview of NLP and describe some research possibilities that can be explored using NLP tools and techniques. The authors discuss these techniques by performing three different types of NLP analyses of a significant corpus of online videogame reviews: (a) by using techniques such as word and syllable counting, the authors analyze the readability of professionally written game reviews, finding that, across a variety of indicators, game reviews are written for a secondary education reading level; (b) the authors analyze hundreds of thousands of user-submitted game reviews using part-of-speech tagging, parsing, and clustering to examine how gameplay is described, and the findings in this area highlight the primary aesthetic elements of gameplay according to the general public of game players; and (c) the authors show how sentiment analysis, or the classification of opinions and feelings based on the words used in a text and the relationship between those words, can be used to explore the circumstances in which certain negatively charged words may be used positively, and for what reasons, in the domain of videogames. The authors conclude with ideas for future research, including how NLP can be used to complement other avenues of inquiry.
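To make the word and syllable counting in (a) concrete, the sketch below computes the Flesch-Kincaid grade level, one well-known readability indicator built from exactly those counts. The abstract does not name the indicators the authors used, so the choice of this formula is an assumption, and the vowel-group syllable counter is a crude heuristic rather than a dictionary-based count.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (heuristic; miscounts silent 'e')."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level from word, sentence, and syllable counts."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

The result approximates a US school grade, so values in roughly the 7 to 12 range would correspond to a secondary education reading level.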
International Conference on Computational Linguistics | 2002
Noriko Tomuro
Question terminology is a set of terms which appear in keywords, idioms and fixed expressions commonly observed in questions. This paper investigates ways to automatically extract question terminology from a corpus of questions and represent it for the purpose of classifying questions by type. Our key interest is to see whether semantic features can enhance a representation of the strongly lexical nature of question sentences. We compare two feature sets: one with lexical features only, and another with a mixture of lexical and semantic features. For evaluation, we measure the classification accuracy of two machine learning algorithms, C5.0 and PEBLS, using a procedure called domain cross-validation, which effectively measures the domain transferability of features.
Empirical Methods in Natural Language Processing | 2014
Emilia Apostolova; Noriko Tomuro
Information in visually rich formats such as PDF and HTML is often conveyed by a combination of textual and visual features. In particular, genres such as marketing flyers and infographics often augment textual information with its color, size, positioning, etc. As a result, traditional text-based approaches to information extraction (IE) could underperform. In this study, we present a supervised machine learning approach to IE from online commercial real estate flyers. We evaluated the performance of SVM classifiers on the task of identifying 12 types of named entities using a combination of textual and visual features. Results show that the addition of visual features such as color, size, and positioning significantly increased classifier performance.
Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources | 2009
Noriko Tomuro; Andriy Shepitsen
One of the difficulties in using Folksonomies in computational systems is tag ambiguity: tags with multiple meanings. This paper presents a novel method for building Folksonomy tag ontologies in which the nodes are disambiguated. Our method utilizes a clustering algorithm called DSCBC, which was originally developed in Natural Language Processing (NLP), to derive committees of tags, each of which corresponds to one meaning or domain. In this work, we use Wikipedia as the external knowledge source for the domains of the tags. Using the committees, an ambiguous tag is identified as one which belongs to more than one committee. Then we apply a hierarchical agglomerative clustering algorithm to build an ontology of tags. The nodes in the derived ontology are disambiguated in that an ambiguous tag appears in several nodes in the ontology, each of which corresponds to one meaning of the tag. We evaluate the derived ontology for its ontological density (how close similar tags are placed), and its usefulness in applications, in particular for a personalized tag retrieval task. The results showed marked improvements over other approaches.
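The ambiguity test described above (a tag is ambiguous when it belongs to more than one committee) can be sketched as follows. The committee names and the dictionary layout are hypothetical, and the sketch assumes the committees have already been derived; it does not implement DSCBC itself.

```python
from collections import defaultdict
from typing import Dict, Set


def ambiguous_tags(committees: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return each tag found in more than one committee, with those committees."""
    membership: Dict[str, Set[str]] = defaultdict(set)
    for committee, tags in committees.items():
        for tag in tags:
            membership[tag].add(committee)
    return {tag: names for tag, names in membership.items() if len(names) > 1}


# Hypothetical committees, each standing for one meaning or domain.
example = {
    "fruit": {"apple", "banana", "orange"},
    "technology": {"apple", "laptop", "phone"},
    "color": {"orange", "red"},
}
```

Here `ambiguous_tags(example)` maps `"apple"` to the fruit and technology committees and `"orange"` to the fruit and color committees. In the derived ontology, such a tag would appear once per meaning.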
MCBR-CDS'09 Proceedings of the First MICCAI international conference on Medical Content-Based Retrieval for Clinical Decision Support | 2009
Yu Zhang; Noriko Tomuro; Jacob D. Furst; Daniela Stan Raicu
This paper presents an ensemble learning approach for classifying masses in mammograms as malignant or benign by using Breast Imaging Reporting and Data System (BI-RADS) descriptors. We first identify the most important BI-RADS descriptors based on the information gain measure. Then we quantize the fine-grained categories of those descriptors into coarse-grained categories. Finally, we apply an ensemble of multiple machine learning classification algorithms to produce the final classification. Experimental results showed that using the coarse-grained categories achieved accuracy equivalent to using the full fine-grained categories, and that the ensemble learning method slightly improved the overall classification. Our results indicate that automatic clinical decision systems can be simplified by focusing on coarse-grained BI-RADS categories without losing accuracy in classifying masses in mammograms.
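The information gain measure used to rank descriptors can be sketched as below: the entropy of the class labels minus the expected entropy after splitting on one categorical descriptor. This is the standard textbook definition, shown as an illustration only; the function names and the example descriptor values are assumptions, not taken from the paper.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """Reduction in label entropy from splitting on a categorical descriptor."""
    if len(values) != len(labels) or not labels:
        raise ValueError("values and labels must be non-empty and equal in length")
    groups: defaultdict = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    # Expected entropy of the labels within each descriptor value.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical margin descriptor values and diagnoses for four masses.
margins: List[str] = ["spiculated", "spiculated", "circumscribed", "circumscribed"]
diagnoses: List[str] = ["malignant", "malignant", "benign", "benign"]
```

In this toy example the descriptor separates the two classes perfectly, so `information_gain(margins, diagnoses)` equals the full label entropy of 1 bit. Descriptors would be ranked by this score and the top ones kept.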
International Conference on NLP | 2012
Kevin Raison; Noriko Tomuro; Steven L. Lytinen; José Pablo Zagal
We present our preliminary work on extracting fine-grained user opinions from game review texts. In sentiment analysis, user-generated texts such as blogs, comments and reviews are usually represented by the words which appeared in the texts. However, for complex multi-faceted objects such as games, single words are not sufficient to represent opinions on individual aspects of the object. We propose to represent such an object by pairs of aspect and each aspect’s quality/value, for example “great-graphics”. We used a large adjective-context co-occurrence matrix extracted from user reviews posted at a game site, and applied co-clustering to reduce the dimensions of the matrix. The derived co-clusters are pairs of row clusters × column clusters. By examining the derived co-clusters, we were able to discover the aspects and their qualities which the users care about strongly in games.