Emilia Apostolova
DePaul University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Emilia Apostolova.
document recognition and retrieval | 2009
Daekeun You; Emilia Apostolova; Sameer K. Antani; Dina Demner-Fushman; George R. Thoma
Biomedical images are invaluable in medical education and establishing clinical diagnosis. Clinical decision support (CDS) can be improved by combining biomedical text with automatically annotated images extracted from relevant biomedical publications. In a previous study we reported 76.6% accuracy using supervised machine learning on the feasibility of automatically classifying images by combining figure captions and image content for usefulness in finding clinical evidence. Image content extraction is traditionally applied on entire images or on pre-determined image regions. Figure images articles vary greatly limiting benefit of whole image extraction beyond gross categorization for CDS due to the large variety. However, text annotations and pointers on them indicate regions of interest (ROI) that are then referenced in the caption or discussion in the article text. We have previously reported 72.02% accuracy in text and symbols localization but we failed to take advantage of the referenced image locality. In this work we combine article text analysis and figure image analysis for localizing pointer (arrows, symbols) to extract ROI pointed that can then be used to measure meaningful image content and associate it with the identified biomedical concepts for improved (text and image) content-based retrieval of biomedical articles. Biomedical concepts are identified using National Library of Medicines Unified Medical Language System (UMLS) Metathesaurus. Our methods report an average precision and recall of 92.3% and 75.3%, respectively on identifying pointing symbols in images from a randomly selected image subset made available through the ImageCLEF 2008 campaign.
international conference of the ieee engineering in medicine and biology society | 2009
Emilia Apostolova; David S. Channin; Dina Demner-Fushman; Jacob D. Furst; Steven L. Lytinen; Daniela Stan Raicu
Clinical narratives, such as radiology and pathology reports, are commonly available in electronic form. However, they are also commonly entered and stored as free text. Knowledge of the structure of clinical narratives is necessary for enhancing the productivity of healthcare departments and facilitating research. This study attempts to automatically segment medical reports into semantic sections. Our goal is to develop a robust and scalable medical report segmentation system requiring minimum user input for efficient retrieval and extraction of information from free-text clinical narratives. Hand-crafted rules were used to automatically identify a high-confidence training set. This automatically created training dataset was later used to develop metrics and an algorithm that determines the semantic structure of the medical reports. A word-vector cosine similarity metric combined with several heuristics was used to classify each report sentence into one of several pre-defined semantic sections. This baseline algorithm achieved 79% accuracy. A Support Vector Machine (SVM) classifier trained on additional formatting and contextual features was able to achieve 90% accuracy. Plans for future work include developing a configurable system that could accommodate various medical report formatting and content standards.
empirical methods in natural language processing | 2014
Emilia Apostolova; Noriko Tomuro
Information in visually rich formats such as PDF and HTML is often conveyed by a combination of textual and visual features. In particular, genres such as marketing flyers and info-graphics often augment textual information by its color, size, positioning, etc. As a result, traditional text-based approaches to information extraction (IE) could underperform. In this study, we present a supervised machine learning approach to IE from online commercial real estate flyers. We evaluated the performance of SVM classifiers on the task of identifying 12 types of named entities using a combination of textual and visual features. Results show that the addition of visual features such as color, size, and positioning significantly increased classifier performance.
north american chapter of the association for computational linguistics | 2009
Emilia Apostolova; Dina Demner-Fushman
Detailed image annotation necessary for reliable image retrieval involves not only annotating the image as a single artifact, but also annotating specific objects or regions within the image. Such detailed annotation is a costly endeavor and the available annotated image data are quite limited. This paper explores the feasibility of using image captions from scientific journals for the purpose of automatically annotating image regions. Salient image clues, such as an object location within the image or an object color, together with the associated explicit object mention, are extracted and classified using rule-based and SVM learners.
international conference on digital image processing | 2015
Payam Pourashraf; Noriko Tomuro; Emilia Apostolova
This paper presents an image classification model developed to classify images embedded in commercial real estate flyers. It is a component in a larger, multimodal system which uses texts as well as images in the flyers to automatically classify them by the property types. The role of the image classifier in the system is to provide the genres of the embedded images (map, schematic drawing, aerial photo, etc.), which to be combined with the texts in the flyer to do the overall classification. In this work, we used an ensemble learning approach and developed a model where the outputs of an ensemble of support vector machines (SVMs) are combined by a k-nearest neighbor (KNN) classifier. In this model, the classifiers in the ensemble are strong classifiers, each of which is trained to predict a given/assigned genre. Not only is our model intuitive by taking advantage of the mutual distinctness of the image genres, it is also scalable. We tested the model using over 3000 images extracted from online real estate flyers. The result showed that our model outperformed the baseline classifiers by a large margin.
north american chapter of the association for computational linguistics | 2015
Emilia Apostolova; Payam Pourashraf; Jeffrey Sack
Marketing materials such as flyers and other infographics are a vast online resource. In a number of industries, such as the commercial real estate industry, they are in fact the only authoritative source of information. Companies attempting to organize commercial real estate inventories spend a significant amount of resources on manual data entry of this information. In this work, we propose a method for extracting structured data from free-form commercial real estate flyers in PDF and HTML formats. We modeled the problem as text categorization and Named Entity Recognition (NER) tasks and applied a supervised machine learning approach (Support Vector Machines). Our dataset consists of more than 2,200 commercial real estate flyers and associated manually entered structured data, which was used to automatically create training datasets. Traditionally, text categorization and NER approaches are based on textual information only. However, information in visually rich formats such as PDF and HTML is often conveyed by a combination of textual and visual features. Large fonts, visually salient colors, and positioning often indicate the most relevant pieces of information. We applied novel features based on visual characteristics in addition to traditional text features and show that performance improved significantly for both the text categorization and NER tasks.
text retrieval conference | 2011
Dina Demner-Fushman; Swapna Abhyankar; Antonio Jimeno-Yepes; Russell F. Loane; Bastien Rance; François-Michel Lang; Nicholas C. Ide; Emilia Apostolova; Alan R. Aronson
meeting of the association for computational linguistics | 2011
Emilia Apostolova; Noriko Tomuro; Dina Demner-Fushman
cross-language evaluation forum | 2011
Matthew S. Simpson; Md. Mahmudur Rahman; Srinivas Phadnis; Emilia Apostolova; Dina Demner-Fushman; Sameer K. Antani; George R. Thoma
language resources and evaluation | 2010
Emilia Apostolova; Sean Neilan; Gary An; Noriko Tomuro; Steven L. Lytinen