Publication


Featured research published by Hong-Jun Yoon.


Proceedings of SPIE | 2014

Gaze as a biometric

Hong-Jun Yoon; Tandy R. Carmichael; Georgia D. Tourassi

Two people may analyze a visual scene in two completely different ways. Our study sought to determine whether human gaze may be used to establish the identity of an individual. To accomplish this objective, we investigated the gaze patterns of twelve individuals viewing still images with different spatial relationships. Specifically, we created 5 visual “dot-pattern” tests to be shown on a standard computer monitor. These tests challenged the viewer’s capacity to distinguish proximity, alignment, and perceptual organization. Each test included 50 images of varying difficulty (total of 250 images). Eye-tracking data were collected from each individual while taking the tests. The eye-tracking data were converted into gaze velocities and analyzed with Hidden Markov Models to develop personalized gaze profiles. Using leave-one-out cross-validation, we observed that these personalized profiles could differentiate among the 12 users with classification accuracy ranging between 53% and 76%, depending on the test. This was statistically significantly better than random guessing (i.e., 8.3% or 1 out of 12). Classification accuracy was higher for the tests where the users’ average gaze velocity per case was lower. The study findings support the feasibility of using gaze as a biometric or personalized biomarker. These findings could have implications in radiology training and the development of personalized e-learning environments.
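The conversion of eye-tracking data into gaze velocities mentioned above can be illustrated with a minimal sketch. The abstract does not describe the exact procedure, so the point-to-point speed computation, the function name gaze_velocities, and the sample values below are assumptions made for illustration only.

```python
import math

def gaze_velocities(samples):
    """Return point-to-point gaze speeds from (t, x, y) samples.

    samples: list of (time_seconds, x, y) tuples sorted by time.
    The result is in position units per second.
    """
    velocities = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        velocities.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return velocities

# Hypothetical samples: (time in s, x in pixels, y in pixels)
samples = [(0.00, 100.0, 100.0), (0.01, 103.0, 104.0), (0.02, 103.0, 104.0)]
speeds = gaze_velocities(samples)

# Random-guess accuracy for 12 users, as quoted in the abstract (about 8.3%)
chance_level = 1 / 12
```

A velocity sequence of this kind is the sort of observation series a Hidden Markov Model could be fitted to for each user; the model fitting itself is not sketched here.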


IEEE Journal of Biomedical and Health Informatics | 2018

Deep Learning for Automated Extraction of Primary Sites From Cancer Pathology Reports

John X. Qiu; Hong-Jun Yoon; Paul A. Fearn; Georgia D. Tourassi

Pathology reports are a primary source of information for cancer registries, which process high volumes of free-text reports annually. Information extraction and coding is a manual, labor-intensive process. In this study, we investigated deep learning, specifically a convolutional neural network (CNN), for extracting ICD-O-3 topographic codes from a corpus of breast and lung cancer pathology reports. We performed two experiments, using a CNN and a more conventional term frequency vector approach, to assess the effects of class prevalence and inter-class transfer learning. The experiments were based on a set of 942 pathology reports with human expert annotations as the gold standard. CNN performance was compared against a more conventional term frequency vector space approach. We observed that the deep learning models consistently outperformed the conventional approaches in the class prevalence experiment, resulting in micro- and macro-F score increases of up to 0.132 and 0.226, respectively, when class labels were well populated. Specifically, the best performing CNN achieved a micro-F score of 0.722 over 12 ICD-O-3 topography codes. Transfer learning provided a consistent but modest performance boost for the deep learning methods, but trends were contingent on the CNN method and cancer site. These encouraging results demonstrate the potential of deep learning for automated abstraction of pathology reports.
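The micro- and macro-F scores reported above weight classes differently: micro-averaging pools all decisions, while macro-averaging gives each class equal weight, which is why rare classes pull the macro score down. The sketch below shows the standard definitions for single-label classification. The function name f_scores and the label sequences are hypothetical and are not taken from the study's data.

```python
from collections import Counter

def f_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))

    def f1(tp_n, fp_n, fn_n):
        denom = 2 * tp_n + fp_n + fn_n
        return 2 * tp_n / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro

# Hypothetical labels (topography-style codes used only as example strings)
y_true = ["C50.9", "C50.9", "C50.9", "C34.1", "C34.1", "C34.9"]
y_pred = ["C50.9", "C50.9", "C34.1", "C34.1", "C34.1", "C34.1"]
micro, macro = f_scores(y_true, y_pred)
```

In this toy example the class that is never predicted correctly contributes an F1 of zero to the macro average but only one error to the pooled micro score.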


International Conference on Augmented Cognition | 2017

Geometry and Gesture-Based Features from Saccadic Eye-Movement as a Biometric in Radiology

Folami Alamudun; Tracy Hammond; Hong-Jun Yoon; Georgia D. Tourassi

In this study, we present a novel application of sketch gesture recognition on eye-movement for biometric identification and estimating task expertise. The study was performed for the task of mammographic screening with simultaneous viewing of four coordinated breast views as typically done in clinical practice. Eye-tracking data and diagnostic decisions collected for 100 mammographic cases (25 normal, 25 benign, 50 malignant) and 10 readers (three board certified radiologists and seven radiology residents) formed the corpus for this study. Sketch gesture recognition techniques were employed to extract geometric and gesture-based features from saccadic eye-movements. Our results show that saccadic eye-movement, characterized using sketch-based features, results in more accurate models for predicting individual identity and level of expertise than more traditional eye-tracking features.


Proceedings of SPIE | 2016

Shapelet analysis of pupil dilation for modeling visuo-cognitive behavior in screening mammography

Folami Alamudun; Hong-Jun Yoon; Tracy Hammond; Kathy Hudson; Garnetta Morin-Ducote; Georgia D. Tourassi

Our objective is to improve understanding of visuo-cognitive behavior in screening mammography under clinically equivalent experimental conditions. To this end, we examined pupillometric data, acquired using a head-mounted eye-tracking device, from 10 image readers (three breast-imaging radiologists and seven radiology residents), and their corresponding diagnostic decisions for 100 screening mammograms. The corpus of mammograms comprised cases of varied pathology and breast parenchymal density. We investigated the relationship between pupillometric fluctuations, experienced by an image reader during mammographic screening, indicative of changes in mental workload, the pathological characteristics of a mammographic case, and the image readers’ diagnostic decision and overall task performance. To answer these questions, we extracted features from pupillometric data, and additionally applied time series shapelet analysis to extract discriminative patterns in changes in pupil dilation. Our results show that pupillometric measures are adequate predictors of mammographic case pathology, and image readers’ diagnostic decision and performance with an average accuracy of 80%.


Journal of Biomedical Informatics | 2016

A novel web informatics approach for automated surveillance of cancer mortality trends

Georgia D. Tourassi; Hong-Jun Yoon; Songhua Xu

Cancer surveillance data are collected every year in the United States via the National Program of Cancer Registries (NPCR) and the Surveillance, Epidemiology and End Results (SEER) Program of the National Cancer Institute (NCI). General trends are closely monitored to measure the nation’s progress against cancer. The objective of this study was to apply a novel web informatics approach for enabling fully automated monitoring of cancer mortality trends. The approach involves automated collection and text mining of online obituaries to derive the age distribution, geospatial, and temporal trends of cancer deaths in the US. Using breast and lung cancer as examples, we mined 23,850 cancer-related and 413,024 general online obituaries spanning the timeframe 2008-2012. There was high correlation between the web-derived mortality trends and the official surveillance statistics reported by NCI with respect to the age distribution (ρ=0.981 for breast; ρ=0.994 for lung), the geospatial distribution (ρ=0.939 for breast; ρ=0.881 for lung), and the annual rates of cancer deaths (ρ=0.661 for breast; ρ=0.839 for lung). Additional experiments investigated the effect of sample size on the consistency of the web-based findings. Overall, our study findings support web informatics as a promising, cost-effective way to dynamically monitor spatiotemporal cancer mortality trends.
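The agreement figures above compare two series, one derived from the web and one from official statistics. The abstract does not state which correlation coefficient ρ denotes; the sketch below assumes the Pearson coefficient, and all counts are invented for illustration rather than taken from the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical annual death counts for five consecutive years
web_counts = [120, 135, 150, 160, 180]                 # obituary-derived (invented)
official_counts = [40100, 40500, 41000, 41200, 41900]  # registry-style (invented)
rho = pearson(web_counts, official_counts)
```

Because the coefficient is scale-invariant, the much smaller web sample can still track the official trend closely, which is the property the comparison relies on.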


Journal of the American Medical Informatics Association | 2016

The utility of web mining for epidemiological research: studying the association between parity and cancer risk

Georgia D. Tourassi; Hong-Jun Yoon; Songhua Xu; Xuesong Han

BACKGROUND: The World Wide Web has emerged as a powerful data source for epidemiological studies related to infectious disease surveillance. However, its potential for cancer-related epidemiological discoveries is largely unexplored. METHODS: Using advanced web crawling and tailored information extraction procedures, the authors automatically collected and analyzed the text content of 79 394 online obituary articles published between 1998 and 2014. The collected data included 51 911 cancer (27 330 breast; 9470 lung; 6496 pancreatic; 6342 ovarian; 2273 colon) and 27 483 non-cancer cases. With the derived information, the authors replicated a case-control study design to investigate the association between parity (i.e., childbearing) and cancer risk. Age-adjusted odds ratios (ORs) with 95% confidence intervals (CIs) were calculated for each cancer type and compared to those reported in large-scale epidemiological studies. RESULTS: Parity was found to be associated with a significantly reduced risk of breast cancer (OR = 0.78, 95% CI, 0.75-0.82), pancreatic cancer (OR = 0.78, 95% CI, 0.72-0.83), colon cancer (OR = 0.67, 95% CI, 0.60-0.74), and ovarian cancer (OR = 0.58, 95% CI, 0.54-0.62). Marginal association was found for lung cancer risk (OR = 0.87, 95% CI, 0.81-0.92). The linear trend between increased parity and reduced cancer risk was dramatically more pronounced for breast and ovarian cancer than the other cancers included in the analysis. CONCLUSION: This large web-mining study on parity and cancer risk produced findings very similar to those reported with traditional observational studies. It may be used as a promising strategy to generate study hypotheses for guiding and prioritizing future epidemiological studies.
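For readers unfamiliar with the OR and CI figures above, the following sketch computes a crude odds ratio with a Wald-type 95% confidence interval from a 2x2 case-control table. It is a simplification: the study reports age-adjusted ORs, which this unadjusted calculation does not reproduce, and the function name and counts are hypothetical.

```python
import math

def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2x2 table.

    z=1.96 corresponds to an approximate 95% interval.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: exposure = parity, cases = cancer obituaries
or_value, ci_low, ci_high = odds_ratio_ci(
    exposed_cases=300, unexposed_cases=200,
    exposed_controls=400, unexposed_controls=200,
)
```

An interval that lies entirely below 1, as in this invented example, is what a statement like "significantly reduced risk" refers to.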


Journal of the American Medical Informatics Association | 2018

Hierarchical attention networks for information extraction from cancer pathology reports

Shang Gao; Michael T. Young; John X. Qiu; Hong-Jun Yoon; James B. Christian; Paul A. Fearn; Georgia D. Tourassi; Arvind Ramanathan

Objective: We explored how a deep learning (DL) approach based on hierarchical attention networks (HANs) can improve model performance for multiple information extraction tasks from unstructured cancer pathology reports compared to conventional methods that do not sufficiently capture syntactic and semantic contexts from free-text documents. Materials and Methods: Data for our analyses were obtained from 942 deidentified pathology reports collected by the National Cancer Institute Surveillance, Epidemiology, and End Results program. The HAN was implemented for 2 information extraction tasks: (1) primary site, matched to 12 International Classification of Diseases for Oncology topography codes (7 breast, 5 lung primary sites), and (2) histological grade classification, matched to G1–G4. Model performance metrics were compared to conventional machine learning (ML) approaches including naive Bayes, logistic regression, support vector machine, random forest, and extreme gradient boosting, and other DL models, including a recurrent neural network (RNN), a recurrent neural network with attention (RNN w/A), and a convolutional neural network. Results: Our results demonstrate that for both information tasks, HAN performed significantly better compared to the conventional ML and DL techniques. In particular, across the 2 tasks, the mean micro and macro F-scores for the HAN with pretraining were (0.852, 0.708), compared to naive Bayes (0.518, 0.213), logistic regression (0.682, 0.453), support vector machine (0.634, 0.434), random forest (0.698, 0.508), extreme gradient boosting (0.696, 0.522), RNN (0.505, 0.301), RNN w/A (0.637, 0.471), and convolutional neural network (0.714, 0.460). Conclusions: HAN-based DL models show promise in information abstraction tasks within unstructured clinical pathology reports.


IEEE EMBS International Conference on Biomedical and Health Informatics | 2016

Predicting lung cancer incidence from air pollution exposures using shapelet-based time series analysis

Hong-Jun Yoon; Songhua Xu; Georgia D. Tourassi

In this paper we investigated whether the geographical variation of lung cancer incidence can be predicted through examining the spatiotemporal trend of particulate matter air pollution levels. Regional trends of air pollution levels were analyzed by a novel shapelet-based time series analysis technique. First, we identified U.S. counties with reportedly high and low lung cancer incidence between 2008 and 2012 via the State Cancer Profiles provided by the National Cancer Institute. Then, we collected particulate matter exposure levels (PM2.5 and PM10) of the counties for the previous decade (1998-2007) via the AirData dataset provided by the Environmental Protection Agency. Using shapelet-based time series pattern mining, regional environmental exposure profiles were examined to identify frequently occurring sequential exposure patterns. Finally, a binary classifier was designed to predict whether a U.S. region is expected to experience high lung cancer incidence based on the region’s PM2.5 and PM10 exposure the decade prior. The study confirmed the association between prolonged PM exposure and lung cancer risk. In addition, the study findings suggest that not only cumulative exposure levels but also the temporal variability of PM exposure influence lung cancer risk.
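The core operation in shapelet-based time series analysis is measuring how closely a short pattern (the shapelet) matches any part of a longer series. The sketch below shows that distance in its simplest form, as the minimum Euclidean distance over all sliding windows. It is not the authors' implementation: many shapelet methods also normalize each window, which is omitted here, and the function name and exposure values are hypothetical.

```python
import math

def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between a shapelet and any
    equal-length window of the series (no normalization applied)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, dist)
    return best

# Hypothetical yearly PM2.5 averages for two counties
county_a = [12.0, 12.5, 15.0, 18.0, 15.0, 12.0]  # contains a spike
county_b = [12.0, 12.1, 11.9, 12.0, 12.2, 12.0]  # roughly flat
spike = [15.0, 18.0, 15.0]                       # candidate shapelet

d_a = shapelet_distance(county_a, spike)
d_b = shapelet_distance(county_b, spike)
```

A binary classifier of the kind described above could then use such distances as features, for example by thresholding the distance to a discriminative shapelet.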


INNS Conference on Big Data | 2016

Multi-task Deep Neural Networks for Automated Extraction of Primary Site and Laterality Information from Cancer Pathology Reports

Hong-Jun Yoon; Arvind Ramanathan; Georgia D. Tourassi

Automated annotation of free-text cancer pathology reports is a critical challenge for cancer registries and the national cancer surveillance program. In this paper, we investigated deep neural networks (DNNs) for automated extraction of the primary cancer site and its laterality, two fundamental targets of cancer reporting. Our experiments showed that single-task DNNs are capable of extracting information with higher precision and recall than traditional classification methods for the more challenging target. Furthermore, a multi-task learning DNN resulted in further performance improvement. This preliminary study indicates the strong potential for multi-task deep neural networks to extract cancer-relevant information from free-text pathology reports.


2013 Biomedical Sciences and Engineering Conference (BSEC) | 2013

A cost-effective, case-control study on the association between breast cancer and pregnancy through web mining

Hong-Jun Yoon; Songhua Xu; Georgia D. Tourassi

We report a case-control epidemiological study through mining people’s stories from the Internet. Our overarching goal is to test whether mining openly available, personal stories from the Internet is a cost-effective way for reliable epidemiological discoveries. As a case study, we focus on the association between breast cancer risk and pregnancy, which is clearly established through controlled clinical survey studies. Specifically, we automatically collected and mined 30,000 online obituary articles via a series of tailored cyber-informatics tools we developed. Replicating a case-control study design, we analyzed the collected data, confirming with statistical significance that parity is associated with lower breast cancer risk. Our web mining study demonstrates promising preliminary evidence that online content mining can be a cost-effective and reliable way for epidemiological knowledge discovery.

Collaboration


Hong-Jun Yoon's most frequent collaborators.

Top Co-Authors

Georgia D. Tourassi | Oak Ridge National Laboratory

Songhua Xu | Oak Ridge National Laboratory

John X. Qiu | Oak Ridge National Laboratory

Arvind Ramanathan | Oak Ridge National Laboratory

Kathy Hudson | University of Tennessee

Mohammed Alawad | Oak Ridge National Laboratory

Paul A. Fearn | Memorial Sloan Kettering Cancer Center

Tandy R. Carmichael | Tennessee Technological University