Dahee Lee
Yonsei University
Publication
Featured research published by Dahee Lee.
Journal of Biomedical Informatics | 2015
Min Song; Won Chul Kim; Dahee Lee; Go Eun Heo; Keun Young Kang
Because the enormous number of scientific publications cannot be handled manually, there is rising interest in text-mining techniques for automated information extraction, especially in the biomedical field. Such techniques provide effective means of information search, knowledge discovery, and hypothesis generation. Most previous studies have focused primarily on the design and performance improvement of either named entity recognition or relation extraction. In this paper, we present PKDE4J, a comprehensive text-mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework. Starting with Stanford CoreNLP, we developed the system to cope with multiple types of entities and relations. The system also performs fairly well in terms of accuracy and allows its text-processing components to be configured. We demonstrate its competitive performance by evaluating it on many corpora and find that it surpasses existing systems, with average F-measures of 85% for entity extraction and 81% for relation extraction.
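The combination of dictionary-based entity extraction and rule-based relation extraction described in this abstract can be illustrated with a minimal sketch. It is not PKDE4J code and does not use Stanford CoreNLP; the dictionary entries, the trigger-verb rules, and the function names are all hypothetical and chosen only to show the general idea.

```python
import re

# Hypothetical toy dictionary: surface form -> entity type (not from PKDE4J).
ENTITY_DICT = {
    "aspirin": "DRUG",
    "ibuprofen": "DRUG",
    "headache": "DISEASE",
}

# Hypothetical rule table: trigger verb -> relation label.
RELATION_RULES = {
    "treats": "TREATS",
    "causes": "CAUSES",
}


def extract_entities(sentence):
    """Return (start, end, text, type) for every dictionary term in the sentence."""
    found = []
    for term, etype in ENTITY_DICT.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            found.append((m.start(), m.end(), m.group(0), etype))
    return sorted(found)


def extract_relations(sentence):
    """Link neighbouring entities when a trigger verb occurs between them."""
    entities = extract_entities(sentence)
    relations = []
    for left, right in zip(entities, entities[1:]):
        between = sentence[left[1]:right[0]].lower()
        for trigger, label in RELATION_RULES.items():
            if re.search(r"\b" + re.escape(trigger) + r"\b", between):
                relations.append((left[2], label, right[2]))
    return relations
```

Calling `extract_relations("Aspirin treats headache.")` on this toy setup yields one `TREATS` triple linking the two matched entities. A real system would add tokenization, normalization, and syntactic rules on top of this.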
Scientometrics | 2015
Min Song; Go Eun Heo; Dahee Lee
Alzheimer’s disease (AD) is a degenerative brain disease whose cause is hard to diagnose accurately. As the number of AD patients has increased, researchers have strived to understand the disease and develop treatments through approaches such as medical experiments and literature analysis. In the area of literature analysis, several traditional studies analyzed the literature at the macro level, such as author, journal, and institution. However, analyzing the literature at both the macro level and the micro level allows a better understanding of the AD research field. Therefore, in this study we adopt a more comprehensive approach to analyzing the AD literature, which consists of productivity analysis (year, journal/proceeding, author, and Medical Subject Heading terms), network analysis (co-occurrence frequency, centrality, and community), and content analysis. To this end, we collect metadata of 96,081 articles retrieved from PubMed. Specifically, we perform a concept graph-based network analysis, applying five centrality measures after mapping the semantic relationships between the UMLS concepts in the AD literature. We also analyze the time-series topical trend using the Dirichlet multinomial regression topic modeling technique. The results indicate that 2013 is the most productive year and Journal of Alzheimer’s Disease the most productive journal. Among the core biological entities and relationships found in the AD-related PubMed literature, the relationship with glycogen storage disease is the most frequently mentioned. In addition, we analyze 16 main topics of the AD literature and find a noticeable increasing trend in the topic of transgenic mouse.
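The co-occurrence and centrality steps of the network analysis can be sketched as follows. This is only an illustration under stated assumptions: it computes degree centrality alone (the abstract does not name its five centrality measures), the function names are hypothetical, and the input is assumed to be one list of concept labels per article.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs):
    """Count how many documents mention each unordered pair of concepts."""
    edges = Counter()
    for concepts in docs:
        # Deduplicate within a document and sort so each pair has one key.
        for a, b in combinations(sorted(set(concepts)), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Degree centrality: number of neighbours divided by (node count - 1)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nb) / (n - 1) for node, nb in neighbours.items()}
```

Edge weights from `cooccurrence_edges` correspond to co-occurrence frequency, and the centrality scores rank concepts by how many distinct concepts they appear with.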
PLOS ONE | 2017
Seung Han Baek; Dahee Lee; Minjoo Kim; Jong Ho Lee; Min Song
Background: Most earlier studies in the field of literature-based discovery have adopted Swanson's ABC model, which links pieces of knowledge contained in disjoint literatures. However, their practicability remains an open issue, since most of them did not deal with the context surrounding the discovered associations and were usually not accompanied by clinical confirmation. In this study, we propose a method that expands and elaborates an existing hypothesis using advanced text-mining techniques for capturing contexts. We extend the ABC model to allow for multiple B terms with various biological types. Results: Using the proposed method, we were able to concretize a specific, metabolite-related hypothesis with abundant contextual information. Starting from explaining the relationship between lactosylceramide and arterial stiffness, the hypothesis was extended to suggest a potential pathway consisting of lactosylceramide, nitric oxide, malondialdehyde, and arterial stiffness. The experiment by domain experts showed that it is clinically valid. Conclusions: The proposed method is designed to provide plausible candidates for the concretized hypothesis, based on extracted heterogeneous entities and detailed relation information, along with a reliable ranking criterion. Unlike previous studies, statistical tests conducted in collaboration with biomedical experts support the validity and practical usefulness of the method. Applied to other cases, the proposed method would help biologists support an existing hypothesis and more easily anticipate the logical process within it.
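The ABC model with multiple B terms can be illustrated with a small sketch. This is not the authors' method: the co-occurrence map, the function name, and the ranking by number of supporting B terms are assumptions made for illustration (the abstract does not specify its ranking criterion). The toy data loosely follows Swanson's well-known fish oil example.

```python
from collections import defaultdict


def abc_candidates(cooccurs, a_term):
    """Find C terms linked to a_term only through intermediate B terms.

    cooccurs maps each term to the set of terms it co-occurs with.
    Returns (c_term, supporting_b_terms) pairs, most-supported first.
    """
    direct = cooccurs.get(a_term, set())
    support = defaultdict(set)
    for b in direct:
        for c in cooccurs.get(b, set()):
            # Keep only terms not already directly linked to the A term.
            if c != a_term and c not in direct:
                support[c].add(b)
    # Assumed ranking: more distinct B terms first, then alphabetical.
    return sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Toy co-occurrence map (illustrative only, not extracted from any corpus).
TOY = {
    "fish oil": {"blood viscosity", "platelet aggregation"},
    "blood viscosity": {"fish oil", "raynaud syndrome"},
    "platelet aggregation": {"fish oil", "raynaud syndrome", "thrombosis"},
}
```

With the toy map, `abc_candidates(TOY, "fish oil")` ranks "raynaud syndrome" first because two B terms support it, and "thrombosis" second with one.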
Journal of Information Science | 2014
Yong Hwan Kim; Dahee Lee; Nam Gi Han; Min Song
Differences in user behaviours across different social media are yet to be explored. This paper investigates how users consume Web videos, a specific cultural behaviour, as reflected in different social media. Specifically, we looked at YouTube K-pop videos viewed on YouTube or mentioned on Twitter, which were collected by Web crawling and used to build the respective networks. The nodes of the networks are videos, and the edges represent relatedness in the YouTube network and co-link relationships in the Twitter network. Multilateral analysis is conducted to compare the two networks. We found that users focused heavily on K-pop music in the YouTube network, whereas they engaged with a more diverse range of cultural content, including music, dance and TV programmes, in the Twitter network. This study can be extended to other user studies to better understand user behaviour on social media.
International World Wide Web Conferences | 2014
Yong Hwan Kim; Dahee Lee; Jung Eun Hahm; Nam-Gi Han; Min Song
In this paper, we investigated the socio-cultural behavior of users as reflected in two different social media channels, YouTube and Twitter. We conducted a comparative analysis of the networks generated from the two channels. The relationship defined for each network is relatedness on YouTube and co-links on Twitter. The results revealed that the social media influenced the distinct socio-cultural behaviors of their users. Specifically, the Twitter network reflected the actual consumption of content in K-pop culture better than the YouTube network. This study contributes a novel approach for exploring the socio-cultural behavior of users on social media.
Data and Text Mining in Bioinformatics | 2014
Yoo Kyung Jeong; Dahee Lee; Min Song
This study investigates worldwide collaborative networks from a geographical perspective, based on clinical tests (CT) and academic research (AR) on acquired immune deficiency syndrome, or acquired immunodeficiency syndrome (AIDS). By applying text-mining techniques to AIDS-related documents, we extract spatial information and discover co-location pairs for each type of research at two levels: the national level and the city level. Co-location networks for CT and AR are analyzed using network features, visualization, and highly ranked betweenness centrality nodes. The analysis results reveal that the CT network is denser, with about twice as many nodes as the AR network. According to the analysis at the national level, the AR network is rather focused on the United States, while the CT network is more spread out across the world. At the city level, collaborative work is more active among closely located cities in the AR network than in the CT network (see Figure 1). The AR network has core collaboration centers mainly situated in the United States and Europe, but those of the CT network also include Asian and African cities. Overall, our study points out the differences between the collaborative networks for CT and AR, which contributes to the understanding of the research trend involving the productivity analysis of collaborative work associated with the regional aspect.
Data and Text Mining in Bioinformatics | 2014
Yoo Kyung Jeong; Dahee Lee; Nam-Gi Han; Won Chul Kim; Min Song
Biomedical information extraction, especially Named Entity Recognition (NER), is a primary task in biomedical text mining due to the rapid growth of large-scale literature. Biomedical entity extraction aims at identifying specific entities (words or phrases) in unstructured text data. In this work, we introduce a novel biomedical NER system that uses a combination of regional and global text features: linguistic, lexical, contextual, and syntactic features. Our system adopts Conditional Random Fields (CRFs) [1] as its machine learning algorithm and consists of two major pipelines (see Figure 1). We focus especially on constructing the first pipeline for text processing in a modularized manner and on discovering rich feature sets covering comprehensive linguistics and contexts. To implement the CRF framework in the second pipeline, our system uses a modified version of Mallet [2] to take advantage of feature induction. In 10-fold cross-validation on the GENETAG corpus [3], our system achieves F-measure improvements ranging from 0.99% to 18.47%, as well as the highest precision, compared to existing open-source biomedical NER systems. We find that several components, such as abundant key features, external resources, and feature induction, contribute to the performance of the proposed system.
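The kind of per-token features fed to a CRF tagger can be sketched as below. This is an illustrative assumption, not the system's actual feature set: the feature names and the `token_features` function are hypothetical, and the sketch stops at feature extraction without calling Mallet or any CRF library.

```python
def token_features(tokens, i):
    """Build a feature dict for tokens[i] from its shape and its neighbours."""
    tok = tokens[i]
    return {
        # Lexical and orthographic features of the token itself.
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in tok),
        "has_hyphen": "-" in tok,
        "suffix3": tok[-3:].lower(),
        # Contextual features from the neighbouring tokens.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens):
    """Feature dicts for a whole sentence, one per token, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

A sequence labeller would consume one such dictionary per token together with a BIO-style label; digit and hyphen flags are typical cues for gene names such as "BRCA1".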
Communications of Mathematical Education | 2014
Dong-Joong Kim; Sung-Chul Bae; Won Kim; Dahee Lee; Sang-Ho Choi
The purpose of this research is to analyze research trends and methods in a total of 709 studies published over the last 10 years (2003-2013) in five domestic mathematics education journals indexed in the Korea Citation Index (KCI), and to examine strands in the stages of research methods among mixed methods studies. The majority of articles in the five journals used either qualitative or quantitative methods, and mixed methods research accounted for less than 10% of the total number of studies. Most of the mixed methods studies mixed qualitative and quantitative methods equally. The mixed methods research consisted primarily of quantitative work with descriptive statistics, with very little qualitative work using a conceptual framework grounded in theoretical background. Our results provide not only trends in current research methods but also implications for various future research paradigms in mathematics education.
Journal of Alzheimer's Disease | 2015
Dahee Lee; Won Chul Kim; Andreas Charidimou; Min Song
The Mathematical Education | 2017
Woo-Hyung Whang; Dong-Joong Kim; Won Kim; Dahee Lee; Sang-Ho Choi