Chi-Hwan Choi | Researchain

Publication

Featured research published by Chi-Hwan Choi.

PLOS ONE | 2016

CLUSTOM-CLOUD: In-Memory Data Grid-Based Software for Clustering 16S rRNA Sequence Data in the Cloud Environment.

Jeongsu Oh; Chi-Hwan Choi; Min-Kyu Park; Byung Kwon Kim; Kyuin Hwang; Sang-Heon Lee; Soon Gyu Hong; Arshan Nasir; Wan-Sup Cho; Kyung Mo Kim

High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to different organisms present in environmental samples. Typically, analysis of microbial diversity in bioinformatics starts with pre-processing, followed by clustering 16S rRNA reads into relatively fewer operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial diversity and greatly reduce downstream analysis time. However, existing hierarchical clustering algorithms, which are generally more accurate than greedy heuristic algorithms, struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology, a distributed data structure that stores all data in the main memory of multiple computing nodes. IMDG technology lets CLUSTOM-CLOUD handle larger datasets and scale better computationally than its ancestor, CLUSTOM, while maintaining high accuracy. The clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using a small laboratory cluster (10 nodes) and the Amazon EC2 cloud-computing environment. In the laboratory environment, it required only ~3 hours to process a dataset of 200 K reads regardless of the complexity of the human microbiome data. In turn, one million reads were processed in approximately 20, 14, and 11 hours when using 20, 30, and 40 nodes, respectively, on the Amazon EC2 cloud-computing environment. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and is also a scalable distributed processing system. The comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm. CLUSTOM-CLOUD is written in Java and is freely available at http://clustomcloud.kopri.re.kr.
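
The EC2 timings quoted in the abstract above allow a quick scalability check. The sketch below computes speedup and parallel efficiency relative to the 20-node run. The function name `scaling_table` and the choice of the 20-node run as the baseline are assumptions made for illustration; they do not come from the paper, and the input hours are the rounded values from the abstract.

```python
def scaling_table(runs, base_nodes):
    """Return {nodes: (speedup, efficiency)} relative to the run with base_nodes.

    runs maps a node count to the wall-clock hours for the same workload.
    """
    base_hours = runs[base_nodes]
    table = {}
    for nodes, hours in sorted(runs.items()):
        speedup = base_hours / hours   # how much faster than the baseline run
        ideal = nodes / base_nodes     # speedup expected under perfect scaling
        table[nodes] = (speedup, speedup / ideal)
    return table


# Approximate hours from the abstract for one million reads on Amazon EC2.
ec2_runs = {20: 20.0, 30: 14.0, 40: 11.0}

for nodes, (speedup, efficiency) in scaling_table(ec2_runs, base_nodes=20).items():
    print(f"{nodes} nodes: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With these rounded figures, efficiency relative to 20 nodes comes out at roughly 95% for 30 nodes and 91% for 40 nodes, which is consistent with the abstract's description of the system as scalable.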

International Conference on Ubiquitous and Future Networks | 2016

Smart answering Chatbot based on OCR and Overgenerating Transformations and Ranking

Ly Pichponreay; Jin-Hyuk Kim; Chi-Hwan Choi; Kyung-Hee Lee; Wan-Sup Cho

With the rapid development of information and communication technology, people have become very diverse in education, learning style, and methods of improving their knowledge. This paper presents an approach for converting documents into the knowledge base of a chatbot system, so that users can benefit more from electronic documents by asking questions and receiving answers through a simulation system integrated with those documents. It is an integrated system for enriching the content of documents in popular formats such as Portable Document Format (PDF) and digital photos. The workflow starts by extracting text from files using Optical Character Recognition (OCR), then generates questions via the Overgenerating Transformations and Ranking algorithm, and finally lets the chatbot respond to a user's question when it matches a string pattern.

Journal of Geophysical Research | 2017

Spatial dependence of electromagnetic ion cyclotron waves triggered by solar wind dynamic pressure enhancements

Jung-Hee Cho; D.-Y. Lee; S.‐J. Noh; Hyomin Kim; Chi-Hwan Choi; Jaejin Lee; J. Hwang

In this paper, using multisatellite observations (the Van Allen Probes and two GOES satellites) in the inner magnetosphere, we examine two electromagnetic ion cyclotron (EMIC) wave events that are triggered by solar wind dynamic pressure (Pdyn) enhancements under prolonged northward interplanetary magnetic field quiet time preconditions. For both events, the impact of enhanced Pdyn causes EMIC waves at multiple points. However, we find a strong spatial dependence: EMIC waves due to enhanced Pdyn impact can occur at multiple points (likely globally but not necessarily everywhere) but with different wave properties. For Event 1, three satellites situated in nearly the same dawnside zone but at slightly different L shells observe EMIC waves, but at different frequencies relative to local ion gyrofrequencies and with different polarizations. These waves are found inside or at the outer edge of the plasmasphere. Another satellite near noon observes no dramatic EMIC wave despite the strongest magnetic compression there. For Event 2, the four satellites are situated at widely separated magnetic local time zones when they observe EMIC waves. The waves are again found at different frequencies relative to local ion gyrofrequencies, with different polarizations, and all outside the plasmasphere. We propose two possible explanations: (i) if triggered by enhanced Pdyn impact, details of ion cyclotron instability growth can be sensitive to local plasma conditions related to background proton distributions, and (ii) there can be preexisting waves with a specific spatial distribution, which determines the occurrence and specific properties of EMIC waves depending on each satellite's relative position after an enhanced Pdyn arrives.

The Journal of the Korea Contents Association | 2013

Refresh Cycle Optimization for Web Crawlers

Wan-Sup Cho; Jeong-Eun Lee; Chi-Hwan Choi

A web crawler should keep its data fresh with minimum server overhead, even for web sites holding large amounts of data. Server overhead increases rapidly as the amount of data explodes in the big data era. The amount of web information is increasing rapidly with advanced wireless networks and the emergence of diverse smart devices. Furthermore, information is continuously being produced and updated anywhere and anytime by means of easy-to-use web platforms and smart devices. How often frequently updated web data should be refreshed during data collection and integration has therefore become a hot issue. In this paper, we propose dynamic web-data crawling methods, which include sensitive checking of web site changes and dynamic retrieval of web pages from target web sites based on historical update patterns. Furthermore, we implemented a Java-based web crawling application and compared the efficiency of conventional static approaches with that of our dynamic one. Our experimental results showed a 46.2% overhead benefit with fresher data compared to the static crawling methods.
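
The idea of adapting the refresh cycle to historical update patterns can be illustrated with a very small scheduling rule. This is only a sketch under stated assumptions and not the method from the paper: the multiplicative rule, the function names `next_interval` and `replay`, and all numeric defaults are hypothetical.

```python
def next_interval(current_hours, changed, min_hours=1.0, max_hours=168.0):
    """Return the next revisit interval for a page, in hours.

    Hypothetical rule: halve the interval when the last visit found a change,
    lengthen it by 50% when the page was unchanged, and clamp the result.
    """
    factor = 0.5 if changed else 1.5
    return max(min_hours, min(max_hours, current_hours * factor))


def replay(history, start_hours=24.0):
    """Apply next_interval over a sequence of past observations.

    history is an iterable of booleans: True if the page had changed
    when it was visited, False otherwise.
    """
    interval = start_hours
    for changed in history:
        interval = next_interval(interval, changed)
    return interval


# A page that changed on two visits and then stayed the same once.
print(replay([True, True, False]))
```

A static crawler would keep the starting interval for every page. A rule of this kind visits rarely changing pages less often, which is where a saving in server overhead would come from, while frequently changing pages are revisited sooner.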

International Conference on Big Data | 2015

A graph based representative keywords extraction model from news articles

Kaaen Kwon; Chi-Hwan Choi; Jihyeon Lee; Jisoo Jeong; Wan-Sup Cho

In an age of information deluge, a blizzard of documents such as news articles is being generated in real time. To grasp the contents of documents, keyword extraction methods have been actively researched. In this paper, we propose a model to extract representative keywords from news articles based on a graph model. We evaluate the accuracy of the proposed model against TextRank and TF-IDF. The results show that the proposed model's accuracy is improved to 40% and 90%, respectively, without increasing computational time.
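
For orientation, the TextRank baseline mentioned in the abstract ranks words by running PageRank over a word co-occurrence graph. The sketch below is a minimal version of that baseline, not the model proposed in the paper. It assumes the input is already tokenized and filtered, and the function name and parameter defaults are illustrative choices.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words with PageRank over an undirected co-occurrence graph."""
    # Link each word to the distinct words that follow it within the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Synchronous PageRank updates: each word receives an equal share of
    # the current score of every neighbor.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


tokens = ["storm", "wind", "storm", "pressure", "storm", "shock"]
print(textrank_keywords(tokens, window=1, top_k=1))
```

In this toy input every other word co-occurs only with "storm", so "storm" is the hub of the graph and receives the highest score.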

International Conference on Intelligent Systems, Modelling and Simulation | 2013

A Private Cloud System for Web-based High-Performance Multiple Sequence Alignment Services

Seung-Hyun Jung; Jong-Hwa Na; Chi-Hwan Choi; Franco Nazareno; In-Sun Jung; Wan-Sup Cho; Min-Hyunk Tang; Sung-Hyun Jun

We propose a LanLinux-based cloud system for ClustalW-MPI, a parallel implementation of ClustalW based on MPI, where researchers can submit their sequence data online for multiple sequence alignment. ClustalW is one of the most widely used programs for multiple sequence alignment (MSA) in bioinformatics. However, current in-silico environments for MSA face computing power limitations. The proposed system uses MPICH2 (an implementation of the Message Passing Interface standard for distributed-memory applications used in parallel computing) to handle all the tasks associated with multiple sequence alignment on the Web. It provides sufficient computing power for aligning a large number of sequences at a time, with real-time monitoring capabilities to ensure correctness, efficiency and effectiveness.

Journal of Astronomy and Space Sciences | 2008

Statistical Characteristics of Solar Wind Dynamic Pressure Enhancements During Geomagnetic Storms

Chi-Hwan Choi; Khan-Hyuk Kim; Dae-Young Lee; Ji Hye Kim; Ensang Lee

Solar wind dynamic pressure enhancements are known to cause various types of disturbances to the magnetosphere. In particular, dynamic pressure enhancements may affect the evolution of magnetic storms when they occur during storm times. In this paper, we have investigated the statistical significance and features of dynamic pressure enhancements during magnetic storm times. For the investigation, we have used a total of 91 geomagnetic storms for 2001-2003, for which the Dst minimum is below -50 nT. Also, we have imposed a set of selection criteria for a pressure enhancement to be considered an event: the main selection criterion is that the pressure increases by % or nPa within 30 min and remains elevated for 10 min or longer. For our statistical analysis, we define the storm time to be the interval from the main Dst decrease, through the Dst minimum, to the point where the Dst index recovers by 50%. Our main results are summarized as follows. (i) 81% of the studied storms show at least one pressure enhancement event. When averaged over all 91 storms, the occurrence rate is 4.5 pressure enhancement events per storm and 0.15 pressure enhancement events per hour. (ii) The occurrence rate of the pressure enhancements is about three times higher for CME-driven storm times than for CIR-driven storm times. (iii) Only 21.1% of the pressure enhancements show a clear association with an interplanetary shock. (iv) A large number of the pressure enhancement events are accompanied by a simultaneous change of IMF and/or : for example, 73.5% of the pressure enhancement events are associated with an IMF change of either nT or nT. This last finding suggests that one should consider possible interplay effects between the simultaneous pressure and IMF changes in many situations.

The Journal of Supercomputing | 2017

KMLOD: linked open data service for Korean medical database

PhalPheaktra Chhaya; Chi-Hwan Choi; Kyung-Hee Lee; Wan-Sup Cho; Young-Sung Lee

On the Internet, massive amounts of data are generated every day, which increases the need for an effective way of connecting and sharing data on the web. In this paper, we propose an online retrieval system that connects and aggregates data from various sources through linked open data. The main data, extracted from a Korean medical article database called KMbase, are stored in a relational database and therefore must be converted from relational data into RDF data. We use a linked data approach to connect one data source to another, which allows users to explore the web of data. Additionally, we introduce different linking approaches, since different data sources require different linking methods. The results are presented in a user-friendly web application that provides features such as searching and visualizing articles. It also provides a SPARQL endpoint where users who are familiar with SPARQL can submit different types of queries and retrieve the results.
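
The relational-to-RDF conversion step described above can be pictured as turning each table row into a set of subject-predicate-object triples. The sketch below emits N-Triples lines from a dictionary that stands in for a database row. It is an illustration only: the `http://example.org/kmlod/` namespace, the table and column names, and the function names are hypothetical and are not the URIs or schema used by KMLOD.

```python
BASE = "http://example.org/kmlod/"  # hypothetical namespace, not the real KMLOD URIs


def escape_literal(value):
    """Minimal escaping of a string for use as an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def row_to_ntriples(table, key_column, row):
    """Convert one relational row (a dict) into a list of N-Triples lines.

    The primary key becomes part of the subject IRI, and every other
    non-null column becomes a predicate with a plain string literal.
    Assumes key values are simple tokens that need no IRI escaping.
    """
    subject = f"<{BASE}{table}/{row[key_column]}>"
    lines = []
    for column, value in row.items():
        if column == key_column or value is None:
            continue
        predicate = f"<{BASE}{table}#{column}>"
        lines.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return lines


article = {"id": 7, "title": 'A "linked" article', "year": 2017, "doi": None}
for line in row_to_ntriples("article", "id", article):
    print(line)
```

Linking to another data source would then amount to emitting additional triples whose object is an IRI in that source rather than a literal; how those matches are found is the part that differs between data sources.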

International Conference on Ubiquitous and Future Networks | 2016

Using D2RQ and Ontop to publish relational database as Linked Data

PhalPheaktra Chhaya; Kyung-Hee Lee; Kwang-soo Shin; Chi-Hwan Choi; Wan-Sup Cho; Young-Sung Lee

Publishing a dataset held in a relational database so that it can be accessed on the Web of Data is a hot issue. In this paper, we present an implementation of publishing a relational database as Linked Data with two open-source platforms, D2RQ and Ontop. Both platforms provide their own way to translate databases into RDF graphs. However, when querying data, Ontop shows much better performance than D2RQ.

International Conference on Big Data | 2015

Performance Evaluation of Apache Spark According to the Number of Nodes using Principal Component Analysis

Sungjin Hong; Sangho Kim; Jongsun Jang; Chi-Hwan Choi; In-Sun Jung; Jong-Hwa Na; Wan-Sup Cho; Suyoung Chi

With the development of big data collection and storage technology, analysis for its utilization has recently expanded in the public sector and various industries. Especially in the manufacturing and financial sectors, there has been very high demand for real-time analysis of big data. Existing studies on big data analysis have mainly assumed batch processing. In recent years, studies on real-time analytics using Spark, Storm and IMDG have been underway. In this regard, this paper evaluates the processing performance of principal component analysis using open-source Spark, an in-memory distributed processing framework of the kind needed for real-time analysis and fast operation on large amounts of data. This paper shows how fast Spark is in comparison with open-source R and also investigates the distributed processing capability of Spark according to the node configuration.
