
Taehong Kim

Korea Institute of Science and Technology Information



Publication

Featured research published by Taehong Kim.

Archive | 2015

MapReduce-Based Bulk-Loading Algorithm for Fast Search for Billions of Triples

Jung-Ho Um; Seungwoo Lee; Taehong Kim; Chang-Hoo Jeong; Kwangik Seo; Joonho Park; Hanmin Jung

Owing to advances in IT and in science and technology, huge amounts of data are continuously being created, and the big data era can be said to have arrived. Triple stores that insert data into and query knowledge bases must therefore be scaled up to deal with such large data sources. To this end, we propose a triple store system based on a distributed database that uses bulk loading to store billions of triples and to respond quickly to user queries. To achieve this, we introduce a bulk-loading algorithm using the MapReduce framework, together with a SPARQL query processing engine that connects to a large distributed database. Experimental results show that the proposed bulk-loading algorithm loads approximately 33 billion triples at a rate of 101K triples per second, indicating that the system can handle billions of triples.
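The abstract does not spell out the algorithm, so the following is only a minimal single-process Python sketch of the general MapReduce bulk-loading pattern: the map phase keys each triple by its subject, the shuffle groups values by key, and the reduce phase emits one row per subject in sorted key order, as a bulk importer for a sorted key-value store would expect. The subject-as-row-key layout and all function names are assumptions for illustration, not the paper's design.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]        # (subject, predicate, object)
Row = Tuple[str, Dict[str, List[str]]]

def map_phase(triples: Iterable[Triple]) -> Iterator[Tuple[str, Tuple[str, str]]]:
    # Assumed layout: the subject becomes the row key.
    for s, p, o in triples:
        yield s, (p, o)

def shuffle(pairs: Iterable[Tuple[str, Tuple[str, str]]]) -> Dict[str, List[Tuple[str, str]]]:
    # Group values by key, as the MapReduce framework does between phases.
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups: Dict[str, List[Tuple[str, str]]]) -> List[Row]:
    # Emit one row per subject, in sorted key order, with columns {predicate: [objects]}.
    rows: List[Row] = []
    for subject in sorted(groups):
        columns: Dict[str, List[str]] = defaultdict(list)
        for p, o in groups[subject]:
            columns[p].append(o)
        rows.append((subject, dict(columns)))
    return rows

def bulk_load(triples: Iterable[Triple]) -> List[Row]:
    return reduce_phase(shuffle(map_phase(triples)))

if __name__ == "__main__":
    data = [("ex:b", "ex:name", '"Bob"'), ("ex:a", "ex:knows", "ex:b"), ("ex:a", "ex:name", '"Alice"')]
    for row in bulk_load(data):
        print(row)
```

Writing rows pre-sorted by key is what typically lets a bulk loader skip per-row inserts; a real job would run the map and reduce functions across a Hadoop cluster rather than in one process.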

Neurocomputing | 2016

Semantic complex event processing model for reasoning research activities

Jung-Ho Um; Seungwoo Lee; Taehong Kim; Chang-Hoo Jeong; Sa-Kwang Song; Hanmin Jung

With recent developments in science and technology, numerous researchers around the world are producing a variety of science- and technology-related documents, such as research papers and patents, to share and further develop their research. In particular, the spread of social network services allows researchers in science and technology to share and develop their latest technologies rapidly. In the era of Big Data, there is a growing need for a system that can infer analytical information about these researchers' activities, but no existing system satisfies this requirement. This study therefore proposes a semantic complex event processing model for reasoning, and details an architecture in which such analytics are processed in real time in a distributed environment. With this architecture, researchers can easily monitor their own research activities in an up-to-date manner and use the resulting analytical data as a basis for setting their future research directions.
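The abstract gives no rule language, so as a hypothetical illustration of the kind of reasoning a complex event processing model performs, the Python sketch below derives a higher-level fact ("researcher X is active in topic T") from a stream of low-level events when enough of them fall within a sliding time window. The `Event` fields, the rule, and its `window` and `threshold` parameters are all assumptions, and the sketch assumes events arrive in time order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Tuple

@dataclass(frozen=True)
class Event:
    researcher: str
    kind: str    # hypothetical event types, e.g. "paper" or "patent"
    topic: str
    day: int     # event time in days; events must arrive in time order

class ActivityRule:
    """Hypothetical rule: infer '<researcher> activeIn <topic>' when at least
    `threshold` events on that topic occur within `window` days."""

    def __init__(self, window: int, threshold: int) -> None:
        self.window = window
        self.threshold = threshold
        # One sliding buffer of event times per (researcher, topic).
        self.buffers: Dict[Tuple[str, str], Deque[int]] = defaultdict(deque)

    def on_event(self, e: Event) -> List[str]:
        buf = self.buffers[(e.researcher, e.topic)]
        buf.append(e.day)
        # Evict events that have fallen out of the window.
        while buf and e.day - buf[0] >= self.window:
            buf.popleft()
        if len(buf) >= self.threshold:
            return [f"{e.researcher} activeIn {e.topic}"]
        return []

if __name__ == "__main__":
    rule = ActivityRule(window=30, threshold=2)
    for ev in [Event("kim", "paper", "RDF", 1), Event("kim", "patent", "RDF", 10)]:
        print(rule.on_event(ev))
```

Emitting the inferred fact as a subject-predicate-object string hints at how derived events could be fed back into a semantic store; a production engine would also handle out-of-order events and distribute buffers across nodes.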

The Journal of Supercomputing | 2016

Distributed RDF store for efficient searching billions of triples based on Hadoop

Jung-Ho Um; Seungwoo Lee; Taehong Kim; Chang-Hoo Jeong; Sa-Kwang Song; Hanmin Jung

With the development of IT and of science and technology, very large amounts of knowledge data are continuously being created, and the big data era can be said to have arrived. RDF stores that insert data into and query knowledge bases must therefore be scaled up to deal with such large data sources. To this end, we propose a scalable distributed RDF store, built on a distributed database, that uses bulk loading to store billions of triples and to respond quickly to user queries. To achieve this, we introduce a bulk-loading algorithm using the MapReduce framework, together with a SPARQL query processing engine that connects to a large distributed database. Experimental results show that the proposed bulk-loading algorithm loads approximately 33 billion triples at a rate of 67.893K triples per second. The experiment therefore shows that the proposed RDF store can manage data on the scale of billions of triples.
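Assuming the reported rate held constant over the entire load (an assumption; the abstract gives only the rate), the implied total loading time is roughly:

$$
t \approx \frac{3.3 \times 10^{10}\ \text{triples}}{67{,}893\ \text{triples/s}} \approx 4.86 \times 10^{5}\ \text{s} \approx 135\ \text{h} \approx 5.6\ \text{days}
$$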

Innovative Mobile and Internet Services in Ubiquitous Computing | 2013

Collecting Korean-English Pairs for Translation of Technical Terms

Myunggwon Hwang; Do-Heon Jeong; Taehong Kim; Sa-Kwang Song; Jinhee Lee; Hanmin Jung; Donald J. Kim

Many web applications provide fully automatic machine translation services, allowing users to easily access and understand information of interest. However, these services still produce inaccurate results when translating technical terms. We therefore propose a new method that collects reliable Korean-English translations of technical terms. To collect the pairs, we use the metadata of Korean scientific papers and build a new statistical model adapted to the characteristics of that metadata. The collected Korean-English pairs are evaluated for reliability and compared with the output of Google Translate. The evaluation and comparison confirm that this method produces highly reliable data and can improve the translation quality of technical terms.
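The abstract does not describe the statistical model itself. As a stand-in, the hypothetical Python sketch below treats positionally aligned Korean and English keyword lists from paper metadata as candidate pairs, and keeps a pair only when it occurs often enough and accounts for a large enough share of that Korean term's occurrences. The alignment assumption, the function name `collect_pairs`, and both thresholds are illustrative choices, not the paper's model.

```python
from collections import Counter
from typing import Dict, List, Tuple

def collect_pairs(papers: List[Tuple[List[str], List[str]]],
                  min_count: int = 2,
                  min_ratio: float = 0.6) -> Dict[str, str]:
    """Each paper contributes (korean_keywords, english_keywords) from its metadata."""
    pair_counts: Counter = Counter()
    ko_counts: Counter = Counter()
    for ko_keywords, en_keywords in papers:
        # Assumption: keyword lists of equal length are aligned position by position;
        # papers with mismatched lists are skipped as unreliable.
        if len(ko_keywords) != len(en_keywords):
            continue
        for ko, en in zip(ko_keywords, en_keywords):
            ko, en = ko.strip(), en.strip().lower()
            pair_counts[(ko, en)] += 1
            ko_counts[ko] += 1
    result: Dict[str, str] = {}
    # Visit the most frequent pairs first so each Korean term keeps its best match.
    for (ko, en), n in pair_counts.most_common():
        if ko in result:
            continue
        if n >= min_count and n / ko_counts[ko] >= min_ratio:
            result[ko] = en
    return result

if __name__ == "__main__":
    sample = [
        (["기계 번역", "온톨로지"], ["Machine Translation", "Ontology"]),
        (["기계 번역"], ["machine translation"]),
        (["기계 번역", "온톨로지"], ["MT", "ontology"]),
    ]
    print(collect_pairs(sample))
```

The ratio filter is what rejects noisy alternatives such as an abbreviation that appears only once for a term.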

International Conference on U- and E-Service, Science and Technology | 2011

OntoURIResolver: URI Resolution and Recommendation Service Using LOD

Taehong Kim; Pyung Kim; Seungwoo Lee; Hanmin Jung; Won-Kyung Sung

LOD (Linked Open Data) provides an infrastructure for exchanging multiple ontologies in standard formats. LOD recommends rules for data publishing, URI (Uniform Resource Identifier) assignment, URI reuse, and ontology access. To make a local ontology interoperable with and shareable through LOD, we must link URIs in the local ontology to URIs in LOD. To decide on an appropriate URI for a specific entity, OntoURIResolver collects RDF (Resource Description Framework) triples from multiple ontologies via SPARQL, divides URIs into groups by comparing their RDF triples, and recommends a canonical URI and entity name for each group using statistics of the RDF triples. We compare OntoURIResolver with sameas.org on the top 10 ranked authors in DBLP. Through this service, users can find a specific URI for an entity and interlink it with LOD to maximize the effectiveness of their ontology.
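To make the grouping and recommendation steps concrete, the hypothetical Python sketch below assumes the RDF triples have already been fetched via SPARQL and reduced to (predicate, object) pairs per URI. It groups URIs whose descriptions overlap by Jaccard similarity (merged with union-find), then recommends the member with the most triples as the canonical URI and the most frequent `rdfs:label` as the entity name. The similarity measure, threshold, and ranking statistics are assumptions, not OntoURIResolver's actual rules.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

PO = Tuple[str, str]  # (predicate, object) pair describing a URI

def jaccard(a: Set[PO], b: Set[PO]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_uris(desc: Dict[str, Set[PO]], threshold: float = 0.3) -> List[List[str]]:
    parent = {u: u for u in desc}

    def find(u: str) -> str:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    # Merge every pair of URIs whose descriptions are similar enough.
    for a, b in combinations(desc, 2):
        if jaccard(desc[a], desc[b]) >= threshold:
            parent[find(a)] = find(b)
    groups: Dict[str, List[str]] = {}
    for u in desc:
        groups.setdefault(find(u), []).append(u)
    return list(groups.values())

def recommend(group: List[str], desc: Dict[str, Set[PO]]) -> Tuple[str, str]:
    # Canonical URI: the member described by the most triples.
    canonical = max(group, key=lambda u: (len(desc[u]), u))
    # Entity name: the most frequent rdfs:label across the group.
    labels = Counter(o for u in group for p, o in desc[u] if p == "rdfs:label")
    name = labels.most_common(1)[0][0] if labels else canonical
    return canonical, name
```

Union-find makes grouping transitive, so two URIs can end up together through a shared neighbor even if they are not directly similar; whether that is desirable depends on the data.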

International Conference on Knowledge and Smart Technology | 2017

Experimental study of time series-based dataset selection for effective text classification

Yeonghun Chae; Do-Heon Jeong; Taehong Kim

Conventional automatic document classification methods face challenges in terms of learning time and computing power, owing to the ever-increasing amount of data on the web. In this paper, we propose an efficient classification method that uses time series-based dataset selection. In the proposed method, the dataset is split according to time series, and the best set of testing documents is selected. The results of classification performance tests conducted with a Naïve Bayes classifier indicate that learning from a small amount of data divided by time series is more efficient than learning from the entire dataset.
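A minimal Python sketch of time series-based selection, under stated assumptions: documents carry a year, candidate training sets are the k most recent years, and the "best" set is the smallest k that maximizes accuracy on a held-out validation set. The classifier is a small multinomial Naïve Bayes with Laplace smoothing written from scratch so the example has no dependencies; the selection criterion and all names (`select_recent`, `best_window`) are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Doc = Tuple[int, str, List[str]]  # (year, label, tokens)

def select_recent(docs: List[Doc], periods: int) -> List[Doc]:
    """Keep only documents from the `periods` most recent years."""
    years = sorted({year for year, _, _ in docs})
    keep = set(years[-periods:])
    return [d for d in docs if d[0] in keep]

class MultinomialNB:
    def fit(self, docs: List[Doc]) -> "MultinomialNB":
        self.class_counts: Counter = Counter(label for _, label, _ in docs)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for _, label, tokens in docs:
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(docs)
        return self

    def predict(self, tokens: List[str]) -> str:
        best_label, best_score = "", -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / self.n_docs)  # log prior
            for w in tokens:
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                # Laplace-smoothed log likelihood of the word given the class.
                score += math.log((self.word_counts[label][w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def best_window(train: List[Doc], val: List[Doc], max_periods: int) -> int:
    """Smallest number of recent periods whose model scores best on `val`."""
    def accuracy(k: int) -> float:
        model = MultinomialNB().fit(select_recent(train, k))
        return sum(model.predict(tokens) == label for _, label, tokens in val) / len(val)
    # max() returns the first maximum, so ties favor the smaller training set.
    return max(range(1, max_periods + 1), key=accuracy)
```

Breaking ties toward fewer periods mirrors the abstract's point that a smaller, time-selected dataset can be as effective as the full one while costing less to train.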

Wireless Personal Communications | 2016

Entity Resolution Approach of Data Stream Management Systems

Taehong Kim; Mi-Nyeong Hwang; Young-Min Kim; Do-Heon Jeong

Owing to technological advancements in the Semantic Web and sensor networks, a large amount of data has been produced in association with open data policies. However, data stream management systems that process stream data have focused on processing large amounts of data, giving little priority to data identification, integration, and external linkage. Furthermore, entity resolution has focused mainly on static, database-based technologies. In this study, we design a real-time stream data processing architecture that can integrate heterogeneous streaming input data, perform entity resolution on it, and interlink it with external data. To achieve this goal, we apply a lightweight adapter that integrates heterogeneous data into a standard schema, and a blocking technique that reduces the number of comparison candidates. The implemented data adapters show four times higher throughput than open-source data parsers, and entity resolution on streaming data performs similarly to entity resolution on static data sets. The proposed streaming entity resolution architecture is expected to form the basis of data integration research that efficiently integrates various information sources and enriches internal data.
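Blocking keeps streaming entity resolution tractable: each incoming record is compared only with earlier records that share its blocking key, rather than with the entire history. The Python sketch below shows the idea; the blocking key (first three alphanumeric characters of a `name` field), the `difflib` similarity measure, and the 0.85 threshold are hypothetical choices, not those of the paper.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]

def blocking_key(rec: Record) -> str:
    # Hypothetical key: first three alphanumeric characters of the lowercased name.
    return "".join(ch for ch in rec["name"].lower() if ch.isalnum())[:3]

def similar(a: Record, b: Record, threshold: float = 0.85) -> bool:
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio() >= threshold

def resolve_stream(stream: Iterable[Record]) -> Iterator[Tuple[Record, str]]:
    """Assign each incoming record an entity id, comparing it only against
    records already seen in the same block."""
    blocks: Dict[str, List[Tuple[Record, str]]] = defaultdict(list)
    next_id = 0
    for rec in stream:
        bucket = blocks[blocking_key(rec)]
        match = next((eid for seen, eid in bucket if similar(rec, seen)), None)
        if match is None:
            match = f"e{next_id}"  # no match in the block: start a new entity
            next_id += 1
        bucket.append((rec, match))
        yield rec, match

if __name__ == "__main__":
    incoming = [{"name": "Seoul Station"}, {"name": "Seoul station"}, {"name": "Busan Station"}]
    for record, entity in resolve_stream(incoming):
        print(entity, record)
```

A poorly chosen key can split true matches into different blocks, so real systems often use several keys and accept some redundant comparisons.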

Software - Practice and Experience | 2015

Translation of technical terminologies between English and Korean based on textual big data

Taehong Kim; Myunggwon Hwang; Mi-Nyeong Hwang; Sa-Kwang Song; Do-Heon Jeong; Hanmin Jung

A number of web applications provide fully automated machine translation services, allowing users to easily translate information of interest. However, these services still generate inaccurate results when translating technical terminology. We therefore propose a new method that collects reliable pairs of English–Korean technical terms and translates a given English term into Korean. To collect the pairs, we use textual big data, such as Korean academic papers, and develop a new statistical model to determine appropriate characteristics. Our method is evaluated in terms of the reliability of the English–Korean pairs and the precision of translation. The results confirm that our method can produce highly reliable data and can improve the translation quality of technical terminology.
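Once term pairs have been collected, a given English term can be translated by lookup. The hypothetical Python sketch below uses greedy longest-match over word spans, so a multiword entry such as "support vector machine" takes precedence over its individual words, and unmatched words are left in English. The lexicon contents and the matching strategy are illustrative assumptions; the abstract does not say how the paper composes translations.

```python
from typing import Dict, List

def translate_term(term: str, lexicon: Dict[str, str]) -> str:
    """Greedy longest-match translation of an English term using a lexicon
    of collected English->Korean pairs (keys are lowercase)."""
    words = term.lower().split()
    out: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest span starting at word i first.
        for j in range(len(words), i, -1):
            span = " ".join(words[i:j])
            if span in lexicon:
                out.append(lexicon[span])
                i = j
                break
        else:
            out.append(words[i])  # no entry covers this word: keep it in English
            i += 1
    return " ".join(out)

if __name__ == "__main__":
    lexicon = {"support vector machine": "서포트 벡터 머신", "kernel": "커널"}
    print(translate_term("Kernel Support Vector Machine", lexicon))
```

Joining translated spans in English order works for simple modifier-head compounds, but not for phrases whose word order differs between the two languages, such as those built with prepositions.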

Archive | 2016