Jongik Kim | Researchain

Publication

Featured research published by Jongik Kim.

BMC Bioinformatics | 2014

Improving read mapping using additional prefix grams

Jongik Kim; Chen Li; Xiaohui Xie

Background: Next-generation sequencing (NGS) enables rapid production of billions of bases at a relatively low cost. Mapping reads from next-generation sequencers to a given reference genome is an important first step in many sequencing applications. Popular read mappers, such as Bowtie and BWA, are optimized to return the top one or a few candidate locations of each read. However, identifying all mapping locations of each read, instead of just one or a few, is also important in some sequencing applications such as ChIP-seq for discovering binding sites in repeat regions, and RNA-seq for transcript abundance estimation.

Results: Here we present Hobbes2, a software package designed for fast and accurate alignment of NGS reads and specialized in identifying all mapping locations of each read. Hobbes2 efficiently identifies all mapping locations of reads using a novel technique that utilizes additional prefix q-grams to improve filtering. We extensively compare Hobbes2 with state-of-the-art read mappers, and show that Hobbes2 can be an order of magnitude faster than other read mappers while consuming less memory space and achieving similar accuracy.

Conclusions: We propose Hobbes2 to improve the accuracy of read mapping, specialized in identifying all mapping locations of each read. Hobbes2 is implemented in C++, and the source code is freely available for download at http://hobbes.ics.uci.edu.
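As a rough illustration of how q-gram style filtering prunes candidate locations, the sketch below uses the pigeonhole principle under Hamming distance: a read with at most k mismatches, cut into k + 1 disjoint pieces, must contain at least one piece that matches the reference exactly, and cutting it into one additional piece guarantees at least two exact matches. This is a toy under stated assumptions, not Hobbes2's implementation: the abstract does not spell out its filter, and the function name `candidate_positions`, the `extra` parameter, and the use of `str.find` in place of a q-gram index are hypothetical choices made for illustration.

```python
from collections import Counter


def candidate_positions(reference, read, k, extra=0):
    """Candidate start positions of `read` in `reference` with <= k mismatches.

    Hypothetical helper (Hamming distance only). The read is cut into
    k + 1 + extra disjoint pieces; by the pigeonhole principle, any true
    mapping location matches at least 1 + extra of them exactly.
    """
    pieces = k + 1 + extra
    if len(read) < pieces:
        raise ValueError("read is too short for the requested number of pieces")
    bounds = [i * len(read) // pieces for i in range(pieces + 1)]
    votes = Counter()
    for start, end in zip(bounds, bounds[1:]):
        piece = read[start:end]
        hit = reference.find(piece)
        while hit != -1:
            votes[hit - start] += 1  # implied start position of the whole read
            hit = reference.find(piece, hit + 1)
    last = len(reference) - len(read)
    return sorted(p for p, v in votes.items() if v >= 1 + extra and 0 <= p <= last)


def hamming(a, b):
    """Number of mismatching characters between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


reference = "AAAACCCCGGGGTTTT"
read = "CCCAGGGG"  # reference[4:12] with one substitution
for extra in (0, 1):
    cands = candidate_positions(reference, read, k=1, extra=extra)
    # Verification step: keep only candidates within the mismatch budget.
    hits = [p for p in cands if hamming(reference[p:p + len(read)], read) <= 1]
    print(extra, cands, hits)
```

Requiring more exact matches per candidate tends to shrink the candidate set that must be verified, at the price of looking up more pieces; the filter never discards a true location as long as the pigeonhole bound holds.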

Engineering Applications of Artificial Intelligence | 2012

Hierarchical querying scheme of human motions for smart home environment

Yoon Sik Tak; Jongik Kim; Eenjun Hwang

With the recent development of ubiquitous technologies, many new applications have been emerging for smart home implementation. Usually, such applications are based on diverse sensors. One fundamental operation in these applications is to find semantically meaningful events or activities in a huge sensor data stream. Usually, such an event or activity is represented by a salient sequence pattern. Among the diverse research issues, detecting salient sequence patterns of human motions from image sensor data streams has received much attention for security and surveillance purposes. When detecting human motions from image sensor data, finding and matching their salient sequence patterns can become more complicated, since semantically identical motions can show diverse variations such as different motion times. Based on this observation, in this paper, we propose a new querying and answering scheme for continuous sensor data streams to detect abnormal human motions. More specifically, we first present a new hierarchical querying scheme that accounts for the variable length of semantically identical human motions. Secondly, we present an indexing scheme to efficiently find semantically meaningful motion sequences in the sensor data stream. Thirdly, we present the Dynamic Group Warping algorithm to effectively filter out unnecessary human motions. Through extensive experiments, we show that our proposed method achieves outstanding performance.

Information Sciences | 2006

Advanced structural joins using element distribution

Jongik Kim

To accelerate a structural join operation, current techniques focus on skipping elements that do not contribute to the results. They make use of external index structures (e.g. B+ tree) to determine a group of elements to be skipped. However, external indexes are too heavy for a structural join, and the overhead of index lookups can reduce the benefit of skipping. In this paper, we propose element trees and distribution encoded bitmaps for efficient element skipping. With the proposed techniques, we can exploit the distribution of elements as well as the context information of a query for efficient skipping of unnecessary elements.

International Conference on Data Engineering | 2012

Efficient Exact Similarity Searches Using Multiple Token Orderings

Jongik Kim; Hongrae Lee

Similarity searches are essential in many applications including data cleaning and near duplicate detection. Many similarity search algorithms first generate candidate records, and then identify true matches among them. A major focus of those algorithms has been on how to reduce the number of candidate records in the early stage of similarity query processing. One of the most commonly used techniques to reduce the candidate size is the prefix filtering principle, which exploits the document frequency ordering of tokens. In this paper, we propose a novel partitioning technique that considers multiple token orderings based on token co-occurrence statistics. Experimental results show that the proposed technique is effective in reducing the number of candidate records and as a result improves the performance of existing algorithms significantly.
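The prefix filtering principle mentioned above can be sketched as follows for Jaccard similarity. This is only the single-ordering baseline, not the multiple-ordering partitioning technique the paper proposes; the function names, the tie-breaking rule, and the small epsilon guarding against floating-point rounding are assumptions made for illustration. Tokens are sorted by ascending document frequency, and two records whose Jaccard similarity reaches the threshold t must share at least one token among the first |x| - ceil(t * |x|) + 1 tokens of each.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity of two token collections (treated as sets)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def prefix_filter_search(records, query, threshold):
    """Return (candidate ids, verified matching ids) for a Jaccard threshold."""
    # Global ordering: rare tokens first, ties broken by the token itself.
    df = Counter(tok for rec in records for tok in set(rec))

    def prefix(tokens):
        toks = sorted(set(tokens), key=lambda t: (df.get(t, 0), t))
        keep = len(toks) - math.ceil(threshold * len(toks) - 1e-9) + 1
        return toks[:keep]

    # Inverted index over the prefix tokens of every record.
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        for tok in prefix(rec):
            index[tok].add(rid)

    # Candidate generation: records sharing a prefix token with the query.
    candidates = set()
    for tok in prefix(query):
        candidates |= index.get(tok, set())

    # Verification: compute the real similarity only for candidates.
    matches = sorted(r for r in candidates if jaccard(records[r], query) >= threshold)
    return candidates, matches


records = [["a", "b", "c", "d"], ["a", "b", "c", "e"], ["x", "y", "z", "w"]]
candidates, matches = prefix_filter_search(records, ["a", "b", "c", "d"], 0.5)
print(sorted(candidates), matches)
```

Putting rare tokens first keeps the inverted lists touched by the prefix short, which is why the choice of token ordering directly affects the number of candidates.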

Data and Knowledge Engineering | 2004

A partition index for XML and semi-structured data

Jongik Kim; Hyoung-Joo Kim

XML and other semi-structured data can be represented by a graph model. The paths in a data graph are used as a basic constructor of a query. In particular, by using patterns on paths, a user can formulate more expressive queries. Patterns in a path enlarge the search space of a data graph, and current research on indexing semi-structured data focuses on reducing the search space. However, the existing indexes cannot reduce the search space when a data graph has some references. In this paper, we introduce a partitioning technique for all paths in a data graph and an index graph which can effectively find appropriate path partitions for a path query with patterns.

Information & Software Technology | 2003

Efficient processing of regular path joins using PID

Jongik Kim; Hyoung-Joo Kim

XML is data that has no fixed structure, so it is hard to design a schema for storing and querying XML data. Instead of a fixed schema, graph-based data models are widely adopted for querying XML. Queries on XML are based on paths in a data graph. A meaningful query usually has several paths in it, but much recent research is more concerned with optimizing a single path in a query. In this paper, we present an efficient technique for processing multiple path expressions in a query. We implemented our technique and present preliminary performance results.

International Conference on Advanced Communication Technology | 2007

Service Bundle Providing System in Open Telematics Environment

Chul-Su Kim; Jongik Kim; Daesub Yoon; Myungjin Lee; Hyunsuk Kim; Kwang-Seok Kwon

As telematics technology has rapidly developed, many kinds of telematics service applications and terminals have been provided for cars. Telematics service users, however, cannot use various services, because service providers deliver services through their own terminals, in which the service applications are built, and do not provide methods to download and install new applications. This means that to use many kinds of services, users have to buy many terminals. To solve this problem, we propose a telematics service bundle providing system that enables service providers to register their service applications and enables service users to search, download and install service applications.

Information Processing Letters | 2004

Efficient structural joins with clustered extents

Jongik Kim; Sang-Ho Lee; Hyoung-Joo Kim

In order to retrieve an extent in the first step, an inverted index built on selection predicates is used. For each selection predicate, an extent can be retrieved by looking up the inverted index. Finding occurrences of the structural matches in the second step is a core operation in XML query processing. To solve this sub-problem, Zhang et al. [6] proposed the multi-predicate merge join (MPMGJN) algorithm, which is an extension of the traditional merge join algorithm. Al-Khalifa et al. [1] generalized the MPMGJN algorithm to the tree-merge join algorithms that consider the order of join results. Furthermore, they proposed the stack-tree join algorithms that improve the tree-merge join algorithms. Those algorithms are dependent on the representation of positions of XML elements (and string values) to determine structural relationships between tree nodes. Recently, Chien et al. [3] proposed a structural join alg

International Conference on Data Engineering | 2016

Hobbes3: Dynamic generation of variable-length signatures for efficient approximate subsequence mappings

Jongik Kim; Chen Li; Xiaohui Xie

Recent advances in DNA sequencing have enabled a flood of sequencing-based applications for studying biology and medicine. A key requirement of these applications is to rapidly and accurately map DNA subsequences to a reference genome. This DNA subsequence mapping problem shares core technical challenges with the similarity query processing problem studied in the database research literature. To solve this problem, existing techniques first extract signatures from a query, then retrieve candidate mapping positions from an index using the extracted signatures, and finally verify the candidate positions. The efficiency of these techniques depends critically on signatures selected from queries, while signature selection relies on an indexing scheme of a reference genome. The q-gram inverted indexing, one of the most widely used indexing schemes, can discover candidate positions quickly, but has the limitation that signatures of queries are restricted to fixed-length q-grams. To address the problem, we propose a flexible way to generate variable-length signatures using a fixed-length q-gram index. The proposed technique groups a few q-grams into a variable-length signature, and generates candidate positions for the variable-length signature using the inverted lists of the q-grams. We also propose a novel dynamic programming algorithm to balance between the filtering power of signatures and the overhead of generating candidate positions for the signatures. Through extensive experiments on both simulated and real genomic data, we show that our technique substantially improves the performance of read mapping in terms of both mapping speed and accuracy.
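To make the idea of serving a longer signature from a fixed-length q-gram index concrete, here is a minimal sketch. It handles only exact occurrences of a signature and does not attempt the paper's dynamic programming for choosing signatures; the function names `build_qgram_index` and `signature_positions` and the way the signature is covered with q-grams are assumptions for illustration. Each q-gram's inverted list is shifted by the q-gram's offset inside the signature, and the shifted lists are intersected.

```python
from collections import defaultdict


def build_qgram_index(reference, q):
    """Inverted index: q-gram -> list of start positions in the reference."""
    index = defaultdict(list)
    for i in range(len(reference) - q + 1):
        index[reference[i:i + q]].append(i)
    return index


def signature_positions(index, signature, q):
    """Positions where a signature of length >= q occurs, using only q-gram lists."""
    if len(signature) < q:
        raise ValueError("signature must be at least q characters long")
    # Cover the signature with q-grams at offsets 0, q, 2q, ... plus a final
    # (possibly overlapping) q-gram that ends exactly at the signature's end.
    offsets = list(range(0, len(signature) - q + 1, q))
    if offsets[-1] != len(signature) - q:
        offsets.append(len(signature) - q)
    result = None
    for off in offsets:
        gram = signature[off:off + q]
        # Shift each occurrence back to the implied start of the signature.
        shifted = {p - off for p in index.get(gram, [])}
        result = shifted if result is None else result & shifted
        if not result:
            break
    return sorted(result)


reference = "ACGTACGTTACG"
index = build_qgram_index(reference, q=3)
print(signature_positions(index, "ACG", q=3))    # a plain 3-gram
print(signature_positions(index, "ACGTT", q=3))  # a longer, more selective signature
```

A longer signature is more selective than any single q-gram it contains, but each additional inverted list costs an intersection, which is the kind of trade-off the abstract says the dynamic programming algorithm balances.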

Information Systems | 2015

An effective candidate generation method for improving performance of edit similarity query processing

Jongik Kim

In this paper, we study edit similarity query processing to find strings similar to a query string from a collection of strings. To solve the problem, many algorithms have been proposed under a filter-and-verification framework, where candidate strings are generated and refined using a few filters and then verified to find true matches. A major focus of those algorithms has been on generating as few candidates as possible in an early stage of the query processing. A typical approach to generating candidates is to extract some signatures from a query and take the union of the string ids in the inverted lists of the extracted signatures. However, the number of candidates generated by existing techniques is far larger than the number of answer strings, and the costs of refinement and verification are high. To address the problem, we propose an intersection-based candidate generation scheme, which generates a substantially smaller number of candidates. Given some signatures of a query, the proposed scheme first categorizes the signatures into several groups. Then, it takes the intersection of the string ids in the inverted lists of the signatures in each group. Finally, it takes the union of the intersections to generate candidates. To minimize the number of candidates under our scheme, we propose a novel algorithm which judiciously selects an optimal signature group. We show through experiments that our technique is very effective in reducing the number of candidates and significantly improves the performance.

Highlights: We develop an intersection based candidate generation scheme. We prove that candidates can be generated using a DNF of inverted lists. We propose a dynamic programming algorithm to select an optimal generation plan. We experimentally show that the proposed technique outperforms existing techniques.
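The difference between union-based candidate generation and the union-of-intersections (DNF) form described in this abstract can be shown with plain set operations. The sketch below is only that set algebra: the toy inverted index, the signature names, and the hand-picked grouping are hypothetical, and it does not reproduce the paper's conditions for when a grouping is safe or its algorithm for selecting one.

```python
def union_candidates(index, signatures):
    """Typical scheme: union of the inverted lists of all signatures."""
    out = set()
    for sig in signatures:
        out |= index.get(sig, set())
    return out


def dnf_candidates(index, groups):
    """Intersection-based scheme: intersect within each group, then union."""
    out = set()
    for group in groups:
        lists = [index.get(sig, set()) for sig in group]
        if lists:
            out |= set.intersection(*lists)
    return out


# Toy inverted index: signature -> ids of the strings containing it.
index = {
    "ab": {1, 2, 3},
    "bc": {2, 3, 4},
    "cd": {3, 5},
    "de": {5, 6},
}
signatures = ["ab", "bc", "cd", "de"]
groups = [("ab", "bc"), ("cd", "de")]  # hypothetical grouping, chosen by hand

print(sorted(union_candidates(index, signatures)))
print(sorted(dnf_candidates(index, groups)))
```

Because every intersection is a subset of each list in its group, the DNF result is always a subset of the plain union over the same signatures; whether it still contains every true answer depends on how the groups are formed, which is the part addressed by the paper.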
