Heejin Park | Researchain

Publication

Featured research published by Heejin Park.

Combinatorial Pattern Matching | 2003

Linear-time construction of suffix arrays

Dong Kyue Kim; Jeong Seop Sim; Heejin Park; Kunsoo Park

The time complexity of suffix tree construction has been shown to be equivalent to that of sorting: O(n) for a constant-size alphabet or an integer alphabet, and O(n log n) for a general alphabet. However, previous algorithms for constructing suffix arrays run in O(n log n) time even for a constant-size alphabet. In this paper we present a linear-time algorithm to construct suffix arrays for integer alphabets that does not use suffix trees as intermediate data structures. Since the case of a constant-size alphabet is subsumed by that of an integer alphabet, our result implies that the time complexity of directly constructing suffix arrays matches that of constructing suffix trees.
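
The abstract does not describe the algorithm itself. The sketch below is a hedged illustration of direct, suffix-tree-free linear-time construction for an integer alphabet. It implements the difference-cover (DC3, or "skew") algorithm of Kärkkäinen and Sanders, a different linear-time method from the same period, and it is not the algorithm of this paper. It assumes the input is a list of integers in 1..K, with 0 reserved for padding. The names `dc3`, `_radix_pass`, `_leq2` and `_leq3` are our own.

```python
def _radix_pass(idx, s, offset, K):
    """Stable counting sort of the positions in idx by the key s[i + offset] (0..K)."""
    buckets = [[] for _ in range(K + 1)]
    for i in idx:
        buckets[s[i + offset]].append(i)
    return [i for bucket in buckets for i in bucket]


def _leq2(a1, a2, b1, b2):
    # Lexicographic (a1, a2) <= (b1, b2)
    return a1 < b1 or (a1 == b1 and a2 <= b2)


def _leq3(a1, a2, a3, b1, b2, b3):
    # Lexicographic (a1, a2, a3) <= (b1, b2, b3)
    return a1 < b1 or (a1 == b1 and _leq2(a2, a3, b2, b3))


def dc3(s, K):
    """Suffix array of s, a list of ints in 1..K (DC3 / skew algorithm)."""
    n = len(s)
    if n <= 1:
        return list(range(n))
    s = list(s) + [0, 0, 0]                  # sentinels smaller than every symbol
    n0, n1, n2 = (n + 2) // 3, (n + 1) // 3, n // 3
    n02 = n0 + n2
    # Sample positions i % 3 != 0; a dummy mod-1 position is added when n % 3 == 1.
    sa12 = [i for i in range(n + n0 - n1) if i % 3 != 0]
    for offset in (2, 1, 0):                 # LSD radix sort of character triples
        sa12 = _radix_pass(sa12, s, offset, K)
    # Name the triples; names[] holds mod-1 positions first, then mod-2 positions.
    names = [0] * (n02 + 3)
    name, prev = 0, None
    for i in sa12:
        triple = (s[i], s[i + 1], s[i + 2])
        if triple != prev:
            name, prev = name + 1, triple
        names[i // 3 if i % 3 == 1 else i // 3 + n0] = name
    if name < n02:
        # Names are not unique yet: recurse on the reduced string of names.
        sa12 = dc3(names[:n02], name)
        for rank, i in enumerate(sa12):
            names[i] = rank + 1
    else:
        sa12 = [0] * n02
        for i in range(n02):
            sa12[names[i] - 1] = i
    # Sort mod-0 suffixes by (first symbol, rank of the following mod-1 suffix).
    sa0 = _radix_pass([3 * i for i in sa12 if i < n0], s, 0, K)

    def pos12(t):
        # Map an index of the reduced string back to a position in s.
        i = sa12[t]
        return 3 * i + 1 if i < n0 else 3 * (i - n0) + 2

    # Merge both sorted groups; the dummy suffix (if any) sorts first and is skipped.
    sa, p, t = [], 0, n0 - n1
    while p < n0 and t < n02:
        i, j = pos12(t), sa0[p]
        if sa12[t] < n0:
            smaller = _leq2(s[i], names[sa12[t] + n0], s[j], names[j // 3])
        else:
            smaller = _leq3(s[i], s[i + 1], names[sa12[t] - n0 + 1],
                            s[j], s[j + 1], names[j // 3 + n0])
        if smaller:
            sa.append(i)
            t += 1
        else:
            sa.append(j)
            p += 1
    sa.extend(sa0[p:])
    sa.extend(pos12(u) for u in range(t, n02))
    return sa
```

For example, with a→1, b→2, n→3, `dc3([2, 1, 3, 1, 3, 1], 3)` returns `[5, 3, 1, 0, 4, 2]`, which is the suffix array of "banana". The recursion works on a string two-thirds the original length, and every other step is a counting sort or a linear merge. That structure is what makes the total running time linear when K is O(n).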

Journal of Discrete Algorithms | 2005

Constructing suffix arrays in linear time

Dong Kyue Kim; Jeong Seop Sim; Heejin Park; Kunsoo Park

The time complexity of suffix tree construction has been shown to be equivalent to that of sorting: O(n) for a constant-size alphabet or an integer alphabet, and O(n log n) for a general alphabet. However, previous algorithms for constructing suffix arrays run in O(n log n) time even for a constant-size alphabet. In this paper we present a linear-time algorithm to construct suffix arrays for integer alphabets that does not use suffix trees as intermediate data structures. Since the case of a constant-size alphabet is subsumed by that of an integer alphabet, our result implies that the time complexity of directly constructing suffix arrays matches that of constructing suffix trees.

Nucleic Acids Research | 2006

MODi: a powerful and convenient web server for identifying multiple post-translational peptide modifications from tandem mass spectra

Sangtae Kim; Seungjin Na; Ji Woong Sim; Heejin Park; Jaeho Jeong; Hokeun Kim; Younghwan Seo; Jawon Seo; Kong-Joo Lee; Eunok Paek

MODi is a powerful and convenient web service that facilitates the interpretation of tandem mass spectra for identifying post-translational modifications (PTMs) in a peptide. It is powerful in that it can interpret a tandem mass spectrum even when hundreds of modification types are considered and the number of potential PTMs in a peptide is large. This contrasts with most currently available methods for spectrum interpretation, which limit the number of PTM sites and types used in PTM analysis. For example, using MODi, one can simultaneously consider both the entire PTM list published on the Unimod webpage and user-defined PTMs, and one can also identify multiple PTM sites in a spectrum. MODi is convenient in that it accepts various input file formats, such as .mzXML, .dta, .pkl and .mgf files. It is also equipped with a graphical tool called MassPective, which displays MODi's output in a user-friendly manner and helps users understand it quickly. In addition, one can perform manual de novo sequencing using MassPective.
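
Considering many modification types and sites at once is demanding even at the level of the precursor mass. The difference between the observed peptide mass and the unmodified sequence mass must be explained by some multiset of modification masses. The sketch below enumerates such multisets from a small, hypothetical selection of modifications, using their monoisotopic mass shifts as listed in Unimod. It only illustrates the combinatorics and is not MODi's search procedure. The table `PTM_SHIFTS`, the function `explain_mass_shift` and the 0.01 Da tolerance are assumptions.

```python
from itertools import combinations_with_replacement

# Hypothetical subset of modifications: monoisotopic mass shifts (Da).
PTM_SHIFTS = {
    "Acetyl": 42.010565,
    "Deamidated": 0.984016,
    "Methyl": 14.015650,
    "Oxidation": 15.994915,
    "Phospho": 79.966331,
}


def explain_mass_shift(delta, max_mods=3, tol=0.01):
    """Every multiset of up to max_mods modifications (names in sorted order)
    whose summed mass shift lies within tol Da of delta."""
    names = sorted(PTM_SHIFTS)
    hits = []
    for k in range(1, max_mods + 1):
        for combo in combinations_with_replacement(names, k):
            if abs(sum(PTM_SHIFTS[m] for m in combo) - delta) <= tol:
                hits.append(combo)
    return hits
```

For example, `explain_mass_shift(79.966331 + 15.994915)` returns `[("Oxidation", "Phospho")]`. The number of candidate multisets grows quickly with the number of modification types and with `max_mods`, which is one reason many tools restrict both.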

Molecular & Cellular Proteomics | 2008

Unrestrictive Identification of Multiple Post-translational Modifications from Tandem Mass Spectrometry Using an Error-tolerant Algorithm Based on an Extended Sequence Tag Approach

Seungjin Na; Jaeho Jeong; Heejin Park; Kong-Joo Lee; Eunok Paek

Identification of post-translational modifications (PTMs) is important for understanding the biological functions of proteins. MS/MS is a useful tool for identifying PTMs. Most existing search tools are restricted to taking only a few types of PTMs as input. Here we describe a new algorithm, called MODi (pronounced “mod eye”), that rapidly searches for all known types of PTMs at once without limiting the number of modified sites in a peptide. MODi introduces the notion of a tag chain, a combined structure made from multiple sequence tags, which effectively localizes modified regions within a spectrum and overcomes the de novo sequencing errors common in tag-based approaches. MODi demonstrated its performance by identifying various types of PTMs in the analysis of PTM-rich proteins such as glyceraldehyde-3-phosphate dehydrogenase and lens protein. We demonstrated that MODi innovatively manages the computational complexity of identifying multiple PTMs in a peptide, which may occur in greater variety than usually expected. In addition, our results suggest that MODi has great potential to discover novel modifications.
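
Tag-based approaches start from short sequence tags read off an MS/MS spectrum. The sketch below shows only that generic first step. It assumes a peak list of fragment masses and a fixed tolerance, and it treats a tag as a run of consecutive peaks whose mass differences each match an amino acid residue mass. MODi's tag chains, error tolerance and scoring are not reproduced. The function names and the 0.02 Da tolerance are illustrative.

```python
# Monoisotopic amino acid residue masses (Da). Leucine and isoleucine are
# isobaric, so both are reported as "L".
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_for(delta, tol):
    """Closest residue whose mass is within tol Da of delta, or None."""
    hits = [(abs(delta - m), aa) for aa, m in RESIDUE_MASS.items()
            if abs(delta - m) <= tol]
    return min(hits)[1] if hits else None


def extract_tags(peaks, length=3, tol=0.02):
    """Return (tag, start_mass) for every path of `length` residue-sized
    steps through the sorted peak masses."""
    peaks = sorted(peaks)
    # edges[i]: (j, residue) pairs for later peaks one residue mass away from peak i
    edges = [[] for _ in peaks]
    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            aa = residue_for(peaks[j] - peaks[i], tol)
            if aa is not None:
                edges[i].append((j, aa))
    tags = []

    def walk(i, start, seq):
        # Depth-first extension of a partial tag that currently ends at peak i.
        if len(seq) == length:
            tags.append((seq, peaks[start]))
            return
        for j, aa in edges[i]:
            walk(j, start, seq + aa)

    for start in range(len(peaks)):
        walk(start, start, "")
    return tags
```

Consider peaks at 100.0 plus the cumulative masses of G, A and S, together with an unrelated peak at 250.0. The only three-residue tag is then GAS. The combined mass G + A (128.05857 Da) is almost identical to Q (128.05858 Da), so a two-peak jump can also be read as Q. This is one source of the de novo ambiguity that error-tolerant approaches must handle.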

Analytical Chemistry | 2008

Isotopic Peak Intensity Ratio Based Algorithm for Determination of Isotopic Clusters and Monoisotopic Masses of Polypeptides from High-Resolution Mass Spectrometric Data

Kunsoo Park; Joo Young Yoon; Sunho Lee; Eunok Paek; Heejin Park; Hee-Jung Jung; Sang-Won Lee

Determining isotopic clusters and their monoisotopic masses is a first step in interpreting complex mass spectra generated by high-resolution mass spectrometers. We propose a mathematical model for isotopic distributions of polypeptides and an effective interpretation algorithm. Our model uses two types of ratios: the intensity ratio of two adjacent peaks and the intensity ratio product of three adjacent peaks in an isotopic distribution. These ratios can be approximated as simple functions of polypeptide mass, and their values fall within certain ranges that depend on the polypeptide mass. Given a spectrum as a peak list, our algorithm first finds all isotopic clusters consisting of two or more peaks. It then scores the clusters using the ranges of the ratio functions and computes the monoisotopic masses of the identified clusters. Our method was applied to high-resolution mass spectra obtained from a Fourier transform ion cyclotron resonance (FTICR) mass spectrometer coupled to reverse-phase liquid chromatography (RPLC). For polypeptides whose amino acid sequences were identified by tandem mass spectrometry (MS/MS), we applied both THRASH-based software implementations and our method. When the total number of clusters identified by each method was fixed, our method found more masses of known peptides. Experimental results show that our method performed better for isotopic clusters of weak intensity, where the isotopic distributions deviate significantly from their theoretical distributions. It also correctly identified some isotopic clusters that were not found by THRASH-based implementations, especially those for which THRASH gave 1 Da mismatches. Another advantage of our method is its speed: it is much faster than THRASH, which calculates a least-squares fit.
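
The sketch below is a minimal version of the first and last steps described above. It assumes a centroided peak list of (m/z, intensity) pairs. It collects every run of two or more peaks spaced by about 1.00335/z in m/z, computes the two ratio features, and converts the first peak of a cluster to a neutral monoisotopic mass. The abstract does not give the mass-dependent ratio ranges used for scoring, so scoring is omitted. The three-peak feature is assumed here to be I[k+1]^2 / (I[k]·I[k+2]), the product of a forward and a backward adjacent-peak ratio, and may differ from the paper's definition. The function names and the 0.01 tolerance are hypothetical.

```python
ISOTOPE_SPACING = 1.00335   # approx. mass gap between adjacent isotopic peaks (Da)
PROTON_MASS = 1.007276      # mass of a proton (Da)


def find_clusters(peaks, charges=(1, 2, 3), tol=0.01):
    """peaks: (mz, intensity) pairs sorted by m/z. Returns (charge, indices)
    for every run of >= 2 peaks spaced by ISOTOPE_SPACING / charge."""
    clusters = []
    for z in charges:
        step = ISOTOPE_SPACING / z
        for start in range(len(peaks)):
            run = [start]
            while True:
                target = peaks[run[-1]][0] + step
                nxt = next((j for j in range(run[-1] + 1, len(peaks))
                            if abs(peaks[j][0] - target) <= tol), None)
                if nxt is None:
                    break
                run.append(nxt)
            if len(run) >= 2:
                clusters.append((z, run))
    return clusters


def ratio_features(intensities):
    """Adjacent-peak ratios I[k+1]/I[k] and the assumed three-peak
    products I[k+1]**2 / (I[k] * I[k+2])."""
    two = [b / a for a, b in zip(intensities, intensities[1:])]
    three = [b * b / (a * c)
             for a, b, c in zip(intensities, intensities[1:], intensities[2:])]
    return two, three


def monoisotopic_mass(mz_first, z):
    """Neutral monoisotopic mass, assuming the first cluster peak is monoisotopic
    and the ion carries z protons."""
    return z * (mz_first - PROTON_MASS)
```

The sketch also reports runs that start at a non-monoisotopic peak. It likewise reports runs that skip peaks of a higher charge state, such as a z = 1 run over every other peak of a z = 2 cluster. These are the kinds of competing candidates that ratio-based scoring is meant to rank.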

Lecture Notes in Computer Science | 2004

A Fast Algorithm for Constructing Suffix Arrays for Fixed-Size Alphabets

Dong Kyue Kim; Junha Jo; Heejin Park

The suffix array of a string T is basically a sorted list of all the suffixes of T. Suffix arrays have been fundamental index data structures in computational biology. To search for a DNA sequence in a genome sequence, we construct the suffix array of the genome sequence and then search for the DNA sequence in the suffix array. In this paper, we consider the construction of the suffix array of a string T of length n where the size of the alphabet is fixed. It is well known that one can construct the suffix array of T in O(n) time by constructing the suffix tree of T and traversing it. Although this approach takes O(n) time, it is not appropriate for practical use because it uses a lot of space and is complicated to implement. Recently, several algorithms that directly construct suffix arrays in O(n) time were developed almost simultaneously. However, these algorithms were developed for integer alphabets and thus do not exploit the properties that hold when the size of the alphabet is fixed. We present a fast algorithm for constructing suffix arrays for fixed-size alphabets. Our algorithm constructs suffix arrays faster than any other algorithm developed for integer or general alphabets when the size of the alphabet is fixed. For example, we reduced the time required for constructing suffix arrays for DNA sequences by 25%-38%. In addition, we do not sacrifice space to improve the running time: the space required by our algorithm is almost equal to, or even less than, that required by previous fast algorithms.
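
The search described above can be sketched as two binary searches over the suffix array, one for each end of the block of suffixes that begin with the pattern. The sketch illustrates suffix-array search only. The array is built by plain comparison sorting, which is quadratic or worse in time and memory and is fine only for toy inputs; it is not the paper's fast fixed-alphabet algorithm. The function names are hypothetical.

```python
def suffix_array(text):
    """Naive construction: sort suffix start positions by the suffixes themselves."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Start positions of pattern in text, via two binary searches over sa."""
    m = len(pattern)

    def prefix(k):
        # First m characters of the k-th smallest suffix
        return text[sa[k]:sa[k] + m]

    lo, hi = 0, len(sa)
    while lo < hi:                       # first suffix whose m-prefix >= pattern
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:                       # first suffix whose m-prefix > pattern
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])
```

For example, for the genome string "ACGTACGTAC", `find_occurrences(genome, suffix_array(genome), "ACG")` returns `[0, 4]`. Each search step costs at most O(m) character comparisons, so a query takes O(m log n) time once the array exists.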

Theoretical Computer Science | 2001

Parallel algorithms for red-black trees

Heejin Park; Kunsoo Park

We present parallel algorithms for the following four operations on red–black trees: construction, search, insertion, and deletion. Our parallel algorithm for constructing a red–black tree from a sorted list of n items runs in O(1) time with n processors on the CRCW PRAM and runs in O(log log n) time with n/log log n processors on the EREW PRAM. Our construction algorithm does not require the assumptions that previous construction algorithms used. Each of our parallel algorithms for search, insertion, and deletion in red–black trees runs in O(log n + log k) time with k processors on the EREW PRAM, where k is the number of unsorted items to search for, insert, or delete and n is the number of nodes in a red–black tree.
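
Constant-time construction from a sorted list is plausible because one layout rule lets every item find its own place independently. The rule below is our own illustration and not necessarily the paper's construction. Let h = ⌊log2(n + 1)⌋. The first 2^h − 1 internal positions form a perfect tree of black nodes, and the remaining r = n − (2^h − 1) items become red leaves on the level below, filled from the left. Each item's color, parent and children then follow from its sorted index by a constant number of arithmetic and bit operations, so n processors could each handle one item. Below, those processors are simulated by an ordinary loop, and all function names are hypothetical.

```python
def layout(n):
    """h full black levels, plus r red leaves on the level below."""
    h = (n + 1).bit_length() - 1          # floor(log2(n + 1))
    return h, n - (2 ** h - 1)


def slot(k, r):
    """1-based in-order slot of sorted item k in a perfect tree with 2**(h+1) - 1 slots."""
    return k + 1 if k < 2 * r else 2 * (k - r) + 2


def item(p, r):
    """Inverse of slot(): sorted index stored in slot p, or None if that slot is empty."""
    if p <= 2 * r:
        return p - 1
    return p // 2 - 1 + r if p % 2 == 0 else None


def node_info(k, n):
    """(color, parent, left, right) of item k, with links given as sorted indices."""
    h, r = layout(n)
    p = slot(k, r)
    b = p & -p                            # lowest set bit encodes the level
    color = "red" if b == 1 else "black"  # odd slots form the bottom (red) level
    if b == 2 ** h:
        parent = None                     # the root occupies slot 2**h
    else:
        parent = item(p - b if p & (2 * b) else p + b, r)
    if b == 1:
        left = right = None
    else:
        left, right = item(p - b // 2, r), item(p + b // 2, r)
    return color, parent, left, right


def build(keys):
    """keys must be sorted; entry k could be computed by its own processor."""
    n = len(keys)
    return [node_info(k, n) for k in range(n)]
```

The result is a valid red–black tree. Every path from the root to a missing child passes through exactly h black nodes, and each red node is a leaf whose parent is black.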

International Workshop on Combinatorial Algorithms | 2013

Suffix Tree of Alignment: An Efficient Index for Similar Data

Joong Chae Na; Heejin Park; Maxime Crochemore; Jan Holub; Costas S. Iliopoulos; Laurent Mouchard; Kunsoo Park

We consider an index data structure for similar strings. The generalized suffix tree is one possible solution. The generalized suffix tree of two strings A and B is a compacted trie representing all suffixes of A and B. It has |A| + |B| leaves and can be constructed in O(|A| + |B|) time. However, if the two strings are similar, the generalized suffix tree is not efficient because it does not exploit their similarity, which is usually represented as an alignment of A and B.
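
The leaf count follows from appending a distinct terminator to each string. With distinct terminators no suffix is a prefix of another, so each of the |A| + |B| suffixes ends at its own leaf. The naive sketch below builds the uncompacted trie and then merges unary paths into single edges. It takes quadratic time and space, unlike the linear-time construction mentioned above. The terminator symbols and function names are assumptions for illustration.

```python
def generalized_suffix_trie(strings):
    """Insert every suffix of every string, each ended by its own terminator."""
    root = {}
    for sid, s in enumerate(strings):
        symbols = list(s) + [("$", sid)]   # distinct terminator per string
        for i in range(len(s)):
            node = root
            for sym in symbols[i:]:
                node = node.setdefault(sym, {})
    return root


def compact(node):
    """Merge chains of single-child nodes into edges labelled by symbol tuples."""
    out = {}
    for sym, child in node.items():
        label = [sym]
        while len(child) == 1:
            (nxt, grandchild), = child.items()
            label.append(nxt)
            child = grandchild
        out[tuple(label)] = compact(child)
    return out


def count_leaves(node):
    # A leaf is a node with no outgoing edges.
    return 1 if not node else sum(count_leaves(c) for c in node.values())
```

For A = "ACGT" and B = "ACGA", both the trie and its compacted form have 8 leaves. Even for nearly identical strings, the count stays at |A| + |B|, roughly twice the size of the suffix tree of one string.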

String Processing and Information Retrieval | 2009

Consensus Optimizing Both Distance Sum and Radius

Amihood Amir; Gad M. Landau; Joong Chae Na; Heejin Park; Kunsoo Park; Jeong Seop Sim

The consensus string problem is finding a representative string (consensus) of a given set
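
The usual setting for this problem is a set of equal-length strings under Hamming distance, which we assume here. The two objectives in the title are then the distance sum Σ d(c, s_i) and the radius max d(c, s_i) of a candidate consensus c. The brute-force sketch below shows that the two objectives can conflict. It is exponential in the string length, is meant for illustration only, and is not the paper's algorithm. The function names are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions where equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def consensus_objectives(strings, alphabet):
    """Best distance sum, best radius, and the candidates achieving both at once."""
    length = len(strings[0])
    scored = []
    for chars in product(alphabet, repeat=length):
        c = "".join(chars)
        dists = [hamming(c, s) for s in strings]
        scored.append((sum(dists), max(dists), c))
    best_sum = min(t[0] for t in scored)
    best_radius = min(t[1] for t in scored)
    both = [c for total, radius, c in scored
            if total == best_sum and radius == best_radius]
    return best_sum, best_radius, both
```

For {AAA, AAA, BBB}, the majority string AAA minimizes the distance sum (3) but has radius 3. Strings such as AAB reach radius 2, but their distance sum is 4. No string is optimal for both, so `consensus_objectives(["AAA", "AAA", "BBB"], "AB")` returns `(3, 2, [])`.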

Proteomics | 2014

Compact variant‐rich customized sequence database and a fast and sensitive database search for efficient proteogenomic analyses

Heejin Park; J. Bae; Hyunwoo Kim; Sangok Kim; Hokeun Kim; Dong Gi Mun; Yoonsung Joh; Wonyeop Lee; Sehyun Chae; Sanghyuk Lee; Hark Kyun Kim; Daehee Hwang; Sang Won Lee; Eunok Paek
