Sunho Lee | Researchain

Publication

Featured researches published by Sunho Lee.

Analytical Chemistry | 2008

Isotopic Peak Intensity Ratio Based Algorithm for Determination of Isotopic Clusters and Monoisotopic Masses of Polypeptides from High-Resolution Mass Spectrometric Data

Kunsoo Park; Joo Young Yoon; Sunho Lee; Eunok Paek; Heejin Park; Hee-Jung Jung; Sang-Won Lee

Determining isotopic clusters and their monoisotopic masses is a first step in interpreting complex mass spectra generated by high-resolution mass spectrometers. We propose a mathematical model for isotopic distributions of polypeptides and an effective interpretation algorithm. Our model uses two types of ratios: intensity ratio of two adjacent peaks and intensity ratio product of three adjacent peaks in an isotopic distribution. These ratios can be approximated as simple functions of a polypeptide mass, the values of which fall within certain ranges, depending on the polypeptide mass. Given a spectrum as a peak list, our algorithm first finds all isotopic clusters consisting of two or more peaks. Then, it scores clusters using the ranges of ratio functions and computes the monoisotopic masses of the identified clusters. Our method was applied to high-resolution mass spectra obtained from a Fourier transform ion cyclotron resonance (FTICR) mass spectrometer coupled to reverse-phase liquid chromatography (RPLC). For polypeptides whose amino acid sequences were identified by tandem mass spectrometry (MS/MS), we applied both THRASH-based software implementations and our method. Our method was observed to find more masses of known peptides when the numbers of the total clusters identified by both methods were fixed. Experimental results show that our method performed better for isotopic mass clusters of weak intensity where the isotopic distributions deviate significantly from their theoretical distributions. Also, it correctly identified some isotopic clusters that were not found by THRASH-based implementations, especially those for which THRASH gave 1 Da mismatches. Another advantage of our method is that it is very fast, much faster than THRASH that calculates the least-squares fit.

Combinatorial Pattern Matching | 2007

Dynamic rank-select structures with applications to run-length encoded texts

Sunho Lee; Kunsoo Park

Given an n-length text over a σ-size alphabet, we propose a dynamic rank-select structure that supports O((1 + log σ/log log n) log n) time operations in n log σ + o(n log σ) bits space. If σ < log n, then the operation time is O(log n). In addition, we consider both static and dynamic rank-select structures on the run-length encoding (RLE) of a text. For an n′-length RLE of an n-length text, we present a static structure that gives O(1) time select and O(log log σ) time rank using n′ log σ + O(n) bits and a dynamic structure that provides O((1 + log σ/log log n) log n) time operations in n′ log σ + o(n′ log σ) + O(n) bits.
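To make the rank and select operations on a run-length encoded text concrete, here is a minimal static sketch. It is not the structure from the abstract: it uses plain Python lists and binary search, so queries take logarithmic time in the number of runs rather than the bounds stated above, and it supports no updates. The class name `NaiveRLERankSelect` and the conventions (0-based positions, `rank` counting occurrences in `text[0:i]`, `select` taking a 1-based occurrence index) are assumptions made for illustration.

```python
from bisect import bisect_left
from itertools import groupby


class NaiveRLERankSelect:
    """Illustrative rank/select over a run-length encoded text."""

    def __init__(self, text):
        # Per symbol: the start position of each of its runs, and the
        # cumulative number of occurrences before each run (total appended).
        self.run_starts = {}
        self.cum_counts = {}
        pos = 0
        for ch, group in groupby(text):
            length = len(list(group))
            self.run_starts.setdefault(ch, []).append(pos)
            cum = self.cum_counts.setdefault(ch, [0])
            cum.append(cum[-1] + length)
            pos += length
        self.n = pos

    def rank(self, ch, i):
        """Count occurrences of ch in text[0:i]."""
        starts = self.run_starts.get(ch, [])
        k = bisect_left(starts, i)  # number of ch-runs starting before i
        if k == 0:
            return 0
        cum = self.cum_counts[ch]
        run_len = cum[k] - cum[k - 1]
        # All earlier runs count fully; the last one may be cut off at i.
        return cum[k - 1] + min(run_len, i - starts[k - 1])

    def select(self, ch, k):
        """0-based position of the k-th occurrence of ch (k starts at 1)."""
        cum = self.cum_counts.get(ch, [0])
        if k < 1 or k > cum[-1]:
            raise ValueError("no such occurrence")
        j = bisect_left(cum, k) - 1  # run that contains the k-th occurrence
        return self.run_starts[ch][j] + (k - cum[j] - 1)
```

For the text "aaabbaacccb", whose runs are a3 b2 a2 c3 b1, `rank("a", 6)` counts the a's in "aaabba" and `select("b", 3)` locates the final b.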

String Processing and Information Retrieval | 2013

Suffix Array of Alignment: A Practical Index for Similar Data

Joong Chae Na; Heejin Park; Sunho Lee; Minsung Hong; Thierry Lecroq; Laurent Mouchard; Kunsoo Park

The suffix tree of alignment is an index data structure for similar strings. Given an alignment of similar strings, it stores all suffixes of the alignment, called alignment-suffixes. An alignment-suffix represents one suffix of a string or suffixes of multiple strings starting at the same position in the alignment. The suffix tree of alignment makes good use of similarity in strings theoretically. However, suffix trees are not widely used in biological applications because of their huge space requirements, and instead suffix arrays are used in practice. In this paper we propose a space-economical version of the suffix tree of alignment, named the suffix array of alignment (SAA). Given an alignment of similar strings, the SAA for that alignment is a lexicographically sorted list of all its alignment-suffixes. The SAA supports pattern search as efficiently as the generalized suffix array. Our experiments show that our index uses only 14% of the space used by the generalized suffix array to index 11 human genome sequences. The space efficiency of our index increases as the number of the genome sequences increases. We also present an efficient algorithm for constructing the SAA.
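For orientation, the baseline that the abstract compares against, a generalized suffix array with binary-search pattern lookup, can be sketched as follows. This is not the SAA: it stores every suffix of every string separately and so does not exploit similarity between the strings. The function names and the naive sort-based construction are assumptions chosen for brevity, not taken from the paper.

```python
def build_generalized_suffix_array(strings):
    """Return all (string_id, offset) pairs sorted by the suffix they denote.

    Naive construction by direct sorting; fine for small inputs only.
    """
    entries = [(sid, pos)
               for sid, s in enumerate(strings)
               for pos in range(len(s))]
    entries.sort(key=lambda e: strings[e[0]][e[1]:])
    return entries


def find_occurrences(strings, suffix_array, pattern):
    """Return sorted (string_id, offset) pairs where pattern occurs."""
    m = len(pattern)

    def prefix(entry):
        sid, pos = entry
        return strings[sid][pos:pos + m]

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(suffix_array[mid]) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(suffix_array[mid]) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])
```

Because the suffixes are sorted, all suffixes that begin with the pattern form one contiguous block, which is why two binary searches suffice.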

Journal of Proteome Research | 2010

Improved Quantitative Analysis of Mass Spectrometry using Quadratic Equations

Joo Young Yoon; Kyung Young Lim; Sunho Lee; Kunsoo Park; Eunok Paek; Un-Beom Kang; Jeonghun Yeom; Cheolju Lee

Protein quantification is one of the principal computational problems in mass spectrometry (MS) based proteomics. For robust and trustworthy protein quantification, accurate peptide quantification must come first. In recent years, stable isotope labeling has become the most popular method for relative quantification of peptides. However, some stable isotope labeling methods may carry a critical problem, which is an overlap of isotopic clusters. If the mass difference between the light- and heavy-labeled peptides is very small, the overlap of their isotopic clusters becomes larger as the mass of the original peptide increases. Here we propose a new algorithm for peptide quantification that separates overlapping isotopic clusters using quadratic equations. It can be easily applied in the Trans-Proteomic Pipeline (TPP) instead of XPRESS. For the mTRAQ-labeled peptides obtained by an Orbitrap mass spectrometer, it showed more accurate ratios and better standard deviations than XPRESS. In particular, for the peptides that do not contain lysine, the ratio difference between XPRESS and our algorithm became larger as the peptide masses increased. We expect that this algorithm can also be applied to other labeling methods such as ¹⁸O labeling and acrylamide labeling.

Journal of Computing Science and Engineering | 2009

Dynamic Compressed Representation of Texts with Rank/Select

Sunho Lee; Kunsoo Park

Given an n-length text T over a σ-size alphabet, we present a compressed representation of T which supports retrieving queries of rank/select/access and updating queries of insert/delete. For a measure of compression, we use the empirical entropy H(T), which defines a lower bound nH(T) bits for any algorithm to compress T of n log σ bits. Our representation takes this entropy bound of T, i.e., nH(T) ≤ n log σ bits, plus additional bits fewer than the text size, i.e., o(n log σ) + O(n) bits. In compressed space of nH(T) + o(n log σ) + O(n) bits, our representation supports O(log n) time queries for a log n-size alphabet and its extension provides O((1 + log σ/log log n) log n) time queries for a σ-size alphabet.
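The relation nH(T) ≤ n log σ can be checked numerically on small texts. The sketch below assumes that H(T) denotes the zero-order empirical entropy, which the abstract does not state explicitly, and takes σ as the number of distinct symbols actually occurring in the text; the function names are illustrative.

```python
from collections import Counter
from math import log2


def zero_order_entropy(text):
    """Zero-order empirical entropy in bits per symbol (assumed meaning of H(T))."""
    n = len(text)
    if n == 0:
        return 0.0
    counts = Counter(text)
    return sum((c / n) * log2(n / c) for c in counts.values())


def entropy_bound_bits(text):
    """n * H(T): the entropy bound for the whole text, in bits."""
    return len(text) * zero_order_entropy(text)


def plain_size_bits(text):
    """n * log2(sigma): size of the uncompressed text, in bits.

    sigma is taken as the number of distinct symbols present (an assumption).
    """
    sigma = len(set(text))
    return len(text) * log2(sigma) if sigma > 0 else 0.0
```

A text with a uniform symbol distribution, such as "aabb", meets the bound with equality, while a skewed text such as "aaab" falls strictly below it.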

Journal of Proteome Research | 2012

Monoisotopic Mass Determination Algorithm for Selenocysteine-Containing Polypeptides from Mass Spectrometric Data Based on Theoretical Modeling of Isotopic Peak Intensity Ratios

Jin Wook Kim; Sunho Lee; Kunsoo Park; Seungjin Na; Eunok Paek; Hyung Seo Park; Heejin Park; Kong-Joo Lee; Jaeho Jeong; Hwa-Young Kim

Selenoproteins, containing selenocysteine (Sec, U) as the 21st amino acid in the genetic code, are well conserved from bacteria to human, except yeast and higher plants that lack the Sec insertion machinery. Determination of Sec association is important to find substrates and to understand redox action of selenoproteins. While mass spectrometry (MS) has become a common and powerful tool to determine an amino acid sequence of a protein, identification of a protein sequence containing Sec was not easy using MS because of the limited stability of Sec in selenoproteins. Se has six naturally occurring isotopes, ⁷⁴Se, ⁷⁶Se, ⁷⁷Se, ⁷⁸Se, ⁸⁰Se, and ⁸²Se, and ⁸⁰Se is the most abundant isotope. These characteristics provide a good indicator for selenopeptides but make it difficult to detect selenopeptides using software analysis tools developed for common peptides. Thus, previous reports verified MS scans of selenopeptides by manual inspection. None of the fully automated algorithms have taken into account the isotopes of Se, leading to the wrong interpretation for selenopeptides. In this paper, we present an algorithm to determine monoisotopic masses of selenocysteine-containing polypeptides. Our algorithm is based on a theoretical model for an isotopic distribution of a selenopeptide, which regards peak intensities in an isotopic distribution as the natural abundances of C, H, N, O, S, and Se. Our algorithm uses two kinds of isotopic peak intensity ratios: one for two adjacent peaks and another for two distant peaks. Our algorithm for selenopeptides performed accurately on two LC-MS/MS data sets. Using this algorithm, we have successfully identified the Sec-Cys and Sec-Sec cross-linking of glutaredoxin 1 (GRX1) from mass spectra obtained by a UPLC-ESI-q-TOF instrument.

Database Systems for Advanced Applications | 2017

Optimizing Scalar User-Defined Functions in In-Memory Column-Store Database Systems

Cheol Ryu; Sunho Lee; Kihong Kim; Kunsoo Park; Yong Sik Kwon; Sang Kyun Cha; Changbin Song; Emanuel Ziegler; Stephan Muench

User-defined functions such as currency conversion and factory calendar are important ingredients in many business applications. Since currency conversion and factory calendar are expensive user-defined functions, optimizing these functions is essential to high performance business applications. We optimize scalar user-defined functions by caching function call results. In this paper we investigate which method for function result caching is best in the context of in-memory column-store database systems. Experiments show that our method, which implements a function result cache as an array, combined with SAP HANA in-memory column store provides the high performance required by real-time global business applications.
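One way to picture a function result cache implemented as an array is shown below. The sketch assumes a dictionary-encoded column, where each row stores a small integer value ID into a dictionary of distinct values, and indexes the cache by that ID; the abstract does not say how the array is indexed, so this layout is an assumption. The function `apply_udf_with_array_cache` is hypothetical and does not correspond to any SAP HANA API.

```python
def apply_udf_with_array_cache(value_ids, dictionary, udf):
    """Apply a scalar function to a dictionary-encoded column.

    value_ids  : one integer per row, indexing into `dictionary`
    dictionary : list of the column's distinct values
    udf        : an expensive scalar function of one column value

    The cache is a flat array with one slot per distinct value, so a
    lookup is a single index operation and the function runs at most
    once per distinct value.
    """
    cache = [None] * len(dictionary)
    filled = [False] * len(dictionary)  # separate flags: None may be a valid result
    results = []
    for vid in value_ids:
        if not filled[vid]:
            cache[vid] = udf(dictionary[vid])
            filled[vid] = True
        results.append(cache[vid])
    return results
```

With six rows drawn from three distinct values, the function is evaluated three times rather than six. The saving grows with the ratio of rows to distinct values.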

Very Large Data Bases | 2014

Interval disaggregate: a new operator for business planning

Sang Kyun Cha; Kunsoo Park; Changbin Song; Kihong Kim; Cheol Ryu; Sunho Lee

Business planning as well as analytics on top of large-scale database systems is valuable to decision makers, but planning operations known and implemented so far are very basic. In this paper we propose a new planning operation called interval disaggregate, which goes as follows. Suppose that the planner, typically the management of a company, plans sales revenues of its products in the current year. An interval of the expected revenue for each product in the current year is computed from historical data in the database as the prediction interval of linear regression on the data. A total target revenue for the current year is given by the planner. The goal of the interval disaggregate operation is to find an appropriate disaggregation of the target revenue, considering the intervals. We formulate the problem of interval disaggregation more precisely and give solutions for the problem. Multidimensional geometry plays a crucial role in the problem formulation and the solutions. We implemented interval disaggregation into the planning engine of SAP HANA and did experiments on real-world data. Our experiments show that interval disaggregation gives more appropriate solutions with respect to historical data than the known basic disaggregation called referential disaggregation. We also show that interval disaggregation can be combined with the deseasonalization technique when the dataset shows seasonal fluctuations.

Water Science and Technology | 1996