Mi-Jung Choi
Kangwon National University
Publication
Featured research published by Mi-Jung Choi.
Security and Communication Networks | 2017
Sang-Pil Kim; Myeong-Sun Gil; Hajin Kim; Mi-Jung Choi; Yang-Sae Moon; Hee-Sun Won
Secure similar document detection (SSDD) identifies similar documents held by two parties without either party disclosing its own sensitive documents to the other. In this paper, we propose an efficient 2-step protocol that exploits feature selection as the lower-dimensional transformation, and we present discriminative feature selection methods to maximize the performance of the protocol. For this, we first show that the existing 1-step protocol causes serious computation and communication overhead for high-dimensional document vectors. To alleviate the overhead, we next present the feature selection-based 2-step protocol and formally prove its correctness. The proposed 2-step protocol works as follows: (1) in the filtering step, it uses low-dimensional vectors obtained by the feature selection to filter out non-similar documents; (2) in the post-processing step, it identifies similar documents only from the non-filtered documents by using the 1-step protocol. As the feature selection, we first consider the simplest one, random projection (RP), and propose its 2-step solution SSDD-RP. We then present two discriminative feature selections and their solutions: SSDD-LF (local frequency), which selects a few dimensions locally frequent in the current querying vector, and SSDD-GF (global frequency), which selects ones globally frequent in the set of all document vectors. We finally propose a hybrid one, SSDD-HF (hybrid frequency), that takes advantage of both SSDD-LF and SSDD-GF. We empirically show that the proposed 2-step protocol outperforms the 1-step protocol by three or four orders of magnitude.
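The filter-then-verify structure of the 2-step protocol can be sketched in plaintext as follows. This is only an illustration under several assumptions that are not in the abstract: similarity is taken to be Euclidean distance within a threshold `eps`, the hypothetical helper `select_local_frequent` picks the largest components of the query as a stand-in for local frequency selection, and all of the secure (privacy-preserving) computation is omitted.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_local_frequent(query, d):
    """Hypothetical LF-style selection: indices of the d largest query components."""
    return sorted(range(len(query)), key=lambda i: query[i], reverse=True)[:d]


def project(vec, dims):
    """Keep only the selected dimensions of a vector."""
    return [vec[i] for i in dims]


def two_step_search(query, docs, eps, d):
    """Return (indices of similar documents, indices that survived filtering)."""
    dims = select_local_frequent(query, d)
    q_low = project(query, dims)
    # Step 1 (filtering): a distance over a subset of dimensions never exceeds
    # the full distance, so documents pruned here cannot be similar.
    candidates = [i for i, doc in enumerate(docs)
                  if euclidean(q_low, project(doc, dims)) <= eps]
    # Step 2 (post-processing): full-dimensional check on the survivors only.
    similar = [i for i in candidates if euclidean(query, docs[i]) <= eps]
    return similar, candidates


query = [5, 0, 3, 0, 1, 0]
docs = [[5, 0, 3, 0, 1, 0], [5, 0, 3, 4, 1, 0], [0, 6, 0, 0, 0, 2], [4, 0, 3, 0, 1, 1]]
similar, candidates = two_step_search(query, docs, eps=1.5, d=2)
print(similar, candidates)
```

In this toy run one document is discarded in the cheap low-dimensional step, and the expensive full-dimensional comparison is applied only to the remaining candidates.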
Multimedia Tools and Applications | 2014
Bum-Soo Kim; Yang-Sae Moon; Mi-Jung Choi; Jinho Kim
In this paper, we propose a time-series matching-based approach that provides interactive boundary image matching with noise control for a large-scale image database. To achieve the noise reduction effect in boundary image matching, we exploit the moving average transform of time-series matching. We are motivated by the simple intuition that the moving average transform might reduce the noise of boundary images as well as that of time-series data. To confirm this intuition, we first propose a new notion of k-order image matching, which applies the moving average transform to boundary image matching. A boundary image can be represented as a sequence in the time-series domain, and our k-order image matching identifies similar boundary images in this time-series domain by comparing the k-moving average transformed sequences. We then propose an index-based method that efficiently performs k-order image matching on a large image database, and formally prove its correctness. We also formally analyze the relationship between orders and their matching results and present an interactive approach to controlling the noise reduction effect. Experimental results show that our k-order image matching exploits the noise reduction effect well, and our index-based method outperforms the sequential scan by one or two orders of magnitude. These results indicate that our k-order image matching and its index-based solution provide a very practical way of realizing noise-controlled boundary image matching. To the best of our knowledge, the proposed interactive approach for large-scale image databases is the first attempt to solve the noise control problem in the time-series domain rather than the image domain by exploiting efficient time-series matching techniques. Thus, our approach can be widely used for removing other types of distortions in image matching.
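The idea of comparing k-moving average transformed sequences can be illustrated with a small sketch. It assumes a plain (non-circular) sliding window and Euclidean distance; the names `moving_average` and `k_order_distance` are hypothetical, and the index-based method is not shown.

```python
import math


def moving_average(seq, k):
    """k-moving average transform: mean of every window of k consecutive values."""
    if not 1 <= k <= len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    window = sum(seq[:k])
    out = [window / k]
    for i in range(k, len(seq)):
        window += seq[i] - seq[i - k]  # slide the window by one position
        out.append(window / k)
    return out


def k_order_distance(a, b, k):
    """Euclidean distance between the k-moving average transforms of a and b."""
    ma, mb = moving_average(a, k), moving_average(b, k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ma, mb)))


clean = [0, 0, 0, 0]
noisy = [1, -1, 1, -1]                    # the same shape with alternating noise
print(k_order_distance(clean, noisy, 1))  # order 1 leaves the sequences unchanged
print(k_order_distance(clean, noisy, 2))  # order 2 averages the noise away
```

A larger order k smooths more aggressively, which is why the order works as an adjustable noise-control setting.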
Security and Communication Networks | 2016
Hee-Sun Won; Sang-Pil Kim; Sanghun Lee; Mi-Jung Choi; Yang-Sae Moon
Privacy preservation has become an important issue in recent big data analysis, and many secure multiparty computations have been proposed for privacy preservation in distributed-node environments. In this paper, we propose S-PCA, a secure multiparty computation of principal component analysis (PCA), which computes PCA securely among the distributed nodes. PCA is widely used in many applications including time-series analysis, text mining, and image compression. In general, PCA is computed after concentrating all data in a single server, but this approach exposes the private data of each node. In contrast, the proposed S-PCA computes PCA without disclosing the sensitive data of individual nodes. In S-PCA, the nodes first share non-sensitive mean vectors and then compute covariance matrices and PCA securely using the shared mean vectors. We formally prove the correctness and security of S-PCA and apply it to secure similar document detection. Experimental results show that the performance of S-PCA is slightly worse than that of PCA because of the cost of guaranteeing security, but it significantly improves the performance of secure similar document detection by up to two orders of magnitude.
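The following plaintext sketch shows only the algebra behind sharing mean vectors first: once every node knows the global mean, the global covariance matrix is the sum of per-node scatter matrices divided by the total row count. It is not the S-PCA protocol itself. The secure aggregation step and the eigen-decomposition are omitted, the population (divide by N) covariance is an assumption, and all function names are hypothetical.

```python
def global_mean(counts, means):
    """Combine per-node mean vectors into the global mean, weighted by row counts."""
    total = sum(counts)
    dim = len(means[0])
    return [sum(n * m[j] for n, m in zip(counts, means)) / total for j in range(dim)]


def local_scatter(rows, mean):
    """Sum of outer products of (row - mean) over one node's rows."""
    dim = len(mean)
    scatter = [[0.0] * dim for _ in range(dim)]
    for row in rows:
        centered = [x - m for x, m in zip(row, mean)]
        for i in range(dim):
            for j in range(dim):
                scatter[i][j] += centered[i] * centered[j]
    return scatter


def distributed_covariance(nodes):
    """Population covariance assembled from per-node contributions."""
    counts = [len(rows) for rows in nodes]
    means = [[sum(col) / len(rows) for col in zip(*rows)] for rows in nodes]
    mu = global_mean(counts, means)  # only counts and means are shared here
    dim = len(mu)
    total = [[0.0] * dim for _ in range(dim)]
    for rows in nodes:               # in S-PCA this aggregation is done securely
        scatter = local_scatter(rows, mu)
        for i in range(dim):
            for j in range(dim):
                total[i][j] += scatter[i][j]
    n = sum(counts)
    return [[v / n for v in row] for row in total]


nodes = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
cov = distributed_covariance(nodes)
print(cov)
```

The result matches the covariance computed after pooling all rows on one server, which is the property that lets the raw data stay on each node.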
International Conference on Big Data and Smart Computing | 2015
Myeong-Seon Gil; Bum-Soo Kim; Mi-Jung Choi; Yang-Sae Moon
In this paper, we address the problem of how to construct a multidimensional index efficiently in distortion-free subsequence matching. In previous distortion-free subsequence matching, index construction is a very time-consuming process since it generates a huge number of data subsequences to consider all possible positions and all possible query lengths. Experimental results show that the index construction time reaches several hours for a time-series with a million entries, which means that index construction itself is very difficult for large time-series databases. To solve this problem, we first formally analyze the index construction steps, then try to optimize the performance of each step, and finally propose two advanced algorithms for constructing a multidimensional index very quickly. In particular, we present the novel concept of the store-and-reuse principle, a dynamic programming technique that stores the intermediate results and reuses them repeatedly in the next steps. Through the store-and-reuse principle, the proposed algorithms construct a multidimensional index much faster than the previous algorithm. Analytical and empirical evaluations demonstrate the superiority of the proposed algorithms. For a time-series of length 300,000, we reduce the index construction time from 100 minutes to 7.5 minutes, which is one or two orders of magnitude.
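The store-and-reuse principle can be illustrated with a generic example rather than the paper's actual algorithms: prefix sums are computed once and then reused to obtain the mean of every subsequence at every position and for every length in constant time each. The choice of window means as the intermediate result and the function names are assumptions made for this sketch.

```python
from itertools import accumulate


def prefix_sums(seq):
    """prefix[i] holds the sum of the first i values; computed once and stored."""
    return [0.0] + list(accumulate(float(x) for x in seq))


def window_means(seq, lengths):
    """Mean of every subsequence for each requested length, reusing the prefix sums."""
    prefix = prefix_sums(seq)  # stored intermediate result
    result = {}
    for w in lengths:
        # Each window sum is a difference of two stored values, not a fresh loop.
        result[w] = [(prefix[i + w] - prefix[i]) / w
                     for i in range(len(seq) - w + 1)]
    return result


means = window_means([1, 2, 3, 4], [2, 3])
print(means)
```

Without the stored prefix sums, every window would be summed from scratch, and the cost would grow with the window length as well as with the number of windows.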
Archive | 2014
Sanghun Lee; Bum-Soo Kim; Mi-Jung Choi; Yang-Sae Moon
In this paper, we propose an approximate solution to the multi-step k-NN search. The traditional multi-step k-NN search (1) determines a tolerance through a k-NN query on a multidimensional index and (2) retrieves the final k results by evaluating the tolerance-based range query on the index and by accessing the actual database. The proposed tolerance reduction-based (approximate) solution reduces a large number of candidates by adjusting the tolerance of the range query on the index. To obtain the tight tolerance, the proposed solution forcibly decreases the tolerance by the average ratio of high-dimensional and low-dimensional distances. Experimental results show that the proposed approximate solution significantly reduces the number of candidates and the k-NN search time over the existing one.
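The two steps of multi-step k-NN search, and the effect of shrinking the tolerance, can be sketched as below. The assumptions are loud ones: a linear scan stands in for the multidimensional index, the lower-dimensional transformation simply keeps the first `d` dimensions, and `shrink` is a hypothetical parameter supplied by the caller. The abstract derives the reduction from the average ratio of high-dimensional and low-dimensional distances, but it does not give the exact formula, so that formula is not reproduced here.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def multi_step_knn(query, data, k, d, shrink=1.0):
    """Return (indices of the k results, number of candidates examined)."""
    def low(v):
        return v[:d]  # assumed lower-dimensional transformation

    q_low = low(query)
    # Step 1: k-NN in the low-dimensional space; the tolerance is the largest
    # true distance among those k objects, optionally reduced by `shrink`.
    by_low = sorted(range(len(data)), key=lambda i: dist(q_low, low(data[i])))
    tolerance = max(dist(query, data[i]) for i in by_low[:k]) * shrink
    # Step 2: low-dimensional range query, then refinement on the actual data.
    candidates = [i for i in range(len(data))
                  if dist(q_low, low(data[i])) <= tolerance]
    ranked = sorted(candidates, key=lambda i: dist(query, data[i]))
    return ranked[:k], len(candidates)


query = [0, 0]
data = [[1, 0], [0, 3], [2, 2], [0.5, 5], [3, 0]]
print(multi_step_knn(query, data, k=2, d=1))              # exact search
print(multi_step_knn(query, data, k=2, d=1, shrink=0.5))  # reduced tolerance
```

With `shrink=1.0` the search is exact. A smaller value examines fewer candidates but may miss true neighbors in general, which is why the tolerance reduction-based solution is approximate.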
Multimedia Tools and Applications | 2018
Sanghun Lee; Hajin Kim; Mi-Jung Choi; Yang-Sae Moon
In this paper, we address the problem of boundary image matching that supports symmetric invariance. Supporting symmetric invariance is an important factor in providing more intuitive and more correct results in boundary image matching. Previous boundary image matching methods, however, deal mainly with image rotations and do not consider symmetric transformations. We propose a time-series-based boundary image matching that supports symmetric invariance as well as the previous rotation invariance. For this, we first formally define the concept of a boundary time-series and its symmetric time-series. We then present a novel notion, the symmetric-rotation property, which states that the rotation-invariant matching result is always the same for all possible symmetric angles. We next discuss how to efficiently extract a symmetric time-series from an image boundary by presenting the domain-independent property, which states that both time-series domain and image domain methods produce the same symmetric time-series. Experimental results show that the proposed symmetric-invariant matching provides more intuitive results than the previous rotation-invariant matching. To the best of our knowledge, this is the first attempt to solve the symmetric-invariant boundary matching problem in the simple time-series domain rather than in the complex image domain.
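A rough sketch of symmetric-invariant matching in the time-series domain follows. It rests on assumptions the abstract does not spell out: a boundary time-series is taken to be a sequence of centroid-to-boundary distances sampled at equal angles, a rotation of the image then corresponds to a circular shift, and a mirror image corresponds to the reversed sequence. The function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rotation_invariant_distance(a, b):
    """Minimum distance from a to any circular shift of b."""
    return min(dist(a, b[s:] + b[:s]) for s in range(len(b)))


def symmetric_invariant_distance(a, b):
    """Also try the mirrored sequence; one reversal is enough because the
    circular shifts absorb the choice of symmetry axis."""
    return min(rotation_invariant_distance(a, b),
               rotation_invariant_distance(a, b[::-1]))


shape = [1, 2, 3, 4]
mirrored = [3, 2, 1, 4]  # shape reversed, then circularly shifted by one
print(rotation_invariant_distance(shape, mirrored))
print(symmetric_invariant_distance(shape, mirrored))
```

Rotation invariance alone does not match the mirrored shape, while adding a single reversed comparison does.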
International Journal of Distributed Sensor Networks | 2018
Hajin Kim; Myeong-Seon Gil; Yang-Sae Moon; Mi-Jung Choi
In order to rapidly process large amounts of sensor stream data, it is effective to extract and use samples that reflect the characteristics and patterns of the data stream well. In this article, we focus on improving the uniformity confidence of KSample, which has the characteristics of random sampling in the stream environment. For this, we first analyze the uniformity confidence of KSample and then derive two uniformity confidence degradation problems: (1) initial degradation, which rapidly decreases the uniformity confidence in the initial stage, and (2) continuous degradation, which gradually decreases the uniformity confidence in the later stages. We note that the initial degradation is caused by the sample range limitation and the past sample invariance, and the continuous degradation by the sampling range increase. For each problem, we present a corresponding solution, that is, we provide the sample range extension for sample range limitation, the past sample change for past sample invariance, and the use of UC-window for sampling range increase. By reflecting these solutions, we then propose a novel sampling method, named UC-KSample, which largely improves the uniformity confidence. Experimental results show that UC-KSample improves the uniformity confidence over KSample by 2.2 times on average, and it always keeps the uniformity confidence higher than the user-specified threshold. We also note that the sampling accuracy of UC-KSample is higher than that of KSample in both numeric sensor data and text data. The uniformity confidence is an important sampling metric in sensor data streams, and this is the first attempt to apply uniformity confidence to KSample. We believe that the proposed UC-KSample is an excellent approach that adopts an advantage of KSample, dynamic sampling over a fixed sampling ratio, while improving the uniformity confidence.
International Conference on Big Data and Smart Computing | 2015
Bum-Soo Kim; Myeong-Seon Gil; Mi-Jung Choi; Yang-Sae Moon
Removing noise, called denoising, is essential for achieving intuitive and accurate results in boundary image matching. This paper deals with a partial denoising problem that allows a limited amount of noise embedded in boundary images. To solve this problem, we first define partial denoising time-series that can be generated from an original image time-series by removing a variety of partial noises. We then propose an efficient mechanism that quickly obtains those partial denoising time-series in the time-series domain rather than the image domain. Next, we present the partial denoising distance, which is the minimum distance from a query time-series to all possible partial denoising time-series generated from a data time-series. We then use this partial denoising distance as a similarity measure in boundary image matching. Using the partial denoising distance, however, incurs a severe computational overhead since there are a large number of partial denoising time-series to be considered. To reduce this overhead, we derive a tight lower bound for the partial denoising distance and formally prove its correctness. We also propose partial denoising boundary image matching, which uses the partial denoising distance as its similarity measure. Through extensive experiments, we finally show that our lower bound-based approach improves search performance by up to an order of magnitude in partial denoising-based boundary image matching.
Journal of KIISE | 2015
Sanghun Lee; Bum-Soo Kim; Mi-Jung Choi; Yang-Sae Moon
In this paper, we address the problem of improving the performance of multi-step k-NN search using multidimensional indexes. Due to information loss caused by lower-dimensional transformations, existing multi-step k-NN search solutions produce a large tolerance (i.e., a large search range) and thus incur a large number of candidates, which are retrieved by a range query. These candidates lead to overwhelming I/O and CPU overheads in the post-processing step. To overcome this problem, we propose two efficient solutions that improve the search performance by reducing the tolerance of a range query and, accordingly, reducing the number of candidates. First, we propose a tolerance reduction-based (approximate) solution that forcibly decreases the tolerance, which is determined by a k-NN query on the index, by the average ratio of high- and low-dimensional distances. Second, we propose a coefficient control-based (exact) solution that uses c·k instead of k in a k-NN query to obtain a tighter tolerance and performs a range query using this tighter tolerance. Experimental results show that the proposed solutions significantly reduce the number of candidates and, accordingly, improve the search performance in comparison with the existing multi-step k-NN solution.
International Conference on Hybrid Information Technology | 2012
Myeong-Seon Gil; Bum-Soo Kim; Yang-Sae Moon; Mi-Jung Choi
In this paper, we address the problem of constructing a multidimensional index for distortion-free subsequence matching, that is, time-series similarity search that handles distortions. A naive index construction algorithm for distortion-free subsequence matching is very time-consuming since it generates a huge number of data subsequences to consider all possible positions and all possible query lengths. We formally analyze the index construction steps and discuss how to improve the performance of each step. To improve the performance, we present the concept of a DF-bucket, which stores the intermediate results and reuses them repeatedly in the next steps. We also present a novel notion, the store-and-reuse principle, and using this principle we build a multidimensional index much faster than the naive algorithm.
