Justin JongSu Song
Inha University
Publication
Featured research published by Justin JongSu Song.
Cluster Computing | 2017
Jumi Kim; Wookey Lee; Justin JongSu Song; Soo Bok Lee
As a new data processing era of Big Data, Cloud Computing, and the Internet of Things approaches, the amount of data collected in databases far exceeds our ability to reduce and analyze it without automated analysis techniques, that is, data mining. As the importance of data mining has grown, a critical issue has emerged: how to scale data mining techniques to larger and more complex databases. This is particularly imperative for computationally intensive data mining tasks such as identifying natural clusters of instances. In this paper, we suggest an optimized combinatorial clustering algorithm for noisy performance, which is essential for large data with random sampling. The algorithm outperforms conventional approaches on various numerical and qualitative criteria, such as the mean and standard deviation of accuracy and computation speed.
International Conference on Cloud and Green Computing | 2012
Wookey Lee; Carson Kai-Sang Leung; Justin JongSu Song; Chris Soo-Hyun Eom
Due to its popularity, the influence propagation model has recently been exploited in several social network applications. However, there are some limitations in applying the model to social networks in which negative information is propagated. In this paper, we present an effective information propagation model to overcome these limitations. Our minimum cost flow model effectively propagates influences to neighbouring nodes with minimum costs in each path of the social network. The model removes noise associated with social network marketing information and propagates influences without overlapping in information nodes.
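The idea of reaching each node along its cheapest path can be illustrated with a minimal sketch. This is not the paper's minimum cost flow model: it swaps in Dijkstra's shortest-path algorithm as a simplified stand-in, and the graph, node names, and edge costs are hypothetical.

```python
import heapq
from typing import Dict

Graph = Dict[str, Dict[str, float]]


def cheapest_costs(graph: Graph, seed: str) -> Dict[str, float]:
    """Return the minimum total cost from seed to every reachable node (Dijkstra)."""
    best = {seed: 0.0}
    queue = [(0.0, seed)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry: a cheaper path was already found
        for neighbour, edge_cost in graph.get(node, {}).items():
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return best


# Hypothetical toy network: edge weights are propagation costs (assumed non-negative).
toy_network: Graph = {
    "a": {"b": 1, "c": 4},
    "b": {"c": 2, "d": 5},
    "c": {"d": 1},
    "d": {},
}
costs = cheapest_costs(toy_network, "a")
```

In this toy network the direct edge from "a" to "c" costs 4, but the route through "b" costs 3, so the cheaper route is the one recorded.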
Asia Pacific Web Conference | 2011
Wookey Lee; Justin JongSu Song; Carson Kai-Sang Leung
The skyline query is an effective method for processing large multidimensional data sets, as it can pinpoint the target data so that dominated data (say, 95% of the data) can be efficiently excluded as unnecessary data objects. However, most conventional skyline algorithms were developed to handle numerical data. Thus, most text data were excluded from processing by these algorithms. In this paper, we pioneer an entirely new domain for skyline queries, namely categorical data, for which the corresponding ranking measures are developed. We tested our proposed algorithm using the ACM Computing Classification System.
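For readers unfamiliar with dominance, the sketch below shows the conventional numerical skyline that the abstract contrasts itself with. It is not the paper's categorical method; the sample points and the "smaller is better" convention are assumptions made for illustration.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse in every dimension and strictly better in one.

    Assumption: smaller values are better in every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data, e.g. (price, distance) pairs.
sample = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
result = skyline(sample)
```

Here (3, 3) and (4, 4) are both dominated by (2, 2), so only the remaining three points survive as the skyline.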
The Journal of Supercomputing | 2017
Justin JongSu Song; Wookey Lee
The high-recall retrieval problem, which aims to find the full set of relevant documents in a huge result set using effective mining techniques, is particularly useful for patent information retrieval, legal document retrieval, medical document retrieval, market information retrieval, and literature review. Existing high-recall retrieval methods, however, remain far from satisfactory at retrieving all relevant documents, because they must not only meet high recall and precision thresholds but also minimize the number of reviewed documents. To address this gap, we generalize the problem to a novel high-recall retrieval model, which can be represented as finding all needles in a giant haystack. To compute candidate groups consisting of k relevant documents efficiently, we propose dynamic diverse retrieval algorithms specialized for the patent-searching method, in which effective dynamic interactive retrieval can be achieved. On various types of datasets, the dynamic ranking method shows considerable improvements with respect to time and cost over conventional static ranking approaches.
Multimedia Tools and Applications | 2015
Simon Soon-Hyoung Park; Justin JongSu Song; James Jung-Hoon Lee; Wookey Lee; Sang-Bok Ree
How should similarity or distance be measured for multiple categorical data? Measuring similarity or distance between objects appropriately is an important step in the data mining and knowledge management process. Measures for continuous data are well defined and relatively easy to calculate. However, the notion of similarity for categorical data is not simple, since categorical data usually cannot be simply translated into a numerical format and also carry their own priorities in structure and data distribution. In this paper, we propose a new measure for multiple categorical data sets using data distribution. Our new measure, MCSM (Multiple Categorical Similarity Measure), addresses the conventional drawbacks of measures for multiple categorical data sets, and we verify it with mathematical proofs and experiments. The experimental results show that our measure is effective for multiple categorical data sets with proper data distributions.
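To make the problem concrete, here is a minimal sketch of two baseline similarity measures for categorical records. Neither is MCSM, whose definition is not given in the abstract: the first is the plain overlap (simple matching) measure, and the second is a hypothetical frequency-weighted variant included only to show how a data distribution can enter the score. All names and sample records are illustrative.

```python
from collections import Counter
from typing import List, Sequence

Record = Sequence[str]


def overlap_similarity(x: Record, y: Record) -> float:
    """Fraction of attributes on which the two records hold the same value."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def frequency_weighted_similarity(x: Record, y: Record, data: List[Record]) -> float:
    """Like overlap, but a match on a rare value counts more than one on a common value.

    Hypothetical weighting: a match contributes 1 - (relative frequency of the value).
    """
    total = 0.0
    for k, (a, b) in enumerate(zip(x, y)):
        if a == b:
            freq = Counter(row[k] for row in data)[a] / len(data)
            total += 1.0 - freq
    return total / len(x)


# Hypothetical records with two attributes: (colour, size).
records = [("red", "small"), ("red", "large"), ("red", "small"), ("blue", "small")]
plain = overlap_similarity(records[0], records[1])
weighted = frequency_weighted_similarity(records[0], records[1], records)
```

The two records agree only on "red", which covers three of the four rows, so the weighted score discounts that match heavily compared with the plain overlap.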
IEEE International Conference on Dependable, Autonomic and Secure Computing | 2011
Wookey Lee; James Jung-Hun Lee; Justin JongSu Song; Chris Soo-Hyun Eom
As the overall size of social networks expands exponentially, technological advances in efficient social network search can flourish in academia, business corporations, and public institutions. Additionally, amid the upheaval in the field of information fusion and consilience, the effective development of various social network search methods is gaining significance. In this research, we propose the Maximum Reliable Tree algorithm, newly developed from a graph-based method, as a generic technique that facilitates effective social network search and that can be the most reliable social network search method for rapidly emerging smartphone technologies. We demonstrate the method's performance through formal descriptions of its core arguments and through experimental results.
International Conference on Big Data and Cloud Computing | 2014
Wookey Lee; Carson Kai-Sang Leung; Justin JongSu Song
Patents have been considered key enablers for many knowledge- and information-centric companies as well as institutes. The higher the required patent capability, the greater the need for an effective and efficient patent search system. Many conventional patent search systems produce unsatisfactory results for patent queries because their underlying search systems derive from traditional keyword-based models, which inevitably lead to too many unrelated items in the search results. Consequently, these systems cost patent experts a lot of time to iteratively refine search results manually. In this paper, we propose a specialized patent-searching method, in which relationships between the keywords within each document and their implication for each patent document are investigated. With this elaborated ranking capability, keywords for valid patents are placed in higher ranks and those for noise patents are placed in lower-ranked positions. As a benefit, our method significantly eliminates noisy data from the search results. Hence, our method is very useful for recall-oriented patent search. Experimental results with real-life datasets show that our method outperformed many conventional patent search systems with respect to time and cost.
International Conference on Big Data | 2017
Justin JongSu Song; Jiyoung Lim; Chris Soo-Hyun Eom; Wookey Lee
The objective of recall-oriented retrieval tasks such as patent retrieval, legal search, and e-Discovery is to find all relevant documents. These information retrieval tasks are usually conducted by domain experts, so the retrieval cost can be high. However, existing evaluation metrics for recall-oriented retrieval tasks have an obvious limitation: they consider only high recall. If the relevant documents sit at low positions in the ranked result, the user must check the whole result up to the end. Therefore, an evaluation measure that also accounts for reducing review effort is strongly needed for recall-oriented retrieval tasks. In this paper, we study the characteristics of various evaluation metrics according to the ranks of the query results in recall-oriented retrieval tasks.
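The limitation described here can be seen in a small sketch: two rankings with identical recall can demand very different review effort. The functions below are illustrative helpers, not metrics defined in the paper, and the document IDs are hypothetical.

```python
from typing import List, Optional, Set


def recall_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Share of the relevant documents that appear in the top k results."""
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / len(relevant)


def review_depth_for_full_recall(ranking: List[str], relevant: Set[str]) -> Optional[int]:
    """Number of results a reviewer must read before every relevant document is seen.

    Returns None if the ranking does not contain all relevant documents.
    """
    remaining = set(relevant)
    for position, doc in enumerate(ranking, start=1):
        remaining.discard(doc)
        if not remaining:
            return position
    return None


relevant_docs = {"d1", "d2"}
ranking_a = ["d1", "d2", "d3", "d4", "d5", "d6"]  # relevant documents first
ranking_b = ["d3", "d4", "d5", "d6", "d1", "d2"]  # relevant documents last

recall_a = recall_at_k(ranking_a, relevant_docs, k=6)
recall_b = recall_at_k(ranking_b, relevant_docs, k=6)
depth_a = review_depth_for_full_recall(ranking_a, relevant_docs)
depth_b = review_depth_for_full_recall(ranking_b, relevant_docs)
```

Both rankings reach full recall over six results, yet the first needs two documents reviewed and the second needs all six, which a recall-only metric cannot distinguish.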
International Conference on Big Data and Smart Computing | 2016
Justin JongSu Song; Wookey Lee; Jafar Afshar
Patents have recently attracted strong attention as a key enabler for knowledge- and information-centric companies and institutes. The higher the required patent capability, the more important an effective and efficient patent retrieval system becomes. Conventional patent retrieval systems, however, have produced unsatisfactory results for patent queries, since their underlying search systems derive from traditional keyword-based models, which inevitably return too many unrelated items. This has forced patent experts to spend a lot of time refining the results manually. We propose two dynamic ranking algorithms for a specialized patent-searching method, in which dynamic interactive retrieval can be achieved. In an experiment on a real USPTO dataset, the dynamic ranking method shows substantial improvements with respect to time and cost over conventional static ranking approaches.
Proceedings of the Sixth International Conference on Emerging Databases | 2016
Justin JongSu Song; Jiyoung Lim; Wookey Lee; Jafar Afshar
How can query ambiguity be resolved, and how can redundancy in a search result be avoided? Redundancy in returned results (e.g., near duplicates) has a negative effect on retrieval effectiveness (i.e., user satisfaction), and there is little benefit in presenting relevant yet redundant results to the user repeatedly. The ambiguity of the query therefore needs to be reflected in the returned results to account for the uncertainty about the user's information need. In a diversity context, the user is usually more interested in retrieving various types of relevant documents (covering a number of information needs) than in the ones at the top of the result list. In this paper, we present a new document re-ranking method for information retrieval using Data Envelopment Analysis (DEA) and Diversity Retrieval Measure (DRM). The goal of the proposed re-ranking system is to diversify the document results from the original ranking list. The experiment is performed on hundreds of random Decision Making Units (DMUs), and the results are compared with the existing system. The results demonstrate that the new method satisfies unspecified individuals when the query is ambiguous. They also show that the diversifying method is effective in satisfying users who want many types of information.
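The goal of diversifying a ranked list can be illustrated with a minimal sketch. It does not implement DEA or DRM: it swaps in a simple greedy pass that promotes the first document of each topic, and the topic labels, document IDs, and the ambiguous query "jaguar" are hypothetical.

```python
from typing import Dict, List


def diversify(ranked: List[str], topic_of: Dict[str, str]) -> List[str]:
    """Re-rank so that the best document of each topic comes before any repeats.

    Relative order within each group follows the original ranking.
    """
    seen_topics = set()
    first_of_topic: List[str] = []
    repeats: List[str] = []
    for doc in ranked:
        topic = topic_of[doc]
        if topic in seen_topics:
            repeats.append(doc)
        else:
            seen_topics.add(topic)
            first_of_topic.append(doc)
    return first_of_topic + repeats


# Hypothetical results for the ambiguous query "jaguar".
original = ["d1", "d2", "d3", "d4"]
topics = {"d1": "car", "d2": "car", "d3": "animal", "d4": "car"}
reranked = diversify(original, topics)
```

The only "animal" document moves from third to second place, so a user with either information need finds a relevant result near the top.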