SeungJin Lim
Utah State University
Publication
Featured research published by SeungJin Lim.
international conference on semantic computing | 2007
Marco A. Alvarez; SeungJin Lim
The problem of measuring the semantic similarity between pairs of words has been considered a fundamental operation in data mining and information retrieval. Nevertheless, developing a computational method capable of generating results close to what humans would perceive is still a difficult task, owing in part to the subjective nature of similarity. This paper presents a novel algorithm for scoring the semantic similarity between words (SSA). Given two input words w1 and w2, SSA exploits their corresponding concepts, relationships, and descriptive glosses available in WordNet in order to build a rooted weighted graph Gsim. The output score is calculated by exploring the concepts present in Gsim and selecting the minimal distance between any two concepts c1 and c2 of w1 and w2, respectively. The definition of distance is a combination of: 1) the depth of the nearest common ancestor of c1 and c2 in Gsim, 2) the intersection of the descriptive glosses of c1 and c2, and 3) the shortest distance between c1 and c2 in Gsim. A correlation of 0.913 has been achieved between the results of SSA and the human ratings reported by Miller and Charles (1991) for a dataset of 28 pairs of nouns. Furthermore, using the full dataset of 65 pairs presented by Rubenstein and Goodenough (1965), the correlation between SSA results and the known human ratings is 0.903, which is higher than that of all other reported algorithms for the same dataset. The high correlations of SSA with human ratings suggest that SSA would be useful in solving several data mining and information retrieval problems.
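The abstract does not give the exact formula that combines the three distance components, so the following is only a minimal Python sketch of the "minimal distance over all concept pairs" idea. The toy graph, the glosses, the concept identifiers, and the combination rule (path length divided by one plus the gloss overlap) are all assumptions for illustration, and the nearest-common-ancestor depth term is left out.

```python
import heapq

def shortest_distances(graph, source):
    """Dijkstra over a weighted graph given as {node: {neighbor: weight}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

def gloss_overlap(gloss_a, gloss_b):
    """Number of distinct words shared by two glosses."""
    return len(set(gloss_a.lower().split()) & set(gloss_b.lower().split()))

def min_concept_distance(graph, glosses, concepts_w1, concepts_w2):
    """Smallest combined distance over all concept pairs (c1, c2)."""
    best = float("inf")
    for c1 in concepts_w1:
        dist = shortest_distances(graph, c1)
        for c2 in concepts_w2:
            path = dist.get(c2, float("inf"))
            # Assumed combination rule: shared gloss words shrink the path distance.
            combined = path / (1 + gloss_overlap(glosses[c1], glosses[c2]))
            best = min(best, combined)
    return best

# Hypothetical toy data standing in for a graph built from WordNet.
toy_graph = {
    "entity": {"land": 1.0, "institution": 1.0},
    "land": {"entity": 1.0, "bank.river": 1.0, "shore": 1.0},
    "institution": {"entity": 1.0, "bank.money": 1.0},
    "bank.river": {"land": 1.0},
    "bank.money": {"institution": 1.0},
    "shore": {"land": 1.0},
}
toy_glosses = {
    "bank.river": "sloping land beside a body of water",
    "bank.money": "a financial institution that accepts deposits",
    "shore": "land along the edge of a body of water",
}
score = min_concept_distance(
    toy_graph, toy_glosses, ["bank.river", "bank.money"], ["shore"]
)
```

In this toy example the river sense of "bank" is both closer to "shore" in the graph and shares more gloss words with it, so it determines the minimum.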
acm symposium on applied computing | 2009
Zhongshan Lin; SeungJin Lim
Existing level-wise spatial co-location algorithms generate extra, non-clique candidate instances and thus require cliqueness checking at every level. In this paper, we propose a novel spatial co-location mining algorithm which automatically generates co-located spatial features without generating any non-clique candidates at any level. Consequently, our algorithm generates fewer candidates than other existing level-wise co-location algorithms without losing any information. The benefit of our algorithm has been clearly observed at an earlier stage in the mining process.
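For context, the cliqueness check that the abstract says existing level-wise algorithms must perform can be sketched as below. This is not the proposed algorithm; the object identifiers, the coordinates, and the Euclidean neighbor radius are assumptions for illustration.

```python
from itertools import combinations
from math import dist

def is_clique(instance, points, radius):
    """True if every pair of objects in the candidate instance lies within radius."""
    return all(
        dist(points[a], points[b]) <= radius
        for a, b in combinations(instance, 2)
    )

# Hypothetical spatial objects, one per feature type A, B, C.
points = {"A1": (0.0, 0.0), "B1": (1.0, 0.0), "C1": (2.0, 0.0)}

pair_ok = is_clique(("A1", "B1"), points, radius=1.5)
triple_ok = is_clique(("A1", "B1", "C1"), points, radius=1.5)
```

Here A1 and B1 are neighbors and B1 and C1 are neighbors, but A1 and C1 are not, so the three-object candidate is a non-clique instance of the kind such a check has to discard.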
conference on information and knowledge management | 2008
Zhongshan Lin; SeungJin Lim
In this paper, we propose a novel spatial co-location mining algorithm which automatically generates co-located spatial features without generating any non-clique candidates at each level. Consequently, our algorithm is more efficient than other existing level-wise co-location algorithms because it performs no cliqueness checking. In addition, our algorithm produces a smaller number of co-location candidates than the other existing algorithms.
multimedia and ubiquitous engineering | 2007
Dennis Muhlestein; SeungJin Lim
Online communities on the Internet are highly self-organizing, dynamic, and ubiquitous. The prime interest of peers in these communities is often sharing common interests, even when this compromises privacy. This paper presents a peer coordination strategy and a data sharing process for peers on the Internet which allows them to discover their common interests in terms of sets of frequently visited URLs. To this end, sample data was collected by randomly following links on popular Websites to simulate the algorithm in operation. Experiments were then performed to compare the number of discovered frequently visited URL sets and association rules with the overhead induced by our network.
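The notion of a frequently visited URL set can be illustrated with a small centralized Python sketch. It does not model the peer coordination or data sharing process described above; the visit histories, the support threshold, and the size limit are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_url_sets(histories, min_support, max_size=2):
    """Count URL sets (up to max_size) and keep those seen in at least min_support histories."""
    counts = Counter()
    for urls in histories:
        unique = sorted(set(urls))  # each peer counts a URL set at most once
        for size in range(1, max_size + 1):
            for subset in combinations(unique, size):
                counts[subset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

# Hypothetical visit histories of three peers.
histories = [
    ["a.example", "b.example", "c.example"],
    ["a.example", "b.example"],
    ["b.example", "c.example"],
]
frequent = frequent_url_sets(histories, min_support=2)
```

With a support threshold of two peers, the pair a.example and c.example is dropped because only one peer visited both.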
acm symposium on applied computing | 2009
Omar U. Florez; SeungJin Lim
We propose a novel algorithm to extract time series from video to characterize the type of motion embedded in the video. Our method relies on describing the motion exposed in a video as a collection of spatiotemporal gradients. Each gradient models high variation in the respective region of the video both in space and time with respect to its spatiotemporal neighborhood. Rather than obtaining a coarse sampling of the motion by taking one event per frame, we obtain a continuous function by considering all the events that fall in a short sliding time window whose length equals the value of the temporal variance. The result is a composed time series that represents the motion in the video independent of rotation and scale. As an empirical demonstration of the viability of our method, we are able to cluster human motions contained in 114 videos into hand-based motions and foot-based motions with precision of 86.0% and 75.9%, respectively.
international symposium on multimedia | 2008
SeungJin Lim
Many relationships in data mining, such as frequent itemsets and similarity, are undirected. An effective visualization of such relationships at a large scale offers valuable visual feedback about the dataset or data mining results, by which the user's understanding of the target information is greatly enhanced. In this paper, we present a highly interactive 3D visual data exploration tool with high graphical quality. Visual data exploration using this tool on various real and synthetic datasets in main memory was effective and promising.
database and expert systems applications | 2008
Omar U. Florez; SeungJin Lim
Indexing is the most effective technique to speed up queries in databases. While traditional indexing approaches are used for exact search, a query object may not always be identical to an existing data object in similarity search. This paper proposes a new dynamic data structure called Hyperspherical Region Graph (HRG) to efficiently index a large volume of data objects as a graph for similarity search in metric spaces. HRG encodes the given dataset in a smaller number of vertices than the known graph index, Incremental-RNG, while providing flexible traversal without incurring backtracking as observed in tree-based indices. An empirical analysis performed on search time shows that HRG outperforms Incremental-RNG in both cases. HRG, however, outperforms tree-based indices in range search only when the data dimensionality is not too high.
international conference on digital information management | 2007
Marco A. Alvarez; SeungJin Lim
This paper presents a solution for the problem of finding interchangeable words in the context of an input collection of strings. Interchangeable words are words that can be substituted for one another in phrases or free text without altering the meaning. Under restricted conditions, pairs of interchangeable words might be useful for data deduplication, copy detection, and software localization, among others. The calculation of the degree of interchangeability involves the accurate calculation of semantic similarity between pairs of words and the search for candidate pairs in the overall search space imposed by the input collection. The solution presented in this paper is composed of a search method for candidate pairs using the Levenshtein distance algorithm and a novel algorithm, SSA, for calculating the semantic similarity between words. The proposed solution was implemented and tested within a real-world application related to a string message database from a software development company. The system was used to build an ontology with clusters of interchangeable words.
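The Levenshtein distance mentioned above is the standard edit distance between two strings. A minimal Python sketch of the textbook dynamic-programming formulation follows; how the paper applies the distance when searching for candidate pairs is not stated in the abstract, so the example strings are assumptions for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]

# Hypothetical message strings that differ in a single word.
d = levenshtein("Save file", "Store file")
```

Only two rows of the table are kept at a time, so memory use grows with the length of the second string rather than with the product of both lengths.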
International Journal of Web Information Systems | 2006
SeungJin Lim; Youngrae Ko
Web resource mining for one-stop learning is an effort to turn the Web into a convenient and valuable resource for education for the self-motivated, knowledge-seeking student. It is aimed at providing an efficient and effective algorithm to generate an extremely small set of self-contained Web pages which are adequate for the student to study the technical subject of her choice at her own pace without clicking through numerous linked resources. In this paper, we present three different scoring measures which can be plugged into such an algorithm. We also demonstrate the effectiveness of the proposed algorithms, each equipped with one of the three scoring measures, by showing their promising experimental results. Our algorithms achieved up to 87% precision on average in automatically finding relatively suitable Web resources for one-stop learning, as opposed to the 9% precision offered by general-purpose search engines.
international symposium on multimedia | 2016
SeungJin Lim
With the unprecedented wave of Big Data, information visualization is gaining importance. Understanding the underlying relationships between constituent entities is a common task in every branch of science, and visualization of such relationships is a critical part of data analysis. While techniques for the visualization of binary relationships are widespread, visualization techniques for ternary or higher-order relationships are lacking. In this paper, we propose a visualization primitive which is suitable for multiway relationships. Its effectiveness is demonstrated in 3-D visualization.