King-Ip Lin
University of Memphis
Publications
Featured research published by King-Ip Lin.
Very Large Data Bases | 1994
King-Ip Lin; H. V. Jagadish; Christos Faloutsos
We propose a file structure to index high-dimensionality data, which are typically points in some feature space. The idea is to use only a few of the features, using additional features only when the additional discriminatory power is absolutely necessary. We present in detail the design of our tree structure and the associated algorithms that handle such “varying length” feature vectors. Finally, we report simulation results, comparing the proposed structure with the R*-tree, which is one of the most successful methods for low-dimensionality spaces. The results illustrate the superiority of our method, which saves up to 80% in disk accesses.
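The guiding idea, using only a prefix of the features and extending it only when more discriminatory power is needed, can be illustrated with a toy routine; this is not the paper's tree structure or its algorithms, and the function name and data below are invented.

```python
def discriminating_prefix(points, max_dims):
    """Smallest number of leading features that already separates every point:
    a toy stand-in for the idea of 'varying length' feature vectors, where extra
    features are used only when more discriminatory power is needed."""
    for k in range(1, max_dims + 1):
        if len({tuple(p[:k]) for p in points}) == len(points):
            return k
    return max_dims

# The first two points differ in feature 1 alone; the last two need feature 3.
pts = [(1, 0, 5), (2, 0, 5), (3, 7, 1), (3, 7, 2)]
print(discriminating_prefix(pts, 3))  # 3
```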
International Conference on Data Engineering | 2001
Congjun Yang; King-Ip Lin
The Reverse Nearest Neighbor (RNN) problem is to find all points in a given data set whose nearest neighbor is a given query point. Just like Nearest Neighbor (NN) queries, RNN queries arise in many practical situations, such as marketing and resource management. Thus, efficient methods for answering RNN queries in databases are required. The paper introduces a new index structure, the Rdnn-tree, that answers both RNN and NN queries efficiently. A single index structure is employed for a dynamic database, in contrast to the use of multiple indexes in previous work. This leads to significant savings in dynamically maintaining the index structure. The Rdnn-tree outperforms existing methods in various aspects. Experiments on both synthetic and real-world data show that our index structure outperforms previous methods by a significant margin (more than 90% in terms of the number of leaf nodes accessed) in RNN queries. It also shows improvement in NN queries over standard techniques. Furthermore, performance in insertion and deletion is significantly enhanced by the ability to combine multiple queries (NN and RNN) in one traversal of the tree. These facts make our index structure highly attractive in both static and dynamic settings.
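The query semantics can be made concrete with a brute-force baseline; this sketches only the definition, not the Rdnn-tree, and the point set below is made up.

```python
import math

def nn(q, data):
    """Nearest neighbor of q among data (brute force)."""
    return min(data, key=lambda p: math.dist(p, q))

def rnn(q, data):
    """Reverse nearest neighbors of q: every data point whose nearest
    neighbor, among the remaining data points plus q, is q itself."""
    return [p for p in data
            if nn(p, [x for x in data if x != p] + [q]) == q]

points = [(0, 0), (3, 0)]
query = (2, 0)
print(nn(query, points))   # (3, 0)
print(rnn(query, points))  # [(0, 0), (3, 0)] -- both points have q as their NN,
                           # even though q's own nearest neighbor is only (3, 0)
```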
Storage and Retrieval for Image and Video Databases | 1997
Vikrant Kobla; David S. Doermann; King-Ip Lin; Christos Faloutsos
Development of various multimedia applications hinges on the availability of fast and efficient storage, browsing, indexing, and retrieval techniques. Given that video is typically stored efficiently in a compressed format, if we can analyze the compressed representation directly, we can avoid the costly overhead of decompressing and operating at the pixel level. Compressed domain parsing of video has been presented in earlier work, where a video clip is divided into shots, subshots, and scenes. In this paper, we describe key frame selection, feature extraction, and indexing and retrieval techniques that are directly applicable to MPEG compressed video. We develop a frame-type independent representation of the various types of frames present in an MPEG video in which all frames can be considered equivalent. Features are derived from the available DCT, macroblock, and motion vector information and mapped to a low-dimensional space where they can be accessed with standard database techniques. The spatial information is used as the primary index, while the temporal information is used to enhance the robustness of the system during the retrieval process. The techniques presented enable fast archiving, indexing, and retrieval of video. Our operational prototype typically takes a fraction of a second to retrieve similar video scenes from our database, with over 95% success.
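A toy sketch of the retrieval step, assuming the per-key-frame feature vectors (derived in the paper from DCT, macroblock, and motion-vector data) are already available; the database keys and vectors below are invented for illustration.

```python
import math

# Assume each key frame has already been reduced to a short feature vector;
# retrieval is then nearest-neighbor search in that low-dimensional space.
database = {
    "clip1/shot3": (0.12, 0.80, 0.05),
    "clip2/shot1": (0.90, 0.10, 0.40),
    "clip2/shot7": (0.15, 0.75, 0.10),
}

def retrieve(query_vec, db, k=2):
    """Return the k key frames whose feature vectors lie closest to the query."""
    return sorted(db, key=lambda name: math.dist(db[name], query_vec))[:k]

print(retrieve((0.14, 0.78, 0.07), database))  # ['clip1/shot3', 'clip2/shot7']
```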
Proceedings of SPIE | 1996
Vikrant Kobla; David S. Doermann; King-Ip Lin
Fast and efficient storage, indexing, browsing, and retrieval of video is a necessity for the development of various multimedia database applications. This can be achieved by analyzing the video directly in the compressed domain, thereby avoiding the overhead of decompressing video into individual frames in the pixel domain. Our compressed domain parsing of video performs shot change detection and motion detection using the data readily accessible from MPEG, with minimal decoding. Key frames are identified and are used for indexing, retrieval, and browsing. In this paper, we describe feature extraction and key frame indexing and retrieval techniques that are directly applicable to compressed video. The features are derived from the available DCT, macroblock, and motion vector information, and the techniques enable fast parsing and archiving of video.
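A generic illustration of threshold-based shot-change detection on per-frame feature vectors; the paper works on MPEG compressed-domain data rather than the invented features and threshold used here.

```python
def shot_boundaries(frame_features, threshold):
    """Flag a shot change wherever consecutive frame features differ sharply
    (L1 distance above a threshold)."""
    boundaries = []
    for i in range(1, len(frame_features)):
        diff = sum(abs(a - b)
                   for a, b in zip(frame_features[i - 1], frame_features[i]))
        if diff > threshold:
            boundaries.append(i)
    return boundaries

# Frames 0-2 and 3-4 form two shots; the jump at index 3 is detected.
features = [(10, 2), (11, 2), (10, 3), (40, 30), (41, 29)]
print(shot_boundaries(features, threshold=20))  # [3]
```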
Knowledge Discovery and Data Mining | 1999
Jason Tsong-Li Wang; Xiong Wang; King-Ip Lin; Dennis E. Shasha; Bruce A. Shapiro; Kaizhong Zhang
A distance-mapping algorithm takes a set of objects and a distance metric and then maps those objects to a Euclidean or pseudo-Euclidean space in such a way that the distances among objects are approximately preserved. Distance-mapping algorithms are a useful tool for clustering and visualization in data-intensive applications, because they replace expensive distance calculations by sum-of-square calculations. This can make clustering in large databases with expensive distance metrics practical. In this paper we present five distance-mapping algorithms and conduct experiments to compare their performance in data clustering applications. These include two algorithms called FastMap and MetricMap, and three hybrid heuristics that combine the two algorithms in different ways. Experimental results on both synthetic and RNA data show the superiority of the hybrid algorithms. The results imply that FastMap and MetricMap capture complementary information about distance metrics and therefore can be used together to great benefit. The net effect is that multi-day computations may be done in minutes.
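FastMap's core step projects each object onto the line through two pivot objects using only pairwise distances (law of cosines); the sketch below shows just that one coordinate computation, leaving out pivot selection and the recursive treatment of residual distances that the full algorithm involves.

```python
import math

def fastmap_axis(objects, dist, pivot_a, pivot_b):
    """One FastMap coordinate: project every object onto the line through
    the two pivots, using only pairwise distances (law of cosines)."""
    d_ab = dist(pivot_a, pivot_b)
    return {o: (dist(pivot_a, o) ** 2 + d_ab ** 2 - dist(pivot_b, o) ** 2)
               / (2 * d_ab)
            for o in objects}

# Sanity check with an ordinary Euclidean metric on 2-D points.
pts = [(0, 0), (4, 0), (2, 3), (1, 1)]
coords = fastmap_axis(pts, math.dist, (0, 0), (4, 0))
print(coords[(2, 3)])  # 2.0 -- the component along the pivot line
```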
Database Systems for Advanced Applications | 2001
King-Ip Lin; Ravikumar Kondadadi
Document clustering is an important tool for applications such as Web search engines. Clustering documents enables the user to have a good overall view of the information contained in the documents at hand. However, existing algorithms suffer from various drawbacks: hard clustering algorithms (where each document belongs to exactly one cluster) cannot detect the multiple themes of a document, while soft clustering algorithms (where each document can belong to multiple clusters) are usually inefficient. We propose SISC (similarity-based soft clustering), an efficient soft clustering algorithm based on a given similarity measure. SISC requires only a similarity measure for clustering and uses randomization to help make the clustering efficient. Comparison with existing hard clustering algorithms like K-means and its variants shows that SISC is both effective and efficient.
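As an illustration of what soft, similarity-based assignment means (one document may join several clusters), here is a toy sketch; it is not SISC itself, whose clustering and randomization steps are more involved, and the similarity function and threshold below are arbitrary choices.

```python
def soft_assign(docs, seeds, similarity, threshold):
    """Assign each document to every cluster whose seed it resembles enough;
    a document can therefore belong to several clusters at once."""
    clusters = {seed: [] for seed in seeds}
    for doc in docs:
        for seed in seeds:
            if similarity(doc, seed) >= threshold:
                clusters[seed].append(doc)
    return clusters

def jaccard(a, b):
    """Word-overlap similarity between two documents given as strings."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

docs = ["python indexing trees", "soccer scores", "python soccer blog"]
# The last document lands in both clusters -- a soft assignment.
print(soft_assign(docs, ["python code", "soccer news"], jaccard, 0.2))
```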
Knowledge and Information Systems | 2000
Xiong Wang; Jason Tsong-Li Wang; King-Ip Lin; Dennis E. Shasha; Bruce A. Shapiro; Kaizhong Zhang
In this paper we present an index structure, called MetricMap, that takes a set of objects and a distance metric and then maps those objects to a k-dimensional space in such a way that the distances among objects are approximately preserved. The index structure is a useful tool for clustering and visualization in data-intensive applications, because it replaces expensive distance calculations by sum-of-square calculations. This can make clustering in large databases with expensive distance metrics practical. We compare the index structure with another data mining index structure, FastMap, recently proposed by Faloutsos and Lin, according to two criteria: relative error and clustering accuracy. For relative error, we show that (i) FastMap gives a lower relative error than MetricMap for Euclidean distances, (ii) MetricMap gives a lower relative error than FastMap for non-Euclidean distances (i.e., general distance metrics), and (iii) combining the two reduces the error yet further. A similar result is obtained when comparing the accuracy of clustering. These results hold for different data sizes. The main qualitative conclusion is that these two index structures capture complementary information about distance metrics and therefore can be used together to great benefit. The net effect is that multi-day computations can be done in minutes.
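Relative error, one of the two comparison criteria, can be computed directly from the original and mapped distances; the sketch below uses the obvious per-pair definition, which may differ in detail from the one used in the paper, and the toy "mapping" is invented.

```python
import itertools
import math

def mean_relative_error(objects, original_dist, mapped):
    """Average |d'(x, y) - d(x, y)| / d(x, y) over all object pairs, where d'
    is the Euclidean distance between the mapped images."""
    errors = []
    for x, y in itertools.combinations(objects, 2):
        d = original_dist(x, y)
        d_mapped = math.dist(mapped[x], mapped[y])
        errors.append(abs(d_mapped - d) / d)
    return sum(errors) / len(errors)

# Toy example: 3-D points crudely "mapped" to 1-D by keeping the first coordinate.
objs = [(0, 0, 0), (3, 4, 0), (0, 0, 5)]
mapped = {o: (o[0],) for o in objs}
print(round(mean_relative_error(objs, math.dist, mapped), 2))  # 0.66
```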
Data and Knowledge Engineering | 2006
Eric Lo; Kevin Y. Yip; King-Ip Lin; David W. Cheung
Skyline queries return a set of interesting data points that are not dominated on all dimensions by any other point. Most of the existing algorithms focus on skyline computation in centralized databases, and some of them can progressively return skyline points upon identification rather than all in a batch. Processing skyline queries over the Web is a more challenging task because in many Web applications, the target attributes are stored at different sites and can only be accessed through restricted external interfaces. In this paper, we develop PDS (progressive distributed skylining), a progressive algorithm that evaluates skyline queries efficiently in this setting. The algorithm is also able to estimate the percentage of skyline objects already retrieved, which is useful for users to monitor the progress of long running skyline queries. Our performance study shows that PDS is efficient and robust to different data distributions and achieves its progressive goal with a minimal overhead.
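The dominance relation and the query semantics can be stated compactly; the brute-force baseline below only illustrates what a skyline is, not the PDS algorithm or its distributed setting, and the hotel example is made up.

```python
def dominates(p, q):
    """p dominates q if p is at least as good on every dimension and strictly
    better on at least one (here, smaller values are 'better')."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Brute-force skyline: keep exactly the points no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hotels as (price, distance-to-beach): only non-dominated options survive.
hotels = [(50, 8), (60, 2), (80, 1), (90, 5)]
print(skyline(hotels))  # [(50, 8), (60, 2), (80, 1)]
```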
International Database Engineering and Applications Symposium | 2003
King-Ip Lin; Michael Nolen; Congjun Yang
Reverse nearest neighbor queries have emerged as an important class of queries for spatial and other types of databases. The Rdnn-tree is an R-tree based structure that has been shown to perform outstandingly for such queries. However, one practical problem facing it (as well as other types of indexes) is how to construct the index from scratch effectively. In this case, the cost of constructing and maintaining an Rdnn-tree is about twice the cost of an R-tree. Normal insertion into an Rdnn-tree is performed one point at a time, known as single point insertion. The question arises: can insertion be improved, thereby reducing the construction and maintenance cost? In this paper we propose a bulk-loading technique that significantly improves the performance of constructing the index from scratch, as well as of inserting a large amount of data. Experiments show that our method significantly outperforms single point insertion.
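A generic sort-and-pack sketch conveys why bulk loading beats single point insertion (one global ordering pass followed by sequential packing of full leaf pages); it is not the paper's Rdnn-tree bulk-loading algorithm, and the ordering used is a crude placeholder.

```python
def pack_leaves(points, leaf_capacity):
    """Generic sort-and-pack bulk loading: order all points once, then cut the
    ordered list into full leaf pages, instead of inserting one point at a time."""
    ordered = sorted(points)  # crude lexicographic order as a stand-in for a
                              # space-filling-curve or other spatial ordering
    return [ordered[i:i + leaf_capacity]
            for i in range(0, len(ordered), leaf_capacity)]

pts = [(9, 1), (2, 5), (4, 4), (7, 0), (1, 8), (3, 3)]
print(pack_leaves(pts, leaf_capacity=2))
# [[(1, 8), (2, 5)], [(3, 3), (4, 4)], [(7, 0), (9, 1)]]
```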
Conference on Information and Knowledge Management | 2001
Andrian Marcus; Jonathan I. Maletic; King-Ip Lin
A new extension of Boolean association rules, called ordinal association rules, which incorporates ordinal relationships among data items, is introduced. One use for ordinal rules is to identify possible errors in data. A method that finds these rules and identifies potential errors in data is proposed.
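The idea can be illustrated with a single rule of the form "attribute A <= attribute B": if the rule holds in most records, the few violating records are flagged as possible errors. The sketch below is a simplified illustration, not necessarily the paper's exact procedure; the attribute names and support threshold are made up.

```python
def find_possible_errors(records, a, b, min_support=0.9):
    """If the ordinal relationship record[a] <= record[b] holds in at least
    min_support of the records, flag the records that violate it as possible errors."""
    holds = [r for r in records if r[a] <= r[b]]
    if len(holds) / len(records) < min_support:
        return []  # the rule itself is not well supported
    return [r for r in records if r[a] > r[b]]

# Start values should not exceed end values; one record violates this.
rows = [{"start": 1, "end": 5}, {"start": 2, "end": 9},
        {"start": 3, "end": 7}, {"start": 8, "end": 4},
        {"start": 1, "end": 2}, {"start": 4, "end": 6},
        {"start": 2, "end": 3}, {"start": 5, "end": 9},
        {"start": 3, "end": 8}, {"start": 1, "end": 9}]
print(find_possible_errors(rows, "start", "end"))  # [{'start': 8, 'end': 4}]
```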