Eric P. Jiang
University of San Diego
Publications
Featured research published by Eric P. Jiang.
Linear Algebra and its Applications | 2000
Eric P. Jiang; Michael W. Berry
The singular value decomposition (SVD) is a well-known theoretical and numerical tool used in numerous scientific and engineering applications. Recently, an interesting nonlinear generalization of the SVD, referred to as the Riemannian SVD (R-SVD), has been proposed by De Moor for applications in systems and control. This decomposition can be modified and used to formulate an enhanced implementation of latent semantic indexing (LSI) for conceptual information retrieval. LSI is an SVD-based conceptual retrieval technique and employs a rank-reduced model of the original (sparse) term-by-document matrix. Updating LSI models based on user feedback can be accomplished using constraints modeled by the R-SVD of a low-rank approximation to the original term-by-document matrix. In this work, a new algorithm for computing the R-SVD is described. When used to update an LSI model, this R-SVD algorithm can be a highly effective information filtering technique. Experiments demonstrate that a 20% improvement (in retrieval) over the current LSI model is possible.
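The rank-reduced LSI model that this R-SVD work updates can be illustrated directly. Below is a minimal sketch of rank-k LSI retrieval via the truncated SVD; the toy term-by-document matrix, the choice of k, and the query are illustrative assumptions, and the R-SVD updating step itself is not reproduced here.

```python
# A minimal sketch of rank-k LSI retrieval via the truncated SVD.
import numpy as np

def lsi_rank_k(A, k):
    """Return the rank-k factors U_k, s_k, Vt_k of the term-by-document matrix A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def query_scores(q, U_k, s_k, Vt_k):
    """Project a query vector into the k-dimensional LSI space and
    score documents by cosine similarity."""
    q_hat = (q @ U_k) / s_k           # query in concept space
    docs = Vt_k.T                     # each row: one document in concept space
    num = docs @ q_hat
    den = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat) + 1e-12
    return num / den

# Toy term-by-document matrix (rows: terms, columns: documents) and a one-term query.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 0.]])
q = np.array([1., 0., 0.])
U_k, s_k, Vt_k = lsi_rank_k(A, k=2)
print(query_scores(q, U_k, s_k, Vt_k))
```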
Lecture Notes in Computer Science | 1998
Eric P. Jiang; Michael W. Berry
The Riemannian SVD (or R-SVD) is a recent nonlinear generalization of the SVD which has been used for specific applications in systems and control. This decomposition can be modified and used to formulate a filtering-based implementation of Latent Semantic Indexing (LSI) for conceptual information retrieval. With LSI, the underlying semantic structure of a collection is represented in k-dimensional space using a rank-k approximation to the corresponding (sparse) term-by-document matrix. Updating LSI models based on user feedback can be accomplished using constraints modeled by the R-SVD of a low-rank approximation to the original term-by-document matrix.
Numerical Linear Algebra with Applications | 2005
Ricardo D. Fierro; Eric P. Jiang
Variations of the latent semantic indexing (LSI) method in information retrieval (IR) require the computation of singular subspaces associated with the k dominant singular values of a large m × n sparse matrix A, where k ≪ min(m, n). The Riemannian SVD was recently generalized to low-rank matrices arising in IR and shown to be an effective approach for formulating an enhanced semantic model that captures the latent term-document structure of the data. However, in terms of storage and computation requirements, its implementation can be much improved for large-scale applications. We discuss an efficient and reliable algorithm, called SPK-RSVD-LSI, as an alternative approach for deriving the enhanced semantic model. The algorithm combines the generalized Riemannian SVD and the Lanczos method with full reorthogonalization and explicit restart strategies. We demonstrate that our approach performs as well as the original low-rank Riemannian SVD method by comparing their retrieval performance on a well-known benchmark document collection. Copyright © 2004 John Wiley & Sons, Ltd.
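For the sparse-subspace step that SPK-RSVD-LSI relies on, a minimal sketch of computing the k dominant singular triplets of a large sparse term-by-document matrix with a Lanczos-type solver is shown below; the generalized Riemannian SVD stage and the explicit restart strategy are not reproduced, and the matrix dimensions and density are illustrative assumptions.

```python
# A minimal sketch: k dominant singular triplets of a large sparse matrix.
import scipy.sparse as sp
from scipy.sparse.linalg import svds

m, n, k = 5000, 2000, 50                      # terms, documents, retained factors
A = sp.random(m, n, density=0.01, format="csr", random_state=0)

# svds returns the k largest singular values (in ascending order) and their subspaces.
U_k, s_k, Vt_k = svds(A, k=k)
print(s_k[::-1][:5])                          # five dominant singular values
```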
international conference on intelligent computing | 2006
Eric P. Jiang
As a semantic vector space model for information retrieval (IR), Latent Semantic Indexing (LSI) employs the singular value decomposition (SVD) to transform individual documents into statistically derived semantic vectors. In this paper, a new junk email (spam) filtering model, 2LSI-SF, is proposed; it is based on augmented category LSI spaces and classifies email messages by their content. The model utilizes the valuable discriminative information in the training data and incorporates several pertinent feature selection and message classification algorithms. Experiments with 2LSI-SF have been conducted on a benchmark spam testing corpus (PU1) and a newly compiled Chinese spam corpus (ZH1). The results of these experiments, together with a performance comparison against the popular Support Vector Machines (SVM) and naive Bayes classifiers, show that 2LSI-SF is capable of filtering spam effectively.
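A minimal sketch of per-category LSI subspaces in the spirit of 2LSI-SF follows: each class gets a truncated-SVD basis, and a message is assigned to the class whose subspace reconstructs it with the smaller residual. The toy term-by-message matrices, the rank, and the residual rule are illustrative assumptions, not the paper's exact augmentation or feature-selection procedure.

```python
# A minimal sketch of classification with per-category LSI subspaces.
import numpy as np

def class_basis(X, k):
    """Leading k left singular vectors of a class term-by-message matrix X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]

def classify(msg, bases, labels):
    """Pick the class whose subspace leaves the smallest reconstruction residual."""
    residuals = [np.linalg.norm(msg - U @ (U.T @ msg)) for U in bases]
    return labels[int(np.argmin(residuals))]

# Toy term-by-message matrices for the two classes (rows: terms, columns: messages).
spam = np.array([[3., 2.], [0., 1.], [1., 0.]])
ham  = np.array([[0., 1.], [2., 3.], [1., 1.]])
bases = [class_basis(spam, k=1), class_basis(ham, k=1)]
print(classify(np.array([2., 0., 1.]), bases, ["spam", "ham"]))
```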
International Journal of Knowledge-based and Intelligent Engineering Systems | 2007
Eric P. Jiang
Over the years, various spam email filtering technologies and anti-spam software products have been developed and deployed. Some are designed to stop spam email at the server level, while others apply machine learning algorithms at the client level to identify spam email based on message content. In this paper, a new spam filtering model, RBF-SF, is proposed that detects and classifies email messages by a radial basis function (RBF) network. The model utilizes the valuable discriminative information in the training email data and can incorporate additional background email in its learning process. The empirical results of RBF-SF on two benchmark spam testing corpora, together with a performance comparison against several other popular text classifiers, show that the model is capable of filtering spam email effectively.
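A minimal sketch of the kind of RBF classifier RBF-SF builds on is shown below: k-means centers, Gaussian hidden-layer activations, and a linear output layer fit by least squares. The feature representation, the width heuristic, and the toy data are illustrative assumptions rather than the paper's configuration.

```python
# A minimal sketch of an RBF network classifier.
import numpy as np
from sklearn.cluster import KMeans

class RBFClassifier:
    def __init__(self, n_centers=10):
        self.n_centers = n_centers

    def _phi(self, X):
        # Gaussian activation of every sample against every center.
        d2 = ((X[:, None, :] - self.centers_[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * self.sigma_ ** 2))

    def fit(self, X, y):
        km = KMeans(n_clusters=self.n_centers, n_init=10, random_state=0).fit(X)
        self.centers_ = km.cluster_centers_
        # Width heuristic: average distance between distinct centers.
        dists = np.linalg.norm(self.centers_[:, None] - self.centers_[None, :], axis=-1)
        self.sigma_ = dists[dists > 0].mean() if self.n_centers > 1 else 1.0
        # Linear output weights fit by least squares on the hidden activations.
        Phi = self._phi(X)
        self.w_, *_ = np.linalg.lstsq(Phi, y.astype(float), rcond=None)
        return self

    def predict(self, X):
        return (self._phi(X) @ self.w_ > 0.5).astype(int)

# Toy usage: random "document vectors" from two noisy classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(3, 1, (50, 5))])
y = np.array([0] * 50 + [1] * 50)
print(RBFClassifier(n_centers=4).fit(X, y).predict(X[:3]))
```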
intelligent data analysis | 2009
Eric P. Jiang
Semi-supervised text classification has numerous applications and is particularly suited to problems where large quantities of unlabeled data are readily available while only a small number of labeled training samples are accessible. The paper proposes a semi-supervised classifier that integrates a clustering-based Expectation Maximization (EM) algorithm into radial basis function (RBF) neural networks and can learn to classify effectively from a very small number of labeled training samples and a large pool of unlabeled data. A generalized centroid clustering algorithm is also investigated in this work to balance predictive values between labeled and unlabeled training data and to improve classification accuracy. Experimental results with three popular text classification corpora show that the proper use of additional unlabeled data in this semi-supervised approach can reduce classification errors by up to 26%.
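A minimal sketch of a clustering-based EM loop in this semi-supervised spirit appears below: class centroids are seeded from the few labeled samples, unlabeled samples receive soft assignments in the E-step, and centroids are re-estimated with a down-weighting factor for unlabeled data in the M-step. The weight lambda_u and the toy data are illustrative assumptions, and the RBF layer trained on top of these assignments is omitted.

```python
# A minimal sketch of a centroid-based EM loop over labeled and unlabeled data.
import numpy as np

def semi_supervised_centroids(X_l, y_l, X_u, n_classes, lambda_u=0.3, n_iter=10):
    # Seed one centroid per class from the labeled samples.
    centroids = np.vstack([X_l[y_l == c].mean(axis=0) for c in range(n_classes)])
    for _ in range(n_iter):
        # E-step: soft responsibilities of unlabeled samples (softmax over -distance).
        d = ((X_u[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        resp = np.exp(-d)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: centroids from labeled data plus down-weighted unlabeled data.
        for c in range(n_classes):
            num = X_l[y_l == c].sum(axis=0) + lambda_u * (resp[:, c:c+1] * X_u).sum(axis=0)
            den = (y_l == c).sum() + lambda_u * resp[:, c].sum()
            centroids[c] = num / den
    return centroids

rng = np.random.default_rng(1)
X_l = np.array([[0., 0.], [4., 4.]]); y_l = np.array([0, 1])     # two labeled samples
X_u = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(4, 0.5, (30, 2))])
print(semi_supervised_centroids(X_l, y_l, X_u, n_classes=2))
```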
Advanced Data Mining and Applications | 2007
Darren Davis; Eric P. Jiang
This work addresses the problem of Web searching for pages relevant to a query URL. Based on an approach that uses a deep linkage analysis among vicinity pages, we investigate the Web page content structures and propose two new algorithms that integrate content and linkage analysis for more effective page relationship discovery and relevance ranking. A prototypical Web searching system has recently been implemented and experiments on the system have shown that the new content and linkage based searching methods deliver improved performance and are effective in identifying semantically relevant Web pages.
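As a hypothetical illustration of blending content and linkage evidence when ranking pages in the vicinity of a query URL, the sketch below combines cosine similarity on term vectors with the Jaccard overlap of outgoing link sets. The weights, toy pages, and link sets are assumptions for illustration, not the paper's algorithms.

```python
# A hypothetical blend of content similarity and link overlap for page ranking.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def link_overlap(links_a, links_b):
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 0.0

def relevance(query_vec, query_links, page_vec, page_links, alpha=0.6):
    # Weighted combination of content and linkage scores.
    return alpha * cosine(query_vec, page_vec) + (1 - alpha) * link_overlap(query_links, page_links)

# Toy query page and two candidate vicinity pages.
q_vec, q_links = np.array([1., 1., 0.]), {"a.com", "b.com", "c.com"}
pages = {
    "page1": (np.array([1., 0.8, 0.1]), {"a.com", "b.com"}),
    "page2": (np.array([0., 0.2, 1.0]), {"x.com"}),
}
ranked = sorted(pages, key=lambda p: relevance(q_vec, q_links, *pages[p]), reverse=True)
print(ranked)
```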
International Conference on Computer Science and Information Technology | 2010
Eric P. Jiang
The paper deals with the text classification problem where labeled training samples are very limited while unlabeled data are readily available in large quantities. The paper proposes an efficient classification algorithm that incorporates a weighted k-means clustering scheme into an Expectation Maximization (EM) process. It aims to balance predictive values between labeled and unlabeled training data and to improve classification accuracy. Since the algorithm is based on a fast clustering method, it can be applied to classify documents in large datasets. Preliminary experiments with several text classification collections show that the proper use of the unlabeled data built into the proposed algorithm can significantly improve classification accuracy.
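A minimal sketch of the weighted k-means step that such an EM process could fold in is given below: every sample carries a weight, labeled samples keep their given cluster, and unlabeled samples are reassigned to the nearest centroid at each iteration. The weights and toy data are illustrative assumptions.

```python
# A minimal sketch of weighted k-means with fixed assignments for labeled samples.
import numpy as np

def weighted_kmeans(X, weights, fixed_labels, k, n_iter=10):
    labels = fixed_labels.copy()
    rng = np.random.default_rng(0)
    # Unlabeled samples (marked -1) start with random cluster assignments.
    labels[labels < 0] = rng.integers(0, k, size=(labels < 0).sum())
    for _ in range(n_iter):
        # Weighted centroid of each cluster.
        centroids = np.vstack([
            np.average(X[labels == c], axis=0, weights=weights[labels == c])
            for c in range(k)
        ])
        # Reassign unlabeled samples to the nearest centroid; labeled samples stay fixed.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = np.where(fixed_labels >= 0, fixed_labels, d.argmin(axis=1))
    return labels, centroids

# Two labeled samples (weight 1.0) and three unlabeled samples (weight 0.2, label -1).
X = np.array([[0., 0.], [5., 5.], [0.2, 0.1], [4.8, 5.2], [0.1, 0.3]])
weights = np.array([1.0, 1.0, 0.2, 0.2, 0.2])
fixed = np.array([0, 1, -1, -1, -1])
labels, centroids = weighted_kmeans(X, weights, fixed, k=2)
print(labels)
```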
Asia Information Retrieval Symposium | 2008
Eric P. Jiang
Text classification is the problem of assigning a natural language document to one or more predefined categories based on its content. In this paper, we present an automatic text classification model based on Radial Basis Function (RBF) networks. It utilizes valuable discriminative information in training data and incorporates background knowledge in model learning. This approach can be particularly advantageous for applications where labeled training data are in short supply. The proposed model has been applied to classifying spam email, and experiments on benchmark spam testing corpora have shown that the model is effective in learning to classify documents based on content and represents a competitive alternative to well-known text classifiers such as naive Bayes and SVM.
Intelligent Information Systems | 2004
Eric P. Jiang
Latent semantic indexing (LSI) is a rank-reduced vector space model and has demonstrated improved retrieval performance over traditional lexical searching methods. By applying the singular value decomposition (SVD) to the original term-by-document space, LSI transforms individual terms into statistically derived conceptual indices and is capable of retrieving information based on semantic content. Recently, an updated LSI model, referred to as RSVD-LSI, has been proposed [5,6] for effective information retrieval. It updates LSI based on user feedback and can be formulated by a modified Riemannian SVD for a low-rank matrix. In this paper, a new efficient implementation of RSVD-LSI is described, and the applications and performance analysis of RSVD-LSI on dynamic document collections are discussed. The effectiveness of RSVD-LSI as a conceptual information retrieval technique is demonstrated by experiments on several document collections.
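The modified Riemannian SVD update itself is not reproduced here; as background for the dynamic-collection setting discussed above, the sketch below shows the standard folding-in step that maps new documents into an existing rank-k LSI space. The toy matrices and rank are illustrative assumptions.

```python
# A minimal sketch of folding new documents into an existing rank-k LSI space.
import numpy as np

def build_lsi(A, k):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T           # U_k, s_k, V_k (documents x k)

def fold_in(new_docs, U_k, s_k):
    """Project new term-by-document columns into the existing concept space."""
    return (new_docs.T @ U_k) / s_k               # one row per new document

# Toy term-by-document matrix and one incoming document.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 0.]])
U_k, s_k, V_k = build_lsi(A, k=2)
d_new = np.array([[1.], [1.], [0.]])
V_k = np.vstack([V_k, fold_in(d_new, U_k, s_k)])  # appended document representation
print(V_k)
```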