Publication


Featured research published by Eric P. Jiang.


Linear Algebra and its Applications | 2000

Solving total least-squares problems in information retrieval

Eric P. Jiang; Michael W. Berry

The singular value decomposition (SVD) is a well-known theoretical and numerical tool used in numerous scientific and engineering applications. Recently, an interesting nonlinear generalization of the SVD, referred to as the Riemannian SVD (R-SVD), has been proposed by De Moor for applications in systems and control. This decomposition can be modified and used to formulate an enhanced implementation of latent semantic indexing (LSI) for conceptual information retrieval. LSI is an SVD-based conceptual retrieval technique and employs a rank-reduced model of the original (sparse) term-by-document matrix. Updating LSI models based on user feedback can be accomplished using constraints modeled by the R-SVD of a low-rank approximation to the original term-by-document matrix. In this work, a new algorithm for computing the R-SVD is described. When used to update an LSI model, this R-SVD algorithm can be a highly effective information filtering technique. Experiments demonstrate that a 20% improvement (in retrieval) over the current LSI model is possible.


Lecture Notes in Computer Science | 1998

Information Filtering Using the Riemannian SVD (R-SVD)

Eric P. Jiang; Michael W. Berry

The Riemannian SVD (or R-SVD) is a recent nonlinear generalization of the SVD which has been used for specific applications in systems and control. This decomposition can be modified and used to formulate a filtering-based implementation of Latent Semantic Indexing (LSI) for conceptual information retrieval. With LSI, the underlying semantic structure of a collection is represented in k-dimensional space using a rank-k approximation to the corresponding (sparse) term-by-document matrix. Updating LSI models based on user feedback can be accomplished using constraints modeled by the R-SVD of a low-rank approximation to the original term-by-document matrix.
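
As a rough illustration of the rank-k representation mentioned in this abstract, the sketch below truncates the SVD of a tiny term-by-document matrix and scores documents against a query by cosine similarity in the reduced space. It covers plain LSI only, not the R-SVD update. The toy matrix, the function names lsi_fit and lsi_rank, the query fold-in convention, and the choice k = 2 are all assumptions made for the example.

```python
import numpy as np

def lsi_fit(A, k):
    """Return the rank-k factors (U_k, s_k, V_k^T) of a terms x docs matrix."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def lsi_rank(query, Uk, sk, Vtk):
    """Score documents against a term-space query in the k-dimensional space."""
    q = (query @ Uk) / sk            # fold the query in: q^T U_k S_k^{-1}
    docs = Vtk.T                     # row j holds the coordinates of document j
    sims = (docs @ q) / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
    return np.argsort(-sims), sims

# Toy data (assumed). Rows: matrix, svd, spam, email. Columns: documents 0..3.
A = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])

Uk, sk, Vtk = lsi_fit(A, k=2)
query = np.array([0.0, 1.0, 0.0, 0.0])   # the single term "svd"
order, sims = lsi_rank(query, Uk, sk, Vtk)
print(order, np.round(sims, 3))
```

In this toy collection, document 1 does not contain the term "svd" but shares "matrix" with document 0, so both should score well above the two spam-related documents. This is the kind of conceptual match that LSI is meant to capture.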


Numerical Linear Algebra With Applications | 2005

Lanczos and the Riemannian SVD in information retrieval applications

Ricardo D. Fierro; Eric P. Jiang

Variations of the latent semantic indexing (LSI) method in information retrieval (IR) require the computation of singular subspaces associated with the k dominant singular values of a large m × n sparse matrix A, where k ≪ min(m, n). The Riemannian SVD was recently generalized to low-rank matrices arising in IR and shown to be an effective approach for formulating an enhanced semantic model that captures the latent term-document structure of the data. However, in terms of storage and computation requirements, its implementation can be much improved for large-scale applications. We discuss an efficient and reliable algorithm, called SPK-RSVD-LSI, as an alternative approach for deriving the enhanced semantic model. The algorithm combines the generalized Riemannian SVD and the Lanczos method with full reorthogonalization and explicit restart strategies. We demonstrate that our approach performs as well as the original low-rank Riemannian SVD method by comparing their retrieval performance on a well-known benchmark document collection. Copyright © 2004 John Wiley & Sons, Ltd.
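
To make the Lanczos ingredient concrete, here is a minimal sketch of Golub-Kahan-Lanczos bidiagonalization with full reorthogonalization, applied to a small dense matrix. It is not SPK-RSVD-LSI: there is no Riemannian SVD step, no restart strategy, and no breakdown handling. The function name lanczos_bidiag, the random test matrix, and the seed are assumptions made for illustration.

```python
import numpy as np

def lanczos_bidiag(A, steps, seed=0):
    """Build U, B, V with A @ V = U @ B, where B is upper bidiagonal."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    U = np.zeros((m, steps))
    V = np.zeros((n, steps))
    alphas = np.zeros(steps)
    betas = np.zeros(steps - 1)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for j in range(steps):
        V[:, j] = v
        u = A @ v
        if j > 0:
            u -= betas[j - 1] * U[:, j - 1]
        u -= U[:, :j] @ (U[:, :j].T @ u)             # full reorthogonalization
        alphas[j] = np.linalg.norm(u)
        u /= alphas[j]
        U[:, j] = u
        if j < steps - 1:
            w = A.T @ u - alphas[j] * v
            w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)  # full reorthogonalization
            betas[j] = np.linalg.norm(w)
            v = w / betas[j]
    B = np.diag(alphas) + np.diag(betas, 1)
    return U, B, V

A = np.random.default_rng(1).standard_normal((8, 5))
U, B, V = lanczos_bidiag(A, steps=5)
# With steps equal to the number of columns, the singular values of the small
# matrix B agree with those of A up to rounding.
print(np.linalg.svd(B, compute_uv=False))
print(np.linalg.svd(A, compute_uv=False))
```

With fewer steps than columns, the largest singular values of B serve as approximations to the dominant singular values of A, which is why Lanczos-type methods suit the case k ≪ min(m, n). A production code would work with sparse matrix-vector products rather than a dense array.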


International Conference on Intelligent Computing | 2006

Learning to Semantically Classify Email Messages

Eric P. Jiang

As a semantic vector space model for information retrieval (IR), Latent Semantic Indexing (LSI) employs singular value decomposition (SVD) to transform individual documents into statistically derived semantic vectors. In this paper, a new junk email (spam) filtering model, 2LSI-SF, is proposed. It is based on augmented category LSI spaces and classifies email messages by their content. The model utilizes the valuable discriminative information in the training data and incorporates several pertinent feature selection and message classification algorithms. Experiments with 2LSI-SF on a benchmark spam testing corpus (PU1) and a newly compiled Chinese spam corpus (ZH1) have been conducted. The experimental results and a performance comparison with the popular Support Vector Machine (SVM) and naive Bayes classifiers show that 2LSI-SF is capable of filtering spam effectively.


International Journal of Knowledge-based and Intelligent Engineering Systems | 2007

Detecting spam email by radial basis function networks

Eric P. Jiang

Over the years, various spam email filtering technologies and anti-spam software products have been developed and deployed. Some are designed to stop spam email at the server level, and others apply machine learning algorithms at the client level to identify spam email based on message content. In this paper, a new spam filtering model, RBF-SF, is proposed that detects and classifies email messages by a radial basis function (RBF) network. The model utilizes the valuable email discriminative information from training data and can incorporate additional background email in its learning process. The empirical results of RBF-SF on two benchmark spam testing corpora and a performance comparison with several other popular text classifiers have shown that the model is capable of filtering spam email effectively.
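
For readers unfamiliar with RBF networks, the following sketch shows the general idea on toy data: a hidden layer of Gaussian units followed by a linear output layer fitted by least squares. It is not the RBF-SF model; the abstract does not specify its features, centers, or training procedure. Placing one center on every training sample, the width of 1.0, the two-dimensional feature vectors, and the function names are assumptions made for the example.

```python
import numpy as np

def rbf_features(X, centers, width):
    """Gaussian activations of each sample with respect to each center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def rbf_train(X, y, centers, width):
    """Fit the linear output weights by least squares."""
    H = rbf_features(X, centers, width)
    W, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W

def rbf_predict(X, centers, width, W):
    """Real-valued scores; the sign gives the predicted class."""
    return rbf_features(X, centers, width) @ W

# Toy feature vectors (assumed): -1 = legitimate, +1 = spam.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])

centers, width = X.copy(), 1.0   # one hidden unit per training sample
W = rbf_train(X, y, centers, width)

new_messages = np.array([[0.5, 0.5], [5.5, 5.5]])
scores = rbf_predict(new_messages, centers, width, W)
print(np.sign(scores))
```

In practice the centers are usually far fewer than the training samples and are often chosen by clustering, which keeps the hidden layer small for large corpora.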


Intelligent Data Analysis | 2009

Semi-supervised Text Classification Using RBF Networks

Eric P. Jiang

Semi-supervised text classification has numerous applications and is particularly suited to problems where large quantities of unlabeled data are readily available while only a small number of labeled training samples are accessible. The paper proposes a semi-supervised classifier that integrates a clustering-based Expectation Maximization (EM) algorithm into radial basis function (RBF) neural networks and can learn effectively from a very small number of labeled training samples and a large pool of unlabeled data. A generalized centroid clustering algorithm is also investigated in this work to balance predictive values between labeled and unlabeled training data and to improve classification accuracy. Experimental results with three popular text classification corpora show that the proper use of additional unlabeled data in this semi-supervised approach can reduce classification errors by up to 26%.


Advanced Data Mining and Applications | 2007

Exploring Content and Linkage Structures for Searching Relevant Web Pages

Darren Davis; Eric P. Jiang

This work addresses the problem of Web searching for pages relevant to a query URL. Based on an approach that uses a deep linkage analysis among vicinity pages, we investigate the Web page content structures and propose two new algorithms that integrate content and linkage analysis for more effective page relationship discovery and relevance ranking. A prototypical Web searching system has recently been implemented and experiments on the system have shown that the new content and linkage based searching methods deliver improved performance and are effective in identifying semantically relevant Web pages.


International Conference on Computer Science and Information Technology | 2010

Learning to integrate unlabeled data in text classification

Eric P. Jiang

The paper deals with the text classification problem where labeled training samples are very limited while unlabeled data are readily available in large quantities. The paper proposes an efficient classification algorithm that incorporates a weighted k-means clustering scheme into an Expectation Maximization (EM) process. It aims to balance predictive values between labeled and unlabeled training data and improve classification accuracy. Since the algorithm is based on a fast clustering method, it can be applied to classify documents in large datasets. Preliminary experiments with several text classification collections show that the proper use of unlabeled data in the proposed text classification algorithm could significantly improve classification accuracy.
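
The idea of combining weighted k-means with an EM-style loop can be sketched as follows. This is a simplified hard-assignment variant, not the paper's algorithm: class centroids are seeded from the labeled samples, unlabeled samples are assigned to the nearest centroid, and centroids are then recomputed with the unlabeled samples down-weighted. The function name, the weight of 0.5 for unlabeled data, and the toy points are assumptions made for illustration.

```python
import numpy as np

def semi_supervised_kmeans(X_lab, y_lab, X_unl, n_classes, unl_weight=0.5, n_iter=20):
    """Hard-EM loop: labeled points have weight 1, unlabeled points unl_weight."""
    # Seed one centroid per class from the labeled samples
    # (assumes every class has at least one labeled sample).
    centroids = np.stack([X_lab[y_lab == c].mean(axis=0) for c in range(n_classes)])
    assign = None
    for _ in range(n_iter):
        # E-step (hard): assign each unlabeled sample to its nearest centroid.
        d2 = ((X_unl[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_assign = d2.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break                                   # assignments are stable
        assign = new_assign
        # M-step: weighted mean of labeled and currently assigned unlabeled samples.
        for c in range(n_classes):
            lab = X_lab[y_lab == c]
            unl = X_unl[assign == c]
            total = lab.sum(axis=0) + unl_weight * unl.sum(axis=0)
            centroids[c] = total / (len(lab) + unl_weight * len(unl))
    return centroids, assign

# Toy data (assumed): one labeled sample per class, six unlabeled samples.
X_lab = np.array([[0.0, 0.0], [4.0, 4.0]])
y_lab = np.array([0, 1])
X_unl = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
                  [3.0, 4.0], [4.0, 3.0], [3.0, 3.0]])

centroids, assign = semi_supervised_kmeans(X_lab, y_lab, X_unl, n_classes=2)
print(centroids, assign)
```

The weight controls how strongly the unlabeled pool can pull a centroid away from the labeled evidence. Setting it to 0 ignores unlabeled data entirely, and setting it to 1 treats both kinds of sample equally.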


Asia Information Retrieval Symposium | 2008

Integrating background knowledge into RBF networks for text classification

Eric P. Jiang

Text classification is the task of assigning a natural language document to one or more predefined categories based on its content. In this paper, we present an automatic text classification model based on Radial Basis Function (RBF) networks. It utilizes valuable discriminative information in training data and incorporates background knowledge in model learning. This approach can be particularly advantageous for applications where labeled training data are in short supply. The proposed model has been applied to classifying spam email, and experiments on some benchmark spam testing corpora have shown that the model is effective in learning to classify documents based on content and represents a competitive alternative to well-known text classifiers such as naive Bayes and SVM.


Intelligent Information Systems | 2004

An Enhanced Semantic Indexing Implementation for Conceptual Information Retrieval

Eric P. Jiang

Latent semantic indexing (LSI) is a rank-reduced vector space model and has demonstrated improved retrieval performance over traditional lexical searching methods. By applying the singular value decomposition (SVD) to the original term-by-document space, LSI transforms individual terms into statistically derived conceptual indices and is capable of retrieving information based on semantic content. Recently, an updated LSI model, referred to as RSVD-LSI, has been proposed [5,6] for effective information retrieval. It updates LSI based on user feedback and can be formulated by a modified Riemannian SVD for a low-rank matrix. In this paper, a new efficient implementation of RSVD-LSI is described, and the applications and performance analysis of RSVD-LSI on dynamic document collections are discussed. The effectiveness of RSVD-LSI as a conceptual information retrieval technique is demonstrated by experiments on some document collections.

Collaboration


Dive into Eric P. Jiang's collaborations.

Top Co-Authors

Darren Davis

University of San Diego

Ricardo D. Fierro

California State University San Marcos
