Shuting Xu
Virginia State University
Publication
Featured research published by Shuting Xu.
Knowledge and Information Systems | 2006
Shuting Xu; Jun Zhang; Dianwei Han; Jie Wang
Privacy preservation is a major concern in the application of data mining techniques to datasets containing personal, sensitive, or confidential information. Data distortion is a critical component of privacy preservation in security-related data mining applications, such as data mining-based terrorist analysis systems. We propose a sparsified Singular Value Decomposition (SVD) method for data distortion. We also put forth a few metrics to measure the difference between the distorted dataset and the original dataset, as well as the degree of privacy protection. Our experimental results on synthetic and real-world datasets show that the sparsified SVD method works well in preserving privacy while maintaining the utility of the datasets.
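As a rough illustration of the idea in this abstract, the sketch below distorts a data matrix with a truncated SVD whose factors are then sparsified. The thresholding rule (zeroing small-magnitude entries of the truncated factors), the relative Frobenius-norm difference metric, and the names `sparsified_svd`, `relative_difference`, `rank`, and `threshold` are assumptions made for this example; the abstract does not spell out the exact procedure or metrics.

```python
import numpy as np

def sparsified_svd(A, rank, threshold):
    """Rank-`rank` SVD approximation with small factor entries set to zero."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vtk = U[:, :rank].copy(), s[:rank], Vt[:rank, :].copy()
    # Assumed sparsification rule: drop factor entries below the threshold.
    Uk[np.abs(Uk) < threshold] = 0.0
    Vtk[np.abs(Vtk) < threshold] = 0.0
    return Uk @ np.diag(sk) @ Vtk

def relative_difference(A, A_distorted):
    """Assumed distortion metric: relative Frobenius-norm difference."""
    return np.linalg.norm(A - A_distorted) / np.linalg.norm(A)

# Synthetic data standing in for a real dataset (rows = records, columns = attributes).
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 8))
A_bar = sparsified_svd(A, rank=4, threshold=0.05)
vd = relative_difference(A, A_bar)
print(f"relative difference between original and distorted data: {vd:.3f}")
```

A lower `rank` or a higher `threshold` distorts the data more; how that trades off against mining utility is the question the paper studies experimentally.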
ACM Southeast Regional Conference | 2007
Beibei Li; Shuting Xu; Jun Zhang
Blogs are a new internet phenomenon and a vast, ever-increasing information resource. Mining blog files for information is a very new research direction in data mining. Blog files are different from standard web files and may need specialized mining strategies. We propose to include the title, body, and comments of blog pages when building clustering datasets from blog documents. In particular, we argue that the author/reader comments of blog pages may have a stronger discriminating effect in clustering blog documents. We constructed a word-page matrix by downloading blog pages from a well-known website and experimented with a k-means clustering algorithm, assigning different weights to the title, body, and comment parts. Our experimental results show that assigning a larger weight to the blog comments helps the k-means algorithm produce better clustering solutions. These results confirm our hypothesis that the author/reader comments of blog files are very useful in discriminating blog files.
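The weighting scheme described in this abstract can be sketched as follows: each page's term counts are scaled by a per-section weight before clustering. The toy pages, the specific weights, the whitespace tokenizer, and the helper names `weighted_term_vector` and `kmeans` are all hypothetical; the paper's actual preprocessing and weight values are not given here.

```python
from collections import Counter

def weighted_term_vector(page, weights, vocab):
    """Combine per-section term counts into one vector using section weights."""
    totals = Counter()
    for section, weight in weights.items():
        for term in page.get(section, "").lower().split():
            totals[term] += weight
    return [totals[term] for term in vocab]

def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means; the first k vectors serve as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: mean of the members (an empty cluster keeps its centroid).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical pages: titles and bodies are generic, comments carry the topic.
pages = [
    {"title": "weekend notes", "body": "daily post", "comments": "great recipe pasta recipe"},
    {"title": "weekend notes", "body": "daily post", "comments": "soccer match soccer goal"},
    {"title": "my diary", "body": "daily post", "comments": "pasta sauce recipe"},
    {"title": "my diary", "body": "daily post", "comments": "goal soccer league"},
]
weights = {"title": 1, "body": 1, "comments": 3}  # assumed weights favoring comments
vocab = sorted({t for p in pages for s in p.values() for t in s.lower().split()})
vectors = [weighted_term_vector(p, weights, vocab) for p in pages]
labels = kmeans(vectors, k=2)
print(labels)
```

On this toy data the comment-weighted vectors group the pages by the topic of their comments rather than by their shared titles. A real experiment would use a larger corpus, stop-word removal, and a more careful centroid initialization.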
Intelligence and Security Informatics | 2005
Shuting Xu; Jun Zhang; Dianwei Han; Jie Wang
Data distortion is a critical component of privacy preservation in security-related data mining applications, such as data mining-based terrorist analysis systems. We propose a sparsified Singular Value Decomposition (SVD) method for data distortion. We also put forth a few metrics to measure the difference between the distorted dataset and the original dataset. Our experimental results on synthetic and real-world datasets show that the sparsified SVD method works well in preserving privacy while maintaining the utility of the datasets.
International Journal of Information and Computer Security | 2008
Jie Wang; Jun Zhang; Shuting Xu; Weijun Zhong
Data privacy preservation has become one of the major concerns in the design of practical data-mining applications. In this paper, a novel data distortion approach based on structural partition and the Sparsified Singular Value Decomposition (SSVD) technique is proposed. Three schemes are designed to balance privacy protection in centralised datasets against mining accuracy. Several metrics are used to evaluate the performance of the proposed strategies. The data utility of the three proposed schemes is examined by binary classification with a support vector machine. Furthermore, we examine three sparsification strategies. The effect of method parameters on data distortion level and utility is also studied experimentally. Our experimental results on synthetic and real datasets indicate that, in comparison with standard data distortion techniques, the proposed schemes are efficient in balancing data distortion level and data utility. They afford a feasible solution that shows good promise for mining accuracy and a significant reduction in the computational cost of SVD.
ACM Southeast Regional Conference | 2008
Dianwei Han; Shuting Xu; Jun Zhang
The condition number of a matrix is an important measure in numerical analysis and linear algebra. It measures the stability or sensitivity of a matrix to numerical operations. However, the direct computation of the condition number is very expensive in terms of CPU and memory cost, and becomes prohibitive for large matrices. We propose to use data mining techniques to estimate the condition number of a given sparse matrix. In particular, we use a Support Vector Machine (SVM) to predict condition numbers. That is, after computing the sparsity pattern features of a matrix, we use support vector regression (SVR) to predict its condition number. This Online Condition Number Query System (OCNQS) allows users to submit their matrices and obtain predicted condition numbers for them. Our prediction methods may not be as accurate as direct computation methods, but they are much faster. Our online system accepts matrices in Harwell-Boeing (HB) format and in standard MATLAB format. Users can also use our system to estimate the condition number of their matrices through LAPACK software.
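To make the notion of sensitivity concrete, the sketch below computes the 2-norm condition number of a small matrix directly from its singular values and shows how a tiny change in the right-hand side changes the solution. This is the expensive direct computation the abstract contrasts against, not the SVR-based predictor the paper proposes; the example matrix and vectors are made up for illustration.

```python
import numpy as np

# A nearly singular 2x2 matrix (hypothetical example).
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])

# 2-norm condition number: ratio of largest to smallest singular value.
singular_values = np.linalg.svd(A, compute_uv=False)
kappa = singular_values[0] / singular_values[-1]

# Solve A x = b for two right-hand sides that differ only slightly.
b = np.array([2.0, 2.0001])
b_perturbed = np.array([2.0, 2.0002])
x = np.linalg.solve(A, b)
x_perturbed = np.linalg.solve(A, b_perturbed)

rel_change_b = np.linalg.norm(b_perturbed - b) / np.linalg.norm(b)
rel_change_x = np.linalg.norm(x_perturbed - x) / np.linalg.norm(x)

print(f"condition number: {kappa:.3e}")
print(f"relative change in b: {rel_change_b:.3e}")
print(f"relative change in x: {rel_change_x:.3e}")
```

The relative change in the solution is bounded by the condition number times the relative change in the right-hand side, which is why a large condition number signals an unstable problem. The full SVD used here costs on the order of n^3 operations for a dense n-by-n matrix, which is the cost that motivates a cheaper estimate.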
ACM Southeast Regional Conference | 2008
Dianwei Han; Shuting Xu; Jun Zhang
Solving very large sparse linear systems is often required in scientific and engineering applications. Generally, two classes of methods are available for solving sparse linear systems. The first class is the direct solution methods, represented by Gaussian elimination. The second class is the iterative solution methods, of which the preconditioned Krylov subspace methods are considered the most effective currently available. The sparsity structure and the numerical value distribution, which we treat as features of a sparse matrix, may have an important effect on the iterative solution of linear systems. We first extract the matrix features and then apply preconditioned iterative methods to the linear system. Our experiments show that a few features may affect, positively or negatively, the solving status of a sparse matrix with level-based preconditioners.
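The two-stage workflow in this abstract (extract matrix features, then run a preconditioned Krylov method) can be sketched as below. Several things here are assumptions rather than the paper's setup: the feature set in `matrix_features` is hypothetical, the solver is conjugate gradients with a simple Jacobi (diagonal) preconditioner swapped in for the level-based preconditioners the paper studies, and the test matrix is a small dense array standing in for a large sparse one.

```python
import numpy as np

def matrix_features(A):
    """A few illustrative structure and value features (hypothetical selection)."""
    rows, cols = np.nonzero(A)
    diag = np.abs(np.diag(A))
    off_diag = np.sum(np.abs(A), axis=1) - diag
    return {
        "nnz": int(rows.size),
        "density": rows.size / A.size,
        "bandwidth": int(np.max(np.abs(rows - cols))),
        "diag_dominant_fraction": float(np.mean(diag >= off_diag)),
    }

def preconditioned_cg(A, b, inv_diag, tol=1e-10, max_iter=1000):
    """Conjugate gradients with a Jacobi preconditioner (A must be SPD)."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    z = inv_diag * r          # apply the preconditioner: z = M^{-1} r
    p = z.copy()
    rz = r @ z
    for iteration in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, iteration
        z = inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Symmetric positive definite tridiagonal test matrix (made-up example).
n = 50
A = np.diag(np.linspace(2.0, 200.0, n)) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

features = matrix_features(A)
x, iterations = preconditioned_cg(A, b, 1.0 / np.diag(A))
print(features)
print(f"converged in {iterations} iterations")
```

In a study like the one described, feature dictionaries of this kind would be recorded alongside the solver outcome for many matrices, so that the features associated with success or failure can be identified. Production code would keep the matrix in a sparse storage format instead of a dense array.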
IKE | 2006
Jie Wang; Weijun Zhong; Jun Zhang; Shuting Xu
Encyclopedia of Data Warehousing and Mining | 2009
Jun Zhang; Jie Wang; Shuting Xu
Archive | 2007
Jun Zhang; Jie Wang; Shuting Xu
SIAM International Conference on Data Mining | 2005
Shuting Xu; Jun Zhang