Shuting Xu | Researchain

Publication

Featured research published by Shuting Xu.

Knowledge and Information Systems | 2006

Singular value decomposition based data distortion strategy for privacy protection

Shuting Xu; Jun Zhang; Dianwei Han; Jie Wang

Privacy preservation is a major concern when data mining techniques are applied to datasets containing personal, sensitive, or confidential information. Data distortion is a critical component of privacy preservation in security-related data mining applications, such as data-mining-based terrorist analysis systems. We propose a sparsified Singular Value Decomposition (SVD) method for data distortion. We also put forth several metrics to measure the difference between the distorted and original datasets and the degree of privacy protection. Our experimental results on synthetic and real-world datasets show that the sparsified SVD method preserves privacy well while maintaining the utility of the datasets.
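
The abstract does not give the exact sparsification rule, so the following is only a minimal sketch of SVD-based distortion under stated assumptions: keep the top k singular triplets, then zero every entry of U_k and V_k whose magnitude is below a threshold eps. The function name and both parameters are illustrative, not taken from the paper.

```python
import numpy as np

def sparsified_svd_distort(A, k, eps):
    """Return a distorted copy of data matrix A (rows = records, cols = attributes).

    Assumptions (illustrative, not the paper's exact procedure):
    - k is the number of singular triplets kept (rank-k approximation);
    - entries of U_k and V_k with absolute value below eps are set to zero.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vtk = U[:, :k].copy(), s[:k], Vt[:k, :].copy()
    # Sparsify: drop small entries of the singular-vector matrices.
    Uk[np.abs(Uk) < eps] = 0.0
    Vtk[np.abs(Vtk) < eps] = 0.0
    # Rebuild a matrix of the original shape from the truncated, sparsified factors.
    return Uk @ np.diag(sk) @ Vtk
```

With k equal to the full rank and eps = 0 the data come back unchanged; lowering k or raising eps increases distortion (more privacy) at some cost in utility.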

ACM Southeast Regional Conference | 2007

Enhancing clustering blog documents by utilizing author/reader comments

Beibei Li; Shuting Xu; Jun Zhang

Blogs are a new internet phenomenon and a vast, ever-increasing information resource. Mining blog files for information is a very new research direction in data mining. Blog files differ from standard web files and may need specialized mining strategies. We propose including the title, body, and comments of blog pages when clustering datasets of blog documents. In particular, we argue that the author/reader comments on blog pages may have a stronger discriminating effect in clustering blog documents. We constructed a word-page matrix from blog pages downloaded from a well-known website and experimented with a k-means clustering algorithm, assigning different weights to the title, body, and comment parts. Our experimental results show that assigning a larger weight to the blog comments helps the k-means algorithm produce better clustering solutions, confirming our hypothesis that the author/reader comments are very useful in discriminating blog files.
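
The weighting idea can be sketched as follows, assuming each blog page is split into three term-frequency matrices (title, body, comments) over a shared vocabulary that are combined linearly before clustering. The weights, the row normalization, and the plain Lloyd-style k-means below are illustrative choices, not the authors' code.

```python
import numpy as np

def weighted_matrix(title, body, comments, w_title, w_body, w_comment):
    """Combine per-part term-frequency matrices (pages x vocabulary) with weights."""
    M = w_title * title + w_body * body + w_comment * comments
    # L2-normalize each page vector so long pages do not dominate the distances.
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return M / norms

def kmeans(X, k, iters=100, seed=0):
    """Basic k-means: random initial centers, then alternate assign/update steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each page to its nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its pages; keep it if the cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

In a toy case where titles and bodies are identical across pages and only the comments differ, raising w_comment is what lets k-means separate the pages.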

Intelligence and Security Informatics | 2005

Data distortion for privacy protection in a terrorist analysis system

Shuting Xu; Jun Zhang; Dianwei Han; Jie Wang

Data distortion is a critical component of privacy preservation in security-related data mining applications, such as data-mining-based terrorist analysis systems. We propose a sparsified Singular Value Decomposition (SVD) method for data distortion. We also put forth several metrics to measure the difference between the distorted and original datasets. Our experimental results on synthetic and real-world datasets show that the sparsified SVD method preserves privacy well while maintaining the utility of the datasets.
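
The abstract does not define its metrics, so the two measures below are only plausible examples of the kind it describes, with hypothetical names: a relative Frobenius-norm value difference, and the fraction of entries whose rank within their own attribute (column) survives distortion.

```python
import numpy as np

def value_difference(A, A_bar):
    """Relative change ||A - A_bar||_F / ||A||_F; larger means more distortion."""
    return np.linalg.norm(A - A_bar, "fro") / np.linalg.norm(A, "fro")

def rank_maintenance(A, A_bar):
    """Fraction of entries whose rank within their column is unchanged.

    Lower values mean the ordering of attribute values is better hidden.
    """
    # argsort of argsort gives each entry's rank position within its column.
    rank_orig = A.argsort(axis=0).argsort(axis=0)
    rank_dist = A_bar.argsort(axis=0).argsort(axis=0)
    return float(np.mean(rank_orig == rank_dist))
```

High value difference and low rank maintenance indicate stronger protection; utility is then checked separately, for example by mining accuracy on the distorted data.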

International Journal of Information and Computer Security | 2008

A novel data distortion approach via selective SSVD for privacy protection

Jie Wang; Jun Zhang; Shuting Xu; Weijun Zhong

Data privacy preservation has become one of the major concerns in the design of practical data-mining applications. In this paper, a novel data distortion approach based on structural partition and the Sparsified Singular Value Decomposition (SSVD) technique is proposed. Three schemes are designed to balance privacy protection in centralised datasets against mining accuracy. Several metrics are used to evaluate the performance of the proposed strategies. The data utility of the three schemes is examined through binary classification with a support vector machine. We also examine three sparsification strategies and study experimentally how the method parameters affect the data distortion level and data utility. Our experimental results on synthetic and real datasets indicate that, compared with standard data distortion techniques, the proposed schemes balance data distortion level and data utility efficiently. They offer a feasible solution that promises good mining accuracy and significantly reduces the computational cost of SVD.
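
The abstract does not spell out the partition, so this is only a sketch under an explicit assumption: the attributes (columns) are split into blocks, and SSVD is applied only to a selected subset of blocks, for example the sensitive ones. The names `col_blocks` and `selected` are hypothetical, and the sparsification rule (zero singular-vector entries below eps) is an assumption as well.

```python
import numpy as np

def ssvd_block(B, k, eps):
    """Rank-k SVD of one block with small singular-vector entries zeroed (assumed rule)."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    k = min(k, len(s))
    Uk, Vtk = U[:, :k].copy(), Vt[:k, :].copy()
    Uk[np.abs(Uk) < eps] = 0.0
    Vtk[np.abs(Vtk) < eps] = 0.0
    return Uk @ np.diag(s[:k]) @ Vtk

def selective_ssvd(A, col_blocks, selected, k, eps):
    """Distort only the selected column blocks of A (hypothetical scheme).

    col_blocks: list of column-index arrays that partition A's attributes.
    selected:   indices into col_blocks naming the blocks to distort.
    """
    A_bar = A.astype(float).copy()
    for b in selected:
        cols = col_blocks[b]
        A_bar[:, cols] = ssvd_block(A[:, cols], k, eps)
    return A_bar
```

One reason partitioning can cut cost: a thin SVD of an m x n matrix (m >= n) costs roughly O(mn^2), so splitting the n columns into p equal blocks brings the total to roughly O(mn^2 / p), and blocks that are not selected cost nothing.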

ACM Southeast Regional Conference | 2008

An online condition number query system

Dianwei Han; Shuting Xu; Jun Zhang

The condition number of a matrix is an important measure in numerical analysis and linear algebra: it measures the stability, or sensitivity, of a matrix under numerical operations. However, computing the condition number directly is very expensive in CPU time and memory, and becomes prohibitive for large matrices. We propose using data mining techniques to estimate the condition number of a given sparse matrix. Specifically, we use a Support Vector Machine (SVM): after computing the sparsity pattern features of a matrix, we use support vector regression (SVR) to predict its condition number. The Online Condition Number Query System (OCNQS) lets users submit their matrices and obtain predicted condition numbers for them. Our prediction methods may not be as precise as direct computation, but they are much faster. The online system accepts matrices in Harwell-Boeing (HB) format and in standard MATLAB format. Users can also estimate the condition number of their matrices through the LAPACK software.
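
To make the pipeline concrete, here is a sketch of its two data-preparation steps: computing a few sparsity-pattern features (an illustrative set, not the system's actual feature list) and the expensive exact label a regressor such as SVR would learn to predict. Predicting log10 of the condition number rather than the raw value is an assumption that keeps the target range manageable; the regression step itself is omitted.

```python
import numpy as np

def sparsity_features(A):
    """A few sparsity-pattern features of a square matrix (illustrative choice)."""
    n = A.shape[0]
    nz = A != 0
    nnz = int(nz.sum())
    rows, cols = np.nonzero(nz)
    bandwidth = int(np.max(np.abs(rows - cols))) if nnz else 0
    return {
        "n": n,
        "nnz": nnz,
        "density": nnz / (n * n),
        "max_row_nnz": int(nz.sum(axis=1).max()),
        "bandwidth": bandwidth,
        "zero_diagonals": int(n - np.count_nonzero(np.diag(A))),
    }

def exact_log_condition(A):
    """Training label: log10 of the 2-norm condition number (the costly direct route)."""
    return float(np.log10(np.linalg.cond(A)))
```

The features cost only a pass over the nonzeros, whereas `np.linalg.cond` computes a full SVD; that gap is what makes a learned predictor attractive for large matrices.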

ACM Southeast Regional Conference | 2008

The relationship between the features of sparse matrix and the matrix solving status

Dianwei Han; Shuting Xu; Jun Zhang

Very large sparse linear systems are often encountered in scientific and engineering applications. Generally, two classes of methods are available for solving sparse linear systems. The first class is direct solution methods, represented by Gaussian elimination. The second class is iterative solution methods, of which preconditioned Krylov subspace methods are considered the most effective currently available. The sparsity structure and the numerical value distribution, which are regarded as features of a sparse matrix, may have an important effect on the iterative solution of the linear system. We first extract the matrix features and then apply preconditioned iterative methods to the linear systems. Our experiments show that a few features may affect, positively or negatively, the solving status of a sparse matrix with level-based preconditioners.
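
The abstract does not name its level-based preconditioner; ILU(p), incomplete LU with level-of-fill p, is the standard example, so the sketch below assumes it. It shows how the sparsity structure alone determines which fill-in entries ILU(p) keeps, using the usual rule lev(i,j) = min(lev(i,j), lev(i,k) + lev(k,j) + 1). It works on a small dense pattern for clarity, not on a real sparse data structure.

```python
def ilu_levels(pattern, p):
    """Symbolic ILU(p): return the set of (i, j) positions kept at level-of-fill p.

    pattern: n x n nested list, truthy where a_ij != 0.
    Original nonzeros start at level 0, structural zeros at infinity.
    """
    INF = float("inf")
    n = len(pattern)
    lev = [[0 if pattern[i][j] else INF for j in range(n)] for i in range(n)]
    # IKJ ordering: row i is updated by earlier, already finished rows k < i.
    for i in range(1, n):
        for k in range(i):
            if lev[i][k] > p:
                continue  # entry (i, k) is dropped, so it causes no fill
            for j in range(k + 1, n):
                lev[i][j] = min(lev[i][j], lev[i][k] + lev[k][j] + 1)
    return {(i, j) for i in range(n) for j in range(n) if lev[i][j] <= p}
```

For a tridiagonal pattern no fill appears at any level, while an arrowhead pattern with a full first row and column keeps only its original nonzeros under ILU(0) but fills completely under ILU(1); this is one way structural features can change a preconditioner's cost and quality.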

IKE | 2006