Jimeng Sun | Researchain

Publication

Featured research published by Jimeng Sun.

knowledge discovery and data mining | 2007

GraphScope: parameter-free mining of large time-evolving graphs

Jimeng Sun; Christos Faloutsos; Spiros Papadimitriou; Philip S. Yu

How can we find communities in dynamic networks of social interactions, such as who calls whom, who emails whom, or who sells to whom? How can we spot discontinuity time-points in such streams of graphs, in an on-line, any-time fashion? We propose GraphScope, which addresses both problems using information-theoretic principles. Contrary to the majority of earlier methods, it needs no user-defined parameters. Moreover, it is designed to operate on large graphs, in a streaming fashion. We demonstrate the efficiency and effectiveness of GraphScope on real datasets from several diverse domains. In all cases it produces meaningful time-evolving patterns that agree with human intuition.

knowledge discovery and data mining | 2006

Beyond streams and graphs: dynamic tensor analysis

Jimeng Sun; Dacheng Tao; Christos Faloutsos

How do we find patterns in author-keyword associations, evolving over time? Or in Data Cubes, with product-branch-customer sales information? Matrix decompositions, like principal component analysis (PCA) and variants, are invaluable tools for mining, dimensionality reduction, feature selection, rule identification in numerous settings like streaming data, text, graphs, social networks and many more. However, they have only two orders, like author and keyword, in the above example. We propose to envision such higher order data as tensors, and tap the vast literature on the topic. However, these methods do not necessarily scale up, let alone operate on semi-infinite streams. Thus, we introduce the dynamic tensor analysis (DTA) method, and its variants. DTA provides a compact summary for high-order and high-dimensional data, and it also reveals the hidden correlations. Algorithmically, we designed DTA very carefully so that it is (a) scalable, (b) space efficient (it does not need to store the past) and (c) fully automatic with no need for user-defined parameters. Moreover, we propose STA, a streaming tensor analysis method, which provides a fast, streaming approximation to DTA. We implemented all our methods, and applied them in two real settings, namely, anomaly detection and multi-way latent semantic indexing. We used two real, large datasets, one on network flow data (100GB over 1 month) and one from DBLP (200MB over 25 years). Our experiments show that our methods are fast, accurate and that they find interesting patterns and outliers on the real datasets.
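
To make the idea of an incrementally maintained tensor summary concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the class name `IncrementalTensorSummary`, the fixed `ranks`, and the `forgetting` factor are assumptions chosen for illustration (the abstract describes DTA as needing no user-defined parameters). The sketch keeps one covariance matrix per mode, updates it from each incoming tensor without storing past tensors, and projects a tensor onto the leading eigenvectors of each mode to obtain a small core tensor.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize a tensor: rows index the chosen mode, columns index everything else."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

class IncrementalTensorSummary:
    """Hypothetical helper: per-mode covariance matrices updated one tensor at a time."""

    def __init__(self, shape, ranks, forgetting=0.98):
        self.ranks = ranks
        self.forgetting = forgetting  # assumed decay applied to older data
        self.cov = [np.zeros((n, n)) for n in shape]
        self.bases = [np.eye(n)[:, :r] for n, r in zip(shape, ranks)]

    def update(self, tensor):
        """Fold a new tensor into each mode's covariance and refresh its basis."""
        for mode, rank in enumerate(self.ranks):
            mat = unfold(tensor, mode)
            self.cov[mode] = self.forgetting * self.cov[mode] + mat @ mat.T
            _, vecs = np.linalg.eigh(self.cov[mode])   # eigenvalues in ascending order
            self.bases[mode] = vecs[:, ::-1][:, :rank]  # keep the leading eigenvectors

    def project(self, tensor):
        """Multiply every mode by the transpose of its basis to get the core tensor."""
        core = tensor
        for mode, basis in enumerate(self.bases):
            core = np.moveaxis(np.tensordot(basis.T, core, axes=(1, mode)), 0, mode)
        return core

# Toy stream: scaled copies of one rank-1 tensor (synthetic data, for illustration only).
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=4), rng.normal(size=5), rng.normal(size=3)
base = np.einsum("i,j,k->ijk", a, b, c)

summary = IncrementalTensorSummary(shape=(4, 5, 3), ranks=(1, 1, 1))
for step in range(10):
    summary.update((1 + 0.1 * step) * base)

core = summary.project(base)
alignment = abs(summary.bases[0][:, 0] @ a) / np.linalg.norm(a)
```

On this toy stream the leading eigenvector of the first mode lines up with the generating vector `a`, and the 1x1x1 core carries the whole magnitude of the tensor, which is the sense in which the per-mode bases act as a compact summary.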

international conference on data mining | 2005

Neighborhood formation and anomaly detection in bipartite graphs

Jimeng Sun; Huiming Qu; Deepayan Chakrabarti; Christos Faloutsos

Many real applications can be modeled using bipartite graphs, such as users vs. files in a P2P system, traders vs. stocks in a financial trading system, conferences vs. authors in a scientific publication network, and so on. We introduce two operations on bipartite graphs: 1) identifying similar nodes (Neighborhood formation), and 2) finding abnormal nodes (Anomaly detection). We propose algorithms to compute the neighborhood for each node using random walk with restarts and graph partitioning; we also propose algorithms to identify abnormal nodes, using neighborhood information. We evaluate the quality of neighborhoods based on semantics of the datasets, and we also measure the performance of the anomaly detection algorithm with manually injected anomalies. Both effectiveness and efficiency of the methods are confirmed by experiments on several real datasets.
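
Random walk with restarts, the building block named in the abstract, can be sketched as a simple power iteration. The version below is a generic illustration rather than the paper's algorithm (it omits the graph partitioning step); the function name `rwr_scores`, the restart probability of 0.15, and the toy users-vs-files matrix are all assumptions.

```python
import numpy as np

def rwr_scores(biadj, query_row, restart=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart from one row node of a bipartite graph.

    biadj is an (n_rows x n_cols) biadjacency matrix. Returns the steady-state
    visiting probabilities of the row nodes and of the column nodes.
    """
    biadj = np.asarray(biadj, dtype=float)
    n_rows, n_cols = biadj.shape
    n = n_rows + n_cols
    # Symmetric adjacency matrix over row nodes followed by column nodes.
    adj = np.zeros((n, n))
    adj[:n_rows, n_rows:] = biadj
    adj[n_rows:, :n_rows] = biadj.T
    # Column-normalise so that each column is a distribution over next steps.
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes keep an all-zero column
    trans = adj / col_sums
    restart_vec = np.zeros(n)
    restart_vec[query_row] = 1.0
    scores = restart_vec.copy()
    for _ in range(max_iter):
        updated = (1 - restart) * (trans @ scores) + restart * restart_vec
        converged = np.abs(updated - scores).sum() < tol
        scores = updated
        if converged:
            break
    return scores[:n_rows], scores[n_rows:]

# Toy graph: users 0-1 share files 0-1, users 2-3 share files 2-3,
# and user 1 also touches file 2, which links the two groups.
biadj = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])
user_scores, file_scores = rwr_scores(biadj, query_row=0)
```

With user 0 as the query, user 1 receives a higher score than users 2 and 3, which matches the intuition that nodes sharing many neighbors with the query belong to its neighborhood.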

IEEE Transactions on Circuits and Systems for Video Technology | 2008

Bayesian Tensor Approach for 3-D Face Modeling

Dacheng Tao; Mingli Song; Xuelong Li; Jialie Shen; Jimeng Sun; Xindong Wu; Christos Faloutsos; Stephen J. Maybank

Effectively modeling a collection of three-dimensional (3-D) faces is an important task in various applications, especially facial expression-driven ones, e.g., expression generation, retargeting, and synthesis. These 3-D faces naturally form a set of second-order tensors, with one modality for identity and the other for expression. The number of these second-order tensors is three times the number of vertices used for 3-D face modeling. As for algorithms, Bayesian data modeling, which is a natural data analysis tool, has been widely applied with great success; however, it works only for vector data. Therefore, there is a gap between tensor-based representation and vector-based data analysis tools. Aiming at bridging this gap and generalizing conventional statistical tools over tensors, this paper proposes a decoupled probabilistic algorithm, which is named Bayesian tensor analysis (BTA). Theoretically, BTA can automatically and suitably determine dimensionality for different modalities of tensor data. With BTA, a collection of 3-D faces can be well modeled. Empirical studies on expression retargeting also justify the advantages of BTA.

international conference on data engineering | 2004

Querying about the past, the present, and the future in spatio-temporal databases

Jimeng Sun; Dimitris Papadias; Yufei Tao; Bin Liu

Moving objects (e.g., vehicles in road networks) continuously generate large amounts of spatio-temporal information in the form of data streams. Efficient management of such streams is a challenging goal due to the highly dynamic nature of the data and the need for fast, online computations. We present a novel approach for approximate query processing about the present, past, or the future in spatio-temporal databases. In particular, we first propose an incrementally updateable, multidimensional histogram for present-time queries. Second, we develop a general architecture for maintaining and querying historical data. Third, we implement a stochastic approach for predicting the results of queries that refer to the future. Finally, we demonstrate experimentally the effectiveness and efficiency of our techniques using a realistic simulation.
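
As a rough illustration of how a histogram can answer present-time window queries approximately while being updated as objects move, here is a small Python sketch. It uses a fixed uniform grid, which is a deliberate simplification and not the histogram structure of the paper; the class `GridHistogram` and its methods are hypothetical names.

```python
class GridHistogram:
    """Fixed-grid count histogram over the square [0, extent) x [0, extent)."""

    def __init__(self, extent, cells):
        self.cells = cells
        self.size = float(extent) / cells  # side length of one grid cell
        self.counts = [[0] * cells for _ in range(cells)]

    def _cell(self, x, y):
        col = min(int(x / self.size), self.cells - 1)
        row = min(int(y / self.size), self.cells - 1)
        return row, col

    def insert(self, x, y):
        row, col = self._cell(x, y)
        self.counts[row][col] += 1

    def delete(self, x, y):
        row, col = self._cell(x, y)
        self.counts[row][col] -= 1

    def move(self, old, new):
        """Incremental update when an object reports a new location."""
        self.delete(*old)
        self.insert(*new)

    def estimate(self, x_lo, y_lo, x_hi, y_hi):
        """Estimated object count in a window, assuming uniformity inside each cell."""
        total = 0.0
        for row in range(self.cells):
            for col in range(self.cells):
                if self.counts[row][col] == 0:
                    continue
                cx, cy = col * self.size, row * self.size
                overlap_x = max(0.0, min(x_hi, cx + self.size) - max(x_lo, cx))
                overlap_y = max(0.0, min(y_hi, cy + self.size) - max(y_lo, cy))
                total += self.counts[row][col] * overlap_x * overlap_y / self.size ** 2
        return total

hist = GridHistogram(extent=100, cells=4)  # 25 x 25 cells
for point in [(10, 10), (12, 20), (60, 60)]:
    hist.insert(*point)

before_move = hist.estimate(0, 0, 25, 25)   # window covers exactly one cell
hist.move((10, 10), (80, 80))
after_move = hist.estimate(0, 0, 25, 25)
half_cell = hist.estimate(0, 0, 12.5, 25)   # partial overlap scales the cell count
```

Each update touches only two cells, so the cost of maintaining the summary does not depend on the number of objects. The estimate for a partially covered cell is only as good as the uniformity assumption inside that cell.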

international conference on data mining | 2006

Local Correlation Tracking in Time Series

Spiros Papadimitriou; Jimeng Sun; Philip S. Yu

We address the problem of capturing and tracking local correlations among time evolving time series. Our approach is based on comparing the local auto-covariance matrices (via their spectral decompositions) of each series and generalizes the notion of linear cross-correlation. In this way, it is possible to concisely capture a wide variety of local patterns or trends. Our method produces a general similarity score, which evolves over time, and accurately reflects the changing relationships. Finally, it can also be estimated incrementally, in a streaming setting. We demonstrate its usefulness, robustness and efficiency on a wide range of real datasets.
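
The idea of comparing local auto-covariance matrices through their spectral decompositions can be sketched as follows. This is a simplified reading of the abstract, not the paper's exact score: the function names, the window parameters, and the projection-based similarity are assumptions made for illustration.

```python
import numpy as np

def local_autocov(series, t, window, n_windows):
    """Auto-covariance estimate from the n_windows most recent subsequences ending at t."""
    rows = [series[end - window:end] for end in range(t - n_windows + 1, t + 1)]
    stacked = np.array(rows)      # shape (n_windows, window)
    return stacked.T @ stacked    # shape (window, window)

def top_eigvecs(mat, k):
    """Leading k eigenvectors of a symmetric matrix, as columns."""
    _, vecs = np.linalg.eigh(mat)  # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k]

def local_similarity(x, y, t, window=8, n_windows=32, k=2):
    """Score in [0, 1]: how well each series' principal local pattern
    is captured by the other series' top-k local eigenvectors."""
    ux = top_eigvecs(local_autocov(x, t, window, n_windows), k)
    uy = top_eigvecs(local_autocov(y, t, window, n_windows), k)
    return 0.5 * (np.linalg.norm(ux.T @ uy[:, 0]) + np.linalg.norm(uy.T @ ux[:, 0]))

# Synthetic series, for illustration only.
n = np.arange(256)
sine = np.sin(2 * np.pi * n / 16)
shifted = np.sin(2 * np.pi * n / 16 + np.pi / 2)  # same oscillation, quarter-period lag
noise = np.random.default_rng(0).normal(size=256)

score_shifted = local_similarity(sine, shifted, t=200)
score_noise = local_similarity(sine, noise, t=200)
```

In this sketch the sine and its quarter-period shift score essentially 1, because both span the same two-dimensional local subspace, even though their linear cross-correlation over whole periods is zero. Recomputing the score as `t` advances gives a similarity that evolves over time.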

international conference on data engineering | 2003

Selectivity estimation for predictive spatio-temporal queries

Yufei Tao; Jimeng Sun; Dimitris Papadias

We propose a cost model for selectivity estimation of predictive spatio-temporal window queries. Initially, we focus on uniform data, proposing formulae that capture both points and rectangles, and any type of object/query mobility combination (i.e., dynamic objects, dynamic queries or both). Then, we apply the model to nonuniform datasets by introducing spatio-temporal histograms, which consider the velocity distribution in addition to the spatial distribution during partitioning. The advantages of our techniques are (i) high accuracy (1-2 orders of magnitude lower error than previous techniques), (ii) ability to handle all query types, and (iii) efficient handling of updates.

ACM Transactions on Database Systems | 2003

Analysis of predictive spatio-temporal queries

Yufei Tao; Jimeng Sun; Dimitris Papadias

Given a set of objects S, a spatio-temporal window query q retrieves the objects of S that will intersect the window during the (future) interval qT. A nearest neighbor query q retrieves the objects of S closest to q during qT. Given a threshold d, a spatio-temporal join retrieves the pairs of objects from two datasets that will come within distance d from each other during qT. In this article, we present probabilistic cost models that estimate the selectivity of spatio-temporal window queries and joins, and the expected distance between a query and its nearest neighbor(s). Our models capture any query/object mobility combination (moving queries, moving objects or both) and any data type (points and rectangles) in arbitrary dimensionality. In addition, we develop specialized spatio-temporal histograms, which take into account both location and velocity information, and can be incrementally maintained. Extensive performance evaluation verifies that the proposed techniques produce highly accurate estimation on both uniform and non-uniform data.

SIGKDD Explorations | 2005

Relevance search and anomaly detection in bipartite graphs

Jimeng Sun; Huiming Qu; Deepayan Chakrabarti; Christos Faloutsos

Many real applications can be modeled using bipartite graphs, such as users vs. files in a P2P system, traders vs. stocks in a financial trading system, conferences vs. authors in a scientific publication network, and so on. We introduce two operations on bipartite graphs: 1) identifying similar nodes (relevance search), and 2) finding nodes connecting irrelevant nodes (anomaly detection). We propose algorithms to compute the relevance score for each node using random walk with restarts and graph partitioning; we also propose algorithms to identify anomalies, using relevance scores. We evaluate the quality of relevance search based on semantics of the datasets, and we also measure the performance of the anomaly detection algorithm with manually injected anomalies. Both effectiveness and efficiency of the methods are confirmed by experiments on several real datasets.

international conference on data engineering | 2007

Hiding in the Crowd: Privacy Preservation on Evolving Streams through Correlation Tracking

Feifei Li; Jimeng Sun; Spiros Papadimitriou; George A. Mihaila; Ioana Stanoi

We address the problem of preserving privacy in streams, which has received surprisingly limited attention. For static data, a well-studied and widely used approach is based on random perturbation of the data values. However, streams pose additional challenges. First, analysis of the data has to be performed incrementally, using limited processing time and buffer space, making batch approaches unsuitable. Second, the characteristics of streams evolve over time. Consequently, approaches based on global analysis of the data are not adequate. We show that it is possible to efficiently and effectively track the correlation and autocorrelation structure of multivariate streams and leverage it to add noise which maximally preserves privacy, in the sense that it is very hard to remove. Our techniques achieve much better results than previous static, global approaches, while requiring limited processing time and memory. We provide both a mathematical analysis and experimental evaluation on real data to validate the correctness, efficiency, and effectiveness of our algorithms.
