Christos Faloutsos | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Christos Faloutsos is active.

Explore More

Publication

Featured researches published by Christos Faloutsos.

acm special interest group on data communication | 1999

On power-law relationships of the Internet topology

Michalis Faloutsos; Petros Faloutsos; Christos Faloutsos

Despite the apparent randomness of the Internet, we discover some surprisingly simple power-laws of the Internet topology. These power-laws hold for three snapshots of the Internet, between November 1997 and December 1998, despite a 45% growth of its size during that period. We show that our power-laws fit the real data very well resulting in correlation coefficients of 96% or higher.Our observations provide a novel perspective of the structure of the Internet. The power-laws describe concisely skewed distributions of graph properties such as the node outdegree. In addition, these power-laws can be used to estimate important parameters such as the average neighborhood size, and facilitate the design and the performance analysis of protocols. Furthermore, we can use them to generate and select realistic topologies for simulation purposes.

FODO '93 Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms | 1993

Efficient Similarity Search In Sequence Databases

Rakesh Agrawal; Christos Faloutsos; Arun N. Swami

We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parsevals theorem, which specifies that the Fourier transform preserves the Euclidean distance in the time or frequency domain. Having thus mapped sequences to a lower-dimensionality space by using only the first few Fourier coefficients, we use R * -trees to index the sequences and efficiently answer similarity queries. We provide experimental results which show that our method is superior to search based on sequential scanning. Our experiments show that a few coefficients (1–3) are adequate to provide good performance. The performance gain of our method increases with the number and length of sequences.

international conference on management of data | 1994

Fast subsequence matching in time-series databases

Christos Faloutsos; M. Ranganathan; Yannis Manolopoulos

We present an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance. The idea is to map each data sequences into a small set of multidimensional rectangles in feature space. Then, these rectangles can be readily indexed using traditional spatial access methods, like the R*-tree [9]. In more detail, we use a sliding window over the data sequence and extract its features; the result is a trail in feature space. We propose an efficient and effective algorithm to divide such trails into sub-trails, which are subsequently represented by their Minimum Bounding Rectangles (MBRs). We also examine queries of varying lengths, and we show how to handle each case efficiently. We implemented our method and carried out experiments on synthetic and real data (stock price movements). We compared the method to sequential scanning, which is the only obvious competitor. The results were excellent: our method accelerated the search time from 3 times up to 100 times.

ACM Transactions on Knowledge Discovery From Data | 2007

Graph evolution: Densification and shrinking diameters

Jure Leskovec; Jon M. Kleinberg; Christos Faloutsos

How do real graphs evolve over time? What are normal growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network or in a very small number of snapshots; these include heavy tails for in- and out-degree distributions, communities, small-world phenomena, and others. However, given the lack of information about network evolution over long periods, it has been hard to convert these findings into statements about trends over time. Here we study a wide range of real graphs, and we observe some surprising phenomena. First, most of these graphs densify over time with the number of edges growing superlinearly in the number of nodes. Second, the average distance between nodes often shrinks over time in contrast to the conventional wisdom that such distance parameters should increase slowly as a function of the number of nodes (like O(log n) or O(log(log n)). Existing graph generation models do not exhibit these types of behavior even at a qualitative level. We provide a new graph generator, based on a forest fire spreading process that has a simple, intuitive justification, requires very few parameters (like the flammability of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study. We also notice that the forest fire model exhibits a sharp transition between sparse graphs and graphs that are densifying. Graphs with decreasing distance between the nodes are generated around this transition point. Last, we analyze the connection between the temporal evolution of the degree distribution and densification of a graph. We find that the two are fundamentally related. We also observe that real networks exhibit this type of relation between densification and the degree distribution.

international conference on data engineering | 2003

LOCI: fast outlier detection using the local correlation integral

Spiros Papadimitriou; Hiroyuki Kitagawa; Phillip B. Gibbons; Christos Faloutsos

Outlier detection is an integral part of data mining and has attracted much attention recently [M. Breunig et al., (2000)], [W. Jin et al., (2001)], [E. Knorr et al., (2000)]. We propose a new method for evaluating outlierness, which we call the local correlation integral (LOCI). As with the best previous methods, LOCI is highly effective for detecting outliers and groups of outliers (a.k.a. micro-clusters). In addition, it offers the following advantages and novelties: (a) It provides an automatic, data-dictated cutoff to determine whether a point is an outlier-in contrast, previous methods force users to pick cut-offs, without any hints as to what cut-off value is best for a given dataset. (b) It can provide a LOCI plot for each point; this plot summarizes a wealth of information about the data in the vicinity of the point, determining clusters, micro-clusters, their diameters and their inter-cluster distances. None of the existing outlier-detection methods can match this feature, because they output only a single number for each point: its outlierness score, (c) Our LOCI method can be computed as quickly as the best previous methods, (d) Moreover, LOCI leads to a practically linear approximate method, aLOCI (for approximate LOCI), which provides fast highly-accurate outlier detection. To the best of our knowledge, this is the first work to use approximate computations to speed up outlier detection. Experiments on synthetic and real world data sets show that LOCI and aLOCI can automatically detect outliers and micro-clusters, without user-required cut-offs, and that they quickly spot both expected and unexpected outliers.

international conference on data mining | 2009

PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations

U Kang; Charalampos E. Tsourakakis; Christos Faloutsos

In this paper, we describe PEGASUS, an open source Peta Graph Mining library which performs typical graph mining tasks such as computing the diameter of the graph, computing the radius of each node and finding the connected components. As the size of graphs reaches several Giga-, Tera- or Peta-bytes, the necessity for such a library grows too. To the best of our knowledge, PEGASUS is the first such library, implemented on the top of the Hadoop platform, the open source version of MapReduce. Many graph mining operations (PageRank, spectral clustering, diameter estimation, connected components etc.) are essentially a repeated matrix-vector multiplication. In this paper we describe a very important primitive for PEGASUS, called GIM-V (Generalized Iterated Matrix-Vector multiplication). GIM-V is highly optimized, achieving (a) good scale-up on the number of available machines (b) linear running time on the number of edges, and (c) more than 5 times faster performance over the non-optimized version of GIM-V. Our experiments ran on M45, one of the top 50 supercomputers in the world. We report our findings on several real graphs, including one of the largest publicly available Web Graphs, thanks to Yahoo!, with 6,7 billion edges.

symposium on reliable distributed systems | 2003

Epidemic spreading in real networks: an eigenvalue viewpoint

Yang Wang; Deepayan Chakrabarti; Chenxi Wang; Christos Faloutsos

How will a virus propagate in a real network? Does an epidemic threshold exist for a finite graph? How long does it take to disinfect a network given particular values of infection rate and virus death rate? We answer the first question by providing equations that accurately model virus propagation in any network including real and synthesized network graphs. We propose a general epidemic threshold condition that applies to arbitrary graphs: we prove that, under reasonable approximations, the epidemic threshold for a network is closely related to the largest eigenvalue of its adjacency matrix. Finally, for the last question, we show that infections tend to zero exponentially below the epidemic threshold. We show that our epidemic threshold model subsumes many known thresholds for special-case graphs (e.g., Erdos-Renyi, BA power-law, homogeneous); we show that the threshold tends to zero for infinite power-law graphs. We show that our threshold condition holds for arbitrary graphs.

IEEE Transactions on Knowledge and Data Engineering | 2001

Analysis of the clustering properties of the Hilbert space-filling curve

Bongki Moon; H. V. Jagadish; Christos Faloutsos; Joel H. Saltz

Several schemes for the linear mapping of a multidimensional space have been proposed for various applications, such as access methods for spatio-temporal databases and image compression. In these applications, one of the most desired properties from such linear mappings is clustering, which means the locality between objects in the multidimensional space being preserved in the linear space. It is widely believed that the Hilbert space-filling curve achieves the best clustering (Abel and Mark, 1990; Jagadish, 1990). We analyze the clustering property of the Hilbert space-filling curve by deriving closed-form formulas for the number of clusters in a given query region of an arbitrary shape (e.g., polygons and polyhedra). Both the asymptotic solution for the general case and the exact solution for a special case generalize previous work. They agree with the empirical results that the number of clusters depends on the hypersurface area of the query region and not on its hypervolume. We also show that the Hilbert curve achieves better clustering than the z curve. From a practical point of view, the formulas given provide a simple measure that can be used to predict the required disk access behaviors and, hence, the total access time.

very large data bases | 1994

The TV-tree: an index structure for high-dimensional data

King-Ip Lin; H. V. Jagadish; Christos Faloutsos

We propose a file structure to index high-dimensionality data, which are typically points in some feature space. The idea is to use only a few of the features, using additional features only when the additional discriminatory power is absolutely necessary. We present in detail the design of our tree structure and the associated algorithms that handle such “varying length” feature vectors. Finally, we report simulation results, comparing the proposed structure with theR*-tree, which is one of the most successful methods for low-dimensionality spaces.The results illustrate the superiority of our method, which saves up to 80% in disk accesses.

ACM Computing Surveys | 2006

Graph mining: Laws, generators, and algorithms

Deepayan Chakrabarti; Christos Faloutsos

How does the Web look? How could we tell an abnormal social network from a normal one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks to sociology to biology and many more. Indeed, any M : N relation in database terminology can be represented as a graph. A lot of these questions boil down to the following: “How can we generate synthetic but realistic graphs?” To answer this, we must first understand what patterns are common in real-world graphs and can thus be considered a mark of normality/realism. This survey give an overview of the incredible variety of work that has been done on these problems. One of our main contributions is the integration of points of view from physics, mathematics, sociology, and computer science. Further, we briefly describe recent advances on some related and interesting graph problems.

Explore More