Anita Zakrzewska
Georgia Institute of Technology
Featured research published by Anita Zakrzewska.
advances in social networks analysis and mining | 2015
Anita Zakrzewska; David A. Bader
A variety of massive datasets, such as social networks and biological data, are represented as graphs that reveal underlying connections, trends, and anomalies. Community detection is the task of discovering dense groups of vertices in a graph. One specific form is seed set expansion, which finds the best local community for a given set of seed vertices. Greedy, agglomerative algorithms, which are commonly used in seed set expansion, have been previously designed only for a static, unchanging graph. However, in many applications, new data is constantly produced, and vertices and edges are inserted and removed from a graph. We present an algorithm for dynamic seed set expansion, which incrementally updates the community as the underlying graph changes. We show that our dynamic algorithm outputs high-quality communities that are similar to those found when using a standard static algorithm. The dynamic approach also improves performance compared to re-computation, achieving speedups of up to 600x.
Social Network Analysis and Mining | 2016
Anita Zakrzewska; David A. Bader
A variety of massive datasets, such as social networks and biological data, are represented as graphs that reveal underlying connections, trends, and anomalies. Community detection is the task of discovering dense groups of vertices in a graph. One specific form is seed set expansion, which finds the best local community for a given set of seed vertices. Greedy, agglomerative algorithms, which are commonly used in seed set expansion, have been previously designed only for a static, unchanging graph. However, in many applications, new data are constantly produced, and vertices and edges are inserted and removed from a graph. We present an algorithm for dynamic seed set expansion, which maintains a local community over time by incrementally updating as the underlying graph changes. We show that our dynamic algorithm outputs high-quality communities that are similar to those found when using a standard static algorithm. It works well both when beginning with an already existing graph and in the fully streaming case when starting with no data. The dynamic approach is also faster than re-computation when low-latency updates are needed.
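The greedy, agglomerative expansion the abstract describes can be sketched as follows: grow the community one boundary vertex at a time, always adding the vertex that most improves a fitness score, and stop when no addition helps. This is a minimal static sketch (the fitness function and graph representation are illustrative assumptions, not the paper's dynamic algorithm):

```python
def expand_seed_set(adj, seeds):
    """Greedily grow a community from seed vertices, adding the boundary
    vertex that most improves the internal/total degree ratio; stop when
    no single addition improves the score."""
    community = set(seeds)

    def fitness(comm):
        # Each internal edge is counted twice (once from each endpoint).
        internal = sum(1 for u in comm for v in adj[u] if v in comm)
        total = sum(len(adj[u]) for u in comm)
        return internal / total if total else 0.0

    while True:
        boundary = {v for u in community for v in adj[u]} - community
        best, best_f = None, fitness(community)
        for v in boundary:
            f = fitness(community | {v})
            if f > best_f:
                best, best_f = v, f
        if best is None:
            return community
        community.add(best)
```

The dynamic version in the paper updates this result incrementally as edges arrive or disappear instead of re-running the loop from scratch.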
advances in social networks analysis and mining | 2017
Anita Zakrzewska; David A. Bader
Many graph datasets originating from online social networks, financial systems, or biological sources are too large to store or analyze. The analysis of such networks may be made more tractable if they are reduced to smaller subgraphs via sampling. While most of the known graph sampling methods are designed with static graphs in mind, many real datasets are massive and rapidly growing, making streaming methods necessary. We present two new techniques, Randomly Induced Edge Sampling (RIES) and Weighted Edge Sampling (WES). Both methods sample a stream of edges in a single pass, without the need to know future properties of the stream. In contrast to previous work that focused on limiting only the number of vertices, our methods restrict the number of edges, thus truly limiting the size of the sampled subgraph. We compare the performance of RIES and WES against the previously known streaming Random Edge (RE) method on eight social network datasets. Using four structural graph properties, we find that both RIES and WES produce subgraphs that are more structurally similar to the original graph than are the subgraphs produced by streaming RE. We also examine the sensitivity of the two algorithms with respect to their parameters. The parameters of WES affect its performance in a more predictable manner and are easier to set. Both new algorithms represent an improvement in the available streaming graph analysis toolkit.
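For reference, the simplest single-pass scheme with a fixed edge budget, against which edge samplers like RIES and WES can be compared, is classic reservoir sampling. This sketch is the generic baseline only; the RIES and WES algorithms themselves are not reproduced here:

```python
import random

def reservoir_edge_sample(edge_stream, budget, seed=0):
    """Uniform single-pass edge sampling with a fixed edge budget.

    Each edge in the stream ends up in the sample with probability
    budget / n, without knowing the stream length n in advance."""
    rng = random.Random(seed)
    sample = []
    for i, edge in enumerate(edge_stream):
        if i < budget:
            sample.append(edge)  # fill the reservoir first
        else:
            j = rng.randrange(i + 1)
            if j < budget:
                sample[j] = edge  # replace a random slot
    return sample
```

Limiting the number of sampled edges, rather than vertices, directly bounds the memory footprint of the resulting subgraph, which is the design point the abstract emphasizes.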
Algorithms | 2017
Eisha Nathan; Anita Zakrzewska; E. Jason Riedy; David A. Bader
Analyzing massive graphs poses challenges due to the vast amount of data available. Extracting smaller relevant subgraphs allows for further visualization and analysis that would otherwise be too computationally intensive. Furthermore, many real data sets are constantly changing, and require algorithms to update as the graph evolves. This work addresses the topic of local community detection, or seed set expansion, using personalized centrality measures, specifically PageRank and Katz centrality. We present a method to efficiently update local communities in dynamic graphs. By updating the personalized ranking vectors, we can incrementally update the corresponding local community. Applying our methods to real-world graphs, we are able to obtain speedups of up to 60× compared to static recomputation while maintaining an average recall of 0.94 of the highly ranked vertices returned. Next, we investigate how approximations of a centrality vector affect the resulting local community. Specifically, our method guarantees that the vertices returned in the community are the highly ranked vertices from a personalized centrality metric.
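The link between personalized centrality and local community detection can be illustrated with a basic power-iteration PageRank personalized to a single seed, taking the top-ranked vertices as the community. This is a static, illustrative sketch (parameter values and function names are assumptions), not the paper's incremental update method:

```python
def personalized_pagerank(adj, seed, alpha=0.85, iters=100):
    """Power iteration for a PageRank vector personalized to one seed:
    all teleport (restart) mass returns to the seed vertex."""
    x = {v: 0.0 for v in adj}
    x[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for u in adj:
            if adj[u]:
                share = alpha * x[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
        nxt[seed] += 1.0 - alpha  # restart mass goes back to the seed
        x = nxt
    return x

def local_community(adj, seed, k):
    """Take the k highest-ranked vertices as the local community."""
    ranks = personalized_pagerank(adj, seed)
    return set(sorted(ranks, key=ranks.get, reverse=True)[:k])
```

The paper's contribution is updating the ranking vector (PageRank or Katz) as edges change, so the top-ranked set, and hence the community, can be maintained without recomputing from scratch.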
advances in social networks analysis and mining | 2016
James P. Fairbanks; Anita Zakrzewska; David A. Bader
Spectral partitioning (clustering) algorithms use eigenvectors to solve network analysis problems. The relationship between numerical accuracy and network mining quality is insufficiently understood. We show that analyzing numerical accuracy and network mining quality together leads to an algorithmic improvement. Specifically, we study spectral partitioning using sweep cuts of approximate eigenvectors of the normalized graph Laplacian. We introduce a novel, theoretically sound, parameter-free stopping criterion for iterative eigensolvers designed for graph partitioning. On a corpus of social networks, we validate this stopping criterion by showing that the number of iterations is reduced by a factor of 4.15 on average, while the conductance is increased by only a factor of 1.24 on average. Regression analysis of these results shows that the decrease in the number of iterations needed is greater for problems with a small spectral gap, thus our stopping criterion helps more on harder problems. Experiments show that alternative stopping criteria are insufficient to ensure low-conductance partitioning on real-world networks. While our method guarantees partitions that satisfy the Cheeger Inequality, we find that it typically beats this guarantee on real-world graphs.
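The sweep-cut procedure the paper builds on is standard: order vertices by their value in the (approximate) eigenvector and return the prefix set of minimum conductance. A minimal sketch (this is the generic sweep cut, not the paper's stopping criterion):

```python
def sweep_cut(adj, x):
    """Sweep vertices in increasing order of x and return the prefix
    set with minimum conductance cut(S) / min(vol(S), vol(V - S))."""
    order = sorted(adj, key=lambda v: x[v])
    vol_total = sum(len(adj[v]) for v in adj)
    in_set, vol, cut = set(), 0, 0
    best_set, best_cond = None, float("inf")
    for v in order[:-1]:  # skip the trivial full-graph prefix
        vol += len(adj[v])
        # Edges from v into the prefix become internal (-1 each);
        # edges from v to the outside join the cut (+1 each).
        cut += sum(-1 if u in in_set else 1 for u in adj[v])
        in_set.add(v)
        cond = cut / min(vol, vol_total - vol)
        if cond < best_cond:
            best_cond, best_set = cond, set(in_set)
    return best_set, best_cond
```

Because the sweep only needs the ordering induced by the eigenvector, an approximate eigenvector that preserves the ordering near the optimal cut already yields a good partition, which is what makes an early-stopping criterion for the eigensolver worthwhile.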
international conference on parallel processing | 2013
Anita Zakrzewska; David A. Bader
The increasing energy consumption of high performance computing has resulted in rising operational and environmental costs. Therefore, reducing the energy consumption of computation is an emerging area of interest. We study the approach of data sampling to reduce the energy costs of sparse graph algorithms. The resulting error levels for several graph metrics are measured to analyze the trade-off between energy consumption reduction and error. The three types of graphs studied, real graphs, synthetic random graphs, and synthetic small-world graphs, each show distinct behavior. Across all graphs, the error cost is initially relatively low. For example, four of the five real graphs studied needed less than a third of total energy to retain a degree centrality rank correlation coefficient of 0.85 when random vertices were removed. However, the error incurred for further energy reduction grows at an increasing rate, providing diminishing returns.
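The random vertex removal the abstract mentions can be sketched as follows: drop a random fraction of vertices along with their incident edges, then run the (cheaper) analysis on the induced subgraph. Function name and graph representation are illustrative assumptions:

```python
import random

def remove_random_vertices(adj, fraction, seed=0):
    """Return the subgraph induced by keeping each vertex with
    probability 1 - fraction; incident edges of dropped vertices
    are removed as well."""
    rng = random.Random(seed)
    keep = {v for v in adj if rng.random() >= fraction}
    return {v: [u for u in adj[v] if u in keep] for v in keep}
```

Comparing a metric such as degree-centrality ranking between the original and the sampled graph (e.g., via a rank correlation coefficient) then quantifies the error side of the energy/error trade-off.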
international conference on cluster computing | 2017
Eisha Nathan; E. Jason Riedy; Anita Zakrzewska; Chunxing Yin
Applications in computer network security, social media analysis, and other areas rely on analyzing a changing environment. The data is rich in relationships and lends itself to graph analysis. Traditional static graph analysis cannot keep pace with network security applications analyzing nearly one million events per second and social networks like Facebook collecting 500 thousand comments per second. Streaming frameworks like STINGER support ingesting up to three million edge changes per second, but there are few streaming analysis kernels that keep up with these rates. Here we introduce a new, non-stop model and use it to decouple the analysis from the data ingest.
advances in social networks analysis and mining | 2016
Anita Zakrzewska; Eisha Nathan; James P. Fairbanks; David A. Bader
In this work we present a new local, vertex-level measure of community change. Our measure detects vertices that change community membership due to the actions (edges) of a vertex itself and not only due to global community shifts. The local nature of our measure is important for analyzing real graphs because communities may change to a large degree from one snapshot in time to the next. Using both real and synthetic graphs, we compare our measure to an alternative, global approach. Both approaches detect community switching vertices in a synthetic example with little overall community change. However, when communities do not evolve smoothly over time, the global approach flags a very large number of vertices, while our local method does not.
advances in social networks analysis and mining | 2016
Anita Zakrzewska; David A. Bader
Dynamic graphs are used to represent changing relational data. In order to create a dynamic graph representing relationships or interactions over time, it is necessary to choose a method of adding new data and removing, or otherwise de-emphasizing, past data to decrease its influence. In particular, the question of aging edges is new to dynamic graphs and has not been thoroughly studied. In this work, we address the problem of aging vertices and edges to create a dynamic graph from a stream of temporal data. We provide two new methods, active vertex and active edge, and also evaluate two methods from the literature, sliding window and weight decay. By analyzing various properties of the dynamic graphs created by each aging method, we provide practitioners with quantitative comparisons. We find several interesting similarities and differences. The active vertex and weight decay methods reduce the variability over time of several vertex level measures compared to sliding window and active edge. This means that in practice, active vertex or weight decay may be more useful if graph stability is preferred, while sliding window or active edge may be preferred if the graph should be sensitive to changes in the underlying data stream. Each method also differently affects global measures. The most connected graph is produced by active vertex, while the most disconnected by weight decay. We observe that despite the differences, the graphs produced by each method experience similar types of changes at similar points in time.
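Two of the aging methods compared above, sliding window and weight decay, can be illustrated on a timestamped edge list. Function names and the half-life parameterization are illustrative assumptions:

```python
def sliding_window(edges, t_now, window):
    """Keep only edges whose timestamp falls within the last
    `window` time units; older edges are dropped entirely."""
    return [(u, v, t) for (u, v, t) in edges if t_now - t < window]

def weight_decay(edges, t_now, half_life):
    """Instead of dropping old edges, down-weight each edge by
    0.5 ** (age / half_life), so influence fades gradually."""
    return [(u, v, 0.5 ** ((t_now - t) / half_life))
            for (u, v, t) in edges]
```

The contrast visible even in this sketch matches the abstract's finding: the window makes hard keep/drop decisions (sensitive to the stream, less stable), while decay retains every edge at a diminishing weight (more stable, slower to forget).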
international conference on parallel processing | 2015
Anita Zakrzewska; David A. Bader
Community detection, or graph clustering, is the problem of finding dense groups in a graph. This is important for a variety of applications, from social network analysis to biological interactions. While most work in community detection has focused on static graphs, real data is usually dynamic, changing over time. We present a new algorithm for dynamic community detection that incrementally updates clusters when the graph changes. The method is based on a greedy, modularity maximizing static approach and stores the history of merges in order to backtrack. On synthetic graph tests with known ground truth clusters, it can detect a variety of structural community changes for both small and large batches of edge updates.
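Greedy modularity-maximizing clustering of the kind the abstract builds on merges, at each step, the pair of communities with the largest modularity gain. The gain of merging communities a and b is a textbook quantity (this is the standard formula, not the paper's specific update or backtracking rules):

```python
def modularity_gain(e_ab, deg_a, deg_b, m):
    """Change in modularity from merging communities a and b.

    e_ab  -- number of edges between a and b
    deg_a -- total degree of vertices in a
    deg_b -- total degree of vertices in b
    m     -- total number of edges in the graph
    """
    return e_ab / m - (deg_a * deg_b) / (2 * m * m)
```

A positive gain means the merge increases modularity. Storing the sequence of accepted merges, as the abstract describes, lets a dynamic algorithm backtrack: when an edge update invalidates part of the hierarchy, only the affected merges need to be undone and redone.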