Ryan N. Lichtenwalter
University of Notre Dame
Featured publications by Ryan N. Lichtenwalter.
Knowledge Discovery and Data Mining | 2010
Ryan N. Lichtenwalter; Jake T. Lussier; Nitesh V. Chawla
This paper examines important factors for link prediction in networks and provides a general, high-performance framework for the prediction task. Link prediction in sparse networks presents a significant challenge due to the inherent disproportion of links that can form to links that do form. Previous research has typically approached this as an unsupervised problem. While this is not the first work to explore supervised learning, many factors significant in influencing and guiding classification remain unexplored. In this paper, we consider these factors by first motivating the use of a supervised framework through a careful investigation of issues such as network observational period, generality of existing methods, variance reduction, topological causes and degrees of imbalance, and sampling approaches. We also present an effective flow-based prediction algorithm, offer formal bounds on imbalance in sparse network link prediction, and employ an evaluation method appropriate for the observed imbalance. Our careful consideration of the above issues ultimately leads to a completely general framework that outperforms unsupervised link prediction methods by more than 30% in AUC.
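To make the "disproportion of links that can form to links that do form" concrete, the sketch below computes the negative-to-positive class ratio. It assumes, for illustration only, that every currently unlinked vertex pair counts as a candidate; the paper's formal bounds are not reproduced, and the function name and example numbers are hypothetical.

```python
def link_imbalance(num_nodes: int, num_new_links: int, num_existing_links: int = 0) -> float:
    """Ratio of candidate pairs that do NOT link to pairs that do (hypothetical helper).

    Assumes an undirected simple graph in which every unordered pair not
    already linked is a candidate for a new link.
    """
    candidate_pairs = num_nodes * (num_nodes - 1) // 2 - num_existing_links
    negatives = candidate_pairs - num_new_links
    return negatives / num_new_links


# Average degree fixed at 10; new links proportional to network size.
print(link_imbalance(10_000, 5_000, 50_000))    # 9988.0
print(link_imbalance(20_000, 10_000, 100_000))  # 19988.0
```

Doubling the network while holding average degree constant roughly doubles the ratio. Candidate pairs grow quadratically while links grow only linearly, which is why sparse networks become more imbalanced as they grow.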
Advances in Social Networks Analysis and Mining | 2011
Darcy A. Davis; Ryan N. Lichtenwalter; Nitesh V. Chawla
Many important real-world systems, modeled naturally as complex networks, have heterogeneous interactions and complicated dependency structures. Link prediction in such networks must model the influences between heterogeneous relationships and distinguish the formation mechanisms of each link type, a task which is beyond the simple topological features commonly used to score potential links. In this paper, we introduce a novel probabilistically weighted extension of the Adamic/Adar measure for heterogeneous information networks, which we use to demonstrate the potential benefits of diverse evidence, particularly in cases where homogeneous relationships are very sparse. However, we also expose some fundamental flaws of traditional a priori link prediction. In accordance with previous research on homogeneous networks, we further demonstrate that a supervised approach to link prediction can enhance performance and is easily extended to the heterogeneous case. Finally, we present results on three diverse, real-world heterogeneous information networks and discuss the trends and tradeoffs of supervised and unsupervised link prediction in a multi-relational setting.
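For reference, here is a minimal sketch of the classic homogeneous Adamic/Adar score that the paper extends. Each common neighbor z of x and y contributes 1 / log(deg(z)). The probabilistic weighting for heterogeneous networks described in the abstract is not reproduced. The adjacency-dict representation is an assumption.

```python
import math


def adamic_adar(adj: dict[str, set[str]], x: str, y: str) -> float:
    """Classic Adamic/Adar score for the pair (x, y) in an undirected graph.

    adj maps each vertex to the set of its neighbors.
    """
    score = 0.0
    for z in adj[x] & adj[y]:          # common neighbors of x and y
        degree = len(adj[z])
        if degree > 1:                 # guard against log(1) = 0
            score += 1.0 / math.log(degree)
    return score


adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}
print(adamic_adar(adj, "a", "b"))  # 1/ln 2 + 1/ln 3
```

Low-degree common neighbors count more than hubs, which is the intuition the weighted extension builds on.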
Social Network Analysis and Mining | 2013
Darcy A. Davis; Ryan N. Lichtenwalter; Nitesh V. Chawla
Many important real-world systems, modeled naturally as complex networks, have heterogeneous interactions and complicated dependency structures. Link prediction in such networks must model the influences between heterogeneous relationships and distinguish the formation mechanisms of each link type, a task which is beyond the simple topological features commonly used to score potential links. In this paper, we introduce a novel probabilistically weighted extension of the Adamic/Adar measure for heterogeneous information networks, which we use to demonstrate the potential benefits of diverse evidence, particularly in cases where homogeneous relationships are very sparse. However, we also expose some fundamental flaws of traditional unsupervised link prediction. We develop supervised learning approaches for relationship (link) prediction in multi-relational networks, and demonstrate that a supervised approach to link prediction can enhance performance. We present results on three diverse, real-world heterogeneous information networks and discuss the trends and tradeoffs of supervised and unsupervised link prediction in a multi-relational setting.
Knowledge and Information Systems | 2015
Yang Yang; Ryan N. Lichtenwalter; Nitesh V. Chawla
Link prediction is a popular research area with important applications in a variety of disciplines, including biology, social science, security, and medicine. The fundamental requirement of link prediction is the accurate and effective prediction of new links in networks. While there are many different methods proposed for link prediction, we argue that the practical performance potential of these methods is often unknown because of challenges in the evaluation of link prediction, which impact the reliability and reproducibility of results. We describe these challenges, provide theoretical proofs and empirical examples demonstrating how current methods lead to questionable conclusions, show how the fallacy of these conclusions is illuminated by methods we propose, and develop recommendations for consistent, standard, and applicable evaluation metrics. We also recommend the use of precision-recall threshold curves and associated areas in lieu of receiver operating characteristic curves due to complications that arise from extreme imbalance in the link prediction classification problem.
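The toy example below illustrates why ROC curves can mislead under extreme imbalance. ROC AUC is computed in its Mann-Whitney form. Average precision serves as a stand-in summary of the precision-recall curve; it is an assumption here and not necessarily the exact area measure the paper recommends. The scores are fabricated purely for illustration.

```python
def roc_auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive (descending score)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / rank
    return total / hits


# 10 positives, 990 negatives; 50 negatives outscore every positive.
scores = [0.9] * 10 + [0.95] * 50 + [0.1] * 940
labels = [1] * 10 + [0] * 990
print(roc_auc(scores, labels))            # about 0.949
print(average_precision(scores, labels))  # about 0.097
```

ROC AUC looks excellent because the 940 easy negatives dominate the pairwise comparisons. Precision shows that every true link is ranked below 50 false ones.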
International World Wide Web Conferences | 2012
Ryan N. Lichtenwalter; Nitesh V. Chawla
We introduce the concept of a vertex collocation profile (VCP) for the purpose of topological link analysis and prediction. VCPs provide nearly complete information about the surrounding local structure of embedded vertex pairs. The VCP approach offers a new tool for domain experts to understand the underlying growth mechanisms in their networks and to analyze link formation mechanisms in the appropriate sociological, biological, physical, or other context. The same resolution that gives VCP its analytical power also enables it to perform well when used in supervised models to discriminate potential new links. We first develop the theory, mathematics, and algorithms underlying VCPs. Then we demonstrate VCP methods performing link prediction competitively with unsupervised and supervised methods across several different network families. We conclude with timing results that compare the performance of several existing algorithms and demonstrate the practicability of VCP computations on large networks.
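As a rough illustration of the VCP idea, the sketch below covers only the smallest case: three vertices in an undirected, single-relation graph. Each third vertex w is classified by whether it is adjacent to u, to v, or to both, and the state of the u-v edge is reported separately. This is a simplified, assumption-laden reading of the abstract and does not follow the paper's exact element definitions or enumeration algorithm. `third_vertex_profile` and `PATTERNS` are hypothetical names.

```python
from collections import Counter

# Fixed element order: (w adjacent to u?, w adjacent to v?)
PATTERNS = [(False, False), (True, False), (False, True), (True, True)]


def third_vertex_profile(adj: dict[str, set[str]], u: str, v: str):
    """Return (u-v edge present?, counts of each three-vertex pattern) for the pair (u, v)."""
    counts = Counter()
    for w in adj:
        if w in (u, v):
            continue
        counts[(w in adj[u], w in adj[v])] += 1
    return v in adj[u], [counts[p] for p in PATTERNS]


adj = {
    "u": {"a", "b"},
    "v": {"b", "c"},
    "a": {"u"},
    "b": {"u", "v"},
    "c": {"v"},
    "d": set(),
    "e": set(),
}
print(third_vertex_profile(adj, "u", "v"))  # (False, [2, 1, 1, 1])
```

Feeding such count vectors to a classifier is one way to use them in a supervised link predictor. Larger embeddings (four or more vertices) and multiple relations give the profile much more resolution.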
Knowledge Discovery and Data Mining | 2009
Ryan N. Lichtenwalter; Nitesh V. Chawla
Streaming data is pervasive in a multitude of data mining applications. One fundamental problem in the task of mining streaming data is distributional drift over time. Streams may also exhibit high and varying degrees of class imbalance, which can further complicate the task. In scenarios like these, class imbalance is particularly difficult to overcome and has not been as thoroughly studied. In this paper, we comprehensively consider the issues of changing distributions in conjunction with high degrees of class imbalance in streaming data. We propose new approaches based on distributional divergence and meta-classification that improve several performance metrics often applied in the study of imbalanced classification. We also propose a new distance measure for detecting distributional drift and examine its utility in weighting ensemble base classifiers. We employ a sequential validation framework, which we believe is the most meaningful option in the context of streaming imbalanced data.
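The abstract proposes a new distance measure for detecting drift but does not name it. The sketch below therefore substitutes the standard Hellinger distance between binned feature distributions of two batches. It also weights each ensemble member, trained on an earlier batch, by 1 minus its distance to the current batch. Both the choice of distance and the weighting scheme are assumptions for illustration. `histogram` and `drift_weights` are hypothetical helpers.

```python
import bisect
import math


def histogram(values, edges):
    """Proportion of values per bin; values outside [edges[0], edges[-1]] are clamped into the end bins."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        i = bisect.bisect_right(edges, x) - 1
        i = min(max(i, 0), len(counts) - 1)
        counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions; lies in [0, 1]."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def drift_weights(batch_hists, current_hist):
    """Weight each base classifier by how similar its training batch is to the current batch."""
    return [1.0 - hellinger(h, current_hist) for h in batch_hists]


edges = [0.0, 0.5, 1.0]
old = histogram([0.1, 0.2, 0.7, 0.9], edges)      # [0.5, 0.5]
drifted = histogram([0.8, 0.9, 0.95, 0.6], edges)  # [0.0, 1.0]
current = histogram([0.3, 0.6, 0.2, 0.7], edges)   # [0.5, 0.5]
print(drift_weights([old, drifted], current))
```

A classifier whose training batch matches the current distribution keeps full weight, while one trained before the drift is down-weighted.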
Advances in Social Networks Analysis and Mining | 2011
Ryan N. Lichtenwalter; Nitesh V. Chawla
With the rise of network science as an exciting interdisciplinary research topic, efficient graph algorithms are in high demand. Problematically, many such algorithms measuring important properties of networks have asymptotic lower bounds that are quadratic, cubic, or higher in the number of vertices. For large social networks, transportation networks, communication networks, and a host of others, such computation is intractable in serial fashion, requiring years or even decades. Fortunately, these same computational problems are often naturally parallel. We present here the design and implementation of a master-worker framework for easily computing such results in these circumstances. The user needs only to supply two small fragments of code describing the fundamental kernel of the computation. The framework automatically divides and distributes the workload and manages completion using an arbitrary number of heterogeneous computational resources. In practice, we have used thousands of machines and observed commensurate speedups. Writing only 31 lines of standard C++ code, we computed betweenness centrality on a network of 4.7M nodes in 25 hours.
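The sketch below shows the shape of such a decomposition in Python rather than the framework's C++. One fragment is a per-source kernel (Brandes' dependency accumulation for unweighted betweenness). The other merges partial results. A local thread pool stands in for the distributed workers. The names `kernel`, `combine`, and `betweenness` are hypothetical, and the framework's actual interface is not described in the abstract. Because of CPython's GIL, threads will not speed up this CPU-bound work, so a real deployment would use processes or separate machines.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def kernel(adj, s):
    """Work unit: Brandes dependency accumulation from one source s (unweighted graph)."""
    stack, preds = [], {v: [] for v in adj}
    sigma = dict.fromkeys(adj, 0)
    sigma[s] = 1
    dist = dict.fromkeys(adj, -1)
    dist[s] = 0
    queue = deque([s])
    while queue:                       # BFS counting shortest paths
        v = queue.popleft()
        stack.append(v)
        for w in adj[v]:
            if dist[w] < 0:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = dict.fromkeys(adj, 0.0)
    while stack:                       # accumulate dependencies in reverse BFS order
        w = stack.pop()
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    delta[s] = 0.0                     # a source gets no credit for its own paths
    return delta


def combine(total, partial):
    """Merge one worker's partial result into the running total."""
    for v, d in partial.items():
        total[v] += d
    return total


def betweenness(adj, workers=4):
    total = dict.fromkeys(adj, 0.0)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda s: kernel(adj, s), adj):
            combine(total, partial)
    return {v: c / 2 for v, c in total.items()}  # undirected: each pair counted twice


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(betweenness(path))  # {'a': 0.0, 'b': 1.0, 'c': 0.0}
```

Each source vertex is an independent work unit, which is what makes betweenness a natural fit for a master-worker split.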
SpringerPlus | 2014
Ryan N. Lichtenwalter; Nitesh V. Chawla
We describe the vertex collocation profile (VCP) concept. VCPs provide rich information about the surrounding local structure of embedded vertex pairs. VCP analysis offers a new tool for researchers and domain experts to understand the underlying growth mechanisms in their networks and to analyze link formation mechanisms in the appropriate sociological, biological, physical, or other context. The same resolution that gives the VCP method its analytical power also enables it to perform well when used to accomplish link prediction. We first develop the theory, mathematics, and algorithms underlying VCPs. We provide timing results to demonstrate that the algorithms scale well even for large networks. Then we demonstrate VCP methods performing link prediction competitively with unsupervised and supervised methods across different network families. Unlike many analytical tools, VCPs inherently generalize to multirelational data, which provides them with unique power in complex modeling tasks. To demonstrate this, we apply the VCP method to longitudinal networks by encoding temporally resolved information into different relations. In this way, the transitions between VCP elements represent temporal evolutionary patterns in the longitudinal network data. Results show that VCPs can use this additional data, typically challenging to employ, to improve predictive model accuracies. We conclude with our perspectives on the VCP method and its future in network science, particularly link prediction.
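One way to read "encoding temporally resolved information into different relations" is to label each edge with the set of time periods in which it appears. Pair profiles then distinguish, for example, a tie present in every period from one that appeared only recently. The sketch below builds such labels from snapshot edge lists and tallies them around a vertex pair. It is a hypothetical encoding under that assumption, not the paper's exact construction. `temporal_relations` and `multirelational_profile` are illustrative names.

```python
from collections import Counter


def temporal_relations(snapshots):
    """Map each undirected edge to the sorted tuple of snapshot indices in which it appears."""
    rel = {}
    for t, edges in enumerate(snapshots):
        for a, b in edges:
            rel.setdefault(frozenset((a, b)), set()).add(t)
    return {k: tuple(sorted(v)) for k, v in rel.items()}


def multirelational_profile(rel, vertices, u, v):
    """Count third vertices w by (u-w relation, v-w relation); () means no edge in any period."""
    counts = Counter()
    for w in vertices:
        if w in (u, v):
            continue
        key = (rel.get(frozenset((u, w)), ()), rel.get(frozenset((v, w)), ()))
        counts[key] += 1
    return counts


snapshots = [
    [("u", "a")],              # period 0
    [("u", "a"), ("v", "a")],  # period 1
]
rel = temporal_relations(snapshots)
print(multirelational_profile(rel, ["u", "v", "a", "b"], "u", "v"))
```

Here vertex a appears as a persistent neighbor of u and a new neighbor of v. That kind of transition is exactly what a single static snapshot would hide.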
International Parallel and Distributed Processing Symposium | 2008
Nitesh V. Chawla; Douglas Thain; Ryan N. Lichtenwalter; David A. Cieslak
Both users and administrators of computing grids are presented with enormous challenges in debugging and troubleshooting. Diagnosing a problem with one application on one machine is hard enough, but diagnosing problems in workloads of millions of jobs running on thousands of machines is a problem of a new order of magnitude. Suppose that a user submits one million jobs to a grid, only to discover some time later that half of them have failed. Users of large-scale systems need tools that describe the overall situation, indicating which problems are commonplace versus occasional, and which are deterministic versus random. Machine learning techniques can be used to debug these kinds of problems in large-scale systems. We present a comprehensive framework from data to knowledge discovery as an important step towards achieving this vision.
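The abstract does not specify which learning method the framework uses. As a hypothetical first-pass summary in the same spirit, the sketch below tabulates the failure rate and support of every attribute-value pair in a job log. A rate near 1.0 with high support suggests a commonplace, deterministic problem, while low rates point to occasional or random failures. This is a simple aggregation, not a trained classifier, and the job-record format is an assumption.

```python
from collections import defaultdict


def failure_rates(jobs, min_support=1):
    """jobs: iterable of (attributes dict, failed bool).

    Returns {(attribute, value): (failure rate, number of jobs)}.
    """
    stats = defaultdict(lambda: [0, 0])  # [failures, total]
    for attrs, failed in jobs:
        for key in attrs.items():
            stats[key][0] += int(failed)
            stats[key][1] += 1
    return {k: (f / n, n) for k, (f, n) in stats.items() if n >= min_support}


jobs = (
    [({"machine": "m1", "os": "linux"}, True)] * 3
    + [({"machine": "m2", "os": "linux"}, False)] * 3
    + [({"machine": "m2", "os": "linux"}, True)]
)
for key, (rate, n) in sorted(failure_rates(jobs).items(), key=lambda kv: -kv[1][0]):
    print(key, f"rate={rate:.2f}", f"n={n}")
```

In this toy log, every job on m1 fails, which points to a machine-specific, deterministic fault. The occasional failure on m2 looks random by comparison.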
Journal of Intelligent Systems | 2010
Ryan N. Lichtenwalter; Katerina Lichtenwalter; Nitesh V. Chawla
There exist several music composition systems that generate blues chord progressions, jazz improvisation, or classical pieces. Such systems often work by applying a set of rules explicitly provided to the system to determine what sequence of output values is appropriate. Others use pattern recognition and generation techniques such as Markov models. These systems often suffer from mediocre performance and limited generality. We propose a system that goes from raw musical data to feature vector representation to classification models. We employ sliding window sequential machine learning techniques to generate classifiers that correspond to a training set of musical data. Our approach has the advantages of greater generality than explicitly specified musical grammar rules and the potential to apply a wide variety of powerful existing nonsequential learning algorithms. We present the design and implementation of the composition system. We demonstrate the efficacy of the method, show and analyze successful samples of its output, and discuss ways in which it might be improved.
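The sliding-window step can be sketched as follows. Each window of the previous `width` notes becomes a feature vector, and the next note becomes its class label, so any nonsequential learner can be trained on the pairs. A most-frequent-label lookup table stands in for the classifier here only to keep the example self-contained. The paper's features and learners are not reproduced, and all names are hypothetical.

```python
from collections import Counter, defaultdict


def sliding_windows(sequence, width):
    """Turn a sequence into (features, target) pairs: `width` previous items predict the next."""
    return [(tuple(sequence[i:i + width]), sequence[i + width])
            for i in range(len(sequence) - width)]


def train_lookup(pairs):
    """Placeholder classifier: most frequent target seen for each window."""
    table = defaultdict(Counter)
    for window, target in pairs:
        table[window][target] += 1
    return {w: c.most_common(1)[0][0] for w, c in table.items()}


def generate(model, seed, length):
    """Extend the seed by repeatedly classifying the most recent window."""
    out, width = list(seed), len(seed)
    for _ in range(length):
        window = tuple(out[-width:])
        if window not in model:        # unseen context: stop rather than guess
            break
        out.append(model[window])
    return out


melody = ["C", "E", "G", "C", "E", "G"]
model = train_lookup(sliding_windows(melody, 2))
print(generate(model, ("C", "E"), 3))  # ['C', 'E', 'G', 'C', 'E']
```

Swapping the lookup table for a decision tree or other learner that operates on the same window features is what gives the approach its generality.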