Rumi Ghosh
Bosch
Publication
Featured research published by Rumi Ghosh.
mining and learning with graphs | 2010
Kristina Lerman; Rumi Ghosh; Jeon-Hyung Kang
Centrality is an important notion in network analysis and is used to measure the degree to which network structure contributes to the importance of a node in a network. While many different centrality measures exist, most of them apply to static networks. Most networks, on the other hand, are dynamic in nature, evolving over time through the addition or deletion of nodes and edges. A popular approach to analyzing such networks represents them by a static network that aggregates all edges observed over some time period. This approach, however, under- or overestimates the centrality of some nodes. We address this problem by introducing a novel centrality metric for dynamic network analysis. This metric exploits the intuition that in order for one node in a dynamic network to influence another over some period of time, there must exist a path that connects the source and destination nodes through intermediaries at different times. We demonstrate on an example network that the proposed metric leads to a very different ranking than analysis of an equivalent static network. We use dynamic centrality to study a dynamic citation network and contrast the results with those reached by static network analysis.
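The gap between static and dynamic analysis described in this abstract can be illustrated with a minimal sketch. It is not the paper's centrality metric: it only checks reachability, under the assumptions that snapshots are processed in time order, that at most one hop is taken per snapshot, and that information can wait at a node between snapshots. The function names and the toy edges are hypothetical.

```python
def temporal_reach(snapshots, source):
    """Nodes reachable from `source` when edges must be used in snapshot order.

    Assumption: at most one hop per snapshot; a node keeps what it has received.
    """
    reached = {source}
    for edges in snapshots:
        # Only nodes reached in earlier snapshots can pass things on now.
        newly = {v for (u, v) in edges if u in reached}
        reached |= newly
    return reached


def static_reach(snapshots, source):
    """Nodes reachable from `source` in the aggregated static network."""
    edges = {e for snap in snapshots for e in snap}
    reached, frontier = {source}, [source]
    while frontier:
        u = frontier.pop()
        for (a, b) in edges:
            if a == u and b not in reached:
                reached.add(b)
                frontier.append(b)
    return reached


# Toy example: B->C is observed before A->B.
snapshots = [[("B", "C")], [("A", "B")]]
dynamic_nodes = temporal_reach(snapshots, "A")
static_nodes = static_reach(snapshots, "A")
```

In this toy example the aggregated network lets A reach C, while the time-ordered view does not, because the edge from B to C was observed before A could reach B. This is one way a static aggregate can overestimate a node's influence.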
social network mining and analysis | 2008
Rumi Ghosh; Kristina Lerman
The growing popularity of online social networks has given researchers access to large amounts of network data and renewed interest in methods for automatic community detection. Existing algorithms, including the popular modularity-optimization methods, look for regions of the network that are better connected internally, e.g., have a higher than expected number of edges within them. We believe, however, that edges do not give the true measure of network connectivity. Instead, we argue that influence, which we define as the number of paths, of any length, that exist between two nodes, gives a better measure of network connectivity. We use the influence metric to partition a network into groups or communities by looking for regions of the network where nodes have more influence over each other than over nodes outside the community. We evaluate our approach on several networks and show that it often outperforms the edge-based modularity algorithm.
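For reference, the edge-based modularity that this abstract uses as its baseline can be sketched as follows. This is the standard definition for an undirected, unweighted graph (the fraction of edges inside each community minus the fraction expected from node degrees), not the influence-based variant the paper proposes. The function name and the toy graph are hypothetical.

```python
def modularity(edges, community):
    """Edge-based modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    m = len(edges)
    internal = {}    # number of edges inside each community
    degree_sum = {}  # total degree of the nodes in each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Q = sum over communities of (internal edges / m) - (degree sum / 2m)^2
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q_split = modularity(edges, split)
```

Placing every node in a single community gives Q = 0 under this definition, so a positive value for the two-triangle split indicates more internal edges than the degree-based expectation.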
Journal of Informetrics | 2015
Pietro Della Briotta Parolo; Raj Kumar Pan; Rumi Ghosh; Bernardo A. Huberman; Kimmo Kaski; Santo Fortunato
The exponential growth in the number of scientific papers makes it increasingly difficult for researchers to keep track of all the publications relevant to their work. Consequently, the attention that can be devoted to individual papers, measured by their citation counts, is bound to decay rapidly. In this work we make a thorough study of the life-cycle of papers in different disciplines. Typically, the citation rate of a paper increases up to a few years after its publication, reaches a peak and then decreases rapidly. This decay can be described by an exponential or a power law behavior, as in ultradiffusive processes, with exponential fitting better than power law for the majority of cases. The decay is also becoming faster over the years, signaling that nowadays papers are forgotten more quickly. However, when time is counted in terms of the number of published papers, the rate of decay of citations is fairly independent of the period considered. This indicates that the attention of scholars depends on the number of published items, and not on real time.
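The comparison between exponential and power-law decay mentioned in this abstract can be sketched with two log-space least-squares fits. This is an illustrative assumption, not the fitting procedure used in the paper: an exponential decay is linear in (t, log rate), a power law is linear in (log t, log rate), and the residual sums of squares are compared. The function names and the synthetic data are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


def compare_decay(ts, rates):
    """Fit log(rate) against t (exponential) and against log(t) (power law)."""
    log_r = [math.log(r) for r in rates]
    _, exp_slope, exp_rss = linear_fit(ts, log_r)
    _, pow_slope, pow_rss = linear_fit([math.log(t) for t in ts], log_r)
    return {"exp_slope": exp_slope, "exp_rss": exp_rss,
            "pow_slope": pow_slope, "pow_rss": pow_rss}


# Synthetic citation rates that decay exponentially after the peak.
ts = list(range(1, 11))
rates = [100 * math.exp(-0.5 * t) for t in ts]
result = compare_decay(ts, rates)
```

On this synthetic series the exponential fit has the smaller residual, as expected by construction. With real citation data the times must be positive and the rates nonzero for the logarithms to be defined.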
web search and data mining | 2011
Rumi Ghosh; Kristina Lerman
How does information flow in online social networks? How do the structure and size of the information cascade evolve in time? How can we efficiently mine the information contained in cascade dynamics? We approach these questions empirically and present an efficient and scalable mathematical framework for quantitative analysis of cascades on networks. We define a cascade generating function that captures the details of the microscopic dynamics of the cascades. We show that this function can also be used to compute the macroscopic properties of cascades, such as their size, spread, diameter, number of paths, and average path length. We present an algorithm to efficiently compute the cascade generating function and demonstrate that while significantly compressing information within a cascade, it nevertheless allows us to accurately reconstruct its structure. We use this framework to study information dynamics on the social network of Digg. Digg allows users to post and vote on stories, and easily see the stories that friends have voted on. As a story spreads on Digg through voting, it generates cascades. We extract cascades of more than 3,500 Digg stories and calculate their macroscopic and microscopic properties. We identify several trends in cascade dynamics: spreading via chaining, branching and community. We discuss how these affect the spread of the story through the Digg social network. Our computational framework is general and offers a practical solution to quantitative analysis of the microscopic structure of even very large cascades.
Physical Review E | 2011
Rumi Ghosh; Kristina Lerman
A variety of metrics have been proposed to measure the relative importance of nodes in a network. One of these, alpha-centrality [P. Bonacich, Am. J. Sociol. 92, 1170 (1987)], measures the number of attenuated paths that exist between nodes. We introduce a normalized version of this metric and use it to study network structure, for example, to rank nodes and find community structure of the network. Specifically, we extend the modularity-maximization method for community detection to use this metric as the measure of node connectivity. Normalized alpha-centrality is a powerful tool for network analysis, since it contains a tunable parameter that sets the length scale of interactions. Studying how rankings and discovered communities change when this parameter is varied allows us to identify locally and globally important nodes and structures. We apply the proposed metric to several benchmark networks and show that it leads to better insights into network structure than alternative metrics.
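The idea of counting attenuated paths and normalizing the result can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's exact definition: paths of length k are weighted by alpha**(k-1), the series is truncated at a hypothetical `max_len`, each node's score is the weighted number of paths arriving at it, and normalization divides by the total over all nodes. The series only converges when alpha is below the reciprocal of the largest eigenvalue of the adjacency matrix.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def attenuated_paths(A, alpha, max_len=50):
    """Truncated series: sum over k = 1..max_len of alpha**(k-1) * A**k."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in A]  # A**1: paths of length 1
    weight = 1.0                   # alpha**(k-1) for the current length k
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                C[i][j] += weight * power[i][j]
        power = mat_mul(power, A)
        weight *= alpha
    return C


def normalized_scores(A, alpha, max_len=50):
    """Column sums of the attenuated-path matrix, scaled to sum to 1."""
    n = len(A)
    C = attenuated_paths(A, alpha, max_len)
    total = sum(sum(row) for row in C)
    return [sum(C[i][j] for i in range(n)) / total for j in range(n)]


# Toy graph: an undirected path 0 - 1 - 2.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
local_scores = normalized_scores(path, 0.0)   # only direct edges count
longer_scores = normalized_scores(path, 0.3)  # longer paths contribute too
```

With alpha set to 0 only direct edges count, so the scores reduce to normalized degree. Raising alpha lets longer paths contribute, which is the sense in which the parameter sets the length scale of interactions.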
knowledge discovery and data mining | 2014
Rumi Ghosh; Shang-Hua Teng; Kristina Lerman; Xiaoran Yan
We study the interplay between a dynamic process and the structure of the network on which it is defined. Specifically, we examine the impact of this interaction on the quality measure of network clusters and node centrality. This enables us to effectively identify network communities and important nodes participating in the dynamics. As the first step towards this objective, we introduce an umbrella framework for defining and characterizing an ensemble of dynamic processes on a network. This framework generalizes the traditional Laplacian framework to continuous-time biased random walks and also allows us to model some epidemic processes over a network. For each dynamic process in our framework, we can define a function that measures the quality of every subset of nodes as a potential cluster (or community) with respect to this process on a given network. This subset-quality function generalizes the traditional conductance measure for graph partitioning. We partially justify our choice of the quality function by showing that the classic Cheeger's inequality, which relates the conductance of the best cluster in a network with a spectral quantity of its Laplacian matrix, can be extended from the Laplacian-conductance setting to this more general setting.
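The traditional conductance measure that this abstract takes as its starting point can be sketched as follows. This is the classical definition for an undirected, unweighted graph (edges leaving the subset divided by the smaller of the two total degrees), not the generalized subset-quality function introduced in the paper. The function name and the toy graph are hypothetical.

```python
def conductance(edges, subset):
    """cut(S, rest) / min(vol(S), vol(rest)) for an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    subset: iterable of nodes forming the candidate cluster S.
    Assumes both S and its complement touch at least one edge.
    """
    S = set(subset)
    cut = vol_s = vol_rest = 0
    for u, v in edges:
        # Each edge endpoint adds one unit of degree to its side.
        for node in (u, v):
            if node in S:
                vol_s += 1
            else:
                vol_rest += 1
        # An edge is cut when exactly one endpoint lies in S.
        if (u in S) != (v in S):
            cut += 1
    return cut / min(vol_s, vol_rest)


# Toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
phi_triangle = conductance(edges, {0, 1, 2})
phi_single = conductance(edges, {0})
```

Lower values indicate a better cluster under this measure: one whole triangle is cut off from the rest by a single edge, whereas a lone node has all of its edges crossing the boundary.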
computational science and engineering | 2009
Rumi Ghosh; Kristina Lerman
Heterogeneous networks play a key role in the evolution of communities and the decisions individuals make. These networks link different types of entities, for example, people and the events they attend. Network analysis algorithms usually project such networks onto simple graphs composed of entities of a single type. In the process, they conflate relations between entities of different types and lose important structural information. We develop a mathematical framework that can be used to compactly represent and analyze heterogeneous networks that combine multiple entity and link types. We generalize Bonacich centrality, which measures connectivity between nodes by the number of paths between them, to heterogeneous networks and use this measure to study network structure. Specifically, we extend the popular modularity-maximization method for community detection to use this centrality metric. We also rank nodes based on their connectivity to other nodes. One advantage of this centrality metric is that it has a tunable parameter we can use to set the length scale of interactions. Studying how rankings change with this parameter allows us to identify important nodes in the network. We apply the proposed method to analyze the structure of several heterogeneous networks. We show that exploiting additional sources of evidence corresponding to links between, as well as among, different entity types yields new insights into network structure.
Physical Review E | 2012
Kristina Lerman; Rumi Ghosh
Network structure is a product of both its topology and interactions between its nodes. We explore this claim using the paradigm of distributed synchronization in a network of coupled oscillators. As the network evolves to a global steady state, nodes synchronize in stages, revealing the network's underlying community structure. Traditional models of synchronization assume that interactions between nodes are mediated by a conservative process similar to diffusion. However, social and biological processes are often nonconservative. We propose a model of synchronization in a network of oscillators coupled via nonconservative processes. We study the dynamics of synchronization of synthetic and real-world networks and show that the traditional and nonconservative models of synchronization reveal different structures within the same network.
Physical Review E | 2013
Laura M. Smith; Kristina Lerman; Cristina Garcia-Cardona; Allon G. Percus; Rumi Ghosh
Spectral clustering is widely used to partition graphs into distinct modules or communities. Existing methods for spectral clustering use the eigenvalues and eigenvectors of the graph Laplacian, an operator that is closely associated with random walks on graphs. We propose a spectral partitioning method that exploits the properties of epidemic diffusion. An epidemic is a dynamic process that, unlike the random walk, simultaneously transitions to all the neighbors of a given node. We show that the replicator, an operator describing epidemic diffusion, is equivalent to the symmetric normalized Laplacian of a reweighted graph with edges reweighted by the eigenvector centralities of their incident nodes. Thus, more weight is given to edges connecting more central nodes. We describe a method that partitions the nodes based on the componentwise ratio of the replicator's second eigenvector to the first and compare its performance to traditional spectral clustering techniques on synthetic graphs with known community structure. We demonstrate that the replicator gives preference to dense, clique-like structures, enabling it to more effectively discover communities that may be obscured by dense intercommunity linking.
IEEE Intelligent Systems | 2017
Prasanth Lade; Rumi Ghosh; Soundar Srinivasan
Over the last two decades, manufacturing across the globe has evolved to be more intelligent and data-driven. In the age of the industrial Internet of Things, a smart production unit can be perceived as a large connected industrial system of materials, parts, machines, tools, inventory, and logistics that can relay data and communicate with each other. While, traditionally, the focus has been on machine health and predictive maintenance, the manufacturing industry has also started focusing on analyzing data from the entire production line. These applications bring a new set of analytics challenges. Unlike traditional data mining analysis, which consists of lean datasets (that is, datasets with few features), manufacturing has fat datasets. In addition, previous approaches to manufacturing analytics restricted themselves to small time periods of data. The latest advances in big data analytics allow researchers to do a deep dive into years of data. Bosch collects and utilizes all available information about its products to increase its understanding of complex linear and nonlinear relationships between parts, machines, and assembly lines. This helps in use cases such as the discovery of the root cause of internal defects. This article presents a case study and provides detail about challenges and approaches in data extraction, modeling, and visualization.