Featured Research

Social And Information Networks

A Hidden Challenge of Link Prediction: Which Pairs to Check?

The traditional setup of link prediction in networks assumes that a test set of node pairs, which is usually balanced, is available over which to predict the presence of links. However, in practice there is no test set: the ground truth is not known, so the number of possible pairs to predict over is quadratic in the number of nodes in the graph. Moreover, because graphs are sparse, most of these possible pairs will not be links. Thus, link prediction methods, which often rely on proximity-preserving embeddings or heuristic notions of node similarity, face a vast search space with many pairs that are in close proximity but should not be linked. To mitigate this issue, we introduce LinkWaldo, a framework for choosing, from this quadratic, massively skewed search space of node pairs, a concise set of candidate pairs that, in addition to being in close proximity, also structurally resemble the observed edges. This allows LinkWaldo to ignore some high-proximity but low-resemblance pairs, and also to identify high-resemblance, lower-proximity pairs. Our framework is built on a model that theoretically combines Stochastic Block Models (SBMs) with node proximity models. The block structure of the SBM maps out where in the search space new links are expected to fall, and the proximity identifies the most plausible links within these blocks, using locality-sensitive hashing to avoid expensive exhaustive search. LinkWaldo can use any node representation learning or heuristic definition of proximity, and can generate candidate pairs for any link prediction method, allowing the representation power of current and future methods to be realized for link prediction in practice. We evaluate LinkWaldo on 13 networks across multiple domains, and show that on average it returns candidate sets containing 7-33% more missing and future links than the candidate sets of both embedding-based and heuristic baselines.
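
A minimal sketch of the candidate-selection idea described above. It assumes block assignments are already available (for example from a fitted SBM), uses the inner product of node embeddings as the proximity score, and splits the candidate budget across block pairs in proportion to their observed edges. Where LinkWaldo uses locality-sensitive hashing, this sketch scores every pair inside a block pair exhaustively, so it only suits small graphs. The function names are hypothetical and this is not the authors' implementation.

```python
import heapq
from collections import Counter, defaultdict
from itertools import combinations


def dot(a, b):
    """Proximity score: inner product of two node embeddings."""
    return sum(x * y for x, y in zip(a, b))


def candidate_pairs(edges, blocks, emb, budget):
    """Return roughly `budget` unobserved node pairs (hypothetical helper).

    edges:  observed undirected (u, v) edges
    blocks: dict node -> block id (assumed given, e.g. from an SBM fit)
    emb:    dict node -> embedding vector (any proximity model)
    """
    edges = list(edges)
    observed = {frozenset(e) for e in edges}

    # Observed edges per unordered block pair: where links tend to fall.
    per_block_pair = Counter(tuple(sorted((blocks[u], blocks[v]))) for u, v in edges)
    total = sum(per_block_pair.values())

    members = defaultdict(list)
    for node, b in blocks.items():
        members[b].append(node)

    chosen = []
    for (bi, bj), count in per_block_pair.items():
        # Budget share proportional to this block pair's observed edges.
        quota = round(budget * count / total)
        if bi == bj:
            pairs = combinations(members[bi], 2)
        else:
            pairs = ((u, v) for u in members[bi] for v in members[bj])
        # Exhaustive scoring inside the block pair (LinkWaldo uses LSH here).
        scored = ((dot(emb[u], emb[v]), u, v) for u, v in pairs
                  if frozenset((u, v)) not in observed)
        chosen.extend((u, v) for _, u, v in heapq.nlargest(quota, scored))
    return chosen


# Toy graph: block "A" = {0, 1, 2}, block "B" = {3, 4}.
edges = [(0, 1), (3, 4), (0, 3), (1, 3)]
blocks = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}
emb = {0: (1, 0), 1: (1, 0.1), 2: (0, 1), 3: (1, 0), 4: (0.9, 0.5)}
print(candidate_pairs(edges, blocks, emb, budget=4))  # [(1, 2), (1, 4), (0, 4)]
```

Proportional allocation is what lets a block pair with many observed edges contribute candidates even when its pairs are not the globally closest ones; replacing the exhaustive loop with an LSH index would keep the same structure while avoiding the quadratic scan.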

A Large-Scale Study of the Twitter Follower Network to Characterize the Spread of Prescription Drug Abuse Tweets

In this article, we perform a large-scale study of the Twitter follower network, involving around 0.42 million users who justify drug abuse (DA), to characterize the spread of DA tweets across the network. Our observations reveal a very large giant component containing 99% of these users, with dense local connectivity that facilitates the spread of such messages. We further identify active cascades over the network and observe that cascades of DA tweets spread over long distances through the engagement of several closely connected groups of users. Moreover, our observations reveal a collective phenomenon in which a large set of active fringe nodes (with few followers and few accounts followed) and a small set of well-connected non-fringe nodes work together toward such spread, potentially complicating efforts to arrest these cascades. Furthermore, we find that user engagement with tweets about the drugs most mentioned on Twitter, such as Vicodin, Percocet, and OxyContin, is instantaneous. On the other hand, for less-mentioned drugs such as Lortab, the engagement probability increases with repeated exposure to such tweets, indicating that drug abusers engaged on Twitter remain vulnerable to adopting newer drugs, which further aggravates the problem.
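
The giant-component figure is the share of users that fall in the largest connected component of the follower graph. A minimal sketch using union-find, assuming edge direction is ignored (weak connectivity) and that every edge endpoint appears in the node list; the function name is hypothetical.

```python
from collections import Counter


def giant_component_fraction(nodes, edges):
    """Share of nodes in the largest weakly connected component.

    Follower -> followee direction is ignored, so the graph is treated as
    undirected. Every endpoint in `edges` must also appear in `nodes`.
    """
    parent = {n: n for n in nodes}
    if not parent:
        return 0.0

    def find(x):
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components

    sizes = Counter(find(n) for n in parent)
    return max(sizes.values()) / len(parent)


# Five users: {1, 2, 3} are linked, {4, 5} form a separate pair.
print(giant_component_fraction([1, 2, 3, 4, 5], [(1, 2), (2, 3), (5, 4)]))  # 0.6
```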

A Latent Space Model for Multilayer Network Data

In this work, we propose a Bayesian statistical model to simultaneously characterize two or more social networks defined over a common set of actors. The key feature of the model is a hierarchical prior distribution that allows us to represent the entire system jointly, achieving a compromise between dependent and independent networks. Among other things, this specification allows us to easily visualize multilayer network data in a low-dimensional Euclidean space, generate a weighted network that reflects the consensus affinity between actors, establish a measure of correlation between networks, assess the cognitive judgements that subjects form about the relationships among actors, and perform clustering tasks at different social instances. We illustrate the model's capabilities using several real-world data sets that differ in the types of actors, network sizes, and relations.
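
Latent space models of this family usually build on the latent distance formulation, where the log-odds of a tie fall as the Euclidean distance between two actors' latent positions grows. The sketch below assumes one intercept and one set of positions per layer, and takes the consensus weight of a pair to be its average tie probability across layers; both are illustrative assumptions, and the paper's hierarchical prior that links the layers is not reproduced. All function names are hypothetical.

```python
import math


def tie_probability(alpha, z_i, z_j):
    """P(y_ij = 1) = logistic(alpha - ||z_i - z_j||) in a latent distance model."""
    eta = alpha - math.dist(z_i, z_j)
    return 1.0 / (1.0 + math.exp(-eta))


def layer_log_likelihood(adj, alpha, positions):
    """Bernoulli log-likelihood of one undirected layer.

    adj:       dict {(i, j): 0 or 1} over unordered actor pairs
    positions: dict actor -> latent position (tuple of floats)
    """
    ll = 0.0
    for (i, j), y in adj.items():
        p = tie_probability(alpha, positions[i], positions[j])
        ll += math.log(p) if y else math.log(1.0 - p)
    return ll


def consensus_weight(alphas, layer_positions, i, j):
    """Average tie probability of (i, j) across layers (illustrative choice)."""
    probs = [tie_probability(a, pos[i], pos[j])
             for a, pos in zip(alphas, layer_positions)]
    return sum(probs) / len(probs)


# Two layers over the same actors: "b" sits closer to "a" in the first layer.
layers = [{"a": (0.0, 0.0), "b": (1.0, 0.0)}, {"a": (0.0, 0.0), "b": (3.0, 0.0)}]
print(consensus_weight([1.0, 1.0], layers, "a", "b"))
```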

A Longitudinal Analysis of a Social Network of Intellectual History

The history of intellectuals consists of a complicated web of influences and interconnections among philosophers, scientists, writers, their works, and their ideas. How did these influences evolve over time? Who were the most influential scholars in a given period? To answer these questions, we mined a network of influence among over 12,500 intellectuals, extracted from the Linked Open Data provider YAGO. We enriched this network with a longitudinal perspective and analysed time-sliced projections of the complete network, differentiating between within-era, inter-era, and accumulated-era networks. We thus identified various patterns of intellectuals and eras and studied their development over time. We show which scholars were most influential in different eras, and who took prominent knowledge-broker roles. One essential finding is that an era's scholars had their highest impact on their contemporaries, and that each period's inter-era influence was strongest on the period immediately following it. Further, we find quantitative evidence that there was no rediscovery of Antiquity during the Renaissance, but rather a continuous reception since the Middle Ages.
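
The within-era and inter-era projections can be sketched by tagging each intellectual with an era and splitting the influence edges accordingly. This assumes each person maps to a single integer era (for instance derived from a birth year), reads an edge (u, v) as "u influenced v", and takes the accumulated-era network to be everything up to a given era. The function names, era assignment, and toy data are hypothetical.

```python
from collections import defaultdict


def era_projections(edges, era_of):
    """Split influence edges (u influenced v) into time-sliced projections.

    era_of maps each intellectual to an integer era index (assumed given);
    a lower index means an earlier era.
    """
    within = defaultdict(set)  # era -> edges with both ends in that era
    inter = defaultdict(set)   # (source era, target era) -> cross-era edges
    for u, v in edges:
        eu, ev = era_of[u], era_of[v]
        if eu == ev:
            within[eu].add((u, v))
        else:
            inter[(eu, ev)].add((u, v))
    return within, inter


def accumulated(edges, era_of, up_to):
    """All edges whose endpoints both belong to eras <= up_to."""
    return {(u, v) for u, v in edges
            if era_of[u] <= up_to and era_of[v] <= up_to}


era_of = {"a": 0, "b": 0, "c": 1, "d": 2}
edges = [("a", "b"), ("a", "c"), ("c", "d"), ("b", "d")]
within, inter = era_projections(edges, era_of)
print(dict(inter))
```

Comparing edge counts across keys of the inter-era dictionary, for example inter[(e, e + 1)] against inter[(e, e + 2)], is one way to quantify whether an era's influence is strongest on the era that directly follows it.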

A Machine Learning Approach to Predicting Continuous Tie Strengths

Relationships between people constantly evolve, altering interpersonal behavior and defining social groups. Relationships between nodes in social networks can be represented by a tie strength, which is often assessed empirically using surveys. While surveys are effective for taking static snapshots of relationships, they are difficult to scale to dynamic networks. In this paper, we propose a system that continuously approximates relationships as they evolve over time. We evaluate this system using the NetSense study, which provides comprehensive communication records of students at the University of Notre Dame over the course of four years. These records are complemented by semesterly ego-network surveys, which provide discrete samples over time of each participant's true social tie strength with others. We develop a pair of machine learning models (complemented by a suite of baselines extracted from past works) that learn from these surveys to interpret the communication records as signals. These signals represent dynamic tie strengths, accurately tracking the evolution of relationships between the individuals in our social networks. With these evolving tie values, we make several empirically derived observations, which we compare with past works.
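
The core idea, learning a mapping from communication activity to surveyed tie strength and then applying it between surveys, can be sketched with ordinary least squares on a single feature. This is a deliberately simple stand-in, not either of the paper's models; the feature (message count per time window), the toy numbers, and the function names are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    return slope, mean_y - slope * mean_x


def tie_strength_series(window_counts, slope, intercept):
    """Turn per-window message counts into a continuous tie-strength signal."""
    return [slope * c + intercept for c in window_counts]


# Survey semesters: messages exchanged vs. reported tie strength (toy numbers).
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
# Apply the fitted mapping between surveys, e.g. week by week.
print(tie_strength_series([0, 2, 3], slope, intercept))  # [1.0, 5.0, 7.0]
```

Predictions can fall outside the survey's scale for unusually active pairs, so a real system would at least clip or use a bounded model.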

A Multi-Semantic Metapath Model for Large Scale Heterogeneous Network Representation Learning

Network embedding has been widely studied to model and manage data in a variety of real-world applications. However, most existing works focus on networks with single-typed nodes or edges, with limited consideration of unbalanced distributions of nodes and edges. In real-world applications, networks usually consist of billions of nodes and edges of various types with abundant attributes. To tackle these challenges, in this paper we propose a multi-semantic metapath (MSM) model for large-scale heterogeneous representation learning. Specifically, we generate multi-semantic metapath-based random walks to construct the heterogeneous neighborhood, which handles the unbalanced distributions, and propose a unified framework for embedding learning. We conduct systematic evaluations of the proposed framework on two challenging datasets: Amazon and Alibaba. The results empirically demonstrate that MSM achieves notable gains over previous state-of-the-art methods on link prediction.
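
A metapath-guided walk only steps to neighbours whose type matches the next type in the metapath. The sketch below follows the metapath2vec-style walk and simply mixes walks from several metapaths, which is our reading of "multi-semantic"; the MSM model's actual walk scheme, its balancing strategy, and the skip-gram training that would consume these walks are not reproduced. Function names and the toy graph are hypothetical.

```python
import random


def metapath_walk(adj, node_type, start, metapath, length, rng):
    """One metapath-guided random walk (metapath2vec-style).

    adj:       dict node -> list of neighbours (any types)
    node_type: dict node -> type label, e.g. "user" or "item"
    metapath:  type sequence such as ["user", "item", "user"]
    """
    # A symmetric metapath repeats its first type at the end, so cycle
    # through all but the last element.
    pattern = metapath[:-1] if metapath[0] == metapath[-1] else metapath
    if node_type[start] != pattern[0]:
        raise ValueError("start node type must match the metapath head")
    walk = [start]
    while len(walk) < length:
        want = pattern[len(walk) % len(pattern)]
        nbrs = [n for n in adj[walk[-1]] if node_type[n] == want]
        if not nbrs:  # dead end for this semantic
            break
        walk.append(rng.choice(nbrs))
    return walk


def multi_semantic_walks(adj, node_type, metapaths, walks_per_node, length, seed=0):
    """Pool walks generated under several metapaths."""
    rng = random.Random(seed)
    walks = []
    for mp in metapaths:
        starts = [n for n in adj if node_type[n] == mp[0]]
        for _ in range(walks_per_node):
            for s in starts:
                walks.append(metapath_walk(adj, node_type, s, mp, length, rng))
    return walks


adj = {"u1": ["i1", "i2"], "u2": ["i1"], "i1": ["u1", "u2"], "i2": ["u1"]}
node_type = {"u1": "user", "u2": "user", "i1": "item", "i2": "item"}
paths = [["user", "item", "user"], ["item", "user", "item"]]
for w in multi_semantic_walks(adj, node_type, paths, walks_per_node=1, length=5):
    print(w)
```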

A Multitask Deep Learning Approach for User Depression Detection on Sina Weibo

In recent years, due to the mental burden of depression, the number of people who endanger their lives has been increasing rapidly. Online social networks (OSNs) provide researchers with another perspective for detecting individuals suffering from depression. However, existing machine-learning studies of depression detection still achieve relatively low classification performance, suggesting that their feature engineering has significant room for improvement. In this paper, we manually build a large dataset on Sina Weibo (a leading OSN with the largest number of active users in the Chinese community), namely the Weibo User Depression Detection Dataset (WU3D). It includes more than 20,000 normal users and more than 10,000 depressed users, all manually labeled and rechecked by professionals. By analyzing users' text, social behavior, and posted pictures, we derive and propose ten statistical features. In addition, text-based word features are extracted using the popular pretrained model XLNet. Moreover, we propose a novel deep neural network classification model, FusionNet (FN), which is trained simultaneously on the extracted features, treating them as multiple classification tasks. The experimental results show that FusionNet achieves the highest F1-score, 0.9772, on the test dataset. Compared with existing studies, our proposed method shows better classification performance and robustness to unbalanced training samples. Our work also provides a new way to detect depression on other OSN platforms.
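
Training one network on several feature groups "as multiple classification tasks" is commonly done by giving each feature branch its own classifier head and summing the heads' losses. The sketch below shows only that loss bookkeeping with binary cross-entropy; the task names and the default equal weights are assumptions, not FusionNet's actual configuration.

```python
import math


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for a predicted probability p and a label y in {0, 1}."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def multitask_loss(task_probs, label, weights=None):
    """Weighted sum of per-task BCE losses for one user.

    task_probs: dict task name -> predicted probability of "depressed"
    weights:    dict task name -> loss weight (defaults to 1.0 each)
    """
    weights = weights or {}
    return sum(weights.get(t, 1.0) * bce(p, label) for t, p in task_probs.items())


# Hypothetical heads: statistical features, XLNet text features, fused output.
print(multitask_loss({"statistical": 0.8, "xlnet": 0.9, "fused": 0.95}, label=1))
```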

A Network Based Approach to Characterize Twenty-First-Century Populism in Colombia

Populism is a political phenomenon of democratic illiberalism centered on the figure of a strong leader. By modeling person-to-person (node) connections among prominent figures of the recent Colombian political landscape, we map, quantify, and analyze the position and influence of Alvaro Uribe as a populist leader. We found that Uribe is a central hub in the political alliance networks, cutting across traditional party alliances, but is not the most central figure in the state machinery. The article first presents the framing of the problem, followed by the historical context of the case under study, the methodology and data collection, the analysis, the conclusions, and paths for further research. This study offers a new way of applying quantitative methods to the study of populist regimes.
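
Comparing one figure's position across two networks can be sketched with normalised degree centrality, ranking the same person in each network. Degree is our choice for illustration, since the abstract does not say which centrality measures the study used; the node labels and edges below are placeholders, not real data.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalised degree centrality, degree / (n - 1), for an undirected graph."""
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            nbrs[u].add(v)
            nbrs[v].add(u)
    n = len(nbrs)
    return {node: len(s) / (n - 1) for node, s in nbrs.items()}


def rank_of(node, centrality):
    """1-based rank of `node` when sorted by decreasing centrality."""
    ordered = sorted(centrality, key=centrality.get, reverse=True)
    return ordered.index(node) + 1


# Placeholder networks: "L" is a hub among alliances but peripheral in the state network.
alliances = [("L", "A"), ("L", "B"), ("L", "C"), ("A", "B")]
state = [("M", "L"), ("M", "D"), ("M", "E"), ("D", "E")]
print(rank_of("L", degree_centrality(alliances)), rank_of("L", degree_centrality(state)))  # 1 4
```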

A Privacy-Preserving Architecture for the Protection of Adolescents in Online Social Networks

Online social networks (OSNs) constitute an integral part of people's everyday social activity. Specifically, mainstream OSNs such as Twitter, YouTube, and Facebook are especially prominent in adolescents' lives for communicating with other people online, expressing and entertaining themselves, and finding information. However, adolescents face a significant number of threats when using online platforms, including aggressive behavior and cyberbullying, sexual grooming, false news and fake activity, radicalization, and exposure of personal information and sensitive content. There is a pressing need for parental control tools and Internet content filtering techniques to protect the vulnerable groups that use online platforms. Existing parental control tools occasionally violate adolescents' privacy, leading them to use other communication channels to avoid moderation. In this work, we design and implement a user-centric Family Advice Suite with Guardian Avatars that aims to preserve the privacy of individuals with respect to both their custodians and the advice tool itself. Moreover, we present a systematic process for designing and developing state-of-the-art techniques, and a system architecture to prevent minors' exposure to numerous risks and dangers while using Facebook, Twitter, and YouTube in a browser.

A Query-Driven System for Discovering Interesting Subgraphs in Social Media

Social media data are often modeled as heterogeneous graphs with multiple types of nodes and edges. We present a discovery algorithm that first chooses a "background" graph based on a user's analytical interest and then automatically discovers subgraphs that differ distinctly from the background graph in both structure and content. The technique combines the notion of a group-by operation on a graph with the notion of subjective interestingness, resulting in the automated discovery of interesting subgraphs. Our experiments on a socio-political database show the effectiveness of our technique.
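
A simplified illustration of combining a graph group-by with an interestingness score. It groups nodes by one attribute, counts the edges inside each group, and scores the group by the surprisal of that count under a background model in which every pair is linked independently at the whole graph's density. Using the whole graph as background and surprisal as the score are our assumptions, in the spirit of subjective interestingness but not the paper's measure; the names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def group_by(nodes, key):
    """Partition node ids by an attribute value, like SQL GROUP BY."""
    groups = defaultdict(list)
    for node, attrs in nodes.items():
        groups[attrs[key]].append(node)
    return groups


def surprisal(n_nodes, n_edges, p):
    """-log P(exactly n_edges edges) if each of the n_nodes*(n_nodes-1)/2
    pairs is linked independently with background density p (0 < p < 1)."""
    m = n_nodes * (n_nodes - 1) // 2
    log_pmf = (math.log(math.comb(m, n_edges))
               + n_edges * math.log(p) + (m - n_edges) * math.log(1 - p))
    return -log_pmf


def rank_groups(nodes, edges, key):
    """Rank groups by how surprising their internal edge count is."""
    n = len(nodes)
    p = len(edges) / (n * (n - 1) / 2)  # background: whole-graph density
    scores = {}
    for value, members in group_by(nodes, key).items():
        ms = set(members)
        inside = sum(1 for u, v in edges if u in ms and v in ms)
        scores[value] = surprisal(len(members), inside, p)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


nodes = {i: {"label": "X" if i <= 3 else "Y"} for i in range(1, 7)}
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (5, 1)]
print([g for g, _ in rank_groups(nodes, edges, "label")])  # ['X', 'Y']
```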
