Didi Surian | Researchain

Publications

Featured research published by Didi Surian.

Working Conference on Reverse Engineering | 2010

Mining Collaboration Patterns from a Large Developer Network

Didi Surian; David Lo; Ee-Peng Lim

In this study, we extract patterns from a large developer collaboration network derived from SourceForge.net at high and low levels of detail. At the high level, we extract various network-level statistics from the network. At the low level, we extract topological subgraph patterns that are frequently seen among collaborating developers. Extracting subgraph patterns from large graphs is an NP-complete problem. To address this challenge, we employ a novel combination of graph mining and graph matching that leverages network-level properties of a developer network. With this approach, we successfully analyze a snapshot of SourceForge.net data taken in September 2009. We present the mined patterns and describe interesting observations.
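
The abstract contrasts network-level statistics with small subgraph patterns. As a minimal illustration (not the authors' graph-mining and graph-matching pipeline), the sketch below counts the simplest non-trivial patterns, closed triads (triangles) and open triads, in an undirected collaboration graph and derives the global clustering coefficient from them. The function name `triad_census` and the edge-list input format are assumptions.

```python
from itertools import combinations


def triad_census(edges):
    """Count closed triads (triangles) and open triads (paths of length 2)
    in an undirected, unweighted collaboration graph given as (u, v) pairs,
    and derive the global clustering coefficient (transitivity).
    Hypothetical helper for illustration only."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops carry no collaboration information
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    wedges = 0  # connected triples centred on a node, closed or open
    closed = 0  # wedges whose two endpoints are also linked
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            wedges += 1
            if b in adj[a]:
                closed += 1

    triangles = closed // 3  # each triangle is seen from all three corners
    return {
        "triangles": triangles,
        "open_triads": wedges - closed,
        "transitivity": closed / wedges if wedges else 0.0,
    }
```

Counting wedges per node keeps the sketch linear in the sum of squared degrees, which is fine for small examples but would need sampling or smarter enumeration on a graph the size of SourceForge.net.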

Working Conference on Reverse Engineering | 2011

Recommending People in Developers' Collaboration Network

Didi Surian; Nian Liu; David Lo; Hanghang Tong; Ee-Peng Lim; Christos Faloutsos

Many software development efforts, both open-source and closed-source, involve collaboration among developers across the globe. Developers collaborate on different projects of various types. As with any other teamwork endeavor, finding compatibility among the members of a development team helps the team realize its goal. Compatible members tend to share a similar programming style and naming strategy, communicate well with one another, and so on. However, finding the right person to work with is not an easy task. In this work, we extract information available from SourceForge.net, the largest database of open source software, and build a developer collaboration network comprising information on developers, projects, and project properties. Given an input developer, we then recommend a list of the top developers who are most compatible with that developer based on their programming language skills, past projects, and the project categories they have worked on, using a random walk with restart procedure. Our quantitative and qualitative experiments show that we are able to recommend reasonable developer candidates from snapshots of SourceForge.net consisting of tens of thousands of developers and projects, and hundreds of project properties.
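
Random walk with restart (RWR) scores every node by how often a walker that keeps returning to the input developer visits it. A minimal power-iteration sketch follows, assuming the developer, project, and property network is stored as a weighted adjacency dict; the restart probability of 0.15, the function names, and the tie-breaking rule are assumptions, not values from the paper.

```python
def random_walk_with_restart(adj, seed, restart=0.15, iters=100, tol=1e-10):
    """Relevance of every node to `seed` under a random walk with restart.

    `adj` maps each node to {neighbour: weight}; every neighbour must also be
    a key of `adj`. At each step the walker jumps back to `seed` with
    probability `restart`, otherwise it follows an out-edge chosen in
    proportion to its weight.
    """
    scores = {n: 0.0 for n in adj}
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for u, out in adj.items():
            total = sum(out.values())
            if total == 0:
                # dangling node: return its walking mass to the seed
                nxt[seed] += (1 - restart) * scores[u]
                continue
            for v, w in out.items():
                nxt[v] += (1 - restart) * scores[u] * w / total
        nxt[seed] += restart
        delta = sum(abs(nxt[n] - scores[n]) for n in adj)
        scores = nxt
        if delta < tol:  # stop once the scores have converged
            break
    return scores


def recommend(adj, seed, candidates, k=5):
    """Top-k candidates (e.g. other developers) ranked by RWR relevance."""
    scores = random_walk_with_restart(adj, seed)
    ranked = sorted((c for c in candidates if c != seed),
                    key=lambda c: (-scores[c], str(c)))
    return ranked[:k]
```

Because project and language nodes sit in the same graph as developers, a developer who shares a project with the seed receives relevance after two steps, while one linked only through a shared language needs four, which is the intuition behind using RWR for compatibility.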

Journal of Medical Internet Research | 2016

Characterizing Twitter Discussions About HPV Vaccines Using Topic Modeling and Community Detection

Didi Surian; Dat Quoc Nguyen; Georgina Kennedy; Mark Johnson; Enrico Coiera; Adam G. Dunn

Background: In public health surveillance, measuring how information enters and spreads through online communities may help us understand geographical variation in decision making associated with poor health outcomes.

Objective: Our aim was to evaluate the use of community structure and topic modeling methods as a process for characterizing the clustering of opinions about human papillomavirus (HPV) vaccines on Twitter.

Methods: The study examined Twitter posts (tweets) about HPV vaccines collected between October 2013 and October 2015. We tested Latent Dirichlet Allocation (LDA) and Dirichlet Multinomial Mixture (DMM) models for inferring the topics associated with tweets, and two community detection methods, community agglomeration (Louvain) and random-walk encoding (Infomap), for detecting the community structure of users from their social connections. We examined the alignment between community structure and topics using several common clustering alignment measures and introduced a statistical measure of alignment based on the concentration of specific topics within a small number of communities. Visualizations of the topics and of the alignment between topics and communities are presented to support the interpretation of the results in the context of public health communication and the identification of communities at risk of rejecting the safety and efficacy of HPV vaccines.

Results: We analyzed 285,417 tweets about HPV vaccines from 101,519 users connected by 4,387,524 social connections. Examining the alignment between the community structure and the topics of tweets, the results indicated that the Louvain community detection algorithm together with DMM produced consistently higher alignment values and that alignments were generally higher when the number of topics was lower. After applying the Louvain method and DMM with 30 topics and grouping semantically similar topics in a hierarchy, we characterized 163,148 (57.16%) tweets as evidence and advocacy and 6244 (2.19%) tweets as describing personal experiences. Among the 4548 users who posted experiential tweets, 3449 (75.84%) were found in communities where the majority of tweets were about evidence and advocacy.

Conclusions: The use of community detection in concert with topic modeling appears to be a useful way to characterize Twitter communities for the purpose of opinion surveillance in public health applications. Our approach may help identify online communities at risk of being influenced by negative opinions about public health interventions such as HPV vaccines.
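
The abstract does not name the common clustering alignment measures it uses. One widely used example is normalized mutual information (NMI); the sketch below assumes each item (a tweet or user) carries exactly one community label and one topic label and computes NMI with arithmetic-mean normalization. It is not the concentration-based alignment measure the authors introduce, and the function name is hypothetical.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two labelings of the same items, e.g. community vs topic.

    Uses the arithmetic-mean normalisation 2 * I(A; B) / (H(A) + H(B)).
    Returns 1.0 for identical partitions (label names don't matter) and
    0.0 for independent ones.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("labelings must be non-empty and cover the same items")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    mutual_info = sum(
        (c / n) * math.log((c * n) / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    h_a, h_b = entropy(count_a), entropy(count_b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both labelings put every item in a single cluster
    return 2 * mutual_info / (h_a + h_b)
```

NMI rewards any one-to-one correspondence between communities and topics, so it says little about whether a particular topic is concentrated in a few communities; that gap is presumably what motivates a dedicated concentration measure.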

Conference on Information and Knowledge Management | 2011

Mining direct antagonistic communities in explicit trust networks

David Lo; Didi Surian; Kuan Zhang; Ee-Peng Lim

There has been a recent increase in interest in analyzing trust and friendship networks to gain insights about relationship dynamics among users. Many sites, such as Epinions, Facebook, and other social networking sites, allow users to declare trust or friendship toward other members of the community. In this work, we are interested in extracting direct antagonistic communities (DACs) within a rich trust network involving both trust and distrust. Each DAC is formed by two sub-communities, with trust relationships among the members of each sub-community but distrust relationships across the sub-communities. We develop an efficient algorithm that can analyze large trust networks by leveraging the unique properties of direct antagonistic communities. We have experimented with synthetic and real datasets (myGamma and Epinions) to demonstrate the scalability of our proposed solution.
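
The DAC definition above can be checked directly for a candidate pair of sub-communities. The sketch below assumes an undirected signed network and a strict reading in which every within-group pair trusts each other and every cross-group pair distrusts each other; the paper may use a relaxed or directed definition, and this predicate is only a checker, not its mining algorithm. The names `signed_pairs` and `is_direct_antagonistic_community` are hypothetical.

```python
from itertools import combinations


def signed_pairs(edges):
    """Build {frozenset({u, v}): sign} from (u, v, sign) triples, sign = +1/-1."""
    return {frozenset((u, v)): sign for u, v, sign in edges}


def is_direct_antagonistic_community(signs, group_a, group_b):
    """Strict DAC check on an undirected signed network (assumed reading).

    Every pair inside each group must be linked by trust (+1), and every
    pair across the two groups must be linked by distrust (-1).
    """
    a, b = set(group_a), set(group_b)
    if not a or not b or a & b:
        return False  # sub-communities must be non-empty and disjoint
    for group in (a, b):
        for u, v in combinations(group, 2):
            if signs.get(frozenset((u, v))) != 1:
                return False
    for u in a:
        for v in b:
            if signs.get(frozenset((u, v))) != -1:
                return False
    return True
```

Checking a single candidate is cheap; the hard part the abstract refers to is that the number of candidate group pairs grows combinatorially, which is why an efficient mining algorithm is needed.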

Vaccine | 2017

Mapping information exposure on social media to explain differences in HPV vaccine coverage in the United States

Adam G. Dunn; Didi Surian; Julie Leask; Aditi Dey; Kenneth D. Mandl; Enrico Coiera

BACKGROUND: Together with access, acceptance of vaccines affects human papillomavirus (HPV) vaccine coverage, yet little is known about the media's role. Our aim was to determine whether measures of information exposure derived from Twitter could be used to explain differences in coverage in the United States.

METHODS: We conducted an analysis of exposure to information about HPV vaccines on Twitter, derived from 273.8 million exposures to 258,418 tweets posted between 1 October 2013 and 30 October 2015. Tweets were classified by topic using machine learning methods. Proportional exposure to each topic was used to construct multivariable models for predicting state-level HPV vaccine coverage, which were compared with multivariable models constructed using socioeconomic factors: poverty, education, and insurance. Outcome measures included correlations between coverage and the individual topics and socioeconomic factors, and differences in the predictive performance of the multivariable models.

RESULTS: Topics corresponding to media controversies were most closely correlated with coverage (both positively and negatively); education and insurance were the most closely correlated socioeconomic indicators. Measures of information exposure explained 68% of the variance in one-dose 2015 HPV vaccine coverage in females (males: 63%). In comparison, models based on socioeconomic factors explained 42% of the variance in females (males: 40%).

CONCLUSIONS: Measures of information exposure derived from Twitter explained differences in coverage that were not explained by socioeconomic factors. Vaccine coverage was lower in states where safety concerns, misinformation, and conspiracies made up higher proportions of exposures, suggesting that negative representations of vaccines in the media may reflect or influence vaccine acceptance.
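
Statements such as "explained 68% of the variance" refer to the coefficient of determination (R²) of a fitted model. The sketch below shows how that figure is computed from observed and predicted state-level coverage values, assuming plain (unadjusted) R²; whether the study reports adjusted R² is not stated here, and the function name is hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: the share of the variance in `observed`
    that is accounted for by `predicted` (plain, unadjusted R^2)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)  # total variance (unscaled)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # unexplained part
    return 1 - ss_res / ss_tot
```

A model that always predicts the mean scores 0, and a perfect model scores 1, so 0.68 versus 0.42 means the exposure-based models left noticeably less of the between-state variation unexplained than the socioeconomic ones.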

Information Processing and Management | 2013

Mining direct antagonistic communities in signed social networks

David Lo; Didi Surian; Philips Kokoh Prasetyo; Kuan Zhang; Ee-Peng Lim

Social networks provide a wealth of data for studying relationship dynamics among people. Most social networks, such as Epinions and Facebook, allow users to declare trust or friendship with other users. Some of them also allow users to declare distrust or other negative relationships. When both positive and negative links co-exist in a network, some interesting community structures can be studied. In this work, we mine direct antagonistic communities (DACs) within such signed networks. Each DAC consists of two sub-communities, with positive relationships among the members of each sub-community and negative relationships between members of one sub-community and members of the other. Identifying direct antagonistic communities is an important step toward understanding how such communities form, dissolve, and evolve. Knowledge about antagonistic communities allows us to better understand and explain the behavior of users in these communities. However, identifying DACs in a large signed network is challenging because a very large number of combinations of user sets must be checked. We propose an efficient data mining solution that leverages the properties of DACs and combines the identification of strongly connected components with bi-clique mining. We have evaluated our approach on synthetic, myGamma, and Epinions datasets to showcase its efficiency and utility. We show that we can mine DACs in less than 15 minutes from the signed network of myGamma, a mobile social networking site with 600,000 members and 8 million links. An investigation of the behavior of users participating in DACs shows that antagonism significantly affects the way people behave and interact with one another.
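
The abstract describes shrinking the search space with connected components before bi-clique mining. The sketch below illustrates that general idea under loudly stated assumptions: an undirected signed network, the same strict DAC reading (each side is a positive clique and every cross pair is negative), and ordinary connected components via union-find rather than the strongly connected components the paper uses. It only prunes and groups; it does not perform the bi-clique mining itself, and `dac_search_space` is a hypothetical name.

```python
from collections import defaultdict


def dac_search_space(signed_edges):
    """Prune a signed network before DAC mining (strict reading assumed).

    signed_edges: iterable of (u, v, sign) with sign +1 or -1, undirected.
    Returns {frozenset({root_u, root_v}): [negative edges]}, where root_x
    identifies the positive component (after pruning) that contains x.
    """
    edges = list(signed_edges)
    # 1. Every DAC member distrusts someone, so drop nodes without negative edges.
    keep = {n for u, v, s in edges if s < 0 for n in (u, v)}

    # 2. Union-find over positive edges among the kept nodes.
    parent = {n: n for n in keep}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, s in edges:
        if s > 0 and u in keep and v in keep:
            parent[find(u)] = find(v)

    # 3. Each side of a DAC is a positive clique, so it lies inside one
    #    component; group negative edges by the component pair they join,
    #    and a bi-clique search only needs to look inside each group.
    space = defaultdict(list)
    for u, v, s in edges:
        if s < 0:
            space[frozenset((find(u), find(v)))].append((u, v))
    return dict(space)
```

Both sides may fall in the same positive component (two antagonistic groups can share a mutual friend), so same-component pairs are kept rather than discarded.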

Conference on Software Maintenance and Reengineering | 2013

Predicting Project Outcome Leveraging Socio-Technical Network Patterns

Didi Surian; Yuan Tian; David Lo; Hong Cheng; Ee-Peng Lim

Many software projects are started daily; some are successful, while others are not. Successful projects get completed, are used by many people, and bring benefits to users. Failed projects do not bring similar benefits. In this work, we are interested in developing an effective machine learning solution that predicts project outcome (i.e., success or failure) from the developer socio-technical network. To do so, we investigate successful and failed projects to find factors that differentiate the two. We analyze the socio-technical aspect of the software development process by focusing on the people who contribute to these projects and the interactions among them. We first form a collaboration graph for each software project. We then create a training set consisting of two graph databases, corresponding to successful and failed projects respectively. A new data mining approach is then employed to extract discriminative rich patterns that appear frequently in successful projects but rarely in failed projects. We find that these automatically mined patterns are effective features for predicting project outcomes. We evaluate our solution on projects in SourceForge.net, the largest open source software development portal, and show that under 10-fold cross-validation our approach achieves an accuracy of more than 90% and an AUC score of 0.86. We also present and analyze some of the mined socio-technical patterns.
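
The selection criterion "frequent in successful projects but rare in failed ones" can be sketched as a support-contrast filter. The sketch assumes each project graph has already been reduced to a set of hashable pattern identifiers (for example canonical subgraph codes produced by some miner); the thresholds and function name are hypothetical, and the paper's own mining approach for rich patterns is not reproduced.

```python
def discriminative_patterns(pos_db, neg_db, min_pos=0.5, max_neg=0.1):
    """Select patterns that are frequent in pos_db but rare in neg_db.

    Each database is a list of graphs, and each graph is represented here by
    the collection of (hashable) pattern identifiers it contains.
    Returns (pattern, support_in_pos, support_in_neg) tuples, strongest
    contrast first.
    """
    def support(db):
        counts = {}
        for patterns in db:
            for p in set(patterns):  # count each pattern once per graph
                counts[p] = counts.get(p, 0) + 1
        return {p: c / len(db) for p, c in counts.items()}

    sup_pos, sup_neg = support(pos_db), support(neg_db)
    selected = []
    for p, sp in sup_pos.items():
        sn = sup_neg.get(p, 0.0)
        if sp >= min_pos and sn <= max_neg:
            selected.append((p, sp, sn))
    # largest support difference first; ties broken by name for stable output
    selected.sort(key=lambda t: (-(t[1] - t[2]), str(t[0])))
    return selected
```

The selected patterns can then serve as binary features (present or absent in a project's graph) for any standard classifier, which matches the abstract's use of mined patterns as prediction features.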

Knowledge Discovery and Data Mining | 2013

Latent outlier detection and the low precision problem

Fei Wang; Sanjay Chawla; Didi Surian

The identification of outliers is an intrinsic component of knowledge discovery. However, most outlier detection techniques operate in the observational space, which is often associated with information redundancy and noise. Moreover, because the observational space is usually high-dimensional, the detected anomalies are difficult to comprehend. In this paper, we claim that algorithms that discover outliers in a latent space will not only lead to more accurate results but may also provide a natural medium for explaining and describing outliers. Specifically, we propose combining Non-Negative Matrix Factorization (NMF) with subspace analysis to discover and interpret outliers. We report on preliminary work toward such an approach.
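
To make "outliers in a latent space" concrete, the sketch below substitutes a simple, common baseline for the paper's method: factor the non-negative data matrix with NMF (Lee-Seung multiplicative updates for the squared Frobenius loss) and score each record by how poorly its latent representation reconstructs it. The paper's subspace analysis is not implemented; the function names, iteration count, seed, and the use of reconstruction error as the score are all assumptions.

```python
import random


def nmf(X, k, iters=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix X (list of rows) as W @ H using
    Lee-Seung multiplicative updates for the squared Frobenius loss."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]

    def matmul(A, B):
        return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def transpose(A):
        return [list(r) for r in zip(*A)]

    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H


def latent_outlier_scores(X, k, **kwargs):
    """Score each row by the Euclidean error of its rank-k NMF reconstruction."""
    W, H = nmf(X, k, **kwargs)
    scores = []
    for i, row in enumerate(X):
        recon = [sum(W[i][a] * H[a][j] for a in range(k)) for j in range(len(row))]
        scores.append(sum((x - r) ** 2 for x, r in zip(row, recon)) ** 0.5)
    return scores
```

Because the latent factors are non-negative, each row of `H` can be read as an additive "part" of the data, which is the interpretability argument the abstract makes for working in an NMF latent space.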

SIAM International Conference on Data Mining | 2015

Cross-modal retrieval: A pairwise classification approach

Aditya Krishna Menon; Didi Surian; Sanjay Chawla

Content is increasingly available in multiple modalities (such as images, text, and video), each of which provides a different representation of some entity. The cross-modal retrieval problem is: given the representation of an entity in one modality, find its best representation in all other modalities. We propose a novel approach to this problem based on pairwise classification. The approach applies seamlessly both to settings where ground-truth annotations for the entities are absent and to settings where they are present. In the former case, the approach considers both the positive and the unlabelled links that arise in standard cross-modal retrieval datasets. Empirical comparisons show improvements over state-of-the-art methods for cross-modal retrieval.
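
Framing retrieval as classifying (query, candidate) pairs as linked or not can be sketched with a bilinear logistic model. The assumptions are stated plainly: both modalities are already fixed-length feature vectors, the pair score is x^T W y + b, training is full-batch gradient descent on the logistic loss, and unlabelled pairs are simply treated as negatives (a crude positive-unlabelled baseline). The paper's actual loss and its handling of unlabelled links are not reproduced, and all names are hypothetical.

```python
import math


def pair_score(W, b, x, y):
    """Bilinear logit x^T W y + b for a (query, candidate) pair."""
    return sum(x[i] * W[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y))) + b


def train_pairwise(pairs, labels, dim_x, dim_y, lr=0.5, epochs=200):
    """Fit sigmoid(x^T W y + b) to 0/1 pair labels by full-batch gradient
    descent on the logistic loss. Unlabelled pairs are passed in as 0s."""
    W = [[0.0] * dim_y for _ in range(dim_x)]
    b = 0.0
    n = len(pairs)
    for _ in range(epochs):
        gW = [[0.0] * dim_y for _ in range(dim_x)]
        gb = 0.0
        for (x, y), t in zip(pairs, labels):
            p = 1.0 / (1.0 + math.exp(-pair_score(W, b, x, y)))
            err = p - t  # derivative of the log loss w.r.t. the logit
            for i in range(dim_x):
                for j in range(dim_y):
                    gW[i][j] += err * x[i] * y[j]
            gb += err
        for i in range(dim_x):
            for j in range(dim_y):
                W[i][j] -= lr * gW[i][j] / n
        b -= lr * gb / n
    return W, b


def retrieve(W, b, query, candidates):
    """Indices of `candidates` (other modality) sorted from best to worst."""
    return sorted(range(len(candidates)),
                  key=lambda c: -pair_score(W, b, query, candidates[c]))
```

The bilinear form lets the two modalities keep different feature dimensions, since `W` maps between them directly instead of requiring a shared embedding space.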

Canadian Conference on Artificial Intelligence | 2013

A Causal Approach for Mining Interesting Anomalies

Sakshi Babbar; Didi Surian; Sanjay Chawla

We propose a novel approach that combines a Bayesian network with probabilistic association rules to discover and explain anomalies in data. The Bayesian network allows us to organize information so as to capture both correlation and causality in the feature space, while the probabilistic association rules have a structure similar to that of association mining rules. In particular, we focus on two types of rules: (i) low support and high confidence, and (ii) high support and low confidence. New data points that satisfy either of the two rules, conditioned on the Bayesian network, are the candidate anomalies. We perform extensive experiments on well-known benchmark datasets and demonstrate that our approach identifies anomalies with high precision and recall. Moreover, our approach can be used to discover contextual information from the mined anomalies, which other techniques often fail to do.
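
The two rule shapes can be illustrated with plain frequency counts. This sketch does not condition on a Bayesian network or use the paper's probabilistic rules; it computes ordinary support and confidence over transactions and tags the two shapes with hypothetical thresholds. It reads "support" as the support of the rule's antecedent, an assumption made because the support of a whole rule can never exceed its confidence, so a "high support, low confidence" rule would otherwise be impossible.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (antecedent support, rule support, confidence) for A -> C.

    Each transaction is a collection of items.
    """
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))          # A present
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))   # A and C present
    confidence = n_ac / n_a if n_a else 0.0
    return n_a / n, n_ac / n, confidence


def rule_type(antecedent_support, confidence, low=0.1, high=0.8):
    """Tag the two rule shapes from the abstract. 'Support' is read here as
    antecedent support (an assumption); the thresholds are hypothetical."""
    if antecedent_support <= low and confidence >= high:
        return "low support, high confidence"
    if antecedent_support >= high and confidence <= low:
        return "high support, low confidence"
    return None
```

A record that matches a rare antecedent which almost always implies its consequent, or a common antecedent that almost never does, is unusual relative to the bulk of the data, which is the intuition for treating such points as candidate anomalies.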
