Subhajit Datta

Singapore University of Technology and Design

Publication

Featured researches published by Subhajit Datta.

Empirical Software Engineering | 2017

On negative results when using sentiment analysis tools for software engineering research

Robbert Jongeling; Proshanta Sarkar; Subhajit Datta; Alexander Serebrenik

Recent years have seen an increasing attention to social aspects of software engineering, including studies of emotions and sentiments experienced and expressed by the software developers. Most of these studies reuse existing sentiment analysis tools such as SentiStrength and NLTK. However, these tools have been trained on product reviews and movie reviews and, therefore, their results might not be applicable in the software engineering domain. In this paper we study whether the sentiment analysis tools agree with the sentiment recognized by human evaluators (as reported in an earlier study) as well as with each other. Furthermore, we evaluate the impact of the choice of a sentiment analysis tool on software engineering studies by conducting a simple study of differences in issue resolution times for positive, negative and neutral texts. We repeat the study for seven datasets (issue trackers and Stack Overflow questions) and different sentiment analysis tools and observe that the disagreement between the tools can lead to diverging conclusions. Finally, we perform two replications of previously published studies and observe that the results of those studies cannot be confirmed when a different sentiment analysis tool is used.

International Conference on Software Maintenance | 2015

Choosing your weapons: On sentiment analysis tools for software engineering research

Robbert Jongeling; Subhajit Datta; Alexander Serebrenik

International Conference on Software Engineering | 2014

Does latitude hurt while longitude kills? Geographical and temporal separation in a large scale software development project

Patrick Wagstrom; Subhajit Datta

Distributed software development allows firms to leverage cost advantages and place work near centers of competency. This distribution comes at a cost -- distributed teams face challenges from differing cultures, skill levels, and a lack of shared working hours. In this paper we examine whether and how geographic and temporal separation in a large scale distributed software development influences developer interactions. We mine the work item trackers for a large commercial software project with a globally distributed development team. We examine both the time to respond and the propensity of individuals to respond and find that when taken together, geographic distance has little effect, while temporal separation has a significant negative impact on the time to respond. However, both have little impact on the social network of individuals in the organization. These results suggest that while temporally distributed teams do communicate, it is at a slower rate, and firms may wish to locate partner teams in similar time zones for maximal performance.

IEEE Transactions on Big Data | 2016

How Long Will This Live? Discovering the Lifespans of Software Engineering Ideas

Subhajit Datta; Santonu Sarkar; A. S. M. Sajeev

We all want to be associated with long lasting ideas; as originators, or at least, expositors. For a tyro researcher or a seasoned veteran, knowing how long an idea will remain interesting in the community is critical in choosing and pursuing research threads. In the physical sciences, the notion of half-life is often evoked to quantify decaying intensity. In this paper, we study a corpus of 19,000+ papers written by 21,000+ authors across 16 software engineering publication venues from 1975 to 2010, to empirically determine the half-life of software engineering research topics. In the absence of any consistent and well-accepted methodology for associating research topics to a publication, we have used natural language processing techniques to semi-automatically identify and associate a set of topics with a paper. We adapted measures of half-life already existing in the bibliometric context for our study, and also defined a new measure based on publication and citation counts. We find evidence that some of the identified research topics show a mean half-life of close to 15 years, and there are topics with sustaining interest in the community. We report the methodology of our study in this paper, as well as the implications and utility of our results.

International World Wide Web Conferences | 2015

Discovering the Rise and Fall of Software Engineering Ideas from Scholarly Publication Data

Subhajit Datta; Santonu Sarkar; A.S.M. Sajeev; Nishant Kumar

For researchers and practitioners of a relatively young discipline like software engineering, an enduring concern is to identify the acorns that will grow into oaks -- ideas remaining most current in the long run. Additionally, it is interesting to know how the ideas have risen in importance, and fallen, perhaps to rise again. We analyzed a corpus of 19,000+ papers written by 21,000+ authors across 16 software engineering publication venues from 1975 to 2010, to empirically determine the half-life of software engineering research topics. We adapted existing measures of half-life as well as defined a specific measure based on publication and citation counts. The results from this empirical study are presented in this paper.

Bangalore Annual Compute Conference | 2013

How many researchers does it take to make impact?: mining software engineering publication data for collaboration insights

Subhajit Datta; Santonu Sarkar; A. S. M. Sajeev; Nishant Kumar

In the three and a half decades since the inception of organized research publication in software engineering, the discipline has gained significant maturity. This journey to maturity has been guided by the synergy of ideas, individuals and interactions. In this journey software engineering has evolved into an increasingly empirical discipline. Empirical sciences involve significant collaboration, leading to large teams working on research problems. In this paper we analyze a corpus of 19,000+ papers, written by 21,000+ authors from 16 publication venues between 1975 and 2010, to understand the ideal team size that has produced maximum impact in software engineering research, and whether researchers in software engineering have maintained the same co-authorship relations over long periods of time as a means of achieving research impact.

Empirical Software Engineering | 2018

How does developer interaction relate to software quality? An examination of product development data

Subhajit Datta

Industrial software systems are being increasingly developed by large and distributed teams. Tools like collaborative development environments (CDE) are used to facilitate interaction between members of such teams, with the expectation that social factors around the interaction would facilitate team functioning. In this paper, we first identify typically social characteristics of interaction in a software development team: reachability, connection, association, and clustering. We then examine how these factors relate to the quality of software produced by a team, in terms of the number of defects, through an empirical study of 70+ teams, involving 900+ developers in total, spread across 30+ locations and 19 time-zones, working on 40,000+ units of work in the multi-version development of a major industrial product, spreading across more than five years. After controlling for known factors affecting large scale distributed development such as dependency, system age, developer expertise and experience, geographic dispersion, socio-technical congruence, and the number of files changed, we find statistically significant effects of connection and clustering on software quality. Higher levels of intra-team connection are found to relate to higher defect count, whereas more clustering relates to fewer defects. We examine the implications of these results for individual developers, project managers, and organizations.

International World Wide Web Conferences | 2017

Predicting the Impact of Software Engineering Topics: An Empirical Study

Santonu Sarkar; Rumana Lakdawala; Subhajit Datta

Predicting the future is hard, more so in active research areas. In this paper, we customize an established model for citation prediction of research papers and apply it to research topics. We argue that research topics, rather than individual publications, have wider relevance in the research ecosystem, for individuals as well as organizations. In this study, topics are extracted from a corpus of software engineering publications covering 55,000+ papers written by more than 70,000 authors across 56 publication venues, over a span of 38 years, using natural language processing techniques. We demonstrate how critical aspects of the original paper-based prediction model are valid for a topic-based approach. Our results indicate the customized model is able to predict citations for many of the topics considered in our study with reasonably high accuracy. Insights from these results indicate the promise of citation prediction for research topics, and its utility for individual researchers, as well as research groups.

IEEE Transactions on Big Data | 2017

The Habits of Highly Effective Researchers: An Empirical Study

Subhajit Datta; Partha Basuchowdhuri; Surajit Acharya; Subhashis Majumder

Interest in the habits of influential individuals cuts across domains. As researchers, we are intrigued by why few attain significant eminence in their fields, whereas many operate in obscurity. An empirical examination of this question has been made possible by the recent availability of large scale publication data. In this paper, we use information from the AMiner Paper Citation and Author Collaboration Networks to discern factors that relate to the impact of influential researchers across five domains in the computing discipline. We propose and apply a novel algorithm to identify influential vertices in co-authorship networks built from total corpora of 100,000+ papers and 72,000+ authors over a span of more than 50 years. The results from our study indicate that the impact of these influential researchers relates to a variety of factors. Surprisingly, we find evidence across the domains that higher impact is associated with lower levels of collaboration and authority.

Archive | 2016

An Ecological Model for Digital Platforms Maintenance and Evolution

Paolo Rocchi; Paolo Spagnoletti; Subhajit Datta

The maintenance of software products has been studied extensively in both software engineering and management information systems. Such studies are mainly focused on the activities that take place prior to starting the maintenance phase. Their contribution is either related to the improvement of software quality or to validating contingency models for reducing maintenance efforts. The continuous maintenance philosophy suggests shifting attention to within the maintenance phase to better cope with the evolutionary trajectories of digital platforms. In this paper, we examine the maintenance process of a digital platform from the perspective of the software vendor. Based on our empirical observations, we derive an interesting statistical relationship that has strong theoretical and practical implications in the study of software defects.
