Featured Research

Software Engineering

"How Was Your Weekend?" Software Development Teams Working From Home During COVID-19

The mass shift to working at home during the COVID-19 pandemic radically changed the way many software development teams collaborate and communicate. To investigate how team culture and team productivity may also have been affected, we conducted two surveys at a large software company. The first, an exploratory survey during the early months of the pandemic with 2,265 developer responses, revealed that many developers faced challenges reaching milestones and that their team productivity had changed. We also found through qualitative analysis that important team culture factors such as communication and social connection had been affected. For example, the simple phrase "How was your weekend?" had become a subtle way to show peer support. In our second survey, we conducted a quantitative analysis of the team culture factors that emerged from our first survey to understand the prevalence of the reported changes. From 608 developer responses, we found that 74% of these respondents missed social interactions with colleagues and 51% reported a decrease in the ease of communicating with colleagues. We used data from the second survey to build a regression model to identify team culture factors important for modeling team productivity. We found that the ability to brainstorm with colleagues, difficulty communicating with colleagues, and satisfaction with interactions from social activities are important factors associated with how developers report their software development team's productivity. Our findings inform how managers and leaders in large software companies can support sustained team productivity during times of crisis and beyond.
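
As a rough illustration of the modeling step described above, the sketch below fits an ordinary least squares regression of self-reported team productivity on a few team culture factors. It is a minimal sketch only: the file name, column names, and the choice of plain OLS are assumptions for illustration, not details from the study.

```python
# Minimal sketch of regressing self-reported team productivity on
# team culture factors. All names below are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("survey_responses.csv")  # hypothetical survey export

# Hypothetical Likert-scale team culture factors
factors = ["brainstorm_ability", "communication_difficulty",
           "social_interaction_satisfaction"]

X = sm.add_constant(df[factors])   # add intercept term
y = df["team_productivity"]        # self-reported productivity rating

model = sm.OLS(y, X).fit()
print(model.summary())             # coefficient table shows factor importance
```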

Software Engineering

"Is My Mic On?" Preparing SE Students for Collaborative Remote Work and Hybrid Team Communication

Communication is essential for the success of student and professional software engineering (SE) team development projects. The projects delivered in SE courses provide valuable learning experiences for students because they teach industry-required skills such as teamwork, communication, and scheduling. Professional SE teams have adopted communication software such as Slack, Miro, Microsoft Teams, and GitHub Discussions to share files and convey information between team members. Likewise, they have adopted distributed software development tools such as Visual Studio CodeSpaces and Jira to support productivity. In contrast, within academia, students have relied on face-to-face meetings for team communication and used communication tools primarily for file sharing. Due to the COVID-19 pandemic, universities were forced to abruptly switch to an online or hybrid modality, compelling SE students to quickly adopt communication software. This paper proposes a study on the use of communication software in industry to prepare students for remote software development positions after graduation.

Software Engineering

A Baseline Model for Software Effort Estimation

Software effort estimation (SEE) is a core activity in all software processes and development lifecycles. A range of increasingly complex methods has been considered over the past 30 years for the prediction of effort, often with mixed and contradictory results. The comparative assessment of effort prediction methods has therefore become a common approach when considering how best to predict effort over a range of project types. Unfortunately, these assessments use a variety of sampling methods and error measurements, making comparison with other work difficult. This article proposes an automatically transformed linear model (ATLM) as a suitable baseline model for comparison against SEE methods. ATLM is simple yet performs well over a range of different project types. In addition, ATLM may be used with mixed numeric and categorical data and requires no parameter tuning. It is also deterministic, meaning that results obtained are amenable to replication. These and other arguments for using ATLM as a baseline model are presented, and a reference implementation is described and made available. We suggest that ATLM should be used as a baseline of effort prediction quality for all future model comparisons in SEE.
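
To make the idea concrete, here is a minimal sketch in the spirit of ATLM as the abstract describes it: a linear model with automatic transformations, mixed numeric and categorical inputs, no parameter tuning, and deterministic results. The transform-selection rule below (minimizing skewness over identity/sqrt/log) is an assumption for illustration; the paper's reference implementation may differ in detail.

```python
# Sketch of an ATLM-style baseline: automatically transform each numeric
# predictor, one-hot encode categoricals, then fit ordinary least squares.
import numpy as np
import pandas as pd
from scipy.stats import skew
from sklearn.linear_model import LinearRegression

def auto_transform(col: pd.Series) -> pd.Series:
    """Pick the transform (identity, sqrt, log) that minimizes |skewness|."""
    shifted = col - col.min() + 1  # shift so sqrt/log are defined
    candidates = [col, np.sqrt(shifted), np.log(shifted)]
    return min(candidates, key=lambda c: abs(skew(c)))

def fit_atlm(df: pd.DataFrame, target: str) -> LinearRegression:
    X = df.drop(columns=[target])
    numeric = X.select_dtypes(include="number").apply(auto_transform)
    categorical = pd.get_dummies(X.select_dtypes(exclude="number"))
    features = pd.concat([numeric, categorical], axis=1)
    # Plain least squares: deterministic and parameter-free, as ATLM requires.
    return LinearRegression().fit(features, df[target])
```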

Software Engineering

A Benchmark Study of the Contemporary Toxicity Detectors on Software Engineering Interactions

Automated filtering of toxic conversations may help an open-source software (OSS) community maintain healthy interactions among project participants. Although several general-purpose tools exist to identify toxic content, they may incorrectly flag some words commonly used in the Software Engineering (SE) context as toxic (e.g., 'junk', 'kill', and 'dump') and vice versa. To address this challenge, an SE-specific tool has been proposed by the CMU Strudel Lab (referred to as 'STRUDEL' hereinafter) that combines the output of the Perspective API with the output of a customized version of Stanford's Politeness detector. However, since STRUDEL's evaluation was very limited, with only 654 SE texts, its practical applicability is unclear. Therefore, this study aims to empirically evaluate the STRUDEL tool as well as four state-of-the-art general-purpose toxicity detectors on a large-scale SE dataset. To this end, we empirically developed a rubric to manually label toxic SE interactions. Using this rubric, we manually labeled a dataset of 6,533 code review comments and 4,140 Gitter messages. The results of our analyses suggest significant degradation of all tools' performance on our datasets. The degradation was significantly higher on our dataset of formal SE communication, such as code review, than on our dataset of informal communication, such as Gitter messages. Two of the models from our study showed significant performance improvements during 10-fold cross-validation after we retrained them on our SE datasets. Based on our manual investigation of the incorrectly classified texts, we identify several recommendations for developing an SE-specific toxicity detector.
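
The evaluation protocol above (retraining and 10-fold cross-validation on labeled SE texts) can be sketched as follows. The generic TF-IDF plus logistic regression pipeline below is a stand-in, not one of the actual detectors from the study, and the CSV file and columns are hypothetical.

```python
# Minimal sketch of 10-fold cross-validation on manually labeled SE texts,
# using a generic classifier as a stand-in for a retrained toxicity detector.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

df = pd.read_csv("labeled_se_comments.csv")  # hypothetical: text,label columns

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, df["text"], df["label"], cv=10, scoring="f1")
print(f"mean F1 over 10 folds: {scores.mean():.3f}")
```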

Software Engineering

A Cognitive and Machine Learning-Based Software Development Paradigm Supported by Context

Advances in cognitive and machine learning (ML)-enabled systems fuel the quest for novel approaches and tools to support software developers in executing their tasks. First, as software development is a complex and dynamic activity, these tasks are highly dependent on the characteristics of the software project and its context, and developers need comprehensive support in terms of information and guidance based on the task context. Second, there is a lack of methods based on conversation-guided agents that consider cognitive aspects such as paying attention and remembering. Third, there is also a lack of techniques that use historical implicit or tacit data to infer new knowledge about project tasks, such as related tasks, task experts, relevant information needed for task completion and warnings, and navigation aspects of the process, such as which tasks to perform next and optimal task sequencing. Based on these challenges, this paper introduces a novel paradigm for human-machine software support based on context, cognitive assistance, and machine learning, and briefly describes ongoing research activities to realize it. The research takes advantage of the synergy among emergent methods provided by context-aware software processes, cognitive computing such as chatbots, and machine learning such as recommender systems. This paradigm has the potential to transform the way software development currently occurs by allowing developers to receive valuable information and guidance in real time while they are participating in projects.

Software Engineering

A Critical Comparison on Six Static Analysis Tools: Detection, Agreement, and Precision

Background. Developers use Automated Static Analysis Tools (ASATs) to identify potential quality issues in source code, including defects and technical debt. Tool vendors have devised quite a number of tools, which makes it harder for practitioners to select the most suitable one for their needs. To better support developers, researchers have conducted several studies on ASATs to improve the understanding of their actual capabilities. Aims. Despite the work done so far, there is still a lack of knowledge regarding (1) which source quality problems can actually be detected by static analysis tool warnings, (2) what their agreement is, and (3) what the precision of their recommendations is. We aim to bridge this gap by proposing a large-scale comparison of six popular static analysis tools for Java projects: Better Code Hub, CheckStyle, Coverity Scan, Findbugs, PMD, and SonarQube. Method. We analyze 47 Java projects and derive a taxonomy of warnings raised by the six state-of-the-practice ASATs. To assess their agreement, we compare them by manually analyzing, at line level, whether they identify the same issues. Finally, we manually evaluate the precision of the tools. Results. The key results comprise a comprehensive taxonomy of ASAT warnings and show little to no agreement among the tools and a low degree of precision. Conclusions. We provide a taxonomy that can be useful to researchers, practitioners, and tool vendors to map the current capabilities of the tools. Furthermore, our study provides the first overview of the agreement among different tools as well as an extensive analysis of their precision.
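
One simple way to operationalize the line-level agreement comparison described above is to reduce each tool's warnings to (file, line) locations and measure pairwise overlap. The sketch below uses the Jaccard index over hypothetical warning sets; the study itself relied on manual analysis, so this is only an illustrative approximation.

```python
# Sketch: pairwise line-level agreement between static analysis tools,
# where each tool's warnings are reduced to (file, line) pairs.
from itertools import combinations

warnings = {  # hypothetical warning locations per tool
    "CheckStyle": {("Foo.java", 10), ("Foo.java", 42)},
    "PMD":        {("Foo.java", 10), ("Bar.java", 7)},
    "SonarQube":  {("Bar.java", 7)},
}

for a, b in combinations(warnings, 2):
    union = warnings[a] | warnings[b]
    jaccard = len(warnings[a] & warnings[b]) / len(union) if union else 0.0
    print(f"{a} vs {b}: agreement = {jaccard:.2f}")
```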

Software Engineering

A Data Flow Analysis Framework for Data Flow Subsumption

Data flow testing creates test requirements as definition-use (DU) associations, where a definition is a program location that assigns a value to a variable and a use is a location where that value is accessed. Data flow testing is expensive, largely because of the number of test requirements. Fortunately, many DU-associations are redundant in the sense that if one test requirement (e.g., a node, edge, or DU-association) is covered, other DU-associations are guaranteed to also be covered. This relationship is called subsumption. Thus, testers can save resources by only covering DU-associations that are not subsumed by other testing requirements. In this work, we formally describe the Data Flow Subsumption Framework (DSF), conceived to tackle the data flow subsumption problem. We show that DSF is a distributive data flow analysis framework, which allows efficient iterative algorithms to find the Meet-Over-All-Paths (MOP) solution for DSF transfer functions. The MOP solution implies that the results at a point p are valid for all paths that reach p. We also present an algorithm, called the Subsumption Algorithm (SA), that uses DSF transfer functions and iterative algorithms to find the local DU-association-node subsumption, that is, the set of DU-associations that are covered whenever a node n is toured by a test. A proof of SA's correctness is presented and its complexity is analyzed.
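
The practical payoff of distributivity is that a standard iterative worklist algorithm reaches the MOP solution. Below is a generic sketch of such a solver for a forward "must" analysis over sets (meet is intersection), which matches the shape of the subsumption question: which DU-associations are covered on every path reaching a node. The CFG encoding and transfer function are illustrative placeholders, not the paper's actual DSF transfer functions.

```python
# Generic iterative worklist solver for a forward must-analysis over sets.
# For distributive frameworks, the fixed point it reaches equals the MOP.
def solve(cfg, entry, all_facts, transfer):
    """cfg: node -> list of successors; transfer(node, in_set) -> out_set.
    Assumes the entry node has no incoming edges."""
    preds = {n: set() for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)

    out = {n: set(all_facts) for n in cfg}  # initialize to the lattice top
    out[entry] = transfer(entry, set())     # nothing holds before entry
    worklist = [n for n in cfg if n != entry]
    while worklist:
        n = worklist.pop()
        in_n = set(all_facts)
        for p in preds[n]:
            in_n &= out[p]                  # meet = set intersection
        new_out = transfer(n, in_n)
        if new_out != out[n]:
            out[n] = new_out
            worklist.extend(s for s in cfg[n] if s != entry)
    return out
```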

Software Engineering

A Deep Dive on the Impact of COVID-19 in Software Development

Context: The COVID-19 pandemic has impacted different business sectors around the world. Objective: This study investigates the impact of COVID-19 on software projects and software development professionals. Method: We conducted a mining software repositories study based on 100 GitHub projects developed in Java, using ten different metrics. Next, we surveyed 279 software development professionals to better understand the impact of COVID-19 on daily activities and wellbeing. Results: We identified 12 observations related to productivity, code quality, and wellbeing. Conclusions: Our findings highlight that the impact of COVID-19 is not binary (reduced productivity vs. increased productivity) but rather a spectrum. For many of our observations, substantial proportions of respondents hold differing opinions. We believe that more research is needed to uncover the specific conditions that cause certain outcomes to be more prevalent.
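
As a flavor of the repository-mining side of such a study, the sketch below computes one simple metric, weekly commit counts split around a cutoff week, from a local clone using `git log`. The repository path, the cutoff, and the metric itself are illustrative assumptions; the study's ten metrics are not specified in the abstract.

```python
# Sketch: weekly commit counts from `git log`, split around a cutoff week.
import subprocess
from collections import Counter

def weekly_commits(repo: str) -> Counter:
    dates = subprocess.run(
        ["git", "-C", repo, "log", "--pretty=%ad", "--date=format:%Y-%W"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    return Counter(dates)  # zero-padded "YYYY-WW" -> commit count

counts = weekly_commits("some-java-project")  # hypothetical local clone
cutoff = "2020-11"                            # ~March 2020 (week 11)
before = sum(v for k, v in counts.items() if k < cutoff)
after = sum(v for k, v in counts.items() if k >= cutoff)
print(f"commits before cutoff: {before}, after: {after}")
```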

Software Engineering

A First Look at the Deprecation of RESTful APIs: An Empirical Study

REpresentational State Transfer (REST) is considered a standard software architectural style for building web APIs that integrate software systems over the internet. However, while connecting systems, RESTful APIs may also break the dependent applications that rely on their services when they introduce breaking changes, e.g., when an older version of the API is no longer supported. To warn developers promptly and thus prevent critical impact on downstream applications, a deprecated-removed model should be followed, and deprecation-related information such as alternative approaches should also be listed. While API deprecation analysis as a theme is not new, most existing work focuses on non-web APIs, such as those provided by Java and Android. To investigate RESTful API deprecation, we propose a framework called RADA (RESTful API Deprecation Analyzer). RADA is capable of automatically identifying deprecated API elements and analyzing impacted operations from an OpenAPI specification, a machine-readable profile for describing RESTful web services. We apply RADA to 2,224 OpenAPI specifications of 1,368 RESTful APIs collected from APIs.guru, the largest directory of OpenAPI specifications. Based on the data mined by RADA, we perform an empirical study to investigate how the deprecated-removed protocol is followed in RESTful APIs and to characterize practices in RESTful API deprecation. The results of our study reveal several severe deprecation-related problems in existing RESTful APIs. Our implementation of RADA and detailed empirical results are publicly available to support future intelligent tools that could automatically identify and migrate usage of deprecated RESTful API operations in client code.
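
OpenAPI documents carry an explicit boolean `deprecated` flag on operation objects, which is the kind of machine-readable signal a RADA-style analysis can start from. The sketch below scans a spec for operations marked deprecated; it is a minimal illustration only (RADA itself goes further, e.g., analyzing impacted operations), and the file name is a placeholder.

```python
# Sketch: list operations marked "deprecated: true" in an OpenAPI (JSON) spec.
import json

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}

def deprecated_operations(spec_file: str):
    with open(spec_file) as f:
        spec = json.load(f)
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method in HTTP_METHODS and isinstance(op, dict) and op.get("deprecated"):
                yield method.upper(), path

for method, path in deprecated_operations("api_spec.json"):  # hypothetical file
    print(f"DEPRECATED: {method} {path}")
```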

Software Engineering

A Framework and DataSet for Bugs in Ethereum Smart Contracts

Ethereum is the largest blockchain platform that supports smart contracts. Users deploy smart contracts by publishing the contract's bytecode to the blockchain. Since data on the blockchain cannot be modified, even if these contracts contain bugs, it is not possible to patch deployed smart contracts with code updates. Moreover, there is currently neither a comprehensive classification framework for Ethereum smart contract bugs nor detailed criteria for detecting bugs in smart contracts, making it difficult for developers to fully understand the negative effects of bugs and to design new approaches to detect them. In this paper, to fill this gap, we first collect as many smart contract bugs as possible from multiple sources and divide these bugs into 9 categories by extending the IEEE Standard Classification for Software Anomalies. Then, we design criteria for detecting each kind of bug and construct a dataset of smart contracts covering all the bug categories. With our framework and dataset, developers can learn about smart contract bugs and develop new tools to detect and locate them. Moreover, we evaluate state-of-the-art tools for smart contract analysis on our dataset and obtain some interesting findings: 1) Mythril, Slither, and Remix are the most worthwhile combination of analysis tools, and 2) there are still 10 kinds of bugs that cannot be detected by any analysis tool.

