Sarah Nadi
University of Alberta
Publication
Featured research published by Sarah Nadi.
international conference on software engineering | 2014
Sarah Nadi; Thorsten Berger; Christian Kästner; Krzysztof Czarnecki
Highly-configurable systems allow users to tailor the software to their specific needs. Not all combinations of configuration options are valid though, and constraints arise for technical or non-technical reasons. Explicitly describing these constraints in a variability model allows reasoning about the supported configurations. To automate creating variability models, we need to identify the origin of such configuration constraints. We propose an approach which uses build-time errors and a novel feature-effect heuristic to automatically extract configuration constraints from C code. We conduct an empirical study on four highly-configurable open-source systems with existing variability models, with three objectives in mind: evaluate the accuracy of our approach, determine the recoverability of existing variability-model constraints using our analysis, and classify the sources of variability-model constraints. We find that both our extraction heuristics are highly accurate (93% and 77% respectively), and that we can recover 19% of the existing variability-model constraints using our approach. However, we find that many of the remaining constraints require expert knowledge or more expensive analyses. We argue that our approach, tooling, and experimental results support researchers and practitioners working on variability model re-engineering, evolution, and consistency-checking techniques.
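The intuition behind a nesting-based constraint heuristic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' tooling: it models only plain `#ifdef`/`#endif` nesting, treats every other guard as unknown, and the name `nesting_constraints` is hypothetical. The idea is that a block guarded by an inner feature can only be compiled when each enclosing feature is enabled too, so every nesting suggests a candidate implication from the inner feature to the outer one.

```python
def nesting_constraints(source):
    """Return candidate (inner, outer) pairs, read as 'inner implies outer'."""
    stack = []           # enclosing guards; None marks a guard we do not model
    constraints = set()
    for line in source.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        directive = tokens[0]
        if directive == "#ifdef":
            feature = tokens[1] if len(tokens) > 1 else None
            if feature is not None:
                for outer in stack:
                    if outer is not None and outer != feature:
                        constraints.add((feature, outer))
            stack.append(feature)
        elif directive in ("#if", "#ifndef"):
            stack.append(None)       # keep nesting balanced, but do not model it
        elif directive in ("#else", "#elif") and stack:
            stack[-1] = None         # the alternative branch no longer implies the guard
        elif directive == "#endif" and stack:
            stack.pop()
    return constraints


SAMPLE = """\
#ifdef CONFIG_NET
#ifdef CONFIG_IPV6
int v6;
#endif
#else
#ifdef CONFIG_STUB
int stub;
#endif
#endif
"""

print(nesting_constraints(SAMPLE))
```

A real analysis would have to handle full `#if` expressions, macros, and build-time errors; the sketch deliberately drops everything it cannot model rather than guess.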
european conference on object-oriented programming | 2015
Flávio Medeiros; Christian Kästner; Márcio Ribeiro; Sarah Nadi; Rohit Gheyi
The C preprocessor has received strong criticism in academia, among others regarding separation of concerns, error proneness, and code obfuscation, but is widely used in practice. Many (mostly academic) alternatives to the preprocessor exist, but have not been adopted in practice. Since developers continue to use the preprocessor despite all criticism and research, we ask how practitioners perceive the C preprocessor. We performed interviews with 40 developers, used grounded theory to analyze the data, and cross-validated the results with data from a survey among 202 developers, repository mining, and results from previous studies. In particular, we investigated four research questions related to why the preprocessor is still widely used in practice, common problems, alternatives, and the impact of undisciplined annotations. Our study shows that developers are aware of the criticism the C preprocessor receives, but use it nonetheless, mainly for portability and variability. Many developers indicate that they regularly face preprocessor-related problems and preprocessor-related bugs. The majority of our interviewees do not see any current C-native technologies that can entirely replace the C preprocessor. However, developers tend to mitigate problems with guidelines, even though those guidelines are not enforced consistently. We report the key insights gained from our study and discuss implications for practitioners and researchers on how to better use the C preprocessor to minimize its negative impact.
conference on software maintenance and reengineering | 2012
Sarah Nadi; Richard C. Holt
The Linux kernel is extensively specialized or configured so that it can be used for many purposes. This variability is implemented by means of three distinct artifacts: source code files, Kconfig (configuration) files, and Make files. Any inconsistencies between these three can cause undesirable anomalies, which can in turn increase maintenance effort or decrease reliability. This paper extends published work that had found anomalies (dead and undead code blocks) by concentrating largely on code and Kconfig files. We detect further anomalies in the Linux kernel when we also consider the Make files. At the level of code blocks, our work exposes many additional anomalies - more than we could study manually. We found that when we lift the level from code blocks to code files, the detected anomalies became easier to study and understand and thus more useful to the developer. By means of examples, we illustrate how the anomalies we detect can lead to undesired behavior. We show how, over time, developers tend to find and delete such anomalies. We suggest that automatic detection of such anomalies has the potential to decrease maintenance efforts and increase reliability.
mining software repositories | 2013
Hadi Hemmati; Sarah Nadi; Olga Baysal; Oleksii Kononenko; Wei Wang; Reid Holmes; Michael W. Godfrey
The Mining Software Repositories (MSR) research community has grown significantly since the first MSR workshop was held in 2004. As the community continues to broaden its scope and deepens its expertise, it is worthwhile to reflect on the best practices that our community has developed over the past decade of research. We identify these best practices by surveying past MSR conferences and workshops. To that end, we review all 117 full papers published in the MSR proceedings between 2004 and 2012. We extract 268 comments from these papers, and categorize them using a grounded theory methodology. From this evaluation, four high-level themes were identified: data acquisition and preparation, synthesis, analysis, and sharing/replication. Within each theme we identify several common recommendations, and also examine how these recommendations have evolved over the past decade. In an effort to make this survey a living artifact, we also provide a public forum that contains the extracted recommendations in the hopes that the MSR community can engage in a continuing discussion on our evolving best practices.
international conference on software engineering | 2016
Sarah Nadi; Stefan Krüger; Mira Mezini; Eric Bodden
To protect sensitive data processed by current applications, developers, whether security experts or not, have to rely on cryptography. While cryptography algorithms have become increasingly advanced, many data breaches occur because developers do not correctly use the corresponding APIs. To guide future research into practical solutions to this problem, we perform an empirical investigation into the obstacles developers face while using the Java cryptography APIs, the tasks they use the APIs for, and the kind of (tool) support they desire. We triangulate data from four separate studies that include the analysis of 100 StackOverflow posts, 100 GitHub repositories, and survey input from 48 developers. We find that while developers find it difficult to use certain cryptographic algorithms correctly, they feel surprisingly confident in selecting the right cryptography concepts (e.g., encryption vs. signatures). We also find that the APIs are generally perceived to be too low-level and that developers prefer more task-based solutions.
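What a "task-based" interface might look like can be sketched briefly. The example below uses Python's standard library rather than the Java APIs studied in the paper, and it is only an illustration: the function names `hash_password` and `verify_password` and the iteration count are assumptions, not something taken from the study. The caller states the task (store or check a password) and never chooses a salt size, hash function, or comparison routine directly.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed work factor for illustration; tune for your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Task: prepare a password for storage. Returns (salt, derived_key)."""
    salt = secrets.token_bytes(16)  # fresh random salt for each password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Task: check a login attempt against the stored salt and key."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected_key)  # constant-time comparison


salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))
```

The low-level decisions still exist, but they sit in one reviewed place instead of being repeated at every call site.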
Journal of Software: Evolution and Process | 2014
Sarah Nadi; Richard C. Holt
Although build systems control what code gets compiled into the final built product, they are often overlooked when studying software variability. The Linux kernel is one of the biggest open source software systems supporting variability and contains over 10,000 configurable features described in its Kconfig files. To understand the role of the build system in variability implementation, we use Linux as a case study. We study its build system, Kbuild, and extract the variability constraints in its Makefiles. We first provide a quantitative analysis of the variability in Kbuild. We then study how the variability constraints in the build system affect variability anomalies detected in Linux. We concentrate on dead and undead artifacts, and by extending previous work, we show that considering build system variability constraints allows more anomalies to be detected. We provide examples of such anomalies on both the code block and source file level. Our work shows that Kbuild contains a large percentage of the variability information in Linux, so it should not be ignored during variability analysis. Nonetheless, the anomalies we find suggest that variability on the file level in Kbuild is consistent with Kconfig, whereas the constraints on the code level are harder to keep consistent with both Kbuild and Kconfig.
ieee international conference on software analysis evolution and reengineering | 2016
Sven Amann; Sebastian Proksch; Sarah Nadi; Mira Mezini
Integrated Development Environments (IDEs) provide a convenient standalone solution that supports developers during various phases of software development. In order to provide better support for developers within such IDEs, we need to understand how much time developers spend using various parts of a given IDE and how often they use available assistance tools. To infer useful conclusions, such information should be gathered for different types of IDEs for different languages. In this paper, we instrument the previously unexplored Visual Studio IDE and track the interactions of developers at an industry partner's software-development department. As a result, we capture interactions for more than 6300 hours of work time, from between 27 and 84 professional C# developers. Our work reports how much time professional developers spend on activities such as code editing and execution or navigation, as well as how often they use assistance tools provided by the IDE. We compare our findings to those of prior studies involving other IDEs and discuss the implications of the commonalities and differences for research on (integrated) developer-assistance tools.
working conference on reverse engineering | 2011
Sarah Nadi; Richard C. Holt
The Linux kernel has long been an interesting subject of study in terms of its source code. Recently, it has also been studied in terms of its variability since the Linux kernel can be configured to include or omit certain features according to the user's selection. These features are defined in the Kconfig files included in the Linux kernel code. Several articles study both the source code and Kconfig files to ensure variability is correctly implemented and to detect anomalies. However, these studies ignore the Make files, which are another important component that controls the variability of the Linux kernel. The Make files are responsible for specifying what actually gets compiled and built into the final kernel. With over 1,300 Make files, more than 35,000 source code files, and over 10,000 Kconfig features, inconsistencies and anomalies are inevitable. In this paper, we explore Linux's Make files (Kbuild) to detect anomalies. We develop three rules to identify anomalies in the Make files. Using these rules, we detect 89 anomalies in the latest release of the Linux kernel (2.6.38.6). We also perform a longitudinal analysis to study the evolution of Kbuild anomalies over time, and the solutions implemented to correct them. Our results show that many of the anomalies we detect are eventually corrected in future releases. This work is a first attempt at exploring the consistency of the variability implemented in Kbuild with the rest of the kernel. Such work opens the door for automatic anomaly detection in build systems, which can save developers time in the future.
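One plausible consistency check between Kbuild and Kconfig can be sketched as follows. The abstract does not spell out its three rules, so this is an assumed example rather than the paper's method: it flags features that a Make file references through `$(CONFIG_...)` but that no Kconfig entry defines. The function name `undefined_features` and the sample inputs are hypothetical.

```python
import re

# Kconfig entries start with "config NAME" or "menuconfig NAME".
KCONFIG_DEF = re.compile(r"^\s*(?:menu)?config\s+(\w+)", re.MULTILINE)
# Kbuild Make files reference features as $(CONFIG_NAME), e.g. obj-$(CONFIG_USB) += usb.o
MAKEFILE_REF = re.compile(r"\$\(CONFIG_(\w+)\)")


def undefined_features(kconfig_text, makefile_text):
    """Return features used in the Make file that Kconfig never defines."""
    defined = set(KCONFIG_DEF.findall(kconfig_text))
    referenced = set(MAKEFILE_REF.findall(makefile_text))
    return sorted(referenced - defined)


KCONFIG = """\
config USB
	bool "USB support"
menuconfig NET
	bool "Networking"
"""

MAKEFILE = """\
obj-$(CONFIG_USB) += usb.o
obj-$(CONFIG_NET) += net.o
obj-$(CONFIG_USB_LEGACY) += legacy.o
"""

print(undefined_features(KCONFIG, MAKEFILE))
```

Object files guarded by such an undefined feature can never be selected for compilation, which is the kind of dead artifact the abstract describes. A whole-kernel check would have to aggregate definitions across all Kconfig files and architectures before reporting anything.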
mining software repositories | 2013
Sarah Nadi; Christian Dietrich; Reinhard Tartler; Richard C. Holt; Daniel Lohmann
The Linux kernel is one of the largest configurable open source software systems implementing static variability. In Linux, variability is scattered over three different artifacts: source code files, Kconfig files, and Makefiles. Previous work detected inconsistencies between these artifacts that led to anomalies in the intended variability of Linux. We call these variability anomalies. However, there has been no work done to analyze how these variability anomalies are introduced in the first place, and how they get fixed. In this work, we provide an analysis of the causes and fixes of variability anomalies in Linux. We first perform an exploratory case study that uses an existing set of patches which solve variability anomalies to identify patterns for their causes. The observations we make from this dataset allow us to develop four research questions which we then answer in a confirmatory case study on the scope of the whole Linux kernel. We show that variability anomalies exist for several releases in the kernel before they get fixed, and that contrary to our initial suspicion, typos in feature names do not commonly cause these anomalies. Our results show that variability anomalies are often introduced through incomplete patches that change Kconfig definitions without properly propagating these changes to the rest of the system. Anomalies are then commonly fixed through changes to the code rather than to Kconfig files.
sigplan symposium on new ideas new paradigms and reflections on programming and software | 2015
Steven Arzt; Sarah Nadi; Karim Ali; Eric Bodden; Sebastian Erdweg; Mira Mezini
While cryptography is now readily available to everyone and can, provably, protect private information from attackers, we still frequently hear about major data leakages, many of which are due to improper use of cryptographic mechanisms. The problem is that many application developers are not cryptographic experts. Even though high-quality cryptographic APIs are widely available, programmers often select the wrong algorithms or misuse APIs due to a lack of understanding. Such issues arise both with simple operations such as encryption and with complex secure communication protocols such as SSL. In this paper, we provide a long-term solution that helps application developers integrate cryptographic components correctly and securely by bridging the gap between cryptographers and application developers. Our solution consists of a software product line (with an underlying feature model) that automatically identifies the correct cryptographic algorithms to use, based on the developer's answers to high-level questions in non-expert terminology. Each feature (i.e., cryptographic algorithm) maps into corresponding Java code and a usage protocol describing API restrictions. By composing the user's selected features, we automatically synthesize a secure code blueprint and a usage protocol that corresponds to the selected usage scenario. Since the developer may change the application code over time, we use the usage protocols to statically analyze the program and ensure that the correct use of the API is not violated over time.