Ripon K. Saha
University of Texas at Austin
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Ripon K. Saha.
automated software engineering | 2013
Ripon K. Saha; Matthew Lease; Sarfraz Khurshid; Dewayne E. Perry
Locating bugs is important, difficult, and expensive, particularly for large-scale systems. To address this, natural language information retrieval techniques are increasingly being used to suggest potential faulty source files given bug reports. While these techniques are very scalable, in practice their effectiveness remains low in accurately localizing bugs to a small number of files. Our key insight is that structured information retrieval based on code constructs, such as class and method names, enables more accurate bug localization. We present BLUiR, which embodies this insight, requires only the source code and bug reports, and takes advantage of bug similarity data if available. We build BLUiR on a proven, open source IR toolkit that anyone can use. Our work provides a thorough grounding of IR-based bug localization research in fundamental IR theoretical and empirical knowledge and practice. We evaluate BLUiR on four open source projects with approximately 3,400 bugs. Results show that BLUiR matches or outperforms a current state-of-the-art tool across applications considered, even when BLUiR does not use bug similarity data used by the other tool.
acm symposium on applied computing | 2012
Manishankar Mondal; Chanchal K. Roy; Md. Saidur Rahman; Ripon K. Saha; Jens Krinke; Kevin A. Schneider
Code cloning is a controversial software engineering practice due to contradictory claims regarding its effect on software maintenance. Code stability is a recently introduced measurement technique that has been used to determine the impact of code cloning by quantifying the changeability of a code region. Although most of the existing stability analysis studies agree that cloned code is more stable than non-cloned code, the studies have two major flaws: (i) each study only considered a single stability measurement (e.g., lines of code changed, frequency of change, age of change); and, (ii) only a small number of subject systems were analyzed and these were of limited variety. In this paper, we present a comprehensive empirical study on code stability using three different stability measuring methods. We use a recently introduced hybrid clone detection tool, NiCAD, to detect the clones and analyze their stability in four dimensions: by clone type, by measuring method, by programming language, and by system size and age. Our four-dimensional investigation on 12 diverse subject systems written in three programming languages considering three clone types reveals that: (i) Type-1 and Type-2 clones are unstable, but Type-3 clones are not; (ii) clones in Java and C systems are not as stable as clones in C# systems; (iii) a systems development strategy might play a key role in defining its comparative code stability scenario; and, (iv) cloned and non-cloned regions of a subject system do not follow a consistent change pattern.
source code analysis and manipulation | 2010
Ripon K. Saha; Muhammad Asaduzzaman; Minhaz F. Zibran; Chanchal K. Roy; Kevin A. Schneider
Code clone genealogies show how clone groups evolve with the evolution of the associated software system, and thus could provide important insights on the maintenance implications of clones. In this paper, we provide an in-depth empirical study for evaluating clone genealogies in evolving open source systems at the release level. We develop a clone genealogy extractor, examine 17 open source C, Java, C++ and C# systems of diverse varieties and study different dimensions of how clone groups evolve with the evolution of the software systems. Our study shows that majority of the clone groups of the clone genealogies either propagate without any syntactic changes or change consistently in the subsequent releases, and that many of the genealogies remain alive during the evolution. These findings seem to be consistent with the findings of a previous study that clones may not be as detrimental in software maintenance as believed to be (at least by many of us), and that instead of aggressively refactoring clones, we should possibly focus on tracking and managing clones during the evolution of software systems.
international conference on software maintenance | 2011
Ripon K. Saha; Chanchal K. Roy; Kevin A. Schneider
Extracting code clone genealogies across multiple versions of a program and classifying them according to their change patterns underlies the study of code clone evolution. While there are a few studies in the area, the approaches do not handle near-miss clones well and the associated tools are often computationally expensive. To address these limitations, we present a framework for automatically extracting both exact and near-miss clone genealogies across multiple versions of a program and for identifying their change patterns using a few key similarity factors. We have developed a prototype clone genealogy extractor, applied it to three open source projects including the Linux Kernel, and evaluated its accuracy in terms of precision and recall. Our experience shows that the prototype is scalable, adaptable to different clone detection tools, and can automatically identify evolution patterns of both exact and near-miss clones by constructing their genealogies.
international conference on engineering of complex computer systems | 2011
Minhaz F. Zibran; Ripon K. Saha; Muhammad Asaduzzaman; Chanchal K. Roy
Effort for development and maintenance of complex large software is believed to have dependency on the amount of duplicated code fragments (code clones) present in code-bases. For example, clones need to be carefully and consistently maintained and/or refactored for preventing accidental error propagation. Thus it is important to understand the proportion and evolution of clones in evolving software systems for cost estimation or the like. This paper presents a study on the evolution of near-miss clones at release level in medium to large open source software systems of different types (operating systems, database systems, editors, etc.) written in three different programming languages namely C, C#, and Java. Using a hybrid clone detector, NiCad, we detected both exact and near-miss clones at different levels of similarity. Applying statistical methods we investigated, from different dimensions, the evolution of both exact and near-miss clones, and also forecasted the amount of clones in future releases of the software systems. Our study offers significant insights into the existence and evolution of code clones and their relationships with programming language or paradigm and program size.
international conference on software engineering | 2015
Ripon K. Saha; Lingming Zhang; Sarfraz Khurshid; Dewayne E. Perry
Regression testing is widely used in practice for validating program changes. However, running large regression suites can be costly. Researchers have developed several techniques for prioritizing tests such that the higher-priority tests have a higher likelihood of finding bugs. A vast majority of these techniques are based on dynamic analysis, which can be precise but can also have significant overhead (e.g., for program instrumentation and test-coverage collection). We introduce a new approach, REPiR, to address the problem of regression test prioritization by reducing it to a standard Information Retrieval problem such that the differences between two program versions form the query and the tests constitute the document collection. REPiR does not require any dynamic profiling or static program analysis. As an enabling technology we leverage the open-source IR toolkit Indri. An empirical evaluation using eight open-source Java projects shows that REPiR is computationally efficient and performs better than existing (dynamic or static) techniques for the majority of subject systems.
mining software repositories | 2013
Ripon K. Saha; Chanchal K. Roy; Kevin A. Schneider; Dewayne E. Perry
Understanding the evolution of clones is important both for understanding the maintenance implications of clones and building a robust clone management system. To this end, researchers have already conducted a number of studies to analyze the evolution of clones, mostly focusing on Type-1 and Type-2 clones. However, although there are a significant number of Type-3 clones in software systems, we know a little how they actually evolve. In this paper, we perform an exploratory study on the evolution of Type-1, Type-2, and Type-3 clones in six open source software systems written in two different programming languages and compare the result with a previous study to better understand the evolution of Type-3 clones. Our results show that although Type-3 clones are more likely to change inconsistently, the absolute number of consistently changed Type-3 clone classes is higher than that of Type-1 and Type-2. Type-3 clone classes also have a lifespan similar to that of Type-1 and Type-2 clones. In addition, a considerable number of Type-1 and Type-2 clones convert into Type-3 clones during evolution. Therefore, it is important to manage type-3 clones properly to limit their negative impact. However, various automated clone management techniques such as notifying developers about clone changes or linked editing should be chosen carefully due to the inconsistent nature of Type-3 clones.
international conference on program comprehension | 2011
Manishankar Mondal; Md. Saidur Rahman; Ripon K. Saha; Chanchal K. Roy; Jens Krinke; Kevin A. Schneider
The impacts of clones on software maintenance is a long-lived debate on whether clones are beneficial or not. Some researchers argue that clones lead to additional changes during the maintenance phase and thus increase the overall maintenance effort. Moreover, they note that inconsistent changes to clones may introduce faults during evolution. On the other hand, other researchers argue that cloned code exhibits more stability than non-cloned code. Studies resulting in such contradictory outcomes may be a consequence of using different methodologies, using different clone detection tools, defining different impact assessment metrics, and evaluating different subject systems. In order to understand the conflicting results from the studies, we plan to conduct a comprehensive empirical study using a common framework incorporating nine existing methods that yielded mostly contradictory findings. Our research strategy involves implementing each of these methods using four clone detection tools and evaluating the methods on more than fifteen subject systems of different languages and of a diverse nature. We believe that our study will help eliminate tool and study biases to resolve conflicts regarding the impacts of clones on software maintenance.
conference on software maintenance and reengineering | 2014
Ripon K. Saha; Sarfraz Khurshid; Dewayne E. Perry
Bug fixing is a crucial part of software development and maintenance. A large number of bugs often indicate poor software quality since buggy behavior not only causes failures that may be costly but also has a detrimental effect on the users overall experience with the software product. The impact of long lived bugs can be even more critical since experiencing the same bug version after version can be particularly frustrating for user. While there are many studies that investigate factors affecting bug fixing time for entire bug repositories, to the best of our knowledge, none of these studies investigates the extent and reasons of long lived bugs. In this paper, we analyzed long lived bugs from five different perspectives: their proportion, severity, assignment, reasons, as well as the nature of fixes. Our study on four open-source projects shows that there are a considerable number of long lived bugs in each system and over 90% of them adversely affect the users experience. The reasons of these long lived bugs are diverse including long assignment time, not understanding their importance in advance etc. However, many bug-fixes were delayed without any specific reasons. Our analysis of bug fixing changes further shows that many long lived bugs can be fixed quickly through careful prioritization. We believe our results will help both developers and researchers to better understand factors behind delays, improve the overall bug fixing process, and investigate analytical approaches for prioritizing bugs based on bug severity as well as expected bug fixing effort.
foundations of software engineering | 2013
Ripon K. Saha; Avigit K. Saha; Dewayne E. Perry
Stack Overflow is a highly successful question-answering website in the programming community, which not only provide quick solutions to programmers’ questions but also is considered as a large repository of valuable software engineering knowledge. However, despite having a very engaged and active user community, Stack Overflow currently has more than 300K unanswered questions. In this paper, we perform an initial investigation to understand why these questions remain unanswered by applying a combination of statistical and data mining techniques. Our preliminary results indicate that although there are some topics that were never answered, most questions remained unanswered because they apparently are of little interest to the user community.