Pavneet Singh Kochhar

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Pavneet Singh Kochhar is active.

Explore More

Publication

Featured researches published by Pavneet Singh Kochhar.

international conference on quality software | 2013

An Empirical Study of Adoption of Software Testing in Open Source Projects

Pavneet Singh Kochhar; Tegawendé François D Assise Bissyande; David Lo; Lingxiao Jiang

Testing is an indispensable part of software development efforts. It helps to improve the quality of software systems by finding bugs and errors during development and deployment. Huge amount of resources are spent on testing efforts. However, to what extent are they used in practice? In this study, we investigate the adoption of testing in open source projects. We study more than 20,000 non-trivial software projects and explore the correlation of test cases with various project development characteristics including: project size, development team size, number of bugs, number of bug reporters, and the programming languages of these projects.

automated software engineering | 2014

Potential biases in bug localization: do they matter?

Pavneet Singh Kochhar; Yuan Tian; David Lo

Issue tracking systems are valuable resources during software maintenance activities and contain information about the issues faced during the development of a project as well as after its release. Many projects receive many reports of bugs and it is challenging for developers to manually debug and fix them. To mitigate this problem, past studies have proposed information retrieval (IR)-based bug localization techniques, which takes as input a textual description of a bug stored in an issue tracking system, and returns a list of potentially buggy source code files. These studies often evaluate their effectiveness on issue reports marked as bugs in issue tracking systems, using as ground truth the set of files that are modified in commits that fix each bug. However, there are a number of potential biases that can impact the validity of the results reported in these studies. First, issue reports marked as bugs might not be reports of bugs due to error in the reporting and classification process. Many issue reports are about documentation update, request for improvement, refactoring, code cleanups, etc. Second, bug reports might already explicitly specify the buggy program files and for these reports bug localization techniques are not needed. Third, files that get modified in commits that fix the bugs might not contain the bug. This study investigates the extent these potential biases affect the results of a bug localization technique and whether bug localization researchers need to consider these potential biases when evaluating their solutions. In this paper, we analyse issue reports from three different projects: HTTPClient, Jackrabbit, and Lucene-Java to examine the impact of above three biases on bug localization. Our results show that one of these biases significantly and substantially impacts bug localization results, while the other two biases have negligible or minor impact.

social informatics | 2013

Predicting Best Answerers for New Questions: An Approach Leveraging Topic Modeling and Collaborative Voting

Yuan Tian; Pavneet Singh Kochhar; Ee-Peng Lim; Feida Zhu; David Lo

Community Question Answering (CQA) sites are becoming increasingly important source of information where users can share knowledge on various topics. Although these platforms bring new opportunities for users to seek help or provide solutions, they also pose many challenges with the ever growing size of the community. The sheer number of questions posted everyday motivates the problem of routing questions to the appropriate users who can answer them. In this paper, we propose an approach to predict the best answerer for a new question on CQA site. Our approach considers both user interest and user expertise relevant to the topics of the given question. A user’s interests on various topics are learned by applying topic modeling to previous questions answered by the user, while the user’s expertise is learned by leveraging collaborative voting mechanism of CQA sites. We have applied our model on a dataset extracted from StackOverflow, one of the biggest CQA sites. The results show that our approach outperforms the TF-IDF based approach.

conference on software maintenance and reengineering | 2013

Adoption of Software Testing in Open Source Projects--A Preliminary Study on 50,000 Projects

Pavneet Singh Kochhar; Tegawendé F. Bissyandé; David Lo; Lingxiao Jiang

In software engineering, testing is a crucial activity that is designed to ensure the quality of program code. For this activity, development teams spend substantial resources constructing test cases to thoroughly assess the correctness of software functionality. What is however the proportion of open source projects that include test cases? What kind of projects are more likely to include test cases? In this study, we explore 50,000 projects and investigate the correlation between the presence of test cases and various project development characteristics, including the lines of code and the size of development teams.

international conference on engineering of complex computer systems | 2014

Automatic Fine-Grained Issue Report Reclassification

Pavneet Singh Kochhar; Ferdian Thung; David Lo

Issue tracking systems are valuable resources during software maintenance activities. These systems contain different categories of issue reports such as bug, request for improvement (RFE), documentation, refactoring, task etc. While logging issue reports into a tracking system, reporters can indicate the category of the reports. Herzig et al. Recently reported that more than 40% of issue reports are given wrong categories in issue tracking systems. Among issue reports that are marked as bugs, more than 30% of them are not bug reports. The misclassification of issue reports can adversely affects developers as they then need to manually identify the categories of various issue reports. To address this problem, in this paper we propose an automated technique that reclassifies an issue report into an appropriate category. Our approach extracts various feature values from a bug report and predicts if a bug report needs to be reclassified and its reclassified category. We have evaluated our approach to reclassify more than 7,000 bug reports from HTTP Client, Jackrabbit, Lucene-Java, Rhino, and Tomcat 5 into 1 out of 13 categories. Our experiments show that we can achieve a weighted precision, recall, and F1 (F-measure) score in the ranges of 0.58-0.71, 0.61-0.72, and 0.57-0.71 respectively. In terms of F1, which is the harmonic mean of precision and recall, our approach can substantially outperform several baselines by 28.88%-416.66%.

ieee international conference on software analysis evolution and reengineering | 2015

Code coverage and test suite effectiveness: Empirical study with real bugs in large systems

Pavneet Singh Kochhar; Ferdian Thung; David Lo

During software maintenance, testing is a crucial activity to ensure the quality of program code as it evolves over time. With the increasing size and complexity of software, adequate software testing has become increasingly important. Code coverage is often used as a yardstick to gauge the comprehensiveness of test cases and the adequacy of testing. A test suite quality is often measured by the number of bugs it can find (aka. kill). Previous studies have analysed the quality of a test suite by its ability to kill mutants, i.e., artificially seeded faults. However, mutants do not necessarily represent real bugs. Moreover, many studies use small programs which increases the threat of the applicability of the results on large real-world systems. In this paper, we analyse two large software systems to measure the relationship of code coverage and its effectiveness in killing real bugs from the software systems. We use Randoop, a random test generation tool to generate test suites with varying levels of coverage and run them to analyse if the test suites can kill each of the real bugs or not. In this preliminary study, we have performed an experiment on 67 and 92 real bugs from Apache HTTPClient and Mozilla Rhino, respectively. Our experiment finds that there is indeed statistically significant correlation between code coverage and bug kill effectiveness. The strengths of the correlation, however, differ for the two software systems. For HTTPClient, the correlation is moderate for both statement and branch coverage. For Rhino, the correlation is strong for both statement and branch coverage.

Empirical Software Engineering | 2017

Why and how developers fork what from whom in GitHub

Jing Jiang; David Lo; Jiahuan He; Xin Xia; Pavneet Singh Kochhar; Li Zhang

Forking is the creation of a new software repository by copying another repository. Though forking is controversial in traditional open source software (OSS) community, it is encouraged and is a built-in feature in GitHub. Developers freely fork repositories, use codes as their own and make changes. A deep understanding of repository forking can provide important insights for OSS community and GitHub. In this paper, we explore why and how developers fork what from whom in GitHub. We collect a dataset containing 236,344 developers and 1,841,324 forks. We make surveys, and analyze programming languages and owners of forked repositories. Our main observations are: (1) Developers fork repositories to submit pull requests, fix bugs, add new features and keep copies etc. Developers find repositories to fork from various sources: search engines, external sites (e.g., Twitter, Reddit), social relationships, etc. More than 42 % of developers that we have surveyed agree that an automated recommendation tool is useful to help them pick repositories to fork, while more than 44.4 % of developers do not value a recommendation tool. Developers care about repository owners when they fork repositories. (2) A repository written in a developer’s preferred programming language is more likely to be forked. (3) Developers mostly fork repositories from creators. In comparison with unattractive repository owners, attractive repository owners have higher percentage of organizations, more followers and earlier registration in GitHub. Our results show that forking is mainly used for making contributions of original repositories, and it is beneficial for OSS community. Moreover, our results show the value of recommendation and provide important insights for GitHub to recommend repositories.

Empirical Software Engineering | 2017

What do developers search for on the web

Xin Xia; Lingfeng Bao; David Lo; Pavneet Singh Kochhar; Ahmed E. Hassan; Zhenchang Xing

Developers commonly make use of a web search engine such as Google to locate online resources to improve their productivity. A better understanding of what developers search for could help us understand their behaviors and the problems that they meet during the software development process. Unfortunately, we have a limited understanding of what developers frequently search for and of the search tasks that they often find challenging. To address this gap, we collected search queries from 60 developers, surveyed 235 software engineers from more than 21 countries across five continents. In particular, we asked our survey participants to rate the frequency and difficulty of 34 search tasks which are grouped along the following seven dimensions: general search, debugging and bug fixing, programming, third party code reuse, tools, database, and testing. We find that searching for explanations for unknown terminologies, explanations for exceptions/error messages (e.g., HTTP 404), reusable code snippets, solutions to common programming bugs, and suitable third-party libraries/services are the most frequent search tasks that developers perform, while searching for solutions to performance bugs, solutions to multi-threading bugs, public datasets to test newly developed algorithms or systems, reusable code snippets, best industrial practices, database optimization solutions, solutions to security bugs, and solutions to software configuration bugs are the most difficult search tasks that developers consider. Our study sheds light as to why practitioners often perform some of these tasks and why they find some of them to be challenging. We also discuss the implications of our findings to future research in several research areas, e.g., code search engines, domain-specific search engines, and automated generation and refinement of search queries.

ieee international conference on software analysis evolution and reengineering | 2017

Detecting similar repositories on GitHub

Yun Zhang; David Lo; Pavneet Singh Kochhar; Xin Xia; Quanlai Li; Jianling Sun

GitHub contains millions of repositories among which many are similar with one another (i.e., having similar source codes or implementing similar functionalities). Finding similar repositories on GitHub can be helpful for software engineers as it can help them reuse source code, build prototypes, identify alternative implementations, explore related projects, find projects to contribute to, and discover code theft and plagiarism. Previous studies have proposed techniques to detect similar applications by analyzing API usage patterns and software tags. However, these prior studies either only make use of a limited source of information or use information not available for projects on GitHub. In this paper, we propose a novel approach that can effectively detect similar repositories on GitHub. Our approach is designed based on three heuristics leveraging two data sources (i.e., GitHub stars and readme files) which are not considered in previous works. The three heuristics are: repositories whose readme files contain similar contents are likely to be similar with one another, repositories starred by users of similar interests are likely to be similar, and repositories starred together within a short period of time by the same user are likely to be similar. Based on these three heuristics, we compute three relevance scores (i.e., readme-based relevance, stargazer-based relevance, and time-based relevance) to assess the similarity between two repositories. By integrating the three relevance scores, we build a recommendation system called RepoPal to detect similar repositories. We compare RepoPal to a prior state-of-the-art approach CLAN using one thousand Java repositories on GitHub. Our empirical evaluation demonstrates that RepoPal achieves a higher success rate, precision and confidence over CLAN.

automated software engineering | 2014

DupFinder: integrated tool support for duplicate bug report detection

Ferdian Thung; Pavneet Singh Kochhar; David Lo

To track bugs that appear in a software, developers often make use of a bug tracking system. Users can report bugs that they encounter in such a system. Bug reporting is inherently an uncoordinated distributed process though and thus when a user submits a new bug report, there might be cases when another bug report describing exactly the same problem is already present in the system. Such bug reports are duplicate of each other and these duplicate bug reports need to be identified. A number of past studies have proposed a number of automated approaches to detect duplicate bug reports. However, these approaches are not integrated to existing bug tracking systems. In this paper, we propose a tool named DupFinder, which implements the state-of-the-art unsupervised duplicate bug report approach by Runeson et al., as a Bugzilla extension. DupFinder does not require any training data and thus can easily be deployed to any project. DupFinder extracts texts from summary and description fields of a new bug report and recent bug reports present in a bug tracking system, uses vector space model to measure similarity of bug reports, and provides developers with a list of potential duplicate bug reports based on the similarity of these reports with the new bug report. We have released DupFinder as an open source tool in GitHub, which is available at: https://github.com/smagsmu/dupfinder.

Explore More