Iman Keivanloo
Queen's University
Publication
Featured research published by Iman Keivanloo.
International Conference on Software Engineering | 2014
Iman Keivanloo; Juergen Rilling; Ying Zou
Working code examples are useful resources for pragmatic reuse in software development. A working code example provides a solution to a specific programming problem. Earlier studies have shown that existing code search engines are not successful in finding working code examples. They fail to rank high-quality code examples at the top of the result set. To address this shortcoming, a variety of pattern-based solutions have been proposed in the literature. However, these solutions cannot be integrated seamlessly into Internet-scale source code engines due to their high time complexity or query language restrictions. In this paper, we propose an approach for spotting working code examples which can be adopted by Internet-scale source code search engines. The time complexity of our approach is as low as the complexity of existing code search engines on the Internet and considerably lower than that of the pattern-based approaches supporting free-form queries. We study the performance of our approach using a representative corpus of 25,000 open source Java projects. Our findings support the feasibility of our approach for Internet-scale code search. We also found that our approach outperforms the Ohloh Code search engine, previously known as Koders, in spotting working code examples.
Empirical Software Engineering | 2016
Feng Zhang; Audris Mockus; Iman Keivanloo; Ying Zou
Software defects can lead to undesired results. Correcting defects costs 50% to 75% of the total software development budget. To predict defective files, a prediction model must be built with predictors (e.g., software metrics) obtained from either the project itself (within-project) or from other projects (cross-project). A universal defect prediction model that is built from a large set of diverse projects would relieve the need to build and tailor prediction models for an individual project. A formidable obstacle to building a universal model is the variation in the distribution of predictors among projects of diverse contexts (e.g., size and programming language). Hence, we propose to cluster projects based on the similarity of the distribution of predictors, and derive the rank transformations using quantiles of predictors for a cluster. We fit the universal model on the transformed data of 1,385 open source projects hosted on SourceForge and GoogleCode. The universal model obtains prediction performance comparable to the within-project models, yields similar results when applied to five external projects (one Apache and four Eclipse projects), and performs similarly among projects with different context factors. Finally, we investigate which predictors should be included in the universal model. We expect that this work could form a basis for future work on building a universal model and would lead to software support tools that incorporate it into a regular development workflow.
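The quantile-based rank transformation mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the choice of ten rank levels, and the "inclusive" quantile method are assumptions made for illustration only.

```python
from bisect import bisect_left
from statistics import quantiles


def fit_rank_transform(values, levels=10):
    """Return the cut points that split `values` into `levels` quantile bins.

    `values` stands for one predictor (e.g. lines of code) observed across
    all files of one cluster of projects. The default of ten levels is an
    assumption, not a figure taken from the abstract.
    """
    return quantiles(values, n=levels, method="inclusive")


def rank_transform(value, cut_points):
    """Map a raw predictor value to a rank between 1 and len(cut_points) + 1."""
    return bisect_left(cut_points, value) + 1


# Two hypothetical clusters whose predictor lives on different scales.
cluster_a = list(range(1, 101))              # small files
cluster_b = [10 * v for v in range(1, 101)]  # files ten times larger

cuts_a = fit_rank_transform(cluster_a)
cuts_b = fit_rank_transform(cluster_b)

# A mid-sized file in either cluster receives the same rank.
rank_a = rank_transform(50, cuts_a)
rank_b = rank_transform(500, cuts_b)
```

After the transformation, values from clusters with very different distributions share one scale, which is what would let a single model be fitted across them.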
International Conference on Software Maintenance | 2014
Jeffrey Svajlenko; Judith F. Islam; Iman Keivanloo; Chanchal K. Roy; Mohammad Mamun Mia
Recently, new applications of code clone detection and search have emerged that rely upon clones detected across thousands of software systems. Big data clone detection and search algorithms have been proposed as an embedded part of these new applications. However, there exists no previous benchmark data for evaluating the recall and precision of these emerging techniques. In this paper, we present a Big Data clone detection benchmark that consists of known true and false positive clones in a Big Data inter-project Java repository. The benchmark was built by mining and then manually checking clones of ten common functionalities. The benchmark contains six million true positive clones of different clone types: Type-1, Type-2, Type-3 and Type-4, including various strengths of Type-3 similarity (strong, moderate, weak). These clones were found by three judges over 216 hours of manual validation efforts. We show how the benchmark can be used to measure the recall and precision of clone detection techniques.
International Conference on Service Oriented Computing | 2014
Shaohua Wang; Iman Keivanloo; Ying Zou
With the rapid adoption of REpresentational State Transfer (REST), more software organizations expose their applications as RESTful web APIs and client code developers integrate RESTful APIs into their applications. When web APIs evolve, the client code developers have to update their applications to incorporate the API changes accordingly. However, client code developers often encounter challenges during the migration, and API providers have little knowledge of how client code developers react to the API changes. In this paper, we investigate the changes among subsequent versions of APIs and classify the identified changes to understand how the RESTful web APIs evolve. We study developers' online discussion of the API changes by analyzing the StackOverflow questions. Through an empirical study, we identify 21 change types, 7 of which are new compared with existing studies. We find that a larger portion of RESTful web API elements are changed between versions compared with Java APIs and WSDL services. Moreover, our results show that adding new methods in the new version causes more questions and views from developers. However, the deleted methods draw more relevant discussions. In general, our results provide valuable insights into RESTful web API evolution and help service providers understand how their consumers react to the API changes in order to improve the practice of evolving the service APIs.
Empirical Software Engineering | 2017
Haoran Niu; Iman Keivanloo; Ying Zou
Source code examples are used by developers to implement unfamiliar tasks by learning from existing solutions. To better support developers in finding existing solutions, code search engines are designed to locate and rank code examples relevant to users' queries. Essentially, a code search engine provides a ranking schema, which combines a set of ranking features to calculate the relevance between a query and candidate code examples. Consequently, the ranking schema places relevant code examples at the top of the result list. However, it is difficult to determine the configurations of the ranking schemas subjectively. In this paper, we propose a code example search approach that applies a machine learning technique to automatically train a ranking schema. We use the trained ranking schema to rank candidate code examples for new queries at run-time. We evaluate the ranking performance of our approach using a corpus of over 360,000 code snippets crawled from 586 open-source Android projects. The performance evaluation study shows that the learning-to-rank approach can effectively rank code examples, and outperforms the existing ranking schemas by about 35.65% and 48.42% in terms of normalized discounted cumulative gain (NDCG) and expected reciprocal rank (ERR) measures, respectively.
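For readers unfamiliar with the two measures named in this abstract, the following Python sketch shows the commonly used textbook formulations of NDCG and ERR. The abstract does not state which variant, cutoff, or relevance scale the study used, so the exponential gain, the absence of a rank cutoff, and the `max_grade` parameter are assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with exponential gain (2^rel - 1)."""
    return sum(
        (2 ** rel - 1) / math.log2(pos + 1)
        for pos, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def err(relevances, max_grade):
    """Expected reciprocal rank under a cascade model of user behaviour.

    Each result stops the user with probability (2^rel - 1) / 2^max_grade;
    the score is the expected value of 1 / (stopping position).
    """
    score = 0.0
    p_continue = 1.0  # probability the user reaches the current position
    for pos, rel in enumerate(relevances, start=1):
        p_stop = (2 ** rel - 1) / (2 ** max_grade)
        score += p_continue * p_stop / pos
        p_continue *= 1 - p_stop
    return score


# Graded relevance of a hypothetical result list, in ranked order.
good_ranking = [3, 2, 1, 0]
poor_ranking = [0, 1, 2, 3]
```

Both measures reward placing highly relevant code examples near the top: `ndcg(good_ranking)` equals 1.0 because the list is already in ideal order, while `poor_ranking` scores lower on both measures.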
Conference on Software Maintenance and Reengineering | 2014
Shuai Xie; Foutse Khomh; Ying Zou; Iman Keivanloo
Copy and paste activities create clone groups in software systems. The evolution of a clone group across the history of a software system is termed clone genealogy. During the evolution of a clone group, developers may change the location of the code fragments in the clone group. The type of the clone group may also change (e.g., from Type-1 to Type-2). These two phenomena have been referred to as clone migration and clone mutation, respectively. Previous studies have found that clone migration occurs frequently in software systems, and suggested that clone migration can induce faults in a software system. In this paper, we examine how clone migration phenomena affect the risk for faults in clone segments, clone groups, and clone genealogies from three long-lived software systems: JBoss, APACHE-ANT, and ARGOUML. Results show that: (1) migrated clone segments, clone groups, and clone genealogies are not equally fault-prone; (2) when a clone mutation occurs during a clone migration, the risk for faults in the migrated clone is increased; (3) migrating a clone that was not changed for a longer period of time is risky.
Source Code Analysis and Manipulation | 2015
Mohammad Masudur Rahman; Chanchal K. Roy; Iman Keivanloo
Recently, automatic code comment generation has been proposed to facilitate program comprehension. Existing code comment generation techniques focus on describing the functionality of the source code. However, there are other aspects, such as insights about the quality or issues of the code, which are overlooked by earlier approaches. In this paper, we describe a mining approach that recommends insightful comments about the quality, deficiencies, or scope for further improvement of the source code. First, we conduct an exploratory study that motivates crowdsourced knowledge from Stack Overflow discussions as a potential resource for source code comment recommendation. Second, based on the findings from the exploratory study, we propose a heuristic-based technique for mining insightful comments from the Stack Overflow Q&A site for source code comment recommendation. Experiments with 292 Stack Overflow code segments and 5,039 discussion comments show that our approach has a promising recall of 85.42%. We also conducted a complementary user study which confirms the accuracy and usefulness of the recommended comments.
IEEE International Conference on Software Analysis, Evolution, and Reengineering | 2015
Iman Keivanloo; Feng Zhang; Ying Zou
Code clones are unavoidable entities in software ecosystems. A variety of clone-detection algorithms are available for finding code clones. For Type-3 clone detection at method granularity (i.e., similar methods with changes in statements), the dissimilarity threshold is one of the possible configuration parameters. Existing approaches use a single threshold to detect Type-3 clones across a repository. However, our study shows that to detect Type-3 clones at method granularity on a large-scale heterogeneous repository, multiple thresholds are often required. We find that the performance of clone detection improves when different thresholds are selected for various groups of clones in a heterogeneous repository (i.e., various applications). In this paper, we propose a threshold-free approach to detect Type-3 clones at method granularity across a large number of applications. Our approach uses an unsupervised learning algorithm, i.e., k-means, to determine true and false clones. We use a clone benchmark with 330,840 tagged clones from 24,824 open source Java projects for our study. We observe that our approach improves the performance significantly by 12% in terms of F-measure. Furthermore, our threshold-free approach eliminates the concern of practitioners about possible misconfiguration of Type-3 clone detection tools.
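The idea of replacing a fixed threshold with k-means can be illustrated with a small Python sketch. This is not the method from the paper: the abstract does not say which features are clustered, so the sketch assumes a single dissimilarity score per candidate clone pair, two clusters, and that the cluster with the lower centroid holds the true clones. All names are hypothetical.

```python
def two_means_1d(scores, iterations=100):
    """Run k-means with k=2 on one-dimensional scores; return both centroids."""
    lo, hi = min(scores), max(scores)  # initial centroids at the extremes
    for _ in range(iterations):
        near_lo = [s for s in scores if abs(s - lo) <= abs(s - hi)]
        near_hi = [s for s in scores if abs(s - lo) > abs(s - hi)]
        new_lo = sum(near_lo) / len(near_lo)
        # Keep the old centroid if no score is assigned to the upper cluster.
        new_hi = sum(near_hi) / len(near_hi) if near_hi else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # assignments are stable
        lo, hi = new_lo, new_hi
    return lo, hi


def label_clones(scores):
    """Return True for candidate pairs that fall in the low-dissimilarity cluster."""
    lo, hi = two_means_1d(scores)
    return [abs(s - lo) <= abs(s - hi) for s in scores]


# Hypothetical dissimilarity scores of candidate clone pairs in one application.
candidates = [0.05, 0.10, 0.12, 0.70, 0.80, 0.75]
labels = label_clones(candidates)
```

Because the split point is derived from the data of each application, no global dissimilarity threshold has to be configured by hand.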
IEEE Transactions on Services Computing | 2015
Bipin Upadhyaya; Ying Zou; Iman Keivanloo; Joanna Ng
Web service composition enables seamless and dynamic integration of Web services. The behavior of participant Web services determines the overall performance of a composition. Therefore, it is important to choose high-quality services for service composition. Existing Web service selection and discovery approaches rely on non-functional aspects (also known as quality of service or QoS), e.g., response time and availability. Though these parameters are crucial for selecting Web services, they may not reflect the users' perspective of quality. In this paper, we explore the feasibility of incorporating perceived quality from the users' perspective for service selection and composition. We call such quality attributes quality of experience (QoE). First, we propose a solution that automatically mines and identifies QoE attributes from the Web. Second, we study the application of such dynamically extracted QoE attributes for service selection. For evaluation purposes, we collected more than 34,000 reviews from 58 different services in six domains. Our findings show that it is possible to automatically identify QoE attributes with an average precision and recall of 92 and 80 percent, respectively. Our study shows that there is a strong positive correlation between QoS and QoE. Hence, QoE can be used during service selection, especially when QoS data are not available. Furthermore, we found that 70 percent of service discovery queries indeed contain QoE attributes, showing the importance of QoE attributes during the service discovery phase.
Science of Computer Programming | 2014
Iman Keivanloo; Chanchal K. Roy; Juergen Rilling
While source code clone detection is a well-established research area, finding similar code fragments in binary and other intermediate code representations has not yet been widely studied. In this paper, we introduce SeByte, a bytecode clone detection and search model that applies semantic-enabled token matching. It is developed based on the idea of relaxation on the code fingerprints. This approach separates the input content based on the types of tokens into different dimensions, with each dimension representing the input content from a specific point of view. Following this approach, SeByte compares each dimension separately and independently, which we refer to as multi-dimensional comparison in our research. As the similarity search function, we use a well-known measure that supports our multi-dimensional comparison heuristic, the Jaccard similarity coefficient. Our preliminary study shows that SeByte can detect clones that are missed by existing approaches due to the differences in the input data and the search algorithm. We then further exploit the model to build a scalable bytecode clone search engine. This extension meets the requirements of a classical search engine, including the ranking of result sets. Our evaluation with a large dataset of 500,000 compiled Java classes, which we extracted from the six most recent versions of the Eclipse IDE, showed that our SeByte search is not only scalable but also capable of providing a reliable ranking.
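The multi-dimensional comparison described in this abstract can be sketched in Python as follows. This is an illustration rather than SeByte itself: the dimension names ("method_calls", "types") and the example tokens are hypothetical, and the abstract does not say how per-dimension scores are combined, so the sketch only reports them separately.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


def compare_by_dimension(left, right):
    """Compare two fragments one token dimension at a time.

    `left` and `right` map a dimension name to the tokens of that type
    extracted from a code fragment. A missing dimension is treated as empty.
    """
    dimensions = left.keys() | right.keys()
    return {d: jaccard(left.get(d, ()), right.get(d, ())) for d in dimensions}


# Hypothetical token sets extracted from the bytecode of two methods.
fragment_a = {
    "method_calls": {"open", "read", "close"},
    "types": {"File", "String"},
}
fragment_b = {
    "method_calls": {"open", "read", "flush", "close"},
    "types": {"File", "String"},
}

similarity = compare_by_dimension(fragment_a, fragment_b)
```

Here the two fragments agree fully on the types they use but only partly on the methods they call, a difference that a single combined token set would blur.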