Is this you? Create Your Porfile

Mark Grechanik

University of Illinois at Chicago

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Mark Grechanik is active.

Explore More

Publication

Featured researches published by Mark Grechanik.

international conference on software engineering | 2011

Portfolio: finding relevant functions and their usage

Collin McMillan; Mark Grechanik; Denys Poshyvanyk; Qing Xie; Chen Fu

Different studies show that programmers are more interested in finding definitions of functions and their uses than variables, statements, or arbitrary code fragments [30, 29, 31]. Therefore, programmers require support in finding relevant functions and determining how those functions are used. Unfortunately, existing code search engines do not provide enough of this support to developers, thus reducing the effectiveness of code reuse. We provide this support to programmers in a code search system called Portfolio that retrieves and visualizes relevant functions and their usages. We have built Portfolio using a combination of models that address surfing behavior of programmer and sharing Related concepts among functions. We conducted an experiment with 49 professional programmers to compare Portfolio to Google Code Search and Koders using a standard methodology. The results show with strong statistical significance that users find more relevant functions with higher precision with Portfolio than with Google Code Search and Koders.

international conference on software engineering | 2010

A search engine for finding highly relevant applications

Mark Grechanik; Chen Fu; Qing Xie; Collin McMillan; Denys Poshyvanyk; Chad M. Cumby

A fundamental problem of finding applications that are highly relevant to development tasks is the mismatch between the high-level intent reflected in the descriptions of these tasks and low-level implementation details of applications. To reduce this mismatch we created an approach called Exemplar (EXEcutable exaMPLes ARchive) for finding highly relevant software projects from large archives of applications. After a programmer enters a natural-language query that contains high-level concepts (e.g., MIME, data sets), Exemplar uses information retrieval and program analysis techniques to retrieve applications that implement these concepts. Our case study with 39 professional Java programmers shows that Exemplar is more effective than Sourceforge in helping programmers to quickly find highly relevant applications.

international conference on software engineering | 2009

Maintaining and evolving GUI-directed test scripts

Mark Grechanik; Qing Xie; Chen Fu

Since manual black-box testing of GUI-based APplications (GAPs) is tedious and laborious, test engineers create test scripts to automate the testing process. These test scripts interact with GAPs by performing actions on their GUI objects. An extra effort that test engineers put in writing test scripts is paid off when these scripts are run repeatedly. Unfortunately, releasing new versions of GAPs with modified GUIs breaks their corresponding test scripts thereby obliterating benefits of test automation. We offer a novel approach for maintaining and evolving test scripts so that they can test new versions of their respective GAPs. We built a tool to implement our approach, and we conducted a case study with forty five professional programmers and test engineers to evaluate this tool. The results show with strong statistical significance that users find more failures and report fewer false positives (p ≪ 0.02) in test scripts with our tool than with a flagship industry product and a baseline manual approach. Our tool is lightweight and it takes less than eight seconds to analyze approximately 1KLOC of test scripts.

international conference on software engineering | 2012

Detecting similar software applications

Collin McMillan; Mark Grechanik; Denys Poshyvanyk

Although popular text search engines allow users to retrieve similar web pages, source code search engines do not have this feature. Detecting similar applications is a notoriously difficult problem, since it implies that similar highlevel requirements and their low-level implementations can be detected and matched automatically for different applications. We created a novel approach for automatically detecting Closely reLated ApplicatioNs (CLAN) that helps users detect similar applications for a given Java application. Our main contributions are an extension to a framework of relevance and a novel algorithm that computes a similarity index between Java applications using the notion of semantic layers that correspond to packages and class hierarchies. We have built CLAN and we conducted an experiment with 33 participants to evaluate CLAN and compare it with the closest competitive approach, MUDABlue. The results show with strong statistical significance that CLAN automatically detects similar applications from a large repository of 8,310 Java applications with a higher precision than MUDABlue.

empirical software engineering and measurement | 2010

An empirical investigation into a large-scale Java open source code repository

Mark Grechanik; Collin McMillan; Luca DeFerrari; Marco Comi; Stefano Crespi; Denys Poshyvanyk; Chen Fu; Qing Xie; Carlo Ghezzi

Getting insight into different aspects of source code artifacts is increasingly important -- yet there is little empirical research using large bodies of source code, and subsequently there are not much statistically significant evidence of common patterns and facts of how programmers write source code. We pose 32 research questions, explain rationale behind them, and obtain facts from 2,080 randomly chosen Java applications from Sourceforge. Among these facts we find that most methods have one or zero arguments or they do not return any values, few methods are overridden, most inheritance hierarchies have the depth of one, close to 50% of classes are not explicitly inherited from any classes, and the number of methods is strongly correlated with the number of fields in a class.

international conference on software engineering | 2012

Automatically finding performance problems with feedback-directed learning software testing

Mark Grechanik; Chen Fu; Qing Xie

A goal of performance testing is to find situations when applications unexpectedly exhibit worsened characteristics for certain combinations of input values. A fundamental question of performance testing is how to select a manageable subset of the input data faster to find performance problems in applications automatically. We offer a novel solution for finding performance problems in applications automatically using black-box software testing. Our solution is an adaptive, feedback-directed learning testing system that learns rules from execution traces of applications and then uses these rules to select test input data automatically for these applications to find more performance problems when compared with exploratory random testing. We have implemented our solution and applied it to a medium-size application at a major insurance company and to an open-source application. Performance problems were found automatically and confirmed by experienced testers and developers.

IEEE Transactions on Software Engineering | 2012

Exemplar: A Source Code Search Engine for Finding Highly Relevant Applications

Collin McMillan; Mark Grechanik; Denys Poshyvanyk; Chen Fu; Qing Xie

A fundamental problem of finding software applications that are highly relevant to development tasks is the mismatch between the high-level intent reflected in the descriptions of these tasks and low-level implementation details of applications. To reduce this mismatch we created an approach called EXEcutable exaMPLes ARchive (Exemplar) for finding highly relevant software projects from large archives of applications. After a programmer enters a natural-language query that contains high-level concepts (e.g., MIME, datasets), Exemplar retrieves applications that implement these concepts. Exemplar ranks applications in three ways. First, we consider the descriptions of applications. Second, we examine the Application Programming Interface (API) calls used by applications. Third, we analyze the dataflow among those API calls. We performed two case studies (with professional and student developers) to evaluate how these three rankings contribute to the quality of the search results from Exemplar. The results of our studies show that the combined ranking of application descriptions and API documents yields the most-relevant search results. We released Exemplar and our case study data to the public.

foundations of software engineering | 2007

Recovering and using use-case-diagram-to-source-code traceability links

Mark Grechanik; Kathryn S. McKinley; Dewayne E. Perry

Use case diagrams (UCDs) are widely used to describe requirements and desired functionality of software products. However, UCDs are loosely linked to source code, and maintaining traces between the source code and elements of UCDs is a manual, tedious, and laborious process. These traces help programmers to understand code that they maintain and evolve. Our contribution is twofold. First, we offer a novel approach for automating part of the process of recovering traceability links (TLs) between types and variables in Java programs and elements of UCDs. We evaluate our prototype implementation on open-source and commercial software, and the results suggest that our approach can recover many TLs with a high degree of automation and precision. Second, we developed an Eclipse plugin that enables programmers to trace program types and variables to elements of UCDs and vice versa using recovered TLs. We conducted a case study that shows that programmers could maintain and evolve software more efficiently with our plugin. These results demonstrate that modest programmer effort to create TLs together with automated program mining and analysis is a promising approach than can increase program understanding while reducing programmer burden.

international symposium on software reliability engineering | 2010

Is Data Privacy Always Good for Software Testing

Mark Grechanik; Christoph Csallner; Chen Fu; Qing Xie

Database-centric applications (DCAs) are common in enterprise computing, and they use nontrivial databases. Testing of DCAs is increasingly outsourced to test centers in order to achieve lower cost and higher quality. When releasing proprietary DCAs, its databases should also be made available to test engineers, so that they can test using real data. Testing with real data is important, since fake data lacks many of the intricate semantic connections among the original data elements. However, different data privacy laws prevent organizations from sharing these data with test centers because databases contain sensitive information. Currently, testing is performed with fake data that often leads to worse code coverage and fewer uncovered bugs, thereby reducing the quality of DCAs and obliterating benefits of test outsourcing. We show that a popular data anonymization algorithm called k-anonymity seriously degrades test coverage of DCAs. We propose an approach that uses program analysis to guide selective application of k-anonymity. This approach helps protect sensitive data in databases while retaining testing efficacy. Our results show that for small values of k = 7, test coverage drops to less than 30% from the original coverage of more than 70%, thus making it difficult to achieve good quality when testing DCAs while applying data privacy.

Empirical Software Engineering | 2014

On using machine learning to automatically classify software applications into domain categories

Mario Linares-Vásquez; Collin McMillan; Denys Poshyvanyk; Mark Grechanik

Software repositories hold applications that are often categorized to improve the effectiveness of various maintenance tasks. Properly categorized applications allow stakeholders to identify requirements related to their applications and predict maintenance problems in software projects. Manual categorization is expensive, tedious, and laborious – this is why automatic categorization approaches are gaining widespread importance. Unfortunately, for different legal and organizational reasons, the applications’ source code is often not available, thus making it difficult to automatically categorize these applications. In this paper, we propose a novel approach in which we use Application Programming Interface (API) calls from third-party libraries for automatic categorization of software applications that use these API calls. Our approach is general since it enables different categorization algorithms to be applied to repositories that contain both source code and bytecode of applications, since API calls can be extracted from both the source code and byte-code. We compare our approach to a state-of-the-art approach that uses machine learning algorithms for software categorization, and conduct experiments on two large Java repositories: an open-source repository containing 3,286 projects and a closed-source repository with 745 applications, where the source code was not available. Our contribution is twofold: we propose a new approach that makes it possible to categorize software projects without any source code using a small number of API calls as attributes, and furthermore we carried out a comprehensive empirical evaluation of automatic categorization approaches.

Explore More