Publication


Featured research published by Scott Grant.


working conference on reverse engineering | 2008

Automated Concept Location Using Independent Component Analysis

Scott Grant; James R. Cordy; David B. Skillicorn

Concept location techniques are designed to help isolate sections of source code that relate to specific concepts. Blind Signal Separation techniques like Singular Value Decomposition and Latent Semantic Indexing can be used as a way to identify related sections of source code. This paper explores a related technique called Independent Component Analysis that has the added benefit of identifying statistically independent signals in text, as opposed to ones that are just decorrelated. We describe a tool that we have developed to explore how ICA performs when analysing source code, and show how the technique can be used to perform unsupervised concept location.


mining software repositories | 2013

Encouraging user behaviour with achievements: An empirical study

Scott Grant; Buddy Betts

Stack Overflow, a question and answer Web site, uses a reward system called badges to publicly reward users for their contributions to the community. Badges are used alongside a reputation score to reward positive behaviour by relating a user's site identity with their perceived expertise and respect in the community. A greater number of badges associated with a user profile in some way indicates a higher level of authority, leading to a natural incentive for users to attempt to achieve as many badges as possible. In this study, we use the publicly available logs for Stack Overflow to examine three of these badges in detail. We look at the effect of one badge in context on an individual user level and at the global scope of three related badges across all users by mining user behaviour around the time that the badge is awarded. This analysis supports the claim that badges can be used to influence user behaviour by demonstrating one instance of an increase in user activity related to a badge immediately before it is awarded when compared to the period afterwards.


source code analysis and manipulation | 2010

Estimating the Optimal Number of Latent Concepts in Source Code Analysis

Scott Grant; James R. Cordy

The optimal number of latent topics required to model the most accurate latent substructure for a source code corpus is an open question in source code analysis. Most estimates about the number of latent topics that exist in a software corpus are based on the assumption that the data is similar to natural language, but there is little empirical evidence to support this. In order to help determine the appropriate number of topics needed to accurately represent the source code, we generate a series of Latent Dirichlet Allocation models with varying topic counts. We use a heuristic to evaluate the ability of the model to identify related source code blocks, and demonstrate the consequences of choosing too few or too many latent topics.


Science of Computer Programming | 2013

Using Heuristics to Estimate an Appropriate Number of Latent Topics in Source Code Analysis

Scott Grant; James R. Cordy; David B. Skillicorn

Latent Dirichlet Allocation (LDA) is a data clustering algorithm that performs especially well for text documents. In natural-language applications it automatically finds groups of related words (called “latent topics”) and clusters the documents into sets that are about the same “topic”. LDA has also been applied to source code, where the documents are natural source code units such as methods or classes, and the words are the keywords, operators, and programmer-defined names in the code. The problem of determining a topic count that most appropriately describes a set of source code documents is an open problem. We address this empirically by constructing clusterings with different numbers of topics for a large number of software systems, and then use a pair of measures based on source code locality and topic model similarity to assess how well the topic structure identifies related source code units. Results suggest that the topic count required can be closely approximated using the number of software code fragments in the system. We extend these results to recommend appropriate topic counts for arbitrary software systems based on an analysis of a set of open source systems.


international conference on program comprehension | 2009

Vector space analysis of software clones

Scott Grant; James R. Cordy

In this paper, we introduce a technique for applying Independent Component Analysis to vector space representations of software code fragments such as methods or blocks. The distance between these points can be determined, and used as a measure of the similarity between the original source code fragments they represent. It can be reasoned that if the initial matrix representation contains enough information about the syntactic structure of the source code, the vector space representation will be sufficient to predict the similarity of fragments to one another, and can provide the likelihood that the code is a clone.
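
As a rough illustration of the idea of measuring fragment similarity as a distance in a vector space, the sketch below builds bag-of-tokens vectors for code fragments and compares them with cosine similarity. This is a simplification under stated assumptions: it skips the Independent Component Analysis step entirely, and the tokenizer, the function names (`token_vector`, `cosine_similarity`), and the sample fragments are hypothetical, not taken from the paper.

```python
import math
import re
from collections import Counter


def token_vector(fragment: str) -> Counter:
    """Count identifier and keyword tokens in a code fragment.

    Assumption: a plain bag-of-tokens stands in for the paper's matrix
    representation; no ICA is applied here.
    """
    return Counter(re.findall(r"[A-Za-z_]\w*", fragment))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty fragment is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical fragments: two near-clone loops and one unrelated call.
frag_a = "int total = 0; for (int i = 0; i < n; i++) total += values[i];"
frag_b = "int sum = 0; for (int j = 0; j < n; j++) sum += values[j];"
frag_c = 'printf("%s", name);'

sim_ab = cosine_similarity(token_vector(frag_a), token_vector(frag_b))
sim_ac = cosine_similarity(token_vector(frag_a), token_vector(frag_c))
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

Because raw identifier names are counted, renamed variables lower the score even for structurally identical loops; a representation that captures more syntactic structure, as the abstract suggests, would be needed to treat such fragments as likely clones.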


conference on software maintenance and reengineering | 2012

Using Topic Models to Support Software Maintenance

Scott Grant; James R. Cordy; David B. Skillicorn

Our recent research has shown that the latent information found by commonly used topic models generally relates to the development history of a software system. While it is not always possible to associate these latent topics with human-oriented concepts, it is demonstrable that they identify historical maintenance relationships in source code. Specifically, when a developer makes a change to a software project, it is common for a significant part of that change to relate to a single latent topic. A significant conclusion can be drawn from this: latent topic models identify co-maintenance relationships with no supervision, and therefore topic models can be used to support the maintenance phase of software development.
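
One way to picture the observation that a change tends to concentrate on a single latent topic is to compute, for a change list, the share of modified methods assigned to its most common topic. The sketch below assumes each changed method has already been labelled with a dominant topic id by some topic model; the function name `dominant_topic_share` and the sample data are hypothetical, and this is not necessarily the measure used in the paper.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def dominant_topic_share(topic_ids: Sequence[int]) -> Tuple[Optional[int], float]:
    """Return the most common topic in a change list and its share.

    topic_ids holds one (assumed, precomputed) dominant topic id per
    method touched by the change.
    """
    if not topic_ids:
        return None, 0.0
    topic, count = Counter(topic_ids).most_common(1)[0]
    return topic, count / len(topic_ids)


# Hypothetical change list: five modified methods, four sharing topic 3.
change_list = [3, 3, 3, 7, 3]
topic, share = dominant_topic_share(change_list)
print(f"topic {topic} covers {share:.0%} of the change")
```

A high share across many change lists would be consistent with the co-maintenance relationship the abstract describes, though this sketch says nothing about statistical significance.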


working conference on reverse engineering | 2011

Reverse Engineering Co-maintenance Relationships Using Conceptual Analysis of Source Code

Scott Grant; James R. Cordy; David B. Skillicorn

In this work, we explore the relationship between topic models and co-maintenance history by introducing a visualization that compares conceptual cohesion within change lists. We explain how this view of the project history can give insight about the semantic architecture of the code, and we identify a number of patterns that characterize particular kinds of maintenance tasks. We examine the relationship between co-maintenance history and concept location, and visualize the distribution of changes across concepts to show how these techniques can be used to predict co-maintenance of source code methods.


symposium on web systems evolution | 2011

Contextualized semantic analysis of web services

Scott Grant; Douglas H. Martin; James R. Cordy; David B. Skillicorn

The poor locality of operation descriptions expressed in the Web Service Description Language (WSDL) makes them difficult to analyze and compare in web service discovery tasks. This problem has led us to develop a new method for service operation comparison involving contextualizing operation descriptions by inlining related type information from other sections of the service description. In this paper, we show that this contextualization of web service descriptions can enable topic models (statistical techniques for identifying relationships) to produce semantically meaningful results that can be used to reverse engineer service-oriented web systems and automatically identify related web service operations. Specifically, we model contextualized WSDL service operations using Latent Dirichlet Allocation, and show how this approach can be used to more accurately find similar web service operations.


conference on software maintenance and reengineering | 2014

Examining the relationship between topic model similarity and software maintenance

Scott Grant; James R. Cordy

Software maintenance is the last phase of software development, and typically one of the most time-consuming. One reason for this is the difficulty in finding related source code fragments. A high-level understanding of the source code is necessary to make decisions about which source code fragments should be modified together, for example, in the context of fixing a bug. Even with a similarity metric available, understanding what it means to measure similarity in the first place is important; if a technique suggests that two source code fragments are related, is there a human-oriented way of explaining that relation? In this paper, we attempt to identify a concrete link between software maintenance and the similarity metrics provided by latent topic models. We show that similarity in topic models is related to the likelihood that source code fragments will be modified together in the future, and that an awareness of similar source code can make software maintenance easier.


Archive | 2007

Topic Detection Using Independent Component Analysis

Scott Grant; David B. Skillicorn; James R. Cordy
