Sushil Krishna Bajracharya

Publication

Featured research published by Sushil Krishna Bajracharya.

conference on object-oriented programming systems, languages, and applications | 2006

Sourcerer: a search engine for open source code supporting structure-based search

Sushil Krishna Bajracharya; Trung Chi Ngo; Erik Linstead; Yimeng Dou; Paul Rigor; Pierre Baldi; Cristina Videira Lopes

We present Sourcerer, a search engine for open-source code. Sourcerer extracts fine-grained structural information from the code and stores it in a relational model. This information is used to implement a basic notion of CodeRank and to enable search forms that go beyond conventional keyword-based searches.

conference on object-oriented programming systems, languages, and applications | 2008

A theory of aspects as latent topics

Pierre Baldi; Cristina Videira Lopes; Erik Linstead; Sushil Krishna Bajracharya

After more than 10 years, Aspect-Oriented Programming (AOP) is still a controversial idea. While the concept of aspects appeals to everyone's intuitions, concrete AOP solutions often fail to convince researchers and practitioners alike. This discrepancy results in part from a lack of an adequate theory of aspects, which in turn leads to the development of AOP solutions that are useful in limited situations. We propose a new theory of aspects that can be summarized as follows: concerns are latent topics that can be automatically extracted using statistical topic modeling techniques adapted to software. Software scattering and tangling can be measured precisely by the entropies of the underlying topic-over-files and files-over-topics distributions. Aspects are latent topics with high scattering entropy. The theory is validated empirically on both the large scale, with a study of 4,632 Java projects, and the small scale, with a study of 5 individual projects. From these analyses, we identify two dozen topics that emerge as general-purpose aspects across multiple projects, as well as project-specific topics/concerns. The approach is also shown to produce results that are compatible with previous methods for identifying aspects, and to extend them. Our work provides not only a concrete approach for identifying aspects at several scales in an unsupervised manner but, more importantly, a formulation of AOP grounded in information theory. The understanding of aspects under this new perspective makes additional progress toward the design of models and tools that facilitate software development.
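The entropy-based measures mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the topic names, the weights, and the helper functions `entropy`, `normalize`, `scattering`, and `tangling` are all hypothetical, and the sketch assumes that scattering is the Shannon entropy of one topic's distribution over files and tangling is the entropy of one file's distribution over topics.

```python
import math

def entropy(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical topic-by-file weights: each row is a topic, each column a file.
topic_file_weights = {
    "logging": [5, 5, 5, 5],   # spread evenly over four files
    "parsing": [20, 0, 0, 0],  # concentrated in a single file
}

def scattering(topic):
    """Entropy of one topic's distribution over files (assumed definition)."""
    return entropy(normalize(topic_file_weights[topic]))

def tangling(file_index):
    """Entropy of one file's distribution over topics (assumed definition)."""
    column = [weights[file_index] for weights in topic_file_weights.values()]
    return entropy(normalize(column))

print(scattering("logging"))  # evenly spread topic: high entropy
print(scattering("parsing"))  # single-file topic: zero entropy
print(tangling(0))            # file 0 mixes both topics
```

Under these assumptions, the evenly spread "logging" topic has the higher scattering entropy, so it is the one that would be a candidate aspect.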

aspect-oriented software development | 2005

An analysis of modularity in aspect oriented design

Cristina Videira Lopes; Sushil Krishna Bajracharya

We present an analysis of modularity in aspect oriented design using the theory of modular design developed by Baldwin and Clark [10]. We use the three major elements of that theory, namely: i) Design Structure Matrix (DSM), an analysis and modeling tool; ii) Modular Operators, units of variations for design evolution; and iii) Net Options Value (NOV), a quantitative approach to evaluate design. We study the design evolution of a Web Services application where we observe the effects of applying aspect oriented modularization. Based on our analysis, we reach the following three main conclusions. First, on the structural part, it is possible to apply the DSM to aspect oriented modularizations in a straightforward manner, i.e. without modifications to the DSM's basic model. This shows that aspects can, in fact, be treated as modules of design. Second, the evolution of a design to include aspect modules uses the modular operators proposed by Baldwin and Clark, with a variant of the Inversion operator. This variant captures taking redundant, scattered information hidden in modules and moving it down or keeping it at the same level in the design hierarchy. Third, when calculating and comparing NOVs of the different designs of our application, we obtained higher NOV for the design with aspects than for the design without aspects. This shows that, under this theory of modularity, certain aspect oriented modularizations can add value to the design.

foundations of software engineering | 2010

Leveraging usage similarity for effective retrieval of examples in code repositories

Sushil Krishna Bajracharya; Joel Ossher; Cristina Videira Lopes

Developers often learn to use APIs (Application Programming Interfaces) by looking at existing examples of API usage. Code repositories contain many instances of such usage of APIs. However, conventional information retrieval techniques fail to perform well in retrieving API usage examples from code repositories. This paper presents Structural Semantic Indexing (SSI), a technique to associate words to source code entities based on similarities of API usage. The heuristic behind this technique is that entities (classes, methods, etc.) that show similar uses of APIs are semantically related because they do similar things. We evaluate the effectiveness of SSI in code retrieval by comparing three SSI based retrieval schemes with two conventional baseline schemes. We evaluate the performance of the retrieval schemes by running a set of 20 candidate queries against a repository containing 222,397 source code entities from 346 jars belonging to the Eclipse framework. The results of the evaluation show that SSI is effective in improving the retrieval of examples in code repositories.
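The heuristic described in this abstract, that entities with similar API usage are semantically related, can be sketched roughly as follows. This is an illustration only and not the SSI algorithm from the paper: the entity names, the word sets, the use of Jaccard similarity, and the 0.5 threshold are all assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical code entities and the APIs each one uses.
api_usage = {
    "FileCopier.copy": {"java.io.FileInputStream", "java.io.FileOutputStream"},
    "Backup.save": {
        "java.io.FileInputStream",
        "java.io.FileOutputStream",
        "java.util.zip.ZipOutputStream",
    },
    "Greeter.hello": {"java.io.PrintStream"},
}

# Hypothetical words already associated with each entity (e.g. from its name).
own_words = {
    "FileCopier.copy": {"file", "copy"},
    "Backup.save": {"backup", "save"},
    "Greeter.hello": {"greet", "hello"},
}

def expanded_words(entity, threshold=0.5):
    """Add words from entities whose API usage is similar enough (assumed rule)."""
    words = set(own_words[entity])
    for other, apis in api_usage.items():
        if other != entity and jaccard(api_usage[entity], apis) >= threshold:
            words |= own_words[other]
    return words

print(sorted(expanded_words("FileCopier.copy")))
print(sorted(expanded_words("Greeter.hello")))
```

In this sketch, a keyword query for "backup" could also match `FileCopier.copy`, because that entity shares most of its API usage with `Backup.save`.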

automated software engineering | 2007

Mining concepts from code with probabilistic topic models

Erik Linstead; Paul Rigor; Sushil Krishna Bajracharya; Cristina Videira Lopes; Pierre Baldi

We develop and apply statistical topic models to software as a means of extracting concepts from source code. The effectiveness of the technique is demonstrated on 1,555 projects from SourceForge and Apache consisting of 113,000 files and 19 million lines of code. In addition to providing an automated, unsupervised solution to the problem of summarizing program functionality, the approach provides a probabilistic framework with which to analyze and visualize source file similarity. Finally, we introduce an information-theoretic approach for computing tangling and scattering of extracted concepts, and present preliminary results.

Proceedings of the 2009 ICSE Workshop on Search-Driven Development-Users, Infrastructure, Tools and Evaluation | 2009

Sourcerer: An internet-scale software repository

Sushil Krishna Bajracharya; Joel Ossher; Cristina Videira Lopes

Vast quantities of open source code are now available online, presenting a great potential resource for software developers. Yet the current generation of open source code search engines fails to take advantage of the rich structural information contained in the code they index. We have developed Sourcerer, an infrastructure for large-scale indexing and analysis of open source code. By taking full advantage of this structural information, Sourcerer provides a foundation upon which state-of-the-art search engines and related tools can easily be built. We describe the Sourcerer infrastructure, present the applications that we have built on top of it, and discuss how existing tools could benefit from using Sourcerer.

mining software repositories | 2007

Mining Eclipse Developer Contributions via Author-Topic Models

Erik Linstead; Paul Rigor; Sushil Krishna Bajracharya; Cristina Videira Lopes; Pierre Baldi

We present the results of applying statistical author-topic models to a subset of the Eclipse 3.0 source code consisting of 2,119 source files and 700,000 lines of code from 59 developers. This technique provides an intuitive and automated framework with which to mine developer contributions and competencies from a given code base while simultaneously extracting software function in the form of topics. In addition to serving as a convenient summary for program function and developer activities, our study shows that topic models provide a meaningful, effective, and statistical basis for developer similarity analysis.

Science of Computer Programming | 2014

Sourcerer: An infrastructure for large-scale collection and analysis of open-source code

Sushil Krishna Bajracharya; Joel Ossher; Cristina Videira Lopes

A large amount of open source code is now available online, presenting a great potential resource for software developers. This has motivated software engineering researchers to develop tools and techniques to allow developers to reap the benefits of these billions of lines of source code. However, collecting and analyzing such a large quantity of source code presents a number of challenges. Although the current generation of open source code search engines provides access to the source code in an aggregated repository, they generally fail to take advantage of the rich structural information contained in the code they index. This makes them significantly less useful than Sourcerer for building state-of-the-art software engineering tools, as these tools often require access to both the structural and textual information available in source code. We have developed Sourcerer, an infrastructure for large-scale collection and analysis of open source code. By taking full advantage of the structural information extracted from source code in its repository, Sourcerer provides a foundation upon which state-of-the-art search engines and related tools can easily be built. We describe the Sourcerer infrastructure, present the applications that we have built on top of it, and discuss how existing tools could benefit from using Sourcerer.

automated software engineering | 2007

CodeGenie: using test-cases to search and reuse source code

Otávio Augusto Lazzarini Lemos; Sushil Krishna Bajracharya; Joel Ossher; Ricardo Santos Morla; Paulo Cesar Masiero; Pierre Baldi; Cristina Videira Lopes

We present CodeGenie, a tool that implements a test-driven approach to search and reuse of code available on large-scale code repositories. While using CodeGenie, developers design test cases for a desired feature first, similar to Test-driven Development (TDD). However, instead of implementing the feature as in TDD, CodeGenie automatically searches for it based on information available in the tests. To check the suitability of the candidate results in the local context, each result is automatically woven into the developer's project and tested using the original tests. The developer can then reuse the most suitable result. Later, reused code can also be unwoven from the project as wished. For the code searching and wrapping facilities, CodeGenie relies on Sourcerer, an Internet-scale source code infrastructure that we have developed.

Empirical Software Engineering | 2012

Analyzing and mining a code search engine usage log

Sushil Krishna Bajracharya; Cristina Videira Lopes

This paper presents an analysis of a year-long usage log of Koders, the first commercially available Internet-Scale code search engine (http://www.koders.com). The usage log comprises about ten million activities from more than three million users. Analysis of the usage data shows that despite attracting a large number of visitors, Koders has very sparse usage and lacks regular usage from many of its users. When compared to Web search, search behavior in Koders showed many similar patterns. A topic modeling analysis of the usage data shows what topics users of Koders are looking for. Observations on the prevalence of these topics among the users, and observations on how search and download activities vary across topics, lead to the conclusion that users who find code search engines usable are those who already know to a high level of specificity what to look for. This paper also presents a general categorization of these topics that provides insights on the different ways code search engine users express their queries. It identifies various forms of queries in Koders's log and the kinds of results addressed by the queries. It also provides several suggestions for improvements in code search engines based on the analysis of usage, topics, and query forms. The work presented in this paper is the first of its kind that reveals several insights on the usage of an Internet-Scale code search engine.
