Erik Linstead | Researchain

Publication

Featured research published by Erik Linstead.

conference on object-oriented programming systems, languages, and applications | 2006

Sourcerer: a search engine for open source code supporting structure-based search

Sushil Krishna Bajracharya; Trung Chi Ngo; Erik Linstead; Yimeng Dou; Paul Rigor; Pierre Baldi; Cristina Videira Lopes

We present Sourcerer, a search engine for open-source code. Sourcerer extracts fine-grained structural information from the code and stores it in a relational model. This information is used to implement a basic notion of CodeRank and to enable search forms that go beyond conventional keyword-based searches.

conference on object-oriented programming systems, languages, and applications | 2008

A theory of aspects as latent topics

Pierre Baldi; Cristina Videira Lopes; Erik Linstead; Sushil Krishna Bajracharya

After more than 10 years, Aspect-Oriented Programming (AOP) is still a controversial idea. While the concept of aspects appeals to everyone's intuitions, concrete AOP solutions often fail to convince researchers and practitioners alike. This discrepancy results in part from the lack of an adequate theory of aspects, which in turn leads to the development of AOP solutions that are useful only in limited situations. We propose a new theory of aspects that can be summarized as follows: concerns are latent topics that can be automatically extracted using statistical topic modeling techniques adapted to software. Software scattering and tangling can be measured precisely by the entropies of the underlying topic-over-files and files-over-topics distributions. Aspects are latent topics with high scattering entropy. The theory is validated empirically on both the large scale, with a study of 4,632 Java projects, and the small scale, with a study of 5 individual projects. From these analyses, we identify two dozen topics that emerge as general-purpose aspects across multiple projects, as well as project-specific topics/concerns. The approach is also shown to produce results that are compatible with previous methods for identifying aspects, and to extend those methods. Our work provides not only a concrete approach for identifying aspects at several scales in an unsupervised manner but, more importantly, a formulation of AOP grounded in information theory. The understanding of aspects under this new perspective makes additional progress toward the design of models and tools that facilitate software development.
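The entropy measures mentioned in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical file-by-topic count matrix (rows are files, columns are topics) and uses plain Shannon entropy in bits; the function names, the toy data, and the normalization are illustrative assumptions, not the exact definitions used in the paper.

```python
import math

def entropy(weights):
    """Shannon entropy (in bits) of non-negative weights, normalized to sum to 1."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)

def scattering(counts, topic):
    """Entropy of one topic's distribution over files (topic-over-files)."""
    return entropy([row[topic] for row in counts])

def tangling(counts, file_index):
    """Entropy of one file's distribution over topics (files-over-topics)."""
    return entropy(counts[file_index])

# Hypothetical toy data: 4 files, 2 topics; entries are counts of
# words in each file assigned to each topic.
counts = [
    [10, 5],
    [0, 5],
    [0, 5],
    [0, 5],
]

topic0_scatter = scattering(counts, 0)  # topic 0 lives in a single file
topic1_scatter = scattering(counts, 1)  # topic 1 is spread evenly over 4 files
file0_tangle = tangling(counts, 0)      # file 0 mixes both topics
```

In this toy matrix, topic 0 is confined to one file and has zero scattering entropy, while topic 1 is spread uniformly over four files and reaches the maximum of 2 bits, which makes it the more aspect-like topic under the abstract's criterion.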

Bioinformatics | 2007

ChemDB update—full-text search and virtual chemical space

Jonathan H. Chen; Erik Linstead; S. Joshua Swamidass; Dennis Ding-Hwa Wang; Pierre Baldi

ChemDB is a chemical database containing nearly 5M commercially available small molecules, important for use as synthetic building blocks, probes in systems biology and as leads for the discovery of drugs and other useful compounds. The data is publicly available over the web for download and for targeted searches using a variety of powerful methods. The chemical data includes predicted or experimentally determined physicochemical properties, such as 3D structure, melting temperature and solubility. Recent developments include optimization of chemical structure (and substructure) retrieval algorithms, enabling full database searches in less than a second. A text-based search engine allows efficient searching of compounds based on over 65M annotations from over 150 vendors. When searching for chemicals by name, fuzzy text matching capabilities yield productive results even when the correct spelling of a chemical name is unknown, taking advantage of both systematic and common names. Finally, built-in reaction models enable searches through virtual chemical space, consisting of hypothetical products readily synthesizable from the building blocks in ChemDB. Availability: ChemDB and Supplementary Materials are available at http://cdb.ics.uci.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

automated software engineering | 2007

Mining concepts from code with probabilistic topic models

Erik Linstead; Paul Rigor; Sushil Krishna Bajracharya; Cristina Videira Lopes; Pierre Baldi

We develop and apply statistical topic models to software as a means of extracting concepts from source code. The effectiveness of the technique is demonstrated on 1,555 projects from SourceForge and Apache consisting of 113,000 files and 19 million lines of code. In addition to providing an automated, unsupervised solution to the problem of summarizing program functionality, the approach provides a probabilistic framework with which to analyze and visualize source file similarity. Finally, we introduce an information-theoretic approach for computing tangling and scattering of extracted concepts, and present preliminary results.

international conference on machine learning and applications | 2008

An Application of Latent Dirichlet Allocation to Analyzing Software Evolution

Erik Linstead; Cristina Videira Lopes; Pierre Baldi

We develop and apply unsupervised statistical topic models, in particular latent Dirichlet allocation, to identify functional components of source code and study their evolution over multiple project versions. We present results for two large, open source Java projects, Eclipse and Argo UML, which are well-known and well-studied within the software mining community. Our results demonstrate the effectiveness of probabilistic topic models in automatically summarizing the temporal dynamics of software concerns, with direct application to project management and program understanding. In addition to detecting the emergence of topics on the release timeline which represent integration points for key source code functionality, our techniques can also be used to pinpoint refactoring events in the underlying software design, as well as to identify general programming concepts whose prevalence is dependent only on the size of the code base to be analyzed. Complete results are available from our supplementary materials website at http://sourcerer.ics.uci.edu/icmla2008/software_evolution.html.

mining software repositories | 2007

Mining Eclipse Developer Contributions via Author-Topic Models

Erik Linstead; Paul Rigor; Sushil Krishna Bajracharya; Cristina Videira Lopes; Pierre Baldi

We present the results of applying statistical author-topic models to a subset of the Eclipse 3.0 source code consisting of 2,119 source files and 700,000 lines of code from 59 developers. This technique provides an intuitive and automated framework with which to mine developer contributions and competencies from a given code base while simultaneously extracting software function in the form of topics. In addition to serving as a convenient summary for program function and developer activities, our study shows that topic models provide a meaningful, effective, and statistical basis for developer similarity analysis.

mining software repositories | 2009

SourcererDB: An aggregated repository of statically analyzed and cross-linked open source Java projects

Joel Ossher; Sushil Krishna Bajracharya; Erik Linstead; Pierre Baldi; Cristina Videira Lopes

The open source movement has made vast quantities of source code available online for free, providing an extremely large dataset for empirical study and potential reuse. A major difficulty in exploiting this potential fully is that the data are currently scattered between competing source code repositories, none of which are structured for empirical analysis and cross-project comparison. As a result, software researchers and developers are left to compile their own datasets, resulting in duplicated effort and limited results. To address this challenge, we built SourcererDB, an aggregated repository of statically analyzed and cross-linked open source Java projects. SourcererDB contains local snapshots of 2,852 Java projects taken from SourceForge, Apache and Java.net. These projects are statically analyzed to extract rich structural information, which is then stored in a relational database. References to entities in the 16,058 external jars are resolved and grouped, allowing for cross-project usage information to be accessed easily. This paper describes: (a) the mechanism for resolving and grouping these cross-project references, (b) the structure of and the metamodel for the SourcererDB repository, and (c) end-user dataset access mechanisms. Our goal in building SourcererDB is to provide a rich dataset of source code to facilitate the sharing of extracted data and to encourage reuse and repeatability of experiments.

mining software repositories | 2009

Mining the coherence of GNOME bug reports with statistical topic models

Erik Linstead; Pierre Baldi

We adapt Latent Dirichlet Allocation to the problem of mining bug reports in order to define a new information-theoretic measure of coherence. We then apply our technique to a snapshot of the GNOME Bugzilla database consisting of 431,863 bug reports for multiple software projects. In addition to providing an unsupervised means for modeling report content, our results indicate substantial promise in applying statistical text mining algorithms for estimating bug report quality. Complete results are available from our supplementary materials website at http://sourcerer.ics.uci.edu/msr2009/gnome_coherence.html.

Proceedings of the 2009 ICSE Workshop on Search-Driven Development-Users, Infrastructure, Tools and Evaluation | 2009

Exploring Java software vocabulary: A search and mining perspective

Erik Linstead; Lindsey Hughes; Cristina Videira Lopes; Pierre Baldi

We conduct a large-scale analysis of Java source code vocabulary for 12,151 open source projects from SourceForge and Apache, a corpus substantially larger than considered previously. Simple statistical analysis demonstrates robust power-law behavior for word count distributions across multiple program entities. We then identify salient vocabulary trends for classes, interfaces, methods, and fields. Our results provide low-level insight into the vocabulary space governing Java software development, with direct application to program comprehension and software search. Supplementary material may be found at: http://sourcerer.ics.uci.edu/suite2009/suite.html.

Behavior Modification | 2017

Intensity and Learning Outcomes in the Treatment of Children With Autism Spectrum Disorder

Erik Linstead; Dennis R. Dixon; Ryan French; Doreen Granpeesheh; Hilary L. Adams; Rene German; Alva Powell; Elizabeth Stevens; Jonathan Tarbox; Julie Kornack

Ample research has shown that intensive applied behavior analysis (ABA) treatment produces robust outcomes for individuals with autism spectrum disorder (ASD); however, little is known about the relationship between treatment intensity and treatment outcomes. The current study was designed to evaluate this relationship. Participants included 726 children, ages 1.5 to 12 years old, receiving community-based behavioral intervention services. Results indicated a strong relationship between treatment intensity and mastery of learning objectives, where higher treatment intensity predicted greater progress. Specifically, 35% of the variance in mastery of learning objectives was accounted for by treatment hours using standard linear regression, and 60% of variance was accounted for using artificial neural networks. These results add to the existing support for higher intensity treatment for children with ASD.
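For readers unfamiliar with the "variance accounted for" figure, the following minimal sketch shows how that quantity (the coefficient of determination, R²) is computed for an ordinary least-squares fit. It assumes a single predictor, and the data values are hypothetical toy numbers, not data from the study.

```python
def r_squared(x, y):
    """R^2 of a one-predictor ordinary least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: variation the fitted line fails to explain.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total sum of squares: variation of y around its mean.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy data: treatment hours and mastered objectives.
hours = [1, 2, 3, 4]
mastered = [1, 3, 2, 4]
explained = r_squared(hours, mastered)
```

A value of 0.35 from such a computation corresponds to the "35% of the variance" phrasing in the abstract; a more flexible model such as a neural network can report a higher figure on the same data because it is not restricted to a straight line.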
