Publications

Featured research published by Bogdan Dit.


Journal of Software: Evolution and Process | 2013

Feature location in source code: a taxonomy and survey

Bogdan Dit; Meghan Revelle; Malcom Gethers; Denys Poshyvanyk

Feature location is the activity of identifying an initial location in the source code that implements functionality in a software system. Many feature location techniques have been introduced that automate some or all of this process, and a comprehensive overview of this large body of work would be beneficial to researchers and practitioners. This paper presents a systematic literature survey of feature location techniques. Eighty-nine articles from 25 venues have been reviewed and classified within the taxonomy in order to organize and structure existing work in the field of feature location. The paper also discusses open issues and defines future directions in the field of feature location.
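As a rough illustration of the kind of textual technique the survey covers, the sketch below ranks code units by lexical similarity to a feature description. The corpus, query, and use of TF-IDF are illustrative assumptions, not a technique taken from the paper.

```python
# Minimal sketch of a textual feature location technique: rank code units
# (e.g., methods) by lexical similarity to a feature description.
# The method corpus and the query are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

methods = {
    "Editor.save": "save buffer contents to file write stream flush",
    "Editor.print": "print document send pages to printer spooler",
    "SpellChecker.check": "check spelling of word against dictionary suggest",
}
query = "fix spell checking of words in the open document"

vectorizer = TfidfVectorizer()
corpus_matrix = vectorizer.fit_transform(methods.values())
query_vector = vectorizer.transform([query])

scores = cosine_similarity(query_vector, corpus_matrix).ravel()
for name, score in sorted(zip(methods, scores), key=lambda p: p[1], reverse=True):
    print(f"{score:.3f}  {name}")
```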


International Conference on Software Engineering | 2013

How to effectively use topic models for software engineering tasks? An approach based on genetic algorithms

Annibale Panichella; Bogdan Dit; Massimiliano Di Penta; Denys Poshyvanyk; Andrea De Lucia

Information Retrieval (IR) methods, and in particular topic models, have recently been used to support essential software engineering (SE) tasks, by enabling software textual retrieval and analysis. In all these approaches, topic models have been used on software artifacts in a similar manner as they were used on natural language documents (e.g., using the same settings and parameters) because the underlying assumption was that source code and natural language documents are similar. However, applying topic models on software data using the same settings as for natural language text did not always produce the expected results. Recent research investigated this assumption and showed that source code is much more repetitive and predictable as compared to the natural language text. Our paper builds on this new fundamental finding and proposes a novel solution to adapt, configure and effectively use a topic modeling technique, namely Latent Dirichlet Allocation (LDA), to achieve better (acceptable) performance across various SE tasks. Our paper introduces a novel solution called LDA-GA, which uses Genetic Algorithms (GA) to determine a near-optimal configuration for LDA in the context of three different SE tasks: (1) traceability link recovery, (2) feature location, and (3) software artifact labeling. The results of our empirical studies demonstrate that LDA-GA is able to identify robust LDA configurations, which lead to a higher accuracy on all the datasets for these SE tasks as compared to previously published results, heuristics, and the results of a combinatorial search.
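A heavily simplified sketch of the LDA-GA idea follows: a small genetic algorithm searches LDA hyperparameters (number of topics, alpha, beta) using a cluster-quality fitness. The toy corpus, GA operators, parameter ranges, and use of scikit-learn are assumptions made for illustration, not the paper's exact setup.

```python
# Simplified sketch of LDA-GA: evolve LDA hyperparameters so that documents
# cluster well by their dominant topic (the fitness here is the silhouette
# coefficient of that clustering).
import random
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import silhouette_score

documents = [
    "parse xml document node attribute",
    "render widget layout paint screen",
    "open socket connect send receive packet",
    "read file buffer stream close",
    "draw image canvas pixel color",
    "http request response header url",
]
term_matrix = CountVectorizer().fit_transform(documents)

def fitness(genome):
    """Silhouette of documents clustered by their dominant LDA topic."""
    n_topics, alpha, beta = genome
    lda = LatentDirichletAllocation(n_components=n_topics, doc_topic_prior=alpha,
                                    topic_word_prior=beta, random_state=0)
    doc_topics = lda.fit_transform(term_matrix)
    labels = doc_topics.argmax(axis=1)
    n_clusters = len(set(labels))
    if n_clusters < 2 or n_clusters >= len(documents):  # silhouette needs 2..n-1 clusters
        return -1.0
    return silhouette_score(doc_topics, labels)

def random_genome():
    return (random.randint(2, 5), random.uniform(0.01, 1.0), random.uniform(0.01, 1.0))

def mutate(genome):
    n_topics, alpha, beta = genome
    return (max(2, n_topics + random.choice((-1, 0, 1))),
            min(1.0, max(0.01, alpha * random.uniform(0.5, 1.5))),
            min(1.0, max(0.01, beta * random.uniform(0.5, 1.5))))

random.seed(0)
population = [random_genome() for _ in range(6)]
for _ in range(5):                       # a few generations suffice for a demo
    population.sort(key=fitness, reverse=True)
    parents = population[:3]             # keep the fittest, mutate them to refill
    population = parents + [mutate(random.choice(parents)) for _ in range(3)]

best = max(population, key=fitness)
print("near-optimal configuration (topics, alpha, beta):", best)
```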


International Conference on Software Engineering | 2012

Integrated impact analysis for managing software changes

Malcom Gethers; Bogdan Dit; Huzefa H. Kagdi; Denys Poshyvanyk

The paper presents an adaptive approach to perform impact analysis from a given change request to source code. Given a textual change request (e.g., a bug report), a single snapshot (release) of source code, indexed using Latent Semantic Indexing, is used to estimate the impact set. Should additional contextual information be available, the approach configures the best-fit combination to produce an improved impact set. Contextual information includes the execution trace and an initial source code entity verified for change. Combinations of information retrieval, dynamic analysis, and data mining of past source code commits are considered. The research hypothesis is that these combinations help counter the precision or recall deficit of individual techniques and improve the overall accuracy. The tandem operation of the three techniques sets it apart from other related solutions. Automation along with the effective utilization of two key sources of developer knowledge, which are often overlooked in impact analysis at the change request level, is achieved. To validate our approach, we conducted an empirical evaluation on four open source software systems. A benchmark consisting of a number of maintenance issues, such as feature requests and bug fixes, and their associated source code changes was established by manual examination of these systems and their change history. Our results indicate that there are combinations formed from the augmented developer contextual information that show statistically significant improvement over standalone approaches.
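The sketch below illustrates only the information-retrieval step described above: indexing one snapshot of the code with LSI (approximated here by TF-IDF plus truncated SVD) and ranking entities against a textual change request. The file names, bug report, and LSI dimensionality are hypothetical.

```python
# Sketch of the IR step of impact analysis: index one source code snapshot
# with LSI and rank entities against a textual change request (bug report).
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline

entities = {
    "net/HttpClient.java": "open connection send request read response timeout retry",
    "ui/ProgressDialog.java": "show progress bar dialog cancel button repaint",
    "io/CacheStore.java": "write cache entry disk evict stale lookup key",
}
bug_report = "downloads hang forever when the server does not respond; add a timeout"

lsi = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2, random_state=0))
entity_space = lsi.fit_transform(entities.values())
query_space = lsi.transform([bug_report])

scores = cosine_similarity(query_space, entity_space).ravel()
for path, score in sorted(zip(entities, scores), key=lambda p: p[1], reverse=True):
    print(f"{score:.3f}  {path}")
```

In the approach itself, this initial estimate would then be refined with execution traces and mined co-change data when that contextual information is available.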


International Conference on Program Comprehension | 2010

Using Data Fusion and Web Mining to Support Feature Location in Software

Meghan Revelle; Bogdan Dit; Denys Poshyvanyk

Data fusion is the process of integrating multiple sources of information such that their combination yields better results than if the data sources are used individually. This paper applies the idea of data fusion to feature location, the process of identifying the source code that implements specific functionality in software. A data fusion model for feature location is presented which defines new feature location techniques based on combining information from textual, dynamic, and web mining analyses applied to software. A novel contribution of the proposed model is the use of advanced web mining algorithms to analyze execution information during feature location. The results of an extensive evaluation indicate that the new feature location techniques based on web mining improve the effectiveness of existing approaches by as much as 62%.
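One way to picture the fusion described above is sketched below: keep only methods that appear in an execution trace, score them with a link-analysis algorithm (PageRank over a call graph), and combine with a textual score. The graph, trace, scores, and equal weighting are illustrative assumptions rather than the paper's configuration.

```python
# Sketch of data fusion for feature location: filter by execution trace,
# apply a web-mining score (PageRank) on the call graph, combine with IR.
import networkx as nx

call_graph = nx.DiGraph([
    ("main", "Editor.open"), ("Editor.open", "SpellChecker.check"),
    ("Editor.open", "Parser.parse"), ("SpellChecker.check", "Dictionary.lookup"),
])
executed = {"main", "Editor.open", "SpellChecker.check", "Dictionary.lookup"}
ir_score = {"Editor.open": 0.30, "SpellChecker.check": 0.80,
            "Dictionary.lookup": 0.55, "Parser.parse": 0.10, "main": 0.05}

subgraph = call_graph.subgraph(executed)   # keep only executed methods
web_score = nx.pagerank(subgraph)          # link-analysis importance

combined = {m: 0.5 * ir_score[m] + 0.5 * web_score[m] for m in subgraph}
for method, score in sorted(combined.items(), key=lambda p: p[1], reverse=True):
    print(f"{score:.3f}  {method}")
```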


Empirical Software Engineering | 2013

Integrating information retrieval, execution and link analysis algorithms to improve feature location in software

Bogdan Dit; Meghan Revelle; Denys Poshyvanyk

Data fusion is the process of integrating multiple sources of information such that their combination yields better results than if the data sources are used individually. This paper applies the idea of data fusion to feature location, the process of identifying the source code that implements specific functionality in software. A data fusion model for feature location is presented which defines new feature location techniques based on combining information from textual, dynamic, and web mining or link analyses algorithms applied to software. A novel contribution of the proposed model is the use of advanced web mining algorithms to analyze execution information during feature location. The results of an extensive evaluation on three Java systems indicate that the new feature location techniques based on web mining improve the effectiveness of existing approaches by as much as 87%.
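Because the combined analyses produce scores on different scales, a typical fusion step normalizes each ranking before summing. The snippet below sketches that step with made-up scores and equal weights; both are assumptions, not the paper's settings.

```python
# Min-max normalize each technique's scores, then take a weighted sum.
def min_max(scores):
    lo, hi = min(scores.values()), max(scores.values())
    return {k: (v - lo) / (hi - lo) if hi > lo else 0.0 for k, v in scores.items()}

textual = {"a": 0.91, "b": 0.40, "c": 0.13}
link_analysis = {"a": 2.1, "b": 7.8, "c": 0.4}   # e.g., raw authority scores

fused = {m: 0.5 * min_max(textual)[m] + 0.5 * min_max(link_analysis)[m]
         for m in textual}
print(sorted(fused.items(), key=lambda p: p[1], reverse=True))
```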


International Conference on Program Comprehension | 2011

Can Better Identifier Splitting Techniques Help Feature Location?

Bogdan Dit; Latifa Guerrouj; Denys Poshyvanyk; Giuliano Antoniol

The paper presents an exploratory study of two feature location techniques utilizing three strategies for splitting identifiers: Camel Case, Samurai, and manual splitting. The main research question that we ask in this study is: if we had a perfect technique for splitting identifiers, would it still help improve the accuracy of feature location techniques applied in different scenarios and settings? In order to answer this research question we investigate two feature location techniques, one based on Information Retrieval and the other one based on the combination of Information Retrieval and dynamic analysis, for locating bugs and features using various configurations of preprocessing strategies on two open-source systems, Rhino and jEdit. The results of an extensive empirical evaluation reveal that feature location techniques using Information Retrieval can benefit from better preprocessing algorithms in some cases, and that their improvement in effectiveness while using manual splitting over state-of-the-art approaches is statistically significant in those cases. However, the results for the feature location technique using the combination of Information Retrieval and dynamic analysis do not show any improvement while using manual splitting, indicating that any preprocessing technique will suffice if execution data is available. Overall, our findings outline potential benefits of putting additional research effort into defining more sophisticated source code preprocessing techniques, as they can still be useful in situations where execution information cannot be easily collected.
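For reference, a minimal Camel Case splitter of the kind compared in the study is sketched below; Samurai and manual splitting additionally handle same-case concatenations, which this simple rule cannot.

```python
# Camel Case splitting: break identifiers at lower-to-upper transitions,
# acronym boundaries, and separators before indexing them for retrieval.
import re

def camel_case_split(identifier):
    # insert boundaries at lower/digit-to-upper and acronym-to-word transitions
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", identifier)
    spaced = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", spaced)
    return [t.lower() for t in re.split(r"[\s_\W]+", spaced) if t]

print(camel_case_split("getHTMLResponseCode"))  # ['get', 'html', 'response', 'code']
print(camel_case_split("user_id"))              # ['user', 'id']
print(camel_case_split("parsefile"))            # ['parsefile'] (needs Samurai/manual)
```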


International Conference on Software Maintenance | 2010

TopicXP: Exploring topics in source code using Latent Dirichlet Allocation

Trevor Savage; Bogdan Dit; Malcom Gethers; Denys Poshyvanyk

Acquiring a general understanding of large software systems and the components from which they are built can be a time-consuming task, but having such an understanding is an important prerequisite to adding features or fixing bugs. In this paper we propose a tool, TopicXP, to support developers during such software maintenance tasks by extracting and analyzing unstructured information in source code identifier names and comments using Latent Dirichlet Allocation. TopicXP enables developers to gain an overview of a software system under analysis by extracting and visualizing natural language topics, which generally correspond to concepts or features implemented in software classes. TopicXP is implemented as an open-source Eclipse plug-in that provides interactive visualization of topics along with structural dependencies between the classes implementing these topics. The paper also presents the results of a preliminary user study aimed at evaluating TopicXP.
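The extraction-and-labeling step behind a TopicXP-style view can be pictured as below: treat each class's identifiers and comments as a document, fit LDA, and label the class with the top words of its dominant topic. The class contents and preprocessing are simplified assumptions, not the plug-in's implementation.

```python
# Label each class with its dominant LDA topic and that topic's top words.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

classes = {
    "HttpConnection": "socket connect request response header stream timeout",
    "ImageCanvas": "draw paint pixel color image canvas repaint",
    "RequestQueue": "request queue retry schedule response dispatch",
}

vectorizer = CountVectorizer()
term_matrix = vectorizer.fit_transform(classes.values())
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(term_matrix)

terms = vectorizer.get_feature_names_out()
for class_name, topic_dist in zip(classes, doc_topics):
    topic = topic_dist.argmax()
    top_words = [terms[i] for i in lda.components_[topic].argsort()[::-1][:3]]
    print(f"{class_name}: topic {topic} ({', '.join(top_words)})")
```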


Automated Software Engineering | 2011

An adaptive approach to impact analysis from change requests to source code

Malcom Gethers; Huzefa H. Kagdi; Bogdan Dit; Denys Poshyvanyk

The paper presents an adaptive approach to perform impact analysis from a given change request (e.g., a bug report) to source code. Given a textual change request, a single snapshot (release) of source code, indexed using Latent Semantic Indexing, is used to estimate the impact set. Additionally, the approach configures the best-fit combination of information retrieval, dynamic analysis, and data mining of past source code commits to produce an improved impact set. The tandem operation of the three techniques sets it apart from other related solutions.


Mining Software Repositories | 2013

An exploratory analysis of mobile development issues using Stack Overflow

Mario Linares-Vásquez; Bogdan Dit; Denys Poshyvanyk

Question & answer (Q&A) websites, such as Stack Overflow (SO), are widely used by developers to find and provide answers to technical issues and concerns in software development. Mobile development is not an exception to the rule. In the latest SO dump, more than 400K questions were labeled with tags related to mobile technologies. Although, previous works have analyzed the main topics and trends in SO threads, there are no studies devoted specifically to mobile development. In this paper we used topic modeling techniques to extract hot-topics from mobile-development related questions. Our findings suggest that most of the questions include topics related to general questions and compatibility issues, and the most specific topics, such as crash reports and database connection, are present in a reduced set of questions.


Proceedings of the 6th International Workshop on Traceability in Emerging Forms of Software Engineering | 2011

Traceclipse: an Eclipse plug-in for traceability link recovery and management

Samuel Klock; Malcom Gethers; Bogdan Dit; Denys Poshyvanyk

Traceability link recovery is an active research area in software engineering with a number of open research questions, due largely to the substantial costs and effort associated with software maintenance. We propose Traceclipse, an Eclipse plug-in that brings together common characteristics of traceability link recovery techniques in one easy-to-use suite. The tool enables software developers to specify, view, and manipulate traceability links within Eclipse, and it provides an API through which recovery techniques may be added, specified, and run within the integrated development environment. The paper also presents initial case studies aimed at evaluating the proposed plug-in.

Collaboration


Dive into Bogdan Dit's collaborations.

Top Co-Authors

Annibale Panichella (Delft University of Technology)

Dave W. Binkley (Loyola University Maryland)