
Stan Jarzabek

National University of Singapore



Publication

Featured research published by Stan Jarzabek.

Foundations of Software Engineering | 2005

Detecting higher-level similarity patterns in programs

Hamid Abdul Basit; Stan Jarzabek

Cloning in software systems is known to create problems during software maintenance. Several techniques have been proposed to detect the same or similar code fragments in software, so-called simple clones. While the knowledge of simple clones is useful, detecting design-level similarities in software could ease maintenance even further, and also help us identify reuse opportunities. We observed that recurring patterns of simple clones - so-called structural clones - often indicate the presence of interesting design-level similarities. An example would be patterns of collaborating classes or components. Finding structural clones that signify potentially useful design information requires efficient techniques to analyze the bulk of simple clone data and make non-trivial inferences based on the abstracted information. In this paper, we describe a practical solution to the problem of detecting some basic, but useful, types of design-level similarities such as groups of highly similar classes or files. First, we detect simple clones by applying conventional token-based techniques. Then we find the patterns of co-occurring clones in different files using the Frequent Itemset Mining (FIM) technique. Finally, we perform file clustering to detect those clusters of highly similar files that are likely to contribute to a design-level similarity pattern. The novelty of our approach is the application of data mining techniques to detect design-level similarities. Experiments confirmed that our method finds many useful structural clones and scales up to big programs. The paper describes our method for structural clone detection, a prototype tool called Clone Miner that implements the method, and experimental results.
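The second step of the abstract above, finding sets of simple clones that co-occur in several files, can be illustrated with a minimal sketch. It is not the Clone Miner implementation: it uses brute-force enumeration of small itemsets as a stand-in for a real Frequent Itemset Mining algorithm, and the function name `frequent_clone_sets`, the clone class labels, and the file names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def frequent_clone_sets(file_clones, min_support=2, max_size=3):
    """Map each set of clone classes to the files that contain all of them.

    file_clones: dict of file name -> set of simple clone class ids found
    in that file (assumed to come from a token-based clone detector).
    Only itemsets of size 2..max_size shared by at least min_support
    files are returned. Brute-force enumeration, for illustration only.
    """
    support = defaultdict(set)
    for fname, classes in file_clones.items():
        for size in range(2, max_size + 1):
            for combo in combinations(sorted(classes), size):
                support[combo].add(fname)
    return {
        combo: files
        for combo, files in support.items()
        if len(files) >= min_support
    }


# Hypothetical detector output: which clone classes occur in which file.
file_clones = {
    "A.java": {"c1", "c2", "c3"},
    "B.java": {"c1", "c2", "c3"},
    "C.java": {"c1", "c2"},
    "D.java": {"c4"},
}
patterns = frequent_clone_sets(file_clones)
```

Files that share a large itemset, such as `A.java` and `B.java` here, are the natural candidates for the file clustering step that the abstract describes next.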

IEEE Transactions on Software Engineering | 2009

A Data Mining Approach for Detecting Higher-Level Clones in Software

Hamid Abdul Basit; Stan Jarzabek

Code clones are similar program structures recurring in variant forms in software system(s). Several techniques have been proposed to detect similar code fragments in software, so-called simple clones. Identification and subsequent unification of simple clones is beneficial in software maintenance. Even further gains can be obtained by elevating the level of code clone analysis. We observed that recurring patterns of simple clones often indicate the presence of interesting higher-level similarities that we call structural clones. Structural clones show a bigger picture of the similarity situation than simple clones alone. Being logical groups of simple clones, structural clones alleviate the problem of the huge number of clones typically reported by simple clone detection tools, a problem that is often dealt with using postdetection visualization techniques. Detection of structural clones can help in understanding the design of the system for better maintenance and in reengineering for reuse, among other uses. In this paper, we propose a technique to detect some useful types of structural clones. The novelty of our approach includes the formulation of the structural clone concept and the application of data mining techniques to detect these higher-level similarities. We describe a tool called Clone Miner that implements our proposed technique. We assess the usefulness and scalability of the proposed techniques via several case studies. We discuss various usage scenarios to demonstrate in what ways the knowledge of structural clones adds value to the analysis based on simple clones alone.

Communications of the ACM | 1998

The case for user-centered CASE tools

Stan Jarzabek; Riri Huang

and using CASE tools in industrial software projects. We also investigated common tool adoption practices. We examined CASE technology from the perspective of technical and non-technical issues involved in software development. Our main conclusion is that current CASE tools are far too oriented toward software modeling and construction methods, while other factors that matter to programmers receive little attention. We imagine the creative, problem-solving aspects of software development and perception of a software project from a process, rather than method, perspective. We observed that method-centered CASE tools are not attractive enough to the users. To better meld into the software development practice, CASE tools should adopt a programmer’s mental model of software projects. In particular, CASE tools should support soft aspects of software development as well as rigorous modeling, provide a natural process-oriented development framework rather than a method-oriented one, and play a more active role in software development than current CASE tools. Separation of development methods from other aspects that contribute to successful software projects does not benefit CASE users, whether programmers or project managers. In this article, we analyze problems that impede wide adoption of CASE tools and propose remedies to some of the problems. We direct this article to managers involved in CASE tool adoption, CASE tool users, and CASE tool developers. Although we focus on CASE tools that support software development according to some methods (such as structured analysis/design or an object-oriented method), we believe our observations apply to other types of software tools, too.

Foundations of Software Engineering | 2007

Efficient token based clone detection with flexible tokenization

Hamid Abdul Basit; Simon J. Puglisi; William F. Smyth; Andrew Turpin; Stan Jarzabek

Code clones are similar code fragments that occur at multiple locations in a software system. Detection of code clones provides useful information for maintenance, reengineering, program understanding and reuse. Several techniques have been proposed to detect code clones. These techniques differ in the code representation used for analysis of clones, ranging from plain text to parse trees and program dependence graphs. Clone detection based on lexical tokens involves minimal code transformation and gives good results, but is computationally expensive because of the large number of tokens that need to be compared. We explored string algorithms to find suitable data structures and algorithms for efficient token-based clone detection and implemented them in our tool Repeated Tokens Finder (RTF). Instead of using a suffix tree for string matching, we use the more memory-efficient suffix array. RTF incorporates a suffix-array-based linear-time algorithm to detect string matches. It also provides a simple and customizable tokenization mechanism. Initial analysis and experiments show that our clone detection is simple, scalable, and performs better than the previous well-known tools.
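The idea of finding repeated token runs with a suffix array can be sketched as follows. This is an illustrative sketch only, not RTF's algorithm: the suffix array is built by naive sorting rather than in linear time, the tokens are assumed to be already produced by some tokenizer, and the function names `suffix_array`, `lcp_array`, and `repeated_runs` are hypothetical.

```python
def suffix_array(tokens):
    """Start indices of all suffixes of tokens, in sorted order.

    Naive construction by sorting the suffixes themselves; a linear-time
    construction would be used in practice.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def lcp_array(tokens, sa):
    """lcp[k] = length of the common prefix of suffixes sa[k-1] and sa[k]."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = sa[k - 1], sa[k]
        n = 0
        while (a + n < len(tokens) and b + n < len(tokens)
               and tokens[a + n] == tokens[b + n]):
            n += 1
        lcp[k] = n
    return lcp


def repeated_runs(tokens, min_len):
    """Return (start1, start2, length) for adjacent suffixes in the suffix
    array that share at least min_len leading tokens."""
    sa = suffix_array(tokens)
    lcp = lcp_array(tokens, sa)
    return [
        (sa[k - 1], sa[k], lcp[k])
        for k in range(1, len(sa))
        if lcp[k] >= min_len
    ]


# Hypothetical token stream: the run "a b c" occurs at positions 0 and 4.
tokens = ["a", "b", "c", "x", "a", "b", "c", "y"]
clones = repeated_runs(tokens, min_len=3)  # [(0, 4, 3)]
```

The suffix array stores only one integer per token, which is the source of the memory saving over a suffix tree that the abstract mentions.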

International Conference on Software Engineering | 2005

Beyond templates: a study of clones in the STL and some general implications

Hamid Abdul Basit; Damith C. Rajapakse; Stan Jarzabek

Templates (or generics) help us write compact, generic code, which aids both reuse and maintenance. The STL is a powerful example of how templates help achieve these goals. Still, our study of the STL revealed substantial, and in our opinion, counter-productive repetitions (so-called clones) across groups of similar class or function templates. Clones occurred because variations across these similar program structures were irregular and could not be unified by suitable template parameters in a natural way. We encountered similar problems in other class libraries as well as in application programs, written in a range of programming languages. In the paper, we present quantitative and qualitative results from our study. We argue that the difficulties we encountered affect programs in general. We present a solution that can treat such template-unfriendly cases of redundancies at the meta-level, complementing and extending the power of language features, such as templates, in areas of generic programming.

International World Wide Web Conferences | 2005

An investigation of cloning in web applications

Damith C. Rajapakse; Stan Jarzabek

Cloning (ad hoc reuse by duplication of design or code) speeds up development, but also hinders future maintenance. Cloning also hints at reuse opportunities that, if exploited systematically, might have a positive impact on development and maintenance productivity. Unstable requirements and tight schedules pose unique challenges for Web Application engineering that encourage cloning. We are conducting a systematic study of cloning in Web Applications of different sizes, developed using a range of Web technologies, and serving diverse purposes. Our initial results show cloning rates up to 63% in both newly developed and already maintained Web Applications. The expected contribution of this work is twofold: (1) to confirm potential benefits of reuse-based methods in addressing clone-related problems of Web engineering, and (2) to create a framework of metrics and presentation views to be used in other similar studies.

Software Product Lines | 2005

Reuse without compromising performance: industrial experience from RPG software product line for mobile devices

Weishan Zhang; Stan Jarzabek

It is often believed that reusable solutions, being generic, must necessarily compromise performance. In this paper, we consider a family of Role-Playing Games (RPGs). We analyzed similarities and differences among four RPGs. By applying a reuse technique of XVCL, we built an RPG product line architecture (RPG-PLA) from which we could derive any of the four RPGs. We built into the RPG-PLA a number of performance optimization strategies that could benefit any of the four (and possibly other similar) RPGs. By comparing the original vs. the new RPGs derived from the RPG-PLA, we demonstrated that reuse allowed us to achieve improved performance, in both speed and memory utilization, as compared to each game developed individually. At the same time, our solution facilitated rapid development of new games for new mobile devices, and made it easier to evolve the RPG-PLA and custom games already in use with new features.

IEE Proceedings - Software | 2006

Addressing quality attributes in domain analysis for product lines

Stan Jarzabek; Bo Yang; S. Yoeun

Feature-oriented domain analysis (FODA) is a widely accepted domain analysis method for modelling common and variant requirements for product lines. Goal-oriented analysis, on the other hand, focuses on quality attribute (QA) analysis in single system development. To address QAs in the product line context, the authors extended FODA with concepts of goal-oriented analysis. Their integrated modelling framework improves the current state of the art of product line research and practice in two ways. Firstly, during the design of a product line architecture, the proposed framework allows developers to record design rationale in the form of interdependencies among variant features and QAs. Secondly, during system construction, the framework helps developers evaluate the impact of variant features selected for a target system on QAs of that system. In this way, developers and customers can come up with realistic overall requirements for the target system early, avoiding possible expensive rework in later stages of the software lifecycle. The proposed QA modelling framework is illustrated with examples from the computer aided dispatch domain.

Requirements Engineering | 2001

XML-based method and tool for handling variant requirements in domain models

Stan Jarzabek; Hongyu Zhang

A domain model describes common and variant requirements for a system family. UML notations used in requirements analysis and software modeling can be extended with variation points to cater for variant requirements. However, UML models for a large single system are already complicated enough. With variants, UML domain models soon become too complicated to be useful. The main reasons are the explosion of possible variant combinations, complex dependencies among variants, and the inability to trace variants from a domain model down to the requirements for a specific system that is a member of a family. We believe that the above-mentioned problems cannot be solved at the domain model description level alone. We propose a novel solution based on a tool that interprets and manipulates domain models to provide analysts with customized, simple domain views. We describe a variant configuration language that allows us to instrument domain models with variation points and record variant dependencies. An interpreter of this language produces customized views of a domain model, helping analysts understand and reuse software models. We describe the concept of our approach and its simple implementation based on XML and XMI technologies.

Working Conference on Reverse Engineering | 2012

Feature Location in a Collection of Product Variants

Yinxing Xue; Zhenchang Xing; Stan Jarzabek

Companies often develop and maintain a collection of product variants that share some common features but also support different, customer-specific features. To reengineer such legacy product variants for systematic reuse, one must identify features and their implementing code units (e.g. functions, files) in different product variants. Information retrieval (IR) techniques may be applied for that purpose. In this paper, we discuss problems that hinder direct application of IR techniques to a collection of product variants. To counter these problems, we present an approach to support effective feature location in product variants. The novelty of our approach is that we exploit commonalities and differences of product variants by software differencing and FCA techniques so that IR techniques can achieve satisfactory results for feature location in product variants. We have implemented our approach and conducted an evaluation with a collection of nine Linux kernel product variants. Our evaluation shows that our approach always significantly outperforms a direct application of IR techniques in the subject product variants.
