Liguo Yu

Indiana University South Bend

Publication

Featured research published by Liguo Yu.

IEEE Transactions on Software Engineering | 2004

Categorization of common coupling and its application to the maintainability of the Linux kernel

Liguo Yu; Stephen R. Schach; Kai Chen; A. Jefferson Offutt

Data coupling between modules, especially common coupling, has long been considered a source of concern in software design, but the issue is somewhat more complicated for products that are comprised of kernel modules together with optional nonkernel modules. This paper presents a refined categorization of common coupling based on definitions and uses between kernel and nonkernel modules and applies the categorization to a case study. Common coupling is usually avoided when possible because of the potential for introducing risky dependencies among software modules. The relative risk of these dependencies is strongly related to the specific definition-use relationships. In a previous paper, we presented results from a longitudinal analysis of multiple versions of the open-source operating system Linux. This paper applies the new common coupling categorization to version 2.4.20 of Linux, counting the number of instances of common coupling between each of the 26 kernel modules and all the other nonkernel modules. We also categorize each coupling in terms of the definition-use relationships. Results show that the Linux kernel contains a large number of common couplings of all types, raising a concern about the long-term maintainability of Linux.

Empirical Software Engineering | 2003

Determining the Distribution of Maintenance Categories: Survey versus Measurement

Stephen R. Schach; Bo Jin; Liguo Yu; Gillian Z. Heller; A. Jefferson Offutt

In 1978, Lientz, Swanson, and Tompkins published the results of a survey on software maintenance. They found that 17.4% of maintenance effort was categorized as corrective in nature, 18.2% as adaptive, 60.3% as perfective, and 4.1% was categorized as other. We refer to this as the “LST” result. We contrast this survey-based result with our empirical results from the analysis of data for the repeated maintenance of three software products: a commercial real-time product, the Linux kernel, and GCC. For all three products and at both levels of granularity we considered, our observed distributions of maintenance categories were statistically very highly significantly different from LST. In particular, corrective maintenance was always more than twice the LST value. For the summed data, the percentage of corrective maintenance was more than three times the LST value. We suggest various explanations for the observed differences, including inaccuracies on the part of the maintenance managers who responded to the LST survey.
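The comparison described above, between observed maintenance counts and the LST percentages, can be illustrated with a chi-square goodness-of-fit statistic. This is only a sketch: the abstract does not say which statistical test the authors used, and the observed counts below are hypothetical placeholders, not data from the paper. Only the four LST proportions come from the text.

```python
# LST survey proportions quoted in the abstract:
# corrective, adaptive, perfective, other.
LST_PROPORTIONS = [0.174, 0.182, 0.603, 0.041]


def chi_square_statistic(observed, proportions):
    """Chi-square goodness-of-fit statistic of observed counts
    against a reference distribution given as proportions."""
    if len(observed) != len(proportions):
        raise ValueError("observed and proportions must have the same length")
    total = sum(observed)
    if total <= 0:
        raise ValueError("observed counts must sum to a positive number")
    statistic = 0.0
    for count, proportion in zip(observed, proportions):
        expected = total * proportion
        statistic += (count - expected) ** 2 / expected
    return statistic


# Hypothetical counts of maintenance items per category (assumption,
# chosen only to show a corrective share well above the LST value).
hypothetical_counts = [50, 5, 40, 5]
print(chi_square_statistic(hypothetical_counts, LST_PROPORTIONS))
```

A large statistic relative to the chi-square distribution with three degrees of freedom would indicate that the observed distribution differs from LST.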

Mining Software Repositories | 2007

Mining CVS Repositories to Understand Open-Source Project Developer Roles

Liguo Yu; Srini Ramaswamy

This paper presents a model to represent the interactions of distributed open-source software developers and utilizes data mining techniques to derive developer roles. The model is then applied to case studies of two open-source projects, ORAC-DR and Mediawiki, with encouraging results.

Empirical Software Engineering | 2004

Open-Source Change Logs

Kai Chen; Stephen R. Schach; Liguo Yu; A. Jefferson Offutt; Gillian Z. Heller

A recent editorial in Empirical Software Engineering suggested that open-source software projects offer a great deal of data that can be used for experimentation. These data not only include source code, but also artifacts such as defect reports and update logs. A common type of update log that experimenters may wish to investigate is the ChangeLog, which lists changes and the reasons for which they were made. ChangeLog files are created to support the development of software rather than for the needs of researchers, so questions need to be asked about the limitations of using them to support research. This paper presents evidence that the ChangeLog files provided at three open-source web sites were incomplete. We examined at least three ChangeLog files for each of three different open-source software products, namely, GNUJSP, GCC-g++, and Jikes. We developed a method for counting changes that ensures that, as far as possible, each individual ChangeLog entry is treated as a single change. For each ChangeLog file, we compared the actual changes in the source code to the entries in the ChangeLog file and discovered significant omissions. For example, using our change-counting method, only 35 of the 93 changes in version 1.11 of Jikes appear in the ChangeLog file; that is, over 62% of the changes were not recorded there. The percentage of omissions we found ranged from 3.7 to 78.6%. These are significant omissions that should be taken into account when using ChangeLog files for research. Before using ChangeLog files as a basis for research into the development and maintenance of open-source software, experimenters should carefully check for omissions and inaccuracies.
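The Jikes figure quoted above can be checked with a short calculation. The function name `omission_rate` is a hypothetical helper introduced for illustration; only the counts 35 and 93 come from the abstract.

```python
def omission_rate(recorded: int, actual: int) -> float:
    """Percentage of actual source changes that are missing from the ChangeLog."""
    if actual <= 0:
        raise ValueError("actual must be positive")
    if not 0 <= recorded <= actual:
        raise ValueError("recorded must be between 0 and actual")
    return 100.0 * (actual - recorded) / actual


# Jikes 1.11, per the abstract: 35 of 93 changes appear in the ChangeLog.
jikes_rate = omission_rate(recorded=35, actual=93)
print(f"{jikes_rate:.1f}% of changes omitted")  # prints "62.4% of changes omitted"
```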

Journal of Software Maintenance and Evolution: Research and Practice | 2006

Indirectly predicting the maintenance effort of open‐source software

Liguo Yu

An accurate maintenance effort model is essential for a successful software maintenance process. Maintenance effort is usually measured in person-hours used to perform a maintenance task. However, maintenance effort data are usually only available for strictly managed software, such as closed-source software. In other software projects that do not have complete maintenance records, especially some open-source software, there are no direct data for maintenance effort, which precludes the establishment of a maintenance effort model. In this paper, we report a series of studies aimed at presenting a method for indirectly predicting the maintenance effort of open-source software. This report covers two parts of our research. First, we examine the maintenance data from NASA SEL closed-source software projects and identify some software measures that can be used to indirectly represent maintenance effort. Second, based on the findings in the first part, we analyze 121 recent versions of Linux, and use linear regression to construct two indirect maintenance effort models for the Linux project. Our study demonstrates the applicability of this approach to indirectly predicting the maintenance effort and improving the software maintenance process.

Journal of Systems and Software | 2006

Maintainability of the kernels of open-source operating systems: A comparison of Linux with FreeBSD, NetBSD, and OpenBSD

Liguo Yu; Stephen R. Schach; Kai Chen; Gillian Z. Heller; A. Jefferson Offutt

We compared and contrasted the maintainability of four open-source operating systems: Linux, FreeBSD, NetBSD, and OpenBSD. We used our categorization of common coupling in kernel-based software to highlight future maintenance problems. An unsafe definition is a definition of a global variable that can affect a kernel module if that definition is changed. For each operating system we determined a number of measures, including the number of global variables, the number of instances of global variables in the kernel and overall, as well as the number of unsafe definitions in the kernel and overall. We also computed the value of each of our measures per kernel KLOC and per KLOC overall. For every measure and every ratio, Linux compared unfavorably with FreeBSD, NetBSD, and OpenBSD. Accordingly, we are concerned about the future maintainability of Linux.

Quality Technology and Quantitative Management | 2012

Experience in Predicting Fault-Prone Software Modules Using Complexity Metrics

Liguo Yu; Alok Mishra

Complexity metrics have been intensively studied in predicting fault-prone software modules. However, little work has been done in studying how to effectively use the complexity metrics and the prediction models under realistic conditions. In this paper, we present a study showing how to utilize the prediction models generated from existing projects to improve the fault detection on other projects. The binary logistic regression method is used in studying publicly available data of five commercial products. Our study shows (1) models generated using more datasets can improve the prediction accuracy but not the recall rate; (2) lowering the cut-off value can improve the recall rate, but the number of false positives will be increased, which will result in higher maintenance effort. We further suggest that in order to improve model prediction efficiency, the selection of source datasets and the determination of cut-off values should be based on specific properties of a project. So far, no general rules to follow have been found and reported.
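The cut-off trade-off described above can be sketched in a few lines. The predicted probabilities and fault labels below are hypothetical values made up for illustration, not data from the study, and the helper name `recall_and_false_positives` is likewise an assumption. In practice, the probabilities would come from a fitted binary logistic regression model.

```python
def recall_and_false_positives(probs, labels, cutoff):
    """Classify a module as fault-prone when its predicted probability
    is at least `cutoff`, then return (recall, false-positive count)."""
    predicted = [p >= cutoff for p in probs]
    tp = sum(1 for pred, y in zip(predicted, labels) if pred and y == 1)
    fn = sum(1 for pred, y in zip(predicted, labels) if not pred and y == 1)
    fp = sum(1 for pred, y in zip(predicted, labels) if pred and y == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return recall, fp


# Hypothetical model outputs (assumption): probability that each module is faulty.
probs = [0.92, 0.71, 0.55, 0.48, 0.40, 0.33, 0.25, 0.10]
# Hypothetical ground truth (assumption): 1 = faulty, 0 = not faulty.
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for cutoff in (0.5, 0.3):
    recall, fp = recall_and_false_positives(probs, labels, cutoff)
    print(f"cutoff={cutoff}: recall={recall:.2f}, false positives={fp}")
```

With these made-up values, lowering the cut-off from 0.5 to 0.3 raises recall from 0.50 to 1.00 while the false-positive count grows from 1 to 2, which mirrors the direction of the effect reported in the abstract.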

IT Professional | 2008

Symbiosis and Software Evolvability

Liguo Yu; Srini Ramaswamy; John Bush

As software systems become more pervasive and complex and, at the same time, expensive and difficult to maintain, the R&D community has turned to biological systems to find mechanisms that support system integrity and robustness. In both biology and software development, evolvability has become a research area in its own right. To improve software evolvability, many researchers have studied biological system properties, such as self-organization, modularity, and gene duplication. Our research focuses instead on software ecosystems and symbiosis as a business strategy for multivendor software systems.

International Symposium on Empirical Software Engineering | 2005

Measuring the maintainability of open-source software

Liguo Yu; Stephen R. Schach; Kai Chen

An editorial in Empirical Software Engineering suggested that open-source software projects offer a great deal of data that can be used for experimentation. These data include artifacts such as source code and defect reports. In this paper we show that sources of open-source maintenance data, such as defect-tracking systems, change logs, and source code, cannot, in general, be used for measuring maintainability. We further show that approaches such as using defect distributions and the average lag time to fix a defect can be equally unusable. We conclude that, despite the plethora of open-source maintenance data, it is extremely hard to find data for determining the maintainability of open-source software.

Empirical Software Engineering | 2007

Understanding component co-evolution with a study on Linux

Liguo Yu

After a software system has been delivered, it inevitably has to change to remain useful. Evolutionary coupling measures the change dependencies between software components. Reference coupling measures the architecture dependencies between software components. In this paper, we present a method to correlate evolutionary coupling and reference coupling. We study the evolution of 597 consecutive versions of Linux and measure the evolutionary coupling and reference coupling among 12 kernel modules. We compare 12 pairs of evolutionary coupling data and reference coupling data. The results show that linear correlation exists between evolutionary coupling and reference coupling. We conclude that in Linux, the dependencies between software components induced via the system architecture have noticeable effects on kernel module co-evolution.
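The linear correlation mentioned above can be illustrated with a Pearson correlation coefficient computed over paired coupling measurements. This is a sketch under assumptions: the abstract does not define how either coupling is quantified, so the two lists below are hypothetical numbers, and `pearson_r` is an illustrative helper rather than the author's method.

```python
import math


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / (spread_x * spread_y)


# Hypothetical measurements for one kernel module against five others (assumption).
reference_coupling = [2, 5, 9, 14, 20]
evolutionary_coupling = [0.10, 0.18, 0.35, 0.41, 0.62]
print(pearson_r(reference_coupling, evolutionary_coupling))
```

A coefficient close to 1 would indicate that modules with more reference dependencies also tend to change together more often.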