Is this you? Create Your Porfile

Hao Zhong

Shanghai Jiao Tong University

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Hao Zhong is active.

Explore More

Publication

Featured researches published by Hao Zhong.

automated software engineering | 2009

Inferring Resource Specifications from Natural Language API Documentation

Hao Zhong; Lu Zhang; Tao Xie; Hong Mei

Typically, software libraries provide API documentation, through which developers can learn how to use libraries correctly. However, developers may still write code inconsistent with API documentation and thus introduce bugs, as existing research shows that many developers are reluctant to carefully read API documentation. To find those bugs, researchers have proposed various detection approaches based on known specifications. To mine specifications, many approaches have been proposed, and most of them rely on existing client code. Consequently, these mining approaches would fail to mine specifications when client code is not available. In this paper, we propose an approach, called Doc2Spec, that infers resource specifications from API documentation. For our approach, we implemented a tool and conducted an evaluation on Javadocs of five libraries. The results show that our approach infers various specifications with relatively high precisions, recalls, and F-scores. We further evaluated the usefulness of inferred specifications through detecting bugs in open source projects. The results show that specifications inferred by Doc2Spec are useful to detect real bugs in existing projects.

international conference on software engineering | 2012

Inferring method specifications from natural language API descriptions

Rahul Pandita; Xusheng Xiao; Hao Zhong; Tao Xie; Stephen Oney; Amit M. Paradkar

Application Programming Interface (API) documents are a typical way of describing legal usage of reusable software libraries, thus facilitating software reuse. However, even with such documents, developers often overlook some documents and build software systems that are inconsistent with the legal usage of those libraries. Existing software verification tools require formal specifications (such as code contracts), and therefore cannot directly verify the legal usage described in natural language text in API documents against code using that library. However, in practice, most libraries do not come with formal specifications, thus hindering tool-based verification. To address this issue, we propose a novel approach to infer formal specifications from natural language text of API documents. Our evaluation results show that our approach achieves an average of 92% precision and 93% recall in identifying sentences that describe code contracts from more than 2500 sentences of API documents. Furthermore, our results show that our approach has an average 83% accuracy in inferring specifications from over 1600 sentences describing code contracts.

international conference on software engineering | 2010

Mining API mapping for language migration

Hao Zhong; Suresh Thummalapenta; Tao Xie; Lu Zhang; Qing Wang

To address business requirements and to survive in competing markets, companies or open source organizations often have to release different versions of their projects in different languages. Manually migrating projects from one language to another (such as from Java to C#) is a tedious and error-prone task. To reduce manual effort or human errors, tools can be developed for automatic migration of projects from one language to another. However, these tools require the knowledge of how Application Programming Interfaces (APIs) of one language are mapped to APIs of the other language, referred to as API mapping relations. In this paper, we propose a novel approach, called MAM (Mining API Mapping), that mines API mapping relations from one language to another using API client code. MAM accepts a set of projects each with two versions in two languages and mines API mapping relations between those two languages based on how APIs are used by the two versions. These mined API mapping relations assist in migration of projects from one language to another. We implemented a tool and conducted two evaluations to show the effectiveness of MAM. The results show that our tool mines 25,805 unique mapping relations of APIs between Java and C# with more than 80% accuracy. The results also show that mined API mapping relations help reduce 54.4% compilation errors and 43.0% defects during migration of projects with an existing migration tool, called Java2CSharp. The reduction in compilation errors and defects is due to our new mined mapping relations that are not available with the existing migration tool.

international conference on software engineering | 2006

An experimental comparison of four test suite reduction techniques

Hao Zhong; Lu Zhang; Hong Mei

As a test suite usually contains redundancy, a subset of the test suite (representative set) may still satisfy all the test objectives. As the redundancy increases the cost of executing the test suite, many test suite reduction techniques have been brought out in spite of the NP-completeness of the general problem of finding the optimal representative set of the test suite. In the literature, some experimental studies of test suite reduction techniques have already been reported, but there are still shortcomings of the studies of these techniques. This paper presents an experimental comparison of the four typical test suite reduction techniques: heuristic H, heuristic GRE, genetic algorithm-based approach and ILP-based approach. The aim of the study is to provide a guideline for choosing the appropriate test suite reduction techniques.

international conference on software engineering | 2015

An empirical study on real bug fixes

Hao Zhong; Zhendong Su

Software bugs can cause significant financial loss and even the loss of human lives. To reduce such loss, developers devote substantial efforts to fixing bugs, which generally requires much expertise and experience. Various approaches have been proposed to aid debugging. An interesting recent research direction is automatic program repair, which achieves promising results, and attracts much academic and industrial attention. However, people also cast doubt on the effectiveness and promise of this direction. A key criticism is to what extent such approaches can fix real bugs. As only research prototypes for these approaches are available, it is infeasible to address the criticism by evaluating them directly on real bugs. Instead, in this paper, we design and develop BUGSTAT, a tool that extracts and analyzes bug fixes. With BUGSTATs support, we conduct an empirical study on more than 9,000 real-world bug fixes from six popular Java projects. Comparing the nature of manual fixes with automatic program repair, we distill 15 findings, which are further summarized into four insights on the two key ingredients of automatic program repair: fault localization and faulty code fix. In addition, we provide indirect evidence on the size of the search space to fix real bugs and find that bugs may also reside in non-source files. Our results provide useful guidance and insights for improving the state-of-the-art of automatic program repair.

fundamental approaches to software engineering | 2011

An empirical study on evolution of API documentation

Lin Shi; Hao Zhong; Tao Xie; Mingshu Li

With the evolution of an API library, its documentation also evolves. The evolution of API documentation is common knowledge for programmers and library developers, but not in a quantitative form. Without such quantitative knowledge, programmers may neglect important revisions of API documentation, and library developers may not effectively improve API documentation based on its revision histories. There is a strong need to conduct a quantitative study on API documentation evolution. However, as API documentation is large in size and revisions can be complicated, it is quite challenging to conduct such a study. In this paper, we present an analysis methodology to analyze the evolution of API documentation. Based on the methodology, we conduct a quantitative study on API documentation evolution of five widely used real-world libraries. The results reveal various valuable findings, and these findings allow programmers and library developers to better understand API documentation evolution.

conference on object oriented programming systems languages and applications | 2013

Detecting API documentation errors

Hao Zhong; Zhendong Su

When programmers encounter an unfamiliar API library, they often need to refer to its documentations, tutorials, or discussions on development forums to learn its proper usage. These API documents contain valuable information, but may also mislead programmers as they may contain errors (e.g., broken code names and obsolete code samples). Although most API documents are actively maintained and updated, studies show that many new and latent errors do exist. It is tedious and error-prone to find such errors manually as API documents can be enormous with thousands of pages. Existing tools are ineffective in locating documentation errors because traditional natural language (NL) tools do not understand code names and code samples, and traditional code analysis tools do not understand NL sentences. In this paper, we propose the first approach, DOCREF, specifically designed and developed to detect API documentation errors. We formulate a class of inconsistencies to indicate potential documentation errors, and combine NL and code analysis techniques to detect and report such inconsistencies. We have implemented DOCREF and evaluated its effectiveness on the latest documentations of five widely-used API libraries. DOCREF has detected more than 1,000 new documentation errors, which we have reported to the authors. Many of the errors have already been confirmed and fixed, after we reported them.

Knowledge and Information Systems | 2010

Efficient monitoring of skyline queries over distributed data streams

Shengli Sun; Zhenghua Huang; Hao Zhong; Dongbo Dai; Hongbin Liu; Jinjiu Li

Data management and data mining over distributed data streams have received considerable attention within the database community recently. This paper is the first work to address skyline queries over distributed data streams, where streams derive from multiple horizontally split data sources. Skyline query returns a set of interesting objects which are not dominated by any other objects within the base dataset. Previous work is concentrated on skyline computations over static data or centralized data streams. We present an efficient and an effective algorithm called BOCS to handle this issue under a more challenging environment of distributed streams. BOCS consists of an efficient centralized algorithm GridSky and an associated communication protocol. Based on the strategy of progressive refinement in BOCS, the skyline is incrementally computed by two phases. In the first phase, local skylines on remote sites are maintained by GridSky. At each time, only skyline increments on remote sites are sent to the coordinator. In the second phase, a global skyline is obtained by integrating remote increments with the latest global skyline. A theoretical analysis shows that BOCS is communication-optimal among all algorithms which use a share-nothing strategy. Extensive experiments demonstrate that our proposals are efficient, scalable, and stable.

international conference on software maintenance | 2005

Eliminating harmful redundancy for testing-based fault localization using test suite reduction: an experimental study

Dan Hao; Lu Zhang; Hao Zhong; Hong Mei; Jiasu Sun

In the process of software maintenance, it is usually a time-consuming task to track down bugs. To reduce the cost on debugging, several approaches have been proposed to localize the fault(s) to facilitate debugging. Intuitively, testing-based fault localization (TBFL), such as dicing and TRANTULA, is quite promising as it can take the advantage of a large set of execution traces at the same time. However, redundant test cases may bias the distribution of the test suite and harm this kind of approaches. Therefore, we suggest that the test suite, which is the input of TBFL, should be reduced before used in TBFL. To evaluate whether and to what extent TBFL can benefit from test suite reduction, we performed an experimental study on two source programs. The experimental results show that, for test suites containing unevenly distributed redundant test cases, performing test suite reduction before applying TBFL may be more advantageous.

automated software engineering | 2011

Inferring specifications for resources from natural language API documentation

Hao Zhong; Lu Zhang; Tao Xie; Hong Mei

Many software libraries, especially those commercial ones, provide API documentation in natural languages to describe correct API usages. However, developers may still write code that is inconsistent with API documentation, partially because many developers are reluctant to carefully read API documentation as shown by existing research. As these inconsistencies may indicate defects, researchers have proposed various detection approaches, and these approaches need many known specifications. As it is tedious to write specifications manually for all APIs, various approaches have been proposed to mine specifications automatically. In the literature, most existing mining approaches rely on analyzing client code, so these mining approaches would fail to mine specifications when client code is not sufficient. Instead of analyzing client code, we propose an approach, called Doc2Spec, that infers resource specifications from API documentation in natural languages. We evaluated our approach on the Javadocs of five libraries. The results show that our approach performs well on real scale libraries, and infers various specifications with relatively high precisions, recalls, and F-scores. We further used inferred specifications to detect defects in open source projects. The results show that specifications inferred by Doc2Spec are useful to detect real defects in existing projects.

Explore More