ArXiv | 2021

An Experimental Analysis of Graph-Distance Algorithms for Comparing API Usages

 
 
 
 

Abstract


Modern software development heavily relies on the reuse of functionalities through Application Programming Interfaces (APIs). However, client developers can have issues identifying the correct usage of a certain API, causing misuses accompanied by software crashes or usability bugs. Therefore, researchers have aimed at identifying API misuses automatically by comparing client code usages to correct API usages. Some techniques rely on certain API-specific graph-based data structures to improve the abstract representation of API usages. Such techniques need to compare graphs, for instance, by computing distance metrics based on the minimal graph edit distance or the largest common subgraphs, whose computations are known to be NP-hard problems. Fortunately, there exist many abstractions for simplifying graph distance computation. However, their applicability for comparing graph representations of API usages has not been analyzed. In this paper, we provide a comparison of different distance algorithms of API-usage graphs regarding correctness and runtime. Particularly, correctness relates to the algorithms’ ability to identify similar correct API usages, but also to discriminate similar correct and false usages as well as non-similar usages. For this purpose, we systematically identified a set of eight graph-based distance algorithms and applied them on two datasets of real-world API usages and misuses. Interestingly, our results suggest that existing distance algorithms are not reliable for comparing API usage graphs. To improve on this situation, we identified and discuss the algorithms’ issues, based on which we formulate hypotheses to initiate research on overcoming them.

Volume abs/2108.12511
Pages None
DOI 10.26226/morressier.613b54401459512fce6a7d05
Language English
Journal ArXiv

Full Text