APIScanner -- Towards Automated Detection of Deprecated APIs in Python Libraries
Aparna Vadlamani, Rishitha Kalicheti, Sridhar Chimalakonda
Research in Intelligent Software & Human Analytics (RISHA) Lab
Dept. of Computer Science & Engineering
Indian Institute of Technology Tirupati
Tirupati, India
{cs17b005, cs17b014, ch}@iittp.ac.in

Abstract: Python libraries are widely used for machine learning and scientific computing tasks today. APIs in Python libraries are deprecated due to feature enhancements and bug fixes in the same way as in other languages. These deprecated APIs are discouraged from being used in further software development. Manually detecting and replacing deprecated APIs is a tedious and time-consuming task due to the large number of API calls used in projects. Moreover, the lack of proper documentation for these deprecated APIs makes the task challenging. To address this challenge, we propose an algorithm and a tool, APIScanner, that automatically detects deprecated APIs in Python libraries. The algorithm parses the source code of the libraries using abstract syntax trees (ASTs) and identifies the APIs deprecated via decorator, hard-coded warning or comments. APIScanner is a Visual Studio Code extension that highlights deprecated API elements and warns the developer about their use while the source code is being written. The tool can help developers avoid deprecated API elements without executing the code. We tested our algorithm and tool on six popular Python libraries, where it detected 838 of 871 deprecated API elements. Demo of APIScanner: https://youtu.be/1hy ugf-iek. Documentation, tool, and source code can be found here: https://rishitha957.github.io/APIScanner.
Index Terms: Deprecated APIs, Python Libraries, API Evolution, Visual Studio Code Extension
I. INTRODUCTION
Python is a dynamic programming language that has gained immense popularity due to its extensive collection of libraries, including widely used modules for machine learning and scientific computing. Python libraries are frequently updated for reasons such as feature improvements and bug repairs. Most API changes involve moving methods or fields around and renaming or changing method signatures [1]. These changes may induce compatibility issues in client projects [2]. It is recommended to follow the deprecate-replace-remove cycle to enable developers to adapt to these changes smoothly [3]. In this process, APIs that are no longer supported are first labeled as deprecated and are then accompanied by replacement messages that help developers transition from deprecated APIs to new ones [4]. The deprecated APIs are gradually removed from the library in future releases. Unfortunately, this process is not always followed, as discovered by several studies [5], [6], making it difficult for both library maintainers and developers.

Ko et al. have analyzed the quality of documentation for resolving deprecated APIs [7]. Researchers have proposed techniques to automatically update deprecated APIs [8], [9]. However, most of them target static programming languages and platforms such as Java, C and Android SDKs. Python, being a typical dynamic programming language, exhibits different API evolution patterns compared to Java [2]. This motivates the need for new techniques and tools to detect deprecated APIs.

Deprecated APIs in Python libraries are mainly declared by decorator, hard-coded warning, and comments [10]. Nevertheless, it was discovered that library maintainers use varied and multiple strategies for API deprecation, leading to inconsistency in the implementation of libraries as well as in their automated detection [10]. In addition, nearly one-third of the deprecated APIs in Python are not included in the official library documentation, making it hard for developers using the libraries to limit their use of deprecated APIs [10].

To avoid the usage of deprecated APIs during new software development, developers should be aware of the deprecated APIs in the project, which motivates this research. Hence, given the rise in popularity of Python and the number of deprecated APIs used in Python projects, we propose a novel algorithm that uses the source code of Python libraries to obtain a list of deprecated APIs. This list is then used to detect deprecated APIs in Python projects. This paper contributes (i) an algorithm for deprecated API detection and (ii) a Visual Studio Code extension, APIScanner. We believe that APIScanner might assist developers in detecting deprecated APIs and help them avoid searching through API documentation or forums such as Stack Overflow. As a preliminary evaluation, we tested our algorithm and tool on six popular Python libraries [11] that are commonly used in data analytics, machine learning, and scientific computing. The initial results are promising, with more than 90% of API deprecations detected, and with potential for application beyond these libraries.

II. APPROACH
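The client-side half of the approach, finding which libraries a script imports and flagging references to names on a deprecated list, can be sketched with Python's standard ast module. This is a minimal illustration and not the extension's actual implementation: the function names, the matching on short attribute names, and the library somelib with its old_api entry are all assumptions made for the example.

```python
import ast


def imported_libraries(client_source):
    """Return the top-level package names imported by the client code."""
    libs = set()
    for node in ast.walk(ast.parse(client_source)):
        if isinstance(node, ast.Import):
            libs.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            libs.add(node.module.split(".")[0])
    return libs


def flag_deprecated_lines(client_source, deprecated_names):
    """Return sorted (line_number, name) pairs where a listed name is referenced."""
    hits = []
    for node in ast.walk(ast.parse(client_source)):
        if isinstance(node, ast.Attribute) and node.attr in deprecated_names:
            hits.append((node.lineno, node.attr))
        elif isinstance(node, ast.Name) and node.id in deprecated_names:
            hits.append((node.lineno, node.id))
    return sorted(hits)


# "somelib" and "old_api" are hypothetical; the code is only parsed, never run.
CLIENT = """import numpy as np
from somelib import helpers

values = np.array([1, 2, 3])
result = helpers.old_api(values)
"""
DEPRECATED = {"old_api"}

print(sorted(imported_libraries(CLIENT)))          # ['numpy', 'somelib']
print(flag_deprecated_lines(CLIENT, DEPRECATED))   # [(5, 'old_api')]
```

Matching on short names keeps the sketch small but can produce false hits when two libraries share a name; resolving each reference to a fully qualified name would be the more careful choice.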
Wang et al. [10] found that inconsistency in the adopted deprecation strategies makes it harder to use automated approaches for managing deprecated APIs and their documentation. In this paper, we propose an approach (as shown in Fig. 1) to automatically detect deprecated APIs in Python libraries and alert developers during API usage in software development. First, we identify the libraries used in the client code from its import statements. We then parse the source code of these libraries into abstract syntax trees (ASTs) in order to detect the deprecation patterns. The proposed Algorithm 1 is applied on the ASTs to retrieve a list of deprecated APIs in those libraries. Based on this list, APIScanner parses each line of code in the editor and highlights the deprecated elements. On hovering, the tool also displays a message informing the developer that some element(s) of this API call have been deprecated (as shown in Fig. 2). We developed APIScanner as a Visual Studio Code extension because it supports both Python scripts and Jupyter notebooks.

APIScanner extension: https://marketplace.visualstudio.com/items?itemName=Rishitha.apiscanner

Fig. 1. Approach for Detecting Deprecated API Elements in Python Libraries

(a) Using Decorator: in Matplotlib

    @_api.deprecated("3.3", alternative="Glue('fil')")
    class Fil(Glue):
        def __init__(self):
            super().__init__('fil')

(b) Using Comments: in Sklearn

    class GradientBoostingClassifier(args):
        """
        ..
        criterion : {'friedman_mse', 'mse', 'mae'}
        ..
        .. deprecated:: 0.24
            `criterion='mae'` is deprecated and will be removed in
            version 0.26. Use `criterion='friedman_mse'` or `'mse'`
            instead, as trees should use a least-square criterion in
            Gradient Boosting
        """

(c) Using Hardcoded Warnings: in Pandas

    class Series(args):
        def __init__(self, args):
            if dtype is None:
                warnings.warn(
                    "The default dtype for empty Series will be 'object' "
                    "instead of 'float64' in a future version",
                    DeprecationWarning,
                    stacklevel=2)

Listing 1. Examples of deprecation strategies adopted in Python libraries, which deprecate APIs through (a) decorator, (b) comments, (c) hard-coded warning

A. Detecting Deprecated API Elements through Source Code
We parse the source code of the library to generate an AST and denote it as P_AST. Examples of Python APIs deprecated by decorator, hard-coded warnings, and comments are shown in Listing 1. The structure of the AST helps to realize the relationship between class declarations and function definitions on the one hand and decorators, hard-coded warnings, and comments on the other. We traverse each node N_AST in P_AST using Depth-First Search (cf. Line-2). Whenever we encounter a class definition node, we extract the doc-string of that particular class. If the doc-string contains the deprecate keyword (such as (b) in Listing 1), we generate the fully qualified API name of the class by appending the class name to the directory path. We add this name, together with the deprecation message, to L_D (cf. Line-13), and we also extract the list of decorators associated with the class. If there is a deprecated decorator (such as (a) in Listing 1) in the extracted list, we add the fully qualified name of the class and any description provided to the list L_D (cf. Line-16). Similarly, when we encounter a function definition node, we extract the list of decorators associated with it. If there is a deprecated decorator in the extracted list, we add the fully qualified name of the function to the list L_D (cf. Line-6). For each function call node in N_AST (cf. Line-7), we verify whether DeprecationWarning or FutureWarning is passed as an argument (such as (c) in Listing 1) and, if so, add its fully qualified name to the list L_D, which is the final generated list of deprecated API elements.

Fig. 2. Snapshot of APIScanner. The black boxes indicate deprecated APIs highlighted by APIScanner. The red box indicates the message shown by APIScanner on hovering over the highlighted deprecated APIs.

Jupyter: https://jupyter.org/

III. EVALUATION
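The evaluation exercises the detection procedure of Section II-A. As a point of reference, that procedure can be sketched with Python's standard ast module. This is a minimal sketch under stated assumptions rather than the authors' implementation: the helper names, the substring test on "deprecat", and the dotted-prefix way of building qualified names are choices made for this example.

```python
import ast

DEPRECATION_CATEGORIES = {"DeprecationWarning", "FutureWarning"}


def _name_of(node):
    """Return the trailing identifier of a Name/Attribute/Call node, or None."""
    if isinstance(node, ast.Call):
        node = node.func
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return None


def _has_deprecated_decorator(node):
    # Matches e.g. @deprecated, @_api.deprecated("3.3"), @deprecate(...)
    return any("deprecat" in (_name_of(d) or "").lower() for d in node.decorator_list)


def _warns_deprecation(func_node):
    # True if any call in the function passes DeprecationWarning/FutureWarning.
    for sub in ast.walk(func_node):
        if isinstance(sub, ast.Call):
            values = list(sub.args) + [kw.value for kw in sub.keywords]
            if any(_name_of(v) in DEPRECATION_CATEGORIES for v in values):
                return True
    return False


def detect_deprecated(source, module_name):
    """Return a sorted list of qualified names flagged as deprecated."""
    found = set()

    def visit(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                qualified = f"{prefix}.{child.name}"
                doc = ast.get_docstring(child) or ""
                if "deprecat" in doc.lower() or _has_deprecated_decorator(child):
                    found.add(qualified)
                visit(child, qualified)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                qualified = f"{prefix}.{child.name}"
                if _has_deprecated_decorator(child) or _warns_deprecation(child):
                    found.add(qualified)
            else:
                visit(child, prefix)

    visit(ast.parse(source), module_name)
    return sorted(found)


# Toy module text modelled on Listing 1; it is parsed, never executed.
SAMPLE = '''
import warnings

@deprecated("3.3")
class Fil:
    pass

class Booster:
    """Gradient booster.

    .. deprecated:: 0.24
        The "mae" criterion is deprecated.
    """

class Series:
    def __init__(self, dtype=None):
        if dtype is None:
            warnings.warn("default dtype will change", DeprecationWarning, stacklevel=2)

def fine():
    return 1
'''

print(detect_deprecated(SAMPLE, "pkg"))
# ['pkg.Booster', 'pkg.Fil', 'pkg.Series.__init__']
```

The sketch covers the three strategies of Listing 1 in one pass: a class docstring mentioning deprecation, a deprecation decorator on a class or function, and a call that passes a deprecation category inside a function body.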
A. Libraries Selection
To evaluate our approach, we applied it on six popular third-party Python libraries that were identified by Pimentel et al. [11]. However, the approach is not limited to the selected libraries and could be applied to other Python libraries as well.
• NumPy: Array programming library [12].
• Matplotlib: A 2D graphics environment [13].
• Pandas: Data analysis and manipulation tool [14].
• Scikit-learn: Machine learning library for Python [15].
• Scipy: Library for scientific and technical computing [16].
• Seaborn: Data visualization based on matplotlib [17].

Algorithm 1: Detecting Deprecated API Elements in Python Libraries
Input: P, Python Library Code
Output: L_D, List of Deprecated API Elements
Function Detect_Deprecated_API():
    L_D ← {}
    /* parseCode returns Abstract syntax tree of given code input */
    P_AST ← parseCode(P)
    /* Traverse each node in P_AST using BFS */
    for N_AST ∈ P_AST do
        if isFunctionDefNode(N_AST) then
            D = N_AST.Decorators
            if isDeprecatedDecorator(D) then
                L_D.add(getFullyQualifiedName(N_AST.Name))
            /* Traverse each Node in N_AST */
            for Node ∈ N_AST do
                if isFunctionCallNode(Node) and isDeprecationWarning(Node) then
                    L_D.add(getFullyQualifiedName(N_AST.Name))
        else if isClassDefNode(N_AST) then
            doc_str = N_AST.Docstring
            if doc_str.hasDeprecationKeyword() then
                L_D.add(getFullyQualifiedName(N_AST.Name))
            D = N_AST.Decorators
            if isDeprecatedDecorator(D) then
                L_D.add(getFullyQualifiedName(N_AST.Name))
    return L_D

B. Results
Table I summarizes the total number of deprecated API elements detected by Algorithm 1 and the total number of deprecated API elements found in the source code of the Python libraries. We manually counted the number of deprecated API elements present in the source code of the libraries. From Table I, we can observe that the algorithm detected more than 90% of the deprecated APIs. In the case of Matplotlib, only 65% of the deprecated APIs could be detected, since Matplotlib deprecates many of its parameters using a custom warning function that does not take any parameter indicating whether it is a DeprecationWarning or not. In such cases, the proposed algorithm could not detect the deprecated API elements.

In the case of Scikit-learn, NumPy and Pandas, the algorithm also captures some functions that are used to deprecate parameters or parameter values, as well as deprecation warnings induced by other libraries. Hence, the number of deprecated API elements detected by the algorithm is higher than the actual number of deprecated APIs. In the case of Scipy and Seaborn, on the other hand, some parameters are deprecated without using any of the three deprecation strategies and could not be detected by the algorithm. Hence, the number of deprecated API elements detected by the algorithm for Scipy and Seaborn is lower than the actual number of deprecated APIs.

IV. LIMITATIONS AND THREATS TO VALIDITY
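Two of the limitations discussed in this section can be observed directly with Python's standard ast module: ordinary "#" comments are discarded by the parser, and a call to a custom warning helper carries no DeprecationWarning or FutureWarning argument at the call site. The snippet below is an illustrative sketch only; the function names in SOURCE and the category _Category are hypothetical, and the source text is parsed but never executed.

```python
import ast

SOURCE = '''
import warnings

# deprecated: use new_sum instead
def old_sum(values):
    return sum(values)

def _warn_deprecated(message):
    # custom helper: callers never name the warning category themselves
    warnings.warn(message, category=_Category)

def old_mean(values):
    _warn_deprecated("old_mean is deprecated")
    return sum(values) / len(values)

def old_max(values):
    warnings.warn("old_max is deprecated", DeprecationWarning)
    return max(values)
'''

tree = ast.parse(SOURCE)

# 1) Single-line comments never reach the AST, so their text is not in the dump.
comment_visible = "use new_sum instead" in ast.dump(tree)


# 2) Check whether any call in a function passes a deprecation category by name.
def passes_deprecation_category(func_node):
    categories = {"DeprecationWarning", "FutureWarning"}
    for sub in ast.walk(func_node):
        if isinstance(sub, ast.Call):
            values = list(sub.args) + [kw.value for kw in sub.keywords]
            if any(isinstance(v, ast.Name) and v.id in categories for v in values):
                return True
    return False


functions = {n.name: n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}

print(comment_visible)                                        # False
print(passes_deprecation_category(functions["old_mean"]))     # False
print(passes_deprecation_category(functions["old_max"]))      # True
```

Only old_max is visible to an argument-based check; old_sum and old_mean would need either comment-aware tokenization or knowledge of the library's custom helper.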
APIScanner detects deprecated APIs through decorator, warning or comments. Any other deprecated APIs that are not implemented through these three strategies cannot be detected by the algorithm. Moreover, the algorithm finds the function or class in which a parameter is deprecated, but the exact deprecated parameter may not be mentioned in the deprecation message displayed by the extension, due to the inconsistent deprecation strategies adopted by library maintainers. APIs deprecated without passing DeprecationWarning or FutureWarning as parameters to the warning function cannot be detected by the algorithm. APIs deprecated using single-line comments rather than doc-strings also cannot be detected by the algorithm. Further, a major prerequisite for our approach is the availability of the source code of the libraries. The threat due to inconsistent deprecation strategies could be mitigated if the documentation of Python libraries were structured and well maintained.

Finally, since the results are evaluated manually, there may be human errors. Hence, we have carefully reviewed and validated some of the results using release notes to mitigate this potential threat. We plan to extend the evaluation of the tool using release notes and API documentation.

TABLE I. EVALUATION OF RESULTS OBTAINED USING OUR ALGORITHM
Columns: Library Name | LOC | Total No. of Deprecated API elements identified using Algorithm | Total No. of Deprecated API elements in source code
Rows: Scikit-learn, Matplotlib, Numpy, Pandas, Scipy, Seaborn
(numeric entries not preserved)

V. RELATED WORK
Several studies in the literature have examined deprecated APIs in different environments in order to analyze and tackle the challenges posed by the deprecation of APIs in libraries. Robbes et al. [5], [6] studied the reactions of developers to deprecation and the impact of API deprecation on the Smalltalk and Pharo ecosystems. Ko et al. [7] examined 260 deprecated APIs from eight Java libraries and their documentation and observed that 61% of deprecated APIs are offered with replacements. Similarly, Brito et al. [18] conducted a large-scale study on 661 real-world Java systems and found that replacements are provided for 64% of the deprecated APIs. Another study [4], conducted on Java and C projects, observed that an average of 66.7% of APIs in Java projects and 77.8% in C projects were deprecated with replacement messages. Zhou et al. [19] analysed the history of deprecated APIs in 26 open-source Java systems over 690 versions and observed that deprecated API messages are not well managed by library contributors, with very few deprecated APIs being listed with replacements. Li et al. [3] characterized the deprecated APIs in Android apps by parsing the code of 10000 Android applications. Zhang et al. [2] observed a significant difference in the evolution patterns of Python and Java APIs and also identified 14 patterns in which Python APIs evolve. Wang et al. [10] observed that library contributors do not properly handle API deprecation in Python libraries. To this end, there is a need for approaches and tools to automatically detect deprecated API elements in Python projects.

Several approaches have been proposed in the literature for other ecosystems to migrate from deprecated APIs [20], [9], [8]. Xi et al. [20] proposed an approach and built a tool, DAAMT, to migrate from deprecated APIs in Java to their replacements if these are recorded in the documentation. Fazzini et al. [9] developed a technique, AppEvolve, to update API changes in Android apps by automatically learning from before- and after-update examples. Haryono et al. [8] proposed an approach named CocciEvolve that performs updates using only a single after-update example. However, tools that handle deprecated APIs in Python projects have not been developed, which motivated us towards the development of APIScanner.

VI. CONCLUSION AND FUTURE WORK
Considering the extensive use of deprecated APIs during software development and the lack of proper documentation for deprecated APIs, we proposed an approach to automatically detect deprecated APIs in Python libraries during the development phase of a project. In this paper, we presented a novel algorithm and a tool called APIScanner that detects deprecated APIs. The algorithm identifies the APIs deprecated via decorator, hard-coded warning or comments by parsing the source code of the libraries, and generates a list of deprecated APIs. APIScanner uses this list to search for uses of deprecated APIs in the currently active editor. The tool highlights deprecated APIs in the source code along with further deprecation details. APIScanner thus aims to help developers detect deprecated APIs during the development stage and avoid searching through API documentation or forums such as Stack Overflow. Highlighting the use of deprecated APIs in the editor might help developers to address and replace them. The proposed algorithm identified 838 out of 871 API elements across six different Python libraries.

As future work, our goal is to strengthen the tool with release-specific information and develop a better user interface (such as different colors) to indicate the severity of the deprecation. We also plan to improve the documentation of deprecated APIs through the information obtained from the algorithm. We plan to extend the tool to provide a feature to migrate from a deprecated API to its replacement. We aim to improve the tool's accuracy by extracting APIs that are deprecated using custom deprecation strategies. Finally, we plan to conduct extensive developer studies on the usage of the approach and the tool with more libraries.

REFERENCES
[1] D. Dig and R. Johnson, “The role of refactorings in api evolution,” in , 2005, pp. 389–398.
[2] Z. Zhang, H. Zhu, M. Wen, Y. Tao, Y. Liu, and Y. Xiong, “How do python framework apis evolve? an exploratory study,” in . IEEE, 2020, pp. 81–92.
[3] L. Li, J. Gao, T. F. Bissyandé, L. Ma, X. Xia, and J. Klein, “Characterising deprecated android apis,” in Proceedings of the 15th International Conference on Mining Software Repositories, ser. MSR ’18. New York, NY, USA: Association for Computing Machinery, 2018, pp. 254–264.
[4] G. Brito, A. Hora, M. T. Valente, and R. Robbes, “On the use of replacement messages in api deprecation: An empirical study,” Journal of Systems and Software, vol. 137, pp. 306–321, 2018.
[5] R. Robbes, M. Lungu, and D. Röthlisberger, “How do developers react to api deprecation? the case of a smalltalk ecosystem,” in Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, ser. FSE ’12. New York, NY, USA: Association for Computing Machinery, 2012.
[6] A. Hora, R. Robbes, N. Anquetil, A. Etien, S. Ducasse, and M. T. Valente, “How do developers react to api evolution? the pharo ecosystem case,” in Proceedings of the 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), ser. ICSME ’15. USA: IEEE Computer Society, 2015, pp. 251–260.
[7] D. Ko, K. Ma, S. Park, S. Kim, D. Kim, and Y. L. Traon, “Api document quality for resolving deprecated apis,” in Proceedings of the 2014 21st Asia-Pacific Software Engineering Conference - Volume 02, ser. APSEC ’14. USA: IEEE Computer Society, 2014, pp. 27–30.
[8] S. A. Haryono, F. Thung, H. J. Kang, L. Serrano, G. Muller, J. Lawall, D. Lo, and L. Jiang, “Automatic android deprecated-api usage update by learning from single updated example,” 2020.
[9] M. Fazzini, Q. Xin, and A. Orso, “Automated api-usage update for android apps,” in Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2019. New York, NY, USA: Association for Computing Machinery, 2019, pp. 204–215.
[10] J. Wang, L. Li, K. Liu, and H. Cai, “Exploring how deprecated python library apis are (not) handled,” in Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2020, pp. 233–244.
[11] J. F. Pimentel, L. Murta, V. Braganholo, and J. Freire, “A large-scale study about quality and reproducibility of jupyter notebooks,” in Proceedings of the 16th International Conference on Mining Software Repositories, ser. MSR ’19. IEEE Press, 2019, pp. 507–517.
[12] C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk, M. Brett, A. Haldane, J. F. del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke, and T. E. Oliphant, “Array programming with NumPy,” Nature, vol. 585, no. 7825, pp. 357–362, Sep. 2020.
[13] J. D. Hunter, “Matplotlib: A 2d graphics environment,” Computing in Science & Engineering, vol. 9, no. 3, pp. 90–95, 2007.
[14] W. McKinney, “Data structures for statistical computing in python,” in Proceedings of the 9th Python in Science Conference, S. van der Walt and J. Millman, Eds., 2010, pp. 51–56.
[15] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[16] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright et al., “Scipy 1.0: fundamental algorithms for scientific computing in python,” Nature methods, vol. 17, no. 3, pp. 261–272, 2020.
[17] M. Waskom and the seaborn development team, “mwaskom/seaborn,” Sep. 2020.
[18] G. Brito, A. Hora, M. T. Valente, and R. Robbes, “Do developers deprecate apis with replacement messages? a large-scale analysis on java systems,” in , vol. 1, 2016, pp. 360–369.
[19] J. Zhou and R. J. Walker, “Api deprecation: A retrospective analysis and detection method for code examples on the web,” in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE 2016. New York, NY, USA: Association for Computing Machinery, 2016, pp. 266–277.
[20] Y. Xi, L. Shen, Y. Gui, and W. Zhao, “Migrating deprecated api to documented replacement: Patterns and tool,” in