Documenting API Input/Output Examples
Siyuan Jiang, Ameer Armaly, Collin McMillan, Qiyu Zhi, Ronald Metoyer
Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
{sjiang1, aarmaly, cmc, qzhi, rmetoyer}@nd.edu
Abstract: When learning to use an Application Programming Interface (API), programmers need to understand the inputs and outputs (I/O) of the API functions. Current documentation tools automatically document the static information of I/O, such as parameter types and names. What is missing from these tools is dynamic information, such as I/O examples: actual valid values of inputs that produce certain outputs.

In this paper, we demonstrate a prototype toolset we built to generate I/O examples. Our tool logs I/O values when API functions are executed, for example when running test suites. Then, our tool puts the I/O values into API documents as I/O examples. Our tool has three programs: 1) funcWatch, which collects I/O values when API developers run test suites, 2) ioSelect, which selects one I/O example from a set of I/O values, and 3) ioPresent, which embeds the I/O examples into documents. In a preliminary evaluation, we used our tool to generate four hundred I/O examples for three C libraries: ffmpeg, libssh, and protobuf-c.
I. INTRODUCTION
Recent studies by Duala-Ekoko and Robillard [1] and by Sillito et al. [2] found that when learning Application Programming Interfaces (APIs), programmers look for input/output examples, that is, actual values of a function’s input and output, in order to understand the function. In particular, the programmers asked “How does this data structure look at runtime?” [2] and “we have a newInstance(String) method that takes a String argument and I have no idea what this String is supposed to be” [1].

Programmers often refer to API documents to answer such questions [3], [4]. However, a study by Maalej and Robillard shows that most API documents do not contain input/output (I/O) examples [5]. One reason the documents lack I/O examples is that API developers have little tool support for documenting them. Current documentation tools, such as doxygen [6] and JavaDoc [7], automate the process of converting comments and source code into structured HTML pages. If API developers write I/O examples in comments, the tools can present the I/O examples in HTML pages. Otherwise, the tools have no I/O examples to present.

When there is little help from API documents, programmers (the users of API libraries) often try a variety of approaches to get input/output examples [8]–[10].
For instance, programmers 1) search or ask a question online (e.g., stackoverflow.com), 2) look for existing code that uses the API (e.g., open source repositories), or 3) write code to experiment with the API. If the API documents have input/output examples, programmers may not need these approaches.

To help API developers add I/O examples into API documents, we built our tool, a prototype toolset that (1) logs actual I/O values when API functions are executed, (2) selects an example from the logged values for each API function, and (3) visualizes the examples in API documents.

In the rest of the paper, we describe what our tool adds to API documents, how API developers can use our tool, how our tool creates input/output examples, the preliminary results, the related work, and the future work.

II. OUR TOOL IN A NUTSHELL

Our tool is for developers of C API libraries to create API documents with I/O examples (actual values of input/output) of API functions. Our tool has three programs: funcWatch, ioSelect, and ioPresent. First, funcWatch logs the I/O values of a function when the function is executed (in tests or in other executable programs). Often the function is executed multiple times. Then, ioSelect selects one of the calls. Finally, ioPresent adds the I/O values of the selected call into the API document. IoPresent is a patch for the documentation tool doxygen [6]. The patch enables doxygen to render I/O examples in HTML pages.

Our online appendix website contains a virtual machine that is set up for running our tool, the videos about our tool, the installation guides, and three API documents generated by our tool.

III. MOTIVATIONAL EXAMPLE
In this section, we show what our tool adds to API documents. Figure 1 shows a document generated by our tool and doxygen. The document is about the function av_bprint_channel_layout in ffmpeg.

The API document is composed of three parts. Part 1 is the declaration of av_bprint_channel_layout. Part 2 is a sentence describing the function. Parts 1 and 2 are generated by doxygen. Part 3 is what our tool adds to the document, which is an I/O example. In the rest of the section, we walk through the three parts of the document from the perspective of a user who wants to learn av_bprint_channel_layout.

First, the user reads Part 1 and gets some basic information: (1) the name of the function, (2) the return type, which is void, and (3) the types and the names of the parameters. According to the previous studies [1], [2], the user may have the following questions after reading Part 1:

Q1. What does av_bprint_channel_layout do?
Q2. What is struct AVBPrint?
Q3. What is nb_channels?

Second, the user reads Part 2, which answers Q1: the function “append a description of a channel layout to a bprint buffer.”

Part 1 (doxygen): void av_bprint_channel_layout(struct AVBPrint *bp, int nb_channels, uint64_t channel_layout)
Part 2 (doxygen): Append a description of a channel layout to a bprint buffer.
Part 3 (our tool): I/O Example

Parameter name    Before function call    After function call
bp                [memory addr.]          [memory addr.]
  bp->str         ""                      "stereo"
  bp->len         0                       6
  …               …                       …
nb_channels       -1                      -1
channel_layout    3                       3

Fig. 1: The document of function av_bprint_channel_layout in ffmpeg. The document has three parts. Part 1 is the function declaration. Part 2 is a description of the function. Parts 1 and 2 are generated by doxygen. Part 3 is an input/output example created by our tool. The input/output example is a table with three columns: “Parameter name”, “Before function call”, and “After function call”. The first column lists the names of the parameters. The second column lists the values of the parameters before av_bprint_channel_layout is called, i.e., the input values. The third column lists the values of the parameters after av_bprint_channel_layout is called, i.e., the output values. Each row in the table represents a pair of input/output values. For example, the first row represents the values of the first parameter, bp, which is of pointer type. Because the actual values of pointers (memory addresses) are not meaningful, our tool does not put the actual values in the table but “[memory addr.]” to indicate that the values are memory addresses. The value that bp points to is a struct variable, which is composed of the fields of the struct. The second, third and fourth rows represent the fields. The three dots in the fourth row do not represent a field but stand for three more fields of the struct variable. Our tool presents the three fields in the actual document, but we omit them due to page limits. Section III discusses this figure in detail.

From this description, the user may expect an input representing “a description of a channel layout”, and an output representing “a bprint buffer”. The names of the parameters indicate that the parameter channel_layout is a channel layout and bp is a bprint. But the user still does not know where the description of the channel layout is, and where the buffer of the bprint is. In other words, the user may have the questions:

Q4. Where is the description of a channel layout?
Q5. Where is the buffer of the bprint?

Part 3, the I/O example, provides hints for some of the unanswered questions. The I/O example shows that bp->str changed from an empty string to “stereo”. So the reader can guess that bp->str is the buffer of the bprint (answers Q5). The I/O example also shows that nb_channels can be “-1”, which is related to Q3. For Q4, the user still does not know where the description of a channel layout is. What the reader can learn from the I/O example is that when channel_layout is three, the description is “stereo”.

In summary, all three parts have useful information that helps the user understand av_bprint_channel_layout. Part 1 has static information extracted from source code. Part 2 has the high-level description that only API developers can provide. Our tool adds Part 3, the actual, concrete input/output values of a function.
Fig. 2: The architecture of our tool. The inputs of our tool are (1) the compiled API with debugging information, (2) the compiled API tests, and (3) an API function name. The three programs in our tool are labeled with numbers: (1) funcWatch, (2) ioSelect, and (3) ioPresent. FuncWatch runs tests and code examples, and logs the input/output values of the function specified by the input. IoSelect chooses an example from all the logged I/O values. IoPresent (the modified doxygen) adds the example to the HTML documents.
IV. USAGE OF OUR TOOL
The usage of our tool has three steps: 1) run funcWatch, 2) run ioSelect, and 3) run ioPresent. If an API developer wants to get I/O examples for all the functions in the library, the developer needs a list of the function names for running funcWatch (so that funcWatch knows which functions to log). To help developers in this situation, we modified doxygen so that it writes all the function names to a file when it is executed.

We put a demo video in our online appendix (Section II) in which we followed these steps and generated nearly two hundred I/O examples for the ffmpeg library.

V. ENVISIONED USERS
The envisioned users of our tool are the developers of API libraries. The developers can use our tool to create API documents with I/O examples. Although the developers are the envisioned users of our tool, the API users (the programmers who use and learn API libraries) are the ones who read the documents and benefit from the I/O examples.

Additionally, the programs in our tool can be used independently for other purposes. For example, funcWatch can be used to detect whether a parameter always takes the same value.

VI. DESIGN OF OUR TOOL
In this section, we describe the design of our tool. Figure 2 shows the architecture of our tool, which illustrates the relationship among the three programs: funcWatch, ioSelect, and ioPresent. In the following subsections, we describe how we designed and implemented each program.
A. FuncWatch: Logging I/O values
The input of funcWatch is an API function name and a test. FuncWatch runs and monitors the test, and, during the execution of the test, funcWatch logs the I/O values of the given API function.

We built funcWatch based on the code of Flashback [11], which uses libdwarf [12], a library that provides access to DWARF debugging information. By using libdwarf, funcWatch inserts breakpoints at the entry and the exits of the given function (a C function may have multiple exits).

Then, funcWatch runs the given test with the breakpoints. After a breakpoint is reached and the test is paused, funcWatch collects the parameters or return values of the API function. Furthermore, funcWatch is able to dereference pointers at runtime, so funcWatch can collect not only the pointers’ values but also the values that the pointers point to. FuncWatch also calculates the addresses of a struct variable’s fields, so that it can log the values of the fields. Similarly, funcWatch also works for union and enum types.

In C, arrays are often passed to functions without size information. While a common practice is to pass the size of an array as a separate parameter, the naming conventions differ across projects and developers, so there is no simple way to automatically detect the size parameters. For this reason, funcWatch logs only the first item of an array.

In a test, the given function can be called multiple times. FuncWatch logs a call id for every parameter and return value in order to distinguish the values collected from different calls.
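The logging idea in this subsection can be illustrated outside of C. The following is a minimal Python sketch and not funcWatch itself: funcWatch works on compiled C code through breakpoints and DWARF debugging information, whereas this analogy uses a decorator. The names watch and describe_layout and the record layout (call_id, name, when, value) are assumptions made for illustration; describe_layout is a hypothetical stand-in that only mimics the values shown in Figure 1.

```python
import copy
import functools
import inspect


def watch(log):
    """Return a decorator that appends entry/exit snapshots of a function's
    arguments to `log`, tagging every record with a call id."""
    def decorator(func):
        sig = inspect.signature(func)
        counter = {"next_id": 0}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            call_id = counter["next_id"]
            counter["next_id"] += 1
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            # Entry snapshot: deep copies, so later mutation is not reflected.
            for name, value in bound.arguments.items():
                log.append({"call_id": call_id, "name": name,
                            "when": "before", "value": copy.deepcopy(value)})
            result = func(*args, **kwargs)
            # Exit snapshot: the same objects, possibly mutated by the call.
            for name, value in bound.arguments.items():
                log.append({"call_id": call_id, "name": name,
                            "when": "after", "value": copy.deepcopy(value)})
            log.append({"call_id": call_id, "name": "return",
                        "when": "after", "value": copy.deepcopy(result)})
            return result
        return wrapper
    return decorator


log = []


@watch(log)
def describe_layout(bp, nb_channels, channel_layout):
    # Hypothetical stand-in for av_bprint_channel_layout; not ffmpeg code.
    text = "stereo" if channel_layout == 3 else "unknown"
    bp["str"] += text
    bp["len"] = len(bp["str"])


describe_layout({"str": "", "len": 0}, -1, 3)
for record in log:
    print(record)
```

Every record carries the call id, so values from different calls can be told apart later. The entry snapshot is a deep copy because the function may modify what its arguments refer to, as happens to bp here.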
B. ioSelect: Choosing I/O examples
The input of ioSelect is a set of I/O values of an API function. Each value is labeled with a call id logged by funcWatch. IoSelect selects a call id randomly, and all the values with the same call id become the I/O example for the function. Because ioSelect selects I/O examples randomly, for the same API function and the same test, our tool may create different I/O examples. We implemented ioSelect as a standalone program so that we can easily add new algorithms for selecting I/O examples in the future.
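The random selection described in this subsection can be sketched in a few lines of Python. This is an illustration under assumptions, not the ioSelect implementation: the function name select_example, the record layout (call_id, name, value), and the sample values are all hypothetical.

```python
import random
from collections import defaultdict


def select_example(records, rng=random):
    """Pick one call id at random and return all records logged for it."""
    by_call = defaultdict(list)
    for record in records:
        by_call[record["call_id"]].append(record)
    if not by_call:
        return []
    chosen = rng.choice(sorted(by_call))
    return by_call[chosen]


# Made-up log of two calls to some function with one parameter `a`.
records = [
    {"call_id": 0, "name": "a", "value": 0},
    {"call_id": 0, "name": "return", "value": 1},
    {"call_id": 1, "name": "a", "value": 25},
    {"call_id": 1, "name": "return", "value": 5},
]

# A seeded generator makes the choice repeatable across runs.
example = select_example(records, random.Random(0))
print(example)
```

Passing the random generator as a parameter is a design choice of this sketch: with the default module-level generator the example may differ from run to run, as the text notes, while a seeded generator gives the same example each time.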
C. ioPresent: Adding I/O examples to documents
The input of ioPresent is a set of I/O examples. Each API function has only one I/O example. The output of ioPresent is HTML API documents with the I/O examples.

The main functionality of ioPresent is to render an I/O example as an HTML table like the table in Figure 1. For example, in Figure 1, ioPresent automatically detects that bp->str is a field of a struct variable that bp points to. Then, ioPresent automatically indents the row of bp->str and makes the row of bp collapsible (so that all the values that bp points to can be hidden if needed).

We implemented ioPresent as a patch of doxygen. Specifically, we modified two parts of doxygen: Data Organizer and Html Generator. Data Organizer builds data structures to represent the elements of the API libraries that are going to be documented, such as classes and functions. First, we added a new class to represent a new type of element: the I/O examples. Second, we added I/O examples into the data structure that represents a function. We also modified Html Generator, in which we implemented the process of rendering I/O examples in HTML format.

VII. PRELIMINARY EVALUATION
To assess the ability of our tool to generate I/O examples for real-world libraries, we did a preliminary evaluation to answer the question: how many functions will have I/O examples by using our tool?

TABLE I: THE NUMBER OF FUNCTIONS THAT HAVE I/O EXAMPLES GENERATED BY OUR TOOL

In this evaluation, we used our tool on three API libraries: ffmpeg, libssh, and protobuf-c. For each library, we executed funcWatch on every function (found by doxygen) and every test in the test suites. Then, we executed ioSelect and ioPresent to generate API documents with I/O examples.

In total, our tool generated I/O examples for 402 functions: 191 functions in ffmpeg, 202 in libssh, and 9 in protobuf-c. To put the numbers in context, Table I shows the number of functions in source code and the number of functions in API documents, along with the number of functions that have I/O examples. The API documents have far fewer functions than the source code, because doxygen includes only the functions that are commented in a predefined format. This default setting helps doxygen exclude the internal functions, which should not be seen by API users.

Our tool did not generate I/O examples for all the functions that are in API documents, because not all the functions were executed in the test suites. For example, ffmpeg has more than twenty thousand functions defined in source code. Only about six hundred functions are documented. Our tool created I/O examples for 30% of the documented functions.

VIII. VISUALIZING ALL I/O VALUES
Sometimes, it is useful to see all the possible values of the parameters and return variables of an API function. So we built a prototype to visualize all the values logged by funcWatch for an API function. For each parameter of a function, we draw a bar chart of all the logged values of the parameter, where the height of each bar represents the frequency of that value. For example, in Figure 3, function av_gcd has three bar charts. Each bar chart represents a parameter or a return variable. Parameter a has five different values: 0, 1, 3, 4, and 25 (logged by funcWatch). The most frequent value of a occurs in more than 140 calls.

The bar charts are interactive to show the relationships between the parameters and return variables. For example, in Figure 3, the cursor hovers over the bar for 25 in bar chart a. The other two bar charts are updated to reflect the values collected when a was 25. In other words, Figure 3 shows that there are about 30 calls where a is 25. In these calls, b takes one of two values, and all these calls return the same value.

IX. RELATED WORK
In this section, we present two categories of related work.

The first category is about introducing different types of content into API documents, such as information from StackOverflow posts [13], usage information [14], frequently-asked-questions from the Web [15], and code example summaries mined from the Web [16], [17]. Different from these approaches, our tool does not mine the Web, and our tool adds I/O values instead of source code examples into API documents.

Fig. 3: All the logged I/O values for function av_gcd. Three bar charts correspond to three variables: a, b, and return. In this figure, the user has moused over the possible value “25” in a, and the corresponding values for b and return are updated and highlighted.

The second category is about collecting and analyzing dynamic information, which can be useful in API documentation. For example, Daikon analyzes dynamic information to report likely program invariants [18]. The reports can be useful in API documents. We plan to embed this type of report into API documents in the future in order to study whether and how the reports can help programmers use API functions.

X. FUTURE WORK AND LIMITATIONS

Our tool is a prototype toolset, which serves as our first step towards investigating what information is useful for API users. Previous studies have shown that programmers ask questions about dynamic information [1], [2], such as I/O examples. What is not shown is whether and how this information helps programmers. First, we will conduct an empirical study to assess the usefulness of I/O examples. In this study, we plan to hire programmers to complete certain programming tasks with and without I/O examples in documents. By comparing the programmers’ performance with and without I/O examples, we can see how I/O examples help programmers.

Furthermore, we plan to design and implement different selection strategies for ioSelect. Our assumption is that some I/O examples are more useful than others. We plan to investigate some potential characteristics of useful I/O examples. For example, the most common I/O values may be more useful.

Additionally, the quality and the quantity of the I/O examples generated by our tool depend on the test suites of the API libraries. In future work, we may consider code examples, tutorial examples, and real-world applications that use the API libraries, in addition to the test suites, to generate I/O examples.

Implementation-wise, our tool currently has three major limitations. First, funcWatch does not support 64-bit memory addressing, so our tool can work only in 32-bit operating systems. Second, funcWatch collects values only for statically-linked libraries. Third, funcWatch collects all the I/O values no matter whether a test passes or fails. If a test fails, the logged values may be invalid. The current workaround for this limitation is to check the test results in a script and discard the output of funcWatch for the failed tests.

XI. CONCLUSION
To conclude, we built and demonstrated a prototype toolset, our tool. The goal of our tool is to help API developers create and add I/O examples into API documents. Our tool has three programs: funcWatch, ioSelect, and ioPresent. Our preliminary evaluation shows that our tool can generate four hundred I/O examples for three real-world C libraries.

ACKNOWLEDGMENT
We thank Douglas Smith for his work on the visualization of all I/O values. This work was partially supported by the NSF CCF-1452959 and CNS-1510329 grants, and the Office of Naval Research grant N000141410037. Any opinions, findings, and conclusions expressed herein are the authors’ and do not necessarily reflect those of the sponsors.

REFERENCES

[1] E. Duala-Ekoko and M. P. Robillard, “Asking and answering questions about unfamiliar APIs: An exploratory study,” ser. ICSE ’12, 2012.
[2] J. Sillito, G. C. Murphy, and K. D. Volder, “Asking and answering questions during a programming change task,” IEEE Trans. Softw. Eng., vol. 34, no. 4, July 2008.
[3] A. Forward and T. C. Lethbridge, “The relevance of software documentation, tools and technologies: A survey,” ser. DocEng ’02, 2002.
[4] M. P. Robillard and R. DeLine, “A field study of API learning obstacles,” Empirical Software Eng., vol. 16, no. 6, pp. 703–732, 2010.
[5] W. Maalej and M. P. Robillard, “Patterns of knowledge in API reference documentation,” IEEE Trans. Softw. Eng.
… IEEE TSE, vol. 32, no. 12, 2006.
[9] J. Lawrance, C. Bogart, M. Burnett, R. Bellamy, K. Rector, and S. D. Fleming, “How programmers debug, revisited: An information foraging theory perspective,” IEEE Trans. Softw. Eng., vol. 39, no. 2, 2013.
[10] T. Roehm, R. Tiarks, R. Koschke, and W. Maalej, “How do professional developers comprehend software?” ser. ICSE ’12, 2012.
[11] A. Armaly and C. McMillan, “Pragmatic source code reuse via execution record and replay,” Journal of Software: Evolution and Process
… ACM TOIS, vol. 31, no. 1, 2013.
[17] S. Subramanian, L. Inozemtseva, and R. Holmes, “Live API documentation,” ser. ICSE 2014, 2014.
[18] M. D. Ernst, J. H. Perkins, P. J. Guo, S. McCamant, C. Pacheco, M. S. Tschantz, and C. Xiao, “The Daikon system for dynamic detection of likely invariants,”