Automatic API Usage Scenario Documentation from Technical Q&A Sites
GIAS UDDIN,
University of Calgary, Canada
FOUTSE KHOMH,
Polytechnique Montréal, Canada
CHANCHAL K ROY,
University of Saskatchewan, Canada
The online technical Q&A site Stack Overflow (SO) is popular among developers for supporting their coding and diverse development needs. To address shortcomings in official API documentation resources, several research efforts have thus focused on augmenting official API documentation with insights (e.g., code examples) from SO. These techniques propose to add code examples and insights about an API into its official documentation. Recently, surveys of software developers find that developers in SO consider the combination of code examples and reviews about APIs as a form of API documentation, and that they consider such a combination to be more useful than official API documentation when the official resources are incomplete, ambiguous, incorrect, or outdated. Reviews are opinionated sentences with positive/negative sentiments. However, we are aware of no previous research that attempts to automatically produce API documentation from SO by considering both API code examples and reviews. In this paper, we present two novel algorithms that can be used to automatically produce API documentation from SO by combining code examples and reviews towards those examples. The first algorithm is called statistical documentation, which shows the distribution of positivity and negativity around the code examples of an API using different metrics (e.g., star ratings). The second algorithm is called concept-based documentation, which clusters similar and conceptually relevant usage scenarios. An API usage scenario contains a code example, a textual description of the underlying task addressed by the code example, and the reviews (i.e., opinions with positive and negative sentiments) from other developers towards the code example. We deployed the algorithms in Opiner, a web-based platform to aggregate information about APIs from online forums. We evaluated the algorithms by mining all Java JSON-based posts in SO and by conducting three user studies based on the documentation produced from the posts.
The first study is a survey in which we asked the participants to compare our proposed algorithms against a Javadoc-style documentation format (called Type-based documentation in Opiner). The participants were asked to compare along four development scenarios (e.g., selection, documentation). The participants preferred our two proposed algorithms over type-based documentation. In our second user study, we asked the participants to complete four coding tasks using Opiner and the official and informal API documentation resources. The participants were more effective and accurate while using Opiner. In a subsequent survey, more than 80% of participants asked for the Opiner documentation platform to be integrated into the formal API documentation to complement and improve the official API documentation.

CCS Concepts: • Software and its engineering → Software creation and management; Documentation; Search-based software engineering.

Additional Key Words and Phrases:
API, Documentation, Usage Scenario, Crowd-Sourced Developer Forum
ACM Reference Format:
Gias Uddin, Foutse Khomh, and Chanchal K Roy. 2021. Automatic API Usage Scenario Documentation from Technical Q&A Sites. 1, 1(February 2021), 43 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
Authors’ addresses: Gias Uddin, [email protected], University of Calgary, Canada; Foutse Khomh, [email protected], Polytechnique Montréal, Canada; Chanchal K Roy, University of Saskatchewan, Canada, [email protected].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

© 2021 Association for Computing Machinery.
Manuscript submitted to ACM
APIs (Application Programming Interfaces) offer interfaces to reusable software components [68]. Modern-day rapid software development is facilitated by the numerous open source APIs that are available for any given task. Such is the popularity of APIs that the number of open source repositories in GitHub is now 100 million, an exponential increase over the 67 million from only two years ago [27]. With the growing number of open source repositories and the APIs supported by those repositories, developers now face two major challenges: selecting an API amidst multiple choices and then learning how to use it properly [92, 95]. Both tasks can be facilitated by the official API resources. Unfortunately, official API documentation can be incomplete, obsolete, and incorrect [67, 71, 99], which often leaves developers no choice but to look for alternative documentation and knowledge-sharing resources [53, 92].

The advent and proliferation of online developer forums has opened up an interesting avenue for developers to look for solutions to their development tasks in the forum posts [6, 89]. Among the numerous online forums, Stack Overflow (SO) is a large online community where millions of developers ask and answer questions about their programming needs. Developers post questions in SO about diverse technical topics, such as the selection, usage, and troubleshooting of APIs. Volunteers answer those questions, or make comments, as do participants in other social forums.
To date, there are around 120 million posts, out of which 48 million are questions and answers, and the rest (72 million) are comments. Around 11 million users visit SO and add 9K new questions to the site each day [22].

The popularity and growing influence of SO has motivated a number of recent research efforts to produce API documentation automatically from SO contents, such as adding code examples and interesting textual contents about a Java API type (e.g., a class) in the Javadocs [84, 89], recommending usage examples within a given IDE [53], summarizing API reviews (i.e., opinions with positive and negative sentiments) to assist in API selection [95], and so on. In our previous surveys of 178 developers, we find that developers consider the combination of code examples and API reviews as a form of API documentation [92]. In fact, the developers consider such a combination more valuable than official API documentation when the official resources are lacking [92]. We are aware of no previous research that attempts to automatically produce API documentation from SO by combining both code examples and reviews.

In this paper, we propose a new documentation format for APIs that we can generate automatically by mining SO. The format considers both the code example of an API and relevant reviews about the code example from other developers in the forum posts. We present two novel documentation algorithms based on the code examples and reviews. The first algorithm is called Statistical Documentation, which offers visualized usage and review statistics about the code examples of an API. The second algorithm is called Concept-Based Documentation, which clusters usage scenarios of an API that are conceptually similar, e.g., one scenario consisting of creating an HTTP connection and another of sending messages over the HTTP connection. Using the two algorithms, we automatically produce the documentation of an API by mining SO. We deploy the produced documentation in Opiner [95]. Opiner was previously developed as an online prototype engine to summarize reviews about an API from online developer forums. The overarching goal of Opiner is to become a one-stop resource for crowd-sourced API documentation. In this paper, we have extended Opiner to also include our mined usage documentation of the APIs. Opiner is hosted at: http://opiner.polymtl.ca.

In Figure 1, we show screenshots of the Opiner usage documentation engine. The Opiner online website currently indexes the mined and documented usage scenarios of the APIs from a total of 3048 SO threads tagged as ‘Java+JSON’. This dataset was previously used to mine and summarize reviews about diverse Java APIs [95, 96]. As such, we expect to see code examples discussing Java APIs for JSON parsing in the posts. A developer can search for the usage documentation of an API by searching its name in Opiner - see 1 under the front page of Opiner in Figure 1. The front
[Figure 1 panels: front page of the Opiner online user interface; usage sentiment overview for the API Jackson; co-used APIs in the code examples using the API Jackson; the Statistical Documentation page for Jackson; the Concept-Based Documentation page for Jackson; a usage scenario expanded upon click; the ‘See Also’ section of each usage scenario, expandable upon click.]
Fig. 1. Screenshots of the Opiner online website with our two novel API usage scenario documentation algorithms deployed. While the Opiner online website offers other features, the screenshots here only show the extensions of Opiner that were implemented as part of the new contributions presented in this paper.

page also shows the APIs with the most usage scenarios. As shown in 2 , one of the most used APIs for JSON parsing in Java is Jackson. The circles 3 and 4 show some metrics that we developed to produce and visualize the Statistical Documentation. The circle 3 shows the overall distribution of positive and negative opinions in the forum posts where the code examples of the API Jackson were found. The circle 4 shows that the API javax.ws (blue pie) is frequently used alongside the Jackson API in the same code examples. The javax.ws API is an official Java package that is used to create RESTful services, where JSON is the primary medium of communication.

The right part of Figure 1 shows screenshots of the concept-based documentation of Jackson in Opiner. In Opiner, each concept consists of one or more similar API usage scenarios. Each usage scenario of an API consists of a code example, a textual description of the underlying task addressed by the code example, and reviews (i.e., opinionated sentences with positive/negative sentiments) of the code example as found in the comments to the post where the code example is found. Each concept is titled with the title of its most recently posted usage scenario. In circle 5 , we show the three most recent concepts for the API Jackson. The concepts are sorted by the time of their most recent usage scenarios, with the most recent concept placed at the top. Upon clicking on a concept title, we can see details of the most recent scenario in the concept, as shown in circle 6 . Each concept is given a star rating reflecting the overall sentiments towards all the usage scenarios grouped under the concept (see circle 6 ).
Other relevant usage scenarios of the concept are grouped under a ‘See Also’ section (see circle 7 ). Each usage scenario under the ‘See Also’ can be further
Table 1. Contributions and Research Advances Made in our Paper
Contribution | Summary | Research Advancement
Algorithms | We propose two novel algorithms to automatically document API usage scenarios from online developer forums: Statistical and Concept-based. | Previous related research focused mainly on linking code examples or interesting insights directly to the Javadoc of an API type [84, 89], or on complementing API official documentation using SO contents [3, 5, 11–13, 17, 20, 33–36, 42, 49, 50, 79, 85, 90, 101, 102, 107]. Our algorithms offer directions to design innovative algorithms to complement and improve API official documentation.

Techniques | We implemented and deployed the algorithms in our tool, Opiner [96]. | We are aware of no tool that can offer search and documentation features of API usage scenarios automatically collected from developer forums. The underlying documentation framework in Opiner can be further extended with new API usage documentation algorithms.

User studies | We conducted three user studies to demonstrate the effectiveness of the proposed usage documentation algorithms over the traditional API documentation approach and resources. | The positive reception of our proposed API documentation formats based on the two algorithms opens up a new research area in software engineering to design innovative techniques and tools by harnessing knowledge shared in online crowd-sourced forums. As we noted in Sections 1 and 8, existing research [11, 12, 20, 35, 36, 79, 84, 85, 89, 90] mostly focused on complementing the traditional official documentation.

explored (see circle 8 ). Each usage scenario is linked to the corresponding post in SO where the code example was found (by clicking the word ‘details’ after the description text of a scenario, as shown in 6 ).

We evaluated the usefulness of the two proposed documentation algorithms over the traditional type-based documentation approach [75]. In type-based documentation, we adopt a Javadoc style by clustering all the usage scenarios of an API type (e.g., a class) under the type name in the Opiner website. Previously, Subramanian et al.
[84] also promoted a similar documentation format for Javadocs by automatically mining all the code examples of an API from SO. Given that each usage scenario in our concept-based documentation also contains reviews and a textual task description of a code example, we added all such information to each code example in our Type-based documentation. We then recruited 29 developers (18 professional) and asked them to compare the three documentation types (i.e., Statistical, Concept-Based, and Type-Based) along four development scenarios (e.g., API selection, documentation) as originally used in [95]. The participants preferred our two proposed algorithms over type-based documentation in all the development scenarios. We conducted a second user study with 31 developers to evaluate the effectiveness of the produced documentation in Opiner for completing coding tasks. Each participant completed four coding tasks using Opiner documentation, official Javadocs, SO, and everything (i.e., including a search engine). The participants, on average, wrote more correct code, in the least amount of time, and with the least effort while using Opiner compared to the other documentation resources. In a subsequent survey, more than 80% of participants preferred the Opiner documentation over existing SO posts. More than 85% of participants asked for the Opiner documentation platform to be integrated into the formal API documentation to complement and improve the official API documentation.

In summary, we advance the state of the art by presenting two novel algorithms to automatically document API usage scenarios from online developer forums, each deployed in an online API documentation prototype tool, Opiner. We demonstrate the effectiveness of the algorithms and the tool to assist developers in their diverse development tasks using three user studies. In Table 1, we outline the major contributions of this paper.
This research borrows concepts and techniques from software engineering and opinion analysis. In this section, we present the major concepts and techniques upon which this study is founded.
In this paper, we investigated our API usage documentation techniques for both open-source and official Java APIs. As such, we analyzed SO posts tagged as “Java”, where Java APIs are mostly discussed. However, the analysis and the techniques developed can be applicable to any API. In particular, we adopt the definition of an API pioneered by Martin Fowler. An API is a “set of rules and specifications that a software program can follow to access and make use of the services and resources provided by its one or more modules” [103]. An API is identified by a name. An API consists of one or more modules. Each module can have one or more source code packages. Each package can have one or more code elements, such as classes, methods, etc. For the Java official APIs available through the Java SDKs, we consider an official Java package as an API. A similar format is adopted in the Java official documentation (e.g., the java.time package is denoted as the Java date APIs in the new JavaSE official tutorial [47]).

As shown in Figure 2, this is also how APIs are discussed and mentioned in SO. For example, three open source Java APIs are mentioned in the textual contents in Figure 2: Jackson, Google Gson, and org.json. In the code example, two packages from the official Java SDK are used along with Gson: java.util and java.lang.

An API is normally designed to support specific development needs. Each need can be implemented as a functionality in the API. Each functionality is denoted as a ‘feature’ [68]. For example, the Gson API is developed to support the processing and manipulation of JSON-based inputs in Java. One feature of the Gson API is the conversion of a JSONArray into a Java Object. As shown in Figure 2, this can be addressed by using two methods from two classes of the Gson API: getType(. . . ) and fromJson(. . . ) from the classes TypeToken and Gson, respectively.
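The name → module → package → code element hierarchy above can be sketched as a small data structure. The sketch below is purely illustrative (the module name gson-core and the element lists are invented for illustration, not taken from Gson's actual module layout or from the Opiner implementation):

```java
import java.util.*;

// Illustrative model of the API hierarchy adopted in the paper: an API name
// maps to modules, each module to packages, each package to code elements.
// All names below are toy data for illustration.
public class ApiModel {
    public static Map<String, Map<String, List<String>>> gsonExample() {
        Map<String, List<String>> packages = new LinkedHashMap<>();
        packages.put("com.google.gson", List.of("Gson", "Gson.fromJson"));
        packages.put("com.google.gson.reflect", List.of("TypeToken", "TypeToken.getType"));
        // One hypothetical module ("gson-core") holding both packages.
        Map<String, Map<String, List<String>>> api = new LinkedHashMap<>();
        api.put("gson-core", packages);
        return api;
    }

    public static void main(String[] args) {
        System.out.println(gsonExample());
    }
}
```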
Bing Liu, in his book [38], defines an opinion as: “An opinion is a quintuple ⟨e_i, a_ij, s_ijkl, h_k, t_l⟩, where e_i is the name of the entity, a_ij is an aspect of e_i, s_ijkl is the sentiment on aspect a_ij of entity e_i, h_k is the opinion holder, and t_l is the time when the opinion is expressed by h_k”. The sentiment s_ijkl is positive or negative. Both the entity (e_i) and the aspect (a_ij) represent the opinion target. An aspect of an entity can be a property or a feature supported by the entity. For example, in Figure 2 the first comment (C1) has two sentences. The first sentence is ‘The code is buggy’. This is a negative opinion about the bug aspect of the provided code example.

In this paper, we produce API documentation by combining code examples of an API with the relevant reviews towards the code examples. We use the notion of an ‘API Usage Scenario’, which is a composite of three items: a code example associated with an API, a textual description of the underlying task addressed by the code example, and a set of reviews (i.e., opinions with positive and negative sentiments) towards the code example as provided in the comments to the post where the code example is found. For example, from Figure 2, we can produce an API usage scenario based on the code snippet as follows: (1) The code snippet is provided to complete a development task involving Java to convert JSON data to a Java object using the Google Gson API. (2) A textual task description of the task by identifying relevant
How to convert JSON data to JSON object
C1. The code is buggy. In the new version of GSON, TypeToken is not public, hence you will get constructor error.
C2. Using actual version of GSON (2.2.4) it works perfectly!
C3. I found org.json a bit buggy when converting a Json Array
C4. I would recommend using the Jackson API.

Check Java JSON website for competing APIs, such as Jackson, Gson, org.json. If you don’t need object de-serialization but to simply get an attribute, you can try org.json. Google Gson supports generics and nested beans that should map to a Java collection such as List. It’s pretty simple! You have a JSON object with several properties of which groups property represents an array of nested objects of the very same type. This can be parsed with Gson the following way:
import java.util.*;
import java.lang.reflect.Type;
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;

class Data {
    private String title;
    private long id;
    private List groups;
}

Type listType = new TypeToken
Fig. 2. How APIs are discussed in SO (question, answer, and comments).

sentences, such as those immediately before the code snippet. (3) The reviews in comments C1 and C2 that are relevant to the code snippet.

Our decision to use API usage scenarios instead of simply code examples is influenced by the seminal research of Carroll et al. [14] and Shull et al. [75]. Carroll et al. [14] proposed a ‘minimal manual’ for technical documents by designing the documentation around specific tasks. In a subsequent study, Shull et al. [75] find that such a task-based documentation format is more useful than a traditional hierarchical documentation format. In our API usage scenarios, each scenario corresponds to a specific development task.

Our decision to utilize reviews from comments is based on our previous findings from surveys of 178 software developers from SO and GitHub [92]. We find that developers consider the combination of a code example in an answer post and the reviews about it from other developers in the comments as a form of API documentation. We also find that developers consider such a combination more valuable than official API documentation, because the reviews are offered by experts and are based on their real-world experience of the API usage. The usefulness of comments is confirmed with empirical evidence by two recent studies as well, published in the same year as our surveys (i.e., 2019). Ren et al. [63] exploited comments to identify ‘controversial’ answers, i.e., answers that may be potentially incorrect. They find that in those ‘controversial’ cases, comments are useful to offer a more accurate usage experience of the API. In a separate study, Zhang et al. [108] manually analyzed a statistically significant sample of all SO comments and found that more than 75% of the comments are useful. Indeed, the number of comments is much higher than the number of answers in SO (72 million vs 29 million as of 2020). Zhang et al.
[108] further emphasized: “The amount of information in comments cannot be neglected, with 23% of the answers having a commenting-thread that is longer than their actual answer.” These positive findings from Zhang et al. [108] highlight that most of the comments in SO are informative and not noisy, and thus could be used to assist developers.
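Liu's opinion quintuple, introduced earlier in this section, can be modeled as a small data structure. The sketch below is our own illustration (class and field names, the holder, and the timestamp are hypothetical, not from any cited implementation), instantiated for comment C1 of Figure 2:

```java
// Minimal model of Liu's opinion quintuple (e_i, a_ij, s_ijkl, h_k, t_l).
// Names and the example values are illustrative only.
public class Opinion {
    public final String entity;     // e_i, e.g., the code example under review
    public final String aspect;     // a_ij, e.g., "bug"
    public final String sentiment;  // s_ijkl: "positive" or "negative"
    public final String holder;     // h_k, the commenting developer
    public final String time;       // t_l, when the opinion was expressed

    public Opinion(String entity, String aspect, String sentiment,
                   String holder, String time) {
        this.entity = entity;
        this.aspect = aspect;
        this.sentiment = sentiment;
        this.holder = holder;
        this.time = time;
    }

    public static void main(String[] args) {
        // Comment C1 of Figure 2: "The code is buggy." -> negative, aspect "bug".
        Opinion c1 = new Opinion("code example", "bug", "negative",
                                 "commenter-1", "unknown");
        System.out.println(c1.sentiment + " opinion on aspect '" + c1.aspect + "'");
    }
}
```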
Our API documentation framework in Opiner currently supports our two proposed algorithms and a Javadoc-style presentation of the mined API usage scenarios. In the Javadoc of an API, individual pages are created to document each type of the API. A type in a Java API can be a class, an annotation, or an interface. The Javadoc-style presentation in
[Figure 3 overview: a Mining Component (Uddin et al., IST 2020) takes forum posts and an API database as input; it parses texts in answers and comments to answers, parses code snippets in answer posts, detects opinionated sentences in comments, links code snippets to APIs, generates task descriptions, and links opinions to code snippets, yielding mined API usage scenarios. A Documentation Component then produces the Statistical and Concept-Based documentation (the proposed new documentation algorithms) and the Type-Based documentation (a Javadoc-based extension), and the produced documentation is deployed in Opiner.]
Fig. 3. The major components in our proposed crowd-sourced API documentation framework
Opiner is called Type-based documentation, because we cluster all usage scenarios associated with an API type under the type. For Java APIs, Javadocs are among the most commonly known and used documentation formats, as noted in a number of old and recent previous research efforts [53, 75]. As such, previous research on automatic API documentation has proposed to augment the Javadocs of an API type with code examples and relevant insights (e.g., specific conditions of usage) from SO [84, 89]. However, previous research finds that a Javadoc-style hierarchical documentation format is not a useful presentation format [99]. We thus innovate by proposing two novel API documentation algorithms, which are different from the type-based documentation format.
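The type-based clustering just described amounts to grouping each mined scenario under the API type it exercises. A minimal sketch (the scenario identifiers and the scenario-to-type mapping are hypothetical toy data, not Opiner's internal representation):

```java
import java.util.*;
import java.util.stream.*;

// Sketch of Type-based documentation: cluster usage scenarios under the API
// type (e.g., class) they exercise. Input maps a scenario id to a type name.
public class TypeBasedDoc {
    public static Map<String, List<String>> groupByType(Map<String, String> scenarioToType) {
        return scenarioToType.entrySet().stream()
                .collect(Collectors.groupingBy(
                        Map.Entry::getValue,                 // cluster key: the API type
                        TreeMap::new,                        // keep types in sorted order
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    }

    public static void main(String[] args) {
        Map<String, String> s = new LinkedHashMap<>();
        s.put("scenario-1", "com.google.gson.Gson");
        s.put("scenario-2", "com.google.gson.Gson");
        s.put("scenario-3", "com.google.gson.reflect.TypeToken");
        System.out.println(groupByType(s));
    }
}
```

Each resulting cluster would then become one page of the Type-based documentation, mirroring Javadoc's one-page-per-type layout.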
The input to our API documentation framework is a list of forum posts and an API database. The final outputs are API documentation based on the input. The framework consists of two major components (see Figure 3):

(1) Mining Component. Takes as input the forum posts and an API database. The output is a list of API usage scenarios. Each scenario consists of a code snippet, a textual description, and reviews towards the snippet.

(2) Documentation Component. Takes as input the mined usage scenarios of an API and produces three types of documentation. The first two are based on our two proposed documentation algorithms (statistical and concept-based). The third is type-based documentation, which is developed as an adaptation of Javadoc.

Our API database consists of (1) all the Java APIs collected from the online Maven Central repository [78], and (2) all the Java official APIs from JavaSE (6-8) and EE (6 and 7). We consider a binary file (e.g., a jar) of a project from Maven as an API. In Table 2, we show the summary statistics of our API database.
Table 2. Descriptive statistics of the Java APIs in the database
API Module Version Link
Given as input a forum post, we first preprocess the post contents and then mine API usage scenarios from the parsed contents. The techniques supporting both steps were previously published in Uddin et al. [98]. We thus briefly describe the steps below and leave the details to [98].

Given as input an answer to a question in SO, we divide its contents into three parts: (1) code snippets in the answer, which we detect as the tokens wrapped in the <code> tag; (2) textual contents in the answer; and (3) textual contents in the comments to the answer. The textual contents are tokenized into sentences. Opinionated sentences are detected as sentences having positive or negative polarity. To detect opinionated sentences, we use the OpinerDSO algorithm [97], which offered performance comparable to state-of-the-art sentiment detection tools for software engineering, e.g., Senti4SD [9]. Nevertheless, the Opiner framework is flexible enough to replace OpinerDSO with any other sentiment detector. During the parsing of code snippets, we discard non-code and non-Java snippets (e.g., XML, Javascript) using language-specific naming conventions (similar to Dagenais and Robillard [18]). We parse a valid code example to identify API elements (types, methods, interfaces). We consult our API database to infer the FQN (Fully Qualified Name) of the API elements. Given as input a parsed code example and the textual contents of the post where the code example is found, we produce an API usage scenario using three algorithms as follows.

First, we heuristically link a code snippet to an API name mentioned in the textual contents of the forum post where the code snippet is discussed, by consulting the textual contents and the API elements found in the code snippet. For example, in Figure 2, the code snippet is linked to the Google Gson API. A state-of-the-art algorithm like Baker [84] is not designed to link a code example to an API name mentioned in the textual contents of a forum post.
For example, for the code snippet in Figure 2, Baker [84] links it to three APIs (java.util, java.lang, and Google GSON). However, the code snippet is provided to explain the conversion of JSON data to a JSON object using the GSON API, as mentioned in the textual contents. The code snippet also has the largest number of API classes and methods matched with those found in the Gson API. Second, we produce a textual description of the underlying task addressed by the code snippet. The algorithm does this by picking sentences from the textual contents of the forum post where the code snippet is discussed and where the API linked to the code snippet is referred to. For example, in Figure 2, the following sentence is picked into (among others) the description: “Google Gson supports generics and nested beans . . . ”, but not this sentence: “if you don’t need object deserialization, . . . you can try org.json”. The description is produced by combining beam search [61] with the TextRank algorithm [43]. Third, we associate positive and negative opinions relevant to the code example by analyzing the comments to the post. The algorithm does this by looking for references in the comments to the API that is linked to the code snippet. For example, in Figure 2, all opinionated sentences from comments C1 and C2 are linked to the code snippet. Each algorithm shows a precision and recall of over 0.8 in multiple evaluation settings. Each algorithm outperforms the state-of-the-art baselines (e.g., Baker [84]). For further details we refer to the paper [98].
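The first, linking step can be approximated by an element-overlap heuristic: among the candidate APIs mentioned in the post text, pick the one whose known types and methods overlap most with the identifiers in the snippet. The sketch below is our simplified illustration of that idea, not the published Opiner algorithm; the candidate API names and element sets are toy data:

```java
import java.util.*;

// Simplified linking heuristic: score each candidate API by how many of its
// known elements (types/methods) appear among the snippet's identifiers, and
// pick the highest-scoring candidate.
public class ApiLinker {
    public static String link(Set<String> snippetIdentifiers,
                              Map<String, Set<String>> candidateApiElements) {
        String best = null;
        int bestScore = -1;
        for (Map.Entry<String, Set<String>> e : candidateApiElements.entrySet()) {
            Set<String> overlap = new HashSet<>(e.getValue());
            overlap.retainAll(snippetIdentifiers);      // elements seen in the snippet
            if (overlap.size() > bestScore) {
                bestScore = overlap.size();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Identifiers from the Figure 2 snippet, plus two candidate APIs.
        Set<String> snippet = Set.of("Gson", "TypeToken", "fromJson", "List");
        Map<String, Set<String>> apis = new LinkedHashMap<>();
        apis.put("com.google.gson", Set.of("Gson", "TypeToken", "fromJson"));
        apis.put("org.json", Set.of("JSONObject", "JSONArray"));
        System.out.println(link(snippet, apis));  // prints com.google.gson
    }
}
```

The real pipeline additionally weighs mentions of the API in the surrounding text, which this sketch omits.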
Given as input the mined usage scenarios of an API, we produce three types of documentation: two using our two proposed novel documentation algorithms (statistical and concept-based) and the third based on an adaptation of the Javadoc presentation format. We discuss the three documentation types below.
Algorithm 1. Statistical Documentation
We produce statistical documentation of the mined API usage scenarios to offer a visualized overview of the usage of an API based on the mined usage scenarios. This documentation can thus complement the front/introductory page of an API's documentation by offering visualized statistics of the API usage. In addition, this type of documentation can also offer a quick overview of the underlying quality of the code examples (described below).

To complement the front page of API documentation, we produce three types of visualizations:
(1) Sentiment Overview. The overall sentiments in the reactions to the usage scenarios of the API.
(2) Co-Used APIs. The usage of other APIs in the same code examples of the API.
(3) Co-Used API Types. How the various types (e.g., classes) of an API are often used together.

In addition, we provide an overview of the quality of each API usage scenario as follows:
(4) Star Rating of API Usage Scenario. Following previous research and the adoption of 5-star ratings in online product reviews [37] and API review summarization [95, 96], we show the overall rating of each API usage scenario by analyzing the positive and negative opinions related to a code example.
(5) Star Rating of API Type. The overall star rating of an API type based on the usage scenarios of the API where the type was used.

We describe the approaches below.

(1) Sentiment Overview. In our previous work on API review summarization [95], we observed that developers reported increased awareness of an API when seeing the visualized overview of the API reviews from developer forums. We thus offer two types of overview of the quality of all code examples of an API. The first is a simple pie chart showing the overall counts of all the positive and negative reactions to the usage scenarios linked to an API. We do this by aggregating all the positive and negative opinions across all the mined usage scenarios of an API. The second is a time series of the aggregated sentiments.
We do this as follows. (a) We find the first and last month-year of creation dates among the usage scenarios of an API. The creation date of a usage scenario is the time when the post containing the usage scenario was created. (b) For each month-year, we create two bins: positive and negative. The positive bin contains the count of all positive opinions that the posts created in that month-year received due to the code examples discussed in those posts and included in our API usage scenarios. (c) We create two time series, one for positive and another for negative polarities. Each time-series chart has month-year on the X-axis and the count of positive/negative polarity for that month-year on the Y-axis. While the pie chart offers the overall sentiments of developers towards the usage scenarios of an API, the time-series chart shows how the sentiments changed over time. For example, an API may have highly positive reviews when it was first created, but start to get more negative reviews over time due to obsolete code examples.

(2) Co-Used APIs. For a given API, we show how many other APIs were used in the code examples where the API was used. Because each API offers features focusing on specific development scenarios, such insights can be helpful to know which other APIs besides the given API a developer needs to learn to be able to properly utilize the API for an
Manuscript submitted to ACM

(4) Star Rating of API Usage Scenario. The overall star rating of a usage scenario is computed from the positive and negative opinions towards its code example as:

Star Rating = 5 × |Positives| / (|Positives| + |Negatives|)   (1)

(5) Star Rating of API Type. We compute the overall five-star rating (using Equation 1) of each type by taking into account all the positive and negative reactions towards the usage scenarios grouped under the type.

Algorithm 2. Concept-Based Documentation
The input to the concept-based documentation algorithm is a list of all usage scenarios associated with an API. The output is a list of concepts, where each concept contains a list of API usage scenarios that are similar to each other based on the development tasks they implement. We propose concept-based documentation to present the mined API usage scenarios by grouping the scenarios around conceptually similar tasks. A concept in our documentation algorithm is a cluster of API usage scenarios that offer similar features or that are situationally relevant. Two API usage scenarios are similar if the code examples use the same API elements (e.g., classes) and they have similar inputs and outputs. For example, if two code examples using the GSON API offer conversion of JSON objects to Java objects, they have similar inputs (e.g., a JSON object) and similar outputs (e.g., Java objects). As such, we cluster the two code examples into one concept. Two API usage scenarios are situationally relevant if the code examples have similar inputs but produce different outputs, and vice versa. For example, two code examples using GSON are situationally relevant if both take as input a JSON object, but one offers conversion of the JSON object to Java objects and the other to XML objects.

Given that the code examples in a concept should have similar inputs, the code examples should then use similar API elements. That means the code examples in a concept should exhibit similar usage patterns involving similar API elements. Therefore, given as input all the usage scenarios of an API, we first identify frequent itemsets as types (e.g., classes) of an API that are found to be used frequently together. A set of code examples showing similar usage patterns can be similar if they are clones of each other or if they are conceptually relevant (e.g., similar input or similar output). Therefore, our approach has two major steps:

(1) Detect Usage Patterns.
We detect API types as itemsets that are frequently used together in the scenarios. We then assign usage scenarios to the patterns.
(2) Detect Concepts. We create clusters of API usage scenarios in the detected usage patterns that are similar to each other based on inputs and/or outputs. Each cluster is denoted as a ‘concept’.
[Figure 4 here. The figure shows four Jackson code snippets (S1–S4); the Jackson types used per snippet (T1. ObjectMapper, T2. XmlMapper, T3. ObjectWriter, T4. JsonGenerator, T5. SerializerProvider, T6. JsonProcessingException, T7. JsonSerializer); the itemsets of API types per code snippet (S1: T1; S2: T1,T2; S3: T1,T3; S4: T4,T5,T6,T7; S5: T1,T2; S6: T4,T5,T6; S7: T1,T3; S8: T4,T5,T6); and the frequent itemsets with minimum support 2 (T1: 4; T4,T5,T6: 3; T4,T5: 3; T4,T6: 3; T5,T6: 3; T4/T5/T6: 3 each; T1,T2: 2; T1,T3: 2; T2/T3: 2 each), grouped into Concepts 1–3 via frequent usage pattern detection and concept detection.]

Fig. 4. Examples on how usage patterns and concepts are detected in the concept-based documentation of an API
We provide visualized examples of the two steps in Figure 4 and describe the steps below.

Step 1. Detect Usage Patterns. Given that similar code examples in our concepts should use similar API elements, we first need to identify patterns of API elements in the mined usage scenarios of an API that are frequently used together. To detect usage patterns, we use frequent itemset mining [51]. Frequent itemset mining has been used to summarize product features [32] and to find useful usage patterns in software repositories [7, 113].

Using frequent itemset mining, we cluster code examples that use the same types of a given API, even if a code example uses more than one API. For example, if a code example is provided to convert a JSON string to a Java object using the GSON API, the JSON string can come from a file, from a web service, or from any other source. The code example thus can show the reading of the JSON string from those sources using APIs other than GSON. A clone detection technique may consider such code examples as not clones, because one code example may use the java.util API to read a JSON string from a file while another may use the HttpClient API to read the JSON string from a web server. However, both code examples use the same GSON API types. As such, both will be clustered under one concept using our frequent itemset mining approach.

In Figure 4, the left column shows examples of detecting usage patterns. The right column shows four of the code snippets (S1-S4) used in the examples of the left column. All the code snippets in Figure 4 are associated with the Jackson API for JSON parsing. Given as input all the mined API usage scenarios of an API, our pattern detection involves two steps: (1) Collection of API types from the code examples, and (2) Generation of frequent sets of API types.
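The frequent-itemset step can be illustrated on the Figure 4 data. The paper uses Borgelt's FPGrowth C implementation; as a minimal sketch we instead count every subset of each (small) per-snippet type set, which yields the same frequent itemsets on such toy input:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative frequent-itemset mining over the per-snippet API type sets.
public class UsagePatterns {

    // All non-empty subsets of a small item set, each as a sorted list.
    static List<List<String>> subsets(Set<String> items) {
        List<String> sorted = new ArrayList<>(new TreeSet<>(items));
        List<List<String>> out = new ArrayList<>();
        int n = sorted.size();
        for (int mask = 1; mask < (1 << n); mask++) {
            List<String> sub = new ArrayList<>();
            for (int i = 0; i < n; i++)
                if ((mask & (1 << i)) != 0) sub.add(sorted.get(i));
            out.add(sub);
        }
        return out;
    }

    // Itemsets occurring in at least minSupport transactions.
    static Map<List<String>, Integer> frequentItemsets(
            List<Set<String>> transactions, int minSupport) {
        Map<List<String>, Integer> counts = new HashMap<>();
        for (Set<String> t : transactions)
            for (List<String> sub : subsets(t))
                counts.merge(sub, 1, Integer::sum);
        counts.values().removeIf(c -> c < minSupport);
        return counts;
    }

    public static void main(String[] args) {
        // Per-snippet Jackson type itemsets collected in Figure 4 (S1..S8).
        List<Set<String>> snippets = List.of(
            Set.of("T1"), Set.of("T1", "T2"), Set.of("T1", "T3"),
            Set.of("T4", "T5", "T6", "T7"), Set.of("T1", "T2"),
            Set.of("T4", "T5", "T6"), Set.of("T1", "T3"),
            Set.of("T4", "T5", "T6"));
        Map<List<String>, Integer> frequent = frequentItemsets(snippets, 2);
        System.out.println(frequent.get(List.of("T4", "T5", "T6"))); // 3 (S4, S6, S8)
        System.out.println(frequent.get(List.of("T1", "T2")));       // 2 (S2, S5)
    }
}
```

Full subset enumeration is exponential in itemset size and only workable here because each snippet uses a handful of types; FPGrowth avoids this blow-up on real data.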
First, we collect the types of the given API used in each code example. For the code snippet S1 in Figure 4, only one type (ObjectMapper) is collected, because this is the only type from the Jackson API in S1. For the code snippet S2, we find two types from the Jackson API (ObjectMapper and XmlMapper). Similarly, given that we focus on clustering types of a given API (Jackson in Figure 4), we ignore types that come from other APIs or that are local, such as Registration in S1 and foo and String in S2. Registration and foo are ignored because they are local classes, i.e., they are not provided by the Jackson API. The class String is ignored because it is provided by the java.lang package. Please note that our linking of a code example to an API name mentioned in the forum text considers all the code elements from all APIs in a code example (method, class, interface, etc.); we describe the linking algorithm in our paper [98]. While producing concepts by taking as input all code examples linked to an API, we focus on the types (class, interface) of the given API in the code examples, because we focus on clustering code examples implementing similar features of the API, and previous studies find that API types are more informative than API methods for analyzing API features [15].

Second, we create a list of itemsets by collecting API types as explained above. Thus, for the eight code examples in Figure 4 (left column), we have eight itemsets. We then apply frequent itemset mining [51] on the lists using the FPGrowth C implementation developed by Borgelt [8]. The output is a list of frequent itemsets and a support value for each itemset. For example, the second frequent itemset in Figure 4 (ID 2) is {T4, T5, T6} with support 3, because it is found in three code snippets (S4, S6, S8). Each frequent itemset is considered a pattern. For example, in Figure 4 we have eight patterns that were found in at least two code examples. We assign a code snippet to a pattern by computing the similarity between a code snippet (S) and a pattern (P) as follows:

Similarity = |Types(S) ∩ Types(P)| / |Types(S)|   (2)

We assign a code example to the pattern with the highest similarity.
For example, all three code examples (S4, S6, S8) are assigned to the pattern {T4, T5, T6} (ID 2) in Figure 4. If more than one pattern is found with the maximum similarity value, we assign the code snippet to the pattern with the maximum support. Therefore, the output of this step is a matrix P × S, where P stands for a usage pattern and S stands for a set of usage scenarios assigned to the pattern.

More than one API usage scenario can belong to one concept. For example, all the code examples related to the frequent itemsets (T4, T5), (T4, T6), and (T5, T6) will be grouped under the super itemset (T4, T5, T6) in Figure 4. We do this to ensure that a concept can contain all the different use cases (i.e., scenarios) that use the API types under the concept. The current implementation of concepts in our algorithm assigns each API usage scenario to one concept only, i.e., it is a ‘hard’ assignment. Intuitively, an API usage scenario can be similar to API usage scenarios assigned to more than one concept, e.g., due to situational relevance. We leave the creation and analysis of such ‘soft’ assignment of API usage scenarios to concepts as future work. Given that all such relevant code examples are grouped under a concept, a developer using our produced API documentation would only need to look at one or two of the grouped API usage scenarios in the concept to get a concise but complete insight into the overall API usage addressed by the concept. Another way to produce such concepts would be to use closed frequent itemsets [51]. However, closed frequent itemset mining only returns the super (i.e., closed) itemsets. Therefore, while our concept detection approach borrows the idea of finding super itemsets from closed frequent itemset mining, we leverage standard frequent itemset mining to also identify all the frequent itemsets under each closed itemset.
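Equation 2 and the assignment rule (highest similarity, ties broken by the pattern's support) can be sketched as below; the class names are illustrative:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of Equation 2 and the snippet-to-pattern assignment.
public class PatternAssignment {

    record Pattern(Set<String> types, int support) {}

    // Equation 2: fraction of the snippet's API types covered by the pattern.
    static double similarity(Set<String> snippetTypes, Pattern p) {
        Set<String> common = new HashSet<>(snippetTypes);
        common.retainAll(p.types());                       // Types(S) ∩ Types(P)
        return (double) common.size() / snippetTypes.size();
    }

    // Pick the most similar pattern; break ties by maximum support.
    static Pattern assign(Set<String> snippetTypes, List<Pattern> patterns) {
        Pattern best = null;
        double bestSim = -1;
        for (Pattern p : patterns) {
            double sim = similarity(snippetTypes, p);
            if (sim > bestSim
                    || (sim == bestSim && best != null && p.support() > best.support())) {
                best = p;
                bestSim = sim;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Patterns from Figure 4, with their supports.
        Pattern p2 = new Pattern(Set.of("T4", "T5", "T6"), 3);
        Pattern p45 = new Pattern(Set.of("T4", "T5"), 3);
        // Snippet S4 uses {T4, T5, T6, T7}: similarity 3/4 vs 2/4, so ID 2 wins.
        Set<String> s4 = Set.of("T4", "T5", "T6", "T7");
        System.out.println(assign(s4, List.of(p45, p2)) == p2); // true
    }
}
```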
The output from this step is a matrix of patterns vs. mined API usage scenarios, P × S, where P stands for a pattern and S contains a list of code examples belonging to the pattern. The code examples in a pattern of an API contain a set of commonly co-used types of the API.

Step 2. Detect Concepts. In this step, we further analyze the patterns identified in the previous step to determine whether two or more patterns can be connected with each other. Intuitively, a connection between two patterns can be established if code examples in the two patterns share similar inputs/outputs. For example, suppose one pattern contains code examples related to the establishment of an HTTP connection using the HttpClient API and another pattern contains code examples related to the sending/receiving of messages over an HTTP connection. The two patterns can be connected, because the output (i.e., an established HTTP connection) from pattern 1 (i.e., the code examples in pattern 1) is used as input in pattern 2. If we find such connected patterns, we group those patterns together into a concept. We detect concepts as follows.

First, given all code examples under a pattern, we apply clone detection to find code examples that are similar to each other. This helps us create intermediate sub-groups of code examples in a pattern, where the code examples in a sub-group are clones of each other. We then compare the inputs/outputs between two patterns by comparing the inputs/outputs of the sub-groups found in the two patterns. The formation of sub-groups thus reduces the number of input/output comparisons between patterns. This step is important, because otherwise we would be left with an exponential number of pattern combinations to analyze. For clone detection, we use NiCad [73], a widely used, state-of-the-art clone detection tool in the software engineering literature.
Specifically, we use NiCad 3, which detects near-miss clones. We detect inputs to a code example by analyzing the inputs taken by methods in the code example, where the inputs are generated by another method in the code example. We detect outputs from a code example as the outputs from the methods, where the outputs are not fed into other method(s) in the same code example.

As a demonstration of the above process, consider Figure 4 (right column): the first code snippet (S1) belongs to the pattern with ID 1 (left column) and the second code snippet (S2) belongs to the pattern with ID 7 (left column). S1 uses one type of the Jackson API (ObjectMapper); S2 uses both ObjectMapper and XmlMapper. Both code snippets use the readValue method of the ObjectMapper to convert a JSON string into a Java class. Code snippet S2 further attempts to produce an XML string out of the generated Java class. It does that by using the XmlMapper class of the same Jackson API and by taking as input the generated Java class from ObjectMapper. Therefore, we say Patterns 1 and 7 are situationally relevant, because Pattern 1 always needs to be executed before Pattern 7. To ensure that we establish the relationship between patterns correctly, we employ program slicing. For example, the third code snippet (S3) in Figure 4 (right column) belongs to the pattern with ID 8. S3 also uses the ObjectMapper class of the Jackson API. However, S3 does not use the readValue method of ObjectMapper. Instead, S3 creates a string out of an httpresponse and assigns that to another class of the Jackson API (ObjectWriter) for pretty printing. We thus do not create an edge between Pattern 1 and Pattern 8. Similarly, all the code examples (S4, S6, S8) belong to another concept.

Once a set of patterns is grouped together into a concept, we assign a numeric ID to the concept. The output of this step is a matrix C × S, where C stands for a concept and S denotes a list of API usage scenarios assigned to the concept. On the Opiner website, we present each concept as a list of four items: {R, S, O, T}. Here R corresponds to a scenario determined as representative of all the usage scenarios belonging to the concept (discussed below), S corresponds to the rest of the usage scenarios in the bucket that we wrap under a ‘See Also’ sub-list, O is the overall star rating for the bucket (discussed below), and T is the title of the concept. We describe the fields below.

Following [86], we set a minimum of 60% similarity and a minimum block size of five lines in the code example.
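As an illustration only, the four presented fields of a concept could be modeled as a simple record; the field and type names below are assumptions, not Opiner's actual schema, and the rating is assumed to reuse the five-star ratio applied to individual scenarios:

```java
import java.util.List;

// Illustrative model of a concept as presented on the Opiner site:
// representative scenario R, "See Also" scenarios S, overall rating O, title T.
public class ConceptPresentation {

    record Scenario(String codeExample, String description,
                    int positives, int negatives) {}

    record Concept(Scenario representative, List<Scenario> seeAlso,
                   double overallRating, String title) {}

    // Aggregate rating over every scenario grouped under the concept.
    static double overallRating(List<Scenario> scenarios) {
        int pos = 0, neg = 0;
        for (Scenario s : scenarios) {
            pos += s.positives();
            neg += s.negatives();
        }
        return (pos + neg) == 0 ? 0.0 : 5.0 * pos / (pos + neg);
    }

    public static void main(String[] args) {
        Scenario a = new Scenario("...", "Convert JSON to Java object", 6, 2);
        Scenario b = new Scenario("...", "Convert JSON string to POJO", 2, 0);
        Concept c = new Concept(a, List.of(b), overallRating(List.of(a, b)),
                "JSON to Java object conversion");
        System.out.println(c.overallRating()); // 4.0
    }
}
```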
Javadoc Adaptation: Type-Based Documentation
We generate the type-based documentation of an API by grouping the scenarios of the API based on the API type as follows. (1) We identify all the types in a code example associated with the API. (2) We create a bucket for each type and put each code example in the bucket(s) where the type was found. We present each bucket as a list sorted by time (most recent at the top). Our type-based documentation approach is similar to Baker [84], which proposed to annotate the Javadoc of an API class using all the code examples in SO where the API class was found. For example, based on Figure 2, the code example would be put under two classes of the Google GSON API (Gson and TypeToken), one class of the java.util API (List), and one class of the java.lang API (Type). The class ‘Data’ is ignored, because it is a locally declared class in the code example.
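A minimal sketch of this type-based bucketing, with illustrative names (not Baker's or Opiner's implementation):

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative sketch: each code example is placed under every API type it
// uses, and each bucket is sorted most-recent first.
public class TypeBasedDocs {

    record Example(String id, LocalDate created, Set<String> apiTypes) {}

    static Map<String, List<Example>> buckets(List<Example> examples) {
        Map<String, List<Example>> byType = new TreeMap<>();
        for (Example e : examples)
            for (String type : e.apiTypes())   // one bucket per API type used
                byType.computeIfAbsent(type, k -> new ArrayList<>()).add(e);
        for (List<Example> bucket : byType.values())
            bucket.sort(Comparator.comparing(Example::created).reversed());
        return byType;
    }

    public static void main(String[] args) {
        List<Example> examples = List.of(
            new Example("e1", LocalDate.of(2013, 5, 1), Set.of("Gson", "TypeToken")),
            new Example("e2", LocalDate.of(2016, 2, 9), Set.of("Gson")));
        Map<String, List<Example>> docs = buckets(examples);
        System.out.println(docs.get("Gson").get(0).id()); // e2 (most recent first)
    }
}
```

A code example using several types of the API appears in several buckets, mirroring how the same SO snippet can annotate the Javadoc of multiple classes.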
We implemented the API usage scenario documentation algorithms by extending Opiner [95]. We focused on the following two requirements during the extension:

(1) Scalability. Opiner's system should be able to process millions of posts from developer forums.
(2) Efficiency. The processing should be completed in a short time, ideally in hours.

Opiner's system architecture is able to process millions of posts. The proposed documentation algorithms are supported in Opiner by four major components (see Figure 5):

(1) Database: A hybrid data storage component to manipulate data in diverse formats.
(2) Application Engine: Handles the loading and processing of the data from forum posts and hosts all the algorithms to mine and summarize usage scenarios about APIs.
(3) REST Web Server: The middleware between the Application Engine and the Website that supports two types of REST APIs: (a) Search: finds an API with auto-completion support; (b) Documentation: provides the documentation and visualization produced by the three algorithms.
(4) Website: The web-based user interface to host the search and documentation results for API usage scenarios.

The decoupling of the ‘Application Engine’ from the other components allows it to run independently of the website. Thus the ‘Application Engine’ can run offline, even while the Website is being used. The ‘Application Engine’ is designed to load and preprocess each SO thread in parallel. The documentation algorithms are applied on the preprocessed post contents. Our profiling of the ‘Application Engine’ shows that the preprocessing of the contents takes almost 80% of all
the time. This is because this step handles the creation of the meta information of posts based on preprocessing, such as the parsing of code snippets and the linking of code snippets to API mentions. The documentation component takes less than 10% of the time. Therefore, once API usage scenarios are mined and properly preprocessed, the generation of the documentation using the proposed algorithms takes minimal time in Opiner.

[Figure 5 here: Developer Forums and Maven Central feed the Opiner Database and Application Engine; a REST-based web server exposes Search, Documentation, and Code Visualization APIs to the Opiner Website.]

Fig. 5. The system architecture in Opiner to support the proposed API usage scenario documentation algorithms
We applied the algorithms on a dataset consisting of 3,048 threads from SO that contain a total of 22.7K posts (questions, answers, and comments). The Opiner online website currently indexes the mined and documented usage scenarios of the APIs found in this dataset. The threads in the dataset are tagged ‘Java+JSON’, i.e., the posts discuss JSON-based tasks using Java APIs. This dataset was previously used in [95] to summarize reviews about APIs and was found to offer a rich set of competing APIs with diverse API usage discussions. As such, we expect to see code examples discussing Java APIs for JSON parsing in the posts. This dataset contains numerous usage scenarios from multiple competing APIs supporting JSON-based manipulation in Java (e.g., in REST-based architectures, microservices, etc.). JSON-based techniques can be used to support diverse development scenarios, such as serialization of disk-based and networked files, lightweight communication between servers and clients and among interconnected software modules, a messaging format over HTTP as a replacement for XML, encryption techniques, and on-the-fly conversion of language-based objects to JSON formats.

We parsed the dataset to collect all the code examples. There are 8,596 valid code snippets (e.g., Java code) and 4,826 invalid code snippets (e.g., XML blocks). In Table 3 we show descriptive statistics of the dataset. A total of around 15K API names are mentioned in the textual contents of the posts. On average, each valid snippet contained 7.9 lines. The last column, “Users”, in Table 3 shows the total number of distinct users that posted at least one answer/comment/question. On average, around four users participated in one thread. More than one user participated in 2,940 threads (96.4%). A maximum of 56 distinct users participated in one thread [48].

The 8,596 code examples are associated with 175 distinct APIs using the code example to API mention linking algorithm from Section 3.1.
Table 4 presents the distribution of the APIs by the code examples. The majority (60%) of the code examples are linked to the top five APIs.
Table 3. Descriptive statistics of the dataset used to produce API documentation in the Opiner website

Threads   Posts   Sentences   Words   Snippet Lines   APIs Mentioned in Texts   Users
Average

Table 4. Distribution of Code Snippets by APIs

        Overall                          Top 5
API     Snippet   Avg    STD     Snippet   Avg      Max    Min
175     8,596     49.1   502.7   5,196     1,039.2  1,951  88

Table 5. Distribution of Reviews in Scenarios with at least one reaction

Scenarios     Comments       Positive      Negative
w/ Reviews    Total  Avg     Total  Avg    Total  Avg
The Gson class from the Google Gson API was found in 679 of the 1,053 code examples linked to the API (i.e., 64.5%). Similarly, the JSONObject class from the org.json API was found in 1,324 of the 1,498 code examples linked to the API (i.e., 88.3%). Most of those code examples also contained other types of the APIs. Therefore, if we followed the documentation approach of Baker [84], we would have at least 1,324 code examples linked to the Javadoc of JSONObject for the org.json API, based on the parsing of our 3,048 SO threads. Among the API usage scenarios in our study dataset, we found that 1,154 scenarios contained at least one review, using our proposed algorithm to associate reviews with an API usage scenario. In Table 5, we show the distributions of comments and reviews in the 1,154 scenarios. There are a total of 7,538 comments in the corresponding posts of those scenarios, out of which 2,487 are sentences with positive polarity and 1,216 are sentences with negative polarity.
In this section, we present the results of a user study that we conducted to compare the usefulness of our two proposed documentation algorithms (statistical and concept-based) with that of the type-based documentation of API usage scenarios.
Previous research finds that a Javadoc-style hierarchical API documentation format is less useful to developers than a more practical task-centric documentation [75, 99]. As previously reported, effective developers investigate source code by looking for structural cues [69], and such exploration can involve the usage of multiple types of a single API [94]. Thus, a traditional type-based documentation may not offer the more useful task-based documentation format hypothesized by Carroll et al. [14] and later confirmed by Shull et al. [75]. We thus need to understand whether and how our proposed documentation algorithms could offer added benefits over a traditional type-based documentation format
across diverse development scenarios. Such an assessment can be formed based on inputs from software developers on the documentation produced in Opiner using the three algorithms.
We investigate how software developers rate the three documentation types (our two proposed plus type-based) in Opiner along four development scenarios. Our goal was to judge the usefulness of a given documentation as shown in Opiner (see Section 3.4). The objects were the three types of documentation produced for a given API using the algorithms, and the subjects were the participants who rated each documentation. The contexts were four development scenarios, previously used by Uddin and Khomh [95].
We recruited 29 software developers. Among the 29 participants, 18 were professional developers. The rest of the participants (11) were recruited from four universities: two in Canada (University of Saskatchewan and Polytechnique Montréal) and two in Bangladesh (Bangladesh University of Engineering & Technology and Khulna University). The 18 professional developers were recruited through the online professional social network site Freelancer.com. Sites like Amazon Mechanical Turk and Freelancer.com have been gaining popularity for conducting studies in empirical software engineering research due to the availability of efficient, knowledgeable, and experienced software engineers. In our study, we only recruited a freelancer if they had professional software development experience in Java. Among the 11 participants recruited from the universities, eight reported their profession as students and two as graduate researchers. Among the 18 freelancers, one was a business data analyst, four were team leads, and the rest were software developers. Among the 29 participants, 88.2% were actively involved in software development (94.4% among the freelancers and 81.3% among the university participants). Each participant had a background in computer science and software engineering.

The number of years of experience of the participants in software development ranged from less than one year to more than 10 years: three (all of them students) with less than one year of experience, nine between one and two, 12 between three and six, four between seven and 10, and the rest (nine) with more than 10 years of experience. Among the four participants who were not actively involved in daily development activities, one was a business analyst (a freelancer) and three were students (university participants). The business data analyst had between three and six years of development experience in Java. The diversity in participant occupations offered us insights into whether and how Opiner was useful to all participants in general.
We asked the participants to compare the documentation produced by the three algorithms under four different development scenarios (selection, documentation, presentation, and API authoring). Each task was described using a hypothetical development scenario where the participant was asked to judge the summaries through the lens of a software engineering professional. Persona-based usability studies have proven effective in both academia and industry [46]. We describe the tasks below.

(1) Selection.
Can the usage documentation help you to select this API?
The persona was a ‘Software Architect’ who was tasked with making a decision on the selection of an API given the usage documentation produced for the API in Opiner. The participants were asked to consider the following decision criteria in their answers: the documentation (C1) contained all the right information, (C2) was relevant for selection, and (C3) was usable.

(2) Documentation.
Is the produced documentation complete and readable?
The persona was a ‘Technical API Documentation Writer’ who was tasked with the writing of the documentation of an API by taking into account the usage
Table 6. Impact of the summaries (in percentages) based on the scenarios

Scenario        Rating       Type-Based   Statistical   Concept-Based
Selection       Useful         93.1         93.1          100
                Not Useful      6.9          6.9            0
Documentation   Useful         93.1         79.3          100
                Not Useful      6.9         20.7            0
Presentation    Useful         93.1         96.6            —
Authoring       Useful         72.4         69.0          89.7
                Not Useful     10.3         10.3           3.4
                Neutral        17.2         20.7           6.9

summaries of the API in Opiner. The decision criteria on whether and how the different summaries in Opiner could be useful for such a task were: (C1) the completeness of the information, and (C2) the readability of the summaries.

(3) Presentation.
Can the documentation easily help you to justify your selection of the API?
The persona was a development team lead who was tasked with the creation of a presentation using the summaries in Opiner to justify the selection of an API. The decision criteria were: (C1) the conciseness of the information and (C2) the recency of the provided scenarios.

(4) Authoring.
Can the documentation easily help you to decide whether to improve an API feature? The persona was an API author who was tasked with the creation of a new API by learning the strengths and weaknesses of competing APIs using Opiner. The decision criteria were: (C1) the reactions towards code examples and (C2) the presence of diverse scenarios.

We assessed the ratings for the three tasks (Selection, Documentation, Presentation) using two metrics: useful (the documentation does not miss any info, or misses some info but is still useful) and not useful (misses all the info, so not useful at all). For the Authoring task, we define usefulness as follows: useful (fully helpful, helpful) and not useful (partially unhelpful, fully unhelpful). For the authoring scenario, we further asked the participants whether they had decided to author a new API as a competitor to the Jackson API, whose documentation they had analyzed in Opiner. The options were: Yes, No, Maybe. Jackson is the most popular Java API for JSON parsing. Each participant was asked to justify their rating for each scenario in a text box. The study was conducted using a Google form.
In Table 6, we show the percentage of the ratings for each usage documentation algorithm for the four developmentscenarios (Selection, Documentation, Presentation, and Authoring).
Observation 1.
Our two proposed documentation algorithms (Concept and Statistical) were rated as more useful than the type-based documentation (Javadoc adaptation) across all four development scenarios.

For the ‘Selection’ scenario, the incorporation of reactions as positive and negative opinions in the usage documentation was considered useful. According to one participant: “Conceptual documentation is the most useful of all. Inexperienced developers can select code snippet based on positive or negative reactions while the experienced developers can compare the code and go for the best one.”
However, just the presence of the reactions was not considered enough for the selection of an API when coding tasks were involved: “Statistical documentation just shows the negative and positive views, it is not useful to view the working examples or code. Conceptual documentation groups together the common API features, it helped me to find the common examples in same place. Type based documentation is also ok but had to dig in to find the description. All code examples provided the full code example, so it was good.”
For the ‘Documentation’ scenario, the participants appreciated the innovative presentation format of clustering usage scenarios by concepts. According to one participant: “For Documentation purpose, Conceptual documentation is more readable than other two documentation.” Similar to the observations of Carroll et al. [14] on the need to document APIs based on tasks, the participants advocated the potential of concept-based documentation, e.g., “Conceptual documentation is important to generate different kind of ideas and thoughts to complete different kind of task such as serialization, deserialization, format specification, mapping with Jackson API. This documentation is most useful to me because of the availability of resource.” The participants considered type-based documentation to be useful for specific tasks that may not involve multiple types of a given API at the same time.

For the ‘Presentation’ scenario, the participants preferred the visualized charts from the statistical documentation. Their suggested workflow was to start the presentation with the charts and then dive deeper based on insights from the concept-based documentation. According to one participant, the combination of ratings and examples is the key: “If I was the team lead, statistical documentation would help to decide me to view the users positive and negative reaction over the selection API, I would definitely take this into consideration. Conceptual documentation would help me to create a presentation based on the examples and the ratings.”
The participants asked for an extension of the statistical documentation to compare APIs by the features offered: “The statistical documentation helps to summarize the popularity of the API and other co-existing APIs. But it would be more helpful if there was comparisons among competitive APIs as well. The team lead should should this information one by one.”
For the ‘API Authoring’ scenario, when asked whether the documentation of the Jackson API in Opiner showed enough weaknesses of the API that they would like to author a competing API, 48.4% of the participants responded with ‘No’, 35.5% with ‘Yes’, and 16.1% with ‘Maybe’. The participants considered the concept-based documentation to be the most useful during such decision making, followed by the statistical documentation. According to one participant: “In order to author a new API, I would like to understand how the market of developers is reacting to the current API. More negative responses would indicate that there is dissatisfaction among the developers and there is a need to create a new API for which I would look into the negative responses in the conceptual documentation. Given that I can see that there is a positive trend for jackson Api from statistical documentation and there are no or very few negative responses from the users among the eight usage scenarios that I had selected. I feel that I would not author a new API to compete Jackson, simply because it still works and the developers are content, which I could identify from the statistical documentation.”
Observation 2.
The concept-based documentation was considered the most useful in three out of the four scenarios (Selection 100%, Documentation 100%, and Authoring 89.7%), while the statistical documentation was considered the most useful for the remaining scenario (Presentation 96.6%).
In this section, we evaluate the effectiveness of the documentation in the Opiner web site against traditional API web documentation resources, i.e., API official documentation and the developer forum (SO).
Manuscript submitted to ACM
The goal of the automatic documentation of API usage scenarios in Opiner is to assist developers in finding the right solutions to their coding tasks with greater ease than other resources. Therefore, we need to assess the effectiveness of Opiner API usage documentation in real-world coding tasks. Previous research reported that developers consider the combination of code examples and API reviews in the online developer forums as a form of API documentation [92] and preferred such documentation over API official documentation. However, the developers struggled to derive complete and concise insights due to the huge volume and scattered nature of the usage discussions in the online forums. Therefore, the usage scenario documentation in Opiner should alleviate the pains of developers in finding a complete solution to a development task without going over multiple SO posts. Previous research also showed that API documentation is often incomplete, obsolete, and incorrect [99] and that developers find it hard to learn and use an API by simply relying on API official documentation [67]. Therefore, the usage scenario documentation in Opiner should be able to compensate for such shortcomings in the official documentation of an API during their coding tasks.
We recruited 31 developers and asked them to complete four coding tasks using the documentation produced in Opiner and using API web documentation resources (i.e., baselines). At the end of the coding tasks, we asked the participants to take part in a short survey to share their experience of using Opiner over the baseline resources. The participants used the documentation produced in Opiner (see Section 3.4) for the coding tasks.
The coding tasks were completed by a total of 31 participants. 29 of the 31 participants came from our previous study evaluating the usefulness of our API documentation algorithms in Section 4. The two additional participants in this study were recruited from universities. Both of them are graduate students with more than one year of experience in software development. Among the 31 participants, 18 were professional developers. The rest of the participants (13) were recruited from four universities. Each freelancer was remunerated with $20, which was a modest sum given the volume of the work. Each participant had a background in computer science and software engineering. The survey questions were answered by 29 of the 31 participants.
The four tasks are described in Table 7. Each task was required to be completed using a pre-selected API. Thus, for the four tasks, each participant needed to use four different APIs: Jackson [24], Gson [28], XStream [100], and the Spring framework [77]. Jackson and Gson are the two most popular Java APIs for JSON parsing [81]. Spring is one of the most popular web frameworks in Java [26], and XStream is well-regarded for its efficient adherence to the JAXB principles [83], i.e., XML to JSON conversion and vice versa. All four APIs can be found in the list of the top 10 most discussed APIs in our evaluation corpus. The APIs are mature and fairly large and thus can be hard to learn. The largest API is the Spring framework 5.0.5 with a total of 3687 types (Class, Annotation, etc.), followed by Jackson 2.2 (467 types), XStream 1.4.10 (340 types), and Gson 2.8.4 (34 types). For the Jackson API, we counted the code types that are shipped with the core modules (jackson-core, databind, and annotations). The Jackson API has been growing with the addition of more modules (e.g., jackson-datatype). The four APIs have a total of 4528 types. In comparison, the entire Java SE 7 SDK has a total of 4024 code types, and the entire Java SE 8 SDK has a total of 4240 code types. We used the online official Javadocs of the APIs to collect the type information. Our online appendix contains the list.
For each coding task, we prepared four settings: SO (complete the task using only SO), DO (complete the task using only API official documentation), OP (complete the task only with the help of Opiner), and EV (complete the task using any online resources available, i.e., SO, DO, Opiner, and search engines). The above four settings help us properly analyze the diverse coding behavior of software developers using the diverse online API documentation resources available to complete a coding task. Intuitively, the EV setting is the most natural, because it allows a developer to use search engines and/or any other resources. The SO, DO, and OP settings simply restrict the access to resources. This restriction then provides insights into whether the participants are more or less effective while using only a subset of all available resources. If a user still shows performance similar to or higher than the EV setting while only using Opiner (i.e., the OP setting), that can offer increased confidence in the overall effectiveness of Opiner to support developers in their coding tasks. We follow a between-subject design [104], with four different groups (G1, G2, G3, G4) of participants, each participant completing four tasks in four different settings. Each of three groups (G1, G3, G4) had eight participants. At the end, eight participants from each of the three groups (G1, G3, G4) and seven from G2 completed the coding tasks. Each participant in a group was asked to complete the four tasks, in the order and settings shown in Table 8. To ensure that the participants used the correct development resource for a given API in a given development setting, we added the links to those resources for the API in the task description. For example, for the task TJ and the setting SO, we provided the following link to query SO using Jackson as a tag: https://stackoverflow.com/questions/tagged/jackson.
For the task TG and the setting DO, we provided the following link to the official documentation page of the Google GSON API: https://github.com/google/gson/blob/master/UserGuide.md. For the task TX and the setting OP, we provided a link to the summaries of the XStream API in the Opiner website: http://sentimin.soccerlab.polymtl.ca:38080/opinereval/code/get/xstream/. For the task TS with the setting EV, we provided three links, i.e., one from Opiner (as above), one from SO (as above), and one from the API formal documentation (as above). For the ‘EV’ setting, we added in the instructions that the participants were free to use any search engine. To avoid potential bias in the coding tasks, we enforced the homogeneity of the groups by ensuring that: (1) no group entirely contained participants that were only professional developers or only students, (2) no group entirely contained participants from a specific geographic location and/or academic institution, (3) each participant completed the tasks assigned to him independently and without consulting with others, (4) each group had the same number of four coding tasks, and (5) each group had exposure to all four development settings as part of the four development tasks. The use of balanced groups simplified and enhanced the statistical analysis of the collected data. The four tasks were picked randomly from our evaluation corpus of 22.7K SO posts. Each task was observed in SO posts more than once and was asked by more than one developer. Each task was related to the manipulation of JSON inputs using Java APIs for JSON parsing. The manipulation of JSON-based inputs is prevalent in disk-based, networked, as well as HTTP-based message, file, and object processing. Therefore, the Java APIs for JSON parsing offer many complex features to support the diverse development needs. The solution to each task spanned two posts. The two posts are from two different threads in SO. Thus, the developers could search Stack Overflow to find the solutions.
However, that would require them to search posts from multiple threads in SO. All of those tasks are common uses of the four APIs. Each post related to the tasks was viewed and upvoted more than a hundred times in SO. To ensure that each development resource was treated with equal fairness during the completion of the development
Table 7. Overview of coding tasks
Task | API | Description
TJ | Jackson | Write a method that takes as input a Java Object and serializes it to Json, using the Jackson annotation features that can handle custom names in Java object during deserialization.
TG | GSON | Write a method that takes as input a JSON string and converts it into a Java Object. The conversion should be flexible enough to handle unknown types in the JSON fields using generics.
TX | Xstream | Write a method that takes as input a XML string and converts it into a JSON object. The solution should support aliasing of the different fields in the XML string to ensure readability.
TS | Spring | Write a method that takes as input a JSON response and converts it into a Java object. The response should adhere to strict JSON character encoding (e.g., UTF-8).
Table 8. Distribution of coding tasks per group per setting. SO = SO, DO = Javadoc, OP = Opiner, EV = Everything. TJ, TG, TX, TS = Task Using Jackson, GSON, XStream, Spring, resp.

↓ Group | Setting → SO DO OP EV
G1 | TJ TG TX TS
G2 | TS TJ TG TX
G3 | TX TS TJ TG
G4 | TG TX TS TJ

tasks, we also made sure that each task could be completed using any of the development resources, i.e., the solution to each task could be found in any of the resources at a given time, without the need to rely on the other resources.
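The assignment in Table 8 forms a Latin square: each group completes every task and every setting exactly once, and each task appears exactly once per setting across groups. A minimal sketch (in Python, for illustration only) of how such a balanced assignment can be checked:

```python
# Task assignment from Table 8: rows = groups, columns = settings (SO, DO, OP, EV).
assignment = {
    "G1": ["TJ", "TG", "TX", "TS"],
    "G2": ["TS", "TJ", "TG", "TX"],
    "G3": ["TX", "TS", "TJ", "TG"],
    "G4": ["TG", "TX", "TS", "TJ"],
}

def is_latin_square(rows):
    """Every row and every column must contain each task exactly once."""
    matrix = list(rows.values())
    tasks = set(matrix[0])
    rows_ok = all(set(row) == tasks for row in matrix)
    cols_ok = all(set(col) == tasks for col in zip(*matrix))
    return rows_ok and cols_ok

print(is_latin_square(assignment))  # → True
```

This balance is what lets each task/setting combination be compared across groups without one group seeing a resource more often than another.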
A seven-page coding guide was produced to explain the study requirements (e.g., the guide for Group G1: https://goo.gl/ccJMeY). Before each participant was invited to complete the study, he had to read the entire coding guide. Each participant was encouraged to ask questions to clarify the study details before and during the study. To respond to the questions, the participants communicated with the first author over Skype. Each participant was already familiar with formal and informal documentation resources. To ensure a fair comparison of the different resources used to complete the tasks, each participant was given a brief demo of Opiner before the beginning of the study. This was done by giving them access to the Opiner web site.
The study was performed in a Google Form, where participation was by invitation only. Four versions of the form were generated, each corresponding to one group. Each group was given access to the version of the form representing the group. An offline copy of each form is provided in our online appendix [1]. The developers were asked to write the solution to each coding task in a text box reserved for the task. The developers were encouraged to use any IDE to code the solution to the tasks. Before starting each task, a participant was asked to mark down the beginning time. After completing a solution, the participant was again asked to mark the time of completion of the task. The participant was encouraged to not take any break during the completion of the task (after he marked the starting time of the task). To avoid fatigue, each participant was encouraged to take a short break between two tasks. Besides completing each coding task, each participant was also asked to assess the complexity and effort required for each task, using the NASA Task Load Index (TLX) [30], which assesses the subjective workload of subjects. After completing each task, we asked each subject to provide their
self-reported effort on the completed task through the official NASA TLX log engine at nasatlx.com. Each subject was given a login ID, an experiment ID, and task IDs, which they used to log their effort estimation for each task, under the different settings.
The main independent variable we consider is the development resource that participants use to find solutions for their coding tasks. The dependent variables are the values of the following metrics: correctness of the code solutions, and time and effort spent to code the solutions (discussed below).
(1) Correctness. To check the correctness of the solution for a coding task, we used the following process: (a) We identified the correct API elements (types, methods) used for the coding task. (b) We matched how many of those API elements were found in the coding solution and in what order. (c) We quantified the correctness of the coding solution using the following equation:

Correctness = |API Elements Found| / |API Elements Expected|   (3)

An API element can be either an API type (class, annotation) or an API method. Intuitively, a complete solution should have all the required API elements expected for the solution. We discarded the following types of solutions: (a) Duplicates. Cases where the solution of one task was copied into the solution of another task. We identified this by seeing the same solution copy-pasted for the two tasks. Whenever this happened, we discarded the second solution. (b) Links. Cases where developers only provided links to an online resource without providing a solution for the task. We discarded such solutions. (c) Wrong API. Cases where developers provided the solution using an API that was not given to them.
(2) Time. We computed the time taken to develop solutions for each task by taking the difference between the start and the end time reported for the task by the participant. Because the time spent was self-reported, it was prone to errors (some participants failed to record their time correctly). To remove erroneous entries, we discarded the following types of reported time: (a) reported times that were less than two minutes. It takes time to read the description of a task and to write it down, and it is simply not possible to do all such activities within a minute. (b) reported times that were more than 90 minutes for a given task. For example, we discarded one time that was reported as 1,410 minutes, i.e., almost 24 hours. Clearly, a participant cannot be awake for 24 hours to complete one coding task. This happened in only a few cases.
(3) Effort. We used the TLX metric values as reported by the participants. We analyzed the following five dimensions in the TLX metrics for each task under each setting: (a) Frustration Level. How annoyed versus complacent did the developer feel during the coding of the task? (b) Mental Demand. How much mental and perceptual activity was required? (c) Temporal Demand. How much time pressure did the participant feel during the coding of the solution? (d) Physical Demand. How much physical activity was required? (e) Overall Performance. How satisfied was the participant with his performance? Each dimension was reported on a 100-point range with 5-point steps. A TLX ‘effort’ score is automatically computed as a task load index by combining all the ratings provided by a participant. Because the provided TLX scores were based on the judgment of the participants, they are prone to subjective bias. Detecting outliers and removing them as noise from such ordinal data is a standard statistical process [91]. Following Tukey, we only considered values between the following two limits as valid: (a) Lower limit: First quartile − 1.5 × IQR, (b) Upper limit: Third quartile + 1.5 × IQR. Here IQR stands for ‘Inter-quartile range’, which is calculated as IQR = Q3 − Q1. Q1 and Q3 stand for the first and third quartile, respectively.
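The metric computation and data cleaning steps above can be sketched as follows (in Python; the API element lists and raw values are hypothetical illustrations, not data from the study):

```python
import statistics

def correctness(found, expected):
    """Eq. (3): fraction of expected API elements present in a solution."""
    return len(set(found) & set(expected)) / len(set(expected))

def valid_times(times, low=2, high=90):
    """Discard self-reported times under 2 minutes or over 90 minutes."""
    return [t for t in times if low <= t <= high]

def tukey_filter(values):
    """Keep values within [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if lo <= v <= hi]

# Hypothetical example: a solution expected to use 4 Jackson elements, 3 found.
expected = ["ObjectMapper", "writeValueAsString", "JsonProperty", "readValue"]
found = ["ObjectMapper", "writeValueAsString", "JsonProperty"]
print(correctness(found, expected))    # → 0.75
print(valid_times([1, 18, 35, 1410]))  # → [18, 35]
```

Note that `tukey_filter` uses the exclusive-quantile convention of Python's `statistics.quantiles`; other quartile conventions shift the fences slightly but serve the same purpose.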
The survey was conducted in a Google Doc form. We asked five questions; the first three questions were related to API official documentation and the last two were related to SO:
Table 9. Summary statistics of the correctness (scale 0-1), time (minutes) and effort (scale 0-100) spent in the coding tasks. Baselines: SO = SO, DO = Official Documentation, EV = Everything including search engine
Metric      | Opiner (OP) | Δ(SO,OP) | Δ(DO,OP) | Δ(EV,OP)
Correctness | 0.62        | ↓        | ↓        | ↓
Time        | 18.6        | ↑        | ↑        | ↑
Effort      | 45.8        | ↑        | ↑        | ↑

(1) How do the Opiner documentation offer improvements over formal documentation?
(2) How can Opiner complement formal API documentation?
(3) How would you envision for Opiner to complete the formal API documentation?
(4) How do Opiner summaries offer improvements over the SO contents for API usage?
(5) How would you envision Opiner to complement SO in your daily development activities?
The first and the fourth question had a five-point Likert scale (Strongly Agree, Agree, Neutral, Disagree, and Strongly Disagree). The other questions were answered in text boxes. A total of 135 coding solutions were provided by the participants. We discarded 14 as invalid solutions (i.e., link/wrong API). Out of the 135 reported times, we discarded 23 as spurious. 24 participants completed the TLX metrics (providing 96 entries in total), with each setting having six participants. Table 9 compares Opiner against the three baselines (API official documentation, SO, and Everything including search engines) based on three metrics: the average correctness of the provided solutions, and the average time and effort spent to complete the solutions. The columns with Δ compute the percent difference between Opiner and each baseline for each metric. For example, Δ(SO,OP) for the metric ‘Effort’ is computed as:

Δ(Effort_SO, Effort_OP) = (Effort_SO − Effort_OP) / Effort_SO   (4)

Recall that the effort calculation uses the NASA TLX software effort index.
Observation 3. Opiner outperforms each baseline for each metric. The participants coded with the maximum correctness (0.62), with the least time (18.6 minutes) and effort (45.8) per coding solution using Opiner. Among the three baselines, the ‘EV’ setting was the most useful for the developers. The EV setting contained everything (SO, API official documentation, Opiner, and search engine), i.e., the developers were able to consult all the documentation resources available on the Web. Opiner still outperformed the ‘EV’ setting; developers coded with less correctness (-13%), and spent more time (4%) and effort (16%) while using the ‘EV’ setting than Opiner.
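The percent-difference calculation of Eq. (4) can be sketched as below (in Python; the baseline effort values are hypothetical for illustration, since Table 9 reports only the Opiner averages):

```python
def percent_diff(baseline, opiner):
    """Eq. (4): relative difference of a baseline metric vs. Opiner."""
    return (baseline - opiner) / baseline

# Hypothetical effort values for illustration (not the study's data):
effort_so, effort_op = 60.0, 45.0
print(f"{percent_diff(effort_so, effort_op):.0%}")  # → 25%
```

A positive value means the baseline required more effort (or time) than Opiner; for correctness, where higher is better, a negative value means the baseline produced less correct code.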
Observation 4.
The correctness of the provided solutions was the lowest when participants used only SO. For code correctness, the difference is the maximum between Opiner and SO (35% less correct code while using SO). The difficulty developers faced in producing a correct solution just by relying on SO confirms previous findings that developers face difficulty while attempting to find the right solution from the millions of forum posts [92, 95].
[Figure: per-task comparison of the four settings (DO, EV, OP, SO) for tasks TG, TJ, TS, TX]