Automatic API Usage Scenario Documentation from Technical Q&A Sites
GIAS UDDIN,
University of Calgary, Canada
FOUTSE KHOMH,
Polytechnique Montréal, Canada
CHANCHAL K ROY,
University of Saskatchewan, Canada
The online technical Q&A site Stack Overflow (SO) is popular among developers for supporting their coding and diverse development needs. To address shortcomings in official API documentation resources, several research efforts have thus focused on augmenting official API documentation with insights (e.g., code examples) from SO. These techniques propose to add code examples and insights about an API into its official documentation. Recently, surveys of software developers find that developers in SO consider the combination of code examples and reviews about APIs as a form of API documentation, and that they consider such a combination to be more useful than official API documentation when the official resources are incomplete, ambiguous, incorrect, or outdated. Reviews are opinionated sentences with positive/negative sentiments. However, we are aware of no previous research that attempts to automatically produce API documentation from SO by considering both API code examples and reviews. In this paper, we present two novel algorithms that can be used to automatically produce API documentation from SO by combining code examples and reviews towards those examples. The first algorithm is called statistical documentation, which shows the distribution of positivity and negativity around the code examples of an API using different metrics (e.g., star ratings). The second algorithm is called concept-based documentation, which clusters similar and conceptually relevant usage scenarios. An API usage scenario contains a code example, a textual description of the underlying task addressed by the code example, and the reviews (i.e., opinions with positive and negative sentiments) from other developers towards the code example. We deployed the algorithms in Opiner, a web-based platform to aggregate information about APIs from online forums. We evaluated the algorithms by mining all Java JSON-based posts in SO and by conducting three user studies based on the documentation produced from the posts.
The first study is a survey in which we asked the participants to compare our proposed algorithms against a Javadoc-style documentation format (called Type-based documentation in Opiner). The participants were asked to compare along four development scenarios (e.g., selection, documentation). The participants preferred our two proposed algorithms over type-based documentation. In our second user study, we asked the participants to complete four coding tasks using Opiner and the official and informal API documentation resources. The participants were more effective and accurate while using Opiner. In a subsequent survey, more than 80% of participants asked for the Opiner documentation platform to be integrated into the formal API documentation to complement and improve the official API documentation.

CCS Concepts: • Software and its engineering → Software creation and management; Documentation; Search-based software engineering.

Additional Key Words and Phrases:
API, Documentation, Usage Scenario, Crowd-Sourced Developer Forum
ACM Reference Format:
Gias Uddin, Foutse Khomh, and Chanchal K Roy. 2021. Automatic API Usage Scenario Documentation from Technical Q&A Sites. 1, 1(February 2021), 43 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
Authors’ addresses: Gias Uddin, [email protected], University of Calgary, Canada; Foutse Khomh, [email protected], Polytechnique Montréal, Canada; Chanchal K Roy, University of Saskatchewan, Canada, [email protected].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

© 2021 Association for Computing Machinery.
Manuscript submitted to ACM
APIs (Application Programming Interfaces) offer interfaces to reusable software components [68]. Modern-day rapid software development is facilitated by the numerous open source APIs that are available for any given task. Such is the popularity of APIs that the number of open source repositories in GitHub is now 100 million, an exponential increase over the 67 million from only two years ago [27]. With the growing number of open source repositories and the APIs supported by those repositories, developers now face two major challenges: selecting an API amidst multiple choices and then learning how to use it properly [92, 95]. Both tasks can be facilitated by the official API resources. Unfortunately, official API documentation can be incomplete, obsolete, and incorrect [67, 71, 99], which often leaves developers no choice but to look for alternative documentation and knowledge-sharing resources [53, 92].

The advent and proliferation of online developer forums has opened up an interesting avenue for developers to look for solutions to their development tasks in the forum posts [6, 89]. Among the numerous online forums, Stack Overflow (SO) is a large online community where millions of developers ask and answer questions about their programming needs. Developers post questions in SO about diverse technical topics, such as the selection, usage, and troubleshooting of APIs. Volunteers answer those questions, or make comments, as do participants in other social forums.
To date, there are around 120 million posts, out of which 48 million are questions and answers, and the rest (72 million) are comments. Around 11 million users visit SO and add 9K new questions to the site each day [22].

The popularity and growing influence of SO has motivated a number of recent research efforts to produce API documentation automatically from SO contents, such as adding code examples and interesting textual contents about a Java API type (e.g., a class) in the Javadocs [84, 89], recommending usage examples within a given IDE [53], summarizing API reviews (i.e., opinions with positive and negative sentiments) to assist in API selection [95], and so on. In our previous surveys of 178 developers, we find that developers consider the combination of code examples and API reviews as a form of API documentation [92]. In fact, the developers consider such a combination more valuable than official API documentation when the official resources are lacking [92]. We are aware of no previous research that attempts to automatically produce API documentation from SO by combining both code examples and reviews.

In this paper, we propose a new documentation format for APIs that we can generate automatically by mining SO. The format considers both the code example of an API and relevant reviews about the code example from other developers in the forum posts. We present two novel documentation algorithms based on the code examples and reviews. The first algorithm is called Statistical Documentation, which offers visualized usage and review statistics about the code examples of an API. The second algorithm is called Concept-Based Documentation, which clusters usage scenarios of an API that are conceptually similar, e.g., one scenario consisting of creating an HTTP connection and another of sending messages over the HTTP connection. Using the two algorithms, we automatically produce the documentation of an API by mining SO. We deploy the produced documentation in Opiner [95]. Opiner was previously developed as an online prototype engine to summarize reviews about an API from online developer forums. The overarching goal of Opiner is to become a one-stop resource for crowd-sourced API documentation. In this paper, we have extended Opiner to also include our mined usage documentation of the APIs. Opiner is hosted at: http://opiner.polymtl.ca.

In Figure 1, we show screenshots of the Opiner usage documentation engine. The Opiner online website currently indexes the mined and documented usage scenarios of the APIs from a total of 3048 SO threads tagged as ‘Java+JSON’. This dataset was previously used to mine and summarize reviews about diverse Java APIs [95, 96]. As such, we expect to see code examples discussing Java APIs for JSON parsing in the posts. A developer can search for the usage documentation of an API by searching its name in Opiner - see 1 under the front page of Opiner in Figure 1. The front
[Figure 1 panels: front page of the Opiner online user interface; usage sentiment overview for the API Jackson; co-used APIs in the code examples using the API Jackson; the Statistical Documentation page for Jackson; the Concept-Based Documentation page for Jackson; a usage scenario expanded upon click; the ‘See Also’ section of each usage scenario, expandable upon click.]
Fig. 1. Screenshots of the Opiner online website with our two novel API usage scenario documentation algorithms deployed. While the Opiner online website offers other features, the screenshots here only show the extensions of Opiner that were implemented as part of the new contributions presented in this paper.

page also shows the APIs with the most usage scenarios. As shown in 2 , one of the most used APIs for JSON parsing in Java is Jackson. The circles 3 and 4 show some metrics that we developed to produce and visualize the Statistical Documentation. The circle 3 shows the overall distribution of positive and negative opinions in the forum posts where the code examples of the API Jackson were found. The circle 4 shows that the API javax.ws (blue pie) is frequently used alongside the Jackson API in the same code examples. The javax.ws API is an official Java package that is used to create RESTful services, where JSON is the primary medium of communication.

The right part of Figure 1 shows screenshots of the concept-based documentation of Jackson in Opiner. In Opiner, each concept consists of one or more similar API usage scenarios. Each usage scenario of an API consists of a code example, a textual description of the underlying task addressed by the code example, and reviews (i.e., opinionated sentences with positive/negative sentiments) of the code example as found in the comments to the post where the code example is found. Each concept is titled with the title of its most recently posted usage scenario. In circle 5 , we show the three most recent concepts for the API Jackson. The concepts are sorted by the time of their most recent usage scenarios, with the most recent concept placed at the top. Upon clicking on a concept title, we can see details of the most recent scenario in the concept, as shown in circle 6 . Each concept is given a star rating reflecting the overall sentiments towards all the usage scenarios grouped under the concept (see circle 6 ).
Other relevant usage scenarios of the concept are grouped under a ‘See Also’ section (see circle 7 ). Each usage scenario under the ‘See Also’ can be further
Table 1. Contributions and Research Advances Made in our Paper
Contribution | Summary | Research Advancement
Algorithms | We propose two novel algorithms to automatically document API usage scenarios from online developer forums: Statistical and Concept-based. | Previous related research focused mainly on linking code examples or interesting insights directly to the Javadoc of an API type [84, 89], or on complementing API official documentation using SO contents [3, 5, 11–13, 17, 20, 33–36, 42, 49, 50, 79, 85, 90, 101, 102, 107]. Our algorithms offer directions to design innovative algorithms to complement and improve API official documentation.

Techniques | We implemented and deployed the algorithms in our tool, Opiner [96]. | We are aware of no tool that can offer search and documentation features of API usage scenarios automatically collected from developer forums. The underlying documentation framework in Opiner can be further extended with new API usage documentation algorithms.

User studies | We conducted three user studies to demonstrate the effectiveness of the proposed usage documentation algorithms over the traditional API documentation approach and resources. | The positive reception of our proposed API documentation formats based on the two algorithms opens up a new research area in software engineering to design innovative techniques and tools by harnessing knowledge shared in online crowd-sourced forums. As we noted in Sections 1 and 8, existing research [11, 12, 20, 35, 36, 79, 84, 85, 89, 90] mostly focused on complementing the traditional official documentation.

explored (see circle 8 ). Each usage scenario is linked to the corresponding post in SO where the code example was found (by clicking the word ‘details’ after the description text of a scenario, as shown in 6 ).

We evaluated the usefulness of the two proposed documentation algorithms over the traditional type-based documentation approach [75]. In type-based documentation, we adopt a Javadoc style by clustering all the usage scenarios of an API type (e.g., a class) under the type name in the Opiner website. Previously, Subramanian et al.
[84] also promoted a similar documentation format for Javadocs by automatically mining all the code examples of an API from SO. Given that each usage scenario in our concept-based documentation also contains reviews and a textual task description of a code example, we added all such information to each code example in our Type-based documentation. We then recruited 29 developers (18 professional) and asked them to compare the three documentation types (i.e., Statistical, Concept-Based, and Type-Based) along four development scenarios (e.g., API selection, documentation) as originally used in [95]. The participants preferred our two proposed algorithms over type-based documentation in all the development scenarios. We conducted a second user study with 31 developers to evaluate the effectiveness of the produced documentation in Opiner for completing coding tasks. Each participant completed four coding tasks using Opiner documentation, official Javadocs, SO, and everything (i.e., including a search engine). The participants, on average, wrote more correct code, in the least amount of time, and with the least effort while using Opiner compared to the other documentation resources. In a subsequent survey, more than 80% of participants preferred the Opiner documentation over existing SO posts. More than 85% of participants asked for the Opiner documentation platform to be integrated into the formal API documentation to complement and improve the official API documentation.

In summary, we advance the state of the art by presenting two novel algorithms to automatically document API usage scenarios from online developer forums, each deployed in an online API documentation prototype tool, Opiner. We demonstrate the effectiveness of the algorithms and the tool to assist developers in their diverse development tasks using three user studies. In Table 1, we outline the major contributions of this paper.
This research borrows concepts and techniques from software engineering and opinion analysis. In this section, we present the major concepts and techniques upon which this study is founded.
In this paper, we investigated our API usage documentation techniques for both open-source and official Java APIs. As such, we analyzed SO posts tagged as “Java”, where Java APIs are mostly discussed. However, the analysis and the techniques developed can be applicable to any API. In particular, we adopt the definition of an API pioneered by Martin Fowler. An API is a “set of rules and specifications that a software program can follow to access and make use of the services and resources provided by its one or more modules” [103]. An API is identified by a name. An API consists of one or more modules. Each module can have one or more source code packages. Each package can have one or more code elements, such as classes, methods, etc. For the Java official APIs available through the Java SDKs, we consider an official Java package as an API. A similar format is adopted in the Java official documentation (e.g., the java.time package is denoted as the Java date APIs in the new JavaSE official tutorial [47]).

As shown in Figure 2, this is also how APIs are discussed and mentioned in SO. For example, three open source Java APIs are mentioned in the textual contents in Figure 2: Jackson, Google Gson, and org.json. In the code example, two packages from the official Java SDK are used along with Gson: java.util and java.lang.

An API is normally designed to support specific development needs. Each need can be implemented as a functionality in the API. Each functionality is denoted as a ‘feature’ [68]. For example, the Gson API is developed to support the processing and manipulation of JSON-based inputs in Java. One feature of the Gson API is the conversion of a JSONArray into a Java Object. As shown in Figure 2, this can be addressed by using two methods from two classes of the Gson API: getType(. . . ) and fromJson(. . . ) from the classes TypeToken and Gson, respectively.
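The name → module → package → code element hierarchy above can be sketched as a small data structure. The sketch below is purely illustrative (the module name gson-core and the element lists are invented for illustration, not taken from Gson's actual module layout or from the Opiner implementation):

```java
import java.util.*;

// Illustrative model of the API hierarchy adopted in the paper: an API name
// maps to modules, each module to packages, each package to code elements.
// All names below are toy data for illustration.
public class ApiModel {
    public static Map<String, Map<String, List<String>>> gsonExample() {
        Map<String, List<String>> packages = new LinkedHashMap<>();
        packages.put("com.google.gson", List.of("Gson", "Gson.fromJson"));
        packages.put("com.google.gson.reflect", List.of("TypeToken", "TypeToken.getType"));
        // One hypothetical module ("gson-core") holding both packages.
        Map<String, Map<String, List<String>>> api = new LinkedHashMap<>();
        api.put("gson-core", packages);
        return api;
    }

    public static void main(String[] args) {
        System.out.println(gsonExample());
    }
}
```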
Bing Liu, in his book [38], defines an opinion as: “An opinion is a quintuple ⟨e_i, a_ij, s_ijkl, h_k, t_l⟩, where e_i is the name of the entity, a_ij is an aspect of e_i, s_ijkl is the sentiment on aspect a_ij of entity e_i, h_k is the opinion holder, and t_l is the time when the opinion is expressed by h_k”. The sentiment s_ijkl is positive or negative. Both the entity (e_i) and the aspect (a_ij) represent the opinion target. An aspect of an entity can be a property or a feature supported by the entity. For example, in Figure 2 the first comment (C1) has two sentences. The first sentence is ‘The code is buggy’. This is a negative opinion about the bug aspect of the provided code example.

In this paper, we produce API documentation by combining code examples of an API with the relevant reviews towards the code examples. We use the notion of an ‘API Usage Scenario’, which is a composite of three items: a code example associated with an API, a textual description of the underlying task addressed by the code example, and a set of reviews (i.e., opinions with positive and negative sentiments) towards the code example as provided in the comments to the post where the code example is found. For example, from Figure 2, we can produce an API usage scenario based on the code snippet as follows: (1) The code snippet is provided to complete a development task involving Java to convert JSON data to a Java object using the Google Gson API. (2) A textual task description of the task by identifying relevant
How to convert JSON data to JSON object
C1. The code is buggy. In the new version of GSON, TypeToken is not public, hence you will get constructor error.
C2. Using actual version of GSON (2.2.4) it works perfectly!
C3. I found org.json a bit buggy when converting a Json Array
C4. I would recommend using the Jackson API.

Check Java JSON website for competing APIs, such as Jackson, Gson, org.json. If you don’t need object de-serialization but to simply get an attribute, you can try org.json. Google Gson supports generics and nested beans that should map to a Java collection such as List. It’s pretty simple! You have a JSON object with several properties of which groups property represents an array of nested objects of the very same type. This can be parsed with Gson the following way:
import java.util.*;
import java.lang.reflect.Type;
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;

class Data {
    private String title;
    private long id;
    private List groups;
}

Type listType = new TypeToken
Fig. 2. How APIs are discussed in SO (question, answer, and comments).

sentences, such as those immediately before the code snippet. (3) The reviews in comments C1 and C2 that are relevant to the code snippet.

Our decision to use API usage scenarios instead of simply code examples is influenced by the seminal research of Carroll et al. [14] and Shull et al. [75]. Carroll et al. [14] proposed a ‘minimal manual’ for technical documents by designing the documentation around specific tasks. In a subsequent study, Shull et al. [75] find that such a task-based documentation format is more useful than a traditional hierarchical documentation format. In our API usage scenarios, each scenario corresponds to a specific development task.

Our decision to utilize reviews from comments is based on our previous findings from surveys of 178 software developers from SO and GitHub [92]. We find that developers consider the combination of a code example in an answer post and the reviews about it from other developers in the comments as a form of API documentation. We also find that developers consider such a combination more valuable than official API documentation, because the reviews are offered by experts and are based on their real-world experience of the API usage. The usefulness of comments is confirmed with empirical evidence by two recent studies as well, published in the same year as our surveys (i.e., 2019). Ren et al. [63] exploited comments to identify ‘controversial’ answers, i.e., answers that may be potentially incorrect. They find that in those ‘controversial’ cases, comments are useful to offer a more accurate usage experience of the API. In a separate study, Zhang et al. [108] manually analyzed a statistically significant sample of all SO comments and found that more than 75% of the comments are useful. Indeed, the number of comments is much higher than the number of answers in SO (72 million vs 29 million as of 2020). Zhang et al.
[108] further emphasized: “The amount of information in comments cannot be neglected, with 23% of the answers having a commenting-thread that is longer than their actual answer.” These positive findings from Zhang et al. [108] highlight that most of the comments in SO are informative and not noisy, and thus could be used to assist developers.
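Liu's opinion quintuple, introduced earlier in this section, can be modeled as a small data structure. The sketch below is our own illustration (class and field names, the holder, and the timestamp are hypothetical, not from any cited implementation), instantiated for comment C1 of Figure 2:

```java
// Minimal model of Liu's opinion quintuple (e_i, a_ij, s_ijkl, h_k, t_l).
// Names and the example values are illustrative only.
public class Opinion {
    public final String entity;     // e_i, e.g., the code example under review
    public final String aspect;     // a_ij, e.g., "bug"
    public final String sentiment;  // s_ijkl: "positive" or "negative"
    public final String holder;     // h_k, the commenting developer
    public final String time;       // t_l, when the opinion was expressed

    public Opinion(String entity, String aspect, String sentiment,
                   String holder, String time) {
        this.entity = entity;
        this.aspect = aspect;
        this.sentiment = sentiment;
        this.holder = holder;
        this.time = time;
    }

    public static void main(String[] args) {
        // Comment C1 of Figure 2: "The code is buggy." -> negative, aspect "bug".
        Opinion c1 = new Opinion("code example", "bug", "negative",
                                 "commenter-1", "unknown");
        System.out.println(c1.sentiment + " opinion on aspect '" + c1.aspect + "'");
    }
}
```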
Our API documentation framework in Opiner currently supports our two proposed algorithms and a Javadoc-style presentation of the mined API usage scenarios. In the Javadoc of an API, individual pages are created to document each type of the API. A type in a Java API can be a class, an annotation, or an interface. The Javadoc-style presentation in
[Figure 3 overview: a Mining Component (Uddin et al., IST 2020) takes forum posts and an API database as input; it parses texts in answers and comments to answers, parses code snippets in answer posts, detects opinionated sentences in comments, links code snippets to APIs, generates task descriptions, and links opinions to code snippets, yielding mined API usage scenarios. A Documentation Component then produces the Statistical and Concept-Based documentation (the proposed new documentation algorithms) and the Type-Based documentation (a Javadoc-based extension), and the produced documentation is deployed in Opiner.]
Fig. 3. The major components in our proposed crowd-sourced API documentation framework
Opiner is called Type-based documentation, because we cluster all usage scenarios associated with an API type under the type. For Java APIs, Javadocs are among the most commonly known and used documentation formats, as noted in a number of old and recent previous research efforts [53, 75]. As such, previous research on automatic API documentation has proposed to augment the Javadocs of an API type with code examples and relevant insights (e.g., specific conditions of usage) from SO [84, 89]. However, previous research finds that a Javadoc-style hierarchical documentation format is not a useful presentation format [99]. We thus innovate by proposing two novel API documentation algorithms, which are different from the type-based documentation format.
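The type-based clustering just described amounts to grouping each mined scenario under the API type it exercises. A minimal sketch (the scenario identifiers and the scenario-to-type mapping are hypothetical toy data, not Opiner's internal representation):

```java
import java.util.*;
import java.util.stream.*;

// Sketch of Type-based documentation: cluster usage scenarios under the API
// type (e.g., class) they exercise. Input maps a scenario id to a type name.
public class TypeBasedDoc {
    public static Map<String, List<String>> groupByType(Map<String, String> scenarioToType) {
        return scenarioToType.entrySet().stream()
                .collect(Collectors.groupingBy(
                        Map.Entry::getValue,                 // cluster key: the API type
                        TreeMap::new,                        // keep types in sorted order
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    }

    public static void main(String[] args) {
        Map<String, String> s = new LinkedHashMap<>();
        s.put("scenario-1", "com.google.gson.Gson");
        s.put("scenario-2", "com.google.gson.Gson");
        s.put("scenario-3", "com.google.gson.reflect.TypeToken");
        System.out.println(groupByType(s));
    }
}
```

Each resulting cluster would then become one page of the Type-based documentation, mirroring Javadoc's one-page-per-type layout.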
The input to our API documentation framework is a list of forum posts and an API database. The final outputs are API documentation based on the input. The framework consists of two major components (see Figure 3):

(1) Mining Component. Takes as input the forum posts and an API database. The output is a list of API usage scenarios. Each scenario consists of a code snippet, a textual description, and reviews towards the snippet.

(2) Documentation Component. Takes as input the mined usage scenarios of an API and produces three types of documentation. The first two are based on our two proposed documentation algorithms (statistical and concept-based). The third is type-based documentation, which is developed as an adaptation of Javadoc.

Our API database consists of (1) all the Java APIs collected from the online Maven Central repository [78], and (2) all the Java official APIs from JavaSE (6-8) and EE (6 and 7). We consider a binary file (e.g., a jar) of a project from Maven as an API. In Table 2, we show the summary statistics of our API database.
Table 2. Descriptive statistics of the Java APIs in the database
API Module Version Link
Given as input a forum post, we first preprocess the post contents and then mine API usage scenarios from the parsed contents. The techniques supporting both steps were previously published in Uddin et al. [98]. We thus briefly describe the steps below and leave the details to [98].

Given as input an answer to a question in SO, we divide its contents into three parts: (1) code snippets in the answer, which we detect as the tokens wrapped in the <code> tag; (2) textual contents in the answer; and (3) textual contents in the comments to the answer. The textual contents are tokenized into sentences. Opinionated sentences are detected as sentences having positive or negative polarity. To detect opinionated sentences, we use the OpinerDSO algorithm [97], which offered performance comparable to state-of-the-art sentiment detection tools for software engineering, e.g., Senti4SD [9]. Nevertheless, the Opiner framework is flexible enough to replace OpinerDSO with any other sentiment detector. During the parsing of code snippets, we discard non-code and non-Java snippets (e.g., XML, Javascript) using language-specific naming conventions (similar to Dagenais and Robillard [18]). We parse a valid code example to identify API elements (types, methods, interfaces). We consult our API database to infer the FQN (Fully Qualified Name) of the API elements. Given as input a parsed code example and the textual contents of the post where the code example is found, we produce an API usage scenario using three algorithms as follows.

First, we heuristically link a code snippet to an API name mentioned in the textual contents of the forum post where the code snippet is discussed, by consulting the textual contents and the API elements found in the code snippet. For example, in Figure 2, the code snippet is linked to the Google Gson API. A state-of-the-art algorithm like Baker [84] is not designed to link a code example to an API name mentioned in the textual contents of a forum post.
For example, for the code snippet in Figure 2, Baker [84] links it to three APIs (java.util, java.lang, and Google GSON). However, the code snippet is provided to explain the conversion of JSON data to a JSON object using the GSON API, as mentioned in the textual contents. The code snippet also has the largest number of API classes and methods matched with those found in the Gson API. Second, we produce a textual description of the underlying task addressed by the code snippet. The algorithm does this by picking sentences from the textual contents of the forum post where the code snippet is discussed and where the API linked to the code snippet is referred to. For example, in Figure 2, the following sentence is picked into (among others) the description: “Google Gson supports generics and nested beans . . . ”, but not this sentence: “if you don’t need object deserialization, . . . you can try org.json”. The description is produced by combining beam search [61] with the TextRank algorithm [43]. Third, we associate positive and negative opinions relevant to the code example by analyzing the comments to the post. The algorithm does this by looking for references in the comments to the API that is linked to the code snippet. For example, in Figure 2, all opinionated sentences from comments C1 and C2 are linked to the code snippet. Each algorithm shows a precision and recall of over 0.8 in multiple evaluation settings. Each algorithm outperforms the state-of-the-art baselines (e.g., Baker [84]). For further details we refer to the paper [98].
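The first, linking step can be approximated by an element-overlap heuristic: among the candidate APIs mentioned in the post text, pick the one whose known types and methods overlap most with the identifiers in the snippet. The sketch below is our simplified illustration of that idea, not the published Opiner algorithm; the candidate API names and element sets are toy data:

```java
import java.util.*;

// Simplified linking heuristic: score each candidate API by how many of its
// known elements (types/methods) appear among the snippet's identifiers, and
// pick the highest-scoring candidate.
public class ApiLinker {
    public static String link(Set<String> snippetIdentifiers,
                              Map<String, Set<String>> candidateApiElements) {
        String best = null;
        int bestScore = -1;
        for (Map.Entry<String, Set<String>> e : candidateApiElements.entrySet()) {
            Set<String> overlap = new HashSet<>(e.getValue());
            overlap.retainAll(snippetIdentifiers);      // elements seen in the snippet
            if (overlap.size() > bestScore) {
                bestScore = overlap.size();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Identifiers from the Figure 2 snippet, plus two candidate APIs.
        Set<String> snippet = Set.of("Gson", "TypeToken", "fromJson", "List");
        Map<String, Set<String>> apis = new LinkedHashMap<>();
        apis.put("com.google.gson", Set.of("Gson", "TypeToken", "fromJson"));
        apis.put("org.json", Set.of("JSONObject", "JSONArray"));
        System.out.println(link(snippet, apis));  // prints com.google.gson
    }
}
```

The real pipeline additionally weighs mentions of the API in the surrounding text, which this sketch omits.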
Given as input the mined usage scenarios of an API, we produce three types of documentation: two using our two proposed novel documentation algorithms (statistical and concept-based) and the third based on an adaptation of the Javadoc presentation format. We discuss the three documentation types below.
Algorithm 1. Statistical Documentation
We produce statistical documentation of the mined API usage scenarios to offer a visualized overview of the usage of an API based on the mined usage scenarios. This documentation can thus complement the front/introductory page of an API's documentation by offering visualized statistics of the API usage. In addition, this type of documentation can also offer a quick overview of the underlying quality of the code examples (described below).

To complement the front page of API documentation, we produce three types of visualizations:
(1) Sentiment Overview. The overall sentiments in the reactions to the usage scenarios of the API.
(2) Co-Used APIs. The usage of other APIs in the same code examples of the API.
(3) Co-Used API Types. How the various types (e.g., classes) of an API are often used together.

In addition, we provide an overview of the quality of each API usage scenario as follows:
(4) Star Rating of API Usage Scenario. Following previous research and the adoption of 5-star ratings in online product reviews [37] and API review summarization [95, 96], we show the overall rating of each API usage scenario by analyzing the positive and negative opinions related to a code example.
(5) Star Rating of API Type. The overall star rating of an API type based on the usage scenarios of the API where the type was used.

We describe the approaches below.

(1) Sentiment Overview. In our previous work on API review summarization [95], we observed that developers reported increased awareness of an API when seeing the visualized overview of the API reviews from developer forums. We thus offer two types of overview of the quality of all code examples of an API. The first is a simple pie chart showing the overall counts of all the positive and negative reactions to the usage scenarios linked to an API. We do this by aggregating all the positive and negative opinions across all the mined usage scenarios of an API. The second is a time series of the aggregated sentiments.
We do this as follows. (a) We find the first and last month-year of creation dates among the usage scenarios of an API. The creation date of a usage scenario is the time when the post containing the usage scenario was created. (b) For each month-year, we create two bins: positive and negative. The positive bin contains the count of all positive opinions that the posts created in that month-year received due to the code examples discussed in those posts and included in our API usage scenarios. (c) We create two time series, one for positive and another for negative polarities. Each time-series chart has month-year on the X-axis and the count of positive/negative polarity for that month-year on the Y-axis. While the pie chart offers the overall sentiments of developers towards the usage scenarios of an API, the time-series chart shows how the sentiments changed over time. For example, an API may have highly positive reviews when it was first created, but start to get more negative reviews over time due to obsolete code examples.

(2) Co-Used APIs. For a given API, we show how many other APIs were used in the code examples where the API was used. Because each API offers features focusing on specific development scenarios, such insights can be helpful to know which other APIs besides the given API a developer needs to learn to be able to properly utilize the API for an
Manuscript submitted to ACM

(4) Star Rating of API Usage Scenario. The overall star rating of a usage scenario is computed from the positive and negative opinions towards its code example as:

Star Rating = 5 × |Positives| / (|Positives| + |Negatives|)   (1)

(5) Star Rating of API Type. We compute the overall five-star rating (using Equation 1) of each type by taking into account all the positive and negative reactions towards the usage scenarios grouped under the type.

Algorithm 2. Concept-Based Documentation
The input to the concept-based documentation algorithm is a list of all usage scenarios associated with an API. The output is a list of concepts, where each concept contains a list of API usage scenarios that are similar to each other based on the development tasks they implement. We propose concept-based documentation to present the mined API usage scenarios by grouping the scenarios around conceptually similar tasks. A concept in our documentation algorithm is a cluster of API usage scenarios that offer similar features or that are situationally relevant. Two API usage scenarios are similar if the code examples use the same API elements (e.g., classes) and they have similar inputs and outputs. For example, if two code examples using the GSON API offer conversion of JSON objects to Java objects, they have similar inputs (e.g., a JSON object) and similar outputs (e.g., Java objects). As such, we cluster the two code examples into one concept. Two API usage scenarios are situationally relevant if the code examples have similar inputs but produce different outputs, and vice versa. For example, two code examples using GSON are situationally relevant if both take as input a JSON object, but one offers conversion of the JSON object to Java objects and the other to XML objects.

Given that the code examples in a concept should have similar inputs, the code examples should then use similar API elements. That means the code examples in a concept should exhibit similar usage patterns involving similar API elements. Therefore, given as input all the usage scenarios of an API, we first identify frequent itemsets as types (e.g., classes) of an API that are found to be used frequently together. A set of code examples showing similar usage patterns can be similar if they are clones of each other or if they are conceptually relevant (e.g., similar input or similar output). Therefore, our approach has two major steps:

(1) Detect Usage Patterns.
We detect API types as itemsets that are frequently used together in the scenarios. We then assign usage scenarios to the patterns.
(2) Detect Concepts. We create clusters of API usage scenarios in the detected usage patterns that are similar to each other based on inputs and/or outputs. Each cluster is denoted as a ‘concept’.
[Figure 4 here. The figure shows four Jackson code snippets (S1–S4); the Jackson types used per snippet (T1. ObjectMapper, T2. XmlMapper, T3. ObjectWriter, T4. JsonGenerator, T5. SerializerProvider, T6. JsonProcessingException, T7. JsonSerializer); the itemsets of API types per code snippet (S1: T1; S2: T1,T2; S3: T1,T3; S4: T4,T5,T6,T7; S5: T1,T2; S6: T4,T5,T6; S7: T1,T3; S8: T4,T5,T6); and the frequent itemsets with minimum support 2 (T1: 4; T4,T5,T6: 3; T4,T5: 3; T4,T6: 3; T5,T6: 3; T4/T5/T6: 3 each; T1,T2: 2; T1,T3: 2; T2/T3: 2 each), grouped into Concepts 1–3 via frequent usage pattern detection and concept detection.]

Fig. 4. Examples on how usage patterns and concepts are detected in the concept-based documentation of an API
We provide visualized examples of the two steps in Figure 4 and describe the steps below.

Step 1. Detect Usage Patterns. Given that similar code examples in our concepts should use similar API elements, we first need to identify patterns of API elements in the mined usage scenarios of an API that are frequently used together. To detect usage patterns, we use frequent itemset mining [51]. Frequent itemset mining has been used to summarize product features [32] and to find useful usage patterns in software repositories [7, 113].

Using frequent itemset mining, we cluster code examples that use the same types of a given API, even if a code example uses more than one API. For example, if a code example is provided to convert a JSON string to a Java object using the GSON API, the JSON string can come from a file, from a web service, or from any other source. The code example thus can show the reading of the JSON string from those sources using APIs other than GSON. A clone detection technique may consider such code examples as not clones, because one code example may use the java.util API to read a JSON string from a file while another may use the HttpClient API to read the JSON string from a web server. However, both code examples use the same GSON API types. As such, both will be clustered under one concept using our frequent itemset mining approach.

In Figure 4, the left column shows examples of detecting usage patterns. The right column shows four of the code snippets (S1-S4) used in the examples of the left column. All the code snippets in Figure 4 are associated with the Jackson API for JSON parsing. Given as input all the mined API usage scenarios of an API, our pattern detection involves two steps: (1) Collection of API types from the code examples, and (2) Generation of frequent sets of API types.
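The frequent-itemset step can be illustrated on the Figure 4 data. The paper uses Borgelt's FPGrowth C implementation; as a minimal sketch we instead count every subset of each (small) per-snippet type set, which yields the same frequent itemsets on such toy input:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative frequent-itemset mining over the per-snippet API type sets.
public class UsagePatterns {

    // All non-empty subsets of a small item set, each as a sorted list.
    static List<List<String>> subsets(Set<String> items) {
        List<String> sorted = new ArrayList<>(new TreeSet<>(items));
        List<List<String>> out = new ArrayList<>();
        int n = sorted.size();
        for (int mask = 1; mask < (1 << n); mask++) {
            List<String> sub = new ArrayList<>();
            for (int i = 0; i < n; i++)
                if ((mask & (1 << i)) != 0) sub.add(sorted.get(i));
            out.add(sub);
        }
        return out;
    }

    // Itemsets occurring in at least minSupport transactions.
    static Map<List<String>, Integer> frequentItemsets(
            List<Set<String>> transactions, int minSupport) {
        Map<List<String>, Integer> counts = new HashMap<>();
        for (Set<String> t : transactions)
            for (List<String> sub : subsets(t))
                counts.merge(sub, 1, Integer::sum);
        counts.values().removeIf(c -> c < minSupport);
        return counts;
    }

    public static void main(String[] args) {
        // Per-snippet Jackson type itemsets collected in Figure 4 (S1..S8).
        List<Set<String>> snippets = List.of(
            Set.of("T1"), Set.of("T1", "T2"), Set.of("T1", "T3"),
            Set.of("T4", "T5", "T6", "T7"), Set.of("T1", "T2"),
            Set.of("T4", "T5", "T6"), Set.of("T1", "T3"),
            Set.of("T4", "T5", "T6"));
        Map<List<String>, Integer> frequent = frequentItemsets(snippets, 2);
        System.out.println(frequent.get(List.of("T4", "T5", "T6"))); // 3 (S4, S6, S8)
        System.out.println(frequent.get(List.of("T1", "T2")));       // 2 (S2, S5)
    }
}
```

Full subset enumeration is exponential in itemset size and only workable here because each snippet uses a handful of types; FPGrowth avoids this blow-up on real data.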
First, we collect the types of the given API used in each code example. For the code snippet S1 in Figure 4, only one type (ObjectMapper) is collected, because this is the only type from the Jackson API in S1. For the code snippet S2, we find two types from the Jackson API (ObjectMapper and XmlMapper). Similarly, given that we focus on clustering types of a given API (Jackson in Figure 4), we ignore types that come from other APIs or that are local, such as Registration in S1 and foo and String in S2. Registration and foo are ignored because they are local classes, i.e., they are not provided by the Jackson API. The class String is ignored because it is provided by the java.lang package. Please note that our linking of a code example to an API name mentioned in the forum text considers all the code elements from all APIs in a code example (method, class, interface, etc.); we describe the linking algorithm in our paper [98]. While producing concepts by taking as input all code examples linked to an API, we focus on the types (class, interface) of the given API in the code examples, because we focus on clustering code examples implementing similar features of the API, and previous studies find that API types are more informative than API methods for analyzing API features [15].

Second, we create a list of itemsets by collecting API types as explained above. Thus, for the eight code examples in Figure 4 (left column), we have eight itemsets. We then apply frequent itemset mining [51] on the lists using the FPGrowth C implementation developed by Borgelt [8]. The output is a list of frequent itemsets and a support value for each itemset. For example, the second frequent itemset in Figure 4 (ID 2) is {T4, T5, T6} with support 3, because it is found in three code snippets (S4, S6, S8). Each frequent itemset is considered a pattern. For example, in Figure 4 we have eight patterns that were found in at least two code examples. We assign a code snippet to a pattern by computing the similarity between a code snippet (S) and a pattern (P) as follows:

Similarity = |Types(S) ∩ Types(P)| / |Types(S)|   (2)

We assign a code example to the pattern with the highest similarity.
For example, all three code examples (S4, S6, S8) are assigned to the pattern {T4, T5, T6} (ID 2) in Figure 4. If more than one pattern is found with the maximum similarity value, we assign the code snippet to the pattern with the maximum support. Therefore, the output of this step is a matrix P × S, where P stands for a usage pattern and S stands for a set of usage scenarios assigned to the pattern.

More than one API usage scenario can belong to one concept. For example, all the code examples related to the frequent itemsets (T4, T5), (T4, T6), and (T5, T6) will be grouped under the super itemset (T4, T5, T6) in Figure 4. We do this to ensure that a concept can contain all the different use cases (i.e., scenarios) that use the API types under the concept. The current implementation of concepts in our algorithm assigns each API usage scenario to one concept only, i.e., it is a ‘hard’ assignment. Intuitively, an API usage scenario can be similar to API usage scenarios assigned to more than one concept, e.g., due to situational relevance. We leave the creation and analysis of such ‘soft’ assignment of API usage scenarios to concepts as future work. Given that all such relevant code examples are grouped under a concept, a developer using our produced API documentation would only need to look at one or two of the grouped API usage scenarios in the concept to get a concise but complete insight into the overall API usage addressed by the concept. Another way to produce such concepts would be to use closed frequent itemsets [51]. However, closed frequent itemset mining only returns the super (i.e., closed) itemsets. Therefore, while our concept detection approach borrows the idea of finding super itemsets from closed frequent itemset mining, we leverage standard frequent itemset mining to also identify all the frequent itemsets under each closed itemset.
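Equation 2 and the assignment rule (highest similarity, ties broken by the pattern's support) can be sketched as below; the class names are illustrative:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of Equation 2 and the snippet-to-pattern assignment.
public class PatternAssignment {

    record Pattern(Set<String> types, int support) {}

    // Equation 2: fraction of the snippet's API types covered by the pattern.
    static double similarity(Set<String> snippetTypes, Pattern p) {
        Set<String> common = new HashSet<>(snippetTypes);
        common.retainAll(p.types());                       // Types(S) ∩ Types(P)
        return (double) common.size() / snippetTypes.size();
    }

    // Pick the most similar pattern; break ties by maximum support.
    static Pattern assign(Set<String> snippetTypes, List<Pattern> patterns) {
        Pattern best = null;
        double bestSim = -1;
        for (Pattern p : patterns) {
            double sim = similarity(snippetTypes, p);
            if (sim > bestSim
                    || (sim == bestSim && best != null && p.support() > best.support())) {
                best = p;
                bestSim = sim;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Patterns from Figure 4, with their supports.
        Pattern p2 = new Pattern(Set.of("T4", "T5", "T6"), 3);
        Pattern p45 = new Pattern(Set.of("T4", "T5"), 3);
        // Snippet S4 uses {T4, T5, T6, T7}: similarity 3/4 vs 2/4, so ID 2 wins.
        Set<String> s4 = Set.of("T4", "T5", "T6", "T7");
        System.out.println(assign(s4, List.of(p45, p2)) == p2); // true
    }
}
```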
The output from this step is a matrix of patterns vs. mined API usage scenarios, P × S, where P stands for a pattern and S contains a list of code examples belonging to the pattern. The code examples in a pattern of an API contain a set of commonly co-used types of the API.

Step 2. Detect Concepts. In this step, we further analyze the patterns identified in the previous step to determine whether two or more patterns can be connected with each other. Intuitively, a connection between two patterns can be established if code examples in the two patterns share similar inputs/outputs. For example, suppose one pattern contains code examples related to the establishment of an HTTP connection using the HttpClient API and another pattern contains code examples related to the sending/receiving of messages over an HTTP connection. The two patterns can be connected, because the output (i.e., an established HTTP connection) from pattern 1 (i.e., the code examples in pattern 1) is used as input in pattern 2. If we find such connected patterns, we group those patterns together into a concept. We detect concepts as follows.

First, given all code examples under a pattern, we apply clone detection to find code examples that are similar to each other. This helps us create intermediate sub-groups of code examples in a pattern, where the code examples in a sub-group are clones of each other. We then compare the inputs/outputs between two patterns by comparing the inputs/outputs of the sub-groups found in the two patterns. The formation of sub-groups thus reduces the number of input/output comparisons between patterns. This step is important, because otherwise we would be left with an exponential number of pattern combinations to analyze. For clone detection, we use NiCad [73], a widely used, state-of-the-art clone detection tool in the software engineering literature.
Specifically, we use NiCad 3, which detects near-miss clones. We detect inputs to a code example by analyzing the inputs taken by methods in the code example, where the inputs are generated by another method in the code example. We detect outputs from a code example as the outputs from the methods, where the outputs are not fed into other method(s) in the same code example.

As a demonstration of the above process, consider Figure 4 (right column): the first code snippet (S1) belongs to the pattern with ID 1 (left column) and the second code snippet (S2) belongs to the pattern with ID 7 (left column). S1 uses one type of the Jackson API (ObjectMapper); S2 uses both ObjectMapper and XmlMapper. Both code snippets use the readValue method of the ObjectMapper to convert a JSON string into a Java class. Code snippet S2 further attempts to produce an XML string out of the generated Java class. It does that by using the XmlMapper class of the same Jackson API and by taking as input the generated Java class from ObjectMapper. Therefore, we say Patterns 1 and 7 are situationally relevant, because Pattern 1 always needs to be executed before Pattern 7. To ensure that we establish the relationship between patterns correctly, we employ program slicing. For example, the third code snippet (S3) in Figure 4 (right column) belongs to the pattern with ID 8. S3 also uses the ObjectMapper class of the Jackson API. However, S3 does not use the readValue method of ObjectMapper. Instead, S3 creates a string out of an httpresponse and assigns that to another class of the Jackson API (ObjectWriter) for pretty printing. We thus do not create an edge between Pattern 1 and Pattern 8. Similarly, all the code examples (S4, S6, S8) belong to another concept.

Once a set of patterns is grouped together into a concept, we assign a numeric ID to the concept. The output of this step is a matrix C × S, where C stands for a concept and S denotes a list of API usage scenarios assigned to the concept. On the Opiner website, we present each concept as a list of four items: {R, S, O, T}. Here R corresponds to a scenario determined as representative of all the usage scenarios belonging to the concept (discussed below), S corresponds to the rest of the usage scenarios in the bucket that we wrap under a ‘See Also’ sub-list, O is the overall star rating for the bucket (discussed below), and T is the title of the concept. We describe the fields below.

Following [86], we set a minimum of 60% similarity and a minimum block size of five lines in the code example.
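As an illustration only, the four presented fields of a concept could be modeled as a simple record; the field and type names below are assumptions, not Opiner's actual schema, and the rating is assumed to reuse the five-star ratio applied to individual scenarios:

```java
import java.util.List;

// Illustrative model of a concept as presented on the Opiner site:
// representative scenario R, "See Also" scenarios S, overall rating O, title T.
public class ConceptPresentation {

    record Scenario(String codeExample, String description,
                    int positives, int negatives) {}

    record Concept(Scenario representative, List<Scenario> seeAlso,
                   double overallRating, String title) {}

    // Aggregate rating over every scenario grouped under the concept.
    static double overallRating(List<Scenario> scenarios) {
        int pos = 0, neg = 0;
        for (Scenario s : scenarios) {
            pos += s.positives();
            neg += s.negatives();
        }
        return (pos + neg) == 0 ? 0.0 : 5.0 * pos / (pos + neg);
    }

    public static void main(String[] args) {
        Scenario a = new Scenario("...", "Convert JSON to Java object", 6, 2);
        Scenario b = new Scenario("...", "Convert JSON string to POJO", 2, 0);
        Concept c = new Concept(a, List.of(b), overallRating(List.of(a, b)),
                "JSON to Java object conversion");
        System.out.println(c.overallRating()); // 4.0
    }
}
```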
Javadoc Adaptation: Type-Based Documentation
We generate the type-based documentation of an API by grouping the scenarios of the API based on the API type as follows. (1) We identify all the types in a code example associated with the API. (2) We create a bucket for each type and put each code example in the bucket(s) where the type was found. We present each bucket as a list sorted by time (most recent at the top). Our type-based documentation approach is similar to Baker [84], which proposed to annotate the Javadoc of an API class using all the code examples in SO where the API class was found. For example, based on Figure 2, the code example would be put under two classes of the Google GSON API (Gson and TypeToken), one class of the java.util API (List), and one class of the java.lang API (Type). The class ‘Data’ is ignored, because it is a locally declared class in the code example.
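A minimal sketch of this type-based bucketing, with illustrative names (not Baker's or Opiner's implementation):

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative sketch: each code example is placed under every API type it
// uses, and each bucket is sorted most-recent first.
public class TypeBasedDocs {

    record Example(String id, LocalDate created, Set<String> apiTypes) {}

    static Map<String, List<Example>> buckets(List<Example> examples) {
        Map<String, List<Example>> byType = new TreeMap<>();
        for (Example e : examples)
            for (String type : e.apiTypes())   // one bucket per API type used
                byType.computeIfAbsent(type, k -> new ArrayList<>()).add(e);
        for (List<Example> bucket : byType.values())
            bucket.sort(Comparator.comparing(Example::created).reversed());
        return byType;
    }

    public static void main(String[] args) {
        List<Example> examples = List.of(
            new Example("e1", LocalDate.of(2013, 5, 1), Set.of("Gson", "TypeToken")),
            new Example("e2", LocalDate.of(2016, 2, 9), Set.of("Gson")));
        Map<String, List<Example>> docs = buckets(examples);
        System.out.println(docs.get("Gson").get(0).id()); // e2 (most recent first)
    }
}
```

A code example using several types of the API appears in several buckets, mirroring how the same SO snippet can annotate the Javadoc of multiple classes.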
We implemented the API usage scenario documentation algorithms by extending Opiner [95]. We focused on the following two requirements during the extension:

(1) Scalability. Opiner's system should be able to process millions of posts from developer forums.
(2) Efficiency. The processing should be completed in a short time, ideally in hours.

Opiner's system architecture is able to process millions of posts. The proposed documentation algorithms are supported in Opiner by four major components (see Figure 5):

(1) Database: A hybrid data storage component to manipulate data in diverse formats.
(2) Application Engine: Handles the loading and processing of the data from forum posts and hosts all the algorithms to mine and summarize usage scenarios about APIs.
(3) REST Web Server: The middleware between the Application Engine and the Website that supports two types of REST APIs: (a) Search: finds an API with auto-completion support; (b) Documentation: provides the documentation and visualization produced by the three algorithms.
(4) Website: The web-based user interface to host the search and documentation results for API usage scenarios.

The decoupling of the ‘Application Engine’ from the other components allows it to run independently of the website. Thus the ‘Application Engine’ can run offline, even while the Website is being used. The ‘Application Engine’ is designed to load and preprocess each SO thread in parallel. The documentation algorithms are applied on the preprocessed post contents. Our profiling of the ‘Application Engine’ shows that the preprocessing of the contents takes almost 80% of all
the time. This is because this step handles the creation of the meta information of posts based on preprocessing, such as the parsing of code snippets and the linking of code snippets to API mentions. The documentation component takes less than 10% of the time. Therefore, once API usage scenarios are mined and properly preprocessed, the generation of the documentation using the proposed algorithms takes minimal time in Opiner.

[Figure 5 here: Developer Forums and Maven Central feed the Opiner Database and Application Engine; a REST-based web server exposes Search, Documentation, and Code Visualization APIs to the Opiner Website.]

Fig. 5. The system architecture in Opiner to support the proposed API usage scenario documentation algorithms
We applied the algorithms on a dataset consisting of 3,048 threads from SO that contain a total of 22.7K posts (questions, answers, and comments). The Opiner online website currently indexes the mined and documented usage scenarios of the APIs found in this dataset. The threads in the dataset are tagged ‘Java+JSON’, i.e., the posts discuss JSON-based tasks using Java APIs. This dataset was previously used in [95] to summarize reviews about APIs and was found to offer a rich set of competing APIs with diverse API usage discussions. As such, we expect to see code examples discussing Java APIs for JSON parsing in the posts. This dataset contains numerous usage scenarios from multiple competing APIs supporting JSON-based manipulation in Java (e.g., in REST-based architectures, microservices, etc.). JSON-based techniques can be used to support diverse development scenarios, such as serialization of disk-based and networked files, lightweight communication between servers and clients and among interconnected software modules, a messaging format over HTTP as a replacement for XML, encryption techniques, and on-the-fly conversion of language-based objects to JSON formats.

We parsed the dataset to collect all the code examples. There are 8,596 valid code snippets (e.g., Java code) and 4,826 invalid code snippets (e.g., XML blocks). In Table 3 we show descriptive statistics of the dataset. A total of around 15K API names are mentioned in the textual contents of the posts. On average, each valid snippet contained 7.9 lines. The last column, “Users”, in Table 3 shows the total number of distinct users that posted at least one answer/comment/question. On average, around four users participated in one thread. More than one user participated in 2,940 threads (96.4%). A maximum of 56 distinct users participated in one thread [48].

The 8,596 code examples are associated with 175 distinct APIs using the code example to API mention linking algorithm from Section 3.1.
Table 4 presents the distribution of the APIs by the code examples. The majority (60%) of the code examples are linked to the top five APIs.
Table 3. Descriptive statistics of the dataset used to produce API documentation in the Opiner website

Threads   Posts   Sentences   Words   Snippet Lines   APIs Mentioned in Texts   Users
Average

Table 4. Distribution of Code Snippets by APIs

        Overall                          Top 5
API     Snippet   Avg    STD     Snippet   Avg      Max    Min
175     8,596     49.1   502.7   5,196     1,039.2  1,951  88

Table 5. Distribution of Reviews in Scenarios with at least one reaction

Scenarios     Comments       Positive      Negative
w/ Reviews    Total  Avg     Total  Avg    Total  Avg
The Gson class from the Google Gson API was found in 679 of the 1,053 code examples linked to the API (i.e., 64.5%). Similarly, the JSONObject class from the org.json API was found in 1,324 of the 1,498 code examples linked to the API (i.e., 88.3%). Most of those code examples also contained other types of the APIs. Therefore, if we followed the documentation approach of Baker [84], we would have at least 1,324 code examples linked to the Javadoc of JSONObject for the org.json API, based on the parsing of our 3,048 SO threads. Among the API usage scenarios in our study dataset, we found that 1,154 scenarios contained at least one review, using our proposed algorithm to associate reviews with an API usage scenario. In Table 5, we show the distributions of comments and reviews in the 1,154 scenarios. There are a total of 7,538 comments in the corresponding posts of those scenarios, out of which 2,487 are sentences with positive polarity and 1,216 are sentences with negative polarity.
In this section, we present the results of a user study that we conducted to compare the usefulness of our two proposed documentation algorithms (statistical and concept-based) with that of the type-based documentation of API usage scenarios.
Previous research finds that a Javadoc-style hierarchical API documentation format is less useful to developers than a more practical task-centric documentation [75, 99]. As previously reported, effective developers investigate source code by looking for structural cues [69], and such exploration can involve the usage of multiple types of a single API [94]. Thus, a traditional type-based documentation may not offer the more useful task-based documentation format hypothesized by Carroll et al. [14] and later confirmed by Shull et al. [75]. We thus need to understand whether and how our proposed documentation algorithms could offer added benefits over a traditional type-based documentation format
across diverse development scenarios. Such an assessment can be formed based on inputs from software developers on the documentation produced in Opiner using the three algorithms.
We investigate how software developers rate the three documentation types (our two proposed plus type-based) in Opiner along four development scenarios. Our goal was to judge the usefulness of a given documentation as shown in Opiner (see Section 3.4). The objects were the three types of documentation produced for a given API using the algorithms, and the subjects were the participants who rated each documentation. The contexts were four development scenarios, previously used by Uddin and Khomh [95].
We recruited 29 software developers. Among the 29 participants, 18 were professional developers. The rest of the participants (11) were recruited from four universities: two in Canada (University of Saskatchewan and Polytechnique Montréal) and two in Bangladesh (Bangladesh University of Engineering & Technology and Khulna University). The 18 professional developers were recruited through the online professional social network site Freelancer.com. Sites like Amazon Mechanical Turk and Freelancer.com have been gaining popularity for conducting studies in empirical software engineering research due to the availability of efficient, knowledgeable, and experienced software engineers. In our study, we only recruited a freelancer if they had professional software development experience in Java. Among the 11 participants recruited from the universities, eight reported their profession as students and two as graduate researchers. Among the 18 freelancers, one was a business data analyst, four were team leads, and the rest were software developers. Among the 29 participants, 88.2% were actively involved in software development (94.4% among the freelancers and 81.3% among the university participants). Each participant had a background in computer science and software engineering.

The number of years of experience of the participants in software development ranged from less than one year to more than 10 years: three (all of them students) with less than one year of experience, nine between one and two, 12 between three and six, four between seven and 10, and the rest (nine) with more than 10 years of experience. Among the four participants who were not actively involved in daily development activities, one was a business analyst (a freelancer) and three were students (university participants). The business data analyst had between three and six years of development experience in Java. The diversity in participant occupations offered us insights into whether and how Opiner was useful to all participants in general.
We asked the participants to compare the documentation produced by the three algorithms under four different development scenarios (selection, documentation, presentation, and API authoring). Each task was described using a hypothetical development scenario where the participant was asked to judge the summaries through the lens of a software engineering professional. Persona-based usability studies have proven effective in both academia and industry [46]. We describe the tasks below.

(1) Selection.
Can the usage documentation help you to select this API?
The persona was a ‘Software Architect’ who was tasked with making a decision on the selection of an API given the usage documentation produced for the API in Opiner. The participants were asked to consider the following decision criteria in their answers: the documentation (C1) contained all the right information, (C2) was relevant for selection, and (C3) was usable.

(2) Documentation.
Is the produced documentation complete and readable?
The persona was a ‘Technical API Documentation Writer’ who was tasked with the writing of the documentation of an API by taking into account the usage
Table 6. Impact of the summaries (in percentages) based on the scenarios

Scenario        Rating       Type-Based   Statistical   Concept-Based
Selection       Useful         93.1         93.1          100
                Not Useful      6.9          6.9            0
Documentation   Useful         93.1         79.3          100
                Not Useful      6.9         20.7            0
Presentation    Useful         93.1         96.6            —
Authoring       Useful         72.4         69.0          89.7
                Not Useful     10.3         10.3           3.4
                Neutral        17.2         20.7           6.9

summaries of the API in Opiner. The decision criteria on whether and how the different summaries in Opiner could be useful for such a task were: (C1) the completeness of the information, and (C2) the readability of the summaries.

(3) Presentation.
Can the documentation easily help you to justify your selection of the API?
The persona was a development team lead who was tasked with the creation of a presentation using the summaries in Opiner to justify the selection of an API. The decision criteria were: (C1) the conciseness of the information and (C2) the recency of the provided scenarios.

(4) Authoring.
Can the documentation easily help you to decide whether to improve an API feature? The persona was an API author who was tasked with the creation of a new API by learning the strengths and weaknesses of competing APIs using Opiner. The decision criteria were: (C1) the reactions towards code examples and (C2) the presence of diverse scenarios.

We assessed the ratings for the three tasks (Selection, Documentation, Presentation) using two metrics: useful (the documentation does not miss any info, or misses some info but is still useful) and not useful (misses all the info, so not useful at all). For the Authoring task, we define usefulness as follows: useful (fully helpful, helpful) and not useful (partially unhelpful, fully unhelpful). For the authoring scenario, we further asked the participants whether they had decided to author a new API as a competitor to the Jackson API, whose documentation they had analyzed in Opiner. The options were: Yes, No, Maybe. Jackson is the most popular Java API for JSON parsing. Each participant was asked to justify their rating for each scenario in a text box. The study was conducted using a Google form.
In Table 6, we show the percentage of the ratings for each usage documentation algorithm for the four developmentscenarios (Selection, Documentation, Presentation, and Authoring).
Observation 1.
Our two proposed documentation algorithms (Concept and Statistical) were rated as more useful than the type-based documentation (Javadoc adaptation) across all four development scenarios.

For the ‘Selection’ scenario, the incorporation of reactions as positive and negative opinions in the usage documentation was considered useful. According to one participant: “Conceptual documentation is the most useful of all. Inexperienced developers can select code snippet based on positive or negative reactions while the experienced developers can compare the code and go for the best one.”
However, just the presence of the reactions was not considered enough for the selection of an API when coding tasks were involved: “Statistical documentation just shows the negative and positive views, it is not useful to view the working examples or code. Conceptual documentation groups together the common API features, it helped me to find the common examples in same place. Type based documentation is also ok but had to dig in to find the description. All code examples provided the full code example, so it was good.”
For the ‘Documentation’ scenario, the participants appreciated the innovative presentation format of clustering usage scenarios by concepts. According to one participant: “For Documentation purpose, Conceptual documentation is more readable than other two documentation.” Similar to the observations of Carroll et al. [14] on the need to document APIs based on tasks, the participants advocated the potential of concept-based documentation, e.g., “Conceptual documentation is important to generate different kind of ideas and thoughts to complete different kind of task such as serialization, deserialization, format specification, mapping with Jackson API. This documentation is most useful to me because of the availability of resource.” The participants considered type-based documentation to be useful for specific tasks that may not involve multiple types of a given API at the same time.

For the ‘Presentation’ scenario, the participants preferred the visualized charts from the statistical documentation. Their suggested workflow was to start the presentation with the charts and then dive deeper based on insights from the concept-based documentation. According to one participant, the combination of ratings and examples is the key: “If I was the team lead, statistical documentation would help to decide me to view the users positive and negative reaction over the selection API, I would definitely take this into consideration. Conceptual documentation would help me to create a presentation based on the examples and the ratings.”
The participants asked for an extension of the statistical documentation to compare APIs by the features offered: “The statistical documentation helps to summarize the popularity of the API and other co-existing APIs. But it would be more helpful if there was comparisons among competitive APIs as well. The team lead should should this information one by one.”
For the ‘API Authoring’ scenario, when asked whether the documentation of the Jackson API in Opiner showed enough weaknesses of the API that they would like to author a competing API, 48.4% of the participants responded with ‘No’, 35.5% with ‘Yes’, and 16.1% with ‘Maybe’. The participants considered the concept-based documentation to be the most useful during such decision making, followed by the statistical documentation. According to one participant: “In order to author a new API, I would like to understand how the market of developers is reacting to the current API. More negative responses would indicate that there is dissatisfaction among the developers and there is a need to create a new API for which I would look into the negative responses in the conceptual documentation. Given that I can see that there is a positive trend for jackson Api from statistical documentation and there are no or very few negative responses from the users among the eight usage scenarios that I had selected. I feel that I would not author a new API to compete Jackson, simply because it still works and the developers are content, which I could identify from the statistical documentation.”
Observation 2.
The concept-based documentation was considered the most useful in three out of the four scenarios (Selection 100%, Documentation 100%, and Authoring 89.7%), while the statistical documentation was considered the most useful for the remaining scenario (Presentation 96.6%).
In this section, we evaluate the effectiveness of the documentation in the Opiner web site against traditional API web documentation resources, i.e., API official documentation and the developer forum (SO).
Manuscript submitted to ACM
The goal of the automatic documentation of API usage scenarios in Opiner is to assist developers in finding the right solutions to their coding tasks with greater ease than other resources. Therefore, we need to assess the effectiveness of Opiner API usage documentation in real-world coding tasks. Previous research reported that developers consider the combination of code examples and API reviews in the online developer forums as a form of API documentation [92] and preferred such documentation over API official documentation. However, the developers struggled to derive complete and concise insights due to the huge volume and scattered nature of the usage discussions in the online forums. Therefore, the usage scenario documentation in Opiner should alleviate the pains of developers in finding a complete solution to a development task without going over multiple SO posts. Previous research also showed that API documentation is often incomplete, obsolete, and incorrect [99] and that developers find it hard to learn and use an API by simply relying on API official documentation [67]. Therefore, the usage scenario documentation in Opiner should be able to compensate for such shortcomings in the official documentation of an API during their coding tasks.
We recruited 31 developers and asked them to complete four coding tasks using the documentation produced in Opiner and using API web documentation resources (i.e., baselines). At the end of the coding tasks, we asked the participants to take part in a short survey to share their experience of using Opiner over the baseline resources. The participants used the documentation produced in Opiner (see Section 3.4) for the coding tasks.
The coding tasks were completed by a total of 31 participants. 29 of the 31 participants came from our previous study evaluating the usefulness of our API documentation algorithms in Section 4. The two additional participants in this study were recruited from universities. Both of them are graduate students with more than one year of experience in software development. Among the 31 participants, 18 were professional developers. The rest of the participants (13) were recruited from four universities. Each freelancer was remunerated with $20, which was a modest sum given the volume of the work. Each participant had a background in computer science and software engineering. The survey questions were answered by 29 of the 31 participants.
The four tasks are described in Table 7. Each task was required to be completed using a pre-selected API. Thus, for the four tasks, each participant needed to use four different APIs: Jackson [24], Gson [28], XStream [100], and the Spring framework [77]. Jackson and Gson are the two most popular Java APIs for JSON parsing [81]. Spring is one of the most popular web frameworks in Java [26], and XStream is well-regarded for its efficient adherence to the JAXB principles [83], i.e., XML to JSON conversion and vice versa. All four APIs can be found in the list of the top 10 most discussed APIs in our evaluation corpus. The APIs are mature and fairly large and thus can be hard to learn. The largest API is the Spring framework 5.0.5 with a total of 3687 types (Class, Annotation, etc.), followed by Jackson 2.2 (467 types), XStream 1.4.10 (340 types), and Gson 2.8.4 (34 types). For the Jackson API, we counted the code types that are shipped with the core modules (jackson-core, databind, and annotations). The Jackson API has been growing with the addition of more modules (e.g., jackson-datatype). The four APIs have a total of 4528 types. In comparison, the entire Java SE 7 SDK has a total of 4024 code types, and the entire Java SE 8 SDK has a total of 4240 code types. We used the online official Javadocs of the APIs to collect the type information. Our online appendix contains the list.
For each coding task, we prepared four settings: SO (complete the task using only SO), DO (complete the task using only API official documentation), OP (complete the task only with the help of Opiner), and EV (complete the task using any online resources available, i.e., SO, DO, Opiner, and search engines). The above four settings help us properly analyze the diverse coding behavior of software developers using the diverse online API documentation resources available to complete a coding task. Intuitively, the EV setting is the most natural, because it allows a developer to use search engines and/or any other resources. The SO, DO, and OP settings simply restrict the access to resources. This restriction then provides insights into whether the participants are more or less effective while using only a subset of all available resources. If a user still shows performance similar to or higher than the EV setting while only using Opiner (i.e., the OP setting), that can offer increased confidence in the overall effectiveness of Opiner to support developers in their coding tasks. We follow a between-subject design [104], with four different groups (G1, G2, G3, G4) of participants, each participant completing four tasks in four different settings. Each of three groups (G1, G3, G4) had eight participants. At the end, eight participants from each of the three groups (G1, G3, G4) and seven from G2 completed the coding tasks. Each participant in a group was asked to complete the four tasks, in the order and settings shown in Table 8. To ensure that the participants used the correct development resource for a given API in a given development setting, we added the links to those resources for the API in the task description. For example, for the task TJ and the setting SO, we provided the following link to query SO using Jackson as a tag: https://stackoverflow.com/questions/tagged/jackson.
For the task TG and the setting DO, we provided the following link to the official documentation page of the Google GSON API: https://github.com/google/gson/blob/master/UserGuide.md. For the task TX and the setting OP, we provided a link to the summaries of the XStream API in the Opiner website: http://sentimin.soccerlab.polymtl.ca:38080/opinereval/code/get/xstream/. For the task TS with the setting EV, we provided three links, i.e., one from Opiner (as above), one from SO (as above), and one from the API formal documentation (as above). For the ‘EV’ setting, we added in the instructions that the participants were free to use any search engine. To avoid potential bias in the coding tasks, we enforced the homogeneity of the groups by ensuring that: (1) no group entirely contained participants that were only professional developers or only students, (2) no group entirely contained participants from a specific geographic location and/or academic institution, (3) each participant completed the tasks assigned to him independently and without consulting with others, (4) each group had the same number of four coding tasks, and (5) each group had exposure to all four development settings as part of the four development tasks. The use of balanced groups simplified and enhanced the statistical analysis of the collected data. The four tasks were picked randomly from our evaluation corpus of 22.7K SO posts. Each task was observed in SO posts more than once and was asked by more than one developer. Each task was related to the manipulation of JSON inputs using Java APIs for JSON parsing. The manipulation of JSON-based inputs is prevalent in disk-based, networked, as well as HTTP-based message, file, and object processing. Therefore, the Java APIs for JSON parsing offer many complex features to support the diverse development needs. The solution to each task spanned two posts. The two posts are from two different threads in SO. Thus, the developers could search Stack Overflow to find the solutions.
However, that would require them to search posts from multiple threads in SO. All of those tasks are common uses of the four APIs. Each post related to the tasks was viewed and upvoted more than a hundred times in SO. To ensure that each development resource was treated with equal fairness during the completion of the development
Table 7. Overview of coding tasks
Task | API | Description
TJ | Jackson | Write a method that takes as input a Java Object and serializes it to Json, using the Jackson annotation features that can handle custom names in Java object during deserialization.
TG | GSON | Write a method that takes as input a JSON string and converts it into a Java Object. The conversion should be flexible enough to handle unknown types in the JSON fields using generics.
TX | Xstream | Write a method that takes as input a XML string and converts it into a JSON object. The solution should support aliasing of the different fields in the XML string to ensure readability.
TS | Spring | Write a method that takes as input a JSON response and converts it into a Java object. The response should adhere to strict JSON character encoding (e.g., UTF-8).
Table 8. Distribution of coding tasks per group per setting. SO = SO, DO = Javadoc, OP = Opiner, EV = Everything. TJ, TG, TX, TS = Task Using Jackson, GSON, XStream, Spring, resp.

↓ Group | Setting → SO DO OP EV
G1 | TJ TG TX TS
G2 | TS TJ TG TX
G3 | TX TS TJ TG
G4 | TG TX TS TJ

tasks, we also made sure that each task could be completed using any of the development resources, i.e., the solution to each task could be found in any of the resources at a given time, without the need to rely on the other resources.
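The assignment in Table 8 forms a Latin square: each group completes every task and every setting exactly once, and each task appears exactly once per setting across groups. A minimal sketch (in Python, for illustration only) of how such a balanced assignment can be checked:

```python
# Task assignment from Table 8: rows = groups, columns = settings (SO, DO, OP, EV).
assignment = {
    "G1": ["TJ", "TG", "TX", "TS"],
    "G2": ["TS", "TJ", "TG", "TX"],
    "G3": ["TX", "TS", "TJ", "TG"],
    "G4": ["TG", "TX", "TS", "TJ"],
}

def is_latin_square(rows):
    """Every row and every column must contain each task exactly once."""
    matrix = list(rows.values())
    tasks = set(matrix[0])
    rows_ok = all(set(row) == tasks for row in matrix)
    cols_ok = all(set(col) == tasks for col in zip(*matrix))
    return rows_ok and cols_ok

print(is_latin_square(assignment))  # → True
```

This balance is what lets each task/setting combination be compared across groups without one group seeing a resource more often than another.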
A seven-page coding guide was produced to explain the study requirements (e.g., the guide for Group G1: https://goo.gl/ccJMeY). Before each participant was invited to complete the study, he had to read the entire coding guide. Each participant was encouraged to ask questions to clarify the study details before and during the study. To respond to the questions, the participants communicated with the first author over Skype. Each participant was already familiar with formal and informal documentation resources. To ensure a fair comparison of the different resources used to complete the tasks, each participant was given a brief demo of Opiner before the beginning of the study. This was done by giving them access to the Opiner web site.
The study was performed in a Google Form, where participation was by invitation only. Four versions of the form were generated, each corresponding to one group. Each group was given access to the version of the form representing the group. An offline copy of each form is provided in our online appendix [1]. The developers were asked to write the solution to each coding task in a text box reserved for the task. The developers were encouraged to use any IDE to code the solution to the tasks. Before starting each task, a participant was asked to mark down the beginning time. After completing a solution, the participant was again asked to mark the time of completion of the task. The participant was encouraged to not take any break during the completion of the task (after he marked the starting time of the task). To avoid fatigue, each participant was encouraged to take a short break between two tasks. Besides completing each coding task, each participant was also asked to assess the complexity and effort required for each task, using the NASA Task Load Index (TLX) [30], which assesses the subjective workload of subjects. After completing each task, we asked each subject to provide their
self-reported effort on the completed task through the official NASA TLX log engine at nasatlx.com. Each subject was given a login ID, an experiment ID, and task IDs, which they used to log their effort estimation for each task, under the different settings.
The main independent variable we consider is the development resource that participants use to find solutions for their coding tasks. The dependent variables are the values of the following metrics: correctness of the code solutions, and time and effort spent to code the solutions (discussed below).
(1) Correctness. To check the correctness of the solution for a coding task, we used the following process: (a) We identified the correct API elements (types, methods) used for the coding task. (b) We matched how many of those API elements were found in the coding solution and in what order. (c) We quantified the correctness of the coding solution using the following equation:

Correctness = |API Elements Found| / |API Elements Expected|   (3)

An API element can be either an API type (class, annotation) or an API method. Intuitively, a complete solution should have all the required API elements expected for the solution. We discarded the following types of solutions: (a) Duplicates. Cases where the solution of one task was copied into the solution of another task. We identified this by seeing the same solution copy-pasted for the two tasks. Whenever this happened, we discarded the second solution. (b) Links. Cases where developers only provided links to an online resource without providing a solution for the task. We discarded such solutions. (c) Wrong API. Cases where developers provided the solution using an API that was not given to them.
(2) Time. We computed the time taken to develop solutions for each task by taking the difference between the start and the end time reported for the task by the participant. Because the time spent was self-reported, it was prone to errors (some participants failed to record their time correctly). To remove erroneous entries, we discarded the following types of reported time: (a) reported times that were less than two minutes. It takes time to read the description of a task and to write it down, and it is simply not possible to do all such activities within a minute. (b) reported times that were more than 90 minutes for a given task. For example, we discarded one time that was reported as 1,410 minutes, i.e., almost 24 hours. Clearly, a participant cannot be awake for 24 hours to complete one coding task. This happened in only a few cases.
(3) Effort. We used the TLX metric values as reported by the participants. We analyzed the following five dimensions in the TLX metrics for each task under each setting: (a) Frustration Level. How annoyed versus complacent did the developer feel during the coding of the task? (b) Mental Demand. How much mental and perceptual activity was required? (c) Temporal Demand. How much time pressure did the participant feel during the coding of the solution? (d) Physical Demand. How much physical activity was required? (e) Overall Performance. How satisfied was the participant with his performance? Each dimension was reported on a 100-point range with 5-point steps. A TLX ‘effort’ score is automatically computed as a task load index by combining all the ratings provided by a participant. Because the provided TLX scores were based on the judgment of the participants, they are prone to subjective bias. Detecting outliers and removing them as noise from such ordinal data is a standard statistical process [91]. Following Tukey, we only considered values between the following two limits as valid: (a) Lower limit: First quartile − 1.5 × IQR, (b) Upper limit: Third quartile + 1.5 × IQR. Here IQR stands for ‘Inter-quartile range’, which is calculated as IQR = Q3 − Q1. Q1 and Q3 stand for the first and third quartile, respectively.
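The metric computation and data cleaning steps above can be sketched as follows (in Python; the API element lists and raw values are hypothetical illustrations, not data from the study):

```python
import statistics

def correctness(found, expected):
    """Eq. (3): fraction of expected API elements present in a solution."""
    return len(set(found) & set(expected)) / len(set(expected))

def valid_times(times, low=2, high=90):
    """Discard self-reported times under 2 minutes or over 90 minutes."""
    return [t for t in times if low <= t <= high]

def tukey_filter(values):
    """Keep values within [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if lo <= v <= hi]

# Hypothetical example: a solution expected to use 4 Jackson elements, 3 found.
expected = ["ObjectMapper", "writeValueAsString", "JsonProperty", "readValue"]
found = ["ObjectMapper", "writeValueAsString", "JsonProperty"]
print(correctness(found, expected))    # → 0.75
print(valid_times([1, 18, 35, 1410]))  # → [18, 35]
```

Note that `tukey_filter` uses the exclusive-quantile convention of Python's `statistics.quantiles`; other quartile conventions shift the fences slightly but serve the same purpose.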
The survey was conducted in a Google Doc form. We asked five questions; the first three questions were related to API official documentation and the last two were related to SO:
Table 9. Summary statistics of the correctness (scale 0-1), time (minutes) and effort (scale 0-100) spent in the coding tasks. Baselines: SO = SO, DO = Official Documentation, EV = Everything including search engine
Metric      | Opiner (OP) | Δ(SO,OP) | Δ(DO,OP) | Δ(EV,OP)
Correctness | 0.62        | ↓        | ↓        | ↓
Time        | 18.6        | ↑        | ↑        | ↑
Effort      | 45.8        | ↑        | ↑        | ↑

(1) How do the Opiner documentation offer improvements over formal documentation?
(2) How can Opiner complement formal API documentation?
(3) How would you envision for Opiner to complete the formal API documentation?
(4) How do Opiner summaries offer improvements over the SO contents for API usage?
(5) How would you envision Opiner to complement SO in your daily development activities?
The first and the fourth question had a five-point Likert scale (Strongly Agree, Agree, Neutral, Disagree, and Strongly Disagree). The other questions were answered in text boxes. A total of 135 coding solutions were provided by the participants. We discarded 14 as invalid solutions (i.e., link/wrong API). Out of the 135 reported times, we discarded 23 as spurious. 24 participants completed the TLX metrics (providing 96 entries in total), with each setting having six participants. Table 9 compares Opiner against the three baselines (API official documentation, SO, and Everything including search engines) based on three metrics: the average correctness of the provided solutions, and the average time and effort spent to complete the solutions. The columns with Δ compute the percent difference between Opiner and each baseline for each metric. For example, Δ(SO,OP) for the metric ‘Effort’ is computed as:

Δ(Effort_SO, Effort_OP) = (Effort_SO − Effort_OP) / Effort_SO   (4)

Recall that the effort calculation uses the NASA TLX software effort index.
Observation 3. Opiner outperforms each baseline for each metric. The participants coded with the maximum correctness (0.62), with the least time (18.6 minutes) and effort (45.8) per coding solution using Opiner. Among the three baselines, the ‘EV’ setting was the most useful for the developers. The EV setting contained everything (SO, API official documentation, Opiner, and search engine), i.e., the developers were able to consult all the documentation resources available on the Web. Opiner still outperformed the ‘EV’ setting; developers coded with less correctness (-13%), and spent more time (4%) and effort (16%) while using the ‘EV’ setting than Opiner.
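The percent-difference calculation of Eq. (4) can be sketched as below (in Python; the baseline effort values are hypothetical for illustration, since Table 9 reports only the Opiner averages):

```python
def percent_diff(baseline, opiner):
    """Eq. (4): relative difference of a baseline metric vs. Opiner."""
    return (baseline - opiner) / baseline

# Hypothetical effort values for illustration (not the study's data):
effort_so, effort_op = 60.0, 45.0
print(f"{percent_diff(effort_so, effort_op):.0%}")  # → 25%
```

A positive value means the baseline required more effort (or time) than Opiner; for correctness, where higher is better, a negative value means the baseline produced less correct code.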
Observation 4.
The correctness of the provided solutions was the lowest when participants used only SO. For code correctness, the difference is the maximum between Opiner and SO (35% less correct code while using SO). The difficulty developers faced in producing a correct solution just by relying on SO confirms previous findings that developers face difficulty while attempting to find the right solution from the millions of forum posts [92, 95].
[Figure: per-task comparison of the four settings (DO, EV, OP, SO) for tasks TG, TJ, TS, TX]