Chunyang Chen
Nanyang Technological University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Chunyang Chen.
ieee international conference on software analysis evolution and reengineering | 2016
Chunyang Chen; Sa Gao; Zhenchang Xing
Third-party libraries are an integral part of many software projects. It often happens that developers need to find analogical libraries that can provide comparable features to the libraries they are already familiar with. Existing methods to find analogical libraries are limited by the community-curated list of libraries, blogs, or Q&A posts, which often contain overwhelming or out-of-date information. In this paper, we present a new approach to recommend analogical libraries based on a knowledge base of analogical libraries mined from tags of millions of Stack Overflow questions. The novelty of our approach is to solve analogical-libraries questions by combining state-of-the-art word embedding technique and domain-specific relational and categorical knowledge mined from Stack Overflow. We implement our approach in a proof-of-concept web application (https://graphofknowledge.appspot.com/similartech). The evaluation results show that our approach can make accurate recommendation of analogical libraries (Precision@1=0.81 and Precision@5=0.67). Google Analytics of the website traffic provides initial evidence of the potential usefulness of our web application for software developers.
computer software and applications conference | 2016
Chunyang Chen; Zhenchang Xing
Search engines and Question and Answer (Q&A) sites are the two commonly used ways for developers to seek information on the web. In this paper, we ask whether the questions developers ask on Q&A sites correlate with the information developers search for using search engines. We report on our empirical study to investigate the correlations of the 185 popular technical terms developers search on Google and ask on Stack Overflow using search statistics obtained from Google Trends over a 574-weeks span and question statistics derived from Stack Overflow Data Dump over a 300-weeks span. Our study shows that technical terms searched and asked have strong correlation over time. Search and asking of newer, specific technical terms have stronger correlation, compared with older, general technical terms. We have developed a web interface for accessing our dataset and empirical results available at http://comparetrend.appspot.com/. Inspired by our empirical results, we present future directions that can harness Stack Overflow as sampled data for supporting time-aware search and semantic search.
automated software engineering | 2016
Chunyang Chen; Zhenchang Xing
Third-party libraries are an integral part of many software projects. It often happens that developers need to find analogical libraries that can provide comparable features to the libraries they are already familiar with. Existing methods to find analogical libraries are limited by the community-curated list of libraries, blogs, or Q&A posts, which often contain overwhelming or out-of-date information. This paper presents our tool SimilarTech (https://graphofknowledge. appspot.com/similartech) that makes it possible to automatically recommend analogical libraries by incorporating tag embeddings and domain-specific relational and categorical knowledge mined from Stack Overflow. SimilarTech currently supports recommendation of 6,715 libraries across 6 different programming languages. We release our SimilarTech website for public use. The SimilarTech website attracts more than 2,400 users in the past 6 months. We observe two typical usage patterns of our website in the website visit logs which can satisfy different information needs of developers. The demo video can be seen at https://youtu.be/ubx8h4D4ieE.
international conference on software engineering | 2017
Chunyang Chen; Zhenchang Xing; Ximing Wang
Informal discussions on social platforms (e.g., Stack Overflow) accumulates a large body of programming knowledge in natural language text. Natural language process (NLP) techniques can be exploited to harvest this knowledge base for software engineering tasks. To make an effective use of NLP techniques, consistent vocabulary is essential. Unfortunately, the same concepts are often intentionally or accidentally mentioned in many different morphological forms in informal discussions, such as abbreviations, synonyms and misspellings. Existing techniques to deal with such morphological forms are either designed for general English or predominantly rely on domain-specific lexical rules. A thesaurus of software-specific terms and commonly-used morphological forms is desirable for normalizing software engineering text, but very difficult to build manually. In this work, we propose an automatic approach to build such a thesaurus. Our approach identifies software-specific terms by contrasting software-specific and general corpuses, and infers morphological forms of software-specific terms by combining distributed word semantics, domain-specific lexical rules and transformations, and graph analysis of morphological relations. We evaluate the coverage and accuracy of the resulting thesaurus against community-curated lists of software-specific terms, abbreviations and synonyms. We also manually examine the correctness of the identified abbreviations and synonyms in our thesaurus. We demonstrate the usefulness of our thesaurus in a case study of normalizing questions from Stack Overflow and CodeProject.
international conference on software maintenance | 2016
Chunyang Chen; Zhenchang Xing; Lei Han
Understanding the technology landscape is crucial for the success of the software-engineering project or organization. However, it can be difficult, even for experienced developers, due to the proliferation of similar technologies, the complex and often implicit dependencies among technologies, and the rapid development in which technology landscape evolves. Developers currently rely on online documents such as tutorials and blogs to find out best available technologies, technology correlations, and technology trends. Although helpful, online documents often lack objective, consistent summary of the technology landscape. In this paper, we present the TechLand system for assisting technology landscape inquiries with categorical, relational and trending knowledge of technologies that is aggregated from millions of Stack Overflow questions mentioning the relevant technologies. We implement the TechLand system and evaluate the usefulness of the system against the community answers to 100 technology questions on Stack Overflow and by field deployment and a lab study. Our evaluation shows that the TechLand system can assist developers in technology landscape inquiries by providing direct, objective, and aggregated information about available technologies, technology correlations and technology trends. Developers currently rely on online documents such as tutorials and blogs to find out best available technologies, technology correlations, and technology trends. Although helpful, online documents often lack objective, consistent summary of the technology landscape. In this paper, we present the TechLand system for assisting technology landscape inquiries with categorical, relational and trending knowledge of technologies that is aggregated from millions of Stack Overflow questions mentioning the relevant technologies. We implement the TechLand system and evaluate the usefulness of the system against the community answers to 100 technology questions on Stack Overflow and by field deployment and a lab study. Our evaluation shows that the TechLand system can assist developers in technology landscape inquiries by providing direct, objective, and aggregated information about available technologies, technology correlations and technology trends.
automated software engineering | 2016
Guibin Chen; Chunyang Chen; Zhenchang Xing; Bowen Xu
The lingual barrier limits the ability of millions of non-English speaking developers to make effective use of the tremendous knowledge in Stack Overflow, which is archived in English. For cross-lingual question retrieval, one may use translation-based methods that first translate the non-English queries into English and then perform monolingual question retrieval in English. However, translation-based methods suffer from semantic deviation due to inappropriate translation, especially for domain-specific terms, and lexical gap between queries and questions that share few words in common. To overcome the above issues, we propose a novel cross-lingual question retrieval based on word embed-dings and convolutional neural network (CNN) which are the state-of-the-art deep learning techniques to capture word- and sentence-level semantics. The CNN model is trained with large amounts of examples from Stack Overflow duplicate questions and their corresponding translation by machine, which guides the CNN to learn to capture informative word and sentence features to recognize and quantify semantic similarity in the presence of semantic deviations and lexical gaps. A uniqueness of our approach is that the trained CNN can map documents in two languages (e.g., Chinese queries and English questions) in a dual-language vector space, and thus reduce the cross-lingual question retrieval problem to a simple k-nearest neighbors search problem in the dual-language vector space, where no query or question translation is required. Our evaluation shows that our approach significantly outperforms the translation-based method, and can be extended to dual-language documents retrieval from different sources.
automated software engineering | 2018
Lei Ma; Felix Juefei-Xu; Fuyuan Zhang; Jiyuan Sun; Minhui Xue; Bo Li; Chunyang Chen; Ting Su; Li Li; Yang Liu; Jianjun Zhao; Yadong Wang
Deep learning (DL) defines a new data-driven programming paradigm that constructs the internal system logic of a crafted neuron network through a set of training data. We have seen wide adoption of DL in many safety-critical scenarios. However, a plethora of studies have shown that the state-of-the-art DL systems suffer from various vulnerabilities which can lead to severe consequences when applied to real-world applications. Currently, the testing adequacy of a DL system is usually measured by the accuracy of test data. Considering the limitation of accessible high quality test data, good accuracy performance on test data can hardly provide confidence to the testing adequacy and generality of DL systems. Unlike traditional software systems that have clear and controllable logic and functionality, the lack of interpretability in a DL system makes system analysis and defect detection difficult, which could potentially hinder its real-world deployment. In this paper, we propose DeepGauge, a set of multi-granularity testing criteria for DL systems, which aims at rendering a multi-faceted portrayal of the testbed. The in-depth evaluation of our proposed testing criteria is demonstrated on two well-known datasets, five DL systems, and with four state-of-the-art adversarial attack techniques against DL. The potential usefulness of DeepGauge sheds light on the construction of more generic and robust DL systems.
automated software engineering | 2018
Yi Huang; Chunyang Chen; Zhenchang Xing; Tian Lin; Yang Liu
Developers can use different technologies for many software development tasks in their work. However, when faced with several technologies with comparable functionalities, it is not easy for developers to select the most appropriate one, as comparisons among technologies are time-consuming by trial and error. Instead, developers can resort to expert articles, read official documents or ask questions in Q&A sites for technology comparison, but it is opportunistic to get a comprehensive comparison as online information is often fragmented or contradictory. To overcome these limitations, we propose the diffTech system that exploits the crowdsourced discussions from Stack Overflow, and assists technology comparison with an informative summary of different comparison aspects. We first build a large database of comparable software technologies by mining tags in Stack Overflow, and locate comparative sentences about comparable technologies with NLP methods. We further mine prominent comparison aspects by clustering similar comparative sentences and represent each cluster with its keywords. The evaluation demonstrates both the accuracy and usefulness of our model and we implement a practical website for public use.
Empirical Software Engineering | 2018
Chunyang Chen; Zhenchang Xing; Yang Liu
Third-party libraries are an integral part of many software projects. It often happens that developers need to find analogical libraries that can provide comparable features to the libraries they are already familiar with for different programming languages or different mobile platforms. Existing methods to find analogical libraries are limited by the community-curated list of libraries, blogs, or Q&A posts, which often contain overwhelming or out-of-date information. In this paper, we present a new approach to recommend analogical libraries based on a knowledge base of analogical libraries mined from tags of millions of Stack Overflow questions. The novelty of our approach is to solve analogical-library questions by combining state-of-the-art word embedding technique and domain-specific relational and categorical knowledge mined from Stack Overflow. Given a library and a recommended analogical library, our approach further extracts questions and answer snippets in Stack Overflow about comparison of analogical libraries, which can potentially offer useful information scents for developers to further their investigation of the recommended analogical libraries. We implement our approach in a proof-of-concept web application and more than 34.8 thousands of users visited our website from November 2015 to August 2017. Our evaluation shows that our approach can make accurate recommendation of analogical libraries. We also demonstrate the usefulness of our analogical-library recommendations by using them to answer analogical-library questions in Stack Overflow. Google Analytics of our website traffic and analysis of the visitors’ interaction with website contents provide the insights into the usage patterns and the system design of our web application.
empirical software engineering and measurement | 2016
Chunyang Chen; Zhenchang Xing