Amritanshu Agrawal | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Amritanshu Agrawal is active.

Explore More

Publication

Featured researches published by Amritanshu Agrawal.

Information & Software Technology | 2018

What is wrong with topic modeling? And how to fix it using search-based software engineering

Amritanshu Agrawal; Wei Fu; Tim Menzies

Abstract Context Topic modeling finds human-readable structures in unstructured textual data. A widely used topic modeling technique is Latent Dirichlet allocation. When running on different datasets, LDA suffers from “order effects”, i.e., different topics are generated if the order of training data is shuffled. Such order effects introduce a systematic error for any study. This error can relate to misleading results; specifically, inaccurate topic descriptions and a reduction in the efficacy of text mining classification results. Objective To provide a method in which distributions generated by LDA are more stable and can be used for further analysis. Method We use LDADE, a search-based software engineering tool which uses Differential Evolution (DE) to tune the LDA’s parameters. LDADE is evaluated on data from a programmer information exchange site (Stackoverflow), title and abstract text of thousands of Software Engineering (SE) papers, and software defect reports from NASA. Results were collected across different implementations of LDA (Python+Scikit-Learn, Scala+Spark) across Linux platform and for different kinds of LDAs (VEM, Gibbs sampling). Results were scored via topic stability and text mining classification accuracy. Results In all treatments: (i) standard LDA exhibits very large topic instability; (ii) LDADE’s tunings dramatically reduce cluster instability; (iii) LDADE also leads to improved performances for supervised as well as unsupervised learning. Conclusion Due to topic instability, using standard LDA with its “off-the-shelf” settings should now be depreciated. Also, in future, we should require SE papers that use LDA to test and (if needed) mitigate LDA topic instability. Finally, LDADE is a candidate technology for effectively and efficiently reducing that instability.

international conference on software engineering | 2018

Is "better data" better than "better data miners"?: on the benefits of tuning SMOTE for defect prediction

Amritanshu Agrawal; Tim Menzies

We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria and they did not (b) study how variations in the data affect the results. Hence, this paper applies (a) multi-performance criteria while (b) fixing the weaker regions of the training data (using SMOTUNED, which is an auto-tuning version of SMOTE). This approach leads to dramatically large increases in software defect predictions when applied in a 5*5 cross-validation study for 3,681 JAVA classes (containing over a million lines of code) from open source systems, SMOTUNED increased AUC and recall by 60% and 20% respectively. These improvements are independent of the classifier used to predict for defects. Same kind of pattern (improvement) was observed when a comparative analysis of SMOTE and SMOTUNED was done against the most recent class imbalance technique. In conclusion, for software analytic tasks like defect prediction, (1) data pre-processing can be more important than classifier choice, (2) ranking studies are incomplete without such pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.

international conference on software engineering | 2018

What is the connection between issues, bugs, and enhancements?: lessons learned from 800+ software projects

Rahul Krishna; Amritanshu Agrawal; Akond Rahman; Alexander Sobran; Tim Menzies

Agile teams juggle multiple tasks so professionals are often assigned to multiple projects, especially in service organizations that monitor and maintain large suites of software for a large user base. If we could predict changes in project conditions change, then managers could better adjust the staff allocated to those projects. This paper builds such a predictor using data from 832 open source and proprietary projects. Using a time series analysis of the last 4 months of issues, we can forecast how many bug reports and enhancement requests will be generated the next month. The forecasts made in this way only require a frequency count of these issue reports (and do not require an historical record of bugs found in the project). That is, this kind of predictive model is very easy to deploy within a project. We hence strongly recommend this method for forecasting future issues, enhancements, and bugs in a project.

international conference on software engineering | 2017

Trends in topics at SE conferences (1993--2013)

George Mathew; Amritanshu Agrawal; Tim Menzies

Using topic modeling, we analyse the titles and abstracts of nearly 10,000 papers from 20 years published in 11 top-ranked Software Engineering(SE) conferences between 1993 to 2013. Seven topics are identified as the dominant themes in modern software engineering. We show that these topics are not static, rather, some of them are becoming decidedly less prominent over time (modeling) while others are become very prominent indeed (defect analysis). By clustering conferences according to the topics they publish, we identify four large groups of SE conferences, e.g. ASE, FSE and ICSE publish mostly the same work (exceptions: there are more program analysis results in FSE than in ASE or ICSE). Using these results, we offer numerous recommendations including how to plan an individuals research program, when to make or merge conferences, and how to encourage a broader range of topics at SE conferences. An extended version of this paper, that analyzes more conferences and papers, is available on https://goo.gl/mVdyfj.

mining software repositories | 2018

Data-driven search-based software engineering

Vivek Nair; Amritanshu Agrawal; Jianfeng Chen; Wei Fu; George Mathew; Tim Menzies; Leandro L. Minku; Markus Wagner; Zhe Yu

This paper introduces Data-Driven Search-based Software Engineering (DSE), which combines insights from Mining Software Repositories (MSR) and Search-based Software Engineering (SBSE). While MSR formulates software engineering problems as data mining problems, SBSE reformulates Software Engineering (SE) problems as optimization problems and use meta-heuristic algorithms to solve them. Both MSR and SBSE share the common goal of providing insights to improve software engineering. The algorithms used in these two areas also have intrinsic relationships. We, therefore, argue that combining these two fields is useful for situations (a)~which require learning from a large data source or (b)~when optimizers need to know the lay of the land to find better solutions, faster. This paper aims to answer the following three questions: (1) What are the various topics addressed by DSE?, (2) What types of data are used by the researchers in this area?, and (3) What research approaches do researchers use? The paper briefly sets out to act as a practical guide to develop new DSE techniques and also to serve as a teaching resource. This paper also presents a resource (tiny.cc/data-se) for exploring DSE. The resource contains 89 artifacts which are related to DSE, divided into 13 groups such as requirements engineering, software product lines, software processes. All the materials in this repository have been used in recent software engineering papers; i.e., for all this material, there exist baseline results against which researchers can comparatively assess their new ideas.

international workshop on big data software engineering | 2016