Baishakhi Ray | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Baishakhi Ray is active.

Explore More

Publication

Featured researches published by Baishakhi Ray.

symposium on operating systems principles | 2011

PTask: operating system abstractions to manage GPUs as compute devices

Christopher J. Rossbach; Jon Currey; Mark Silberstein; Baishakhi Ray; Emmett Witchel

We propose a new set of OS abstractions to support GPUs and other accelerator devices as first class computing resources. These new abstractions, collectively called the PTask API, support a dataflow programming model. Because a PTask graph consists of OS-managed objects, the kernel has sufficient visibility and control to provide system-wide guarantees like fairness and performance isolation, and can streamline data movement in ways that are impossible under current GPU programming models. Our experience developing the PTask API, along with a gestural interface on Windows 7 and a FUSE-based encrypted file system on Linux show that the PTask API can provide important system-wide guarantees where there were previously none, and can enable significant performance improvements, for example gaining a 5× improvement in maximum throughput for the gestural interface.

international conference on software maintenance | 2013

An Empirical Study of API Stability and Adoption in the Android Ecosystem

Tyler McDonnell; Baishakhi Ray; Miryung Kim

When APIs evolve, clients make corresponding changes to their applications to utilize new or updated APIs. Despite the benefits of new or updated APIs, developers are often slow to adopt the new APIs. As a first step toward understanding the impact of API evolution on software ecosystems, we conduct an in-depth case study of the co-evolution behavior of Android API and dependent applications using the version history data found in github. Our study confirms that Android is evolving fast at a rate of 115 API updates per month on average. Client adoption, however, is not catching up with the pace of API evolution. About 28% of API references in client applications are outdated with a median lagging time of 16 months. 22% of outdated API usages eventually upgrade to use newer API versions, but the propagation time is about 14 months, much slower than the average API release interval (3 months). Fast evolving APIs are used more by clients than slow evolving APIs but the average time taken to adopt new versions is longer for fast evolving APIs. Further, API usage adaptation code is more defect prone than the one without API usage adaptation. This may indicate that developers avoid API instability.

foundations of software engineering | 2014

A large scale study of programming languages and code quality in github

Baishakhi Ray; Daryl Posnett; Vladimir Filkov; Premkumar T. Devanbu

What is the effect of programming languages on software quality? This question has been a topic of much debate for a very long time. In this study, we gather a very large data set from GitHub (729 projects, 80 Million SLOC, 29,000 authors, 1.5 million commits, in 17 languages) in an attempt to shed some empirical light on this question. This reasonably large sample size allows us to use a mixed-methods approach, combining multiple regression modeling with visualization and text analytics, to study the effect of language features such as static v.s. dynamic typing, strong v.s. weak typing on software quality. By triangulating findings from different methods, and controlling for confounding effects such as team size, project size, and project history, we report that language design does have a significant, but modest effect on software quality. Most notably, it does appear that strong typing is modestly better than weak typing, and among functional languages, static typing is also somewhat better than dynamic typing. We also find that functional languages are somewhat better than procedural languages. It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size. However, we hasten to caution the reader that even these modest effects might quite possibly be due to other, intangible process factors, e.g., the preference of certain personality types for functional, static and strongly typed languages.

human factors in computing systems | 2015

Gender and Tenure Diversity in GitHub Teams

Bogdan Vasilescu; Daryl Posnett; Baishakhi Ray; Mark van den Brand; Alexander Serebrenik; Premkumar T. Devanbu; Vladimir Filkov

Software development is usually a collaborative venture. Open Source Software (OSS) projects are no exception; indeed, by design, the OSS approach can accommodate teams that are more open, geographically distributed, and dynamic than commercial teams. This, we find, leads to OSS teams that are quite diverse. Team diversity, predominantly in offline groups, is known to correlate with team output, mostly with positive effects. How about in OSS? Using GitHub, the largest publicly available collection of OSS projects, we studied how gender and tenure diversity relate to team productivity and turnover. Using regression modeling of GitHub data and the results of a survey, we show that both gender and tenure diversity are positive and significant predictors of productivity, together explaining a sizable fraction of the data variability. These results can inform decision making on all levels, leading to better outcomes in recruiting and performance.

mining software repositories | 2012

An empirical study of supplementary bug fixes

Jihun Park; Miryung Kim; Baishakhi Ray; Doo-Hwan Bae

A recent study finds that errors of omission are harder for programmers to detect than errors of commission. While several change recommendation systems already exist to prevent or reduce omission errors during software development, there have been very few studies on why errors of omission occur in practice and how such errors could be prevented. In order to understand the characteristics of omission errors, this paper investigates a group of bugs that were fixed more than once in open source projects - those bugs whose initial patches were later considered incomplete and to which programmers applied supplementary patches. Our study on Eclipse JDT core, Eclipse SWT, and Mozilla shows that a significant portion of resolved bugs (22% to 33%) involves more than one fix attempt. Our manual inspection shows that the causes of omission errors are diverse, including missed porting changes, incorrect handling of conditional statements, or incomplete refactorings, etc. While many consider that missed updates to code clones often lead to omission errors, only a very small portion of supplementary patches (12% in JDT, 25% in SWT, and 9% in Mozilla) have a content similar to their initial patches. This implies that supplementary change locations cannot be predicted by code clone analysis alone. Furthermore, 14% to 15% of files in supplementary patches are beyond the scope of immediate neighbors of their initial patch locations - they did not overlap with the initial patch locations nor had direct structural dependencies on them (e.g. calls, accesses, subtyping relations, etc.). These results call for new types of omission error prevention approaches that complement existing change recommendation systems.

foundations of software engineering | 2012

A case study of cross-system porting in forked projects

Baishakhi Ray; Miryung Kim

Software forking---creating a variant product by copying and modifying an existing product---is often considered an ad hoc, low cost alternative to principled product line development. To maintain such forked products, developers often need to port an existing feature or bug-fix from one product variant to another. As a first step towards assessing whether forking is a sustainable practice, we conduct an in-depth case study of 18 years of the BSD product family history. Our study finds that maintaining forked projects involves significant effort of porting patches from other projects. Cross-system porting happens periodically and the porting rate does not necessarily decrease over time. A significant portion of active developers participate in porting changes from peer projects. Surprisingly, ported changes are less defect-prone than non-ported changes. Our work is the first to comprehensively characterize the temporal, spatial, and developer dimensions of cross-system porting in the BSD family, and our tool Repertoire is the first automated tool for detecting ported edits with high accuracy of 94% precision and 84% recall. Our study finds that the upkeep work of porting changes from peer projects is significant and currently, porting practice seems to heavily depend on developers doing their porting job on time. This result calls for new techniques to automate cross-system porting to reduce the maintenance cost of forked projects.

international conference on software engineering | 2018

DeepTest: automated testing of deep-neural-network-driven autonomous cars

Yuchi Tian; Kexin Pei; Suman Jana; Baishakhi Ray

Recent advances in Deep Neural Networks (DNNs) have led to the development of DNN-driven autonomous cars that, using sensors like camera, LiDAR, etc., can drive without any human intervention. Most major manufacturers including Tesla, GM, Ford, BMW, and Waymo/Google are working on building and testing different types of autonomous vehicles. The lawmakers of several US states including California, Texas, and New York have passed new legislation to fast-track the process of testing and deployment of autonomous vehicles on their roads. However, despite their spectacular progress, DNNs, just like traditional software, often demonstrate incorrect or unexpected corner-case behaviors that can lead to potentially fatal collisions. Several such real-world accidents involving autonomous cars have already happened including one which resulted in a fatality. Most existing testing techniques for DNN-driven vehicles are heavily dependent on the manual collection of test data under different driving conditions which become prohibitively expensive as the number of test conditions increases. In this paper, we design, implement, and evaluate DeepTest, a systematic testing tool for automatically detecting erroneous behaviors of DNN-driven vehicles that can potentially lead to fatal crashes. First, our tool is designed to automatically generated test cases leveraging real-world changes in driving conditions like rain, fog, lighting conditions, etc. DeepTest systematically explore different parts of the DNN logic by generating test inputs that maximize the numbers of activated neurons. DeepTest found thousands of erroneous behaviors under different realistic driving conditions (e.g., blurring, rain, fog, etc.) many of which lead to potentially fatal crashes in three top performing DNNs in the Udacity self-driving car challenge.

automated software engineering | 2013

Detecting and characterizing semantic inconsistencies in ported code

Baishakhi Ray; Miryung Kim; Suzette Person; Neha Rungta

Adding similar features and bug fixes often requires porting program patches from reference implementations and adapting them to target implementations. Porting errors may result from faulty adaptations or inconsistent updates. This paper investigates (1) the types of porting errors found in practice, and (2) how to detect and characterize potential porting errors. Analyzing version histories, we define five categories of porting errors, including incorrect control- and data-flow, code redundancy, inconsistent identifier renamings, etc. Leveraging this categorization, we design a static control- and data-dependence analysis technique, SPA, to detect and characterize porting inconsistencies. Our evaluation on code from four open-source projects shows that SPA can detect porting inconsistencies with 65% to 73% precision and 90% recall, and identify inconsistency types with 58% to 63% precision and 92% to 100% recall. In a comparison with two existing error detection tools, SPA improves precision by 14 to 17 percentage points.

foundations of software engineering | 2012

REPERTOIRE: a cross-system porting analysis tool for forked software projects

Baishakhi Ray; Christopher Wiley; Miryung Kim

To create a new variant of an existing project, developers often copy an existing codebase and modify it. This process is called software forking. After forking software, developers often port new features or bug fixes from peer projects. Repertoire analyzes repeated work of cross-system porting among forked projects. It takes the version histories as input and identifies ported edits by comparing the content of individual patches. It also shows users the extent of ported edits, where and when the ported edits occurred, which developers ported code from peer projects, and how long it takes for patches to be ported.

international symposium on software testing and analysis | 2017

GitcProc: a tool for processing and classifying GitHub commits

Casey Casalnuovo; Yagnik Suchak; Baishakhi Ray; Cindy Rubio-González

Sites such as GitHub have created a vast collection of software artifacts that researchers interested in understanding and improving software systems can use. Current tools for processing such GitHub data tend to target project metadata and avoid source code processing, or process source code in a manner that requires significant effort for each language supported. This paper presents GitcProc, a lightweight tool based on regular expressions and source code blocks, which downloads projects and extracts their project history, including fine-grained source code information and development time bug fixes. GitcProc can track changes to both single-line and block source code structures and associate these changes to the surrounding function context with minimal set up required from users. We demonstrate GitcProcs ability to capture changes in multiple languages by evaluating it on C, C++, Java, and Python projects, and show it finds bug fixes and the context of source code changes effectively with few false positives.

Explore More