Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Robert M. Bell is active.

Publication


Featured research published by Robert M. Bell.


IEEE Computer | 2009

Matrix Factorization Techniques for Recommender Systems

Yehuda Koren; Robert M. Bell; Chris Volinsky

As the Netflix Prize competition has demonstrated, matrix factorization models are superior to classic nearest neighbor techniques for producing product recommendations, allowing the incorporation of additional information such as implicit feedback, temporal effects, and confidence levels.
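
A minimal sketch of the basic technique the article describes: latent factor vectors and bias terms fit to observed ratings by stochastic gradient descent with regularization. This is an illustrative toy implementation, not the authors' code; the hyperparameter values are arbitrary, and the implicit-feedback, temporal, and confidence extensions mentioned above are omitted.

```python
import numpy as np

def factorize(ratings, n_users, n_items, k=40, lr=0.005, reg=0.02, epochs=20):
    """Biased matrix factorization trained with stochastic gradient descent.

    ratings: list of (user, item, rating) triples with 0-based indices.
    Returns the global mean, user/item biases, and factor matrices.
    """
    rng = np.random.default_rng(0)
    mu = sum(r for _, _, r in ratings) / len(ratings)   # global average rating
    bu = np.zeros(n_users)                              # user rating biases
    bi = np.zeros(n_items)                              # item rating biases
    P = 0.1 * rng.standard_normal((n_users, k))         # user factor vectors
    Q = 0.1 * rng.standard_normal((n_items, k))         # item factor vectors

    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            # Tuple assignment so both updates use the pre-update factors.
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                          Q[i] + lr * (err * P[u] - reg * Q[i]))
    return mu, bu, bi, P, Q

def predict(mu, bu, bi, P, Q, u, i):
    """Predicted rating for user u on item i."""
    return mu + bu[u] + bi[i] + P[u] @ Q[i]
```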


IEEE Transactions on Software Engineering | 2005

Predicting the location and number of faults in large software systems

Thomas J. Ostrand; Elaine J. Weyuker; Robert M. Bell

Advance knowledge of which files in the next release of a large software system are most likely to contain the largest numbers of faults can be a very valuable asset. To accomplish this, a negative binomial regression model has been developed and used to predict the expected number of faults in each file of the next release of a system. The predictions are based on the code of the file in the current release, and the fault and modification history of the file from previous releases. The model has been applied to two large industrial systems, one with a history of 17 consecutive quarterly releases over 4 years, and the other with nine releases over 2 years. The predictions were quite accurate: for each release of the two systems, the 20 percent of the files with the highest predicted number of faults contained between 71 percent and 92 percent of the faults that were actually detected, with the overall average being 83 percent. The same model was also used to predict which files of the first system were likely to have the highest fault densities (faults per KLOC). In this case, the 20 percent of the files with the highest predicted fault densities contained an average of 62 percent of the system's detected faults. However, the identified files contained a much smaller percentage of the code mass than the files selected to maximize the numbers of faults. The model was also used to make predictions from a much smaller input set that only contained fault data from integration testing and later. The prediction was again very accurate, identifying files that contained from 71 percent to 93 percent of the faults, with the average being 84 percent. Finally, a highly simplified version of the predictor selected files containing, on average, 73 percent and 74 percent of the faults for the two systems.
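
A rough sketch of the modeling step described here, using a negative binomial GLM from statsmodels. It is not the authors' model: the feature columns below (log_kloc, prior_faults, prior_changes, is_new) are illustrative stand-ins for the code-size and history predictors the paper uses, and the dispersion parameter is left at a fixed placeholder value.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Illustrative predictor columns; the paper's exact predictor set differs.
FEATURES = ["log_kloc", "prior_faults", "prior_changes", "is_new"]

def predict_fault_counts(history: pd.DataFrame, next_release: pd.DataFrame) -> np.ndarray:
    """Fit a negative binomial regression on per-file fault counts from
    earlier releases and return expected fault counts for the next release."""
    X_hist = sm.add_constant(history[FEATURES])
    X_next = sm.add_constant(next_release[FEATURES], has_constant="add")
    model = sm.GLM(history["faults"], X_hist,
                   family=sm.families.NegativeBinomial(alpha=1.0))
    fit = model.fit()
    return np.asarray(fit.predict(X_next))
```

Ranking the next release's files by these predictions and selecting the top 20 percent reproduces the selection rule evaluated in the abstract.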


Recommender Systems Handbook | 2011

Advances in Collaborative Filtering

Yehuda Koren; Robert M. Bell

The collaborative filtering (CF) approach to recommenders has recently enjoyed much interest and progress. The fact that it played a central role within the recently completed Netflix competition has contributed to its popularity. This chapter surveys the recent progress in the field. Matrix factorization techniques, which became a first choice for implementing CF, are described together with recent innovations. We also describe several extensions that bring competitive accuracy into neighborhood methods, which used to dominate the field. The chapter demonstrates how to utilize temporal models and implicit feedback to extend models' accuracy. In passing, we include detailed descriptions of some of the central methods developed for tackling the challenge of the Netflix Prize competition.
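
As one concrete example of the extensions surveyed here, the sketch below shows an SVD++-style prediction rule, in which a user's factor vector is augmented with a normalized sum of implicit-feedback factors over the items the user has interacted with. The variable names are assumptions for illustration, not code from the chapter.

```python
import numpy as np

def predict_with_implicit_feedback(mu, bu, bi, P, Q, Y, u, i, interacted_items):
    """SVD++-style rating prediction for user u and item i.

    P, Q are explicit user/item factor matrices, Y holds one implicit-feedback
    factor vector per item, and interacted_items lists the items user u has
    interacted with (rated, rented, browsed, ...), regardless of rating value."""
    n = max(len(interacted_items), 1)
    implicit = Y[interacted_items].sum(axis=0) / np.sqrt(n)
    return mu + bu[u] + bi[i] + Q[i] @ (P[u] + implicit)
```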


SIGKDD Explorations | 2007

Lessons from the Netflix prize challenge

Robert M. Bell; Yehuda Koren

This article outlines the overall strategy and summarizes a few key innovations of the team that won the first Netflix progress prize.


International Conference on Data Mining | 2007

Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights

Robert M. Bell; Yehuda Koren

Recommender systems based on collaborative filtering predict user preferences for products or services by learning past user-item relationships. A predominant approach to collaborative filtering is neighborhood based (k-nearest neighbors), where a user-item preference rating is interpolated from ratings of similar items and/or users. We enhance the neighborhood-based approach, leading to substantial improvement of prediction accuracy without a meaningful increase in running time. First, we remove certain so-called global effects from the data to make the ratings more comparable, thereby improving interpolation accuracy. Second, we show how to simultaneously derive interpolation weights for all nearest neighbors, unlike previous approaches where each weight is computed separately. By globally solving a suitable optimization problem, this simultaneous interpolation accounts for the many interactions between neighbors, leading to improved accuracy. Our method is very fast in practice, generating a prediction in about 0.2 milliseconds. Importantly, it does not require training many parameters or a lengthy preprocessing, making it very practical for large-scale applications. Finally, we show how to apply these methods to the perceivably much slower user-oriented approach. To this end, we suggest a novel scheme for low-dimensional embedding of the users. We evaluate these methods on the Netflix dataset, where they deliver significantly better results than the commercial Netflix Cinematch recommender system.
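
The second contribution, deriving all interpolation weights at once, can be sketched as one small regularized least-squares solve. The code below is a simplified illustration under stated assumptions (dense residual ratings from users who rated every neighbor, a plain ridge term); it omits the paper's shrinkage of the estimates and its handling of sparsely overlapping ratings.

```python
import numpy as np

def joint_interpolation_weights(R_neighbors, r_target, ridge=1e-3):
    """Derive interpolation weights for all k neighbors simultaneously.

    R_neighbors: (n_users, k) residual ratings of the k neighbor items,
                 after removing global effects, for users who rated them all.
    r_target:    (n_users,) residual ratings of the target item by those users.
    Solving one regularized least-squares problem lets the weights account for
    interactions between neighbors, unlike per-neighbor similarity weights."""
    A = R_neighbors.T @ R_neighbors            # k x k neighbor interaction matrix
    b = R_neighbors.T @ r_target               # agreement of each neighbor with the target
    A = A + ridge * np.eye(A.shape[0])         # small ridge term for numerical stability
    return np.linalg.solve(A, b)

def interpolate(weights, neighbor_ratings):
    """Predict the active user's residual rating from their neighbor ratings."""
    return weights @ neighbor_ratings
```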


International Symposium on Software Testing and Analysis | 2004

Where the bugs are

Thomas J. Ostrand; Elaine J. Weyuker; Robert M. Bell

The ability to predict which files in a large software system are most likely to contain the largest numbers of faults in the next release can be a very valuable asset. To accomplish this, a negative binomial regression model using information from previous releases has been developed and used to predict the numbers of faults for a large industrial inventory system. The files of each release were sorted in descending order based on the predicted number of faults and then the first 20% of the files were selected. This was done for each of fifteen consecutive releases, representing more than four years of field usage. The predictions were extremely accurate, correctly selecting files that contained between 71% and 92% of the faults, with the overall average being 83%. In addition, the same model was used on data for the same system's releases, but with all fault data prior to integration testing removed. The prediction was again very accurate, ranging from 71% to 93%, with the average being 84%. Predictions were made for a second system, and again the first 20% of files accounted for 83% of the identified faults. Finally, a highly simplified predictor was considered which correctly predicted 73% and 74% of the faults for the two systems.
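
The evaluation described here is simple to state in code: sort files by predicted fault count, take the first 20%, and report the share of actually detected faults they contain. The sketch below is illustrative, with hypothetical column names, not the authors' scripts.

```python
import numpy as np
import pandas as pd

def percent_faults_in_top_20(files: pd.DataFrame) -> float:
    """Percent of actual faults contained in the 20% of files with the
    highest predicted fault counts ('predicted' and 'actual_faults' are
    illustrative column names)."""
    ranked = files.sort_values("predicted", ascending=False)
    cutoff = int(np.ceil(0.2 * len(ranked)))
    captured = ranked["actual_faults"].iloc[:cutoff].sum()
    total = ranked["actual_faults"].sum()
    return 100.0 * captured / total if total else 0.0
```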


Empirical Software Engineering | 2008

Do too many cooks spoil the broth? Using the number of developers to enhance defect prediction models

Elaine J. Weyuker; Thomas J. Ostrand; Robert M. Bell

Fault prediction by negative binomial regression models is shown to be effective for four large production software systems from industry. A model developed originally with data from systems with regularly scheduled releases was successfully adapted to a system without releases to identify 20% of that system’s files that contained 75% of the faults. A model with a pre-specified set of variables derived from earlier research was applied to three additional systems, and proved capable of identifying averages of 81, 94 and 76% of the faults in those systems. A primary focus of this paper is to investigate the impact on predictive accuracy of using data about the number of developers who access individual code units. For each system, including the cumulative number of developers who had previously modified a file yielded no more than a modest improvement in predictive accuracy. We conclude that while many factors can “spoil the broth” (lead to the release of software with too many defects), the number of developers is not a major influence.
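
The developer-count predictor examined here can be derived from a modification log, as in the sketch below. The column names, the assumption that release identifiers sort chronologically, and the decision to count a file's developers up to and including each release are all illustrative choices, not the paper's.

```python
import pandas as pd

def cumulative_developer_counts(changes: pd.DataFrame) -> pd.DataFrame:
    """Given one row per modification with columns ['file', 'release',
    'developer'] (illustrative names), return the number of distinct
    developers who have modified each file up to and including each release."""
    rows = []
    for file_name, file_changes in changes.groupby("file"):
        seen = set()
        for release, release_changes in file_changes.groupby("release", sort=True):
            seen.update(release_changes["developer"])
            rows.append({"file": file_name,
                         "release": release,
                         "cum_developers": len(seen)})
    return pd.DataFrame(rows)
```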


International Symposium on Software Testing and Analysis | 2006

Looking for bugs in all the right places

Robert M. Bell; Thomas J. Ostrand; Elaine J. Weyuker

We continue investigating the use of a negative binomial regression model to predict which files in a large industrial software system are most likely to contain many faults in the next release. A new empirical study is described whose subject is an automated voice response system. Not only is this system's functionality substantially different from that of the earlier systems we studied (an inventory system and a service provisioning system), it also uses a significantly different software development process. Instead of having regularly scheduled releases as both of the earlier systems did, this system has what are referred to as continuous releases. We explore the use of three versions of the negative binomial regression model, as well as a simple lines-of-code based model, to make predictions for this system and discuss the differences observed from the earlier studies. Despite the different development process, the best version of the prediction model was able to identify, over the lifetime of the project, 20% of the system's files that contained, on average, nearly three quarters of the faults that were detected in the system's next releases.
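
For reference, one simple form of the lines-of-code based model mentioned above is just ranking files by size, as in this sketch (the `kloc` column name is an assumption):

```python
import pandas as pd

def largest_files(files: pd.DataFrame, fraction: float = 0.2) -> pd.DataFrame:
    """Size-only baseline: flag the largest `fraction` of files (by KLOC)
    as the ones most likely to be fault-prone."""
    n = max(1, int(round(fraction * len(files))))
    return files.sort_values("kloc", ascending=False).head(n)
```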


Empirical Software Engineering | 2010

Comparing the effectiveness of several modeling methods for fault prediction

Elaine J. Weyuker; Thomas J. Ostrand; Robert M. Bell

We compare the effectiveness of four modeling methods—negative binomial regression, recursive partitioning, random forests and Bayesian additive regression trees—for predicting the files likely to contain the most faults for 28 to 35 releases of three large industrial software systems. Predictor variables included lines of code, file age, faults in the previous release, changes in the previous two releases, and programming language. To compare the effectiveness of the different models, we use two metrics—the percent of faults contained in the top 20% of files identified by the model, and a new, more general metric, the fault-percentile-average. The negative binomial regression and random forests models performed significantly better than recursive partitioning and Bayesian additive regression trees, as assessed by either of the metrics. For each of the three systems, the negative binomial and random forests models identified 20% of the files in each release that contained an average of 76% to 94% of the faults.
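
A sketch of the fault-percentile-average, under the assumption that it is computed as the mean, over all top-m cut-offs of the predicted ranking, of the proportion of actual faults contained in the m highest-ranked files (equivalently, the average percentile of the faults under that ranking); the top-20% metric is then a single point on the same curve.

```python
import numpy as np

def fault_percentile_average(predicted, actual):
    """Fault-percentile-average of a predicted ranking of files.

    predicted: per-file predicted fault counts (higher = ranked earlier).
    actual:    per-file actual fault counts.
    Returns a value in [0, 1]; higher means faults concentrate in the files
    the model ranks highest. Ties are broken arbitrarily here."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    order = np.argsort(-predicted)                 # highest-predicted files first
    captured_by_top_m = np.cumsum(actual[order])   # faults in the top m files
    total = actual.sum()
    if total == 0:
        return 0.0
    return float(np.mean(captured_by_top_m / total))
```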


User Interface Software and Technology | 2007

Rethinking the progress bar

Chris Harrison; Brian Amento; Stacey Kuznetsov; Robert M. Bell

Progress bars are prevalent in modern user interfaces. Typically, a linear function is employed such that the progress of the bar is directly proportional to how much work has been completed. However, numerous factors cause progress bars to proceed at non-linear rates. Additionally, humans perceive time in a non-linear way. This paper explores the impact of various progress bar behaviors on user perception of process duration. The results are used to suggest several design considerations that can make progress bars appear faster and ultimately improve users' computing experience.
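
The behaviors studied here can be thought of as pacing functions mapping actual completion to displayed progress. The sketch below is purely illustrative and does not reproduce the paper's specific functions: a power-law pacing function whose exponent controls whether the bar appears to rush the beginning or accelerate toward the end.

```python
def displayed_progress(actual: float, exponent: float = 1.0) -> float:
    """Map actual completion in [0, 1] to the progress shown to the user.

    exponent == 1.0 is the usual linear bar; exponent < 1 front-loads apparent
    progress, exponent > 1 makes the bar accelerate toward the end."""
    actual = min(max(actual, 0.0), 1.0)
    return actual ** exponent

# Example: sample a back-loaded (accelerating) bar at ten steps.
samples = [round(displayed_progress(step / 10, exponent=2.0), 2) for step in range(11)]
```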

Collaboration


Dive into Robert M. Bell's collaborations.

Top Co-Authors

Yonghee Shin

North Carolina State University


Chris Harrison

Carnegie Mellon University
