Joshua Seth Herbach
Publications
Featured research published by Joshua Seth Herbach.
Very Large Data Bases (VLDB) | 2009
Biswanath Panda; Joshua Seth Herbach; Sugato Basu; Roberto J. Bayardo
Classification and regression tree learning on massive datasets is a common data mining task at Google, yet many state-of-the-art tree learning algorithms require the training data to reside in memory on a single machine. While more scalable implementations of tree learning have been proposed, they typically require specialized parallel computing architectures. In contrast, the majority of Google's computing infrastructure is based on commodity hardware. In this paper, we describe PLANET: a scalable distributed framework for learning tree models over large datasets. PLANET defines tree learning as a series of distributed computations and implements each one using the MapReduce model of distributed computation. We show how this framework supports scalable construction of classification and regression trees, as well as ensembles of such models. We discuss the benefits and challenges of using a MapReduce compute cluster for tree learning, and demonstrate the scalability of this approach by applying it to a real-world learning task from the domain of computational advertising.
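To make "a series of distributed computations" more concrete, here is a minimal single-process sketch of one such step for a regression tree: finding the best split for a node. It assumes numeric features, precomputed candidate thresholds, and variance reduction (sum of squared errors) as the split criterion. The function names and the in-memory simulation of map, shuffle, and reduce are hypothetical illustrations, not PLANET's actual code, and the rest of the framework is not reproduced. Each mapper emits additive sufficient statistics (count, sum, and sum of squares of the target) for its data shard. The reducer adds these statistics together, so the best split can be chosen from the reduced totals without any single machine holding the full dataset.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[List[float], float]  # (feature vector, regression target)
Stats = Tuple[int, float, float]    # (count, sum of y, sum of y^2)
Key = Tuple                         # ("total",) or ("left", feature, threshold)


def map_split_stats(shard: Iterable[Record],
                    thresholds: Dict[int, List[float]]) -> Iterable[Tuple[Key, Stats]]:
    """Hypothetical map phase: emit partial statistics for one data shard."""
    for x, y in shard:
        # Statistics for the whole node (needed to derive the right branch).
        yield ("total",), (1, y, y * y)
        for f, cuts in thresholds.items():
            for t in cuts:
                if x[f] < t:  # record falls in the left branch of split (f, t)
                    yield ("left", f, t), (1, y, y * y)


def reduce_stats(pairs: Iterable[Tuple[Key, Stats]]) -> Dict[Key, Stats]:
    """Hypothetical shuffle + reduce: sum statistics that share a key."""
    acc: Dict[Key, List[float]] = defaultdict(lambda: [0, 0.0, 0.0])
    for key, (n, s, ss) in pairs:
        a = acc[key]
        a[0] += n
        a[1] += s
        a[2] += ss
    return {k: (int(v[0]), v[1], v[2]) for k, v in acc.items()}


def sse(n: int, s: float, ss: float) -> float:
    """Sum of squared errors around the mean, computed from (n, sum, sum of squares)."""
    return ss - s * s / n if n > 0 else 0.0


def best_split(stats: Dict[Key, Stats]) -> Optional[Tuple[int, float, float]]:
    """Return (feature, threshold, gain) that maximizes variance reduction."""
    n, s, ss = stats[("total",)]
    parent = sse(n, s, ss)
    best = None
    for key, (nl, sl, ssl) in stats.items():
        if key[0] != "left":
            continue
        # Right-branch statistics are the node totals minus the left branch.
        nr, sr, ssr = n - nl, s - sl, ss - ssl
        if nr == 0:
            continue  # degenerate split: every record went left
        gain = parent - sse(nl, sl, ssl) - sse(nr, sr, ssr)
        if best is None or gain > best[2]:
            best = (key[1], key[2], gain)
    return best


# Usage: two shards stand in for input splits read by separate mappers.
shards = [
    [([1.0], 1.0), ([2.0], 1.0)],
    [([3.0], 5.0), ([4.0], 5.0)],
]
thresholds = {0: [1.5, 2.5, 3.5]}
pairs = chain.from_iterable(map_split_stats(sh, thresholds) for sh in shards)
stats = reduce_stats(pairs)
best = best_split(stats)
print(best)  # (0, 2.5, 16.0)
```

In this example, the chosen split (feature 0, threshold 2.5) separates the targets 1.0 and 5.0 perfectly. Its gain therefore equals the parent node's entire squared error of 16.0.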
Archive | 2013
Joshua Seth Herbach; Nathaniel Fairfield
Archive | 2015
Nathaniel Fairfield; Joshua Seth Herbach; Vadim Furman
Archive | 2013
Joshua Seth Herbach; Nathaniel Fairfield; Peter Colijn
Archive | 2014
Nathaniel Fairfield; Joshua Seth Herbach; Andrew Hughes Chatham; Michael Steven Montemerlo
Archive | 2015
Joshua Seth Herbach; Nathaniel Fairfield
Archive | 2017
Joshua Seth Herbach; Nathaniel Fairfield
Archive | 2016
Peter Colijn; Joshua Seth Herbach; Matthew Paul Mcnaughton
Archive | 2014
Nathaniel Fairfield; Joshua Seth Herbach
Archive | 2011
Biswanath Panda; Joshua Seth Herbach; Sugato Basu; Roberto J. Bayardo