Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Jeremy Kubica is active.

Publication


Featured research published by Jeremy Kubica.


Knowledge Discovery and Data Mining | 2013

Ad click prediction: a view from the trenches

H. Brendan McMahan; Gary Holt; D. Sculley; Michael Young; Dietmar Ebner; Julian Paul Grady; Lan Nie; Todd Phillips; Eugene Davydov; Daniel Golovin; Sharat Chikkerur; Dan Liu; Martin Wattenberg; Arnar Mar Hrafnkelsson; Tom Boulos; Jeremy Kubica

Predicting ad click-through rates (CTR) is a massive-scale learning problem that is central to the multi-billion dollar online advertising industry. We present a selection of case studies and topics drawn from recent experiments in the setting of a deployed CTR prediction system. These include improvements in the context of traditional supervised learning based on an FTRL-Proximal online learning algorithm (which has excellent sparsity and convergence properties) and the use of per-coordinate learning rates. We also explore some of the challenges that arise in a real-world system that may appear at first to be outside the domain of traditional machine learning research. These include useful tricks for memory savings, methods for assessing and visualizing performance, practical methods for providing confidence estimates for predicted probabilities, calibration methods, and methods for automated management of features. Finally, we also detail several directions that did not turn out to be beneficial for us, despite promising results elsewhere in the literature. The goal of this paper is to highlight the close relationship between theoretical advances and practical engineering in this industrial setting, and to show the depth of challenges that appear when applying traditional machine learning methods in a complex dynamic system.
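The per-coordinate FTRL-Proximal update at the heart of this paper can be sketched in a few lines. This is a minimal illustration of the published algorithm, not the deployed system; the class name, hyperparameter defaults, and dictionary-based sparse representation are our own choices.

```python
import math

class FTRLProximal:
    """Minimal per-coordinate FTRL-Proximal for logistic regression (sketch)."""

    def __init__(self, alpha=0.5, beta=1.0, l1=0.1, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # per-coordinate accumulated shifted gradients
        self.n = {}  # per-coordinate sum of squared gradients

    def _weight(self, i):
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:  # L1 threshold yields exact zeros (sparsity)
            return 0.0
        n = self.n.get(i, 0.0)
        return -(z - math.copysign(self.l1, z)) / (
            (self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, x):
        """x: dict mapping feature index -> value. Returns P(y = 1)."""
        wx = sum(self._weight(i) * v for i, v in x.items())
        return 1.0 / (1.0 + math.exp(-max(min(wx, 35.0), -35.0)))

    def update(self, x, y):
        """One online step on example (x, y), with y in {0, 1}."""
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of the logistic loss
            n = self.n.get(i, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            # use the *current* weight w_t before updating z (per the paper)
            self.z[i] = self.z.get(i, 0.0) + g - sigma * self._weight(i)
            self.n[i] = n + g * g
```

The lazy `_weight` computation is what gives the method its sparsity: coordinates whose accumulated `z` stays below the L1 threshold contribute exactly zero and need never be materialized.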


arXiv: Astrophysics | 2006

LSST: Comprehensive NEO detection, characterization, and orbits

Željko Ivezić; J. Anthony Tyson; Mario Juric; Jeremy Kubica; Andrew J. Connolly; Francesco Pierfederici; Alan W. Harris; Edward Bowell

(Abridged) The Large Synoptic Survey Telescope (LSST) is currently by far the most ambitious proposed ground-based optical survey. Solar System mapping is one of the four key scientific design drivers, with emphasis on efficient Near-Earth Object (NEO) and Potentially Hazardous Asteroid (PHA) detection, orbit determination, and characterization. In a continuous observing campaign of pairs of 15-second exposures with its 3,200 megapixel camera, LSST will cover the entire available sky every three nights in two photometric bands to a depth of V=25 per visit (two exposures), with exquisitely accurate astrometry and photometry. Over the proposed survey lifetime of 10 years, each sky location would be visited about 1000 times. The baseline design satisfies strong constraints on the cadence of observations mandated by PHAs, such as closely spaced pairs of observations to link different detections and short exposures to avoid trailing losses. Equally important, due to frequent repeat visits LSST will effectively provide its own follow-up to derive orbits for detected moving objects. Detailed modeling of LSST operations, incorporating real historical weather and seeing data from the LSST site at Cerro Pachón, shows that LSST using its baseline design cadence could find 90% of the PHAs with diameters larger than 250 m, and 75% of those greater than 140 m, within ten years. However, by optimizing sky coverage, the ongoing simulations suggest that the LSST system, with its first light in 2013, can reach the Congressional mandate of cataloging 90% of PHAs larger than 140 m by 2020.


Archive | 2011

Scaling Up Machine Learning: Parallel Large-Scale Feature Selection

Jeremy Kubica; Sameer Singh; Daria Sorokina

The set of features used by a learning algorithm can have a dramatic impact on the performance of that algorithm. Including extraneous features can make the learning problem harder by adding useless, noisy dimensions that lead to over-fitting and increased computational complexity. Conversely, leaving out useful features can deprive the model of important signals. The problem of feature selection is to find a subset of features that allows the learning algorithm to learn the “best” model in terms of measures such as accuracy or model simplicity. The problem of feature selection continues to grow in both importance and difficulty as extremely high-dimensional data sets become the standard in real-world machine learning tasks. Scalability can become a problem for even simple approaches. For example, common feature selection approaches that evaluate each new feature by training a new model containing that feature require learning a linear number of models each time they add a new feature. This computational cost can add up quickly when we are iteratively adding many new features. Even techniques that use relatively computationally inexpensive tests of a feature’s value, such as mutual information, require at least linear time in the number of features being evaluated. As a simple illustrative example, consider the task of classifying websites. In this case, the data set could easily contain many millions of examples. Just including very basic features such as text unigrams on the page or HTML tags could easily provide many thousands of potential features for the model. Considering more complex attributes such as bigrams of words …
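The mutual-information test mentioned above can be sketched as an independent per-feature scorer. This is an illustrative sketch, not code from the chapter: the function names are ours, and it assumes discrete (e.g. binary unigram/tag) feature values; note the cost is one pass per candidate feature, i.e. linear in the number of features.

```python
import math
from collections import Counter

def mutual_information(feature_col, labels):
    """I(X; Y) in nats between a discrete feature column and discrete labels."""
    n = len(labels)
    px = Counter(feature_col)            # marginal counts of feature values
    py = Counter(labels)                 # marginal counts of labels
    pxy = Counter(zip(feature_col, labels))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), with counts folded in
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def rank_features(columns, labels):
    """Score each candidate feature independently and sort, best first."""
    return sorted(((mutual_information(col, labels), name)
                   for name, col in columns.items()), reverse=True)
```

A feature that is independent of the label scores zero, so ranking by this statistic is a cheap filter before the far more expensive step of training one model per candidate feature.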


SIAM International Conference on Data Mining | 2009

Parallel Large Scale Feature Selection for Logistic Regression

Sameer Singh; Jeremy Kubica; Scott Larsen; Daria Sorokina


Earth, Moon, and Planets | 2009

Solar System Science with LSST

Roger W. L. Jones; S. R. Chesley; Andrew J. Connolly; Alan W. Harris; Z. Ivezic; Zoran Knezevic; Jeremy Kubica; Andrea Milani; David E. Trilling


Archive | 2008

Feature selection for large scale models

Sameer Singh; Eldon S. Larsen; Jeremy Kubica; Andrew W. Moore


Archive | 2011

Predictive model performance

Jeremy Kubica


Archive | 2008

Efficient Methods For Object Searching In LSST And Pan-STARRS

Jonathan Myers; Francesco Pierfederici; Timothy S. Axelrod; Jeremy Kubica; Robert Jedicke; Larry Denneau


Archive | 2006

LSST: Taking Inventory of the Solar System

Steven R. Chesley; Andrew J. Connolly; Alan W. Harris; Zeljko Ivezic; Jeremy Kubica


Archive | 2006

Scalable Detection and Optimization of N-ary Linkages

Andrew W. Moore; Jeff G. Schneider; Jeremy Kubica; Anna Goldenberg; Artur Dubrawski; John Ostlund; Patrick Pakyan Choi; Jeanie Komarek; Adam Goode; Purna Sarkar

Collaboration


Dive into Jeremy Kubica's collaborations.

Top Co-Authors

Andrew W. Moore

Carnegie Mellon University

Sameer Singh

University of Washington

Mario Jurić

University of Washington
