Yucheng Low
Carnegie Mellon University
Publication
Featured research published by Yucheng Low.
very large data bases | 2012
Yucheng Low; Danny Bickson; Joseph E. Gonzalez; Carlos Guestrin; Aapo Kyrola; Joseph M. Hellerstein
While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms and can lead to inefficient learning systems. To help fill this critical void, we introduced the GraphLab abstraction, which naturally expresses asynchronous, dynamic, graph-parallel computation while ensuring data consistency and achieving a high degree of parallel performance in the shared-memory setting. In this paper, we extend the GraphLab framework to the substantially more challenging distributed setting while preserving strong data consistency guarantees. We develop graph-based extensions to pipelined locking and data versioning to reduce network congestion and mitigate the effect of network latency. We also introduce fault tolerance to the GraphLab abstraction using the classic Chandy-Lamport snapshot algorithm and demonstrate how it can be easily implemented by exploiting the GraphLab abstraction itself. Finally, we evaluate our distributed implementation of the GraphLab abstraction on a large Amazon EC2 deployment and show 1-2 orders of magnitude performance gains over Hadoop-based implementations.
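To make "asynchronous, dynamic, graph-parallel computation" concrete, here is a minimal single-threaded sketch of a dynamically scheduled PageRank. It is not the GraphLab API: the function name `dynamic_pagerank`, the dictionary-based graph, and the queue-based scheduler are all assumptions chosen for illustration, and the sketch leaves out distribution, locking, and fault tolerance entirely.

```python
from collections import deque


def dynamic_pagerank(out_edges, damping=0.85, tol=1e-6):
    """Illustrative sketch only (hypothetical helper, not GraphLab code).

    out_edges maps each vertex to a list of its out-neighbours; every
    vertex must appear as a key, even if its list is empty.
    """
    # Build the reverse adjacency so a vertex can read its in-neighbours.
    in_edges = {v: [] for v in out_edges}
    for u, targets in out_edges.items():
        for v in targets:
            in_edges[v].append(u)

    rank = {v: 1.0 for v in out_edges}
    queue = deque(out_edges)   # scheduler: vertices waiting for an update
    queued = set(out_edges)    # avoids scheduling a vertex twice

    while queue:
        v = queue.popleft()
        queued.discard(v)
        # Update function: reads the latest neighbour values directly,
        # with no global synchronisation barrier between rounds.
        new_rank = (1 - damping) + damping * sum(
            rank[u] / len(out_edges[u]) for u in in_edges[v]
        )
        changed = abs(new_rank - rank[v]) > tol
        rank[v] = new_rank
        # Dynamic scheduling: only vertices affected by a significant
        # change are put back on the queue.
        if changed:
            for w in out_edges[v]:
                if w not in queued:
                    queue.append(w)
                    queued.add(w)
    return rank
```

Calling `dynamic_pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})` returns a rank for each vertex. Vertices whose neighbourhood has already converged are never revisited, which is the intuition behind dynamic scheduling.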
knowledge discovery and data mining | 2011
Amr Ahmed; Yucheng Low; Mohamed Aly; Vanja Josifovski; Alexander J. Smola
Historical user activity is key for building user profiles to predict user behavior and affinities in many web applications, such as targeting of online advertising, content personalization and social recommendations. User profiles are temporal, and changes in a user's activity patterns are particularly useful for improved prediction and recommendation. For instance, an increased interest in car-related web pages may well suggest that the user might be shopping for a new vehicle. In this paper we present a comprehensive statistical framework for user profiling based on topic models which is able to capture such effects in a fully unsupervised fashion. Our method models topical interests of a user dynamically, where both the user association with the topics and the topics themselves are allowed to vary over time, thus ensuring that the profiles remain current. We describe a streaming, distributed inference algorithm which is able to handle tens of millions of users. Our results show that our model contributes towards improved behavioral targeting of display advertising relative to baseline models that do not incorporate topical and/or temporal dependencies. As a side effect, our model yields human-understandable results which can be used in an intuitive fashion by advertisers.
knowledge discovery and data mining | 2011
Yucheng Low; Deepak Agarwal; Alexander J. Smola
Content personalization is a key tool in creating attractive websites. Synergies can be obtained by integrating personalization between several Internet properties. In this paper we propose a hierarchical Bayesian model to address these issues. Our model allows the integration of multiple properties without changing the overall structure, which makes it easily extensible across large Internet portals. It relies at its lowest level on Latent Dirichlet Allocation, while making use of latent side features for cross-property integration. We demonstrate the efficiency of our approach by analyzing data from several properties of a major Internet portal.
conference on information and knowledge management | 2012
Yucheng Low; Alice X. Zheng
In this paper, we propose a novel method to efficiently compute the top-K most similar items given a query item, where similarity is defined by the set of items that have the highest vector inner products with the query. The task is related to the classical k-Nearest-Neighbor problem, and is widely applicable in a number of domains such as information retrieval, online advertising and collaborative filtering. Our method assumes an in-memory representation of the dataset and is designed to scale to query lengths of hundreds of thousands of terms. Our algorithm uses a generalized Hölder's inequality to upper bound the inner product with the norms of the constituent vectors. We also propose a novel compression scheme that computes bounds for groups of candidate items, thereby speeding up computation and minimizing memory requirements per query. We conduct extensive experiments on the publicly available Wikipedia dataset, and demonstrate that, with a memory overhead of 21%, our method can provide 1-3 orders of magnitude improvement in query run-time compared to naive methods and state-of-the-art competing methods. Our median top-10 word query time is 25 µs on 7.5 million words and 2.3 million documents.
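The idea of pruning candidates with a norm-based upper bound can be sketched as follows. This is not the paper's algorithm: it uses only the p = q = 2 special case of Hölder's inequality (Cauchy-Schwarz), dense Python lists, and a hypothetical function name `top_k_inner_product`, and it omits the generalized bound and the group compression scheme described in the abstract.

```python
import heapq
import math


def top_k_inner_product(query, items, k):
    """Illustrative sketch only (hypothetical helper).

    Returns up to k (score, index) pairs with the largest inner products
    <query, items[index]>, best first. Items are pruned using the bound
    <q, x> <= ||q||_2 * ||x||_2.
    """
    if k <= 0:
        return []
    q_norm = math.sqrt(sum(v * v for v in query))
    norms = [math.sqrt(sum(v * v for v in x)) for x in items]
    # Visit items by decreasing norm so their upper bounds are non-increasing.
    order = sorted(range(len(items)), key=lambda i: -norms[i])

    heap = []  # min-heap holding the current best k (score, index) pairs
    for i in order:
        bound = q_norm * norms[i]
        if len(heap) == k and bound <= heap[0][0]:
            # No remaining item can beat the current k-th best score.
            break
        score = sum(a * b for a, b in zip(query, items[i]))
        if len(heap) < k:
            heapq.heappush(heap, (score, i))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, i))
    return sorted(heap, reverse=True)
```

The early exit is safe because items are visited in order of decreasing norm: once one item's bound falls at or below the k-th best score found so far, every later item has an even smaller bound.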
operating systems design and implementation | 2012
Joseph E. Gonzalez; Yucheng Low; Haijie Gu; Danny Bickson; Carlos Guestrin
uncertainty in artificial intelligence | 2010
Yucheng Low; Joseph E. Gonzalez; Aapo Kyrola; Danny Bickson; Carlos Guestrin; Joseph M. Hellerstein
Archive | 2010
Yucheng Low; Joseph E. Gonzalez; Aapo Kyrola; Danny Bickson; Carlos Guestrin; Joseph M. Hellerstein
Proceedings of the VLDB Endowment | 2012
Yucheng Low; Joseph E. Gonzalez; Aapo Kyrola; Danny Bickson; Carlos Guestrin; Joseph M. Hellerstein
international conference on artificial intelligence and statistics | 2009
Joseph E. Gonzalez; Yucheng Low; Carlos Guestrin
international conference on artificial intelligence and statistics | 2011
Joseph E. Gonzalez; Yucheng Low; Arthur Gretton; Carlos Guestrin