Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Tianqi Chen is active.

Publication


Featured research published by Tianqi Chen.


Knowledge Discovery and Data Mining | 2016

XGBoost: A Scalable Tree Boosting System

Tianqi Chen; Carlos Guestrin

Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
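As a toy illustration of the additive training behind tree boosting (a minimal sketch, not the actual XGBoost implementation; the stump learner, data, and hyperparameters below are made up for illustration), each round fits a depth-1 regression stump to the residuals of the current ensemble:

```python
# Gradient boosting with regression stumps on 1-D data: each round fits a
# stump to the residuals and adds it to the ensemble with learning rate eta.

def fit_stump(xs, residuals):
    """Find the 1-D split that best fits the residuals in squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, t, l, r = best
    return lambda x: l if x <= t else r

def boost(xs, ys, rounds=50, eta=0.3):
    """Additive training: each new stump corrects the current ensemble."""
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + eta * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(eta * s(x) for s in stumps)

# Toy data: a step function, which the boosted stumps recover closely.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = boost(xs, ys)
```

The residual-fitting loop is the essence; the real system adds regularized tree learning, the sparsity-aware and quantile-sketch algorithms, and the systems optimizations the abstract describes.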


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013

Optimizing top-n collaborative filtering via dynamic negative item sampling

Weinan Zhang; Tianqi Chen; Jun Wang; Yong Yu

Collaborative filtering techniques rely on aggregated user preference data to make personalized predictions. In many cases, users are reluctant to explicitly express their preferences and many recommender systems have to infer them from implicit user behaviors, such as clicking a link in a webpage or playing a music track. The clicks and the plays are good for indicating the items a user liked (i.e., positive training examples), but the items a user did not like (negative training examples) are not directly observed. Previous approaches either randomly pick negative training samples from unseen items or incorporate some heuristics into the learning model, leading to a biased solution and a prolonged training period. In this paper, we propose to dynamically choose negative training samples from the ranked list produced by the current prediction model and iteratively update our model. The experiments conducted on three large-scale datasets show that our approach not only reduces the training time, but also leads to significant performance gains.
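The core sampling idea can be sketched in a few lines (a hypothetical simplification, not the paper's full algorithm; the scores and item names are invented): instead of drawing negatives uniformly from unseen items, draw them from the top of the current model's ranking, so training focuses on items the model wrongly ranks above the observed positives.

```python
import random

def sample_negative(scores, seen, k=3, rng=random):
    """Pick a negative training item from the top-k unseen items
    under the current model's scores."""
    ranked = sorted((i for i in scores if i not in seen),
                    key=lambda i: scores[i], reverse=True)
    return rng.choice(ranked[:k])

scores = {"a": 0.9, "b": 0.7, "c": 0.2, "d": 0.1}  # current model scores
seen = {"a"}                                        # user's positive items
neg = sample_negative(scores, seen)                 # drawn from top-ranked unseen items
```

As the model updates, the ranking changes, so the negative distribution adapts dynamically rather than staying fixed as in uniform random sampling.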


Scientific Reports | 2015

Biological Sources of Intrinsic and Extrinsic Noise in cI Expression of Lysogenic Phage Lambda.

Xue Lei; Wei Tian; Hongyuan Zhu; Tianqi Chen; Ping Ao

Genetically identical cells exposed to a homogeneous environment can show remarkable phenotypic differences. Predicting how phenotype is shaped requires understanding how each factor contributes. During gene expression, noise can arise either intrinsically, in the biochemical processes of gene expression, or extrinsically, from other cellular processes such as cell growth. In this work, important noise sources in gene expression of the phage λ lysogen are quantified using models described by stochastic differential equations (SDEs). Results show that DNA looping has sophisticated impacts on gene expression noise: when DNA looping provides autorepression, as in the wild type, it reduces noise in the system; when the autorepression is defective, as in certain mutants, DNA looping increases expression noise. We also study how each gene operator affects expression noise by systematically changing the binding affinity between the gene and the transcription factor. We find that the system shows extraordinarily large noise when the binding affinity lies in a certain range that changes the system from monostable to bistable. In addition, we find that cell growth causes non-negligible noise, which increases with gene expression level. Quantification of noise and identification of new noise sources will provide deeper understanding of how stochasticity impacts phenotype.
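To give a flavor of the SDE approach (an illustrative sketch only, not the paper's λ-lysogen model; the production rate k and degradation rate g are made-up constants), a chemical Langevin equation for a single gene, dX = (k - gX) dt + sqrt(k + gX) dW, can be simulated with Euler-Maruyama and its stationary noise summarized by the coefficient of variation:

```python
import math
import random

def simulate(k=50.0, g=1.0, dt=0.001, steps=200_000, seed=0):
    """Euler-Maruyama simulation of dX = (k - g*X) dt + sqrt(k + g*X) dW.
    Returns the stationary mean and coefficient of variation (CV)."""
    rng = random.Random(seed)
    x = k / g                          # start at the deterministic steady state
    samples = []
    for step in range(steps):
        drift = k - g * x
        diffusion = math.sqrt(max(k + g * x, 0.0))
        x += drift * dt + diffusion * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x = max(x, 0.0)                # molecule counts cannot go negative
        if step > steps // 2:          # discard the transient
            samples.append(x)
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, math.sqrt(var) / mean

mean, cv = simulate()
```

The intrinsic-noise term here scales with the reaction rates themselves; extrinsic sources such as cell growth would enter as additional stochastic terms or parameter fluctuations.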


Challenge | 2011

Informative household recommendation with feature-based matrix factorization

Qiuxia Lu; Diyi Yang; Tianqi Chen; Weinan Zhang; Yong Yu

In this paper, we describe our solutions to the first track of the CAMRa2011 challenge. The goal of this track is to generate a movie ranking list for each household. To achieve this goal, we propose to use ranking-oriented matrix factorization and matrix factorization with negative example sampling. We also adopt a feature-based matrix factorization framework to incorporate various contextual information into our model, including user-household relations, item neighborhoods, user implicit feedback, etc. Finally, we describe two methods to recommend movies for each household based on our models. Experimental results show that our proposed approaches achieve significant improvement over baseline methods.
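The feature-based matrix factorization form referred to above can be sketched as follows (a hedged toy version: the general shape is bias terms plus an inner product of latent vectors aggregated over active features, but all feature names, weights, and factor values below are invented for illustration):

```python
def predict(user_feats, item_feats, bias, p, q):
    """Score = sum of feature biases
             + <weighted sum of user-side factors, weighted sum of item-side factors>.
    Each feature is an (id, weight) pair; p and q map feature ids to 2-D factors."""
    b = sum(bias.get(f, 0.0) * w for f, w in user_feats + item_feats)
    u = [sum(p[f][d] * w for f, w in user_feats) for d in range(2)]
    v = [sum(q[f][d] * w for f, w in item_feats) for d in range(2)]
    return b + sum(ud * vd for ud, vd in zip(u, v))

# Toy parameters: a household is represented by its member users, each
# weighted equally, so contextual features fold naturally into the model.
p = {"user:alice": [0.3, 0.1], "user:bob": [0.2, 0.4]}
q = {"movie:42": [0.5, 0.6], "genre:comedy": [0.1, 0.2]}
bias = {"movie:42": 0.7}
score = predict([("user:alice", 0.5), ("user:bob", 0.5)],
                [("movie:42", 1.0), ("genre:comedy", 1.0)], bias, p, q)
```

The appeal of this form is that user-household relations, item neighborhoods, and implicit feedback all become extra (id, weight) features rather than new model code.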


Statistics and Computing | 2018

Irreversible samplers from jump and continuous Markov processes

Yi-An Ma; Tianqi Chen; Lei Wu

In this paper, we propose irreversible versions of the Metropolis–Hastings (MH) and Metropolis-adjusted Langevin algorithm (MALA) with a main focus on the latter. For the former, we show how one can simply switch between different proposal and acceptance distributions upon rejection to obtain an irreversible jump sampler (I-Jump). The resulting algorithm has a simple implementation akin to MH, but with the demonstrated benefits of irreversibility. We then show how the previously proposed MALA method can also be extended to exploit irreversible stochastic dynamics as proposal distributions in the I-Jump sampler. Our experiments explore how irreversibility can increase the efficiency of the samplers in different situations.
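For orientation, here is the reversible MALA baseline that the paper extends (a sketch of standard MALA only; the irreversible I-Jump construction itself is not reproduced here). The target is a standard normal, so grad log p(x) = -x, and the step size is an arbitrary illustrative choice:

```python
import math
import random

def mala(n=20_000, eps=0.5, seed=1):
    """Metropolis-adjusted Langevin algorithm targeting a standard normal."""
    rng = random.Random(seed)
    log_p = lambda x: -0.5 * x * x
    grad = lambda x: -x
    x = 0.0
    samples = []
    for _ in range(n):
        # Langevin proposal: drift along the gradient plus Gaussian noise.
        mean_fwd = x + 0.5 * eps ** 2 * grad(x)
        y = mean_fwd + eps * rng.gauss(0.0, 1.0)
        mean_bwd = y + 0.5 * eps ** 2 * grad(y)
        # Metropolis-Hastings correction for the asymmetric proposal.
        log_q_fwd = -((y - mean_fwd) ** 2) / (2 * eps ** 2)
        log_q_bwd = -((x - mean_bwd) ** 2) / (2 * eps ** 2)
        if math.log(rng.random()) < log_p(y) - log_p(x) + log_q_bwd - log_q_fwd:
            x = y
        samples.append(x)
    return samples

samples = mala()
mean = sum(samples) / len(samples)
var = sum(s * s for s in samples) / len(samples)
```

This chain satisfies detailed balance; the paper's point is that deliberately breaking that reversibility, in both the jump sampler and the Langevin dynamics, can reduce autocorrelation and improve sampling efficiency.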


International Conference on Data Mining | 2014

A Parallel and Efficient Algorithm for Learning to Match

Jingbo Shang; Tianqi Chen; Hang Li; Zhengdong Lu; Yong Yu

Many tasks in data mining and related fields can be formalized as matching between objects in two heterogeneous domains, including collaborative filtering, link prediction, image tagging, and web search. Machine learning techniques, referred to as learning-to-match in this paper, have been successfully applied to these problems. Among them, a class of state-of-the-art methods, named feature-based matrix factorization, formalizes the task as an extension of matrix factorization by incorporating auxiliary features into the model. Unfortunately, making these algorithms scale to real-world problems is challenging, and simple parallelization strategies fail due to the complex cross-talk patterns between sub-tasks. In this paper, we tackle this challenge with a novel parallel and efficient algorithm. Our algorithm, based on coordinate descent, can easily handle hundreds of millions of instances and features on a single machine. The key recipe of this algorithm is an iterative relaxation of the objective to facilitate parallel updates of parameters, with guaranteed convergence on minimizing the original objective function. Experimental results demonstrate that the proposed method is effective on a wide range of matching problems, with efficiency significantly improved over the baselines while accuracy is retained.
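The coordinate-descent pattern underlying the algorithm can be illustrated on a much simpler objective (shown here sequentially on plain least squares, not on the paper's matrix-factorization objective and without its parallel relaxation; the data are toy values): each step updates one coordinate to its exact minimizer while the others are held fixed.

```python
def coordinate_descent(X, y, passes=100):
    """Cyclic coordinate descent for least squares: for each coordinate j,
    solve the 1-D problem exactly with all other coordinates fixed."""
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(passes):
        for j in range(n_features):
            # Residual with coordinate j's contribution removed.
            r = [yi - sum(wk * xik
                          for k, (wk, xik) in enumerate(zip(w, xi)) if k != j)
                 for xi, yi in zip(X, y)]
            # Exact 1-D minimizer for coordinate j.
            num = sum(xi[j] * ri for xi, ri in zip(X, r))
            den = sum(xi[j] ** 2 for xi in X)
            w[j] = num / den if den else 0.0
    return w

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, 3.0, 5.0]          # consistent system: the exact solution is w = [2, 3]
w = coordinate_descent(X, y)
```

The scalability challenge the paper addresses is precisely that such per-coordinate updates interact through shared residuals, which is what makes naive parallelization unsafe without the relaxation it proposes.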


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012

Collaborative filtering with short term preferences mining

Diyi Yang; Tianqi Chen; Weinan Zhang; Yong Yu

Recently, recommender systems have fascinated researchers and benefited a variety of people's online activities, helping users cope with the explosive growth of web information. Traditional collaborative filtering techniques handle general recommendation well. However, most such approaches focus on long-term preferences. To discover more of the short-term factors influencing people's decisions, we propose a short-term preference model, implemented with implicit user feedback. We conduct experiments comparing the performance of different short-term models, which show that our model significantly outperforms long-term models.


European Conference on Machine Learning | 2012

Discriminative factor alignment across heterogeneous feature space

Fangwei Hu; Tianqi Chen; Nathan Nan Liu; Qiang Yang; Yong Yu

Transfer learning, as a new machine learning paradigm, has gained increasing attention lately. In situations where the training data in a target domain are not sufficient to learn predictive models effectively, transfer learning leverages auxiliary source data from related domains for learning. While most existing works in this area focus on using source data with the same representational structure as the target data, in this paper we push this boundary further by extending transfer between text and images. We integrate documents, tags, and images to build a heterogeneous transfer learning factor alignment model and apply it to improve the performance of tag recommendation. Many algorithms for tag recommendation have been proposed, but many of them share a common problem: they may not perform well under cold-start conditions or for items from the long tail of the tag frequency distribution. With the help of documents, our algorithm handles these problems and generally outperforms other tag recommendation methods, especially the non-transfer factor alignment model.


Architectural Support for Programming Languages and Operating Systems | 2018

Leveraging the VTA-TVM Hardware-Software Stack for FPGA Acceleration of 8-bit ResNet-18 Inference

Thierry Moreau; Tianqi Chen; Luis Ceze

We present a full-stack design to accelerate deep learning inference with FPGAs. Our contribution is two-fold. At the software layer, we leverage and extend TVM, the end-to-end deep learning optimizing compiler, in order to harness FPGA-based acceleration. At the hardware layer, we present the Versatile Tensor Accelerator (VTA), a generic, modular, and customizable architecture for TPU-like accelerators. We take a ResNet-18 description in MxNet and compile it down to perform 8-bit inference on a 256-PE accelerator implemented on a low-cost Xilinx Zynq FPGA, clocked at 100 MHz. Our full hardware acceleration stack will be made available for the community to reproduce and build upon at http://github.com/uwsaml/vta.


Architectural Support for Programming Languages and Operating Systems | 2018

PANEL: Open panel and discussion on tackling complexity, reproducibility and tech transfer challenges in a rapidly evolving AI/ML/systems research

Grigori Fursin; Thierry Moreau; Hillery C. Hunter; Yiran Chen; Charles Qi; Tianqi Chen

Discussion is centered around the following questions:

* How do we facilitate tech transfer between academia and industry in a quickly evolving research landscape?
* How do we incentivize companies and academic researchers to release more artifacts and open source projects as portable, customizable and reusable components which can be collaboratively optimized by the community across diverse models, data sets and platforms from the cloud to the edge?
* How do we ensure reproducible evaluation and fair comparison of diverse AI/ML frameworks, libraries, techniques and tools?
* What other workloads (AI, ML, quantum) and exciting research challenges should ReQuEST attempt to solve in its future iterations with the help of the multi-disciplinary community: reducing training time and costs, comparing specialized hardware (TPU/FPGA/DSP), distributing learning across edge devices, ...?

Collaboration


Dive into Tianqi Chen's collaborations.

Top Co-Authors

Yong Yu (Shanghai Jiao Tong University)
Thierry Moreau (University of Washington)
Weinan Zhang (Shanghai Jiao Tong University)
Diyi Yang (Carnegie Mellon University)
Luis Ceze (University of Washington)
Ping Ao (Shanghai Jiao Tong University)
Qiuxia Lu (Shanghai Jiao Tong University)
Qiang Yang (Harbin Institute of Technology)