Is this you? Create Your Porfile

Cho-Jui Hsieh

University of California, Davis

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Cho-Jui Hsieh is active.

Explore More

Publication

Featured researches published by Cho-Jui Hsieh.

international conference on data mining | 2012

Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

Hsiang-Fu Yu; Cho-Jui Hsieh; Si Si; Inderjit S. Dhillon

Matrix factorization, when the matrix has missing values, has become one of the leading techniques for recommender systems. To handle web-scale datasets with millions of users and billions of ratings, scalability becomes an important issue. Alternating Least Squares (ALS) and Stochastic Gradient Descent (SGD) are two popular approaches to compute matrix factorization. There has been a recent flurry of activity to parallelize these algorithms. However, due to the cubic time complexity in the target rank, ALS is not scalable to large-scale datasets. On the other hand, SGD conducts efficient updates but usually suffers from slow convergence that is sensitive to the parameters. Coordinate descent, a classical optimization approach, has been used for many other large-scale problems, but its application to matrix factorization for recommender systems has not been explored thoroughly. In this paper, we show that coordinate descent based methods have a more efficient update rule compared to ALS, and are faster and have more stable convergence than SGD. We study different update sequences and propose the CCD++ algorithm, which updatesrank-one factors one by one. In addition, CCD++ can be easily parallelized on both multi-core and distributed systems. We empirically show that CCD++ is much faster than ALS and SGD in both settings. As an example, on a synthetic dataset with 2 billion ratings, CCD++ is 4 times faster than both SGD and ALS using a distributed system with 20 machines.

knowledge discovery and data mining | 2011

Fast coordinate descent methods with variable selection for non-negative matrix factorization

Cho-Jui Hsieh; Inderjit S. Dhillon

Nonnegative Matrix Factorization (NMF) is an effective dimension reduction method for non-negative dyadic data, and has proven to be useful in many areas, such as text mining, bioinformatics and image processing. NMF is usually formulated as a constrained non-convex optimization problem, and many algorithms have been developed for solving it. Recently, a coordinate descent method, called FastHals, has been proposed to solve least squares NMF and is regarded as one of the state-of-the-art techniques for the problem. In this paper, we first show that FastHals has an inefficiency in that it uses a cyclic coordinate descent scheme and thus, performs unneeded descent steps on unimportant variables. We then present a variable selection scheme that uses the gradient of the objective function to arrive at a new coordinate descent method. Our new method is considerably faster in practice and we show that it has theoretical convergence guarantees. Moreover when the solution is sparse, as is often the case in real applications, our new method benefits by selecting important variables to update more often, thus resulting in higher speed. As an example, on a text dataset RCV1, our method is 7 times faster than FastHals, and more than 15 times faster when the sparsity is increased by adding an L1 penalty. We also develop new coordinate descent methods when error in NMF is measured by KL-divergence by applying the Newton method to solve the one-variable sub-problems. Experiments indicate that our algorithm for minimizing the KL-divergence is faster than the Lee & Seung multiplicative rule by a factor of 10 on the CBCL image dataset.

very large data bases | 2014

NOMAD: non-locking, stochastic multi-machine algorithm for asynchronous and decentralized matrix completion

Hyokun Yun; Hsiang-Fu Yu; Cho-Jui Hsieh; S. V. N. Vishwanathan; Inderjit S. Dhillon

We develop an efficient parallel distributed algorithm for matrix completion, named NOMAD (Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion). NOMAD is a decentralized algorithm with non-blocking communication between processors. One of the key features of NOMAD is that the ownership of a variable is asynchronously transferred between processors in a decentralized fashion. As a consequence it is a lock-free parallel algorithm. In spite of being asynchronous, the variable updates of NOMAD are serializable, that is, there is an equivalent update ordering in a serial implementation. NOMAD outperforms synchronous algorithms which require explicit bulk synchronization after every iteration: our extensive empirical evaluation shows that not only does our algorithm perform well in distributed setting on commodity hardware, but also outperforms state-of-the-art algorithms on a HPC cluster both in multi-core and distributed memory settings.

knowledge discovery and data mining | 2012

Low rank modeling of signed networks

Cho-Jui Hsieh; Kai-Yang Chiang; Inderjit S. Dhillon

Trust networks, where people leave trust and distrust feedback, are becoming increasingly common. These networks may be regarded as signed graphs, where a positive edge weight captures the degree of trust while a negative edge weight captures the degree of distrust. Analysis of such signed networks has become an increasingly important research topic. One important analysis task is that of sign inference, i.e., infer unknown (or future) trust or distrust relationships given a partially observed signed network. Most state-of-the-art approaches consider the notion of structural balance in signed networks, building inference algorithms based on information about links, triads, and cycles in the network. In this paper, we first show that the notion of weak structural balance in signed networks naturally leads to a global low-rank model for the network. Under such a model, the sign inference problem can be formulated as a low-rank matrix completion problem. We show that we can perfectly recover missing relationships, under certain conditions, using state-of-the-art matrix completion algorithms. We also propose the use of a low-rank matrix factorization approach with generalized loss functions as a practical method for sign inference - this approach yields high accuracy while being scalable to large signed networks, for instance, we show that this analysis can be performed on a synthetic graph with 1.1 million nodes and 120 million edges in 10 minutes. We further show that the low-rank model can be used for other analysis tasks on signed networks, such as user segmentation through signed graph clustering, with theoretical guarantees. Experiments on synthetic as well as real data show that our low rank model substantially improves accuracy of sign inference as well as clustering. As an example, on the largest real dataset available to us (Epinions data with 130K nodes and 840K edges), our matrix factorization approach yields 94.6% accuracy on the sign inference task as compared to 90.8% accuracy using a state-of-the-art cycle-based method - moreover, our method runs in 40 seconds as compared to 10,000 seconds for the cycle-based method.

arXiv: Machine Learning | 2017

ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models

Pin-Yu Chen; Huan Zhang; Yash Sharma; Jinfeng Yi; Cho-Jui Hsieh

Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has indicated ever-increasing concern on the robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of a well-trained DNN by demonstrating the ability of generating barely noticeable (to both human and machines) adversarial images that lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable by simply training and attacking a substitute model built upon the target model, known as a black-box attack to DNNs. Similar to the setting of training substitute models, in this paper we propose an effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN. However, different from leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks to directly estimate the gradients of the targeted DNN for generating adversarial examples. We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack and importance sampling techniques to efficiently attack black-box models. By exploiting zeroth order optimization, improved attacks to the targeted DNN can be accomplished, sparing the need for training substitute models and avoiding the loss in attack transferability. Experimental results on MNIST, CIFAR10 and ImageNet show that the proposed ZOO attack is as effective as the state-of-the-art white-box attack (e.g., Carlini and Wagners attack) and significantly outperforms existing black-box attacks via substitute models.

international world wide web conferences | 2015

A Scalable Asynchronous Distributed Algorithm for Topic Modeling

Hsiang-Fu Yu; Cho-Jui Hsieh; Hyokun Yun; S. V. N. Vishwanathan; Inderjit S. Dhillon

Learning meaningful topic models with massive document collections which contain millions of documents and billions of tokens is challenging because of two reasons. First, one needs to deal with a large number of topics (typically on the order of thousands). Second, one needs a scalable and efficient way of distributing the computation across multiple machines. In this paper, we present a novel algorithm F+Nomad LDA which simultaneously tackles both these problems. In order to handle large number of topics we use an appropriately modified Fenwick tree. This data structure allows us to sample from a multinomial distribution over T items in O(log T) time. Moreover, when topic counts change the data structure can be updated in O(log T) time. In order to distribute the computation across multiple processors, we present a novel asynchronous framework inspired by the Nomad algorithm of Yun et al, 2014. We show that F+Nomad LDA significantly outperforms recent state-of-the-art topic modeling approaches on massive problems which involve millions of documents, billions of words, and thousands of topics.

knowledge discovery and data mining | 2016

Goal-Directed Inductive Matrix Completion

Si Si; Kai-Yang Chiang; Cho-Jui Hsieh; Nikhil Rao; Inderjit S. Dhillon

Matrix completion (MC) with additional information has found wide applicability in several machine learning applications. Among algorithms for solving such problems, Inductive Matrix Completion(IMC) has drawn a considerable amount of attention, not only for its well established theoretical guarantees but also for its superior performance in various real-world applications. However, IMC based methods usually place very strong constraints on the quality of the features(side information) to ensure accurate recovery, which might not be met in practice. In this paper, we propose Goal-directed Inductive Matrix Completion(GIMC) to learn a nonlinear mapping of the features so that they satisfy the required properties. A key distinction between GIMC and IMC is that the feature mapping is learnt in a supervised manner, deviating from the traditional approach of unsupervised feature learning followed by model training. We establish the superiority of our method on several popular machine learning applications including multi-label learning, multi-class classification, and semi-supervised clustering.

international world wide web conferences | 2013

Organizational overlap on social networks and its applications

Cho-Jui Hsieh; Mitul Tiwari; Deepak Agarwal; Xinyi Huang; Sam Shah

Online social networks have become important for networking, communication, sharing, and discovery. A considerable challenge these networks face is the fact that an online social network is partially observed because two individuals might know each other, but may not have established a connection on the site. Therefore, link prediction and recommendations are important tasks for any online social network. In this paper, we address the problem of computing edge affinity between two users on a social network, based on the users belonging to organizations such as companies, schools, and online groups. We present experimental insights from social network data on organizational overlap, a novel mathematical model to compute the probability of connection between two people based on organizational overlap, and experimental validation of this model based on real social network data. We also present novel ways in which the organization overlap model can be applied to link prediction and community detection, which in itself could be useful for recommending entities to follow and generating personalized news feed.

european conference on computer vision | 2018

Towards Robust Neural Networks via Random Self-ensemble

Xuanqing Liu; Minhao Cheng; Huan Zhang; Cho-Jui Hsieh

Recent studies have revealed the vulnerability of deep neural networks: A small adversarial perturbation that is imperceptible to human can easily make a well-trained deep neural network misclassify. This makes it unsafe to apply neural networks in security-critical applications. In this paper, we propose a new defense algorithm called Random Self-Ensemble (RSE) by combining two important concepts: {\bf randomness} and {\bf ensemble}. To protect a targeted model, RSE adds random noise layers to the neural network to prevent the strong gradient-based attacks, and ensembles the prediction over random noises to stabilize the performance. We show that our algorithm is equivalent to ensemble an infinite number of noisy models

IEEE Computer | 2016