Publications


Featured research published by Mu Li.


Knowledge Discovery and Data Mining | 2014

Efficient mini-batch training for stochastic optimization

Mu Li; Tong Zhang; Yuqiang Chen; Alexander J. Smola

Stochastic gradient descent (SGD) is a popular technique for large-scale optimization problems in machine learning. In order to parallelize SGD, minibatch training needs to be employed to reduce the communication cost. However, an increase in minibatch size typically decreases the rate of convergence. This paper introduces a technique based on approximate optimization of a conservatively regularized objective function within each minibatch. We prove that the convergence rate does not decrease with increasing minibatch size. Experiments demonstrate that with suitable implementations of approximate optimization, the resulting algorithm can outperform standard SGD in many scenarios.
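
To make the approach concrete, here is a minimal sketch of a conservatively regularized minibatch step, assuming a least-squares loss purely for illustration; the function and parameter names are illustrative, not the authors' implementation.

```python
import numpy as np

def minibatch_step(w, X_batch, y_batch, gamma=1.0, inner_steps=10, lr=0.1):
    """Approximately solve, within one minibatch of size b,
        min_v  (1/2b) * ||X_batch v - y_batch||^2 + (gamma/2) * ||v - w||^2,
    with a few inner gradient steps (the 'approximate optimization').
    The proximal term keeps the update conservative as the batch grows."""
    v = w.copy()
    b = len(y_batch)
    for _ in range(inner_steps):
        grad = X_batch.T @ (X_batch @ v - y_batch) / b + gamma * (v - w)
        v -= lr * grad
    return v
```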


International Conference of the IEEE Engineering in Medicine and Biology Society | 2009

Emotion classification based on gamma-band EEG

Mu Li; Bao-Liang Lu

In this paper, we use EEG signals to classify two emotions: happiness and sadness. These emotions are evoked by showing subjects pictures of smiling and crying facial expressions. We propose a frequency band searching method to choose an optimal band into which the recorded EEG signal is filtered. We use common spatial patterns (CSP) and a linear SVM to classify the two emotions. To investigate the time resolution of classification, we explore two kinds of trials with lengths of 3 s and 1 s. Classification accuracies of 93.5% ± 6.7% and 93.0% ± 6.2% are achieved on 10 subjects for the 3 s and 1 s trials, respectively. Our experimental results indicate that the gamma band (roughly 30–100 Hz) is suitable for EEG-based emotion classification.
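
A hedged sketch of the pipeline described above (gamma-band filtering, CSP, linear SVM); the filter order, band edges, and variable names are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.signal import butter, filtfilt
from sklearn.svm import LinearSVC

def gamma_bandpass(trials, fs, low=30.0, high=100.0):
    """Bandpass-filter trials of shape (n_trials, n_channels, n_samples).
    fs must exceed 2 * high for the band edges to be valid."""
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, trials, axis=-1)

def csp_filters(trials_a, trials_b, n_pairs=3):
    """Common spatial patterns via a generalized symmetric eigenproblem."""
    def mean_cov(trials):
        return np.mean([x @ x.T / np.trace(x @ x.T) for x in trials], axis=0)
    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    _, V = eigh(Ca, Ca + Cb)               # eigenvalues in ascending order
    return np.hstack([V[:, :n_pairs], V[:, -n_pairs:]]).T

def log_var_features(trials, W):
    """Log-variance of the spatially filtered signals, per trial."""
    Z = np.einsum("fc,ncs->nfs", W, trials)
    v = Z.var(axis=-1)
    return np.log(v / v.sum(axis=1, keepdims=True))

# usage: W = csp_filters(happy, sad); clf = LinearSVC().fit(feats, labels)
```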


Foundations of Computer Science | 2013

Iterative Row Sampling

Mu Li; Gary L. Miller; Richard Peng

There has been significant interest and progress recently in algorithms that solve regression problems involving tall and thin matrices in input sparsity time. Given an n × d matrix with n ≥ d, these algorithms find an approximation with fewer rows, allowing one to solve a poly(d)-sized problem instead. In practice, the best performance is often obtained by invoking these routines iteratively. We show that these iterative methods can be adapted to give theoretical guarantees comparable to and better than the current state of the art. Our approach is based on computing the importance of each row, known as its leverage score, in an iterative manner. We show that alternating between computing a shorter matrix estimate and finding more accurate approximate leverage scores leads to a series of geometrically smaller instances. This gives an algorithm whose runtime is input sparsity plus an overhead comparable to the cost of solving a regression problem on the smaller approximation. Our results build upon the close connection between randomized matrix algorithms, iterative methods, and graph sparsification.
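
The core primitive is row sampling by leverage scores. Below is a minimal numpy sketch of one round; it computes exact leverage scores via QR for clarity, whereas the paper's contribution is to iterate with cheaper approximate scores.

```python
import numpy as np

def leverage_scores(A):
    """Row leverage scores of a tall matrix A via a thin QR factorization."""
    Q, _ = np.linalg.qr(A)            # A = QR with orthonormal columns in Q
    return np.sum(Q * Q, axis=1)      # tau_i = ||Q[i, :]||^2; sums to d

def sample_rows(A, n_samples, seed=0):
    """Sample and reweight rows with probability proportional to leverage,
    yielding a shorter matrix whose spectrum approximates that of A."""
    rng = np.random.default_rng(seed)
    tau = leverage_scores(A)
    p = tau / tau.sum()
    idx = rng.choice(A.shape[0], size=n_samples, replace=True, p=p)
    scale = 1.0 / np.sqrt(n_samples * p[idx])   # keeps the estimate unbiased
    return A[idx] * scale[:, None]
```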


Computer Vision and Pattern Recognition | 2011

Time and space efficient spectral clustering via column sampling

Mu Li; Xiao-Chen Lian; James Tin-Yau Kwok; Bao-Liang Lu

Spectral clustering is an elegant and powerful approach to clustering. However, the underlying eigendecomposition takes cubic time and quadratic space in the size of the data set. These costs can be reduced by the Nyström method, which samples only a subset of columns from the matrix. However, the manipulation and storage of these sampled columns can still be expensive when the data set is large. In this paper, we propose a time- and space-efficient spectral clustering algorithm that scales to very large data sets. A general procedure to orthogonalize the approximated eigenvectors is also proposed. Extensive spectral clustering experiments on a number of data sets, ranging in size from a few thousand to several million points, demonstrate the accuracy and scalability of the proposed approach. We further apply it to the task of image segmentation. For images with more than 10 million pixels, the algorithm can obtain the eigenvectors in one minute on a single machine.
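
A hedged sketch of Nyström-style spectral clustering as described above: sample m landmark columns of a Gaussian affinity matrix, form approximate eigenvectors from the inner submatrix, and cluster the normalized rows with k-means. The normalization here is a generic stand-in for the paper's orthogonalization procedure, and parameters such as m and sigma are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def nystrom_spectral_clustering(X, k, m=200, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=m, replace=False)
    C = np.exp(-cdist(X, X[idx], "sqeuclidean") / (2 * sigma**2))  # n x m
    W = C[idx]                                 # m x m landmark affinities
    ew, ev = np.linalg.eigh(W)                 # W = V diag(ew) V^T
    keep = ew > 1e-10
    U = C @ (ev[:, keep] / np.sqrt(ew[keep]))  # Nystrom factor: K ~= U U^T
    U = U[:, -k:]                              # top-k approximate directions
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```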


Computer Vision and Pattern Recognition | 2010

Online multiple instance learning with no regret

Mu Li; James Tin-Yau Kwok; Bao-Liang Lu

Multiple instance (MI) learning is a recent learning paradigm that is more flexible than standard supervised learning in handling label ambiguity. It has been used in a wide range of applications, including image classification, object detection, and object tracking. Typically, MI algorithms are trained in a batch setting, in which the whole training set must be available before training starts. However, in applications such as tracking, the classifier needs to be trained continuously as new frames arrive. Motivated by the empirical success of a batch MI algorithm called MILES, we propose an online MI learning algorithm that has an efficient online update procedure and, like MILES, performs joint feature selection and classification. Moreover, while existing online MI algorithms lack theoretical guarantees, we prove that the proposed algorithm has a (cumulative) regret of O(√T), where T is the number of iterations. In other words, the average regret goes to zero asymptotically, so the algorithm achieves the same performance as the best solution in hindsight. Experiments on a number of MI classification and object tracking data sets demonstrate encouraging results.
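
The O(√T) regret rate is the one achieved by generic online (sub)gradient methods with a decaying step size. The sketch below illustrates that mechanism only; it is not the paper's MILES-style bag-and-instance formulation.

```python
import numpy as np

def online_gradient_descent(loss_grad, dim, T, radius=1.0):
    """Generic online subgradient descent over the ball ||w|| <= radius.
    loss_grad(w, t) must return a subgradient of the round-t loss at w.
    With eta_t = radius / sqrt(t), cumulative regret grows as O(sqrt(T)),
    so average regret vanishes as T grows."""
    w = np.zeros(dim)
    for t in range(1, T + 1):
        g = loss_grad(w, t)
        w = w - (radius / np.sqrt(t)) * g
        norm = np.linalg.norm(w)
        if norm > radius:               # project back onto the feasible ball
            w *= radius / norm
    return w
```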


Web Search and Data Mining | 2015

Inferring Movement Trajectories from GPS Snippets

Mu Li; Amr Ahmed; Alexander J. Smola

Inferring movement trajectories can be a challenging task, in particular when detailed tracking information is not available due to privacy and data collection constraints. In this paper we present a complete and computationally tractable model for estimating and predicting trajectories based on sparsely sampled, anonymous GPS landmarks that we call GPS snippets. To combat data sparsity we use map data as side information to constrain the inference process. We show the efficacy of our approach on a set of prediction tasks over data collected from different cities in the US.


IEEE Transactions on Neural Networks | 2015

Large-Scale Nyström Kernel Matrix Approximation Using Randomized SVD

Mu Li; Wei Bi; James Tin-Yau Kwok; Bao-Liang Lu

The Nyström method is an efficient technique for the eigenvalue decomposition of large kernel matrices. However, to ensure an accurate approximation, a sufficient number of columns has to be sampled. On very large data sets, the singular value decomposition (SVD) step on the resultant data submatrix can quickly dominate the computation and become prohibitive. In this paper, we propose an accurate and scalable Nyström scheme that first samples a large column subset from the input matrix, but then performs only an approximate SVD on the inner submatrix using recent randomized low-rank matrix approximation algorithms. Theoretical analysis shows that the proposed algorithm is as accurate as the standard Nyström method that directly performs a large SVD on the inner submatrix, while its time complexity is only that of a small SVD. Encouraging results are obtained on a number of large-scale data sets for low-rank approximation. Moreover, since the most computationally expensive steps can be easily distributed with minimal data transfer among processors, significant speedups can be obtained on multiprocessor and multi-GPU systems.
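
A minimal sketch of the two-stage idea, assuming a symmetric positive semidefinite kernel: run a Halko-style randomized SVD on the inner m x m submatrix instead of an exact SVD, then assemble the Nyström factor. Names and the oversampling value are illustrative.

```python
import numpy as np

def randomized_svd(W, k, oversample=10, seed=0):
    """Randomized rank-k SVD of an m x m (PSD) matrix W."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((W.shape[0], k + oversample))
    Q, _ = np.linalg.qr(W @ Omega)         # approximate range of W
    B = Q.T @ W                            # small (k + p) x m projection
    Ub, s, _ = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k]

def nystrom_randsvd(C, idx, k):
    """C: n x m sampled kernel columns; idx: the m landmark indices.
    Returns F with K ~= F F^T, computed with only a small randomized SVD."""
    W = C[idx]                             # inner m x m submatrix
    U, s = randomized_svd(W, k)
    return C @ (U / np.sqrt(s))
```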


Web Search and Data Mining | 2016

DiFacto: Distributed Factorization Machines

Mu Li; Ziqi Liu; Alexander J. Smola; Yu-Xiang Wang

Factorization Machines offer good performance and useful embeddings of data. However, they are costly to scale to large amounts of data and large numbers of features. In this paper we describe DiFacto, which uses a refined Factorization Machine model with sparse memory-adaptive constraints and frequency-adaptive regularization. We show how to distribute DiFacto over multiple machines using the Parameter Server framework by computing distributed subgradients on minibatches asynchronously. We analyze its convergence and demonstrate its efficiency on computational advertising data sets with billions of examples and features.
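
For context, the factorization machine score that DiFacto builds on can be computed in O(nnz · k) time using the standard pairwise-interaction identity, sketched below; the sparse memory-adaptive constraints and parameter-server distribution are omitted.

```python
import numpy as np

def fm_predict(x_idx, x_val, w0, w, V):
    """Factorization machine score for one sparse example.
    x_idx, x_val: indices and values of the nonzero features;
    w0: bias, w: linear weights, V: (n_features, k) embedding matrix."""
    linear = w0 + w[x_idx] @ x_val
    Xv = V[x_idx] * x_val[:, None]              # rows are v_i * x_i
    s = Xv.sum(axis=0)
    pairwise = 0.5 * (s @ s - (Xv * Xv).sum())  # sum_{i<j} <v_i, v_j> x_i x_j
    return linear + pairwise
```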


Knowledge Discovery and Data Mining | 2015

Cuckoo Linear Algebra

Li Zhou; David G. Andersen; Mu Li; Alexander J. Smola

In this paper we present a novel data structure for sparse vectors based on cuckoo hashing. It is highly memory efficient and allows random access at rates close to those of dense vectors. This lets us solve sparse l1 programming problems exactly and without preprocessing, at a cost identical to that of dense linear algebra in both memory and speed. Our approach provides a feasible alternative to the hash kernel, and it excels whenever exact solutions are required, such as for feature selection.
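
A toy sketch of the underlying idea, assuming a textbook two-table cuckoo scheme: every key lives in one of two candidate slots, so reads cost at most two probes and a missing key reads as zero. A production version would use open-addressed arrays of key-value pairs and rebuild on insertion failure.

```python
class CuckooVector:
    """Toy cuckoo-hashed sparse vector with O(1) worst-case reads."""
    def __init__(self, capacity=1024, max_kicks=64):
        self.cap, self.max_kicks = capacity, max_kicks
        self.slots = [None] * (2 * capacity)    # two tables, side by side

    def _pos(self, key):
        return (hash(("a", key)) % self.cap,
                self.cap + hash(("b", key)) % self.cap)

    def __getitem__(self, key):                 # at most two probes
        for p in self._pos(key):
            e = self.slots[p]
            if e is not None and e[0] == key:
                return e[1]
        return 0.0                              # sparse: absent means zero

    def __setitem__(self, key, value):
        for p in self._pos(key):                # update in place if present
            e = self.slots[p]
            if e is not None and e[0] == key:
                self.slots[p] = (key, value)
                return
        item, p = (key, value), self._pos(key)[0]
        for _ in range(self.max_kicks):         # displace occupants ('kicks')
            item, self.slots[p] = self.slots[p], item
            if item is None:
                return
            a, b = self._pos(item[0])
            p = b if p == a else a              # send evictee to its other slot
        raise RuntimeError("table full; rebuild with larger capacity")
```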


arXiv: Distributed, Parallel, and Cluster Computing | 2015

MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems

Tianqi Chen; Mu Li; Yutian Li; Min Lin; Naiyan Wang; Minjie Wang; Tianjun Xiao; Bing Xu; Chiyuan Zhang; Zheng Zhang

Collaboration


Dive into Mu Li's collaborations.

Top Co-Authors

Bao-Liang Lu (Shanghai Jiao Tong University)
James Tin-Yau Kwok (Hong Kong University of Science and Technology)
David G. Andersen (Carnegie Mellon University)
Adams Wei Yu (Carnegie Mellon University)
Suvrit Sra (Massachusetts Institute of Technology)
Tianqi Chen (University of Washington)
Aaron Q. Li (Carnegie Mellon University)