Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Philip M. Long is active.

Publication


Featured research published by Philip M. Long.


international conference on machine learning | 2008

Random classification noise defeats all convex potential boosters

Philip M. Long; Rocco A. Servedio

A broad class of boosting algorithms can be interpreted as performing coordinate-wise gradient descent to minimize some potential function of the margins of a data set. This class includes AdaBoost, LogitBoost, and other widely used and well-studied boosters. In this paper we show that for a broad class of convex potential functions, any such boosting algorithm is highly susceptible to random classification noise. We do this by showing that for any such booster and any nonzero random classification noise rate η, there is a simple data set of examples which is efficiently learnable by such a booster if there is no noise, but which cannot be learned to accuracy better than 1/2 if there is random classification noise at rate η. This negative result is in contrast with known branching program based boosters which do not fall into the convex potential function framework and which can provably learn to high accuracy in the presence of random classification noise.
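This coordinate-wise gradient-descent view can be made concrete with a small sketch (hypothetical code, not from the paper): base classifiers are columns of a matrix H with H[i, j] = h_j(x_i), the potential is the exponential one associated with AdaBoost, and each boosting round takes a step along the coordinate of steepest descent.

```python
import numpy as np

def potential(alphas, H, y):
    """Exponential potential of the margins: sum_i exp(-y_i * f(x_i)),
    where f = sum_j alphas[j] * h_j and H[i, j] = h_j(x_i)."""
    margins = y * (H @ alphas)
    return np.exp(-margins).sum()

def boost_step(alphas, H, y, lr=0.5):
    """One coordinate-wise gradient-descent step on the potential:
    move only along the base classifier with the steepest descent."""
    margins = y * (H @ alphas)
    w = np.exp(-margins)                         # per-example weights
    grad = -(H * (y * w)[:, None]).sum(axis=0)   # dPhi/dalpha_j for each j
    j = np.argmin(grad)                          # steepest-descent coordinate
    alphas = alphas.copy()
    alphas[j] -= lr * grad[j]
    return alphas
```

Swapping in a different convex potential (e.g. the logistic one behind LogitBoost) changes only `potential` and the weight formula; the paper's point is that no such convex choice escapes the noise-sensitivity result.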


international colloquium on automata languages and programming | 2009

Learning Halfspaces with Malicious Noise

Adam R. Klivans; Philip M. Long; Rocco A. Servedio

We give new algorithms for learning halfspaces in the challenging malicious noise model, where an adversary may corrupt both the labels and the underlying distribution of examples. Our algorithms can tolerate malicious noise rates exponentially larger than previous work in terms of the dependence on the dimension n, and succeed for the fairly broad class of all isotropic log-concave distributions.

We give poly(n, 1/ε)-time algorithms for solving the following problems to accuracy ε:

Learning origin-centered halfspaces in R^n with respect to the uniform distribution on the unit ball with malicious noise rate η = Ω(ε²/log(n/ε)). (The best previous result was Ω(ε/(n log(n/ε))^{1/4}).)

Learning origin-centered halfspaces with respect to any isotropic log-concave distribution on R^n with malicious noise rate η = Ω(ε³/log(n/ε)). This is the first efficient algorithm for learning under isotropic log-concave distributions in the presence of malicious noise.

We also give a poly(n, 1/ε)-time algorithm for learning origin-centered halfspaces under any isotropic log-concave distribution on R^n in the presence of adversarial label noise at rate η = Ω(ε³/log(1/ε)). In the adversarial label noise setting (or agnostic model), labels can be noisy, but not example points themselves. Previous results could handle η = Ω(ε) but had running time exponential in an unspecified function of 1/ε.

Our analysis crucially exploits both concentration and anti-concentration properties of isotropic log-concave distributions. Our algorithms combine an iterative outlier removal procedure using Principal Component Analysis together with smooth boosting.
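The outlier-removal ingredient can be illustrated with a minimal sketch (illustrative code, not the paper's procedure; the threshold and the single round are assumptions): points whose squared projection onto the top principal direction is far larger than average are discarded, since a malicious adversary must create such high-variance directions to move the learned halfspace much under an isotropic distribution.

```python
import numpy as np

def remove_outliers(X, thresh=10.0):
    """One round of PCA-style outlier removal (a sketch): drop points
    whose squared projection onto the top variance direction exceeds
    `thresh` times the average squared projection."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    v = Vt[0]                               # top principal direction
    proj2 = (Xc @ v) ** 2
    keep = proj2 <= thresh * proj2.mean()   # clean points project small
    return X[keep]
```

In the paper this filtering is iterated and combined with smooth boosting; the sketch only shows why a planted far-away corruption is easy to detect while typical inliers survive.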


BMC Infectious Diseases | 2004

Mutational dynamics of the SARS coronavirus in cell culture and human populations isolated in 2003

Vinsensius B. Vega; Yijun Ruan; Jianjun Liu; Wah Heng Lee; Chia Lin Wei; Su Yun Se-Thoe; Kin Fai Tang; Tao Zhang; Prasanna R. Kolatkar; Eng Eong Ooi; Ai Ee Ling; Lawrence W. Stanton; Philip M. Long; Edison T. Liu

Background: The SARS coronavirus is the etiologic agent of the epidemic of Severe Acute Respiratory Syndrome. The recent emergence of this new pathogen, the careful tracing of its transmission patterns, and its ability to propagate in culture allow the exploration of the mutational dynamics of the SARS-CoV in human populations.

Methods: We sequenced complete SARS-CoV genomes taken from primary human tissues (SIN3408, SIN3725V, SIN3765V), cultured isolates (SIN848, SIN846, SIN842, SIN845, SIN847, SIN849, SIN850, SIN852, SIN3408L), and five consecutive Vero cell passages (SIN2774_P1, SIN2774_P2, SIN2774_P3, SIN2774_P4, SIN2774_P5) arising from the SIN2774 isolate. These represented individual patient samples, serial in vitro passages in cell culture, and paired human and cell culture isolates. Employing a refined mutation filtering scheme and a constant mutation rate model, we estimated the mutation rates and calculated the possible date of emergence. Phylogenetic analysis was used to uncover molecular relationships between the isolates.

Results: Close examination of the whole-genome sequences of 54 SARS-CoV isolates identified before 14 October 2003, including 22 from patients in Singapore, revealed the mutations engendered during human-to-Vero and Vero-to-human transmission as well as in multiple Vero cell passages, allowing us to refine our analysis of human-to-human transmission. Though co-infection by different quasispecies in individual tissue samples is observed, the in vitro mutation rate of the SARS-CoV in Vero cell passage is negligible. The in vivo mutation rate, however, is consistent with estimates for other RNA viruses at approximately 5.7 × 10^-6 nucleotide substitutions per site per day (0.17 mutations per genome per day), or two mutations per human passage (adjusted R^2 = 0.4014). Using the immediate Hotel M contact isolates as roots, we observed that the SARS epidemic has generated four major genetic groups that are geographically associated: two Singapore isolates, one Taiwan isolate, and one North China isolate which appears most closely related to the putative SARS-CoV isolated from a palm civet. Non-synonymous mutations are centered in non-essential ORFs, especially in structural and antigenic genes such as the S and M proteins, but these mutations did not distinguish the geographical groupings. However, no non-synonymous mutations were found in the 3CLpro and polymerase genes.

Conclusions: Our results show that the SARS-CoV is well adapted to growth in culture and did not appear to undergo specific selection in human populations. We further estimated that the putative origin of the SARS epidemic was in late October 2002, which is consistent with a recent estimate using cases from China. The greater sequence divergence in the structural and antigenic proteins and consistent deletions in the 3'-most portion of the viral genome suggest that certain selection pressures are interacting with the functional nature of these validated and putative ORFs.
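The constant mutation rate model amounts to a linear fit of mutation counts against elapsed time; a minimal sketch (with made-up numbers, not the study's data):

```python
import numpy as np

def fit_mutation_rate(days, mutations):
    """Least-squares fit of a constant-rate model:
    mutations ≈ rate * days + intercept.
    Returns the per-genome-per-day rate and the intercept."""
    A = np.vstack([days, np.ones_like(days)]).T
    (rate, intercept), *_ = np.linalg.lstsq(A, mutations, rcond=None)
    return rate, intercept
```

Dividing the fitted per-genome rate by the genome length would give the per-site rate reported in the abstract's units.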


Journal of Computer and System Sciences | 2009

Using the doubling dimension to analyze the generalization of learning algorithms

Nader H. Bshouty; Yi Li; Philip M. Long

Given a set F of classifiers and a probability distribution over their domain, one can define a metric by taking the distance between a pair of classifiers to be the probability that they classify a random item differently. We prove bounds on the sample complexity of PAC learning in terms of the doubling dimension of this metric. These bounds imply known bounds on the sample complexity of learning halfspaces with respect to the uniform distribution that are optimal up to a constant factor. We then prove a bound that holds for any algorithm that outputs a classifier with zero error whenever this is possible; this bound is in terms of the maximum of the doubling dimension and the VC-dimension of F and strengthens the best known bound in terms of the VC-dimension alone. Finally, we show that there is no bound on the doubling dimension of halfspaces in R^n in terms of n that holds independently of the domain distribution. This implies that there is no such bound in terms of the VC-dimension of F (in contrast with the metric dimension).
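The metric in question is easy to state empirically; a minimal sketch (hypothetical code, with the true distribution replaced by a finite sample X):

```python
import numpy as np

def disagreement(f, g, X):
    """Empirical version of the paper's metric: the fraction of sample
    points that the two classifiers label differently (an estimate of
    the probability of disagreement under the true distribution)."""
    return float(np.mean(f(X) != g(X)))
```

The doubling dimension is then a property of the metric space this distance induces on F, which is why it can vary with the domain distribution even when the VC-dimension does not.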


conference on learning theory | 2006

Online multitask learning

Ofer Dekel; Philip M. Long; Yoram Singer

We study the problem of online learning of multiple tasks in parallel. On each online round, the algorithm receives an instance and makes a prediction for each one of the parallel tasks. We consider the case where these tasks all contribute toward a common goal. We capture the relationship between the tasks by using a single global loss function to evaluate the quality of the multiple predictions made on each round. Specifically, each individual prediction is associated with its own individual loss, and then these loss values are combined using a global loss function. We present several families of online algorithms which can use any absolute norm as a global loss function. We prove worst-case relative loss bounds for all of our algorithms.


symposium on the theory of computing | 2014

The power of localization for efficiently learning linear separators with noise

Pranjal Awasthi; Maria-Florina Balcan; Philip M. Long

We introduce a new approach for designing computationally efficient and noise-tolerant algorithms for learning linear separators. We consider the malicious noise model of Valiant [41, 32] and the adversarial label noise model of Kearns, Schapire, and Sellie [34]. For malicious noise, where the adversary can corrupt an η fraction of both the label part and the feature part, we provide a polynomial-time algorithm for learning linear separators in R^d under the uniform distribution with nearly information-theoretically optimal noise tolerance of η = Ω(ε), improving on the Ω(ε/d^{1/4}) noise tolerance of [31] and the Ω(ε²/log(d/ε)) of [35]. For the adversarial label noise model, where the distribution over the feature vectors is unchanged and the overall probability of a noisy label is constrained to be at most η, we give a polynomial-time algorithm for learning linear separators in R^d under the uniform distribution that can also handle a noise rate of η = Ω(ε). This improves over the results of [31], which either required runtime super-exponential in 1/ε (ours is polynomial in 1/ε) or tolerated less noise. In the case that the distribution is isotropic log-concave, we present a polynomial-time algorithm for the malicious noise model that tolerates Ω(ε/log²(1/ε)) noise, and a polynomial-time algorithm for the adversarial label noise model that also handles Ω(ε/log²(1/ε)) noise. Both of these also improve on results from [35]. In particular, in the case of malicious noise, unlike previous results, our noise tolerance has no dependence on the dimension d of the space. Our algorithms are also efficient in the active learning setting, where learning algorithms only receive the classifications of examples when they ask for them. We show that, in this model, our algorithms achieve a label complexity whose dependence on the error parameter ε is polylogarithmic (and thus exponentially better than that of any passive algorithm).
This provides the first polynomial time active learning algorithm for learning linear separators in the presence of malicious noise or adversarial label noise.
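The localization idea can be sketched as follows (a toy illustration, not the paper's procedure; the band width and the averaging refit are assumptions): labels are requested only for points falling within a small margin band of the current separator, which is where the remaining disagreement with the target concentrates.

```python
import numpy as np

def localized_round(w, X, oracle, band=0.2):
    """One localization round: query labels only for points within a
    margin band of the current unit separator w, then refit on them
    with a simple averaging step and renormalize."""
    margins = np.abs(X @ w)
    near = margins <= band
    Xq = X[near]
    yq = oracle(Xq)               # labels requested only inside the band
    if len(Xq):
        w = w + (yq[:, None] * Xq).mean(axis=0)
        w = w / np.linalg.norm(w)
    return w, int(near.sum())
```

The label-complexity savings come from `near` selecting few points per round while each round still shrinks the angle to the target separator.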


international workshop on approximation, randomization, and combinatorial optimization: algorithms and techniques | 2009

Baum's Algorithm Learns Intersections of Halfspaces with Respect to Log-Concave Distributions

Adam R. Klivans; Philip M. Long; Alex K. Tang

In 1990, E. Baum gave an elegant polynomial-time algorithm for learning the intersection of two origin-centered halfspaces with respect to any symmetric distribution (i.e., any distribution D such that D(E) = D(-E)).


Information Processing Letters | 2007

Discriminative learning can succeed where generative learning fails

Philip M. Long; Rocco A. Servedio; Hans Ulrich Simon



international conference on machine learning | 2005

Unsupervised evidence integration

Philip M. Long; Vinay Varadan; Sarah R Gilman; Mark Treshock; Rocco A. Servedio



workshop on applications of computer vision | 2014

Benchmarking large-scale Fine-Grained Categorization

Anelia Angelova; Philip M. Long


Collaboration


Dive into Philip M. Long's collaborations.

Top Co-Authors

Nader H. Bshouty
Technion – Israel Institute of Technology

Adam R. Klivans
University of Texas at Austin

Pranjal Awasthi
Carnegie Mellon University