Publications


Featured research published by Scott E. Decatur.


Journal of Computational Biology | 1997

Local rules for protein folding on a triangular lattice and generalized hydrophobicity in the HP model.

Richa Agarwala; Serafim Batzoglou; Vlado Dančík; Scott E. Decatur; Sridhar Hannenhalli; Martin Farach; S. Muthukrishnan; Steven Skiena

We consider the problem of determining the three-dimensional folding of a protein given its one-dimensional amino acid sequence. We use the HP model for protein folding proposed by Dill (1985), which models a protein as a chain of amino acid residues that are either hydrophobic or polar, with hydrophobic interactions as the dominant initial driving force for protein folding. Hart and Istrail (1996a) gave approximation algorithms for folding proteins on the cubic lattice under the HP model. In this paper, we examine the choice of a lattice by considering its algorithmic and geometric implications and argue that the triangular lattice is a more reasonable choice. We present a set of folding rules for a triangular lattice and analyze the approximation ratio they achieve. In addition, we introduce a generalization of the HP model to account for residues having different levels of hydrophobicity. After describing the biological foundation for this generalization, we show that in the new model we are able to achieve constant factor approximation guarantees on the triangular lattice similar to those achieved in the standard HP model. While the structures derived from our folding rules are probably still far from biological reality, we hope that having a set of folding rules with different properties will yield more interesting folds when combined.
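For concreteness, the sketch below scores a fold under the HP-model objective: it counts hydrophobic-hydrophobic contacts between residues that are triangular-lattice neighbors but not adjacent in the chain. This is our own illustration (the axial coordinate scheme and all names are ours), not code from the paper, and it implements only the scoring objective, not the folding rules.

```python
# Residues are 'H' (hydrophobic) or 'P' (polar); a fold is a self-avoiding
# walk on the triangular lattice, given in axial coordinates, one per residue.

# The six neighbor offsets of a site in axial coordinates.
TRIANGULAR_NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def hp_contact_score(sequence, fold):
    """Count H-H contacts between residues that are lattice neighbors
    but not adjacent in the chain (the quantity HP folding maximizes)."""
    assert len(sequence) == len(fold)
    position_of = {pos: i for i, pos in enumerate(fold)}
    assert len(position_of) == len(fold), "fold must be self-avoiding"
    score = 0
    for i, (q, r) in enumerate(fold):
        if sequence[i] != 'H':
            continue
        for dq, dr in TRIANGULAR_NEIGHBORS:
            j = position_of.get((q + dq, r + dr))
            # count each contact once (j > i) and skip chain neighbors
            if j is not None and j > i + 1 and sequence[j] == 'H':
                score += 1
    return score

# Example: a short chain folded back on itself so its two H's touch.
print(hp_contact_score("HPPH", [(0, 0), (1, 0), (1, -1), (0, -1)]))  # 1
```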


conference on learning theory | 1995

Specification and simulation of statistical query algorithms for efficiency and noise tolerance

Javed A. Aslam; Scott E. Decatur

A recent innovation in computational learning theory is the statistical query (SQ) model. The advantage of specifying learning algorithms in this model is that SQ algorithms can be simulated in the probably approximately correct (PAC) model, both in the absence and in the presence of noise. However, simulations of SQ algorithms in the PAC model have non-optimal time and sample complexities. In this paper, we introduce a new method for specifying statistical query algorithms based on a type of relative error and provide simulations in the noise-free and noise-tolerant PAC models which yield more efficient algorithms. Requests for estimates of statistics in this new model take the following form: "Return an estimate of the statistic within a 1 ± μ factor, or return ⊥, promising that the statistic is less than θ." In addition to showing that this is a very natural language for specifying learning algorithms, we also show that this new specification is polynomially equivalent to standard SQ, and thus known learnability and hardness results for statistical query learning are preserved. We then give highly efficient PAC simulations of relative error SQ algorithms. We show that the learning algorithms obtained by simulating efficient relative error SQ algorithms, both in the absence of noise and in the presence of malicious noise, have roughly optimal sample complexity. We also show that the simulation of efficient relative error SQ algorithms in the presence of classification noise yields learning algorithms at least as efficient as those obtained through standard methods, and in some cases improved, roughly optimal results are achieved. The sample complexities for all of these simulations are based on the d_ν metric, which is a type of relative error metric useful for quantities which are small or even zero. We show that uniform convergence with respect to the d_ν metric yields "uniform convergence" with respect to (μ, θ) accuracy. Finally, while we show that many specific learning algorithms can be written as highly efficient relative error SQ algorithms, we also show, in fact, that all SQ algorithms can be written efficiently by proving general upper bounds on the complexity of (μ, θ) queries as a function of the accuracy parameter. As a consequence of this result, we give general upper bounds on the complexity of learning algorithms achieved through the use of relative error SQ algorithms and the simulations described above.
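To make the quoted query form concrete, here is a hedged sketch of answering one relative-error SQ request by sampling. The parameter names mu, theta, delta and the Chernoff-style constant are our assumptions; the paper's actual simulations are more refined.

```python
import math

def required_samples(mu, theta, delta):
    # Multiplicative Chernoff-style sample size: enough draws that the
    # empirical mean of a statistic of value at least theta lands within
    # a (1 +/- mu) factor with probability >= 1 - delta.  The constant 3
    # is a convenient choice for this sketch, not the paper's bound.
    return math.ceil(3 * math.log(2 / delta) / (theta * mu ** 2))

def relative_error_sq(chi, draw_example, mu, theta, delta):
    """Answer one request: return an estimate of P = E[chi], intended to be
    within a (1 +/- mu) factor of P, or None (our stand-in for the 'bottom'
    answer) as a promise that P is small (roughly below theta)."""
    m = required_samples(mu, theta, delta)
    p_hat = sum(chi(*draw_example()) for _ in range(m)) / m
    return None if p_hat < theta else p_hat
```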


conference on learning theory | 1993

Statistical queries and faulty PAC oracles

Scott E. Decatur

In this paper we study learning in the PAC model of Valiant [18] in which the example oracle used for learning may be faulty in one of two ways: either by misclassifying the example or by distorting the distribution of examples. We first consider models in which examples are misclassified. Kearns [12] recently showed that efficient learning in a new model using statistical queries is a sufficient condition for PAC learning with classification noise. We show that efficient learning with statistical queries is sufficient for learning in the PAC model with malicious error rate proportional to the required statistical query accuracy. One application of this result is a new lower bound for tolerable malicious error in learning monomials of k literals. This is the first such bound which is independent of the number of irrelevant attributes n. We also use the statistical query model to give sufficient conditions for using distribution-specific algorithms on distributions outside their prescribed domains. A corollary of this result expands the class of distributions on which we can weakly learn monotone Boolean formulae. We also consider new models of learning in which examples are not chosen according to the distribution on which the learner will be tested. We examine three variations of distribution noise and give necessary and sufficient conditions for polynomial time learning with such noise. We show containment and separations between the various models of faulty oracles. Finally, we examine hypothesis boosting algorithms in the context of learning with distribution noise, and show that Schapire's result regarding the strength of weak learnability [17] is in some sense tight in requiring the weak learner to be nearly distribution free.
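To make the classification-noise half of this setting concrete, the following is the standard correction used when simulating statistics from labels that are flipped independently with a known rate η < 1/2; the function name is ours.

```python
# If labels are flipped independently with known rate eta < 1/2, a
# hypothesis with true disagreement rate p is observed to disagree at rate
#     observed = eta + p * (1 - 2 * eta),
# so the noiseless rate can be recovered by inverting that affine map.

def corrected_error(observed_rate, eta):
    """Invert the effect of classification noise on an error estimate."""
    assert 0 <= eta < 0.5
    return (observed_rate - eta) / (1 - 2 * eta)

# e.g. with eta = 0.2, a hypothesis that disagrees with 44% of noisy labels
# has true error (0.44 - 0.2) / 0.6 = 0.4
print(corrected_error(0.44, 0.2))
```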


Information & Computation | 1998

General bounds on statistical query learning and PAC learning with noise via hypothesis boosting

Javed A. Aslam; Scott E. Decatur

We derive general bounds on the complexity of learning in the statistical query (SQ) model and in the PAC model with classification noise. We do so by considering the problem of boosting the accuracy of weak learning algorithms which fall within the SQ model. The SQ model was introduced by Kearns to provide a general framework for efficient PAC learning in the presence of classification noise. We first show a general scheme for boosting the accuracy of weak SQ learning algorithms, proving that weak SQ learning is equivalent to strong SQ learning. The boosting is efficient and is used to show our main result: the first general upper bounds on the complexity of strong SQ learning. Since all SQ algorithms can be simulated in the PAC model with classification noise, we also obtain general upper bounds on learning in the presence of classification noise for classes which can be learned in the SQ model.


conference on learning theory | 1995

On learning from noisy and incomplete examples

Scott E. Decatur; Rosario Gennaro

We investigate learnability in the PAC model when the data used for learning, attributes and labels, is either corrupted or incomplete. In order to prove our main results, we define a new complexity measure on statistical query (SQ) learning algorithms. The view of an SQ algorithm is the maximum, over all queries in the algorithm, of the number of input bits on which the query depends. We show that a restricted view SQ algorithm for a class is a general sufficient condition for learnability in both the model of attribute noise and the model of covered (or missing) attributes. We further show that since the algorithms in question are statistical, they can also simultaneously tolerate classification noise. Classes for which these results hold, and which can therefore be learned with simultaneous attribute noise and classification noise, include k-DNF, k-term-DNF by DNF representations, conjunctions with few relevant variables, and, over the uniform distribution, decision lists. These noise models are the first PAC models in which all training data, attributes and labels, may be corrupted by a random process. Previous researchers had shown that the class of k-DNF is learnable with attribute noise if the attribute noise rate is known exactly. We show that all of our attribute noise learnability results, either with or without classification noise, also hold when the exact noise rate is not known, provided that the learner instead has a polynomially good approximation of the noise rate. In addition, we show that the results also hold when there is not one single noise rate, but a distinct noise rate for each attribute. Our results for learning with random covering do not require the learner to be told even an approximation of the covering rate and in addition hold in the setting with distinct covering rates for each attribute. Finally, we give lower bounds on the number of examples required for learning in the presence of attribute noise or covering.
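The two random corruption processes discussed above are easy to state in code. The sketch below is our illustration, with hypothetical parameter names: it applies a distinct flip rate to each attribute and a classification-noise rate to the label.

```python
import random

def corrupt_example(x, label, attr_rates, eta):
    """Apply independent per-attribute noise and classification noise to
    one Boolean example (attribute bits and label in {0, 1})."""
    noisy_x = [bit ^ (random.random() < rate)
               for bit, rate in zip(x, attr_rates)]
    noisy_label = label ^ (random.random() < eta)
    return noisy_x, noisy_label

# e.g. distinct noise rates per attribute, label flipped with rate 0.25:
print(corrupt_example([1, 0, 1, 1], 1, [0.1, 0.1, 0.2, 0.0], 0.25))
```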


Information Processing Letters | 1996

On the sample complexity of noise-tolerant learning

Javed A. Aslam; Scott E. Decatur

In this paper, we further characterize the complexity of noise-tolerant learning in the PAC model. Specifically, we show a general lower bound of Ω(log(1/δ) / (ε(1 − 2η)²)) on the number of examples required for PAC learning in the presence of classification noise. Combined with a result of Simon, we effectively show that the sample complexity of PAC learning in the presence of classification noise is Ω(VC(F)/(ε(1 − 2η)²) + log(1/δ)/(ε(1 − 2η)²)). Furthermore, we demonstrate the optimality of the general lower bound by providing a noise-tolerant learning algorithm for the class of symmetric Boolean functions which uses a sample size within a constant factor of this bound. Finally, we note that our general lower bound compares favorably with various general upper bounds for PAC learning in the presence of classification noise.
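Written out as a function, the combined lower bound is a direct transcription of the displayed formula, with the constant hidden by the Ω set to 1 for illustration:

```python
import math

def noisy_sample_lower_bound(vc_dim, eps, delta, eta):
    """Omega(VC(F)/(eps*(1-2*eta)^2) + log(1/delta)/(eps*(1-2*eta)^2)),
    with the hidden constant taken to be 1."""
    denom = eps * (1 - 2 * eta) ** 2
    return (vc_dim + math.log(1 / delta)) / denom

# e.g. VC(F) = 10, eps = 0.1, delta = 0.05, eta = 0.2 needs on the order
# of a few hundred examples:
print(noisy_sample_lower_bound(10, 0.1, 0.05, 0.2))  # ~361
```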


SIAM Journal on Computing | 1999

Computational Sample Complexity

Scott E. Decatur; Oded Goldreich; Dana Ron

In a variety of PAC learning models, a trade-off between time and information seems to exist: with unlimited time, a small amount of information suffices, but with time restrictions, more information sometimes seems to be required. In addition, it has long been known that there are concept classes that can be learned in the absence of computational restrictions, but (under standard cryptographic assumptions) cannot be learned in polynomial time (regardless of sample size). Yet, these results do not answer the question of whether there are classes for which learning from a small set of examples is computationally infeasible, but becomes feasible when the learner has access to (polynomially) more examples. To address this question, we introduce a new measure of learning complexity called computational sample complexity that represents the number of examples sufficient for polynomial time learning with respect to a fixed distribution. We then show concept classes that (under similar cryptographic assumptions) possess arbitrarily sized gaps between their standard (information-theoretic) sample complexity and their computational sample complexity. We also demonstrate such gaps for learning from membership queries and learning from noisy examples.


international conference on artificial intelligence and statistics | 1996

Learning in Hybrid Noise Environments Using Statistical Queries

Scott E. Decatur

We consider formal models of learning from noisy data. Specifically, we focus on learning in the probably approximately correct (PAC) model as defined by Valiant. Two of the most widely studied models of noise in this setting have been classification noise and malicious errors. However, a more realistic model combining the two types of noise has not been formalized. We define a learning environment based on a natural combination of these two noise models. We first show that hypothesis testing is possible in this model. We next describe a simple technique for learning in this model, and then describe a more powerful technique based on statistical query learning. We show that the noise tolerance of this improved technique is roughly optimal with respect to the desired learning accuracy and that it provides a smooth tradeoff between the tolerable amounts of the two types of noise. Finally, we show that statistical query simulation yields learning algorithms for other combinations of noise models, thus demonstrating that statistical query specification truly captures the generic fault tolerance of a learning algorithm.
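One natural way to realize the combined environment described above is an example oracle that first lets an adversary act with probability beta and otherwise applies random label noise with rate eta. This is our sketch of such a combination, not the paper's exact definition:

```python
import random

def hybrid_oracle(draw_example, adversary, beta, eta):
    """Return one (x, label) pair: with probability beta the adversary
    supplies an arbitrary pair (malicious error); otherwise a correct
    example is drawn and its label is flipped with probability eta
    (classification noise)."""
    if random.random() < beta:
        return adversary()
    x, label = draw_example()
    if random.random() < eta:
        label ^= 1
    return x, label

# e.g. a fair coin over {0, 1} with target label = x, and an adversary that
# always returns the misleading pair (0, 1):
print(hybrid_oracle(lambda: (random.choice([0, 1]),) * 2,
                    lambda: (0, 1), 0.05, 0.1))
```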


research in computational molecular biology | 1997

Local rules for protein folding on a triangular lattice and generalized hydrophobicity in the HP model

Richa Agarwala; Serafim Batzoglou; Vlado Dančík; Scott E. Decatur; Martin Farach; Sridhar Hannenhalli; S. Muthukrishnan; Steven Skiena

Conference version of the Journal of Computational Biology (1997) article of the same title; the abstract is identical to the one above.


conference on learning theory | 1995

On the learnability of Z_N-DNF formulas (extended abstract)

Nader H. Bshouty; Zhixiang Chen; Scott E. Decatur; Steven Homer

Although many learning problems can be reduced to learning Boolean functions, in many cases a more efficient learning algorithm can be derived when the problem is considered over a larger domain. In this paper we give a natural generalization of DNF formulas: Z_N-DNF formulas over the ring of integers modulo N. We first show, using elementary number theory, that for almost all larger rings the learnability of Z_N-DNF formulas is easy. This shows that the difficulty of learning Boolean DNF formulas lies in the fact that the domain is small. We then establish upper and lower bounds on the number of equivalence queries required for the exact learning of Z_N-terms. We show that α(N)n + 1 ≤ (log N)n + 1 equivalence queries are sufficient and γ(N)n equivalence queries are necessary, where α(N) is the sum of the exponents in the prime decomposition of N, and γ(N) is the sum of the logarithms of the exponents in the prime decomposition of N. We also demonstrate how the additional power of membership queries allows improved learning in two different ways: (1) more efficient learning for some classes learnable with equivalence queries […]
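The quantities α(N) and γ(N) in the query bounds are simple to compute from the prime factorization; these helpers (our code, following the definitions quoted in the abstract) make the bounds concrete:

```python
import math

def prime_exponents(n):
    """Exponents e_i in the prime factorization of n."""
    exps, p = [], 2
    while p * p <= n:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if e:
            exps.append(e)
        p += 1
    if n > 1:
        exps.append(1)
    return exps

def alpha(n):
    """Sum of the exponents in the prime decomposition of n."""
    return sum(prime_exponents(n))

def gamma(n):
    """Sum of the logarithms (base 2 here) of those exponents."""
    return sum(math.log2(e) for e in prime_exponents(n))

# 360 = 2^3 * 3^2 * 5, so alpha(360) = 3 + 2 + 1 = 6
print(alpha(360), gamma(360))
```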

Collaboration


Scott E. Decatur's top co-authors:

Oded Goldreich

Weizmann Institute of Science

Richa Agarwala

National Institutes of Health

Vlado Dančík

University of Southern California
