Stephen Kwek
University of Texas at San Antonio
Publications
Featured research published by Stephen Kwek.
european conference on machine learning | 2004
Rehan Akbani; Stephen Kwek; Nathalie Japkowicz
Support Vector Machines (SVM) have been extensively studied and have shown remarkable success in many applications. However, the success of SVM is very limited when it is applied to the problem of learning from imbalanced datasets, in which negative instances heavily outnumber the positive instances (e.g. in gene profiling and detecting credit card fraud). This paper discusses the factors behind this failure and explains why the common strategy of undersampling the training data may not be the best choice for SVM. We then propose an algorithm for overcoming these problems, based on a variant of the SMOTE algorithm by Chawla et al. combined with Veropoulos et al.'s different error costs algorithm. We compare the performance of our algorithm against these two algorithms, along with undersampling and regular SVM, and show that our algorithm outperforms all of them.
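The combination the abstract describes, synthetic minority oversampling in the style of SMOTE plus class-dependent error costs, can be sketched as below. This is a minimal illustration rather than the authors' exact algorithm: the hand-rolled `smote` helper (which assumes at least k+1 minority points), the RBF kernel, and the `oversample_ratio` parameter are assumptions, and the class-dependent costs come from scikit-learn's `class_weight` option rather than the original Veropoulos et al. formulation.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating between a
    sampled minority point and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)            # idx[:, 0] is the point itself
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_min))
        j = idx[i, rng.integers(1, k + 1)]   # a random true neighbor
        synthetic.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.array(synthetic)

def fit_imbalanced_svm(X, y, minority_label=1, oversample_ratio=1.0):
    """Oversample the minority class, then train an SVM whose error
    costs are weighted inversely to class frequency."""
    X_min = X[y == minority_label]
    X_syn = smote(X_min, int(oversample_ratio * len(X_min)))
    X_aug = np.vstack([X, X_syn])
    y_aug = np.concatenate([y, np.full(len(X_syn), minority_label)])
    # class_weight="balanced" imposes a higher misclassification cost
    # on the (still) rarer class, the "different error costs" idea.
    return SVC(kernel="rbf", class_weight="balanced").fit(X_aug, y_aug)
```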
Journal of Computer and System Sciences | 2001
Sally A. Goldman; Stephen Kwek; Stephen D. Scott
P. W. Goldberg, S. A. Goldman, and S. D. Scott (Mach. Learning 25, No. 1 (1996), 51–70) discussed how the problem of recognizing a landmark from a one-dimensional visual image might be mapped to that of learning a one-dimensional geometric pattern and gave a PAC algorithm to learn that class. In this paper, we present an efficient online agnostic learning algorithm for learning the class of constant-dimensional geometric patterns. Our algorithm can tolerate both classification and attribute noise. By working in higher dimensional spaces we can represent more features from the visual image in the geometric pattern. Our mapping of the data to a geometric pattern and, hence, our learning algorithm are applicable to any data representable as a constant-dimensional array of values, e.g., sonar data, temporal difference information, amplitudes of a waveform, or other pattern recognition data. To our knowledge, these classes of patterns are more complex than any class of geometric patterns previously studied. Also, our results are easily adapted to learn the union of fixed-dimensional boxes from multiple-instance examples. Finally, our algorithms are tolerant of concept shift, where the target concept that labels the examples can change over time.
international conference hybrid intelligent systems | 2005
Amitava Karmaker; Stephen Kwek
Ensemble methods are known to improve prediction accuracy over the base learning algorithm, and AdaBoost is well recognized for this in its class. However, it is susceptible to overfitting training instances corrupted by class-label noise. This paper proposes a modification to AdaBoost that is more tolerant to class-label noise, which further enhances its ability to boost prediction accuracy. In particular, we observe that in AdaBoost the weight hike of noisy examples can be constrained by careful application of a cut-off on their weights. The effectiveness of our algorithm is demonstrated empirically on artificially generated data and corroborated on a number of data sets from the UCI repository (Blake and Merz, 1998). In both experimental settings, the results affirm the efficacy of our approach. Finally, we investigate some significant characteristics of our technique in noisy environments.
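The weight cut-off idea can be sketched on top of standard binary AdaBoost in a few lines. The cap value and the exact point at which it is applied are illustrative assumptions; the paper's precise rule may differ.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_capped(X, y, n_rounds=50, weight_cap=0.1):
    """Binary AdaBoost (labels must be in {-1, +1}) with a cut-off on
    example weights so suspected label noise cannot dominate later rounds."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y]) / np.sum(w)
        if err == 0 or err >= 0.5:
            break
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)          # standard exponential reweighting
        w = np.minimum(w / w.sum(), weight_cap)  # assumed cut-off rule on weights
        w /= w.sum()                             # renormalize after capping
        stumps.append(stump)
        alphas.append(alpha)
    def predict(Xq):
        score = sum(a * s.predict(Xq) for a, s in zip(alphas, stumps))
        return np.sign(score)
    return predict
```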
Neural Computing and Applications | 2007
Kihoon Yoon; Stephen Kwek
Learning from imbalanced data occurs frequently in many machine learning applications. One positive example to thousands of negative instances is common in scientific applications. Unfortunately, traditional machine learning techniques often treat rare instances as noise. One popular approach to this difficulty is to resample the training data; however, this results in high false-positive predictions. Hence, we propose preprocessing the training data by partitioning it into clusters. This greatly reduces the imbalance between minority and majority instances in each cluster. For moderate imbalance ratios, our technique gives better prediction accuracy than other resampling methods. For extreme imbalance ratios, it serves as a good filter that reduces the amount of imbalance so that traditional classification techniques can be deployed. More importantly, we have successfully applied our technique to splice site prediction and the protein subcellular localization problem, with significant improvements over previous predictors.
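One way to realize the clustering idea, as an illustrative sketch rather than the authors' exact procedure, is to partition the majority class with k-means and pair each cluster with the full minority class, so every sub-problem is far less imbalanced. The choice of k-means, decision trees, and simple probability averaging are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def cluster_partition_ensemble(X, y, minority_label=1, n_clusters=10):
    """Partition the majority class with k-means and pair each cluster
    with all minority instances, so each sub-problem is far less
    imbalanced; average the per-cluster classifiers at prediction time."""
    X_min = X[y == minority_label]
    X_maj = X[y != minority_label]
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X_maj)
    models = []
    for c in range(n_clusters):
        X_c = X_maj[labels == c]
        if len(X_c) == 0:
            continue                   # k-means can yield empty clusters
        X_sub = np.vstack([X_c, X_min])
        y_sub = np.concatenate([np.zeros(len(X_c)), np.ones(len(X_min))])
        models.append(DecisionTreeClassifier(max_depth=5).fit(X_sub, y_sub))
    def minority_score(Xq):
        # averaged probability that a query point belongs to the minority class
        return np.mean([m.predict_proba(Xq)[:, 1] for m in models], axis=0)
    return minority_score
```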
international conference hybrid intelligent systems | 2005
Kihoon Yoon; Stephen Kwek
Learning from imbalanced data occurs very frequently in functional genomics applications. One positive example to thousands of negative instances is common in scientific applications. Unfortunately, traditional machine learning treats the extremely rare instances as noise. The standard approach to this difficulty is to balance the training data by resampling; however, this results in high false-positive predictions. Hence, we propose preprocessing the majority instances by partitioning them into clusters. This greatly reduces the ambiguity between minority instances and the instances in each cluster. For a moderately high imbalance ratio and low in-class complexity, our technique gives better prediction accuracy than undersampling. For an extreme imbalance ratio, as in the splice site prediction problem, we demonstrate that this technique serves as a good filter with almost perfect recall: it reduces the amount of imbalance so that traditional classification techniques can be deployed, yielding significant improvements over previous predictors. We also show that the technique works for the subcellular localization and post-translational modification site prediction problems.
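The filter-then-classify cascade can be illustrated with a deliberately simple first stage. The sketch below substitutes a nearest-neighbor radius around the minority training points for the paper's cluster-based preprocessing: the radius is chosen so that nearly all positives pass (the high-recall property), and a standard classifier is trained only on the surviving, far less imbalanced data. The radius quantile, the random forest, and the 0/1 label encoding are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.ensemble import RandomForestClassifier

def build_filter_cascade(X, y, minority_label=1, recall_quantile=0.99):
    """Stage 1: keep only points whose distance to the nearest minority
    training example is within a radius chosen to pass ~99% of positives.
    Stage 2: train a standard classifier on the surviving data."""
    X_min = X[y == minority_label]
    nn = NearestNeighbors(n_neighbors=2).fit(X_min)
    d_pos, _ = nn.kneighbors(X_min)      # column 1: nearest *other* positive
    radius = np.quantile(d_pos[:, 1], recall_quantile)
    d_all, _ = nn.kneighbors(X, n_neighbors=1)
    survivors = d_all[:, 0] <= radius
    clf = RandomForestClassifier(n_estimators=100).fit(X[survivors], y[survivors])
    def predict(Xq):
        d, _ = nn.kneighbors(Xq, n_neighbors=1)
        passed = d[:, 0] <= radius
        out = np.full(len(Xq), 1 - minority_label)  # filtered out -> majority
        if passed.any():
            out[passed] = clf.predict(Xq[passed])
        return out
    return predict
```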
conference on learning theory | 1998
Stephen Kwek; Leonard Pitt
A randomized learning algorithm POLLY is presented that efficiently learns intersections of s halfspaces in n dimensions, in time polynomial in both s and n. The learning protocol is the PAC (probably approximately correct) model of Valiant, augmented with membership queries. In particular, POLLY receives a set S of m = poly(n, s, 1/ε, 1/δ) randomly generated points from an arbitrary distribution over the unit hypercube, and is told exactly which points are contained in, and which points are not contained in, the convex polyhedron P defined by the halfspaces. POLLY may also obtain the same information about points of its own choosing. It is shown that after poly(n, s, 1/ε, 1/δ, log(1/d)) time, the probability that POLLY fails to output a collection of s halfspaces with classification error at most ε is at most δ. Here, d is the minimum distance between the boundary of the target and those examples in S that are not lying on the boundary. The parameter log(1/d) can be bounded by the number of bits needed to encode the coefficients of the bounding hyperplanes and the coordinates of the sampled examples S. Moreover, POLLY can be extended to learn unions of k disjoint polyhedra, with each polyhedron having at most s facets, in time poly(n, k, s, 1/ε, 1/δ, log(1/d), 1/γ), where γ is the minimum distance between any two distinct polyhedra.
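The membership-query side of the protocol is easy to make concrete: the learner may ask, for any point of its choosing, whether it lies in the target polyhedron P. Below is a minimal oracle for P = {x : Ax ≤ b}, with an illustrative choice of A and b; the POLLY algorithm itself is not reproduced here.

```python
import numpy as np

def make_membership_oracle(A, b):
    """Membership oracle for the polyhedron P = {x : A x <= b} defined
    by s halfspaces: answers the queries a learner like POLLY may pose."""
    def member(x):
        return bool(np.all(A @ x <= b))
    return member

# Example target: {x in R^3 : x_i >= 0 for all i, and sum(x) <= 1}
A = np.vstack([-np.eye(3), np.ones((1, 3))])
b = np.concatenate([np.zeros(3), [1.0]])
member = make_membership_oracle(A, b)
print(member(np.array([0.2, 0.3, 0.1])))  # True: inside P
print(member(np.array([0.6, 0.6, 0.1])))  # False: violates sum(x) <= 1
```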
workshop on algorithms and data structures | 1997
Stephen Kwek
We present a simple depth-first search strategy for exploring (constructing) an unknown strongly connected graph G with m edges and n vertices by traversing at most min(mn, dn² + m) edges. Here, d is the minimum number of edges that must be added to G to make it Eulerian. This parameter d is known as the deficiency of a graph and was introduced by Kutten [Kut88]. It was conjectured that graphs with high deficiency are hard to explore. Deng and Papadimitriou [DP90] provided evidence that the conjecture may be true by exhibiting a family of graphs where the robot can be forced to traverse Ω(d²m) edges in the worst case. Since then, there has been some interest in determining whether a graph with deficiency d can be explored by traversing O(poly(d)m) edges. Our algorithm achieves such a bound when the graph is dense, say m = Ω(n²).
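A stripped-down version of the greedy exploration idea, follow an unexplored out-edge when one exists, otherwise relocate along known edges to the nearest vertex that still has one, can be written as follows. The `out_degree` and `traverse` callables are hypothetical oracle interfaces standing in for the robot's local view, and this sketch makes no attempt at the bookkeeping behind the min(mn, dn² + m) bound.

```python
from collections import deque

def explore(start, out_degree, traverse):
    """Explore an unknown strongly connected digraph. traverse(v, i)
    moves the robot along the i-th out-edge of v and returns its head;
    out_degree(v) is the number of out-edges visible at v."""
    known = {start: [None] * out_degree(start)}  # v -> heads by port (None = unexplored)
    v, traversed = start, 0
    while True:
        unexplored = [i for i, h in enumerate(known[v]) if h is None]
        if unexplored:
            i = unexplored[0]
            w = traverse(v, i)
            traversed += 1
            known[v][i] = w
            if w not in known:
                known[w] = [None] * out_degree(w)
            v = w
            continue
        # BFS over already-discovered edges to a vertex with unexplored ports.
        parent, queue, target = {v: None}, deque([v]), None
        while queue:
            u = queue.popleft()
            if any(h is None for h in known[u]):
                target = u
                break
            for h in known[u]:
                if h not in parent:
                    parent[h] = u
                    queue.append(h)
        if target is None:
            return known, traversed      # every discovered edge is explored
        path = []
        while target is not None:        # reconstruct relocation path v -> target
            path.append(target)
            target = parent[target]
        path.reverse()
        for nxt in path[1:]:             # walk it, counting relocation traversals
            traverse(v, known[v].index(nxt))
            traversed += 1
            v = nxt
```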
algorithmic learning theory | 1999
Sally A. Goldman; Stephen Kwek
We present efficient on-line algorithms for learning unions of a constant number of tree patterns, unions of a constant number of one-variable pattern languages, and unions of a constant number of pattern languages with fixed length substitutions. By fixed length substitutions we mean that each occurrence of variable xi must be substituted by terminal strings of fixed length l(xi). We prove that if arbitrary unions of pattern languages with fixed length substitutions can be learned efficiently then DNFs are efficiently learnable in the mistake bound model. Since we use a reduction to Winnow, our algorithms are robust against attribute noise. Furthermore, they can be modified to handle concept drift. Also, our approach is quite general and may be applicable to learning other pattern-related classes. For example, we could learn a more general pattern language class in which a penalty (i.e. weight) is assigned to each violation of the rule that a terminal symbol cannot be changed or that a pair of variable symbols of the same variable must be substituted by the same terminal string. An instance is positive iff the penalty incurred for violating these rules is below a given tolerable threshold.
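Since the paper's algorithms work by reduction to Winnow, the underlying mistake-bound learner is worth seeing concretely. The sketch below is only standard Winnow for monotone disjunctions, not the paper's encoding of pattern languages into Boolean attributes; the promotion factor and threshold are the usual textbook defaults.

```python
import numpy as np

def winnow(examples, n, alpha=2.0, threshold=None):
    """Winnow for monotone disjunctions over n Boolean attributes.
    examples: iterable of (x, label) with x a 0/1 numpy vector and
    label in {0, 1}. On a mistake, weights of active attributes are
    multiplied (promotion) or divided (demotion) by alpha."""
    theta = threshold if threshold is not None else n / 2.0
    w = np.ones(n)
    mistakes = 0
    for x, label in examples:
        pred = 1 if w @ x >= theta else 0
        if pred != label:
            mistakes += 1
            if label == 1:
                w[x == 1] *= alpha   # promotion: false negative
            else:
                w[x == 1] /= alpha   # demotion: false positive
    return w, mistakes
```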
international conference hybrid intelligent systems | 2005
Amitava Karmaker; Stephen Kwek
Data with missing attribute values are quite common in many classification problems. In this paper, we incorporate an expectation-maximization (EM) inspired approach for filling in missing values into decision tree learning, with the objective of improving classification accuracy. Each missing attribute value is iteratively filled in using a predictor constructed from the known values and the predicted values of the missing attribute values from the previous iteration. We show that our approach significantly outperforms some standard machine learning methods for handling missing values in classification tasks.
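The iterative filling scheme can be sketched as follows for numeric attributes, which is an assumption, as are the tree depth and iteration count; scikit-learn's IterativeImputer implements a closely related idea.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def iterative_impute(X, n_iters=10, max_depth=5):
    """EM-inspired imputation sketch: initialize missing entries (NaNs)
    with column means, then repeatedly re-predict each column's missing
    values from the current values of the other columns with a tree."""
    X = X.copy()
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):
        X[missing[:, j], j] = col_means[j]          # E-step-like initialization
    for _ in range(n_iters):
        for j in range(X.shape[1]):
            if not missing[:, j].any():
                continue
            other = np.delete(np.arange(X.shape[1]), j)
            tree = DecisionTreeRegressor(max_depth=max_depth)
            # fit on rows where column j was observed, using the other
            # columns (which include last iteration's filled-in values)
            tree.fit(X[~missing[:, j]][:, other], X[~missing[:, j], j])
            X[missing[:, j], j] = tree.predict(X[missing[:, j]][:, other])
    return X
```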
algorithmic learning theory | 2002
Sally A. Goldman; Stephen Kwek
We present efficient on-line algorithms for learning unions of a constant number of tree patterns, unions of a constant number of one-variable pattern languages, and unions of a constant number of pattern languages with fixed length substitutions. By fixed length substitutions we mean that each occurrence of variable xi must be substituted by terminal strings of fixed length l(xi). We prove that if arbitrary unions of pattern languages with fixed length substitutions can be learned efficiently then DNFs are efficiently learnable in the mistake bound model. Since we use a reduction to Winnow, our algorithms are robust against attribute noise. Furthermore, they can be modified to handle concept drift. Also, our approach is quite general and we give results to learn a class that generalizes pattern languages.