
Publications


Featured research published by Aynur A. Dayanik.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2006

Constructing informative prior distributions from domain knowledge in text classification

Aynur A. Dayanik; David D. Lewis; David Madigan; Vladimir Menkov; Alexander Genkin

Supervised learning approaches to text classification are often required in practice to work with small and unsystematically collected training sets. The usual alternative to supervised learning is building classifiers by hand, using a domain expert's understanding of which features of the text are related to the class of interest. This is expensive, requires a degree of sophistication about linguistics and classification, and makes it difficult to use combinations of weak predictors. We propose instead combining domain knowledge with training examples in a Bayesian framework. Domain knowledge is used to specify a prior distribution for the parameters of a logistic regression model, and labeled training data is used to produce a posterior distribution, whose mode we take as the final classifier. We show on three text categorization data sets that this approach can rescue what would otherwise be disastrously bad training situations, producing much more effective classifiers.
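
To make the Bayesian construction concrete, below is a minimal Python sketch of MAP estimation for logistic regression under a per-feature Gaussian prior. The toy data, the particular prior means and variances, and the optimizer choice are illustrative assumptions, not the paper's exact setup, which explores several ways of turning domain knowledge into a prior.

```python
# Minimal sketch (not the paper's exact setup): MAP logistic regression
# under a per-feature Gaussian prior N(prior_mean, prior_var), where the
# prior means encode domain knowledge about which features indicate the class.
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def map_logistic(X, y, prior_mean, prior_var):
    """Posterior mode of logistic-regression weights under a Gaussian prior."""
    def neg_log_posterior(w):
        p = sigmoid(X @ w)
        nll = -np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        penalty = np.sum((w - prior_mean) ** 2 / (2.0 * prior_var))
        return nll + penalty
    # Start the search at the prior mode; L-BFGS-B estimates gradients numerically.
    return minimize(neg_log_posterior, prior_mean.copy(), method="L-BFGS-B").x

# Toy data: the class is truly driven by feature 0, and a (hypothetical)
# domain expert has said feature 0 is a positive indicator, so its prior
# mean is shifted away from zero while the others stay at zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (X[:, 0] > 0).astype(float)
w = map_logistic(X, y, prior_mean=np.array([1.0, 0.0, 0.0]),
                 prior_var=np.array([1.0, 1.0, 1.0]))
print(w)  # the weight on feature 0 should dominate
```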


Artificial Intelligence | 2003

Converting numerical classification into text classification

Sofus A. Macskassy; Haym Hirsh; Arunava Banerjee; Aynur A. Dayanik

Consider a supervised learning problem in which examples contain both numerical- and text-valued features. To use traditional feature-vector-based learning methods, one could treat the presence or absence of a word as a Boolean feature and use these binary-valued features together with the numerical features. However, using a text-classification system on such data is more problematic: in the most straightforward approach, each number would be considered a distinct token and treated as a word. This paper presents an alternative approach to using text classification methods for supervised learning problems with numerical-valued features, in which the numerical features are converted into bag-of-words features, thereby making them directly usable by text classification methods. We show that even on purely numerical-valued data, text classification on the derived text-like representation outperforms the more naive numbers-as-tokens representation and, more importantly, is competitive with mature numerical classification methods such as C4.5, Ripper, and SVM. We further show that on mixed-mode data, adding numerical features using our approach can improve performance over not adding those features.
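
As an illustration of the conversion, here is a minimal Python sketch that discretizes each numeric feature into equal-width bins and emits one pseudo-word per feature value. The binning scheme and token naming are assumptions made for illustration; the paper evaluates its own discretization strategies.

```python
# Minimal sketch (illustrative binning only): turn each numeric feature value
# into a pseudo-word like "f2_bin3" so a text classifier can consume it.
import numpy as np

def numeric_to_tokens(X, n_bins=5):
    """Map each numeric feature value to a bin token via equal-width binning."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    width = np.where(hi > lo, (hi - lo) / n_bins, 1.0)  # guard constant features
    docs = []
    for row in X:
        bins = np.minimum(((row - lo) // width).astype(int), n_bins - 1)
        docs.append(" ".join(f"f{j}_bin{b}" for j, b in enumerate(bins)))
    return docs  # feed these pseudo-documents to any bag-of-words classifier

X = np.array([[0.1, 5.0], [0.9, 1.0], [0.5, 3.0]])
print(numeric_to_tokens(X))  # e.g. ['f0_bin0 f1_bin4', 'f0_bin4 f1_bin0', ...]
```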


Knowledge-Based Systems | 2010

Feature interval learning algorithms for classification

Aynur A. Dayanik

This paper presents Feature Interval Learning (FIL) algorithms, which represent multi-concept descriptions in the form of disjoint feature intervals. The FIL algorithms are batch supervised inductive learning algorithms and use feature projections of the training instances to represent induced classification knowledge. The concept description is learned separately for each feature and takes the form of a set of disjoint intervals. The class of an unseen instance is determined by weighted-majority voting over the feature predictions. The basic FIL algorithm is enhanced with adaptive interval and feature-weight schemes in order to handle noisy and irrelevant features. The algorithms are empirically evaluated on twelve data sets from the UCI repository and compared with the k-NN, k-NNFP, and NBC classification algorithms. The experiments demonstrate that the FIL algorithms are robust to irrelevant features and missing feature values, and achieve accuracy comparable to the best of the existing algorithms with significantly lower average running times.
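
The voting scheme can be illustrated with a small Python sketch. For brevity it uses a single interval per (feature, class) pair and unweighted votes, whereas the actual FIL algorithms learn sets of disjoint intervals and adaptive interval and feature weights.

```python
# Minimal sketch: one interval per (feature, class) spanning that class's
# training range, with unweighted majority voting. The actual FIL algorithms
# learn sets of disjoint intervals and adaptive interval/feature weights.
import numpy as np
from collections import defaultdict

def fit_intervals(X, y):
    """Record, for each feature and class, the [min, max] of training values."""
    intervals = defaultdict(dict)            # intervals[feature][class] = (lo, hi)
    for c in np.unique(y):
        Xc = X[y == c]
        for j in range(X.shape[1]):
            intervals[j][c] = (Xc[:, j].min(), Xc[:, j].max())
    return intervals

def predict(intervals, x, classes):
    votes = {c: 0.0 for c in classes}
    for j, v in enumerate(x):                # each feature votes independently
        for c, (lo, hi) in intervals[j].items():
            if lo <= v <= hi:
                votes[c] += 1.0              # FIL would weight this vote
    return max(votes, key=votes.get)

X = np.array([[1.0, 10.0], [1.2, 11.0], [5.0, 2.0], [5.5, 2.5]])
y = np.array([0, 0, 1, 1])
iv = fit_intervals(X, y)
print(predict(iv, np.array([1.1, 10.5]), np.unique(y)))  # -> 0
```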


Expert Systems with Applications | 2012

Learning feature-projection based classifiers

Aynur A. Dayanik

This paper aims at designing better-performing feature-projection based classification algorithms and presents two new such algorithms. These algorithms are batch supervised learning algorithms and represent induced classification knowledge as feature intervals. In both algorithms, each feature participates in the classification by giving real-valued votes to the classes. The prediction for an unseen example is the class receiving the highest vote. The first algorithm, OFP.MC, learns, on each feature, pairwise disjoint intervals that minimize the feature classification error. The second algorithm, GFP.MC, constructs feature intervals by greedily improving the feature classification error. The new algorithms are empirically evaluated on twenty datasets from the UCI repository and compared with existing feature-projection based classification algorithms (FIL.IF, VFI5, CFP, k-NNFP, and NBC). The experiments demonstrate that the OFP.MC algorithm outperforms the other feature-projection based classification algorithms. The GFP.MC algorithm is slightly inferior to OFP.MC but, for datasets with a large number of instances, reduces its space requirement. Unlike the other feature-projection based classification algorithms considered here, the new algorithms are insensitive to boundary noise.
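
A rough Python sketch of real-valued feature voting is given below: it bins each feature into fixed equal-width intervals and lets each feature vote with the class distribution of the interval a test value falls into. This is only an analogue under assumed fixed bins; OFP.MC and GFP.MC instead learn the interval boundaries by minimizing, or greedily improving, the per-feature classification error.

```python
# Rough analogue only: fixed equal-width intervals per feature, where each
# feature votes with the class distribution of the interval the test value
# falls into. OFP.MC/GFP.MC instead learn interval boundaries that minimize
# (or greedily improve) the per-feature classification error.
import numpy as np

def fit(X, y, n_classes, n_bins=4):
    lo, hi = X.min(axis=0), X.max(axis=0)
    edges = [np.linspace(lo[j], hi[j], n_bins + 1) for j in range(X.shape[1])]
    counts = np.zeros((X.shape[1], n_bins, n_classes))
    for j in range(X.shape[1]):
        b = np.clip(np.digitize(X[:, j], edges[j][1:-1]), 0, n_bins - 1)
        for bi, c in zip(b, y):
            counts[j, bi, c] += 1            # class histogram per interval
    return edges, counts

def predict(edges, counts, x):
    votes = np.zeros(counts.shape[2])
    for j, v in enumerate(x):
        bi = int(np.clip(np.digitize(v, edges[j][1:-1]), 0, counts.shape[1] - 1))
        total = counts[j, bi].sum()
        if total > 0:
            votes += counts[j, bi] / total   # real-valued vote for each class
    return int(np.argmax(votes))

X = np.array([[0.0, 1.0], [0.2, 0.9], [1.0, 0.0], [0.8, 0.1]])
y = np.array([0, 0, 1, 1])
edges, counts = fit(X, y, n_classes=2)
print(predict(edges, counts, np.array([0.1, 0.95])))  # -> 0
```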


Archive | 1999

EmailValet: Learning User Preferences for Wireless Email

Sofus A. Macskassy; Aynur A. Dayanik; Haym Hirsh


Archive | 2010

Automatic generation of rewrite rules for URLs

Craig G. Nevill-Manning; Chade-Meng Tan; Aynur A. Dayanik; Peter Norvig


Text Retrieval Conference | 2004

DIMACS at the TREC 2004 Genomics Track.

Aynur A. Dayanik; Dmitriy Fradkin; Alexander Genkin; Paul B. Kantor; David Madigan; David D. Lewis; Vladimir Menkov


Archive | 2000

Information Valets for Intelligent Information Access

Sofus A. Macskassy; Aynur A. Dayanik; Haym Hirsh


International Joint Conference on Artificial Intelligence | 2001

Using text classifiers for numerical classification

Sofus A. Macskassy; Haym Hirsh; Arunava Banerjee; Aynur A. Dayanik


Text Retrieval Conference | 2005

DIMACS at the TREC 2005 Genomics Track

Aynur A. Dayanik; Alexander Genkin; Paul B. Kantor; David D. Lewis; David Madigan

Collaboration


Dive into Aynur A. Dayanik's collaborations.

Top Co-Authors

Casimir A. Kulikowski

University of Medicine and Dentistry of New Jersey
