Harris Drucker | Researchain

Publication

Featured research published by Harris Drucker.

IEEE Transactions on Neural Networks | 1999

Support vector machines for spam categorization

Harris Drucker; Donghui Wu; Vladimir Vapnik

We study the use of support vector machines (SVMs) in classifying e-mail as spam or nonspam by comparing them to three other classification algorithms: Ripper, Rocchio, and boosting decision trees. These four algorithms were tested on two different data sets: one data set where the number of features was constrained to the 1000 best features and another data set where the dimensionality was over 7000. SVMs performed best when using binary features. For both data sets, boosting trees and SVMs had acceptable test performance in terms of accuracy and speed. However, SVMs had significantly less training time.
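The distinction between binary and count-based features mentioned in this abstract can be illustrated with a minimal sketch. The tokenizer, the four-word vocabulary, and the sample message below are illustrative assumptions, not the feature set used in the paper.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def tf_features(text, vocab):
    """Term-frequency vector: how many times each vocabulary word occurs."""
    tokens = tokenize(text)
    return [tokens.count(word) for word in vocab]

def binary_features(text, vocab):
    """Binary vector: 1 if the vocabulary word occurs at all, else 0."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocab]

# Hypothetical vocabulary and message, chosen only for illustration.
vocab = ["free", "money", "meeting", "report"]
message = "Free money! Claim your FREE money now."

tf_vec = tf_features(message, vocab)       # [2, 2, 0, 0]
bin_vec = binary_features(message, vocab)  # [1, 1, 0, 0]
```

Either vector could then be passed to a classifier; the binary form discards repetition counts and keeps only word presence.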

Information Processing and Management | 2002

Support vector machines: relevance feedback and information retrieval

Harris Drucker; Behzad Shahraray; David C. Gibbon

We compare support vector machines (SVMs) to the Rocchio, Ide regular, and Ide dec-hi algorithms in information retrieval (IR) of text documents using relevance feedback. We assume that a preliminary search finds a set of documents that the user marks as relevant or not, after which feedback iterations commence. Particular attention is paid to IR searches where the number of relevant documents in the database is low and the preliminary set of documents used to start the search has few relevant documents. Experiments show that if inverse document frequency (IDF) weighting is not used because one is unwilling to pay the time penalty needed to obtain these features, then SVMs are better whether using term-frequency (TF) or binary weighting. SVM performance is marginally better than Ide dec-hi if TF-IDF weighting is used and there is a reasonable number of relevant documents found in the preliminary search. If the preliminary search is so poor that one has to search through many documents to find at least one relevant document, then SVM is preferred.
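For readers unfamiliar with the baseline feedback rules named in this abstract, the following is a minimal sketch of the textbook forms of Rocchio, Ide regular, and Ide dec-hi query updates. The default values of alpha, beta, and gamma and the toy vectors are assumptions for illustration; the paper's exact settings are not given here.

```python
def vec_sum(vectors, dim):
    """Component-wise sum of a list of equal-length vectors."""
    total = [0.0] * dim
    for v in vectors:
        total = [t + x for t, x in zip(total, v)]
    return total

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward the mean relevant and away from the mean nonrelevant vector."""
    dim = len(query)
    rel = vec_sum(relevant, dim)
    non = vec_sum(nonrelevant, dim)
    n_rel = max(len(relevant), 1)      # avoid division by zero on empty sets
    n_non = max(len(nonrelevant), 1)
    return [alpha * q + beta * r / n_rel - gamma * n / n_non
            for q, r, n in zip(query, rel, non)]

def ide_regular(query, relevant, nonrelevant):
    """Add every relevant vector and subtract every nonrelevant vector, unnormalized."""
    dim = len(query)
    rel = vec_sum(relevant, dim)
    non = vec_sum(nonrelevant, dim)
    return [q + r - n for q, r, n in zip(query, rel, non)]

def ide_dec_hi(query, relevant, nonrelevant_ranked):
    """Add every relevant vector, subtract only the highest-ranked nonrelevant one."""
    dim = len(query)
    rel = vec_sum(relevant, dim)
    top_non = nonrelevant_ranked[0] if nonrelevant_ranked else [0.0] * dim
    return [q + r - n for q, r, n in zip(query, rel, top_non)]

# Toy term-weight vectors; nonrelevant documents are listed in rank order.
query = [1.0, 0.0, 0.0]
relevant = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
nonrelevant = [[0.0, 0.0, 2.0], [0.0, 1.0, 0.0]]

q_rocchio = rocchio(query, relevant, nonrelevant, alpha=1.0, beta=1.0, gamma=1.0)
q_regular = ide_regular(query, relevant, nonrelevant)
q_dec_hi = ide_dec_hi(query, relevant, nonrelevant)
```

An SVM-based alternative would instead train a classifier on the marked documents at each iteration and rank the remaining documents by its decision value.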

Conference on Image and Video Retrieval | 2008

Content personalization and adaptation for three-screen services

Zhu Liu; David C. Gibbon; Harris Drucker; Andrea Basso

Three-screen services provide the right solution for consumers to access rich multimedia resources from any device, anytime and anywhere. In this paper, we describe a prototype system of content personalization and adaptation for three-screen services. The system continuously acquires content from TV broadcast feeds, then indexes and adapts the content for users according to the interests defined in their preference profiles. Automatically compiled segments of content can be rendered on a variety of devices that the customers prefer, to facilitate a smoother video-consuming experience. Simulation results show that the proposed content analysis modules, including shot boundary detection, anchorperson detection, and multimodal story segmentation, are effective. The resulting personalized content is suitable for consumption on devices with limited input capabilities.

Computational Statistics & Data Analysis | 2002

Effect of pruning and early stopping on performance of a boosting ensemble

Harris Drucker

Generating an architecture for an ensemble of boosting machines involves making a series of design decisions. One design decision is whether to use simple weak learners such as decision tree stumps or more complicated weak learners such as large decision trees or neural networks. Another design decision is the training algorithm for the constituent weak learners. Here we concentrate on binary decision trees and show that the best results are obtained using the Z-criterion to build the trees without pruning. In using neural networks, early stopping is recommended as an approach to lower the training time. In examining the multi-class boosting algorithms, the jury is still out on whether using the all-pairs binary learning algorithm or pseudo-loss is better.
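As an illustration of the simplest weak learner mentioned in this abstract, here is a minimal sketch of discrete AdaBoost over one-dimensional decision stumps. It is a generic textbook version, not the author's implementation; the toy data set and the exhaustive threshold search are assumptions made for the example.

```python
import math

def stump_predict(stump, x):
    """A stump predicts `sign` when x exceeds the threshold, else -sign."""
    threshold, sign = stump
    return sign if x > threshold else -sign

def best_stump(xs, ys, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    values = sorted(set(xs))
    # One threshold below all points (a constant predictor) plus all midpoints.
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best, best_err = None, float("inf")
    for t in thresholds:
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict((t, sign), x) != y)
            if err < best_err:
                best, best_err = (t, sign), err
    return best, best_err

def adaboost(xs, ys, rounds):
    """Train `rounds` stumps, reweighting the examples after each round."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1 - 1e-12)    # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified examples, decrease the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy data: no single threshold separates the classes.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
predictions = [ensemble_predict(ensemble, x) for x in xs]
```

On this toy set a single stump misclassifies a third of the points, while the three-stump ensemble fits all six, which shows why very simple weak learners can still be useful inside a boosting machine.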

International Conference on Acoustics, Speech, and Signal Processing | 2005

Semantic data mining of short utterances

Lee Begeja; Harris Drucker; David C. Gibbon; Patrick Haffner; Zhu Liu; Bernard S. Renger; Behzad Shahraray

This paper introduces a methodology for speech data mining along with the tools that the methodology requires. We show how they increase the productivity of the analyst who seeks relationships among the contents of multiple utterances and ultimately must link some newly discovered context into testable hypotheses about new information. While, in its simplest form, one can extend text data mining to speech data mining by using text tools on the output of a speech recognizer, we have found that it is not optimal. We show how data mining techniques that are typically applied to text should be modified to enable an analyst to do effective semantic data mining on a large collection of short speech utterances. For the purposes of this paper, we examine semantic data mining in the context of semantic parsing and analysis in a specific situation involving the solution of a business problem that is known to the analyst. We are not attempting a generic semantic analysis of a set of speech. Our tools and methods allow the analyst to mine the speech data to discover the semantics that best cover the desired solution. The coverage, in this case, yields a set of Natural Language Understanding (NLU) classifiers that serve as testable hypotheses.

International Symposium on Neural Networks | 1990

Implementation of minimum error expert system

Harris Drucker

The author shows how to implement a neural net expert system that is optimum in the minimum error sense and recognizes objects based on feature extraction. The expert system can handle features that may not be appropriate to describe certain objects (termed "don't care" features) or features that cannot be extracted (unknowns, or "don't know" features) by using marginal density functions, if needed. The implementation uses linear operations and trivial nonlinear transfer functions and is amenable to VLSI implementation. The application to a real-world problem of identifying ordnance is discussed.

International Symposium on Neural Networks | 1999

Z splitting criterion for growing trees and boosting

Harris Drucker

A splitting criterion that arises from the context of a new boosting algorithm is used to construct classification trees. Trees constructed using this Z function are compared to those using the entropy function of C4.5 and are found to give much lower error rates. The Z function is also used to construct boosting machines which, when compared to other implementations, give lower error rates.
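The abstract does not state the Z function itself. The sketch below assumes the form commonly associated with confidence-rated boosting, Z = 2 * sum over leaves of sqrt(W+ * W-), where W+ and W- are the fractions of total example weight that are positive and negative in each leaf; treat that formula, the helper names, and the toy splits as assumptions. A weighted-entropy score is included only for comparison.

```python
import math

def z_value(leaves):
    """Assumed Z criterion: 2 * sum of sqrt(W+ * W-) over leaves (lower is better)."""
    total = sum(pos + neg for pos, neg in leaves)
    return 2 * sum(math.sqrt((pos / total) * (neg / total)) for pos, neg in leaves)

def entropy_value(leaves):
    """Weighted average binary entropy of the leaves, in bits (lower is better)."""
    total = sum(pos + neg for pos, neg in leaves)
    score = 0.0
    for pos, neg in leaves:
        size = pos + neg
        for count in (pos, neg):
            if count > 0:
                p = count / size
                score -= (size / total) * p * math.log2(p)
    return score

# Each leaf is (positive weight, negative weight); values are illustrative.
unsplit = [(4, 4)]
partial = [(3, 1), (1, 3)]
pure = [(4, 0), (0, 4)]

z_unsplit, z_partial, z_pure = (z_value(s) for s in (unsplit, partial, pure))
h_unsplit, h_partial, h_pure = (entropy_value(s) for s in (unsplit, partial, pure))
```

A tree builder under this assumption would evaluate each candidate split and keep the one with the smallest score; both scores fall from 1 for the unsplit node to 0 for the pure split, but they rank intermediate splits differently in general.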

Neural Information Processing Systems | 1996