Publication


Featured research published by Joshua Alspector.


knowledge discovery and data mining | 2004

Improved robustness of signature-based near-replica detection via lexicon randomization

Aleksander Kolcz; Abdur Chowdhury; Joshua Alspector

Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional duplicate detection techniques relying on direct inter-document similarity computation (e.g., using the cosine measure) are often not feasible given the time and memory performance constraints. On the other hand, fingerprint-based methods, such as I-Match, are very attractive computationally but may be brittle with respect to small changes to document content. We focus on approaches to near-replica detection that are based upon large-collection statistics and present a general technique of increasing their robustness via multiple lexicon randomization. In experiments with large web-page and spam-email datasets the proposed method is shown to consistently outperform traditional I-Match, with the relative improvement in duplicate-document recall reaching as high as 40-60%. The large gains in detection accuracy are offset by only small increases in computational requirements.
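The idea in this abstract can be illustrated with a minimal sketch. It assumes a simplified I-Match in which the signature is a SHA-1 hash of the sorted document terms that survive filtering by a lexicon, and it models lexicon randomization by dropping a random fraction of lexicon terms to build extra lexicons. The function names, the `drop_fraction` parameter, and the whitespace tokenizer are hypothetical choices for illustration, not the authors' implementation.

```python
import hashlib
import random


def imatch_signature(text, lexicon):
    """Hash the sorted set of document terms that also appear in the lexicon."""
    terms = sorted(set(text.lower().split()) & set(lexicon))
    if not terms:
        return None  # no lexicon terms survive, so there is nothing to hash
    return hashlib.sha1(" ".join(terms).encode("utf-8")).hexdigest()


def randomized_lexicons(lexicon, k=4, drop_fraction=0.3, seed=0):
    """Return the original lexicon plus k random sub-lexicons (assumed scheme)."""
    rng = random.Random(seed)
    ordered = sorted(lexicon)  # sort first so sampling is reproducible
    keep = max(1, len(ordered) - int(len(ordered) * drop_fraction))
    lexicons = [frozenset(ordered)]
    for _ in range(k):
        lexicons.append(frozenset(rng.sample(ordered, keep)))
    return lexicons


def signatures(text, lexicons):
    """One signature per lexicon, tagged with the lexicon index."""
    result = set()
    for index, lexicon in enumerate(lexicons):
        sig = imatch_signature(text, lexicon)
        if sig is not None:
            result.add((index, sig))
    return result


def near_duplicates(doc_a, doc_b, lexicons):
    """Treat two documents as near-replicas if any per-lexicon signature agrees."""
    return bool(signatures(doc_a, lexicons) & signatures(doc_b, lexicons))
```

With a single lexicon, one changed lexicon term alters the only signature and the match is lost. With several randomized lexicons, the changed term is absent from some of them, so at least one signature may still agree. The cost is one extra hash per lexicon, which is in line with the abstract's remark about small increases in computational requirements.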


electronic imaging | 1997

Duplicate document detection

Joshua Alspector; Abdur Chowdhury; Aleksander Kolcz

In a single-signature duplicate document system, a secondary set of attributes is used in addition to a primary set of attributes to improve the precision of the system. When the projection of a document onto the primary set of attributes falls below a threshold, the secondary set of attributes supplements the primary lexicon so that the projection rises above the threshold.
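The fallback described in this abstract can be sketched as follows. The sketch assumes that the "projection" is the set of document terms found in a lexicon and that the threshold is a minimum number of retained terms (`min_terms`). Both are illustrative assumptions, as are the function names; the abstract does not specify how the projection is measured.

```python
import hashlib


def project(tokens, lexicon):
    """Projection of a document onto a lexicon: the terms they share."""
    return set(tokens) & set(lexicon)


def signature_with_fallback(text, primary, secondary, min_terms=3):
    """Single signature; add secondary attributes when the primary projection is small."""
    tokens = text.lower().split()
    kept = project(tokens, primary)
    used_secondary = False
    if len(kept) < min_terms:
        # Assumed policy: add every matching secondary term. The result can
        # still be below min_terms if the document is very short.
        kept |= project(tokens, secondary)
        used_secondary = True
    digest = hashlib.sha1(" ".join(sorted(kept)).encode("utf-8")).hexdigest()
    return digest, used_secondary


primary = {"invoice", "payment", "account", "overdue"}
secondary = {"alice", "bob", "tuesday", "friday"}

sig_a, _ = signature_with_fallback("payment due alice tuesday", primary, secondary)
sig_b, _ = signature_with_fallback("payment due bob friday", primary, secondary)
print(sig_a == sig_b)
```

In this example both short documents project onto the primary lexicon as the single term `payment`, so a primary-only signature would collide. Adding the secondary terms separates them, which is the precision gain the abstract refers to.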


Archive | 2003

Group based spam classification

Joshua Alspector; Aleksander Kolcz; Abdur Chowdhury


Archive | 2003

Classifier Tuning Based On Data Similarities

Joshua Alspector; Aleksander Kolcz; Abdur Chowdhury


Archive | 2011

Filtering system for providing personalized information in the absence of negative data

Joshua Alspector; Aleksander Kolcz


Archive | 2008

Reliability of duplicate document detection algorithms

Joshua Alspector; Aleksander Kolcz; Abdur Chowdhury


Archive | 2003

Data duplication: an imbalance problem?

Abdur Chowdhury; Joshua Alspector


conference on email and anti-spam | 2004

The Impact of Feature Selection on Signature-Driven Spam Detection

Aleksander Kolcz; Abdur Chowdhury; Joshua Alspector


Archive | 2004

Simplifying lexicon creation in hybrid duplicate detection and inductive classifier systems

Joshua Alspector; Aleksander Kolcz; Abdur Chowdhury


Archive | 2014

Online adaptive filtering of messages

Joshua Alspector; Aleksander Kolcz

Collaboration


Dive into Joshua Alspector's collaborations.

Top Co-Authors

Aleksander Kolcz

University of Colorado Colorado Springs

Abdur Chowdhury

Illinois Institute of Technology
