Ross W. Gayler | Researchain

Publication

Featured research published by Ross W. Gayler.

IEEE Transactions on Knowledge and Data Engineering | 2012

Resilient Identity Crime Detection

Clifton Phua; Kate Smith-Miles; Vincent C. S. Lee; Ross W. Gayler

Identity crime is well known, prevalent, and costly, and credit application fraud is a specific case of identity crime. The existing non-data-mining detection systems of business rules, scorecards, and known fraud matching have limitations. To address these limitations and combat identity crime in real time, this paper proposes a new multilayered detection system complemented with two additional layers: communal detection (CD) and spike detection (SD). CD finds real social relationships to reduce the suspicion score, and is tamper-resistant to synthetic social relationships. It is the whitelist-oriented approach on a fixed set of attributes. SD finds spikes in duplicates to increase the suspicion score, and is probe-resistant for attributes. It is the attribute-oriented approach on a variable-size set of attributes. Together, CD and SD can detect more types of attacks, better account for changing legal behavior, and remove the redundant attributes. Experiments were carried out on CD and SD with several million real credit applications. Results on the data support the hypothesis that successful credit application fraud patterns are sudden and exhibit sharp spikes in duplicates. Although this research is specific to credit application fraud detection, the concept of resilience, together with adaptivity and quality data discussed in the paper, is general to the design, implementation, and evaluation of all detection systems.

International Symposium on Neural Networks | 2013

Analogical mapping and inference with binary spatter codes and sparse distributed memory

Blerim Emruli; Ross W. Gayler; Fredrik Sandin

Analogy-making is a key function of human cognition. Therefore, the development of computational models of analogy that automatically learn from examples can lead to significant advances in cognitive systems. Analogies require complex, relational representations of learned structures, which is challenging for both symbolic and neurally inspired models. Vector symbolic architectures (VSAs) are a class of connectionist models for the representation and manipulation of compositional structures, which can be used to model analogy. We study a novel VSA network for the analogical mapping of compositional structures, which integrates an associative memory known as sparse distributed memory (SDM). The SDM enables non-commutative binding of compositional structures, which makes it possible to predict novel patterns in sequences. To demonstrate this property we apply the network to a commonly used intelligence test called Raven's Progressive Matrices. We present results of simulation experiments for the Raven's task and calculate the probability of prediction error at the 95% confidence level. We find that non-commutative binding requires sparse activation of the SDM and that 10-20% concept-specific activation of neurons is optimal. The optimal dimensionality of the binary distributed representations of the VSA is of the order 10^4, which is comparable with former results and the average synapse count of neurons in the cerebral cortex.
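
As a rough illustration of the binary distributed representations mentioned in this abstract, the sketch below shows the standard binary spatter code operations: binding by bitwise XOR and bundling by bitwise majority vote. It is a minimal sketch under stated assumptions and not the network studied in the paper: it omits the sparse distributed memory entirely, XOR binding is commutative, and all function and variable names are hypothetical.

```python
import random

DIM = 10_000  # dimensionality of the order 10^4, as quoted in the abstract


def random_vector(rng, dim=DIM):
    """Dense random binary vector, stored as a Python int used as a bitmask."""
    return rng.getrandbits(dim)


def bind(a, b):
    """Bind two vectors with bitwise XOR (XOR is its own inverse)."""
    return a ^ b


def bundle(vectors, dim=DIM):
    """Superpose vectors with a bitwise majority vote (use an odd count)."""
    threshold = len(vectors) / 2
    result = 0
    for bit in range(dim):
        ones = sum((v >> bit) & 1 for v in vectors)
        if ones > threshold:
            result |= 1 << bit
    return result


def hamming_similarity(a, b, dim=DIM):
    """Fraction of bit positions on which the two vectors agree."""
    return 1.0 - bin(a ^ b).count("1") / dim


# Hypothetical role and filler vectors for a three-slot structure.
rng = random.Random(0)
role_a, role_b, role_c = (random_vector(rng) for _ in range(3))
filler_x, filler_y, filler_z = (random_vector(rng) for _ in range(3))

record = bundle([bind(role_a, filler_x),
                 bind(role_b, filler_y),
                 bind(role_c, filler_z)])

# Unbinding with role_a yields a noisy copy of filler_x.
recovered = bind(record, role_a)
print(hamming_similarity(recovered, filler_x))  # well above chance (0.5)
print(hamming_similarity(recovered, filler_y))  # close to chance (0.5)
```

At this dimensionality, unrelated random vectors agree on roughly half of their bits with very little spread, which is why a noisy recovered filler can still be told apart from every other stored vector.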

Knowledge Discovery and Data Mining | 2013

Dynamic Similarity-Aware Inverted Indexing for Real-Time Entity Resolution

Banda Ramadan; Peter Christen; Huizhi Liang; Ross W. Gayler; David Hawking

Entity resolution is the process of identifying groups of records in a single or multiple data sources that represent the same real-world entity. It is an important tool in data de-duplication, in linking records across databases, and in matching query records against a database of existing entities. Most existing entity resolution techniques complete the resolution process offline and on static databases. However, real-world databases are often dynamic, and increasingly organizations need to resolve entities in real-time. Thus, there is a need for new techniques that facilitate working with dynamic databases in real-time. In this paper, we propose a dynamic similarity-aware inverted indexing technique (DySimII) that meets these requirements. We also propose a frequency-filtered indexing technique where only the most frequent attribute values are indexed. We experimentally evaluate our techniques on a large real-world voter database. The results show that when the index size grows no appreciable increase is found in the average record insertion time (around 0.1 msec) and in the average query time (less than 0.1 sec). We also find that applying the frequency-filtered approach reduces the index size with only a slight drop in recall.

European Journal of Operational Research | 2009

On the communal analysis suspicion scoring for identity crime in streaming credit applications

Clifton Phua; Ross W. Gayler; Vincent C. S. Lee; Kate Smith-Miles

This paper describes a rapid technique: communal analysis suspicion scoring (CASS), for generating numeric suspicion scores on streaming credit applications based on implicit links to each other, over both time and space. CASS includes pair-wise communal scoring of identifier attributes for applications, definition of categories of suspiciousness for application-pairs, the incorporation of temporal and spatial weights, and smoothed k-wise scoring of multiple linked application-pairs. Results on mining several hundred thousand real credit applications demonstrate that CASS reduces false alarm rates while maintaining reasonable hit rates. CASS is scalable for this large data sample, and can rapidly detect early symptoms of identity crime. In addition, new insights have been observed from the relationships between applications.

Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2013

Adaptive Temporal Entity Resolution on Dynamic Databases

Peter Christen; Ross W. Gayler

Entity resolution is the process of matching records that refer to the same entities from one or several databases in situations where the records to be matched do not include unique entity identifiers. Matching therefore has to rely upon partially identifying information, such as names and addresses. Traditionally, entity resolution has been applied in batch-mode and on static databases. However, increasingly organisations are challenged by the task of having a stream of query records that need to be matched to a database of known entities. As these query records are matched, they are inserted into the database as either representing a new entity, or as the latest embodiment of an existing entity. We investigate how temporal and dynamic aspects, such as time differences between query and database records and changes in database content, affect matching quality. We propose an approach that adaptively adjusts similarities between records depending upon the values of the records’ attributes and the time differences between records. We evaluate our approach on synthetic data and a large real US voter database, with results showing that our approach can outperform static matching approaches.

Proceedings of the 2007 International Workshop on Domain Driven Data Mining | 2007

Adaptive communal detection in search of adversarial identity crime

Clifton Phua; Vincent C. S. Lee; Kate Smith-Miles; Ross W. Gayler

This paper is on adaptive real-time searching of credit application data streams for identity crime with many search parameters. Specifically, we concentrated on handling our domain-specific adversarial activity problem with the adaptive Communal Analysis Suspicion Scoring (CASS) algorithm. CASS's main novel theoretical contribution is in the formulation of State-of-Alert (SoA), which sets the condition of reduced, same, or heightened watchfulness; and Parameter-of-Change (PoC), which improves detection ability with pre-defined parameter values for each SoA. With pre-configured SoA policy and PoC strategy, CASS determines when, what, and how much to adapt its search parameters to ongoing adversarial activity. The above approach is validated with three sets of experiments, where each experiment is conducted on several million real credit applications and measured with three appropriate performance metrics. Significant improvements are achieved over previous work, with the discovery of some practical insights of adaptivity into our domain.

Journal of Data and Information Quality | 2015

Dynamic Sorted Neighborhood Indexing for Real-Time Entity Resolution

Banda Ramadan; Peter Christen; Huizhi Liang; Ross W. Gayler

Real-time Entity Resolution (ER) is the process of matching query records in subsecond time with records in a database that represent the same real-world entity. Indexing techniques are generally used to efficiently extract a set of candidate records from the database that are similar to a query record, and that are to be compared with the query record in more detail. The sorted neighborhood indexing method, which sorts a database and compares records within a sliding window, has been successfully used for ER of large static databases. However, because it is based on static sorted arrays and is designed for batch ER that resolves all records in a database rather than resolving those relating to a single query record, this technique is not suitable for real-time ER on dynamic databases that are constantly updated. We propose a tree-based technique that facilitates dynamic indexing based on the sorted neighborhood method, which can be used for real-time ER, and investigate both static and adaptive window approaches. We propose an approach to reduce query matching times by precalculating the similarities between attribute values stored in neighboring tree nodes. We also propose a multitree solution where different sorting keys are used to reduce the effects of errors and variations in attribute values on matching quality by building several distinct index trees. We experimentally evaluate our proposed techniques on large real datasets, as well as on synthetic data with different data quality characteristics. Our results show that as the index grows, no appreciable increase occurs in both record insertion and query times, and that using multiple trees gives noticeable improvements on matching quality with only a small increase in query time. Compared to earlier indexing techniques for real-time ER, our approach achieves significantly reduced indexing and query matching times while maintaining high matching accuracy.
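
The sliding-window idea behind sorted neighborhood indexing can be sketched in a few lines. The example below is an illustrative simplification, not the tree-based technique proposed in the paper: it keeps sorting keys in a plain sorted list maintained with the standard `bisect` module (insertion into a list is linear-time, which a balanced tree would avoid), uses a fixed window only, and the class and method names are hypothetical.

```python
import bisect


class SortedNeighborhoodIndex:
    """Toy dynamic index: sorted distinct keys plus a fixed-size window."""

    def __init__(self, window=1):
        self.window = window  # number of neighboring keys on each side
        self.keys = []        # sorted list of distinct sorting keys
        self.records = {}     # sorting key -> list of record identifiers

    def insert(self, key, record_id):
        """Add a record under its sorting key, keeping the keys sorted."""
        if key not in self.records:
            bisect.insort(self.keys, key)
            self.records[key] = []
        self.records[key].append(record_id)

    def candidates(self, key):
        """Return record ids whose keys fall inside the window around key."""
        pos = bisect.bisect_left(self.keys, key)
        lo = max(0, pos - self.window)
        hi = min(len(self.keys), pos + self.window + 1)
        result = []
        for neighbor in self.keys[lo:hi]:
            result.extend(self.records[neighbor])
        return result


index = SortedNeighborhoodIndex(window=1)
for record_id, surname in enumerate(
        ["smith", "smyth", "smithe", "jones", "brown", "taylor"], start=1):
    index.insert(surname, record_id)

# Records whose keys sort next to the query key become match candidates.
print(index.candidates("smith"))
```

Only the returned candidates would then be compared with the query record in detail, which is what keeps query time roughly independent of the index size.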

Intelligence and Security Informatics | 2006

Temporal representation in spike detection of sparse personal identity streams

Clifton Phua; Vincent C. S. Lee; Ross W. Gayler; Kate A. Smith

Identity crime has increased enormously over the recent years. Spike detection is important because it highlights sudden and sharp rises in intensity relative to the current identity attribute value (which can be indicative of abuse). This paper proposes the new spike analysis framework for monitoring sparse personal identity streams. For each identity example, it detects spikes in single attribute values and integrates multiple spikes from different attributes to produce a numeric suspicion score. Although only temporal representation is examined here, experimental results on synthetic and real credit applications reveal some conditions on which the framework will perform well.

Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2014

Noise-Tolerant Approximate Blocking for Dynamic Real-Time Entity Resolution

Huizhi Liang; Yanzhe Wang; Peter Christen; Ross W. Gayler

Entity resolution is the process of identifying records in one or multiple data sources that represent the same real-world entity. This process needs to deal with noisy data that contain, for example, wrong pronunciation or spelling errors. Many real-world applications require rapid responses for entity queries on dynamic datasets. This brings challenges to existing approaches, which are mainly aimed at the batch matching of records in static data. Locality sensitive hashing (LSH) is an approximate blocking approach that hashes objects within a certain distance into the same block with high probability. How to make approximate blocking approaches scalable to large datasets and effective for entity resolution in real time remains an open question. Targeting this problem, we propose a noise-tolerant approximate blocking approach to index records based on their distance ranges using LSH and sorting trees within large hash blocks. Experiments conducted on both synthetic and real-world datasets show the effectiveness of the proposed approach.
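
To make the LSH blocking idea concrete, the sketch below uses MinHash signatures over character bigrams with banding, a common LSH scheme for Jaccard similarity. This is an assumption chosen for illustration: the abstract does not say which LSH family the paper uses, the sorting trees inside large blocks are not shown, and every name and parameter value here is hypothetical.

```python
import hashlib
from collections import defaultdict


def bigrams(text):
    """Set of lowercase character bigrams (needs at least two characters)."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def minhash_signature(tokens, num_hashes=8):
    """MinHash signature: the minimum hash value per seeded hash function."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{t}".encode(), digest_size=8).digest(),
                "big")
            for t in tokens))
    return tuple(signature)


def block_keys(signature, bands=4):
    """Split the signature into bands; each band identifies one block."""
    rows = len(signature) // bands
    return [(b, signature[b * rows:(b + 1) * rows]) for b in range(bands)]


blocks = defaultdict(set)  # block key -> set of record identifiers


def insert(record_id, value):
    """Add a record to every block that its signature bands map to."""
    for key in block_keys(minhash_signature(bigrams(value))):
        blocks[key].add(record_id)


def query(value):
    """Return ids of records sharing at least one block with the query."""
    found = set()
    for key in block_keys(minhash_signature(bigrams(value))):
        found |= blocks.get(key, set())
    return found


insert(1, "Peter Christen")
insert(2, "Huizhi Liang")
print(query("peter christen"))  # contains record 1 (same bigram set)
```

Values with similar bigram sets share at least one band with high probability, so a query with a small spelling error is likely, though not guaranteed, to reach the block of the correct record. More bands with fewer rows each raise that probability at the cost of larger candidate sets.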

International Conference on Data Mining | 2006

Communal Detection of Implicit Personal Identity Streams

Clifton Phua; Ross W. Gayler; Kate Smith-Miles; Vincent C. S. Lee

The purpose of this paper is to outline some of the major developments of an identity crime/fraud stream mining system. Communal detection is about finding real communities of interest. The algorithm itself is unsupervised, single-pass, differentiates between normal and anomalous links, and mitigates the suspicion of normal links with a dynamic global whitelist. It is part of the important and novel communal detection framework introduced here for monitoring implicit personal identity streams. For each incoming identity example, it creates one of three types of single link (black, white, or anomalous) against any previous example within a set window. Subsequently, it integrates possible multiple links to produce a smoothed numeric suspicion score. In a principled stream-like fashion and using eighteen different parameter settings replicated over three large window sizes, this paper highlights and discusses significant score results from mining a few million recent credit applications.