Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Joe Carthy is active.

Publication


Featured research published by Joe Carthy.


Digital Investigation | 2013

Cloud forensics definitions and critical criteria for cloud forensic capability: An overview of survey results

Keyun Ruan; Joe Carthy; M. Tahar Kechadi; Ibrahim Baggili

With the rapid growth of global cloud adoption in the private and public sectors, cloud computing environments are becoming a new battlefield for cyber crime. In this paper, the authors present the results and analysis of a survey on cloud forensics and critical criteria for cloud forensic capability that was widely circulated among digital forensic experts and practitioners internationally. Based on the 257 responses collected, the survey aims to clarify key fundamental issues of cloud forensics, such as its definition, scope, challenges and opportunities, as well as missing capabilities.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2001

Combining semantic and syntactic document classifiers to improve first story detection

Nicola Stokes; Joe Carthy

In this paper we describe a type of data fusion involving the combination of evidence derived from multiple document representations. Our aim is to investigate whether a composite representation can improve the online detection of novel events in a stream of broadcast news stories. This classification process, otherwise known as first story detection (FSD) (or, in the Topic Detection and Tracking pilot study, online new event detection [1]), is one of three main classification tasks defined by the TDT initiative. Our composite document representation consists of a semantic representation (based on the lexical chains derived from a text) and a syntactic representation (using proper nouns). Using the TDT1 evaluation methodology, we evaluate a number of document representation combinations using these document classifiers.


2010 eCrime Researchers Summit | 2010

Feature selection for Spam and Phishing detection

Fergus Toolan; Joe Carthy

Unsolicited Bulk Email (UBE) has become a large problem in recent years, and the number of mass mailers in existence is increasing dramatically. Automatically detecting UBE has become a vital area of current research. Many email clients (such as Outlook and Thunderbird) already have junk filters built in, but mass mailers are continually evolving and overcoming some of these filters, so the need for research in the area is ongoing. Many existing techniques seem to choose the features used for classification almost at random. This paper addresses this issue by investigating the utility of over 40 features that have been used in the recent literature. Information gain for these features is calculated over Ham, Spam and Phishing corpora.
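The feature-ranking step described above rests on information gain, IG(class; feature) = H(class) - H(class | feature). A minimal sketch of that calculation, using a hypothetical contains-URL feature over a toy Ham/Spam/Phishing corpus (the feature name and data are illustrative, not the paper's):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """IG(class; feature) = H(class) - H(class | feature)."""
    total = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [lab for lab, f in zip(labels, feature_values) if f == v]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional

# Toy corpus: does each email contain a URL?
labels  = ["ham", "ham", "ham", "spam", "spam", "phish", "phish", "phish"]
has_url = [False, False, True,  True,   True,   True,    True,    True]
print(round(information_gain(labels, has_url), 3))  # → 0.467
```

Ranking all candidate features by this score is what separates informative features from ones chosen at random.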


International Conference on Computational Linguistics | 2008

Investigating Statistical Techniques for Sentence-Level Event Classification

Martina Naughton; Nicola Stokes; Joe Carthy

The ability to correctly classify sentences that describe events is an important task for many natural language applications such as Question Answering (QA) and Summarisation. In this paper, we treat event detection as a sentence-level text classification problem. We compare the performance of two approaches to this task: a Support Vector Machine (SVM) classifier and a Language Modeling (LM) approach. We also investigate a rule-based method that uses hand-crafted lists of terms derived from WordNet. These terms are strongly associated with a given event type, and can be used to identify sentences describing instances of that type. We use two datasets in our experiments, and evaluate each technique on six distinct event types. Our results indicate that the SVM consistently outperforms the LM technique for this task. More interestingly, we discover that the manual rule-based classification system is a very powerful baseline that outperforms the SVM on three of the six event types.


2009 eCrime Researchers Summit | 2009

Phishing detection using classifier ensembles

Fergus Toolan; Joe Carthy

This paper introduces an approach to classifying emails into Phishing / non-Phishing categories using the C5.0 algorithm, which achieves very high precision, and an ensemble of other classifiers that achieve high recall. The representation of instances used in this paper is very small, consisting of only five features. Results of an evaluation of this system, using over 8,000 emails, approximately half of which were phishing emails and the remainder legitimate, are presented. These results show the benefits of using this recall-boosting technique over that of any individual classifier or collection of classifiers.
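The recall-boosting combination described above can be sketched as: accept the high-precision classifier's positive verdict outright, and otherwise fall back to a majority vote of the recall-oriented classifiers. The rule classifiers and feature names below are purely illustrative stand-ins for the trained C5.0 model and ensemble, not the paper's actual classifiers:

```python
def ensemble_predict(email, precise_clf, recall_clfs):
    """Flag an email as phishing if the high-precision classifier fires,
    or if a majority of the recall-oriented classifiers vote phishing."""
    if precise_clf(email):
        return True
    votes = sum(1 for clf in recall_clfs if clf(email))
    return votes > len(recall_clfs) // 2

# Hypothetical hand-written rules standing in for trained classifiers.
precise = lambda e: e["html_form"] and e["ip_url"]
recall_ensemble = [
    lambda e: e["ip_url"],
    lambda e: e["html_form"],
    lambda e: e["urgent_language"],
]

email = {"html_form": False, "ip_url": True, "urgent_language": True}
print(ensemble_predict(email, precise, recall_ensemble))  # → True
```

Here the precise classifier misses the email, but the ensemble majority recovers it, which is exactly the recall boost the paper evaluates.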


Expert Systems With Applications | 2015

An analysis of the coherence of descriptors in topic modeling

Derek O'Callaghan; Derek Greene; Joe Carthy; Pádraig Cunningham

We evaluate the coherence and generality of topic descriptors found by LDA and NMF. Six new and existing corpora were specifically compiled for this evaluation. A new coherence measure using word2vec-modeled term vector similarity is proposed. NMF regularly produces more coherent topics, where term weighting is influential. NMF may be more suitable for topic modeling of niche or non-mainstream corpora.

In recent years, topic modeling has become an established method in the analysis of text corpora, with probabilistic techniques such as latent Dirichlet allocation (LDA) commonly employed for this purpose. However, it might be argued that adequate attention is often not paid to the issue of topic coherence, the semantic interpretability of the top terms usually used to describe discovered topics. Nevertheless, a number of studies have proposed measures for analyzing such coherence, where these have been largely focused on topics found by LDA, with matrix decomposition techniques such as Non-negative Matrix Factorization (NMF) being somewhat overlooked in comparison. This motivates the current work, where we compare and analyze topics found by popular variants of both NMF and LDA in multiple corpora in terms of both their coherence and associated generality, using a combination of existing and new measures, including one based on distributional semantics. Two out of three coherence measures find NMF to regularly produce more coherent topics, with higher levels of generality and redundancy observed with the LDA topic descriptors. In all cases, we observe that the associated term weighting strategy plays a major role. The results observed with NMF suggest that this may be a more suitable topic modeling method when analyzing certain corpora, such as those associated with niche or non-mainstream domains.
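The distributional coherence measure mentioned above can be sketched as the mean pairwise similarity of a topic's descriptor terms in an embedding space. The tiny hand-made vectors below stand in for word2vec embeddings; they and the example terms are illustrative only:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def topic_coherence(terms, vectors):
    """Mean pairwise cosine similarity of a topic's descriptor terms."""
    pairs = [(terms[i], terms[j])
             for i in range(len(terms)) for j in range(i + 1, len(terms))]
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

# Toy 3-dimensional "embeddings" standing in for word2vec vectors.
vectors = {
    "goal":  (1.0, 0.9, 0.0),
    "match": (0.9, 1.0, 0.1),
    "team":  (1.0, 0.8, 0.2),
    "tax":   (0.0, 0.1, 1.0),
}
coherent = topic_coherence(["goal", "match", "team"], vectors)
mixed = topic_coherence(["goal", "match", "tax"], vectors)
print(coherent > mixed)  # the all-football descriptor scores higher
```

A topic whose top terms are close together in the embedding space scores high; a descriptor polluted by an off-topic term scores lower, which is the intuition behind the proposed measure.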


International Conference on Human Language Technology Research | 2001

First story detection using a composite document representation

Nicola Stokes; Joe Carthy

In this paper, we explore the effects of data fusion on First Story Detection [1] in a broadcast news domain. The data fusion element of this experiment involves the combination of evidence derived from two distinct representations of document content in a single cluster run. Our composite document representation consists of a concept representation (based on the lexical chains derived from a text) and free text representation (using traditional keyword index terms). Using the TDT1 evaluation methodology we evaluate a number of document representation strategies and propose reasons why our data fusion experiment shows performance improvements in the TDT domain.
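The data fusion step described above can be sketched as a weighted linear combination of the two representation similarities, followed by the usual FSD novelty test: a story starts a new event if its best fused match to every earlier story falls below a threshold. The weight and threshold values here are illustrative, not the paper's tuned parameters:

```python
def fuse(concept_sim, keyword_sim, alpha=0.5):
    """Linear fusion of the concept (lexical-chain) and keyword similarities."""
    return alpha * concept_sim + (1 - alpha) * keyword_sim

def is_first_story(fused_sims, threshold=0.2):
    """Declare a new event if every fused similarity to an earlier
    story falls below the novelty threshold."""
    return all(s < threshold for s in fused_sims)

# Toy (concept_sim, keyword_sim) scores against three earlier stories.
scores = [(0.05, 0.10), (0.12, 0.08), (0.15, 0.05)]
fused = [fuse(c, k) for c, k in scores]
print(is_first_story(fused))  # → True: no earlier story is a close match
```

Varying alpha trades off the two representations, which is the dimension the paper's fusion experiments explore.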


Information Retrieval | 2010

Sentence-level event classification in unstructured texts

Martina Naughton; Nicola Stokes; Joe Carthy

The ability to correctly classify sentences that describe events is an important task for many natural language applications such as Question Answering (QA) and Text Summarisation. In this paper, we treat event detection as a sentence level text classification problem. Overall, we compare the performance of discriminative versus generative approaches to this task: namely, a Support Vector Machine (SVM) classifier versus a Language Modeling (LM) approach. We also investigate a rule-based method that uses handcrafted lists of ‘trigger’ terms derived from WordNet. Two datasets are used in our experiments to test each approach on six different event types, i.e., Die, Attack, Injure, Meet, Transport and Charge-Indict. Our experimental results show that the trained SVM classifier significantly outperforms the simple rule-based system and language modeling approach on both datasets: ACE (F1 66% vs. 45% and 38%, respectively) and IBC (F1 92% vs. 88% and 74%, respectively). A detailed error analysis framework for the task is also provided which separates errors into different types: semantic, inference, continuous and trigger-less.
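The rule-based baseline described above can be sketched as a trigger-term lookup: a sentence is assigned an event type if it contains one of that type's trigger terms. The short trigger lists below are illustrative, not the WordNet-derived lists used in the paper:

```python
# Illustrative trigger lists; the paper derives its lists from WordNet.
TRIGGERS = {
    "Die":    {"die", "died", "killed", "fatal"},
    "Attack": {"attack", "bombed", "raid"},
}

def classify(sentence):
    """Return every event type whose trigger terms appear in the sentence."""
    tokens = set(sentence.lower().split())
    return [etype for etype, trig in TRIGGERS.items() if tokens & trig]

print(classify("Three people died in the raid"))  # → ['Die', 'Attack']
```

Its simplicity is also the source of the error types the paper analyses: semantic and inference errors (a trigger used in a non-event sense) and trigger-less event sentences that the lookup can never match.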


International Conference on Digital Forensics | 2012

Cloud Computing Reference Architecture and Its Forensic Implications: A Preliminary Analysis

Keyun Ruan; Joe Carthy

In this paper, the authors provide a preliminary analysis of the forensic implications of the cloud computing reference architecture, covering the segregation of duties of cloud actors in cloud investigations, forensic artifacts on all layers of the cloud system stack, cloud actor interaction scenarios in cloud investigations, and the forensic implications of all cloud deployment models. The analysis serves as feedback and input for integrating forensic considerations into cloud standardization processes from an early stage, and specifies requirements and directions for further standardization efforts.


Systems, Man and Cybernetics | 2002

Lexical chains for topic tracking

Joe Carthy; M. Sherwood-Smith

Describes research into the use of lexical chains to build effective topic tracking systems. Lexical chaining is a method of grouping lexically related terms into so-called lexical chains, using simple natural language processing techniques. Topic tracking involves tracking a given news event in a stream of news stories, i.e. finding all subsequent stories in the news stream that discuss the given event. It has grown out of the Topic Detection and Tracking (TDT) initiative sponsored by DARPA. The paper describes the results of a topic tracking system, LexTrack, based on lexical chaining and compares it to a tracking system designed using traditional IR techniques.
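The chaining idea described above can be sketched as a greedy pass over a document's terms: each term joins the first existing chain containing a related term, or else starts a new chain. The toy relatedness table below stands in for the WordNet-style lexical relations a real chainer would consult:

```python
# Toy relatedness table standing in for WordNet lexical relations.
RELATED = {
    ("car", "vehicle"), ("vehicle", "truck"), ("car", "truck"),
    ("bank", "money"), ("money", "loan"),
}

def related(a, b):
    return a == b or (a, b) in RELATED or (b, a) in RELATED

def build_chains(terms):
    """Greedily group terms into lexical chains of related words."""
    chains = []
    for term in terms:
        for chain in chains:
            if any(related(term, member) for member in chain):
                chain.append(term)
                break
        else:
            chains.append([term])
    return chains

print(build_chains(["car", "money", "truck", "loan", "sky"]))
# → [['car', 'truck'], ['money', 'loan'], ['sky']]
```

The resulting chains give a concept-level document representation, which is what a tracking system like LexTrack compares across news stories instead of raw keywords.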

Collaboration


Dive into Joe Carthy's collaborations.

Top Co-Authors

Nicola Stokes, University College Dublin

John Dunnion, University College Dublin

Derek Greene, University College Dublin

Ruichao Wang, University College Dublin