Mark Truran
Teesside University
Publication
Featured research published by Mark Truran.
ACM Computing Surveys | 2012
Dong Zhou; Mark Truran; Tim J. Brailsford; Vincent Wade; Helen Ashman
Cross-language information retrieval (CLIR) is an active sub-domain of information retrieval (IR). Like IR, CLIR is centered on the search for documents and for information contained within those documents. Unlike IR, CLIR must reconcile queries and documents that are written in different languages. The usual solution to this mismatch involves translating the query and/or the documents before performing the search. Translation is therefore a pivotal activity for CLIR engines. Over the last 15 years, the CLIR community has developed a wide range of techniques and models supporting free text translation. This article presents an overview of those techniques, with a special emphasis on recent developments.
ACM Transactions on Asian Language Information Processing | 2008
Dong Zhou; Mark Truran; Tim J. Brailsford; Helen Ashman
In this article we describe a hybrid technique for dictionary-based query translation suitable for English-Chinese cross language information retrieval. This technique marries a graph-based model for the resolution of candidate term ambiguity with a pattern-based method for the translation of out-of-vocabulary (OOV) terms. We evaluate the performance of this hybrid technique in an experiment using several NTCIR test collections. Experimental results indicate a substantial increase in retrieval effectiveness over various baseline systems incorporating machine- and dictionary-based translation.
ACM Multimedia | 2005
Mark Truran; James Goulding; Helen Ashman
Lexical ambiguity in query-based image retrieval is an immemorial problem which has seemingly resisted all countermeasures. In this paper we introduce a methodology that expresses the users of a system and their navigational behaviour as the paramount resource for resolving query term ambiguity. Mass user consensus is modelled within a multi-dimensional feature space and evaluated through cluster analysis. This technique resolves query term ambiguity in a wholly democratic and dynamic fashion, in contrast to the brittle centralised models of contemporary word sense classification systems. The simple approach contained herein leads to several interesting emergent properties.
ACM Conference on Hypertext | 2007
Dong Zhou; James Goulding; Mark Truran; Tim J. Brailsford
Manual hypertext construction is labour intensive and prone to error. Robust systems capable of automatic hypertext generation (AHG) could be of direct benefit to those individuals responsible for hypertext authoring. In this paper we propose a novel technique for the autonomous creation of hypertext which is dependent upon language models. This work is strongly influenced by those algorithms which process the hyperlinked structure of a corpus in an attempt to find authoritative sources. The algorithm was evaluated by experimental comparison with human hypertext authors, and we found that both approaches produced broadly similar results.
Expert Systems With Applications | 2013
Dong Zhou; Mark Truran; Jianxun Liu; Sanrong Zhang
Pseudo-relevance feedback (PRF) is a technique commonly used in the field of information retrieval. The performance of PRF is heavily dependent upon parameter values. When relevance judgements are unavailable, these parameters are difficult to set. In the following paper, we introduce a novel approach to PRF inspired by collaborative filtering (CF). We also describe an adaptive tuning method which automatically sets algorithmic parameters. In a multi-stage evaluation using publicly available datasets, our technique consistently outperforms conventional PRF, regardless of the underlying retrieval model.
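For readers unfamiliar with the baseline, the following is a minimal sketch of conventional PRF by query expansion, not the CF-inspired method described in the abstract. The function name `expand_query` and the toy documents are hypothetical. The two arguments `top_k` and `n_terms` are examples of the kind of parameters that are hard to set without relevance judgements.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, top_k=2, n_terms=2):
    """Return the query plus the n_terms most frequent new terms
    found in the top_k ranked documents (which are assumed relevant)."""
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        counts.update(t for t in doc if t not in query_terms)
    # Sort by descending frequency, then alphabetically for determinism.
    best = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n_terms]
    return list(query_terms) + [term for term, _ in best]


# Toy initial ranking for the query "jaguar" (hypothetical data).
ranked = [
    ["jaguar", "cat", "habitat", "rainforest"],
    ["jaguar", "cat", "prey", "habitat"],
    ["jaguar", "car", "engine"],
]
expanded = expand_query(["jaguar"], ranked)
print(expanded)
```

Changing `top_k` from 2 to 3 would let the off-topic third document contribute terms, which illustrates why these values matter.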
Journal of the Association for Information Science and Technology | 2011
Mark Truran; Jan-Felix Schmakeit; Helen Ashman
Previous work has established that search engine queries can be classified according to the intent of the searcher (i.e., why is the user searching, what specifically do they intend to do). In this article, we describe an experiment in which four sets of queries, each set representing a different user intent, are repeatedly submitted to three search engines over a period of 60 days. Using a variety of measurements, we describe the overall stability of the search engine results recorded for each group. Our findings suggest that search engine results for informational queries are significantly more stable than the results obtained using transactional, navigational, or commercial queries.
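The abstract does not name the measurements used. As one plausible illustration, the sketch below scores stability as the mean Jaccard overlap between result lists captured on consecutive days. The choice of Jaccard overlap, the function names, and the sample URLs are all assumptions made for this example.

```python
def jaccard(a, b):
    """Jaccard overlap between two result lists, ignoring rank order."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_daily_stability(snapshots):
    """Average overlap between each day's results and the next day's."""
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots")
    pairs = list(zip(snapshots, snapshots[1:]))
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)


# Three daily snapshots of the top results for one query (hypothetical).
snapshots = [
    ["u1", "u2", "u3", "u4"],
    ["u1", "u2", "u3", "u5"],
    ["u1", "u2", "u3", "u5"],
]
stability = mean_daily_stability(snapshots)
print(round(stability, 3))
```

A score of 1.0 would mean the result set never changed. A rank-sensitive measure would be needed to detect reordering within an unchanged set.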
Web Intelligence | 2009
Helen Ashman; Michael Antunovic; Christoph Donner; Rebecca Frith; Eric Rebelos; Jan-Felix Schmakeit; Gavin Smith; Mark Truran
In this paper we look at how images can be labelled as a result of clickthroughs from searches. One approach acts as a filter on image searches specifically, while the other approach propagates labels to images from their containing pages, where those pages were themselves labelled using clickthrough as a filter on text search. The paper then reports on an experiment in which users ranked six methods for labelling images by relevance, comparing the two clickthrough-based methods with Flickr's amateur explicit labelling, Getty's professional explicit labelling, Google's standard image search, and the new Google Image Labeller. As well as comparing the accuracy of the proposed image labelling methods and discovering that automatic methods outperform explicit human labelling methods, the experiment suggests that clickthrough data is reliable for image classification purposes even with very few clicks.
ACM Computing Surveys | 2007
Mark Truran; James Goulding; Helen Ashman
Autonomous authoring tools are routinely used to expedite the translation of large document collections into functioning hypertexts. They are also used to add hyperlinks to pre-existing hypertext structures. In this survey we describe a taxonomy of autonomous hypertext authoring tools. The classification of any given system is determined by the type and nature of the document analysis it performs.
ACM Conference on Hypertext | 2011
Helen Ashman; Michael Antunovic; Satit Chaprasit; Gavin Smith; Mark Truran
The interaction of vast numbers of search engine users with sets of search results is a potential source of significant quantities of resource classification data. In this paper we discuss work which uses coselection data (i.e. multiple click-through events generated by the same user on a single search engine result page) as an indicator of mutual relevance between web resources and a means for the automatic clustering of sense-singular resources. The results indicate that coselection can be used in this way. We ground-truthed unambiguous query clustering, forming a foundation for work on automatic ambiguity detection based on the resulting number of generated clusters. Using the cluster overlap by population principle, the extension of previous work allowed determination of synonyms or lingual translations where overlapping clusters indicated the mutual relevance in coselection and subsequently the irrelevance of the actual label inherited from the user query.
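To make the coselection idea concrete, here is a minimal sketch that links any two resources clicked by the same user on one result page and then reads clusters off as connected components. Using connected components is an assumption for illustration, not necessarily the clustering procedure used in the paper, and the function name and URLs are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def coselection_clusters(sessions):
    """Group resources that are linked, directly or transitively,
    by being clicked together in the same session."""
    graph = defaultdict(set)
    for clicks in sessions:
        unique = set(clicks)
        for url in unique:
            graph[url]  # make sure single-click resources become nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)

    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first walk to collect one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Click sessions for the ambiguous query "jaguar" (hypothetical data).
sessions = [
    ["cats.example/jaguar", "zoo.example/big-cats"],
    ["zoo.example/big-cats", "wild.example/jaguar"],
    ["cars.example/jaguar", "dealer.example/xk"],
]
clusters = coselection_clusters(sessions)
print(len(clusters), clusters)
```

In this toy data the query yields two clusters, which matches the abstract's suggestion that the number of generated clusters can signal ambiguity.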
ACM Symposium on Applied Computing | 2008
Dong Zhou; Mark Truran; Tim J. Brailsford; Helen Ashman; James Goulding
In the field of cross-language information retrieval (CLIR), the resolution of lexical ambiguity is a key challenge. Common mechanisms for the translation of query terms from one language to another typically produce a set of possible translation candidates, rather than some authoritative result. Correctly reducing a list of possible candidates down to a single translation is an enduring problem. Thus far, solutions have concentrated upon the use of term co-occurrence information to guide the process of resolving translation-based ambiguity. In this paper we introduce a new disambiguation strategy which employs a graph-based analysis of generated co-occurrence data to determine the most appropriate translation for a given term.
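The abstract does not specify the graph analysis, so the sketch below only illustrates the general idea under a simple assumption: translation candidates are nodes, co-occurrence counts are edge weights, and each query term keeps the candidate with the largest total weight to the candidates of the other terms. The function name, the English-French example, and the counts are invented for illustration.

```python
def disambiguate(candidates, cooccurrence):
    """Pick one translation per source term.

    candidates:   {source_term: [translation, ...]}
    cooccurrence: {frozenset({t1, t2}): count} for target-language terms
    """
    chosen = {}
    for term, options in candidates.items():

        def score(option):
            # Total edge weight from this candidate to every candidate
            # of the other query terms (its weighted degree).
            return sum(
                cooccurrence.get(frozenset((option, other)), 0)
                for other_term, others in candidates.items()
                if other_term != term
                for other in others
            )

        chosen[term] = max(options, key=score)
    return chosen


# Query "bank interest" with dictionary candidates (hypothetical counts).
candidates = {
    "bank": ["banque", "rive"],
    "interest": ["intérêt", "curiosité"],
}
cooccurrence = {
    frozenset(("banque", "intérêt")): 40,
    frozenset(("banque", "curiosité")): 2,
    frozenset(("rive", "curiosité")): 1,
}
translation = disambiguate(candidates, cooccurrence)
print(translation)
```

Weighted degree is the simplest possible graph score. A fuller analysis could instead propagate weights across the graph so that a candidate benefits from well-connected neighbours.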