Bogdan Vrusias | Researchain

Publication

Featured research published by Bogdan Vrusias.

european conference on information retrieval | 2003

Corpus-based thesaurus construction for image retrieval in specialist domains

Khurshid Ahmad; Mariam Tariq; Bogdan Vrusias; Chris Handy

This paper explores the use of texts that are related to an image collection, also known as collateral texts, for building thesauri in specialist domains to aid in image retrieval. Corpus linguistic and information extraction methods are used for identifying key terms and semantic relationships in specialist texts that may be used for query expansion purposes. The specialist domain context imposes certain constraints on the language used in the texts, which makes the texts computationally more tractable.

Neural Computing and Applications | 2001

Choosing feature sets for training and testing self-organising maps: A case study

Khurshid Ahmad; Bogdan Vrusias; Anthony W. Ledford

Statistical pattern recognition techniques, supervised and unsupervised classification techniques being two good examples here, rely on the computation of similarity and distance metrics. The distances are computed in a multi-dimensional space. The axes of this space in principle relate to the features inherent in the input data. Usually, such features are chosen by neural network developers, thereby introducing a possible bias. A method of automatically generating feature sets is discussed, with specific reference to the categorisation of streams of free-text news items. The feature sets were generated by a procedure that automatically selects a group of keywords based on a lexico-semantic analysis. Three different types of text streams – headlines only, news summaries and full news items including the body of the text – have been categorised using Self-Organising Feature Maps (SOFM). A method for assessing the discrimination ability of a SOFM, based on Fisher’s Linear Discriminant Rule, suggests that maps trained on vectors related to summaries only provide fairly accurate clusters when compared with maps trained on vectors related to full text. The use of summaries as document surrogates for document categorisation is suggested.

international conference on multiple classifier systems | 2003

Combining multiple modes of information using unsupervised neural classifiers

Khurshid Ahmad; Matthew C. Casey; Bogdan Vrusias; Panagiotis Saragiotis

A modular neural network-based system is presented where the component networks learn together to classify a set of complex input patterns. Each pattern comprises two vectors: a primary vector and a collateral vector. Examples of such patterns include annotated images and magnitudes with articulated numerical labels. Our modular system is trained using an unsupervised learning algorithm. One component learns to classify the patterns using the primary vectors and another classifies the same patterns using the collateral vectors. A third, combiner network correlates the primary with the collateral. The primary and collateral vectors are mapped on a Kohonen self-organising feature map (SOM), with the combiner based on a variant of Hebbian networks. The classification results appear encouraging in our attempts to classify a set of scene-of-crime images and in our attempts to investigate how pre-school infants relate magnitude to articulated numerical quantities. Certain features of SOMs, namely the topological neighbourhoods of specific nodes, allow for one-to-many mappings between the primary and collateral maps, hence establishing a broader association between the two vectors when compared with the association due to synchrony in a conventional Hebbian association.

international acm sigir conference on research and development in information retrieval | 2003

Summary evaluation and text categorization

Khurshid Ahmad; Bogdan Vrusias; Paulo C. F. de Oliveira

In general terms, the evaluation of a summary depends on how close it is to the chief points in the source text. This raises the question of what the chief points in the source text are and how this information is itself used in identifying the source text. This is crucially important when we discuss the automatic evaluation of summaries. So the question is what the main points of the source text are. Typically, these would centre on a nucleus of keywords. However, the salience and frequency of these keywords, and the relationship of the text with other texts in the collection, are perhaps also important. Text categorisation using neural networks explicates these points well and also has a practical impact.

intelligent data engineering and automated learning | 2006

Categorization of large text collections: feature selection for training neural networks

Pensiri Manomaisupat; Bogdan Vrusias; Khurshid Ahmad

Automatic text categorization requires the construction of appropriate surrogates for documents within a text collection. The surrogates, often called document vectors, are used to train learning systems for categorising unseen documents. A comparison of different measures (tfidf and weirdness) for creating document vectors is presented together with two different state-of-the-art classifiers: supervised Kohonen’s SOFM and unsupervised Vapnik’s SVM. The methods are tested using two ‘gold standard’ document collections and one data set from a ‘real-world’ news stream. There appears to be an optimal size both for the number of document vectors and for the dimensionality of each vector that gives the best compromise between categorization accuracy and training time. The performance of each of the classifiers was computed for five different surrogate vector models: the first two surrogates were created with tfidf and weirdness measures respectively, the third surrogate was created purely on the basis of high-frequency words in the training corpus, and the fourth vector model was created from a standardised terminology database. Finally, the fifth surrogate (used for evaluation purposes) was based on a random selection of words from the training corpus.
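
The two term-weighting measures named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that weirdness is the ratio of a term's relative frequency in a specialist corpus to its relative frequency in a general corpus, and that tfidf is the common term-frequency times log inverse-document-frequency form. The function names, the add-one smoothing and the count-based document vector are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def weirdness(term, special_counts, general_counts):
    """Relative frequency in the specialist corpus divided by that in a general corpus."""
    n_special = sum(special_counts.values())
    n_general = sum(general_counts.values())
    rel_special = special_counts[term] / n_special
    # Add-one smoothing is an assumption: it avoids division by zero
    # for terms that never occur in the general corpus.
    rel_general = (general_counts[term] + 1) / (n_general + 1)
    return rel_special / rel_general


def tfidf(term, doc_tokens, all_docs):
    """Term frequency in one document times log inverse document frequency."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for d in all_docs if term in d)
    idf = math.log(len(all_docs) / df) if df else 0.0
    return tf * idf


def document_vector(doc_tokens, keywords):
    """Surrogate vector: one count per selected keyword (hypothetical encoding)."""
    counts = Counter(doc_tokens)
    return [counts[k] for k in keywords]
```

Under these assumptions, keywords for the vector dimensions could be chosen by ranking terms on either score and keeping the top few; the length of that keyword list is the vector dimensionality the abstract refers to.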

intelligent networking and collaborative systems | 2010

A Knowledge-Driven Architecture for Efficient Resource Discovery in P2P Networks

Athena Eftychiou; Bogdan Vrusias

As the volume of shared electronic data increases, managing it successfully has become more difficult, and scalable, efficient mechanisms for managing and retrieving data have become essential. In this paper a more effective P2P architecture is presented, aiming to improve existing resource discovery processes. The proposed architecture is organised as a hierarchical super-peer structure, where the super-peers of the network represent the network’s knowledge, which is formalised dynamically using its peers’ resources. The main focus of this paper is the creation of an adaptive hierarchical concept-based P2P topology using collective intelligence methods. In that process, unmanageable data is transformed into a structured, knowledge-based repository of semantic resources. Therefore, the network takes the form of an ontology of conceptually related entities of resource information, as provided by the peers. This knowledge-driven approach has benefits over traditional load-driven architectures, as the user query context is usually the main driver for managing the performance of the network, and in a way the network can be characterised as proactive rather than reactive. A number of experiments have been undertaken and results demonstrate the advantages of the proposed concept-based architecture over other popular architectures.

CISIS | 2009

Adaptable Text Filters and Unsupervised Neural Classifiers for Spam Detection

Bogdan Vrusias; Ian Golledge

Spam detection has become a necessity for successful email communications, security and convenience. This paper describes a learning process where the text of incoming emails is analysed and filtered based on the salient features identified. The method described has promising results and at the same time significantly better performance than other statistical and probabilistic methods. The salient features of emails are selected automatically based on functions combining word frequency and other discriminating matrices, and emails are then encoded into a representative vector model. Several classifiers are then used for identifying spam, and self-organising maps seem to give significantly better results.

international symposium on neural networks | 2007

Distributing SOM Ensemble Training using Grid Middleware

Bogdan Vrusias; Leonidas Vomvoridis; Lee Gillam

In this paper we explore the distribution of training of self-organised maps (SOM) on grid middleware. We propose a two-level architecture and discuss an experimental methodology comprising ensembles of SOMs distributed over a grid with periodic averaging of weights. The purpose of the experiments is to begin to systematically assess the potential for reducing the overall time taken for training by a distributed training regime against the impact on precision. Several issues are considered: (i) the optimum number of ensembles; (ii) the impact of different types of training data; and (iii) the appropriate period of averaging. The proposed architecture has been evaluated in a grid environment, with clock-time performance recorded.
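
The idea of an ensemble of SOMs trained separately with periodic averaging of weights can be sketched as follows. This is a single-process illustration under stated assumptions, not the grid-based architecture from the paper: the 1-D map, the fixed learning rate and radius, the node-wise mean, and every function name here are hypothetical. All maps start from the same initial weights so that node indices correspond when averaged.

```python
import random


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))


def train_step(weights, x, lr=0.5, radius=1):
    """Move the best matching unit and its 1-D neighbours towards x, in place."""
    bmu = best_matching_unit(weights, x)
    for i in range(max(0, bmu - radius), min(len(weights), bmu + radius + 1)):
        weights[i] = [w + lr * (v - w) for w, v in zip(weights[i], x)]


def average_weights(ensemble):
    """Node-wise mean of the weight vectors across all maps in the ensemble."""
    n = len(ensemble)
    return [[sum(vals) / n for vals in zip(*node_group)]
            for node_group in zip(*ensemble)]


def train_ensemble(shards, n_nodes, dim, epochs, period, seed=0):
    """Train one map per data shard, synchronising weights every `period` epochs."""
    rng = random.Random(seed)
    init = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    ensemble = [[row[:] for row in init] for _ in shards]
    for epoch in range(1, epochs + 1):
        for weights, shard in zip(ensemble, shards):
            for x in shard:
                train_step(weights, x)
        if epoch % period == 0:
            mean = average_weights(ensemble)
            ensemble = [[row[:] for row in mean] for _ in shards]
    return ensemble
```

In this sketch `period` plays the role of issue (iii) in the abstract: a shorter period keeps the maps closer together at the cost of more synchronisation, which in a distributed setting would mean more communication between nodes.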

Ninth International Conference on Information Visualisation (IV'05) | 2005

Visualising an image collection

Khurshid Ahmad; Bogdan Vrusias; Meng Zhu

A system for the visualization of large collections of images, facilitated by an automatically constructed visual thesaurus, is reported. A corpus-based method for extraction of terminology and ontology of a specialist domain, scene-of-crime, is outlined. The challenge when capturing information in a crime scene is how to later visualise the scene, when all exhibits have been removed or altered. Experiments on experts dealing with describing a visual domain (the crime scene) suggest that the inter-indexer variability is limited.

cross language evaluation forum | 2003

Scene of Crime information system: Playing at St. Andrews

Bogdan Vrusias; Mariam Tariq; Lee Gillam

This paper discusses the adaptation of the Scene of Crime Information System, developed within an EPSRC-funded project, to the collection of data within the ImageCLEF track of the Cross Language Evaluation Forum 2003. The adaptations necessary to participate in this activity are detailed and initial results are briefly presented.
