Simon Egerton | Researchain

Publication

Featured research published by Simon Egerton.

data and knowledge engineering | 2010

Information extraction for search engines using fast heuristic techniques

Jer Lang Hong; Eu-Gene Siew; Simon Egerton

We study the structured records of web pages and the problems associated with extracting and aligning these records. Current automatic wrappers are complicated because they locate the relevant data region using visual cues and rely on complicated algorithms to check the similarity of data records. In this paper, we develop a non-visual automatic wrapper which questions the need for complex visual-based wrappers in data extraction. The novel techniques in our wrapper are (1) filtering rules to detect and filter out irrelevant data records, (2) a tree matching algorithm using frequency measures to increase the speed of data extraction, (3) an algorithm to calculate the number and size of the components of data records to detect the correct data region, (4) a data alignment algorithm which is able to align iterative (repetitive HTML command tags) and disjunctive (optional) data items, and (5) a data merging and partitioning method to solve the imperfect segmentation problem (the problem of correctly identifying the atomic entities in data items). Results show that our wrapper is as robust as, and in many cases outperforms, state-of-the-art wrappers such as ViNT and DEPTA. This wrapper could have significant speed advantages when processing large volumes of website data, which could be helpful in meta search engine development.

information sciences, signal processing and their applications | 2010

Mining Wikipedia Knowledge to improve document indexing and classification

Ramesh Kumar Ayyasamy; Bashar Tahayna; Saadat M. Alhashmi; Siew Eu-Gene; Simon Egerton

Weblogs are an important source of information that requires automatic techniques to categorize them into “topic-based” content, to facilitate their future browsing and retrieval. In this paper we propose new tf.idf measures and illustrate their effectiveness. The proposed Conf.idf and Catf.idf measures are based solely on a terms-to-concepts-to-categories (TCONCAT) mapping method that utilizes Wikipedia. As a knowledge base, Wikipedia is a large-scale Web encyclopaedia with a huge number of high-quality articles and categorical indexes. Using this knowledge base, our proposed framework solves the weblog classification problem in two stages. The first stage finds the terms belonging to a unique concept (article) and disambiguates the terms belonging to more than one concept. The second stage determines the categories to which these concepts belong. Experimental results confirm that the proposed system can efficiently distinguish weblogs that belong to more than one category, and that it performs better than traditional statistical Natural Language Processing (NLP) approaches.
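The abstract does not define Conf.idf, so the following is only a minimal sketch under the assumption that it follows the standard tf.idf form with Wikipedia concepts substituted for raw terms. The function name `concept_tf_idf` and the concept labels are hypothetical, and the terms-to-concepts mapping step is assumed to have been done already.

```python
import math
from collections import Counter

def concept_tf_idf(docs):
    """Score concepts per document with a tf.idf-style weight.

    docs: list of documents, each a list of concept labels
    (assumed to be already mapped from terms; labels are hypothetical).
    """
    n_docs = len(docs)
    # Document frequency: number of documents mentioning each concept.
    doc_freq = Counter()
    for concepts in docs:
        doc_freq.update(set(concepts))

    scores = []
    for concepts in docs:
        counts = Counter(concepts)
        total = len(concepts)
        scores.append({
            c: (counts[c] / total) * math.log(n_docs / doc_freq[c])
            for c in counts
        })
    return scores

blogs = [
    ["Jaguar_(animal)", "Cat"],
    ["Jaguar_Cars", "Cat"],
]
weights = concept_tf_idf(blogs)
```

In this toy input the concept "Cat" appears in every document and therefore receives a weight of zero, while the two disambiguated "Jaguar" concepts are the ones that separate the documents.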

2010 International Conference on Information Retrieval & Knowledge Management (CAMP) | 2010

ViWER- data extraction for search engine results pages using visual cue and DOM Tree

Jer Lang Hong; Eu-Gene Siew; Simon Egerton

Visual wrappers use visual information in addition to the DOM Tree properties in the extraction of data records. The important feature of a visually assisted wrapper is the use of the bounding box of an HTML tag to detect the relevant data region, which contains the required data records. However, a closer look indicates that additional visual cues, such as the size of the bounding box, can be used to check the similarity of data records. In this paper, we present two main features of our data extraction algorithm. We develop a tree matching algorithm to check the similarity of data records, which simplifies the complicated process of a full tree matching algorithm. We also use the size of the bounding box to further improve the similarity check of data records. Our study shows that using the size of text and images in a wrapper design can improve the accuracy in detecting the correct data region (the search results output from search engine results pages). Results show that our wrapper is highly effective in data extraction.

acm symposium on applied computing | 2010

WMS-extracting multiple sections data records from search engine results pages

Jer Lang Hong; Eu-Gene Siew; Simon Egerton

In this paper, we develop an automatic wrapper for the extraction of multiple sections data records from search engine results pages. In the Information Extraction world, little attention has been paid to the development of wrappers for the extraction of multiple sections data records. This is evidenced by the fact that there is only one automatic wrapper, MSE, developed for this purpose. Using the separation distance of data records and sections, MSE is able to distinguish sections and data records and extract them from search engine results pages. In this study, our approach is to use DOM tree properties to develop an adaptive search method which is able to detect, differentiate, and partition sections and data records. The labeled multiple sections data records are passed through several filtering stages; each filter is designed to filter out a particular group of irrelevant data until one data region containing the relevant records is found. Our filtering rules are designed based on visual cues such as text and image size obtained from the browser rendering engine. Experimental results show that our wrapper is able to obtain better results than the currently available MSE wrapper.

international conference on computer research and development | 2010

Aligning Data Records Using WordNet

Jer Lang Hong; Eu-Gene Siew; Simon Egerton

Current automatic wrappers that use the DOM tree to align data records generally have limitations such as the inability to align iterative (repetitive and similar) and disjunctive (optional) data items. Our study of the properties of data records shows that these data items can be aligned based on their semantic properties. In this context, we propose an ontological technique that uses an existing lexical database for English (WordNet) for the alignment of data records. Regular expression rules are developed to align the extracted data items so that they can be used for further processing. Experimental results indicate that our technique is robust and performs better than existing state-of-the-art wrappers.

soft computing and pattern recognition | 2009

DTM - Extracting Data Records from Search Engine Results Page Using Tree Matching Algorithm

Jer Lang Hong; Eu-Gene Siew; Simon Egerton

In this paper, we develop a non-visual automatic wrapper for extracting data records from search engine results pages. The novel techniques in our wrapper are (1) filtering rules to detect and filter out irrelevant data records, (2) a tree matching algorithm using frequency measures to increase the speed of data extraction, and (3) an algorithm to calculate the number and size of the components of data records to detect the correct data region. Results show that our wrapper is as robust as, and in many cases outperforms, state-of-the-art wrappers such as ViNT and DEPTA. This wrapper could have significant speed advantages when processing large volumes of website data, which could be helpful in meta search engine development.
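The abstract does not spell out how frequency measures enter the tree matching step. One plausible reading, shown here purely as an illustrative sketch and not as the authors' algorithm, is to compare two candidate records by how often each tag occurs in their subtrees instead of matching node by node. The tree representation `(tag, children)` and the names `tag_counts` and `frequency_similarity` are assumptions made for this example.

```python
from collections import Counter

def tag_counts(node):
    """Count tag occurrences in a subtree given as (tag, [children])."""
    tag, children = node
    counts = Counter([tag])
    for child in children:
        counts += tag_counts(child)
    return counts

def frequency_similarity(a, b):
    """Multiset overlap of tag frequencies, between 0.0 and 1.0."""
    ca, cb = tag_counts(a), tag_counts(b)
    tags = set(ca) | set(cb)
    shared = sum(min(ca[t], cb[t]) for t in tags)
    total = sum(max(ca[t], cb[t]) for t in tags)
    return shared / total

# Two hypothetical search-result records and one unrelated block.
record_a = ("div", [("a", []), ("span", [])])
record_b = ("div", [("a", []), ("span", []), ("span", [])])
sidebar = ("table", [("tr", [("td", [])])])

similar = frequency_similarity(record_a, record_b)
different = frequency_similarity(record_a, sidebar)
```

Counting tags takes one pass over each subtree, which is where a speed advantage over full structural matching would come from, at the cost of ignoring how the tags are nested.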

international conference on neural information processing | 2014

Dynamic Programming for Guided Gene Transfer in Bacterial Memetic Algorithm

Tiong Yew Tang; Simon Egerton; János Botzheim; Naoyuki Kubota

Evolutionary Computation (EC) approaches are known to empirically solve NP-hard optimisation problems. However, the genetic operators in these approaches have yet to be fully investigated and exploited for further improvements. Hence, we propose a novel genetic operator called the Dynamic Programming Gene Transfer (DPGT) operator to improve the existing gene transfer operator in the Bacterial Memetic Algorithm (BMA). DPGT integrates dynamic programming based edit distance comparisons into the gene transfer operator in BMA. The DPGT operator enforces good gene transfers between individuals by conducting edit distance checks before transferring the genes. We tested the DPGT operator on an artificial learning agent (ant) perception-action problem. The experimental results revealed that DPGT improved overall training accuracy without any significant impact on training time.
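The edit distance check that DPGT relies on can be computed with the classic dynamic programming recurrence. The sketch below shows that recurrence only. The abstract does not state what criterion DPGT applies to the distance, so the `should_transfer` gate and its `min_distance` threshold are hypothetical placeholders, as is the encoding of genes as sequences of symbols.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]

def should_transfer(source_genes, target_genes, min_distance=1):
    """Hypothetical gate: skip transfers between near-identical segments.

    The real DPGT acceptance rule is not given in the abstract.
    """
    return edit_distance(source_genes, target_genes) >= min_distance

distance = edit_distance("kitten", "sitting")
allowed = should_transfer("ABCA", "ABCA")
```

Keeping only two rows of the table holds memory at the length of the shorter dimension, which fits the abstract's claim that the check adds no significant training time.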

2014 10th France-Japan/ 8th Europe-Asia Congress on Mecatronics (MECATRONICS2014- Tokyo) | 2014

Stress-inspired dynamic optimisation on working memory for cognitive robot social support systems

Tiong Yew Tang; Simon Egerton; János Botzheim; Naoyuki Kubota

Robot social support systems, such as robot partners' game interactions with elderly people, are very important in ageing societies. Human-robot interaction can decrease the risk of age-related diseases such as dementia and thus improve the overall quality of life for elderly people. However, for the robot partner to have successful game interactions with elderly people, it needs to be equipped with a certain degree of cognitive intelligence to guess the meaning and context of game interactions. In this paper, we discuss a biological stress-inspired model for the robot's cognitive intelligence with dynamic optimisation on its working memory. We name this novel robot cognitive framework Advanced Intelligence Cognitive Optimisation (AICO). AICO is a server framework for computationally intensive cognitive processing for the smart phone robot known as iPhonoid. We have conducted physical robot experiments with our proposed iPhonoid AICO framework on the Rényi-Ulam guessing game with real human subjects. The experimental results show that the proposed AICO framework successfully increased the robot's guessing performance in the game interactions. At the same time, the robot behaves according to its emotional conditions to make the game play interesting for elderly people.

international conference on robotics and automation | 2011

Monocular viewpoint invariant human activity recognition

Zaw Zaw Htike; Simon Egerton; Kuang Ye Chow

One of the grand goals of robotics is to have assistive robots living side-by-side with humans, autonomously assisting humans in everyday activities. To be able to interact with humans and assist them, robots must be able to understand and interpret human activities. There is a growing interest in the problem of human activity recognition. Despite much progress, most computer vision researchers have narrowed the problem to a fixed camera viewpoint, owing to the inherent difficulty of training their systems across all possible viewpoints. However, since robots and humans are free to move around in the environment, the viewpoint of a robot with respect to a person varies all the time. Therefore, we attempt to relax the fixed viewpoint assumption and present a novel and efficient framework to recognize and classify human activities from a monocular video source from an arbitrary viewpoint. The proposed framework comprises two stages: human pose recognition and human activity recognition. In the pose recognition stage, an ensemble of pose models performs inference on each video frame. Each pose model estimates the probability that the given frame contains the corresponding pose. Over a sequence of frames, each pose model forms a time series. In the activity recognition stage, we use nearest neighbor, with dynamic time warping as a distance measure, to classify pose time series. We have built a small-scale proof-of-concept model and performed some experiments on three publicly available datasets. The satisfactory experimental results demonstrate the efficacy of our framework and encourage us to further develop a full-scale architecture.
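The activity recognition stage described above, nearest neighbor with dynamic time warping (DTW) as the distance, can be sketched as follows. This is a simplified illustration: it assumes a single one-dimensional time series per clip, whereas the framework produces one series per pose model, and the function names and toy data are hypothetical.

```python
def dtw_distance(s, t):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(s), len(t)
    # d[i][j] = cost of the best warping path aligning s[:i] with t[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def classify_1nn(query, labelled):
    """Return the label of the training series closest to query under DTW."""
    nearest = min(labelled, key=lambda item: dtw_distance(query, item[0]))
    return nearest[1]

# Toy pose-probability series with made-up activity labels.
training = [
    ([0.0, 1.0, 2.0, 1.0, 0.0], "wave"),
    ([0.0, 0.0, 0.0, 0.0, 0.0], "stand"),
]
label = classify_1nn([0.0, 1.0, 1.0, 2.0, 1.0, 0.0], training)
```

DTW lets the query match the "wave" template even though it is one frame longer, which is the property that makes it suitable for activities performed at different speeds.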

international conference on computer research and development | 2010

Visual Data Alignment for Search Engine Results Pages

Jer Lang Hong; Eu-Gene Siew; Simon Egerton

Visual wrappers use visual information in addition to the DOM Tree properties in the extraction of data records. However, a closer look indicates that visual information can also be used to align data records into tabular form. In this paper, we propose a data alignment algorithm that aligns data records using DOM Tree properties and the visual cues of data records. Our data alignment algorithm uses a regular expression rule and incorporates visual cues, such as the relative position and size of data items, to provide options for the alignment of iterative and disjunctive data items. Results show that our wrapper performs better than existing state-of-the-art wrappers.
