Eu-Gene Siew | Researchain

Publications

Featured research published by Eu-Gene Siew.

Data and Knowledge Engineering | 2010

Information extraction for search engines using fast heuristic techniques

Jer Lang Hong; Eu-Gene Siew; Simon Egerton

We study the structured records of web pages and the problems associated with extracting and aligning these records. Current automatic wrappers are complicated because they rely on visual cues to locate the relevant data region and on complex algorithms to check the similarity of data records. In this paper, we develop a non-visual automatic wrapper that questions the need for complex visual-based wrappers in data extraction. The novel techniques of our wrapper are (1) filtering rules to detect and filter out irrelevant data records, (2) a tree matching algorithm that uses frequency measures to speed up data extraction, (3) an algorithm that calculates the number and size of the components of data records to detect the correct data region, (4) a data alignment algorithm that can align iterative (repetitive HTML command tags) and disjunctive (optional) data items, and (5) a data merging and partitioning method that solves the imperfect segmentation problem (the problem of correctly identifying the atomic entities in data items). Results show that our wrapper is as robust as, and in many cases outperforms, state-of-the-art wrappers such as ViNT and DEPTA. The wrapper could offer significant speed advantages when processing large volumes of website data, which could benefit metasearch engine development.
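The frequency-measure idea in item (2) can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each candidate record is a subtree represented by a small `Node` class, and it compares two records by the tag occurrences they share (a multiset Dice coefficient) instead of aligning the trees node by node. The names `Node`, `tag_frequencies` and `frequency_similarity` are illustrative only; the paper's actual frequency measure may differ.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element: a tag name plus child nodes."""
    tag: str
    children: list["Node"] = field(default_factory=list)


def tag_frequencies(node: Node) -> Counter:
    """Count how often each tag occurs in the subtree rooted at `node`."""
    counts = Counter([node.tag])
    for child in node.children:
        counts.update(tag_frequencies(child))
    return counts


def frequency_similarity(a: Node, b: Node) -> float:
    """Multiset Dice coefficient of the two tag-frequency profiles, in [0, 1].

    Runs in time linear in the subtree sizes, unlike full tree matching.
    """
    fa, fb = tag_frequencies(a), tag_frequencies(b)
    shared = sum((fa & fb).values())  # tag occurrences common to both records
    total = sum(fa.values()) + sum(fb.values())
    return 2 * shared / total if total else 1.0


if __name__ == "__main__":
    record1 = Node("div", [Node("a"), Node("span"), Node("p")])
    record2 = Node("div", [Node("a"), Node("span"), Node("p"), Node("img")])
    banner = Node("table", [Node("tr", [Node("td")])])
    print(round(frequency_similarity(record1, record2), 3))  # → 0.889
    print(frequency_similarity(record1, banner))             # → 0.0
```

Counting tags ignores tag order and nesting, which is what makes it cheap compared with full tree matching; the similarity threshold at which two records are grouped together would still have to be tuned.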

Journal of Accounting and Auditing: Research & Practice | 2012

Factors Influencing Audit Technology Acceptance by Audit Firms: A New I-TOE Adoption Framework

Khairina Rosli; Paul H.P. Yeow; Eu-Gene Siew

Many businesses are now moving to e-business and implementing computerized accounting information systems. This shift has affected how the audit profession performs IT audits, audits financial reports and traces electronic source documents. Computer-Assisted-Auditing Techniques and Tools (CAATTs) are audit technologies that allow IT audit work to be performed efficiently and effectively and reduce audit time. However, little is known about CAATTs adoption by public audit firms. This paper presents a new Individual-Technology-Organization-Environment (I-TOE) paradigm to investigate the acceptance of CAATTs in audit firms. Prior literature studied CAATTs acceptance only from the individual auditor's view and did not consider issues from both organizational and individual perspectives. This paper therefore extends the literature by providing a better understanding of the relationship between organizational and individual factors in predicting CAATTs adoption and investment. A combination of the Unified Theory of Acceptance and Use of Technology 2 and the Technology-Organization-Environment framework is used as the underlying theory. In addition, this paper complements the framework with new variables: technology risk, technology-task fit, organization readiness and top management commitment. The I-TOE framework can help professional audit firms that need to measure CAATTs acceptance for the advancement of the audit profession. Future experimental studies could provide evidence to empirically validate the I-TOE framework in other domains.

2010 International Conference on Information Retrieval & Knowledge Management (CAMP) | 2010

ViWER- data extraction for search engine results pages using visual cue and DOM Tree

Jer Lang Hong; Eu-Gene Siew; Simon Egerton

Visual wrappers use visual information in addition to DOM tree properties to extract data records. The key feature of a visually assisted wrapper is its use of the bounding boxes of HTML tags to detect the relevant data region, which contains the required data records. However, a closer look indicates that additional visual cues, such as the size of the bounding box, can be used to check the similarity of data records. In this paper, we present the two main features of our data extraction algorithm. First, we develop a tree matching algorithm to check the similarity of data records, which simplifies the complicated process of full tree matching. Second, we use the size of the bounding box to further improve the similarity check. Our study shows that using the size of text and images in a wrapper design can improve accuracy in detecting the correct data region (the search results output on search engine results pages). Results show that our wrapper is highly effective in data extraction.
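The bounding-box size check can be sketched as follows, assuming the rendered width and height of each candidate record are already available from the browser. The `Box` class, the `size_similarity` and `similar_records` helpers and the 0.8 threshold are hypothetical choices for illustration, not the paper's: two records are treated as similar when the smaller area is at least a fixed fraction of the larger one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Rendered bounding box of a candidate record in CSS pixels (hypothetical input)."""
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height


def size_similarity(a: Box, b: Box) -> float:
    """Smaller area divided by larger area: 1.0 for equal sizes, near 0 for very different ones."""
    larger = max(a.area, b.area)
    return min(a.area, b.area) / larger if larger > 0 else 1.0


def similar_records(boxes: list[Box], threshold: float = 0.8) -> bool:
    """True if every box is at least `threshold` similar in size to the first box."""
    if not boxes:
        return False
    reference = boxes[0]
    return all(size_similarity(reference, b) >= threshold for b in boxes[1:])


if __name__ == "__main__":
    results = [Box(600, 90), Box(600, 100), Box(600, 95)]
    ad_block = Box(160, 600)
    print(similar_records(results))               # → True
    print(similar_records(results + [ad_block]))  # → False
```

In a wrapper, a test like this would complement the structural similarity check rather than replace it, for example by rejecting an advertisement block whose tag structure happens to resemble the result records.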

Knowledge Engineering Review | 2015

Performance and trends in recent opinion retrieval techniques

Sylvester Olubolu Orimaye; Saadat M. Alhashmi; Eu-Gene Siew

This paper presents the trends and performance of opinion retrieval techniques proposed within the last eight years. We identify the major techniques in opinion retrieval and group them into four popular categories. We describe the state-of-the-art techniques for each category and emphasize their performance and limitations. We then summarize the techniques' performance on different datasets in a comparison table. Finally, we highlight possible future research directions that can help solve existing challenges in opinion retrieval.

Pacific Rim International Conference on Artificial Intelligence | 2012

Buy it - don't buy it: sentiment classification on Amazon reviews using sentence polarity shift

Sylvester Olubolu Orimaye; Saadat M. Alhashmi; Eu-Gene Siew

In recent years, sentiment classification has been an appealing task for many reasons. However, the subtle manner in which people write reviews has made high accuracy harder to achieve. In this paper, we investigate improvements over sentiment classification baselines using sentiment polarity shift in reviews. We focus on Amazon online reviews for different types of products. First, we apply our newly proposed Sentence Polarity Shift (SPS) algorithm to review documents, reducing the relative classification loss caused by inconsistent sentiment polarities within reviews by an average of 16% over a supervised sentiment classifier. Second, we build on a popular supervised sentiment classification baseline by adding features that improve on the original baseline. The improvement shown by this technique suggests modeling sentiment classification systems on polarity shift combined with sentence- and document-level features.
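The notion of a within-review polarity shift can be made concrete with a toy Python sketch. This is not the authors' SPS algorithm, whose details are not given here: it scores sentences with a tiny hypothetical lexicon (`LEXICON`), takes the sign of the summed sentence scores as the review's dominant polarity, and reports the sentences whose polarity runs against it, which are the inconsistent polarities a document-level classifier would otherwise have to absorb.

```python
import re

# Toy lexicon for illustration only; a real system would use a full sentiment lexicon.
LEXICON = {"great": 1, "love": 1, "excellent": 1, "good": 1,
           "bad": -1, "poor": -1, "broke": -1, "terrible": -1}


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def sentence_polarity(sentence: str) -> int:
    """+1, -1 or 0: the sign of the summed lexicon scores of the sentence's words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sign(sum(LEXICON.get(w, 0) for w in words))


def polarity_shifts(review: str) -> tuple[int, list[str]]:
    """Return the review's dominant polarity and the sentences that shift against it."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", review) if s.strip()]
    polarities = [sentence_polarity(s) for s in sentences]
    dominant = sign(sum(polarities))
    # Neutral sentences (0) never count as shifts; with no dominant polarity, none do.
    shifted = [s for s, p in zip(sentences, polarities) if p != 0 and p == -dominant]
    return dominant, shifted


if __name__ == "__main__":
    review = "I love this camera. The battery is poor. Overall an excellent buy!"
    print(polarity_shifts(review))  # → (1, ['The battery is poor'])
```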

ACM Symposium on Applied Computing | 2010

WMS-extracting multiple sections data records from search engine results pages

Jer Lang Hong; Eu-Gene Siew; Simon Egerton

In this paper, we develop an automatic wrapper for extracting multiple sections data records from search engine results pages. In information extraction, relatively little attention has been paid to wrappers that extract data records organized into multiple sections. This is evidenced by the fact that only one automatic wrapper, MSE, has been developed for this purpose. MSE uses the separation distance between data records and sections to distinguish them and extract them from search engine results pages. In this study, we use DOM tree properties to develop an adaptive search method that detects, differentiates, and partitions sections and data records. The labeled sections and data records then pass through several filtering stages, each designed to remove a particular group of irrelevant data, until a single data region containing the relevant records is found. Our filtering rules are based on visual cues, such as text and image size, obtained from the browser rendering engine. Experimental results show that our wrapper obtains better results than the currently available MSE wrapper.
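The staged filtering can be sketched as a loop that applies keep-predicates in order and stops once a single region remains. The `Region` fields (record count, dominant font size, image area) and the three filters in `FILTERS` are hypothetical stand-ins for the paper's visual-cue rules; the guard that skips a filter which would discard every candidate is a design choice of this sketch, not something the abstract specifies.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Region:
    """Candidate data region with hypothetical rendering-derived features."""
    name: str
    records: int       # number of data records detected in the region
    font_px: float     # dominant rendered text size, in pixels
    image_area: float  # total rendered image area, in square pixels


Filter = Callable[[Region], bool]  # returns True to keep a region


def select_region(regions: list[Region], filters: list[Filter]) -> list[Region]:
    """Apply the filters in order, stopping as soon as one region remains."""
    remaining = list(regions)
    for keep in filters:
        if len(remaining) <= 1:
            break
        survivors = [r for r in remaining if keep(r)]
        if survivors:  # skip a filter that would discard every candidate
            remaining = survivors
    return remaining


# Hypothetical filters, each removing one group of irrelevant regions.
FILTERS: list[Filter] = [
    lambda r: r.records >= 2,         # drop regions that hold a single item
    lambda r: r.font_px >= 12,        # drop small-print regions such as footers
    lambda r: r.image_area < 50_000,  # drop image-heavy regions such as banners
]

if __name__ == "__main__":
    page = [
        Region("results", records=10, font_px=14, image_area=0),
        Region("sponsored", records=3, font_px=11, image_area=20_000),
        Region("navigation", records=1, font_px=12, image_area=0),
        Region("banner", records=2, font_px=14, image_area=90_000),
    ]
    print([r.name for r in select_region(page, FILTERS)])  # → ['results']
```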

World Wide Web | 2013

Can predicate-argument structures be used for contextual opinion retrieval from blogs?

Sylvester Olubolu Orimaye; Saadat M. Alhashmi; Eu-Gene Siew

We present the results of our investigation into the use of predicate-argument structures for contextual opinion retrieval. Using predicate-argument structures for opinion retrieval is a novel approach that exploits the grammatical derivation of sentences to show contextual and subjective relevance. We do not use the frequency of certain keywords, as is usually done in keyword-based opinion retrieval approaches. Rather, our novel solution is based on the frequency of contextually relevant and subjective sentences. We use a linear relevance model that leverages semantic similarities among the predicate-argument structures of sentences, and this paper presents its evaluation results. The model linearly combines a popular relevance model, our proposed transformed terms similarity model, and the absolute value of a sentence subjectivity scoring scheme. The predicate-argument structures are derived from the grammatical derivations of natural language query topics and of well-formed sentences from blog documents. The derived predicate-argument structures are then semantically compared to compute an opinion relevance score. Our scoring technique uses the highest frequency of semantically related predicate-argument structures, enriched with the total subjectivity score from sentences. Evaluation and experimental results show that predicate-argument structures can indeed be used for contextual opinion retrieval, improving the performance of the opinion retrieval task by 15% over the popular TREC baselines.
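The linear combination described above can be written as a small scoring function. The weights (0.5, 0.3, 0.2) are placeholders rather than values from the paper, and the three per-document inputs in the hypothetical `Candidate` class are assumed to be precomputed on comparable scales. Taking the absolute value of the subjectivity score means that strongly negative and strongly positive opinions contribute equally, since both are subjective.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Per-document scores assumed to be precomputed (hypothetical inputs)."""
    doc_id: str
    relevance: float        # score from a standard relevance model
    term_similarity: float  # score from the transformed terms similarity model
    subjectivity: float     # signed sentence subjectivity score


def opinion_relevance(c: Candidate,
                      weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """w1 * relevance + w2 * term_similarity + w3 * |subjectivity|."""
    w1, w2, w3 = weights
    return w1 * c.relevance + w2 * c.term_similarity + w3 * abs(c.subjectivity)


def rank(candidates: list[Candidate]) -> list[str]:
    """Document ids ordered from highest to lowest combined score."""
    return [c.doc_id for c in sorted(candidates, key=opinion_relevance, reverse=True)]


if __name__ == "__main__":
    docs = [
        Candidate("blog-1", relevance=0.2, term_similarity=0.1, subjectivity=0.0),
        Candidate("blog-2", relevance=0.6, term_similarity=0.4, subjectivity=-0.5),
    ]
    print(rank(docs))  # → ['blog-2', 'blog-1']
```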

International Conference on Computer Research and Development | 2010

Aligning Data Records Using WordNet

Jer Lang Hong; Eu-Gene Siew; Simon Egerton

Current automatic wrappers that use the DOM tree to align data records generally have limitations, such as the inability to align iterative (repetitive and similar) and disjunctive (optional) data items. Our study of the properties of data records shows that these data items can be aligned based on their semantic properties. In this context, we propose an ontological technique that uses an existing lexical database for English (WordNet) to align data records. We also develop regular expression rules to align the extracted data items so that they can be used for further processing. Experimental results indicate that our technique is robust and performs better than existing state-of-the-art wrappers.
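The regular-expression stage can be illustrated on its own. The sketch below leaves out the WordNet lookup entirely and uses hypothetical rules in `RULES` to assign each extracted item to a named column, so that records line up even when an optional (disjunctive) item is missing or a repeated (iterative) item occurs several times. The column names, patterns and helper names are assumptions for illustration, not the paper's rules.

```python
import re

# Hypothetical column rules: the first pattern that matches decides an item's column.
RULES = [
    ("url", re.compile(r"^https?://\S+$")),
    ("price", re.compile(r"^[$€£]\s?\d+(?:[.,]\d{2})?$")),
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
]


def column_of(item: str) -> str:
    """Name of the first rule that matches the item, or 'text' if none does."""
    for name, pattern in RULES:
        if pattern.match(item):
            return name
    return "text"


def align(records: list[list[str]]) -> list[dict[str, list[str]]]:
    """Group each record's items under column names.

    An optional item that is absent simply leaves its column out of that row,
    and a repeated item is collected into a list under one column.
    """
    rows = []
    for items in records:
        row: dict[str, list[str]] = {}
        for item in items:
            item = item.strip()
            row.setdefault(column_of(item), []).append(item)
        rows.append(row)
    return rows


if __name__ == "__main__":
    extracted = [
        ["Camera X", "$199.99", "https://shop.example/x"],
        ["Camera Y", "2010-05-01", "https://shop.example/y", "https://mirror.example/y"],
    ]
    for row in align(extracted):
        print(row)
```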

Soft Computing and Pattern Recognition | 2009

DTM - Extracting Data Records from Search Engine Results Page Using Tree Matching Algorithm

Jer Lang Hong; Eu-Gene Siew; Simon Egerton

In this paper, we develop a non-visual automatic wrapper for extracting data records from search engine results pages. The novel techniques of our wrapper are (1) filtering rules to detect and filter out irrelevant data records, (2) a tree matching algorithm that uses frequency measures to speed up data extraction, and (3) an algorithm that calculates the number and size of the components of data records to detect the correct data region. Results show that our wrapper is as robust as, and in many cases outperforms, state-of-the-art wrappers such as ViNT and DEPTA. The wrapper could offer significant speed advantages when processing large volumes of website data, which could benefit metasearch engine development.

Journal of Evaluation in Clinical Practice | 2009

Identifying patterns in primary care consultations: a cluster analysis

Joachim P. Sturmberg; Eu-Gene Siew; Leonid Churilov; Kate Smith-Miles

BACKGROUND: A literature review revealed that little is known about the systems context of general practice consultations and their outcomes.

OBJECTIVES: To describe the systems context and resulting underlying patterns of primary care consultations in a local area.

DESIGN: Cross-sectional multi-practice study based on a three-part questionnaire, with cluster analysis of the data.

SETTING: Stratified random sample of general practices and general practitioners, NSW Central Coast, Australia.

PARTICIPANTS: A total of 1104 adults attending 12 general practitioners between February and November 1999.

RESULTS AND CONCLUSIONS: The study identified seven subgroups within the study population, each uniquely defined by variables from the health system, individual doctor and patient, consultation, and consultation outcomes domains. A systems approach provides a framework in which to track and consider the important variables and their known and/or expected workings, and thus offers a contextual framework to guide primary care reform.
