Publication


Featured research published by Fatih Gelgi.


international world wide web conferences | 2007

Information Extraction from Web Pages Using Presentation Regularities and Domain Knowledge

Srinivas Vadrevu; Fatih Gelgi; Hasan Davulcu

The World Wide Web is transforming itself into the largest information resource, making the process of information extraction (IE) from the Web an important and challenging problem. In this paper, we present an automated IE system that is domain independent and that can automatically transform a given Web page into a semi-structured hierarchical document using presentation regularities. The resulting documents are weakly annotated in the sense that they might contain many incorrect annotations and missing labels. We also describe how to improve the quality of weakly annotated data by using domain knowledge in terms of a statistical domain model. We demonstrate that such a system can recover from ambiguities in the presentation and boost the overall accuracy of a base information extractor by up to 20%. Our experimental evaluations with TAP data, computer science department Web sites, and RoadRunner document sets indicate that our algorithms can scale up to very large data sets.


International Journal of Web Services Research | 2007

Automated Situation-Aware Service Composition in Service-Oriented Computing

Stephen S. Yau; Hasan Davulcu; Supratik Mukhopadhyay; Dazhi Huang; Haishan Gong; Prabhdeep Singh; Fatih Gelgi

Service-based systems have many applications, such as e-business, health care, and homeland security. In these systems, it is necessary to provide users with the capability to compose services into workflows that provide higher-level functionality. In dynamic service-oriented computing environments, it is desirable that service composition be automated and situation-aware in order to generate robust and adaptive workflows. In this paper, an automated situation-aware service composition approach is presented. This approach is based on the a-logic, a-calculus, and a declarative model for situation awareness (SAW). It consists of four major components: (1) analyzing SAW requirements using our SAW model, (2) translating our SAW model representation to a-logic specifications and specifying a control flow graph in a-logic as the service composition goal, (3) automated synthesis of a-calculus terms defining situation-aware workflow agents based on a-logic specifications for SAW requirements and the control flow graph, and (4) compilation of a-calculus terms to executable components.


web intelligence | 2005

Automated Metadata and Instance Extraction from News Web Sites

Srinivas Vadrevu; Saravanakumar Nagarajan; Fatih Gelgi; Hasan Davulcu

Over the past few years, the World Wide Web has established itself as a vital resource for news. With the continuous growth in the number of available news Web sites and the diversity in their presentation of content, there is an increasing need to organize the news-related information on the Web and keep track of it. In this paper, we present automated techniques for extracting metadata and instance information by organizing and mining a set of news Web sites. We develop algorithms that detect and utilize HTML regularities in the Web documents to turn them into hierarchical semantic structures encoded as XML. The tree-mining algorithms that we present identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances annotated with their labels whenever they are available. We report an experimental evaluation for the news domain to demonstrate the efficacy of our algorithms.


web information systems engineering | 2005

Semantic partitioning of web pages

Srinivas Vadrevu; Fatih Gelgi; Hasan Davulcu

In this paper, we describe the semantic partitioner algorithm, which uses the structural and presentation regularities of Web pages to automatically transform them into hierarchical content structures. These content structures enable us to automatically annotate labels in the Web pages with their semantic roles, thus yielding meta-data and instance information for the Web pages. Experimental results with the TAP knowledge base and computer science department Web sites, comprising 16,861 Web pages, indicate that our algorithm is able to gather meta-data accurately from various types of Web pages. The algorithm achieves this performance without any domain-specific engineering requirement.


web intelligence | 2007

Baum-Welch Style EM Approach on Simple Bayesian Models for Web Data Annotation

Fatih Gelgi; Hasan Davulcu

In this paper, we focus on weakly annotated data (WAD), which is typically generated by a (semi) automated information extraction system from Web documents. The extracted information has a certain level of accuracy, which can be surpassed by using statistical models that are capable of contextual reasoning, such as Bayesian models. Our contribution is an EM algorithm that operates on simple Bayesian models to re-annotate WAD. EM estimates the parameters, i.e., the prior and conditional probabilities, by iterating the Bayesian model on the given Web data. In the expectation step, a Bayesian classifier is trained from the current annotations. In the maximization step, the roles of all the labels are re-annotated to find the best-fitting annotation under the current model, and the probabilities are then re-adjusted from the new annotations. Our experiments show that EM increases Web data annotation accuracy by up to 8%. We use the Baum-Welch methodology in our EM approach.
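
The alternation between fitting a Bayesian classifier and re-annotating roles can be illustrated with a minimal sketch. This is not the paper's Baum-Welch style method: it is a simplified hard-assignment loop over a naive Bayes model, and the feature names ("bold", "left", ...), the role names, the Laplace smoothing constant, and the stopping rule are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train(items, roles, alpha=1.0):
    """Estimate smoothed log priors P(role) and log conditionals P(feature | role)."""
    vocab = {f for feats, _ in items for f in feats}
    role_counts = Counter(role for _, role in items)
    feat_counts = defaultdict(Counter)
    for feats, role in items:
        feat_counts[role].update(feats)
    log_prior, log_cond = {}, {}
    for r in roles:
        log_prior[r] = math.log((role_counts[r] + alpha) / (len(items) + alpha * len(roles)))
        total = sum(feat_counts[r].values()) + alpha * len(vocab)
        log_cond[r] = {f: math.log((feat_counts[r][f] + alpha) / total) for f in vocab}
    return log_prior, log_cond

def best_role(feats, roles, log_prior, log_cond):
    """Pick the role with the highest naive Bayes score for one item."""
    return max(roles, key=lambda r: log_prior[r] + sum(log_cond[r][f] for f in feats))

def reannotate(items, roles, max_iters=20):
    """Alternate parameter estimation and re-annotation until annotations stop changing."""
    items = list(items)
    for _ in range(max_iters):
        log_prior, log_cond = train(items, roles)
        updated = [(feats, best_role(feats, roles, log_prior, log_cond)) for feats, _ in items]
        if updated == items:
            break
        items = updated
    return items

# Hypothetical weakly annotated data: (presentation features, assigned role).
roles = ["attribute", "value"]
weak = [
    (("bold", "left"), "attribute"),
    (("bold", "left"), "attribute"),
    (("bold", "left"), "attribute"),
    (("plain", "right"), "value"),
    (("plain", "right"), "value"),
    (("plain", "right"), "value"),
    (("plain", "right"), "attribute"),  # deliberately wrong annotation
]
fixed = reannotate(weak, roles)
```

On this toy input the last item is re-annotated as "value" after the first pass, and the second pass leaves the annotations unchanged, so the loop stops.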


international symposium on computer and information sciences | 2006

Heuristics for minimum Brauer chain problem

Fatih Gelgi; Melih Onus

The exponentiation problem is computing x^n for positive integer exponents n, where quality is measured by the number of multiplications required. However, finding the minimum number of multiplications is an NP-complete problem. This problem is very important for many applications such as RSA encryption and ElGamal decryption. Solving the minimum Brauer chain problem is one way to solve the exponentiation problem. In this paper, five heuristics for approximating the minimum-length Brauer chain for a given number n are discussed. These heuristics are based on greedy approaches and dynamic programming. As a result, we empirically obtain a 1.1-approximation for the problem.
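
To make the link between Brauer chains and multiplication counts concrete, here is a small sketch. It does not implement any of the paper's five heuristics; it uses the standard binary (square-and-multiply) method as a baseline, which always produces a Brauer chain, meaning every element is the previous element plus some earlier element. The function names are hypothetical.

```python
def binary_brauer_chain(n):
    """Build an addition chain for n with the binary method (a baseline, not an optimum)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    chain = [1]
    for bit in bin(n)[3:]:               # bits of n after the leading 1
        chain.append(chain[-1] * 2)      # doubling: previous + previous
        if bit == "1":
            chain.append(chain[-1] + 1)  # previous + first element
    return chain

def is_brauer_chain(chain):
    """Check that each element equals the previous element plus an earlier element."""
    if not chain or chain[0] != 1:
        return False
    return all(chain[i] - chain[i - 1] in chain[:i] for i in range(1, len(chain)))

def power_with_chain(x, chain):
    """Compute x**chain[-1] using one multiplication per chain step."""
    powers = {1: x}
    mults = 0
    for i in range(1, len(chain)):
        prev = chain[i - 1]
        powers[chain[i]] = powers[prev] * powers[chain[i] - prev]
        mults += 1
    return powers[chain[-1]], mults

chain15 = binary_brauer_chain(15)        # [1, 2, 3, 6, 7, 14, 15]
value, mults = power_with_chain(3, chain15)  # 3**15 with 6 multiplications
```

For n = 15 the binary method needs six multiplications, while the Brauer chain 1, 2, 3, 6, 12, 15 needs only five, which is the kind of gap that chain-length heuristics aim to close.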


ieee international conference on services computing | 2008

A Risk Reduction Framework for Dynamic Workflows

Prabhdeep Singh; Fatih Gelgi; Hasan Davulcu; Stephen S. Yau; Supratik Mukhopadhyay

Workflows tend to fail in real-world scenarios due to uncertain or unreliable sensory information, which sometimes needs to be updated during the execution of workflows. In a logic-based framework, these dynamic predicates that can be updated are called non-monotonic predicates (NMPs). In this paper, we focus on reducing the risk of a given workflow due to the NMPs in that workflow. The main idea is to synthesize a backup workflow by augmenting the main workflow without introducing new NMPs. The backup workflow is generated by using expected values of NMPs, if necessary, instead of the given values. The expected values are calculated from the execution history or provided by a domain expert. It is argued that the total risk reduces to the square root of the main workflow itself.


international conference on web engineering | 2007

Fixing weakly annotated web data using relational models

Fatih Gelgi; Srinivas Vadrevu; Hasan Davulcu

In this paper, we present a fast and scalable Bayesian model for improving weakly annotated data, which is typically generated by a (semi) automated information extraction (IE) system from Web documents. Weakly annotated data suffers from two major problems: it (i) might contain incorrect ontological role assignments, and (ii) might have many missing attributes. Our experimental evaluations with the TAP and RoadRunner data sets, and a collection of 20,000 home pages from university, shopping and sports Web sites, indicate that the model described here can improve the accuracy of role assignments from 40% to 85% for template-driven sites, and from 68% to 87% for non-template-driven sites. The Bayesian model is also shown to be useful for improving the performance of IE systems by informing them with additional domain information.


Online Information Review | 2006

Gathering meta‐data and instances from object referral lists on the web

Srinivas Vadrevu; Fatih Gelgi; Saravanakumar Nagarajan; Hasan Davulcu

Purpose – The purpose of this research is to automatically separate and extract meta‐data and instance information from various link pages on the web, by utilizing presentation and linkage regularities on the web. Design/methodology/approach – Research objectives have been achieved through an information extraction system called semantic partitioner that automatically organizes the content in each web page into a hierarchical structure, and an algorithm that interprets and translates these hierarchical structures into logical statements by distinguishing and representing the meta‐data and their individual data instances. Findings – Experimental results for the university domain with 12 computer science department web sites, comprising 361 individual faculty and course home pages, indicate that the performance of the meta‐data and instance extraction averages 85 and 88 percent F‐measure, respectively. Our METEOR system achieves this performance without any domain-specific engineering requirement. Originality/valu...


atlantic web intelligence conference | 2007

Relational Model Based Annotation of the Web Data

Fatih Gelgi; Srinivas Vadrevu; Hasan Davulcu

In this paper, we present a fast and scalable Bayesian model for improving weakly annotated data, which is typically generated by a (semi) automated information extraction (IE) system from Web documents. Weakly annotated data suffers from incorrect ontological role assignments. Our experimental evaluations with the TAP data set and a collection of 20,000 home pages from university, shopping and sports Web sites indicate that the model described here can improve the accuracy of role assignments from 40% to 85% for template-driven sites, and from 68% to 87% for non-template-driven sites.

Collaboration


Dive into Fatih Gelgi's collaboration.

Top Co-Authors

Hasan Davulcu

Arizona State University

Stephen S. Yau

Arizona State University

Dazhi Huang

Arizona State University

Haishan Gong

Arizona State University

Melih Onus

Arizona State University
