Guizhen Yang | Researchain

Publication

Featured research published by Guizhen Yang.

cooperative information systems | 2003

FLORA-2: A rule-based knowledge representation and inference infrastructure for the Semantic Web

Guizhen Yang; Michael Kifer; Chang Zhao

Flora-2 is a rule-based object-oriented knowledge base system designed for a variety of automated tasks on the Semantic Web, ranging from meta-data management to information integration to intelligent agents. The Flora-2 system integrates F-logic, HiLog, and Transaction Logic into a coherent knowledge representation and inference language. The result is a flexible and natural framework that combines rule-based and object-oriented paradigms. This paper discusses the principles underlying the design of the Flora-2 system and describes its salient features, including meta-programming, reification, logical database updates, encapsulation, and support for dynamic modules.

international semantic web conference | 2003

Automatic annotation of content-rich HTML documents: structural and semantic analysis

Saikat Mukherjee; Guizhen Yang; I. V. Ramakrishnan

Although RDF/XML has been widely recognized as the standard vehicle for representing semantic information on the Web, an enormous amount of semantic data is still being encoded in HTML documents that are designed primarily for human consumption and not directly amenable to machine processing. This paper seeks to bridge this semantic gap by addressing the fundamental problem of automatically annotating HTML documents with semantic labels. Exploiting a key observation that semantically related items exhibit consistency in presentation style as well as spatial locality in template-based content-rich HTML documents, we have developed a novel framework for automatically partitioning such documents into semantic structures. Our framework tightly couples structural analysis of documents with semantic analysis incorporating domain ontologies and lexical databases such as WordNet. We present experimental evidence of the effectiveness of our techniques on a large collection of HTML documents from various news portals.
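
The key observation above (semantically related items share a presentation style and sit next to each other) can be illustrated with a toy partitioner. This is a minimal sketch, not the authors' framework: the `Node` record, its style signature (tag, font size, boldness), and the keyword-set "ontology" are hypothetical stand-ins. Real structural analysis would work on the DOM tree rather than a flat list, and would use lexical resources such as WordNet instead of exact word matches.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Node:
    # Hypothetical flattened HTML element: visible text plus the features
    # used here as a "presentation style" signature.
    text: str
    tag: str
    font_size: int
    bold: bool

    def style(self) -> tuple:
        return (self.tag, self.font_size, self.bold)


def partition_by_style(nodes):
    """Split document-ordered nodes into maximal runs of adjacent nodes
    (spatial locality) that share one style signature (consistent presentation)."""
    return [list(run) for _, run in groupby(nodes, key=Node.style)]


def label_blocks(blocks, ontology):
    """Label each block with the first concept whose (hypothetical) term set
    shares a word with the block's text; otherwise 'unknown'."""
    labelled = []
    for block in blocks:
        words = {w.lower() for node in block for w in node.text.split()}
        label = next((c for c, terms in ontology.items() if words & terms), "unknown")
        labelled.append((label, [node.text for node in block]))
    return labelled


page = [
    Node("World News", "h2", 18, True),
    Node("Election results announced", "a", 12, False),
    Node("Markets rally after vote", "a", 12, False),
    Node("Sports", "h2", 18, True),
    Node("Local team wins final", "a", 12, False),
]
ontology = {"politics": {"election", "vote"}, "sports": {"sports", "team", "final"}}
blocks = partition_by_style(page)
for label, texts in label_blocks(blocks, ontology):
    print(label, texts)
```

Grouping only adjacent nodes (rather than all nodes with the same style anywhere on the page) is what encodes spatial locality in this sketch: the two headline links form one block, while the later link under "Sports" forms its own.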

Lecture Notes in Computer Science | 2000

FLORA: Implementing an Efficient DOOD System Using a Tabling Logic Engine

Guizhen Yang; Michael Kifer

This paper reports on the design and implementation of FLORA, a powerful DOOD (deductive object-oriented database) system that incorporates the features of F-logic, HiLog, and Transaction Logic. FLORA is implemented by translation into XSB, a tabling logic engine that is known for its efficiency and is the only known system that extends the power of Prolog with an equivalent of Magic Sets-style optimization, the well-founded semantics for negation, and many other important features. We discuss the features of XSB that help our effort as well as the areas where it falls short of what is needed. We then describe our solutions and optimization techniques that address these problems and make FLORA much more efficient than other known DOOD systems based on F-logic.
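
Tabling matters for recursive rules such as the left-recursive `path` rule in the sketch below, on which plain Prolog's depth-first evaluation can fail to terminate. XSB evaluates such rules top-down with tabling; this sketch instead uses semi-naive bottom-up evaluation (a different technique, chosen only because it fits in a few lines) to show the fixpoint answer set that a tabled engine would return for this Datalog program. The `edges` data is made up for illustration.

```python
def transitive_closure(edges):
    """Semi-naive bottom-up evaluation of the Datalog program
        path(X, Y) :- edge(X, Y).
        path(X, Y) :- path(X, Z), edge(Z, Y).
    Each round joins only the facts derived in the previous round (delta)
    with edge/2, so evaluation stops once no new path facts appear,
    even when the graph has cycles."""
    path = set(edges)
    delta = set(edges)
    while delta:
        new = {(x, y) for (x, z) in delta for (z2, y) in edges if z == z2}
        delta = new - path
        path |= delta
    return path


edges = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")}
reachable = transitive_closure(edges)
print(sorted(y for (x, y) in reachable if x == "a"))  # ['a', 'b', 'c', 'd']
```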

Logics for Emerging Applications of Databases | 2004

Logic-Based Approaches to Workflow Modeling and Verification

Saikat Mukherjee; Hasan Davulcu; Michael Kifer; Pinar Senkul; Guizhen Yang

A workflow is a collection of coordinated activities designed to carry out a well-defined complex process, such as trip planning, student registration, or a business process in a large enterprise. An activity in a workflow might be performed by a human, a device, or a program. Workflow management systems (or WfMS) provide a framework for capturing the interaction among the activities in a workflow and are recognized as a new paradigm for integrating disparate systems, including legacy systems. A large workflow system might involve many disparate activities that are coordinated in complex ways and are subject to many constraints. Thus, modeling such systems and ensuring that they perform according to the specifications is not an easy task. To be able to analyze the properties of workflows, the latter must be specified using a formalism with well-defined semantics. The popular formalisms in this area are the various logics, Petri Nets [1,35], Event-Condition-Action rules [23,15], and State Charts [36]. In this chapter we survey and compare a number of logic-based formalisms that were proposed in the literature.
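
To make "ensuring that they perform according to the specifications" concrete, here is a brute-force sketch that is not one of the surveyed formalisms: a workflow term built from hypothetical `seq` (sequential) and `par` (concurrent, interleaved) operators is expanded into every possible execution order, and a simple ordering constraint is checked against each order. The number of orders grows exponentially, which is one reason logic-based approaches reason about constraints symbolically instead; the trip-planning activity names are made up.

```python
def interleavings(a, b):
    """All merges of tuples a and b that preserve the order within each."""
    if not a:
        yield b
        return
    if not b:
        yield a
        return
    for rest in interleavings(a[1:], b):
        yield (a[0],) + rest
    for rest in interleavings(a, b[1:]):
        yield (b[0],) + rest


def executions(workflow):
    """Set of activity sequences a workflow term can produce.
    A term is an activity name (str) or a tuple ("seq" | "par", sub, sub, ...)."""
    if isinstance(workflow, str):
        return {(workflow,)}
    op, *parts = workflow
    runs = {()}
    for part in parts:
        part_runs = executions(part)
        if op == "seq":
            runs = {r + p for r in runs for p in part_runs}
        elif op == "par":
            runs = {m for r in runs for p in part_runs for m in interleavings(r, p)}
        else:
            raise ValueError(f"unknown operator: {op!r}")
    return runs


def before(first, second):
    """Constraint: if both activities occur, `first` occurs before `second`
    (first occurrences are compared)."""
    def holds(run):
        if first in run and second in run:
            return run.index(first) < run.index(second)
        return True
    return holds


def violations(workflow, constraint):
    """Execution orders that break the constraint (an empty list means verified)."""
    return sorted(run for run in executions(workflow) if not constraint(run))


trip = ("seq", "search", ("par", "book_flight", "book_hotel"), "pay")
print(violations(trip, before("book_flight", "pay")))         # []
print(violations(trip, before("book_flight", "book_hotel")))  # [('search', 'book_hotel', 'book_flight', 'pay')]
```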

cooperative information systems | 2002

Well-Founded Optimism: Inheritance in Frame-Based Knowledge Bases

Guizhen Yang; Michael Kifer

F-logic is a popular formalism for knowledge-intensive applications and, especially, for ontology management in the Semantic Web. However, the original F-logic semantics for inheritance suffers from a number of anomalies when inheritance and deduction closely interact. This work rectifies this problem and develops a natural model-theoretic semantics for inheritance in frame-based knowledge bases, which supports inference by inheritance as well as inference via rules. Inference by inheritance supports a multitude of features, such as overriding and nonmonotonic multiple inheritance, meta-programming, and dynamic inheritance hierarchies, which are fundamental to advanced knowledge management. This semantics has been effectively implemented in the Flora-2 system, which is extensively used in a number of projects. To the best of our knowledge, this work is the only model-theoretic semantics for nonmonotonic multiple inheritance that applies to general, unrestricted frame-based knowledge bases and has several independent characterizations, which testifies to its naturalness and robustness. The problems discussed in this paper are inherent in any logic-based system that supports inheritance and deductive rules, and our techniques apply to such systems. In particular, they apply to DAML+OIL extended with rules and inheritance.
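
Overriding and nonmonotonic multiple inheritance can be illustrated with a small "most specific definer wins" resolver. This is a deliberately simplified sketch and not the paper's well-founded semantics: it ignores deductive rules entirely (the very interaction that causes the anomalies addressed above), makes no class/instance distinction, and resolves conflicts skeptically by inheriting nothing. The `supers` hierarchy and `defs` table are hypothetical examples covering the classic penguin and Nixon-diamond cases.

```python
def ancestors(cls, supers):
    """cls together with all of its direct and indirect superclasses."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(supers.get(c, ()))
    return seen


def inherit(obj, attr, supers, defs):
    """Value of `attr` for `obj`, or None if nothing (or a conflict) is inherited.
    A definer is overridden when a strictly more specific definer lies
    between it and `obj`; conflicting most-specific values cancel out."""
    definers = {c for c in ancestors(obj, supers) if (c, attr) in defs}
    most_specific = {
        d for d in definers
        if not any(o != d and d in ancestors(o, supers) for o in definers)
    }
    values = {defs[(d, attr)] for d in most_specific}
    return values.pop() if len(values) == 1 else None


supers = {"tweety": ["penguin"], "penguin": ["bird"], "nixon": ["quaker", "republican"]}
defs = {
    ("bird", "flies"): True,
    ("penguin", "flies"): False,  # overrides the value defined on bird
    ("quaker", "pacifist"): True,
    ("republican", "pacifist"): False,
}

print(inherit("tweety", "flies", supers, defs))    # False
print(inherit("nixon", "pacifist", supers, defs))  # None
```

The behavior is nonmonotonic in the usual sense: adding the fact `("penguin", "flies")` retracts the conclusion that `tweety` flies, which a monotonic logic could never do.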

symposium on principles of database systems | 2000

Computational aspects of resilient data extraction from semistructured sources (extended abstract)

Hasan Davulcu; Guizhen Yang; Michael Kifer; I. V. Ramakrishnan

Automatic data extraction from semistructured sources such as HTML pages is rapidly growing into a problem of significant importance, spurred by the growing popularity of so-called “shopbots” that enable end users to compare prices of goods and other services at various web sites without having to manually browse and fill out forms at each of these sites. The main problem one has to contend with when designing data extraction techniques is that the contents of a web page change frequently, either because its data is generated dynamically, in response to filling out a form, or because of changes to its presentation format. This makes data extraction particularly challenging, since a desirable requirement of any data extraction technique is that it be “resilient”, i.e., using it we should always be able to locate the object of interest in a page (such as a form or an element in a table generated by a form fill-out) in spite of changes to the page's content and layout. In this paper we propose a formal computation model for developing resilient data extraction techniques from semistructured sources. Specifically, we formalize the problem of data extraction as one of generating unambiguous extraction expressions, which are regular expressions with some additional structure. The problem of resilience is then formalized as one of generating a maximal extraction expression of this kind. We present characterization theorems for maximal extraction expressions, complexity results for testing them, and algorithms for synthesizing them.
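
A toy version of the idea: treat a page as a sequence of tag tokens and an extraction expression as a hypothetical triple (prefix, target, suffix) of regular expressions, and call it unambiguous on a page when exactly one token position splits the page that way. A more general expression survives layout changes that break a brittle one, but it can become ambiguous. This brute-force checker and the token encoding are illustrative assumptions; they do not reproduce the paper's extraction expressions, characterization theorems, or synthesis algorithms.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical extraction expression: (prefix regex, target regex, suffix regex)
Expr = Tuple[str, str, str]


def _encode(tokens: List[str]) -> str:
    # "tok1 tok2 ... " so that a regex can match whole tokens.
    return "".join(tok + " " for tok in tokens)


def match_positions(expr: Expr, tokens: List[str]) -> List[int]:
    """Every index i where the page splits as prefix . target . suffix."""
    prefix, target, suffix = expr
    return [
        i for i, tok in enumerate(tokens)
        if re.fullmatch(target, tok)
        and re.fullmatch(prefix, _encode(tokens[:i]))
        and re.fullmatch(suffix, _encode(tokens[i + 1:]))
    ]


def extract(expr: Expr, tokens: List[str]) -> Optional[int]:
    """Return the target position only if the expression is unambiguous on this page."""
    hits = match_positions(expr, tokens)
    return hits[0] if len(hits) == 1 else None


brittle = (r"h1 p ", r"form", r"p ")
general = (r"(?:\w+ )*", r"form", r"(?:\w+ )*")

old_page = ["h1", "p", "form", "p"]
new_page = ["h1", "p", "img", "form", "p"]  # an image was inserted before the form
two_forms = ["h1", "form", "p", "form"]

print(extract(brittle, old_page), extract(brittle, new_page))   # 2 None
print(extract(general, new_page), extract(general, two_forms))  # 3 None
```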

rules and rule markup languages for the semantic web | 2003

Inheritance and rules in object-oriented semantic web languages

Guizhen Yang; Michael Kifer

Rule-based and object-oriented techniques are rapidly making their way into the infrastructure for representing and reasoning about semantic information on the Web. Combining these two paradigms has been an important objective, and F-logic is a widely adopted formalism that achieves this goal. However, the original F-logic lacked the notion of instance methods, one of the most common object-oriented modeling tools. Extending F-logic with instance methods poses new, nontrivial problems. It requires a different kind of nonmonotonic inheritance and impacts much of the semantics of the logic. In this paper we incorporate instance methods into F-logic and develop a complete model theory as well as a computation framework for the extended language.

cooperative information systems | 2002

On the Semantics of Anonymous Identity and Reification

Guizhen Yang; Michael Kifer

Reification and anonymous resources are two of the more interesting features of RDF, an emerging standard for representing semantic information on the Web. Ironically, when RDF was standardized by the W3C over three years ago [18], it came without a semantics. There is now a growing understanding that a Semantic Web language without a semantics is an oxymoron, and a number of efforts are directed towards giving RDF a precise semantics [12,10]. In this paper we propose a simple semantics for reification and anonymous resources in F-logic [17], a frame-based logic language that is a popular formalism for representing and reasoning about semantic information on the Web [22,9,11,8,7]. The choice of F-logic (over RDF) as a basis for our semantics is motivated by the fact that F-logic provides a comprehensive solution to the problem of integrating frames, rules, and deduction, and it has been shown to provide an effective inference service for RDF [8,21].

Lecture Notes in Computer Science | 2000

Design and Implementation of the Physical Layer in WebBases: The XRover Experience

Hasan Davulcu; Guizhen Yang; Michael Kifer; I. V. Ramakrishnan

Webbases are database systems that enable the creation of Web applications that allow end users to shop around for products and services at various Web sites without having to manually browse and fill out forms at each of these sites. In this paper we describe XRover, an implementation of the physical layer of the webbase architecture. This layer is primarily responsible for automatically locating and extracting dynamic data from Web sites, i.e., data that can only be obtained by form fill-outs. We discuss our experience in building XRover using FLORA, a deductive object-oriented system.

international conference on data mining | 2003

On precision and recall of multi-attribute data extraction from semistructured sources

Guizhen Yang; Saikat Mukherjee; I. V. Ramakrishnan

Machine learning techniques for data extraction from semistructured sources exhibit different precision and recall characteristics. However, to date the formal relationship between learning algorithms and their impact on these two metrics remains unexplored. We propose a formalization of precision and recall of extraction and investigate the complexity-theoretic aspects of learning algorithms for multi-attribute data extraction based on this formalism. We show that there is a tradeoff between precision/recall of extraction and computational efficiency, and we present experimental results demonstrating the practical utility of these concepts in designing scalable data extraction algorithms that improve recall without compromising precision.
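
For reference, the standard set-based definitions are precision = |E ∩ C| / |E| and recall = |E ∩ C| / |C|, where E is the set of extracted records and C the set of correct ones. The sketch below applies them to multi-attribute records treated as tuples that must match a ground-truth tuple exactly. The paper proposes its own formalization, which may differ from this one; the product records and the two hypothetical extractors are made up to show a strict extractor (high precision) against a permissive one (high recall), and returning 1.0 for an empty set is a convention assumed here.

```python
def precision_recall(extracted, correct):
    """Set-based precision and recall over exactly matching records.
    By convention (an assumption), an empty denominator yields 1.0."""
    hits = len(extracted & correct)
    precision = hits / len(extracted) if extracted else 1.0
    recall = hits / len(correct) if correct else 1.0
    return precision, recall


truth = {("Widget", "9.99"), ("Gadget", "19.99"), ("Doohickey", "4.50")}
strict = {("Widget", "9.99"), ("Gadget", "19.99")}
loose = strict | {("Doohickey", "4.50"), ("Free shipping", "0.00")}

print(precision_recall(strict, truth))  # (1.0, 0.6666666666666666)
print(precision_recall(loose, truth))   # (0.75, 1.0)
```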
