
Publication


Featured research published by David W. Embley.


Data and Knowledge Engineering | 1999

Conceptual-model-based data extraction from multiple-record Web pages

David W. Embley; Douglas M. Campbell; Y. S. Jiang; Stephen W. Liddle; Deryle Lonsdale; Yiu-Kai Ng; Randy Smith

Abstract Electronically available data on the Web is exploding at an ever-increasing pace. Much of this data is unstructured, which makes searching hard and traditional database querying impossible. Many Web documents, however, contain an abundance of recognizable constants that together describe the essence of a document's content. For these kinds of data-rich, multiple-record documents (e.g., advertisements, movie reviews, weather reports, travel information, sports summaries, financial statements, obituaries, and many others) we can apply a conceptual-modeling approach to extract and structure data automatically. The approach is based on an ontology – a conceptual model instance – that describes the data of interest, including relationships, lexical appearance, and context keywords. By parsing the ontology, we can automatically produce a database scheme and recognizers for constants and keywords, and then invoke routines to recognize and extract data from unstructured documents and structure it according to the generated database scheme. Experiments show that it is possible to achieve good recall and precision ratios for documents that are rich in recognizable constants and narrow in ontological breadth. Our approach is less labor-intensive than other approaches that manually or semiautomatically generate wrappers, and it is generally insensitive to changes in Web-page format.
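The core idea of the abstract can be illustrated with a toy sketch (the ontology here is a hypothetical miniature car-ads example, reduced to plain regular expressions; it is not the paper's conceptual-model formalism): per-concept "recognizers" pull constants out of unstructured text and place them under attributes of a generated scheme.

```python
# Toy sketch of ontology-driven extraction: each concept in a (hypothetical)
# miniature ontology carries a regex recognizer for its constants.
import re

ontology = {
    "Year":  r"\b(19|20)\d{2}\b",
    "Price": r"\$\d[\d,]*",
    "Phone": r"\b\d{3}-\d{4}\b",
}

def extract(record_text):
    """Return one tuple of attribute values recognized in the text."""
    return {concept: (m.group(0) if (m := re.search(rx, record_text)) else None)
            for concept, rx in ontology.items()}

print(extract("1999 Honda Accord, $4,500 obo, call 555-1234"))
# -> {'Year': '1999', 'Price': '$4,500', 'Phone': '555-1234'}
```

Because only the ontology changes between application domains, the extraction code itself stays fixed, which is the property the abstract emphasizes.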


International Conference on Management of Data | 1999

Record-boundary discovery in Web documents

David W. Embley; Y. S. Jiang; Yiu-Kai Ng

Extraction of information from unstructured or semistructured Web documents often requires a recognition and delimitation of records. (By “record” we mean a group of information relevant to some entity.) Without first chunking documents that contain multiple records according to record boundaries, extraction of record information will not likely succeed. In this paper we describe a heuristic approach to discovering record boundaries in Web documents. In our approach, we capture the structure of a document as a tree of nested HTML tags, locate the subtree containing the records of interest, identify candidate separator tags within the subtree using five independent heuristics, and select a consensus separator tag based on a combined heuristic. Our approach is fast (runs linearly for practical cases within the context of the larger data-extraction problem) and accurate (100% in the experiments we conducted).
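One of the five heuristics can be sketched in a few lines (a stand-in for the paper's combined heuristic, not the authors' implementation; the candidate list is an assumption): count how often each candidate separator tag occurs in the record-containing fragment and pick the most frequent one.

```python
# Minimal sketch of a separator-tag heuristic: the tag that repeats most
# inside the record-containing fragment is the likely record boundary.
from collections import Counter
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.counts = Counter()
    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1

def likely_separator(html_fragment, candidates=("hr", "li", "tr", "p")):
    """Return the candidate tag occurring most often, or None."""
    parser = TagCounter()
    parser.feed(html_fragment)
    present = {t: parser.counts[t] for t in candidates if parser.counts[t]}
    return max(present, key=present.get) if present else None

print(likely_separator("<ul><li>rec 1</li><li>rec 2</li><li>rec 3</li></ul>"))
# -> li
```

The paper combines five such independent signals into a consensus vote; this sketch shows only the frequency signal.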


Conference on Information and Knowledge Management | 1998

Ontology-based extraction and structuring of information from data-rich unstructured documents

David W. Embley; Douglas M. Campbell; Randy Smith; Stephen W. Liddle

We present a new approach to extracting information from unstructured documents based on an application ontology that describes a domain of interest. Starting with such an ontology, we formulate rules to extract constants and context keywords from unstructured documents. For each unstructured document of interest, we extract its constants and keywords and apply a recognizer to organize extracted constants as attribute values of tuples in a generated database schema. To make our approach general, we fix all the processes and change only the ontological description for a different application domain. In experiments we conducted on two different types of unstructured documents taken from the Web, our approach attained recall ratios in the 80% and 90% range and precision ratios near 98%.


IEEE Computer | 1990

A graphical data manipulation language for an extended entity-relationship model

Bogdan D. Czejdo; Ramez Elmasri; Marek Rusinkiewicz; David W. Embley

A user can formulate database queries and updates graphically, by manipulating schema diagrams. The authors based the graphical data manipulation interface on the entity-relationship (ER) model because of its widespread use and increasing popularity. They use an extended ER model incorporating various forms of generalization and specialization, including subset, union and partition relationships. They call their model the extended conceptual entity-relationship or ECER model. A comparison with other graphical entity-relationship interfaces is included.


International World Wide Web Conference | 2005

Towards Ontology Generation from Tables

Yuri A. Tijerino; David W. Embley; Deryle Lonsdale; Yihong Ding; George Nagy

At the heart of today's information-explosion problems are issues involving semantics, mutual understanding, concept matching, and interoperability. Ontologies and the Semantic Web are offered as a potential solution, but creating ontologies for real-world knowledge is nontrivial. If we could automate the process, we could significantly improve our chances of making the Semantic Web a reality. While understanding natural language is difficult, tables and other structured information make it easier to interpret new items and relations. In this paper we introduce an approach to generating ontologies based on table analysis. We thus call our approach TANGO (Table ANalysis for Generating Ontologies). Based on conceptual modeling extraction techniques, TANGO attempts to (i) understand a table's structure and conceptual content; (ii) discover the constraints that hold between concepts extracted from the table; (iii) match the recognized concepts with ones from a more general specification of related concepts; and (iv) merge the resulting structure with other similar knowledge representations. TANGO is thus a formalized method of processing the format and content of tables that can serve to incrementally build a relevant reusable conceptual ontology.
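Steps (i) and (ii) can be illustrated with a deliberately simple sketch (hypothetical function and data, not the TANGO system): treat header cells as candidate concepts, and flag columns whose values are all distinct as candidate keys, a simple instance of the constraints TANGO looks for.

```python
# Simplified sketch: header cells become candidate concepts; columns with
# all-distinct values become candidate key constraints.
def concepts_and_keys(header, rows):
    concepts = [h.strip().title() for h in header]
    keys = [c for i, c in enumerate(concepts)
            if len({r[i] for r in rows}) == len(rows)]  # all values distinct
    return concepts, keys

header = ["country", "capital", "continent"]
rows = [("France", "Paris", "Europe"),
        ("Japan", "Tokyo", "Asia"),
        ("Ghana", "Accra", "Africa"),
        ("Italy", "Rome", "Europe")]
print(concepts_and_keys(header, rows))
# -> (['Country', 'Capital', 'Continent'], ['Country', 'Capital'])
```

Steps (iii) and (iv), matching and merging against a broader concept specification, are where the real difficulty lies and are beyond a sketch of this size.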


Archive | 2006

Conceptual Modeling - ER 2006

David W. Embley; Antoni Olivé; Sudha Ram

Keynote Papers.- Suggested Research Directions for a New Frontier - Active Conceptual Modeling.- From Conceptual Modeling to Requirements Engineering.- Web Services.- A Context Model for Semantic Mediation in Web Services Composition.- Modeling Service Compatibility with Pi-calculus for Choreography.- The DeltaGrid Abstract Execution Model: Service Composition and Process Interference Handling.- Quality in Conceptual Modeling.- Evaluating Quality of Conceptual Models Based on User Perceptions.- Representation Theory Versus Workflow Patterns - The Case of BPMN.- Use Case Modeling and Refinement: A Quality-Based Approach.- Aspects of Conceptual Modeling.- Ontology with Likeliness and Typicality of Objects in Concepts.- In Defense of a Trope-Based Ontology for Conceptual Modeling: An Example with the Foundations of Attributes, Weak Entities and Datatypes.- Explicitly Representing Superimposed Information in a Conceptual Model.- Modeling Advanced Applications.- Preference Functional Dependencies for Managing Choices.- Modeling Visibility in Hierarchical Systems.- A Model for Anticipatory Event Detection.- XML.- A Framework for Integrating XML Transformations.- Oxone: A Scalable Solution for Detecting Superior Quality Deltas on Ordered Large XML Documents.- Schema-Mediated Exchange of Temporal XML Data.- A Quantitative Summary of XML Structures.- Semantic Web.- Database to Semantic Web Mapping Using RDF Query Languages.- Representing Transitive Propagation in OWL.- On Generating Content and Structural Annotated Websites Using Conceptual Modeling.- Requirements Modeling.- A More Expressive Softgoal Conceptualization for Quality Requirements Analysis.- Conceptualizing the Co-evolution of Organizations and Information Systems: An Agent-Oriented Perspective.- Towards a Theory of Genericity Based on Government and Binding.- Aspects of Interoperability.- Concept Modeling by the Masses: Folksonomy Structure and Interoperability.- Method Chunks for Interoperability.- Domain Analysis for Supporting Commercial Off-the-Shelf Components Selection.- Metadata Management.- A Formal Framework for Reasoning on Metadata Based on CWM.- A Set of QVT Relations to Assure the Correctness of Data Warehouses by Using Multidimensional Normal Forms.- Design and Use of ER Repositories: Methodologies and Experiences in eGovernment Initiatives.- Human-Computer Interaction.- Notes for the Conceptual Design of Interfaces.- The User Interface Is the Conceptual Model.- Towards a Holistic Conceptual Modelling-Based Software Development Process.- Business Modeling.- A Multi-perspective Framework for Organizational Patterns.- Deriving Concepts for Modeling Business Actions.- Towards a Reference Ontology for Business Models.- Reasoning.- Reasoning on UML Class Diagrams with OCL Constraints.- On the Use of Association Redefinition in UML Class Diagrams.- Optimising Abstract Object-Oriented Database Schemas.- Panels.- Experimental Research on Conceptual Modeling: What Should We Be Doing and Why?.- Eliciting Data Semantics Via Top-Down and Bottom-Up Approaches: Challenges and Opportunities.- Industrial Track.- The ADO.NET Entity Framework: Making the Conceptual Level Real.- XMeta Repository and Services.- IBM Industry Models: Experience, Management and Challenges.- Community Semantics for Ultra-Scale Information Management.- Managing Data in High Throughput Laboratories: An Experience Report from Proteomics.- Policy Models for Data Sharing.- Demos and Posters.- Protocol Analysis for Exploring the Role of Application Domain in Conceptual Schema Understanding.- Auto-completion of Underspecified SQL Queries.- iQL: A Query Language for the Instance-Based Data Model.- Designing Under the Influence of Speech Acts: A Strategy for Composing Enterprise Integration Solutions.- Geometry of Concepts.


Archive | 1999

Advances in Conceptual Modeling

Peter P. Chen; David W. Embley; Jacques Kouloumdjian; Stephen W. Liddle; John F. Roddick

Traditionally, product data and their evolving definitions have been handled separately from process data and their evolving definitions. There is little or no overlap between these two views of systems even though product and process data are inextricably linked over the complete software lifecycle from design to production. The integration of product and process models in a unified data model provides the means by which data could be shared across an enterprise throughout the lifecycle, even while that data continues to evolve. In integrating these domains, an object-oriented approach to data modelling has been adopted by the CRISTAL (Cooperating Repositories and an Information System for Tracking Assembly Lifecycles) project. The model that has been developed is description-driven in nature in that it captures multiple layers of product and process definitions and it provides object persistence, flexibility, reusability, schema evolution and versioning of data elements. This paper describes the model that has been developed in CRISTAL and how descriptive meta-objects in that model have their persistence handled. It concludes that adopting a description-driven approach to modelling, aligned with a use of suitable object persistence, can lead to an integration of product and process models which is sufficiently flexible to cope with evolving data definitions. Keywords: description-driven systems, modelling change, schema evolution, versioning.


International Journal on Document Analysis and Recognition | 2006

Table-processing paradigms: a research survey

David W. Embley; Matthew Hurst; Daniel P. Lopresti; George Nagy

Tables are a ubiquitous form of communication. While everyone seems to know what a table is, a precise, analytical definition of “tabularity” remains elusive because some bureaucratic forms, multicolumn text layouts, and schematic drawings share many characteristics of tables. There are significant differences between typeset tables, electronic files designed for display of tables, and tables in symbolic form intended for information retrieval. Most past research has addressed the extraction of low-level geometric information from raster images of tables scanned from printed documents, although there is growing interest in the processing of tables in electronic form as well. Recent research on table composition and table analysis has improved our understanding of the distinction between the logical and physical structures of tables, and has led to improved formalisms for modeling tables. This review, which is structured in terms of generalized paradigms for table processing, indicates that progress on half-a-dozen specific research issues would open the door to using existing paper and electronic tables for database update, tabular browsing, structured information retrieval through graphical and audio interfaces, multimedia table editing, and platform-independent display.


International Conference on Data Engineering | 1987

An approach to schema integration and query formulation in federated database systems

Bogdan D. Czejdo; Marek Rusinkiewicz; David W. Embley

A language is introduced in which both schema integration and query formulation for federated database systems can be performed. The relational model is augmented with connectors that impose predicate conditions over attributes of relations either at the same or different sites. The relational model with connectors has a natural diagrammatic representation, and thus, the language has a graphical user interface. The theoretical foundation for the graphical language is algebraic. Each algebraic operator maps a diagram (of relations with connectors) into another diagram, and every diagram represents a possible user query. Operators specifically designed to operate on incompatible schemas are introduced and formally defined. Possible domain incompatibilities are resolved by using extended abstract data types.


International Conference on Conceptual Modeling | 2002

Extracting Data behind Web Forms

Stephen W. Liddle; David W. Embley; Del T. Scott; Sai Ho Yau

A significant and ever-increasing amount of data is accessible only by filling out HTML forms to query an underlying Web data source. While this is most welcome from a user perspective (queries are relatively easy and precise) and from a data management perspective (static pages need not be maintained and databases can be accessed directly), automated agents must face the challenge of obtaining the data behind forms. In principle an agent can obtain all the data behind a form by multiple submissions of the form filled out in all possible ways, but efficiency concerns lead us to consider alternatives. We investigate these alternatives and show that we can estimate the amount of remaining data (if any) after a small number of submissions and that we can heuristically select a reasonably minimal number of submissions to maximize the coverage of the data. Experimental results show that these statistical predictions are appropriate and useful.
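The flavor of such an estimate can be conveyed with a back-of-the-envelope sketch (a textbook Lincoln-Petersen capture-recapture estimator, not the paper's statistical model): from the overlap between two batches of query results, estimate how many distinct records sit behind the form.

```python
# Capture-recapture sketch: estimate the total number of distinct records
# behind a form from the overlap of two batches of query results.
def estimate_total_records(batch1, batch2):
    """Lincoln-Petersen estimate: n1 * n2 / overlap."""
    s1, s2 = set(batch1), set(batch2)
    overlap = len(s1 & s2)
    if overlap == 0:
        return None  # disjoint samples: no basis for an estimate yet
    return round(len(s1) * len(s2) / overlap)

# 60 and 50 distinct records with 30 in common -> roughly 100 overall
print(estimate_total_records(range(0, 60), range(30, 80)))
# -> 100
```

When the estimate approaches the number of records already retrieved, an agent can stop submitting form combinations, which is the efficiency concern the abstract raises.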

Collaboration


Dive into David W. Embley's collaborations.

Top Co-Authors

George Nagy
Rensselaer Polytechnic Institute

Bogdan D. Czejdo
Loyola University New Orleans

Li Xu
University of Arizona

Cui Tao
University of Texas Health Science Center at Houston

Yihong Ding
Brigham Young University

Wai Yin Mok
University of Alabama in Huntsville