Eduard C. Dragut
Temple University
Publications
Featured research published by Eduard C. Dragut.
Cooperative Information Systems | 2004
Eduard C. Dragut; Ramon Lawrence
Large-scale database integration requires a significant cost in developing a global schema and finding mappings between the global and local schemas. Developing the global schema requires matching and merging the concepts in the data sources and is a bottleneck in the process. In this paper we propose a strategy for computing the mapping between schemas by performing a composition of the mappings between individual schemas and a reference ontology. Our premise is that many organizations have standard ontologies that, although they may not be suitable as a global schema, are useful in providing standard terminology and naming conventions for concepts and relationships. It is valuable to leverage these existing ontological resources to help automate the construction of a global schema and mappings between schemas. Our system semi-automates the matching between local schemas and a reference ontology, and then automatically composes the matchings to build mappings between schemas. Using these mappings, we use model management techniques to compute a global schema. A major advantage of this approach is that human intervention in validating matchings mostly occurs during the matching between schema and ontology. A complication is that the ontology may only contain a subset of the concepts in the schema or may be more general than the schema. Further, the more complicated ontological graph structure limits the effectiveness of some matchers. Our contribution is to show how schema-to-ontology matchings can be used to compose mappings between schemas with high accuracy by adapting the COMA schema matching system to work with ontologies.
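To make the composition step concrete, here is a minimal sketch, assuming each local schema has already been matched to the reference ontology; the element names, concept names, scores, and the product combination rule are all illustrative, not the paper's actual method.

```python
# Hypothetical sketch: compose two schema-to-ontology matchings into a
# schema-to-schema mapping by joining on the shared ontology concept.

def compose(matching_a, matching_b):
    """matching_a: {schema_A_element: (ontology_concept, score)}
    matching_b: {schema_B_element: (ontology_concept, score)}
    Returns {(a_element, b_element): combined_score}."""
    # Invert matching_b so B's elements can be looked up by ontology concept.
    by_concept = {}
    for b_elem, (concept, score) in matching_b.items():
        by_concept.setdefault(concept, []).append((b_elem, score))

    mapping = {}
    for a_elem, (concept, a_score) in matching_a.items():
        for b_elem, b_score in by_concept.get(concept, []):
            # Simple combination rule: the composed confidence is the
            # product of the two individual matching scores.
            mapping[(a_elem, b_elem)] = a_score * b_score
    return mapping

schema_a_to_onto = {"cust_name": ("Person.name", 0.9), "cust_dob": ("Person.birthDate", 0.8)}
schema_b_to_onto = {"full_name": ("Person.name", 0.85), "birthday": ("Person.birthDate", 0.95)}
print(compose(schema_a_to_onto, schema_b_to_onto))
# approximately {('cust_name', 'full_name'): 0.765, ('cust_dob', 'birthday'): 0.76}
```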
Conference on Information and Knowledge Management | 2010
Eduard C. Dragut; Clement T. Yu; A. Prasad Sistla; Weiyi Meng
The Web has plenty of reviews, comments, and reports about products, services, government policies, institutions, etc. The opinions expressed in these reviews influence how people regard these entities. For example, a product with consistently good reviews is likely to sell well, while a product with numerous bad reviews is likely to sell poorly. Our aim is to build a sentimental word dictionary, which is larger than existing sentimental word dictionaries and has high accuracy. We introduce rules for deduction, which take words with known polarities as input and produce synsets (a synset is a set of synonyms sharing a definition) with polarities. The synsets with deduced polarities can then be used to further deduce the polarities of other words. Experimental results show that for a given sentimental word dictionary with D words, approximately an additional 50% of D words with polarities can be deduced. An experiment is conducted to find the accuracy of a random sample of the deduced words. The accuracy is found to be about the same as the agreement between the judgments of two human annotators.
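A toy sketch of this style of deduction, assuming WordNet-like synsets; the seed words, synsets, and the majority-vote propagation rule below are illustrative stand-ins for the paper's deduction rules.

```python
# Hypothetical sketch: deduce synset polarities from words with known
# polarity, then let those synsets label further words. Data and the
# voting rule are illustrative only.

seed_polarity = {"good": +1, "excellent": +1, "bad": -1, "terrible": -1}

# Toy "synsets": each is a set of synonymous words sharing one meaning.
synsets = [
    {"good", "fine", "decent"},
    {"bad", "awful", "lousy"},
    {"excellent", "superb"},
]

def deduce(seed, synsets):
    word_pol = dict(seed)
    changed = True
    while changed:                                   # repeat until nothing new is deduced
        changed = False
        for syn in synsets:
            votes = [word_pol[w] for w in syn if w in word_pol]
            if not votes:
                continue
            syn_pol = 1 if sum(votes) > 0 else -1    # majority vote of known members
            for w in syn:
                if w not in word_pol:                # propagate to unlabeled synonyms
                    word_pol[w] = syn_pol
                    changed = True
    return word_pol

print(deduce(seed_polarity, synsets))
```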
International Conference on Data Engineering | 2006
Eduard C. Dragut; Wensheng Wu; A. Prasad Sistla; Clement T. Yu; Weiyi Meng
Many e-commerce search engines now return information from Web databases. Unlike text search engines, these e-commerce search engines have more complicated user interfaces. Our aim is to automatically construct a natural query user interface that integrates a set of interfaces over a given domain of interest. For example, each airline company has a query interface for ticket reservation, and our system can construct an integrated interface for all these companies. This permits users to access information uniformly from multiple sources. Each query interface of an e-commerce search engine is designed to help users provide the necessary information. Specifically, (1) related pieces of information, such as first name and last name, are grouped together, and (2) certain hierarchical relationships are maintained. In this paper, we provide an algorithm to compute an integrated interface from the query interfaces of the same domain. The integrated query interface can be proved to preserve the above two types of relationships. Experiments on five domains verify our theoretical study.
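As a rough illustration of what preserving grouping and hierarchy means, the sketch below represents each interface as a tree of groups and fields and merges two interfaces by unioning children label-by-label; the labels and the simplistic name-based matching are hypothetical, not the paper's algorithm.

```python
# Hypothetical sketch: each query interface is a tree whose internal nodes are
# groups (e.g., "Passenger") and whose leaves are fields (e.g., "first name").
# Merging unions children by label, so grouped fields stay grouped and
# parent/child relationships are preserved. Labels and structure are illustrative.

def merge(a, b):
    """a, b: {label: subtree-or-None}; None marks a leaf field."""
    merged = dict(a)
    for label, subtree in b.items():
        if label not in merged:
            merged[label] = subtree                         # group/field only in b
        elif merged[label] is not None and subtree is not None:
            merged[label] = merge(merged[label], subtree)   # recurse into shared groups
        # if either side is a leaf, keep the existing entry
    return merged

airline_1 = {"Passenger": {"first name": None, "last name": None},
             "Trip": {"from": None, "to": None, "depart date": None}}
airline_2 = {"Passenger": {"first name": None, "last name": None, "age": None},
             "Trip": {"from": None, "to": None, "return date": None}}

print(merge(airline_1, airline_2))
```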
Very Large Data Bases | 2010
Thomas Kabisch; Eduard C. Dragut; Clement T. Yu; Ulf Leser
In this paper, we present VisQI (VISual Query interface Integration system), a Deep Web integration system. VisQI is capable of (1) transforming Web query interfaces into hierarchically structured representations, (2) classifying them into application domains, and (3) matching the elements of different interfaces. VisQI thus contains solutions for the major challenges in building Deep Web integration systems. The system comes with a full-fledged evaluation component that automatically compares generated data structures against a gold standard. VisQI has a framework-like architecture so that other developers can easily reuse its components.
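The automatic comparison against a gold standard can be imagined as set-based precision/recall over the produced correspondences; the sketch below is a hypothetical illustration (the metric choice and the field names are assumptions, not necessarily what VisQI reports).

```python
# Hypothetical sketch of gold-standard evaluation: compare the set of element
# correspondences produced by a matcher against a reference set.

def precision_recall_f1(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

gold = {("departure city", "from"), ("arrival city", "to"), ("date", "depart date")}
predicted = {("departure city", "from"), ("date", "return date")}
print(precision_recall_f1(predicted, gold))   # approximately (0.5, 0.33, 0.4)
```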
Synthesis Lectures on Data Management | 2012
Eduard C. Dragut; Weiyi Meng; Clement T. Yu
There are millions of searchable data sources on the Web and to a large extent their contents can only be reached through their own query interfaces. There is an enormous interest in making the data in these sources easily accessible. There are primarily two general approaches to achieve this objective. The first is to surface the contents of these sources from the deep Web and add the contents to the index of regular search engines. The second is to integrate the searching capabilities of these sources and support integrated access to them. In this book, we introduce the state-of-the-art techniques for extracting, understanding, and integrating the query interfaces of deep Web data sources. These techniques are critical for producing an integrated query interface for each domain. The interface serves as the mediator for searching all data sources in the concerned domain. While query interface integration is only relevant for the deep Web integration approach, the extraction and understanding of query interfaces are critical for both deep Web exploration approaches. This book aims to provide in-depth and comprehensive coverage of the key technologies needed to create high quality integrated query interfaces automatically. The following technical issues are discussed in detail in this book: query interface modeling, query interface extraction, query interface clustering, query interface matching, query interface attribute integration, and query interface integration. Table of Contents: Introduction / Query Interface Representation and Extraction / Query Interface Clustering and Categorization / Query Interface Matching / Query Interface Attribute Integration / Query Interface Integration / Summary and Future Research
Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore (1929-2014) | 2014
Eduard C. Dragut; Christiane Fellbaum
Sentiment Analysis, an important area of Natural Language Understanding, often relies on the assumption that lexemes carry inherent sentiment values, as reflected in specialized resources. We examine and measure the contribution that eight intensifying adverbs make to the sentiment value of sentences, as judged by human annotators. Our results show, first, that the intensifying adverbs are not themselves sentiment-laden but strengthen the sentiment conveyed by words in their contexts to different degrees. We consider the consequences for appropriate modifications of the representation of the adverbs in sentiment lexicons.
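One way such measurements might be reflected in a lexicon is to attach a multiplicative strength to each intensifier instead of a polarity; the sketch below is purely illustrative and its numbers are invented, not the paper's measured values.

```python
# Hypothetical sketch: intensifying adverbs carry a strength multiplier rather
# than a sentiment of their own; the multiplier scales the sentiment of the
# word they modify. All numeric values are illustrative, not measured results.

sentiment = {"good": 0.6, "bad": -0.6}
intensifier_strength = {"extremely": 1.8, "very": 1.4, "somewhat": 0.7}

def score(adverb, word):
    base = sentiment.get(word, 0.0)
    return base * intensifier_strength.get(adverb, 1.0)

print(score("extremely", "good"))   # 1.08
print(score("somewhat", "bad"))     # -0.42
```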
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013
Lei Cen; Eduard C. Dragut; Luo Si; Mourad Ouzzani
Entity disambiguation is an important step in many information retrieval applications. This paper proposes new research for entity disambiguation with a focus on name disambiguation in digital libraries. In particular, pairwise similarity is first learned for publications that share the same author name string (ANS), and then a novel Hierarchical Agglomerative Clustering approach with Adaptive Stopping Criterion (HACASC) is proposed to adaptively cluster a set of publications that share the same ANS into individual clusters of publications with different author identities. The HACASC approach utilizes a mixture of kernel ridge regressions to intelligently determine the threshold in clustering. This yields more appropriate clustering granularity than a non-adaptive stopping criterion. We conduct a large-scale empirical study with a dataset of more than 2 million publication record pairs to demonstrate the advantage of the proposed HACASC approach.
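In outline, the clustering is agglomerative merging under average linkage that stops when the best remaining similarity falls below a threshold; in HACASC that threshold is predicted by learned regressors rather than fixed. Here is a hypothetical sketch with the learned component stubbed out; the similarities and the stub are illustrative only.

```python
# Hypothetical sketch of agglomerative clustering with an adaptive stopping
# criterion. predict_threshold stands in for the learned (kernel ridge
# regression) component; here it is just a stub.

def predict_threshold(features):
    # Placeholder for the learned model mapping per-name features
    # (e.g., number of publications) to a stopping threshold.
    return 0.5

def avg_link(c1, c2, sim):
    return sum(sim[(min(i, j), max(i, j))] for i in c1 for j in c2) / (len(c1) * len(c2))

def cluster(n_records, sim, threshold):
    clusters = [{i} for i in range(n_records)]
    while len(clusters) > 1:
        # Find the most similar pair of clusters under average linkage.
        best = max(((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
                   key=lambda ab: avg_link(clusters[ab[0]], clusters[ab[1]], sim))
        if avg_link(clusters[best[0]], clusters[best[1]], sim) < threshold:
            break                       # adaptive stop: no pair is similar enough
        a, b = best
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters

# Toy pairwise similarities between three publications sharing an author name string.
sim = {(0, 1): 0.9, (0, 2): 0.2, (1, 2): 0.1}
print(cluster(3, sim, predict_threshold(features={"n_pubs": 3})))   # [{0, 1}, {2}]
```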
Computers & Geosciences | 2006
Anton Kruger; Ramon Lawrence; Eduard C. Dragut
The management and processing of terabyte-scale radar data sets is time-consuming, costly, and an impediment to research. Researchers require rapid and transparent access to the data without being encumbered with the technical challenges of data management. In this paper, we describe a database architecture that manages over 12 TB (and growing) of Archive Level II data produced by the United States National Weather Service's network of WSR-88D weather radars. The contribution of this work is an automatic system for archiving and analyzing radar data that isolates geoscientists from the complexities of data storage and retrieval. Data access transparency is achieved by using a relational database to store metadata about the raw data, which enables simple SQL queries to retrieve data subsets of interest. The second component is a distributed web platform that cost-effectively distributes data across web servers for access using the ubiquitous HTTP protocol. This work demonstrates how massive data sets can be effectively queried and managed.
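The metadata-driven access pattern can be sketched as follows; the table layout, column names, radar site IDs, and URLs are hypothetical, not the system's actual schema.

```python
# Hypothetical sketch of metadata-based retrieval: a relational table stores
# one row of metadata per radar volume scan; an SQL query picks the files of
# interest. Table/column names and values are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE scan_metadata (
    radar_id   TEXT,   -- WSR-88D site identifier
    scan_time  TEXT,   -- ISO timestamp of the volume scan
    url        TEXT    -- HTTP location of the Archive Level II file
)""")
conn.executemany("INSERT INTO scan_metadata VALUES (?, ?, ?)", [
    ("KDVN", "2004-06-01T00:05:00", "http://server1.example/KDVN/f1.bz2"),
    ("KDVN", "2004-06-01T00:11:00", "http://server2.example/KDVN/f2.bz2"),
    ("KDMX", "2004-06-01T00:07:00", "http://server1.example/KDMX/f3.bz2"),
])

# Retrieve every scan from one radar within a time window.
rows = conn.execute("""SELECT url FROM scan_metadata
                       WHERE radar_id = ? AND scan_time BETWEEN ? AND ?
                       ORDER BY scan_time""",
                    ("KDVN", "2004-06-01T00:00:00", "2004-06-01T01:00:00")).fetchall()
print([r[0] for r in rows])   # the files a client would then download over HTTP
```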
International Conference on Data Engineering | 2015
El Kindi Rezig; Eduard C. Dragut; Mourad Ouzzani; Ahmed K. Elmagarmid
Data-intensive Web applications usually require integrating data from Web sources at query time. The sources may refer to the same real-world entity in different ways, and some may even provide outdated or erroneous data. An important task is to recognize and merge the records that refer to the same real-world entity at query time. Most existing duplicate detection and fusion techniques work in the off-line setting and do not meet this online constraint. There are at least two aspects that differentiate online duplicate detection and fusion from its off-line counterpart: (i) the latter assumes that the entire data set is available, while the former cannot make such an assumption; (ii) several query submissions may be required to compute the “ideal” representation of an entity in the online setting. This paper presents a general framework for the online setting based on an iterative record-based caching technique. A set of frequently requested records is deduplicated off-line and cached for future reference. Newly arriving records in response to a query are deduplicated jointly with the records in the cache, presented to the user, and appended to the cache. Experiments with real and synthetic data show the benefit of our solution over traditional record linkage techniques applied to an online setting.
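A minimal sketch of the record-based caching flow, with the matching and fusion steps reduced to trivial placeholders (the real system uses proper record linkage and conflict resolution); all names and values are illustrative.

```python
# Hypothetical sketch of online duplicate detection with a record cache:
# records arriving for a query are matched against cached records; matches are
# fused, and the result is both returned to the user and kept in the cache.

def same_entity(r1, r2):
    # Placeholder matcher: exact name match. A real system would use a
    # similarity-based record linkage step here.
    return r1["name"].lower() == r2["name"].lower()

def fuse(r1, r2):
    # Placeholder fusion: prefer non-missing values from either record.
    return {k: r1.get(k) or r2.get(k) for k in set(r1) | set(r2)}

cache = [{"name": "Jane Doe", "phone": "555-1234", "email": None}]

def process_query_results(new_records, cache):
    results = []
    for rec in new_records:
        match = next((c for c in cache if same_entity(rec, c)), None)
        if match is None:
            cache.append(rec)            # unseen entity: add to cache as-is
            results.append(rec)
        else:
            merged = fuse(match, rec)    # duplicate: fuse with the cached record
            cache[cache.index(match)] = merged
            results.append(merged)
    return results

incoming = [{"name": "jane doe", "email": "jane@example.com"},
            {"name": "John Roe", "phone": "555-9876"}]
print(process_query_results(incoming, cache))
```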
IEEE Transactions on Knowledge and Data Engineering | 2015
Eduard C. Dragut; Hong Wang; A. Prasad Sistla; Clement T. Yu; Weiyi Meng
Polarity classification of words is important for applications such as Opinion Mining and Sentiment Analysis. A number of sentiment word/sense dictionaries have been manually or (semi)automatically constructed. We notice that these sentiment dictionaries contain numerous inaccuracies. Besides obvious instances, where the same word appears with different polarities in different dictionaries, the dictionaries exhibit complex cases of polarity inconsistency, which cannot be detected by mere manual inspection. In this paper, we introduce the concept of polarity consistency of words/senses in sentiment dictionaries. We show that the consistency problem is NP-complete. We reduce the polarity consistency problem to the satisfiability problem and utilize two fast SAT solvers to detect inconsistencies in a sentiment dictionary. We perform experiments on five sentiment dictionaries and WordNet to show inter- and intra-dictionary inconsistencies.
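Very roughly, the reduction gives each sense a boolean variable (positive vs. negative) and turns each word's dictionary polarity into constraints over its senses; the dictionary is consistent iff the formula is satisfiable. The toy sketch below brute-forces assignments instead of calling a SAT solver, and its "at least one sense agrees" constraint is a simplification of the paper's encoding.

```python
# Hypothetical toy sketch of a polarity consistency check. Each sense is a
# boolean variable (True = positive). A word tagged positive requires at least
# one positive sense; a word tagged negative requires at least one negative
# sense. A sense shared by a positive-only and a negative-only word can make
# the constraints unsatisfiable. The paper encodes a richer model into SAT and
# runs dedicated solvers; here we simply enumerate all assignments.
from itertools import product

def consistent(word_senses, word_polarity):
    senses = sorted({s for ss in word_senses.values() for s in ss})
    for bits in product([True, False], repeat=len(senses)):
        assign = dict(zip(senses, bits))
        ok = True
        for word, pol in word_polarity.items():
            ss = word_senses[word]
            if pol == "+" and not any(assign[s] for s in ss):
                ok = False
            if pol == "-" and not any(not assign[s] for s in ss):
                ok = False
        if ok:
            return True
    return False

word_senses = {"fine": {"s1"}, "okay": {"s1"}}                 # two words share one sense
print(consistent(word_senses, {"fine": "+", "okay": "-"}))     # False: inconsistent
print(consistent(word_senses, {"fine": "+", "okay": "+"}))     # True
```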