
Tomer Sagi

Technion – Israel Institute of Technology

Publication

Featured research published by Tomer Sagi.

Information Systems | 2010

Tuning the ensemble selection process of schema matchers

Avigdor Gal; Tomer Sagi

Schema matching is the task of providing correspondences between concepts describing the meaning of data in various heterogeneous, distributed data sources. It is recognized as one of the basic operations required by the process of data and schema integration, and its outcome serves many tasks such as targeted content delivery and view integration. Schema matching research has been going on for more than 25 years now. An interesting research topic that was largely left untouched involves the automatic selection of schema matchers into an ensemble, a set of schema matchers. To the best of our knowledge, none of the existing algorithmic solutions offer such a selection feature. In this paper we provide a thorough investigation of this research topic. We introduce a new heuristic, Schema Matcher Boosting (SMB). We show that SMB has the ability to choose among schema matchers and to tune their importance. As such, SMB holds new promise for schema matcher designers: instead of trying to design a perfect schema matcher, a designer can focus on finding better-than-random schema matchers. For the effective utilization of SMB, we propose a complementary approach to the design of new schema matchers. We separate schema matchers into first-line and second-line matchers. First-line schema matchers were designed, by and large, as applications of existing works in other areas (e.g., machine learning and information retrieval) to schemata. Second-line schema matchers operate on the outcome of other schema matchers to improve their original outcome. SMB selects matcher pairs, where each pair contains a first-line matcher and a second-line matcher. We run a thorough set of experiments to analyze SMB's ability to effectively choose schema matchers and show that SMB performs better than other state-of-the-art ensemble matchers.
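The abstract does not spell out SMB's internals. As a rough illustration of why "better than random" matchers can be useful, the sketch runs a generic AdaBoost-style loop over candidate matcher pairs. It is not SMB itself. Each hypothetical pair is reduced to its 0/1 match decisions on training attribute pairs whose exact match is known. The function names and the input layout are assumptions.

```python
import math
from typing import Dict, List, Tuple

def boost_matcher_pairs(
    decisions: Dict[str, List[int]],  # hypothetical pair name -> 0/1 decision per training attribute pair
    truth: List[int],                  # 0/1 exact-match label per training attribute pair
    rounds: int,
) -> List[Tuple[str, float]]:
    """Generic AdaBoost-style selection (not SMB): returns (pair name, weight) per round."""
    n = len(truth)
    w = [1.0 / n] * n  # start with uniform weights over training attribute pairs
    chosen: List[Tuple[str, float]] = []
    for _ in range(rounds):
        # Pick the matcher pair with the lowest weighted error under the current weights.
        best, best_err = None, 1.0
        for name, pred in decisions.items():
            err = sum(wi for wi, p, t in zip(w, pred, truth) if p != t)
            if err < best_err:
                best, best_err = name, err
        if best is None or best_err >= 0.5:
            break  # no remaining pair is better than random
        best_err = max(best_err, 1e-10)  # guard against division by zero for a perfect pair
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        chosen.append((best, alpha))
        # Up-weight attribute pairs the chosen pair got wrong and down-weight the rest.
        pred = decisions[best]
        w = [wi * math.exp(alpha if p != t else -alpha) for wi, p, t in zip(w, pred, truth)]
        total = sum(w)
        w = [wi / total for wi in w]
    return chosen

def ensemble_decision(chosen: List[Tuple[str, float]], decisions_for_pair: Dict[str, int]) -> int:
    """Weighted vote on one attribute pair: +alpha for 'match', -alpha for 'no match'."""
    score = sum(alpha * (1 if decisions_for_pair[name] else -1) for name, alpha in chosen)
    return int(score > 0)
```

The selected weights play the role of "tuning importance". A pair whose weighted error is 0.5 or worse is never selected, which mirrors the better-than-random requirement.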

Business Process Management | 2013

Predicting the quality of process model matching

Matthias Weidlich; Tomer Sagi; Henrik Leopold; Avigdor Gal; Jan Mendling

Process model matching refers to the task of creating correspondences among activities of different process models. This task is crucial whenever comparison and alignment of process models are called for. In recent years, there have been a few attempts to tackle process model matching. Yet, evaluating the obtained sets of correspondences reveals high variability in the results. Addressing this issue, we propose a method for predicting the quality of results derived by process model matchers. As such, prediction serves as a case-by-case decision-making tool for estimating the amount of trust one should put into automatic matching. This paper proposes a model of prediction for process matching based on both process properties and preliminary match results.

Proceedings of the Ninth International Workshop on Information Integration on the Web | 2012

Making sense of top-k matchings: a unified match graph for schema matching

Avigdor Gal; Tomer Sagi; Matthias Weidlich; Eliezer Levy; Victor Shafran; Zoltán Miklós; Nguyen Quoc Viet Hung

Schema matching in uncertain environments faces several challenges, among them the identification of complex correspondences. In this paper, we present a method to address this challenge based on top-k matchings, i.e., a set of matchings comprising only 1:1 correspondences derived by common matchers. We propose the unified top-k match graph and define a clustering problem for it. The obtained attribute clusters are analysed to derive complex correspondences. Our experimental evaluation shows that our approach is able to identify a significant share of complex correspondences.
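A minimal sketch of the idea follows. It takes the union of the top-k 1:1 matchings as one weighted graph and then clusters attributes. As a simplifying assumption, plain connected components stand in for the clustering problem the paper defines. Attribute names are assumed to carry a schema prefix ("S.", "T.") so that source and target attributes never collide. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Correspondence = Tuple[str, str]  # (source attribute, target attribute), both schema-prefixed

def unified_match_graph(top_k: List[List[Correspondence]]) -> Dict[FrozenSet[str], int]:
    """Union of all top-k matchings; edge weight = number of matchings containing the edge."""
    edges: Dict[FrozenSet[str], int] = defaultdict(int)
    for matching in top_k:
        for a, b in matching:
            edges[frozenset((a, b))] += 1
    return dict(edges)

def attribute_clusters(edges: Dict[FrozenSet[str], int], min_support: int = 1) -> List[Set[str]]:
    """Connected components over edges that appear in at least `min_support` matchings."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for edge, count in edges.items():
        if count >= min_support:
            a, b = tuple(edge)  # requires distinct (prefixed) endpoint names
            adj[a].add(b)
            adj[b].add(a)
    seen: Set[str] = set()
    clusters: List[Set[str]] = []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters

def complex_candidates(clusters: List[Set[str]]) -> List[Set[str]]:
    """Clusters with more than two attributes hint at 1:n or n:m correspondences."""
    return [c for c in clusters if len(c) > 2]
```

For example, suppose one matching maps `S.name` to `T.first` and another maps it to `T.last`. Their union yields the cluster {S.name, T.first, T.last}, which is a candidate for a 1:2 correspondence that no single 1:1 matching expresses.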

Very Large Data Bases | 2013

Schema matching prediction with applications to data source discovery and dynamic ensembling

Tomer Sagi; Avigdor Gal

Web-scale data integration involves fully automated efforts that lack knowledge of the exact match between data descriptions. In this paper, we introduce schema matching prediction, an assessment mechanism to support schema matchers in the absence of an exact match. Given attribute pairwise similarity measures, a predictor predicts the success of a matcher in identifying correct correspondences. We present a comprehensive framework in which predictors can be defined, designed, and evaluated. We formally define schema matching evaluation and schema matching prediction using similarity spaces and discuss a set of four desirable properties of predictors, namely correlation, robustness, tunability, and generalization. We present a method for constructing predictors, supporting generalization, and introduce prediction models as a means of tuning prediction toward various quality measures. We define the empirical properties of correlation and robustness and provide concrete measures for their evaluation. We illustrate the usefulness of schema matching prediction by presenting three use cases. First, we propose a method for ranking the relevance of deep Web sources with respect to given user needs. Second, we show how predictors can assist in the design of schema matching systems. Finally, we show how prediction can support dynamic weight setting of matchers in an ensemble, thus improving upon current state-of-the-art weight setting methods. An extensive empirical evaluation shows the usefulness of predictors in these use cases and demonstrates that prediction models can increase the performance of schema matching.
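To make "predictor" concrete, the sketch below shows two simple, hypothetical predictors computed from a matcher's attribute-by-attribute similarity matrix alone, with no exact match. It also includes a Pearson correlation helper, since correlation with a measured quality (for example, precision on benchmarks where the exact match is known) is one of the properties the abstract lists. These are illustrative predictors only, not the ones defined in the paper.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # rows: source attributes, columns: target attributes

def avg_max_predictor(sim: Matrix) -> float:
    """Hypothetical predictor: mean of each source attribute's best similarity."""
    return sum(max(row) for row in sim) / len(sim)

def dominant_predictor(sim: Matrix) -> float:
    """Hypothetical predictor: share of rows whose maximum is also its column's maximum."""
    cols = list(zip(*sim))
    hits = 0
    for row in sim:
        j = row.index(max(row))  # first column holding the row maximum
        if row[j] == max(cols[j]):
            hits += 1
    return hits / len(sim)

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between predicted values and measured quality."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In use, a predictor is evaluated over many matching tasks. Each predicted value is paired with the matcher's measured precision or recall on that task, and a high `pearson` value over those pairs indicates a useful predictor.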

Cooperative Information Systems | 2013

Completeness and Ambiguity of Schema Cover

Avigdor Gal; Michael Katz; Tomer Sagi; Matthias Weidlich; Karl Aberer; Hung Quoc Viet Nguyen; Zoltán Miklós; Eliezer Levy; Victor Shafran

Given a schema and a set of concepts, representative of entities in the domain of discourse, schema cover defines correspondences between concepts and parts of the schema. Schema cover aims at interpreting the schema in terms of concepts, thus vastly simplifying the task of schema integration. In this work we investigate two properties of schema cover, namely completeness and ambiguity. The former measures the part of a schema that can be covered by a set of concepts and the latter examines the amount of overlap between concepts in a cover. To study the tradeoffs between completeness and ambiguity, we define a cover model of which previous frameworks are special cases. We analyze the theoretical complexity of variations of the cover problem, some of which aim at maximizing completeness while others aim at minimizing ambiguity. We show that variants of the schema cover problem are hard problems in general and formulate an exhaustive search solution using integer linear programming. We then provide a thorough empirical analysis, using both real-world and simulated data sets, showing that the integer linear programming solution scales well for large schemata. We also show that some instantiations of the general schema cover problem are more effective than others.
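The tradeoff can be illustrated with a toy model, under stated assumptions. Each concept is represented only by the set of schema attributes it matches. Completeness is the covered share of the schema. Ambiguity is a hypothetical overlap count: how many times attributes are covered beyond once. Plain brute-force enumeration stands in for the paper's integer linear programming formulation. It is exponential in the number of concepts and only suits tiny inputs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

def completeness(schema: Set[str], cover: List[FrozenSet[str]]) -> float:
    """Share of schema attributes matched by at least one concept in the cover."""
    covered = set().union(*cover) if cover else set()
    return len(covered & schema) / len(schema)

def ambiguity(cover: List[FrozenSet[str]]) -> int:
    """Hypothetical overlap measure: extra coverings beyond the first, summed over attributes."""
    total = sum(len(c) for c in cover)
    distinct = len(set().union(*cover)) if cover else 0
    return total - distinct

def best_cover(
    schema: Set[str], concepts: Dict[str, FrozenSet[str]], max_ambiguity: int = 0
) -> Tuple[Tuple[str, ...], float]:
    """Exhaustive search: maximize completeness subject to an ambiguity budget."""
    best: Tuple[str, ...] = ()
    best_score = 0.0
    names = sorted(concepts)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            parts = [concepts[n] for n in combo]
            if ambiguity(parts) > max_ambiguity:
                continue  # cover overlaps too much
            score = completeness(schema, parts)
            if score > best_score:
                best, best_score = combo, score
    return best, best_score
```

Suppose the schema is {a, b, c, d} and the concepts are Person = {a, b}, Address = {c} and Contact = {b, d}. With no overlap allowed, the best cover reaches completeness 0.75. Allowing one overlapping attribute lets all three concepts cover the whole schema, which shows the completeness/ambiguity tradeoff directly.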

International Conference on Conceptual Modeling | 2012

Non-binary evaluation for schema matching

Tomer Sagi; Avigdor Gal

In this work we extend the commonly used binary evaluation of schema matching to support evaluation methods for non-binary matching results as well. We motivate our work with some new applications of schema matching. Non-binary evaluation is formally defined together with two new, non-binary evaluation measures using a vector-space representation of schema matching outcome. We provide an empirical evaluation to support the usefulness of non-binary evaluation and show its superiority to its binary counterpart.
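One natural way to picture a vector-space, non-binary evaluation is sketched below. Both the matcher's outcome and the exact match are treated as vectors over attribute pairs, and overlap becomes an inner product. This formula is an assumption for illustration and may differ from the two measures defined in the paper. Its useful sanity property is that with 0/1 values it reduces exactly to ordinary precision and recall.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (source attribute, target attribute)

def nb_precision_recall(match: Dict[Pair, float], exact: Dict[Pair, float]) -> Tuple[float, float]:
    """Hypothetical vector-space generalization of precision and recall.

    Both outcomes are sparse vectors over attribute pairs (a missing pair counts as 0).
    With 0/1 values these reduce to ordinary binary precision and recall.
    """
    overlap = sum(v * exact.get(pair, 0.0) for pair, v in match.items())  # inner product
    m_mass = sum(match.values())  # plays the role of |M| in binary precision
    e_mass = sum(exact.values())  # plays the role of |E| in binary recall
    precision = overlap / m_mass if m_mass else 0.0
    recall = overlap / e_mass if e_mass else 0.0
    return precision, recall
```

For a matcher that spreads confidence 0.8 on a correct pair and 0.2 on a wrong one, this yields 0.8 for both measures. A binary evaluation would first have to threshold the confidences away.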

International World Wide Web Conferences | 2016

From Diversity-based Prediction to Better Ontology & Schema Matching

Avigdor Gal; Haggai Roitman; Tomer Sagi

Ontology & schema matching predictors assess the quality of matchers in the absence of an exact match. We propose MCD (Match Competitor Deviation), a new diversity-based predictor that compares the strength of a matcher's confidence in the correspondence of a concept pair with respect to other correspondences that involve either concept. We also propose to use MCD as a regulator to optimally control the balance between Precision and Recall, and use it towards 1:1 matching by combining it with a similarity measure that is based on solving a maximum weight bipartite graph matching (MWBM). Optimizing the combined measure is known to be an NP-hard problem. Therefore, we propose CEM, which approximates an optimal match by efficiently scanning multiple possible matches using rare-event estimation. Using a thorough empirical study over several benchmark real-world datasets, we show that MCD outperforms other state-of-the-art predictors and that CEM significantly outperforms existing matchers.
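The sketch below covers two of the ingredients named above under simplifying assumptions. The first is a competitor deviation for one concept pair: its confidence minus the mean confidence of the other entries in its row and column, a hypothetical reading of MCD rather than the paper's formula. The second is an exact MWBM computed by brute force over permutations. CEM and the combined NP-hard objective are not implemented here.

```python
import math
from itertools import permutations
from typing import List, Tuple

Matrix = List[List[float]]  # matcher confidences: rows = concepts of one side, columns = the other

def competitor_deviation(sim: Matrix, i: int, j: int) -> float:
    """Confidence of (i, j) minus the mean of its competitors, i.e. the other
    entries in row i or column j (hypothetical definition)."""
    rivals = [sim[i][c] for c in range(len(sim[i])) if c != j]
    rivals += [sim[r][j] for r in range(len(sim)) if r != i]
    return sim[i][j] - (sum(rivals) / len(rivals) if rivals else 0.0)

def matrix_deviation_score(sim: Matrix) -> float:
    """Matrix-level predictor: root-mean-square of the per-entry deviations."""
    devs = [competitor_deviation(sim, i, j) for i in range(len(sim)) for j in range(len(sim[0]))]
    return math.sqrt(sum(d * d for d in devs) / len(devs))

def mwbm(sim: Matrix) -> List[Tuple[int, int]]:
    """Exact maximum weight bipartite matching by brute force.

    Assumes rows <= columns and a tiny matrix; the cost grows factorially.
    """
    rows = len(sim)
    best: List[Tuple[int, int]] = []
    best_w = -math.inf
    for perm in permutations(range(len(sim[0])), rows):
        w = sum(sim[i][perm[i]] for i in range(rows))
        if w > best_w:
            best, best_w = [(i, perm[i]) for i in range(rows)], w
    return best
```

On the matrix [[0.9, 0.8], [0.7, 0.1]], greedily taking the strongest entry (0.9) forces a total weight of 1.0, whereas `mwbm` returns [(0, 1), (1, 0)] with total 1.5. This is why a 1:1 constraint calls for a global matching step. For realistic sizes, a polynomial solver such as the Hungarian algorithm would replace the brute force.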

International Conference on Data Engineering | 2014

In schema matching, even experts are human: Towards expert sourcing in schema matching

Tomer Sagi; Avigdor Gal

Schema matching problems have historically been defined as a semi-automated task in which correspondences are generated by matching algorithms and subsequently validated by a single human expert. Emerging alternative models are based upon piecemeal human validation of algorithmic results and the usage of crowd-based validation. We propose an alternative model in which human and algorithmic matchers are given more symmetric roles. Under this model, better insight into the respective strengths and weaknesses of human and algorithmic matchers is required. We present initial insights from a pilot study and outline future work in this area.

Information Systems | 2017

Multi-source uncertain entity resolution

Tomer Sagi; Avigdor Gal; Omer Barkol; Ruth Bergman; Alexander Avram

In this work we present a multi-source uncertain entity resolution model and show its implementation in a use case of Yad Vashem, the central repository of Holocaust-era information. The Yad Vashem dataset is unique with respect to classic entity resolution by virtue of being both massively multi-source and requiring multi-level entity resolution. With today's abundance of information sources, this project motivates the use of multi-source resolution on a big-data scale. We instantiate the proposed model using the MFIBlocks entity resolution algorithm and a machine learning approach based upon decision trees to transform soft clusters into a ranked clustering of records, representing possible entities. An extensive empirical evaluation demonstrates the unique properties of this dataset that make it a good candidate for multi-source entity resolution. We conclude by proposing avenues for future research in this realm.

Highlights:
- Uncertain entity resolution allows creating multiple narratives from complementary sources of data.
- The approach was demonstrated during a unique project performed on the Yad Vashem Names database.
- Algorithms implementing the approach were empirically evaluated on a tagged subset, in various configurations and against equivalent algorithms.
- The accurate and insightful results are being integrated into Yad Vashem systems and user applications.
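MFIBlocks builds blocks from maximal frequent itemsets. The sketch below uses a much simpler stand-in that groups records sharing the same fixed-size set of k tokens. Because a record can land in several blocks, the result is a set of soft clusters. Candidates are then ranked by how many blocks they share, a crude heuristic used here in place of the paper's decision-tree model. Record layout, token normalization and all names are assumptions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Record = Tuple[str, Set[str]]  # (record id, normalized tokens); records pooled from all sources

def itemset_blocks(records: List[Record], k: int = 2, min_support: int = 2) -> Dict[FrozenSet[str], Set[str]]:
    """Group records that share the same k tokens (simplified stand-in for MFIBlocks).

    A record may appear in several blocks, so the blocks form soft clusters.
    """
    index: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for rid, tokens in records:
        for combo in combinations(sorted(tokens), k):
            index[frozenset(combo)].add(rid)
    # Keep only itemsets supported by enough records to suggest a shared entity.
    return {items: rids for items, rids in index.items() if len(rids) >= min_support}

def ranked_candidates(rid: str, blocks: Dict[FrozenSet[str], Set[str]]) -> List[Tuple[str, int]]:
    """Rank other records by the number of blocks shared with `rid` (hypothetical heuristic)."""
    counts: Dict[str, int] = defaultdict(int)
    for rids in blocks.values():
        if rid in rids:
            for other in rids:
                if other != rid:
                    counts[other] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Consider a testimony record {sara, levi, warsaw, 1920} and a list record {sara, levi, warsaw}. They share three 2-token blocks and rank each other first. A record sharing only "sara" with them shares no block at k = 2.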

IEEE International Conference on Services Computing | 2014

Measuring Expected Integration Effort in Service Composition

Tomer Sagi; Avigdor Gal; Matthias Weidlich

Evaluating alternative solutions for service composition relies on various properties, each requiring an associated evaluation measure. In this paper, we propose a new measure, namely integration effort, to capture the effort a human programmer is expected to invest in integrating composed services into a functioning process. We present several integration effort evaluation measures, which were adapted from the related research areas of schema and ontology matching. These measures are embedded in an extensible framework, allowing application at different levels of refinement. Our measures are empirically validated to be effective proxies of integration effort.
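The abstract names integration effort but does not give its formulas. The sketch below is a hypothetical illustration of adapting a matching-style measure. It counts the corrections a programmer would make to a set of proposed parameter links between two services: wrong links to delete plus missing links to add. The `Pair` layout, the function names and the normalization are assumptions, not the paper's measures.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # hypothetical link: (output parameter of service A, input parameter of service B)

def integration_effort(proposed: Set[Pair], exact: Set[Pair]) -> int:
    """Hypothetical effort proxy: wrong links to delete plus missing links to add."""
    return len(proposed - exact) + len(exact - proposed)

def normalized_effort(proposed: Set[Pair], exact: Set[Pair]) -> float:
    """Effort relative to building the exact links from scratch (|exact| additions)."""
    return integration_effort(proposed, exact) / len(exact) if exact else 0.0
```

A value of 1.0 means the proposed composition saves the programmer nothing compared with an empty proposal. Values below 1.0 indicate net savings.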
