Marcin Szymczak
Polish Academy of Sciences
Publication
Featured research published by Marcin Szymczak.
Information Sciences | 2015
Marcin Szymczak; Sławomir Zadrożny; Antoon Bronselaer; Guy De Tré
Preserving data quality is an important issue in data collection management. One of the crucial issues here is the detection of duplicate objects (called coreferent objects), which describe the same entity but in different ways. In this paper we present a method for detecting coreferent objects in metadata, in particular in XML schemas. Our approach consists in comparing the paths from a root element to a given element in the schema. Each path precisely defines the context and location of a specific element in the schema. Path matching is based on the comparison of the individual steps of which paths are composed. The uncertainty about the matching of steps is expressed with possibilistic truth values and aggregated using the Sugeno integral. The discovered coreference of paths can help establish a mapping between two different XML schemas. In other words, a novel approach to the schema matching problem, based solely on path comparison, is proposed.
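The aggregation step mentioned in this abstract can be illustrated with a minimal sketch of the discrete Sugeno integral. It is not the authors' implementation: step matches are simplified to scalar degrees in [0, 1] rather than possibilistic truth values, and the names `sugeno_integral` and `cardinality_measure`, as well as the choice of a cardinality-based fuzzy measure, are assumptions made for illustration only.

```python
from typing import Callable, FrozenSet, Sequence


def sugeno_integral(
    scores: Sequence[float],
    measure: Callable[[FrozenSet[int]], float],
) -> float:
    """Discrete Sugeno integral of `scores` with respect to a fuzzy measure.

    `measure` maps a set of step indices to [0, 1]; it is assumed to be
    monotone, with measure(all indices) == 1.
    """
    if not scores:
        raise ValueError("at least one score is required")
    # Indices ordered by ascending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    best = 0.0
    for pos, i in enumerate(order):
        # Set of steps whose score is at least scores[i].
        tail = frozenset(order[pos:])
        best = max(best, min(scores[i], measure(tail)))
    return best


def cardinality_measure(n: int) -> Callable[[FrozenSet[int]], float]:
    """Hypothetical fuzzy measure: the fraction of steps in the subset."""
    def measure(subset: FrozenSet[int]) -> float:
        return len(subset) / n
    return measure


# Hypothetical per-step matching degrees for two three-step paths.
step_scores = [0.9, 0.6, 0.8]
path_score = sugeno_integral(step_scores, cardinality_measure(len(step_scores)))
print(round(path_score, 3))
```

With this measure the result lies between the weakest and the strongest step score; a measure that is 1 only for the full set of steps reduces the integral to the minimum, and one that is 1 for every non-empty set reduces it to the maximum.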
Joint IFSA World Congress and NAFIPS Annual Meeting | 2013
Marcin Szymczak; Sławomir Zadrożny; Guy De Tré
Preserving data quality is an important issue in data collection management. One of the crucial issues here is the detection of duplicate objects (called coreferent objects), which describe the same entity but in different ways. In this paper we present a method for detecting coreferent objects in metadata, in particular in XML schemas. Our approach consists in comparing the paths from a root element to a given element in the schema. Each path precisely defines the context and location of a specific element in the schema. Path matching is based on the comparison of the individual steps of which paths are composed. The uncertainty about the matching of steps is expressed with possibilistic truth values and aggregated using the Sugeno integral. The discovered coreference of paths can help determine the coreference of different XML schemas.
Information Fusion | 2016
Antoon Bronselaer; Marcin Szymczak; Sławomir Zadrożny; Guy De Tré
Fusion functions based on order relations are formalized. It is pointed out that an appropriate order relation is not always at hand. The DOC algorithm to construct an appropriate order relation dynamically, is provided. Selection strategies are discussed. A thorough experimental evaluation shows the benefits of the proposed techniques. A crucial operation in the maintenance of data quality in relational databases is to remove tuples that mutually describe the same entity (i.e., duplicate tuples) and to replace them with a tuple that minimizes information loss. A function that combines multiple tuples into one is called a fusion function. In this paper, we investigate fusion functions for attributes of which the values can be sorted by means of an order relation that reflects a notion of generality. It is shown that providing such an order relation a priori, let alone keeping it up-to-date, is a costly operation. Therefore, the Dynamical Order Construction (DOC) algorithm is proposed that constructs an order relation in an automated fashion upon inspecting the data that need to be fused. Such order relations can be immediately deployed in a framework of selectional fusion functions, which are fusion functions that adopt the sort-and-select principle. These fusion functions are investigated closely in terms of their selection strategies. An experimental evaluation of our method shows the influence of the parameters and the benefit with respect to using a fixed and predefined taxonomy.
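The sort-and-select principle named in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the DOC algorithm or the paper's selection strategies: the generality order is supplied as a small predefined taxonomy (a child-to-parent dictionary), depth in that taxonomy stands in for specificity, and the names `depth` and `select_most_specific` are hypothetical.

```python
from typing import Dict, Iterable


def depth(value: str, parent: Dict[str, str]) -> int:
    """Number of steps from `value` up to a root of the assumed taxonomy."""
    steps = 0
    seen = set()
    while value in parent and value not in seen:  # guard against cycles
        seen.add(value)
        value = parent[value]
        steps += 1
    return steps


def select_most_specific(values: Iterable[str], parent: Dict[str, str]) -> str:
    """Sort candidate values from most to least specific, then select the first.

    Ties are broken alphabetically so that the result is deterministic.
    """
    ranked = sorted(set(values), key=lambda v: (-depth(v, parent), v))
    if not ranked:
        raise ValueError("at least one value is required")
    return ranked[0]


# Hypothetical taxonomy: each key is more specific than its parent.
taxonomy = {"restaurant": "venue", "pizzeria": "restaurant"}

# Attribute values taken from three duplicate tuples describing one entity.
fused = select_most_specific(["restaurant", "pizzeria", "venue"], taxonomy)
print(fused)  # pizzeria
```

Selecting the most specific value is only one possible strategy; choosing a more general value trades detail for a lower risk of keeping an incorrect one.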
North American Fuzzy Information Processing Society | 2012
Marcin Szymczak; Antoon Bronselaer; Sławomir Zadrożny; Guy De Tré
Modern database systems allow information from the real world to be described in a well-structured manner. Unfortunately, many databases suffer from quality problems. One of these problems is the existence of coreferent data, which means that the same real-world entity is described multiple times within one database. Due to errors, inaccuracies and lack of standardization, coreferent data are not bound to be equal, which makes finding coreferent data a challenging topic. In this paper, we contribute to the field of coreference detection by proposing an automated and dynamic method for the construction of a binary relation R that models semantic knowledge between attribute values. The advantages of the proposed method are twofold: no effort needs to be put into the construction of knowledge bases, and mismatches between the database and the knowledge base are avoided.
Advances in Intelligent Systems and Computing | 2014
Marcin Szymczak; Antoon Bronselaer; Sławomir Zadrożny; Guy De Tré
Useful information is often scattered over multiple sources. Therefore, automatic data integration that guarantees high data quality is extremely important. One of the crucial operations in data integration from different sources is the detection of different representations of the same piece of information (called coreferent data) and their translation to a common, unified representation. That translation is also known as value mapping. However, value mappings are often not explicit, i.e. a specific value may be mapped to more than one value. In this paper, we investigate an automatic selection method that reduces a set of one-to-many mappings to a set of one-to-one mappings for attributes whose domains are partially ordered and where the given order relation reflects a notion of generality.
Challenging Problems and Solutions in Intelligent Systems | 2016
Marcin Szymczak; Antoon Bronselaer; Sławomir Zadrożny; Guy De Tré
A novel automatic method for detecting corresponding attributes in schemas based on content data is studied. More specifically, our proposed method for the detection of coreferent attributes in schemas is based on a statistical and lexical comparison of content data and on coreferent tuples detected across multiple datasets, which increases the likelihood of correct schema matching. We will show that knowledge of even a small number of coreferent tuples is sufficient to establish a correct matching between corresponding attributes of heterogeneous schemas. The behaviour of the novel schema matching technique has been evaluated on several real-life datasets, giving valuable insight into the influence of the different parameters of our approach on the results obtained.
IEEE International Conference on Intelligent Systems | 2015
Marcin Szymczak; Antoon Bronselaer; Sławomir Zadrożny; Guy De Tré
Nowadays the amount of data is increasing very fast. Moreover, useful information is scattered over multiple sources. Therefore, automatic data integration that guarantees high data quality is extremely important. One of the crucial operations in the integration of information from independent databases is the detection of different representations of the same piece of information (called coreferent data) and the translation of the representation of data from one source into the representation of the other source. That translation is also known as object mapping. In this paper, we investigate automatic mapping methods for attributes whose values may need semantic comparison and can be sorted by means of an order relation that reflects a notion of generality. These mapping methods are investigated closely in terms of their effectiveness. An experimental evaluation of our method shows that using different mapping methods can enlarge the set of true positive mappings.
2014 IEEE Conference on Norbert Wiener in the 21st Century (21CW) | 2014
Marcin Szymczak; Antoon Bronselaer; Sławomir Zadrożny; Guy De Tré
Nowadays the amount of data is increasing very fast. Moreover, useful information is scattered over multiple sources. Therefore, automatic data integration that guarantees high data quality is extremely important. One of the crucial operations in the integration of information from independent databases is the detection of different representations of the same piece of information (called coreferent data) and the translation of the representation of data from one source into the representation of the other source. That translation is also known as object mapping. In this paper, we investigate automatic mapping methods for attributes whose values may need semantic comparison and can be sorted by means of an order relation that reflects a notion of generality. These mapping methods are investigated closely in terms of their effectiveness. An experimental evaluation of our method shows that using different mapping methods can enlarge the set of true positive mappings.
New developments in fuzzy sets, intuitionistic fuzzy sets, generalized nets and related topics : application | 2012
Marcin Szymczak; Julius Koepke
Challenging problems and solutions in computational intelligence | 2016
Marcin Szymczak; Antoon Bronselaer; Sławomir Zadrożny; Guy De Tré