Kostyantyn M. Shchekotykhin
Alpen-Adria-Universität Klagenfurt
Publications
Featured research published by Kostyantyn M. Shchekotykhin.
Journal of Web Semantics | 2012
Kostyantyn M. Shchekotykhin; Gerhard Friedrich; Philipp Fleiss; Patrick Rodler
Effective debugging of ontologies is an important prerequisite for their broad application, especially in areas that rely on everyday users to create and maintain knowledge bases, such as the Semantic Web. In such systems, ontologies capture formalized vocabularies of terms shared by their users. However, in many cases users have different local views of the domain, i.e., of the context in which a given term is used. Inappropriate usage of terms, together with the natural difficulty of formulating and understanding logical descriptions, may result in faulty ontologies. Recent ontology debugging approaches use diagnosis methods to identify the causes of the faults. In most debugging scenarios these methods return many alternative diagnoses, thus placing the burden of fault localization on the user. This paper demonstrates how the target diagnosis can be identified by performing a sequence of observations, that is, by querying an oracle about entailments of the target ontology. To identify the best query we propose two query selection strategies: a simple “split-in-half” strategy and an entropy-based strategy. The latter allows knowledge about typical user errors to be exploited to minimize the number of queries. Our evaluation showed that the entropy-based method significantly reduces the number of required queries compared to the “split-in-half” approach. We experimented with different probability distributions of user errors and different qualities of the a priori probabilities. Our measurements demonstrated the superiority of entropy-based query selection even in cases where all fault probabilities are equal, i.e., where no information about typical user errors is available.
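To make the two strategies concrete, here is a minimal Python sketch of diagnosis-discriminating query scoring. It assumes diagnoses are identified by keys with normalized fault probabilities, and that each candidate query partitions them into "predicts yes", "predicts no", and uncommitted sets; the half-mass convention for uncommitted diagnoses and all function names are illustrative, not taken from the paper.

```python
import math

def entropy(probs):
    """Shannon entropy of a probability distribution (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_entropy(d_pos, d_neg, d_zero):
    """Expected entropy of the diagnosis distribution after the oracle answers.

    d_pos / d_neg / d_zero map diagnosis ids to probabilities for diagnoses
    predicting "yes", predicting "no", and uncommitted ones. Uncommitted
    diagnoses contribute half their mass to each answer (an assumed, common
    convention); an answer eliminates the diagnoses predicting its opposite.
    """
    p_yes = sum(d_pos.values()) + 0.5 * sum(d_zero.values())
    score = 0.0
    for p_ans, survivors in ((p_yes, {**d_pos, **d_zero}),
                             (1.0 - p_yes, {**d_neg, **d_zero})):
        total = sum(survivors.values())
        if p_ans <= 0 or total == 0:
            continue
        score += p_ans * entropy([p / total for p in survivors.values()])
    return score

def split_in_half_score(d_pos, d_neg, d_zero):
    """Lower is better: prefer queries that split the diagnoses evenly
    and leave few uncommitted ones."""
    return abs(len(d_pos) - len(d_neg)) + len(d_zero)

def best_query(queries, strategy):
    """`queries` maps a query id to its (d_pos, d_neg, d_zero) partition;
    return the id whose partition minimizes the chosen strategy's score."""
    return min(queries, key=lambda q: strategy(*queries[q]))
```

`best_query(queries, expected_entropy)` and `best_query(queries, split_in_half_score)` then realize the two strategies over the same set of candidate queries.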
Journal of Web Semantics | 2009
Dietmar Jannach; Kostyantyn M. Shchekotykhin; Gerhard Friedrich
The process of populating an ontology-based system with high-quality and up-to-date instance information can be both time-consuming and prone to error. In many domains, however, one possible solution to this problem is to automate the instantiation process for a given ontology by searching (mining) the web for the required instance information. The primary challenges facing such a system include: (a) efficiently locating web pages that most probably contain the desired instance information, (b) extracting the instance information from a page, and (c) clustering documents that describe the same instance in order to exploit data redundancy on the web and thus improve the overall quality of the harvested data. In addition, these steps should require as little seed knowledge as possible. In this paper, the AllRight ontology instantiation system is presented, which supports the full instantiation life-cycle and addresses the above-mentioned challenges through a combination of new and existing techniques. In particular, the system was designed to deal with situations where the instance information is given in tabular form. The main innovative pillars of the system are a new high-recall focused crawling technique (xCrawl), a novel table recognition algorithm, innovative methods for document clustering and instance name recognition, as well as techniques for fact extraction, instance generation and query-based fact validation. The successful evaluation of the system in different real-world application scenarios shows that the ontology instantiation process can be automated using only a very limited amount of seed knowledge.
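The redundancy argument behind step (c) can be illustrated with a toy majority-vote validator: when several clustered pages describe the same instance, agreement across pages lends confidence to a fact. AllRight's actual query-based validation is more sophisticated; the attribute names below are invented.

```python
from collections import Counter

def validate_facts(extractions):
    """Majority-vote fact validation: `extractions` maps an attribute name to
    the list of values harvested for one instance across redundant pages.
    Returns the consensus value per attribute together with its support
    (fraction of pages agreeing). Illustrative only, not AllRight's algorithm."""
    consensus = {}
    for attribute, values in extractions.items():
        value, votes = Counter(values).most_common(1)[0]
        consensus[attribute] = (value, votes / len(values))
    return consensus

# Example: three pages describing the same camera disagree on one attribute.
facts = {"resolution": ["12 MP", "12 MP", "12.1 MP"],
         "weight": ["450 g", "450 g", "450 g"]}
print(validate_facts(facts))
# {'resolution': ('12 MP', 0.666...), 'weight': ('450 g', 1.0)}
```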
Applied Intelligence | 2009
Alexander Felfernig; Gerhard Friedrich; Klaus Isak; Kostyantyn M. Shchekotykhin; Erich Christian Teppan; Dietmar Jannach
Customers interacting with online selling platforms require the assistance of sales support systems in the product and service selection process. Knowledge-based recommenders are specific sales support systems which engage online customers in dialogs with the goal of supporting preference-forming processes. These systems have been successfully deployed in commercial environments to recommend, e.g., financial services, e-tourism services, or consumer goods. However, the development of the user interface descriptions and knowledge bases underlying knowledge-based recommenders is often an error-prone and frustrating task. In this paper we focus on the first aspect and present an approach which supports knowledge engineers in identifying faults in user interface descriptions. These descriptions are the input for a model-based diagnosis algorithm which automatically identifies faulty elements and points them out to the knowledge engineer. In addition, we present the results of an empirical study which demonstrates the applicability of our approach.
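The diagnosis step rests on the classical model-based idea that candidate fault sets are minimal hitting sets of conflict sets. The brute-force sketch below shows that core notion; real diagnosis engines use Reiter's HS-tree or similar, and the UI element names are hypothetical.

```python
from itertools import combinations

def minimal_hitting_sets(conflicts, max_size=4):
    """Enumerate minimal hitting sets of a family of conflict sets: each one
    is a candidate set of faulty elements. Brute force over subset size for
    clarity; `max_size` bounds the diagnosis cardinality considered."""
    universe = sorted(set().union(*conflicts))
    found = []
    for size in range(1, max_size + 1):
        for candidate in combinations(universe, size):
            cand = set(candidate)
            if any(cand >= f for f in found):
                continue  # contains a smaller hitting set, so not minimal
            if all(cand & c for c in conflicts):
                found.append(cand)
    return found

# Two conflicts among UI elements yield two minimal diagnoses:
# {'e2'} (single fault) and {'e1', 'e3'} (double fault).
print(minimal_hitting_sets([{"e1", "e2"}, {"e2", "e3"}]))
```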
International Semantic Web Conference | 2007
Kostyantyn M. Shchekotykhin; Dietmar Jannach; Gerhard Friedrich; Olga Kozeruk
Manually instantiating an ontology with high-quality and up-to-date instance information is both time-consuming and prone to error. Automatic ontology instantiation from Web sources is one possible solution to this problem and aims at the computer-supported population of an ontology through the exploitation of (redundant) information available on the Web. In this paper we present ALLRIGHT, a comprehensive ontology instantiation system. In particular, the techniques implemented in ALLRIGHT are designed for application scenarios in which the desired instance information is given in the form of tables and for which existing Information Extraction (IE) approaches based on statistical or natural language processing methods are not directly applicable. Within ALLRIGHT, we have therefore developed new techniques for dealing with tabular instance data and combined them with existing methods. The system supports all necessary steps of ontology instantiation, i.e., web crawling, name extraction, document clustering, as well as fact extraction and validation. ALLRIGHT has been successfully evaluated in the popular domains of digital cameras and notebooks, achieving about eighty percent accuracy of the extracted facts given only a very limited amount of seed knowledge.
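As a rough illustration of what "instance information given in the form of tables" means for extraction, the sketch below maps attribute/value rows of a spec table onto ontology slots via a synonym table. The slot names and synonyms are invented, and ALLRIGHT's table recognition algorithm is considerably more involved.

```python
# Hypothetical slot vocabulary for a digital-camera ontology.
SLOT_SYNONYMS = {
    "resolution": {"resolution", "effective pixels", "megapixels"},
    "zoom": {"zoom", "optical zoom"},
}

def table_to_instance(rows):
    """Map (attribute, value) rows of a spec table to known ontology slots;
    unmatched rows are kept aside for inspection rather than silently dropped."""
    instance, unmatched = {}, []
    for attribute, value in rows:
        key = attribute.strip().lower()
        for slot, synonyms in SLOT_SYNONYMS.items():
            if key in synonyms:
                instance[slot] = value
                break
        else:
            unmatched.append((attribute, value))
    return instance, unmatched

rows = [("Effective Pixels", "12.1 MP"), ("Optical Zoom", "5x"), ("Colour", "black")]
print(table_to_instance(rows))
# ({'resolution': '12.1 MP', 'zoom': '5x'}, [('Colour', 'black')])
```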
European Conference on Artificial Intelligence | 2014
Kostyantyn M. Shchekotykhin; Gerhard Friedrich; Patrick Rodler; Philipp Fleiss
Sequential diagnosis methods compute a series of queries for discriminating between diagnoses. Queries are answered by probing such that eventually the set of faults is identified. The computation of queries is based on the generation of a set of most probable diagnoses. However, in diagnosis problem instances where the number of minimal diagnoses and their cardinality are high, even the generation of a set of minimum-cardinality diagnoses is infeasible with the standard conflict-based approach. In this paper we propose to base sequential diagnosis on the computation of some set of minimal diagnoses using the direct diagnosis method, which requires fewer consistency checks to find a minimal diagnosis than the standard approach. We study the application of this direct method to high-cardinality faults in knowledge bases. In particular, our evaluation shows that the direct method requires almost the same number of queries in cases where the standard approach is applicable; in cases where the standard approach is not applicable, sequential diagnosis based on the direct method is still able to locate the faults correctly.
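The contrast with the conflict-based route can be illustrated by a linear "grow"-style pass that computes one minimal diagnosis directly, using one consistency check per axiom and no conflict computation. The paper's direct method is a divide-and-conquer variant with fewer checks, so treat this only as a sketch of the idea; the toy oracle is invented.

```python
def direct_diagnosis(axioms, consistent):
    """Keep each axiom whose addition leaves the kept set consistent; the
    rejected axioms form a minimal diagnosis, i.e. the complement of a
    maximal consistent subset (minimality holds for monotonic logics)."""
    kept, diagnosis = [], []
    for axiom in axioms:
        if consistent(kept + [axiom]):
            kept.append(axiom)
        else:
            diagnosis.append(axiom)
    return diagnosis

# Toy knowledge base: the first two axioms contradict each other.
axioms = ["x>0", "x<0", "x!=0"]
def consistent(subset):
    return not {"x>0", "x<0"} <= set(subset)

print(direct_diagnosis(axioms, consistent))  # ['x<0']
```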
International Conference on Data Mining | 2009
Kostyantyn M. Shchekotykhin; Gerhard Friedrich
Efficient acquisition of constraint networks is a key factor for the applicability of constraint problem solving methods. Current techniques learn constraint networks from sets of training examples, where each example is classified as either a solution or a non-solution of a target network. However, in addition to this classification, an expert can usually provide arguments as to why examples should be rejected or accepted. Generally speaking, domain specialists have partial knowledge about the theory to be acquired, which can be exploited for knowledge acquisition. Based on this observation, we discuss the various types of arguments an expert can formulate and develop a knowledge acquisition algorithm for processing them, giving the expert the possibility to supply arguments in addition to the training examples. The result of this approach is a significant reduction in the number of examples which must be provided to the learner in order to learn the target constraint network.
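A toy sketch of why an argument prunes the hypothesis space faster than a bare classification, assuming a candidate bias of named constraints. The argument format here ("rejected because of constraint c") is a simplification of the argument types discussed in the paper.

```python
# Hypothetical candidate bias: named predicates over a variable assignment.
CANDIDATES = {
    "x_lt_y": lambda a: a["x"] < a["y"],
    "x_ne_y": lambda a: a["x"] != a["y"],
    "y_lt_10": lambda a: a["y"] < 10,
}

def prune_with_example(candidates, assignment, is_solution):
    """A solution must satisfy every target constraint, so any candidate it
    violates is dropped. A plain non-solution is much weaker: it only says
    *some* constraint is violated, eliminating no individual candidate."""
    if is_solution:
        return {n: c for n, c in candidates.items() if c(assignment)}
    return candidates

def prune_with_argument(candidates, assignment, blamed):
    """Argument: 'this example is rejected *because of* `blamed`'. The blamed
    candidate must indeed be violated, which both validates the argument and
    immediately pinpoints one target constraint."""
    assert not candidates[blamed](assignment), "argument inconsistent with example"
    return candidates, {blamed}

cands = prune_with_example(CANDIDATES, {"x": 1, "y": 2}, is_solution=True)
cands, confirmed = prune_with_argument(cands, {"x": 5, "y": 5}, blamed="x_ne_y")
print(sorted(cands), confirmed)  # ['x_lt_y', 'x_ne_y', 'y_lt_10'] {'x_ne_y'}
```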
Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2010
Kostyantyn M. Shchekotykhin; Dietmar Jannach; Gerhard Friedrich
Web mining systems exploit the redundancy of data published on the Web to automatically extract information from existing Web documents. The first step in the Information Extraction process is thus to locate as many Web pages as possible that contain relevant information within a limited period of time, a task which is commonly accomplished by applying focused crawling techniques. The performance of such a crawler can be measured by its “recall”, i.e., the percentage of documents found and identified as relevant compared to the total number of existing documents. A higher recall value implies that more redundant data are available, which in turn leads to better results in the subsequent fact extraction phase of the Web mining process. In this paper, we propose xCrawl, a new focused crawling method which outperforms state-of-the-art approaches with respect to the recall values achievable within a given period of time. This method is based on a new combination of ideas and techniques used to identify and exploit the navigational structures of Web sites, such as hierarchies, lists, or maps. In addition, automatic query generation is applied to rapidly collect Web sources containing target documents. The proposed crawling technique was inspired by the requirements of a Web mining system developed to extract product and service descriptions given in tabular form and was evaluated in different application scenarios. Comparisons with existing focused crawling techniques reveal that the new crawling method leads to a significant increase in recall while maintaining precision.
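The core of any such crawler is a best-first frontier ordered by estimated relevance; the skeleton below shows that loop, with `fetch` and `relevance` as assumed callbacks. xCrawl's contribution lies in how relevance is estimated (navigational structures, automatic query generation), which is not reproduced here. Recall would then be `len(relevant)` divided by the total number of relevant documents, measurable only against a known corpus.

```python
import heapq

def focused_crawl(seeds, fetch, relevance, budget=100):
    """Best-first focused crawling skeleton. The frontier is a priority queue
    ordered by estimated relevance (negated, since heapq is a min-heap), so
    pages most likely to contain target documents are fetched first.
    `fetch(url)` returns (is_relevant, outlinks); `relevance(url)` scores an
    unvisited URL. Both callbacks are assumptions of this sketch."""
    frontier = [(-relevance(u), u) for u in seeds]
    heapq.heapify(frontier)
    seen, relevant = set(seeds), []
    while frontier and budget > 0:
        _, url = heapq.heappop(frontier)
        budget -= 1
        is_relevant, outlinks = fetch(url)
        if is_relevant:
            relevant.append(url)
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-relevance(link), link))
    return relevant
```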
Theory and Practice of Logic Programming | 2016
Carmine Dodaro; Philip Gasteiger; Nicola Leone; Benjamin Musitsch; Francesco Ricca; Kostyantyn M. Shchekotykhin
Answer Set Programming (ASP) is a popular logic programming paradigm that has been applied to solving a variety of complex problems. Among the most challenging real-world applications of ASP are two industrial problems defined by Siemens: the Partner Units Problem (PUP) and the Combined Configuration Problem (CCP). The hardest instances of PUP and CCP are out of reach for state-of-the-art ASP solvers. Experiments show that the performance of ASP solvers could be significantly improved by embedding domain-specific heuristics, but an effective integration of such criteria into off-the-shelf ASP implementations is not obvious. In this paper the combination of ASP and domain-specific heuristics is studied with the goal of effectively solving real-world problem instances of PUP and CCP. As a byproduct of this activity, the ASP solver WASP was extended with an interface that eases embedding new external heuristics in the solver. The evaluation shows that our domain-heuristic-driven ASP solver finds solutions for all the real-world instances of PUP and CCP ever provided by Siemens. This paper is under consideration for acceptance in TPLP.
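Conceptually, such an extension boils down to a narrow interface between solver and heuristic: the solver delegates branching choices and reports conflicts back. The Python sketch below only illustrates the shape of such a plug-in; WASP's real interface differs in language and detail, and the PUP-flavoured example heuristic is invented.

```python
from abc import ABC, abstractmethod

class BranchingHeuristic(ABC):
    """Illustrative plug-in interface for domain-specific branching, in the
    spirit of (but not identical to) the heuristic interface added to WASP."""

    @abstractmethod
    def choose(self, unassigned, assignment):
        """Return (atom, truth_value): the next branching decision.
        Called by the solver only when `unassigned` is non-empty."""

    def on_conflict(self, conflict_atoms):
        """Optional feedback hook so the heuristic can adapt; default: ignore."""

class PreferUnitsFirst(BranchingHeuristic):
    """Toy PUP-flavoured heuristic: decide 'unit' atoms before all others."""
    def choose(self, unassigned, assignment):
        units = [a for a in unassigned if a.startswith("unit")]
        return ((units or sorted(unassigned))[0], True)
```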
International Semantic Web Conference | 2010
Kostyantyn M. Shchekotykhin; Gerhard Friedrich
Debugging is an important prerequisite for the widespread application of ontologies, especially in areas that rely upon everyday users to create and maintain knowledge bases, such as the Semantic Web. Most recent approaches use diagnosis methods to identify sources of inconsistency. However, in most debugging cases these methods return many alternative diagnoses, thus placing the burden of fault localization on the user. This paper demonstrates how the target diagnosis can be identified by performing a sequence of observations, that is, by querying an oracle about entailments of the target ontology. We exploit probabilities of typical user errors to formulate information-theoretic concepts for query selection. Our evaluation showed that the suggested method reduces the number of required observations compared to myopic strategies.
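A common way to write the information-theoretic criterion (consistent with, but not quoted from, the paper): choose the query minimizing the expected entropy of the diagnosis distribution after the oracle's answer, where \(\mathbf{D}_a\) denotes the diagnoses consistent with answer \(a\):

```latex
\[
q^{*} \;=\; \operatorname*{arg\,min}_{q \in Q} \;
\sum_{a \in \{\text{yes},\,\text{no}\}} p(a \mid q)
\left( - \sum_{d \in \mathbf{D}_a} p(d \mid q, a)\,\log_2 p(d \mid q, a) \right)
\]
```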
Web Reasoning and Rule Systems | 2013
Patrick Rodler; Kostyantyn M. Shchekotykhin; Philipp Fleiss; Gerhard Friedrich
Efficient ontology debugging is a cornerstone for many activities in the context of the Semantic Web, especially when automatic tools produce (parts of) ontologies, as in the field of ontology matching. The best currently known interactive debugging systems rely upon meta information in the form of fault probabilities, which can speed up the debugging procedure in the good case but can also have a negative impact on performance in the bad case. The problem is that the quality of the meta information can only be assessed a posteriori. Consequently, as long as the actual fault is unknown, there is always some risk of suboptimal interactive diagnosis discrimination. As an alternative, one might prefer to rely on a tool which pursues a no-risk strategy. In this case, however, possibly well-chosen meta information cannot be exploited, again resulting in inefficient debugging actions. In this work we present a reinforcement learning strategy that continuously adapts its behavior depending on the performance achieved and minimizes the risk of using low-quality meta information. This method is therefore suitable for application scenarios where reliable a priori fault estimates are difficult to obtain. Using a corpus of incoherent real-world ontologies from the field of ontology matching, we show that the proposed risk-aware query strategy outperforms both meta-information-based approaches and no-risk strategies on average in terms of the required amount of user interaction.
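One way to picture the adaptation loop: a single risk parameter blends a no-risk split-in-half score with a probability-trusting entropy score and is reinforced after each oracle answer. The update rule, threshold, and blending below are assumptions for illustration, not the paper's exact formulation.

```python
def adapt_risk(risk, eliminated_fraction, step=0.1):
    """Reinforcement rule in the spirit of a risk-aware strategy: `risk` in
    [0, 1] blends no-risk behaviour (0) with fully probability-trusting
    behaviour (1). If the last query eliminated at least half of the
    candidate diagnoses, the meta information seems reliable and risk is
    raised; otherwise it is lowered. Threshold and step are assumptions."""
    if eliminated_fraction >= 0.5:
        return min(1.0, risk + step)
    return max(0.0, risk - step)

def query_score(entropy_score, split_score, risk):
    """Blend the two strategies' lower-is-better scores by the current risk."""
    return risk * entropy_score + (1.0 - risk) * split_score
```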