
Publication


Featured research published by Kai Eckert.


International Semantic Web Conference | 2013

Deployment of RDFa, Microdata, and Microformats on the Web – A Quantitative Analysis

Kai Eckert; Robert Meusel; Hannes Mühleisen; Michael Schuhmacher; Johanna Völker

More and more websites embed structured data describing for instance products, reviews, blog posts, people, organizations, events, and cooking recipes into their HTML pages using markup standards such as Microformats, Microdata and RDFa. This development has accelerated in the last two years as major Web companies, such as Google, Facebook, Yahoo!, and Microsoft, have started to use the embedded data within their applications. In this paper, we analyze the adoption of RDFa, Microdata, and Microformats across the Web. Our study is based on a large public Web crawl dating from early 2012 and consisting of 3 billion HTML pages which originate from over 40 million websites. The analysis reveals the deployment of the different markup standards, the main topical areas of the published data as well as the different vocabularies that are used within each topical area to represent data. What distinguishes our work from earlier studies, published by the large Web companies, is that the analyzed crawl as well as the extracted data are publicly available. This allows our findings to be verified and to be used as starting points for further domain-specific investigations as well as for focused information extraction endeavors.
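The classification step of such a deployment analysis can be sketched in a few lines. The detectors and sample pages below are invented for illustration; real extractors (and the paper's own pipeline) parse the DOM rather than matching regular expressions.

```python
import re

# Crude pattern-based detectors for each markup syntax (illustrative only).
DETECTORS = {
    "RDFa": re.compile(r'\b(?:property|typeof|vocab)\s*='),
    "Microdata": re.compile(r'\bitemscope\b'),
    "Microformats": re.compile(r'class\s*=\s*"[^"]*\b(?:hcard|vcard|hreview)\b'),
}

def markup_deployment(pages):
    """Count how many pages embed each markup syntax."""
    counts = {name: 0 for name in DETECTORS}
    for html in pages:
        for name, pattern in DETECTORS.items():
            if pattern.search(html):
                counts[name] += 1
    return counts

# Invented sample pages, one per syntax:
pages = [
    '<div itemscope itemtype="http://schema.org/Product">...</div>',
    '<span property="dc:title">A title</span>',
    '<div class="vcard"><span class="fn">Kai Eckert</span></div>',
]
print(markup_deployment(pages))  # {'RDFa': 1, 'Microdata': 1, 'Microformats': 1}
```

Aggregating such per-page flags over a crawl yields exactly the kind of deployment statistics the study reports, broken down by website and topical area.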


European Semantic Web Conference | 2009

Improving Ontology Matching Using Meta-level Learning

Kai Eckert; Christian Meilicke; Heiner Stuckenschmidt

Despite serious research efforts, automatic ontology matching still suffers from severe problems with respect to the quality of matching results. Existing matching systems trade off precision and recall and have their specific strengths and weaknesses. This leads to problems when the right matcher for a given task has to be selected. In this paper, we present a method for improving matching results not by choosing a specific matcher but by applying machine learning techniques to an ensemble of matchers. We learn rules for the correctness of a correspondence based on the output of different matchers and additional information about the nature of the elements to be matched, thus compensating for the weaknesses of any individual matcher. We show that our method always performs significantly better than the median of the matchers used and in most cases outperforms the best matcher with an optimal threshold for a given pair of ontologies. As a side product of our experiments, we discovered that the majority vote is a simple but powerful heuristic for combining matchers that almost reaches the quality of our learning results.
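The majority-vote heuristic mentioned at the end is easy to sketch: a correspondence is accepted if more than half of the matchers in the ensemble propose it. The matcher outputs below are invented for illustration.

```python
def majority_vote(matcher_outputs):
    """Accept a correspondence if more than half of the matchers propose it.

    matcher_outputs: list of sets; each set holds the correspondences
    (source concept, target concept pairs) one matcher returned.
    """
    threshold = len(matcher_outputs) / 2
    all_pairs = set().union(*matcher_outputs)
    return {pair for pair in all_pairs
            if sum(pair in output for output in matcher_outputs) > threshold}

# Three hypothetical matchers voting on candidate correspondences:
m1 = {("Person", "Human"), ("Car", "Auto")}
m2 = {("Person", "Human"), ("Tree", "Plant")}
m3 = {("Person", "Human"), ("Car", "Auto")}
print(majority_vote([m1, m2, m3]))
# ("Person", "Human") gets 3 votes and ("Car", "Auto") 2, so both pass;
# ("Tree", "Plant") has only 1 vote and is rejected.
```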


Theory and Practice of Digital Libraries | 2012

Identifying references to datasets in publications

Katarina Boland; Dominique Ritze; Kai Eckert; Brigitte Mathiak

Research data and publications are usually stored in separate and structurally distinct information systems. Often, links between these resources are not explicitly available which complicates the search for previous research. In this paper, we propose a pattern induction method for the detection of study references in full texts. Since these references are not specified in a standardized way and may occur inside a variety of different contexts --- i.e., captions, footnotes, or continuous text --- our algorithm is required to induce very flexible patterns. To overcome the sparse distribution of training instances, we induce patterns iteratively using a bootstrapping approach. We show that our method achieves promising results for the automatic identification of data references and is a first step towards building an integrated information system.
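A much-simplified version of such a bootstrapping loop alternates between inducing surface patterns around known dataset names and applying those patterns to find new names. The word-window patterns, sentences, and dataset names below are invented; the paper's induced patterns are far more flexible than this sketch.

```python
import re

def bootstrap(sentences, seeds, rounds=2):
    """Iteratively induce patterns from known dataset names and apply
    them to discover new names (simplified single-word-window version)."""
    known = set(seeds)
    for _ in range(rounds):
        # Induce patterns: the word immediately preceding a known name.
        patterns = set()
        for s in sentences:
            for name in known:
                m = re.search(r"(\w+) " + re.escape(name), s)
                if m:
                    patterns.add(m.group(1))
        # Apply patterns: capitalized tokens following a pattern word.
        for s in sentences:
            for p in patterns:
                for m in re.finditer(re.escape(p) + r" ([A-Z]\w+)", s):
                    known.add(m.group(1))
    return known

sentences = [
    "We analyzed the ALLBUS survey data.",
    "Results are based on the SOEP panel.",
    "Figures use the ESS sample as well.",
]
print(bootstrap(sentences, seeds={"ALLBUS"}))  # finds SOEP and ESS too
```

Starting from a single seed, the induced pattern ("the" followed by a capitalized token, here) generalizes to the other sentences, which is the core idea behind overcoming the sparse distribution of training instances.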


International Conference on Knowledge Capture | 2007

Interactive thesaurus assessment for automatic document annotation

Kai Eckert; Heiner Stuckenschmidt; Magnus Pfeffer

The use of thesaurus-based indexing is a common approach for increasing the performance of document retrieval. With the growing amount of documents available, manual indexing is not a feasible option. Statistical methods for automated document indexing are an attractive alternative. We argue that the quality of the thesaurus used as a basis for indexing, in regard to its ability to adequately cover the contents to be indexed, is of crucial importance in automatic indexing because there is no human in the loop who can spot and avoid indexing errors. We propose a method for thesaurus evaluation that is based on a combination of statistical measures and appropriate visualization techniques that supports the detection of potential problems in a thesaurus. We describe this method and show its application in the context of two automatic indexing tasks. The examples show that the method indeed eases the detection and correction of errors, leading to a better indexing result. Please refer to http://www.kaiec.org for high resolution media of all figures used in this paper, as well as an animated presentation of the interactive tool.


Library Hi Tech | 2009

Tagging and automation: challenges and opportunities for academic libraries

Kai Eckert; Christian Hänger; Christof Niemann

Purpose – The purpose of this paper is to compare and examine the quality of the results of tagging, and of intellectual and automated indexing processes.
Design/methodology/approach – The approach takes the form of analysis and graphical representation of annotation sets using the software “Semtinel”.
Findings – A combination of tagging, intellectual and automatic indexing is probably best suited to shape the annotation of literature more efficiently without compromising quality.
Originality/value – The paper presents the open source software Semtinel, offering a highly optimized toolbox for analysing thesauri and classifications.


International World Wide Web Conference | 2014

RESTful open workflows for data provenance and reuse

Kai Eckert; Dominique Ritze; Konstantin Baierer

In this paper, we present a workflow model together with an implementation following the Linked Data principles and the principles for RESTful web services. By means of RDF-based specifications of web services, workflows, and runtime information, we establish a full provenance chain for all resources created within these workflows.
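Such a provenance chain can be illustrated by modelling each workflow step as a record of the resources it used and generated, and tracing a result back through those records. The step and resource identifiers below are invented, and the plain dictionaries stand in for the RDF-based specifications the paper actually uses.

```python
# Invented workflow steps; "used"/"generated" loosely mirror PROV-style
# provenance relations, not the paper's exact vocabulary.
steps = [
    {"id": "step:extract", "used": ["data:raw"], "generated": ["data:records"]},
    {"id": "step:enrich", "used": ["data:records"], "generated": ["data:linked"]},
]

def provenance(resource, steps):
    """Trace a resource back through the steps that produced it."""
    chain = []
    frontier = [resource]
    while frontier:
        r = frontier.pop()
        for step in steps:
            if r in step["generated"]:
                chain.append(step["id"])
                frontier.extend(step["used"])
    return chain

print(provenance("data:linked", steps))  # ['step:enrich', 'step:extract']
```

Because every step records its inputs and outputs, any resource created within the workflow can be traced back to the raw data it was derived from, which is the "full provenance chain" the abstract refers to.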


International Conference on Asian Digital Libraries | 2010

Thesaurus extension using web search engines

Robert Meusel; Mathias Niepert; Kai Eckert; Heiner Stuckenschmidt

Maintaining and extending large thesauri is an important challenge facing digital libraries and IT businesses alike. In this paper we describe a method building on and extending existing methods from the areas of thesaurus maintenance, natural language processing, and machine learning to (a) extract a set of novel candidate concepts from text corpora and (b) generate a small ranked list of suggestions for the position of these concepts in an existing thesaurus. Based on a modification of the standard tf-idf term weighting, we extract relevant concept candidates from a document corpus. We then apply a pattern-based machine learning approach to content extracted from web search engine snippets to determine the type of relation between the candidate terms and existing thesaurus concepts. The approach is evaluated with a large-scale experiment using the MeSH and WordNet thesauri as testbed.
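The candidate-extraction step can be illustrated with plain tf-idf: terms that are frequent in some documents but rare across the corpus score highest and become concept candidates. The corpus and terms below are invented, and the paper uses a modified weighting rather than this textbook formula.

```python
import math
from collections import Counter

def tfidf_candidates(docs, top_n=2):
    """Rank terms by tf-idf across tokenized documents and return the
    top-scoring ones as thesaurus concept candidates."""
    df = Counter()  # document frequency per term
    for doc in docs:
        df.update(set(doc))
    n = len(docs)
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            scores[term] += count * math.log(n / df[term])
    return [term for term, _ in scores.most_common(top_n)]

# Invented pre-tokenized corpus:
docs = [
    ["thesaurus", "indexing", "thesaurus"],
    ["indexing", "retrieval"],
    ["ontology", "matching", "ontology"],
]
print(tfidf_candidates(docs))
```

"thesaurus" and "ontology" win here because each is repeated within a single document but absent from the rest, while "indexing" is penalized for appearing in two of the three documents.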


International Conference on Semantic Systems | 2015

The role of reasoning for RDF validation

Thomas Bosch; Erman Acar; Andreas Nolle; Kai Eckert

For data practitioners embracing the world of RDF and Linked Data, the openness and flexibility is a mixed blessing. For them, data validation according to predefined constraints is a much sought-after feature, particularly as this is taken for granted in the XML world. Based on our work in the DCMI RDF Application Profiles Task Group and in cooperation with the W3C Data Shapes Working Group, we have published 81 types of constraints that are required by various stakeholders for data applications. These constraint types form the basis to investigate the role that reasoning and different semantics play in practical data validation, why reasoning is beneficial for RDF validation, and how to overcome the major shortcomings when validating RDF data by performing reasoning prior to validation. For each constraint type, we examine (1) whether reasoning may improve data quality, (2) how efficiently, in terms of runtime, validation is performed with and without reasoning, and (3) whether validation results depend on the underlying semantics, which differ between reasoning and validation. Using these findings, we determine, for the most common constraint languages, which constraint types they can express and give directions for the further development of constraint languages.
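Why performing reasoning prior to validation changes the outcome can be shown with a minimum-cardinality constraint: it fails on the raw data but passes once an rdfs:subPropertyOf inference is materialized. The vocabulary and triples below are invented for illustration.

```python
# Invented data: book1 has an editor, and hasEditor is a subproperty
# of hasContributor, but no explicit hasContributor triple exists.
triples = {("book1", "hasEditor", "eckert")}
subproperty_of = {"hasEditor": "hasContributor"}

def reason(triples):
    """Materialize rdfs:subPropertyOf inferences over the triple set."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in subproperty_of:
            inferred.add((s, subproperty_of[p], o))
    return inferred

def validate_min_card(triples, subject, prop, minimum=1):
    """Check a minimum-cardinality constraint on one property."""
    return sum(1 for s, p, o in triples if s == subject and p == prop) >= minimum

print(validate_min_card(triples, "book1", "hasContributor"))          # False
print(validate_min_card(reason(triples), "book1", "hasContributor"))  # True
```

Without reasoning the constraint "every book needs at least one contributor" reports a spurious violation; after materializing the implied triple, validation succeeds. This is the kind of interaction between semantics and validation the paper examines per constraint type.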


GfKl | 2014

Data Enrichment in Discovery Systems Using Linked Data

Dominique Ritze; Kai Eckert

The Linked Data Web is an abundant source for information that can be used to enrich information retrieval results. This can be helpful in many different scenarios, for example to enable extensive multilingual semantic search or to provide additional information to the users. In general, there are two different ways to enrich data: client-side and server-side. With client-side data enrichment, for instance by means of JavaScript in the browser, users can get additional information related to the results they are provided with. This additional information is not stored within the retrieval system and thus not available to improve the actual search. An example is the provision of links to external sources like Wikipedia, merely for convenience. By contrast, an enrichment on the server-side can be exploited to improve the retrieval directly, at the cost of data duplication and additional efforts to keep the data up-to-date. In this paper, we describe the basic concepts of data enrichment in discovery systems and compare advantages and disadvantages of both variants. Additionally, we introduce a JavaScript Plugin API that abstracts from the underlying system and facilitates platform independent client-side enrichments.
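The server-side variant described above can be sketched as merging external labels into a record's index terms so that they become searchable, which is exactly what a purely client-side display enrichment cannot achieve. The dataset, identifiers, and labels below are invented.

```python
# Invented linked-data source keyed by external identifier:
linked_data = {"Q42": {"labels": ["Douglas Adams", "دوغلاس آدمز"]}}

def enrich(record):
    """Server-side enrichment: copy labels from the linked resource
    into the record's searchable index terms (duplicating the data)."""
    enriched = dict(record)
    enriched["index_terms"] = record["index_terms"] + linked_data.get(
        record["linked_id"], {}).get("labels", [])
    return enriched

def search(records, query):
    """Return ids of records whose index terms contain the query."""
    return [r["id"] for r in records if any(query in t for t in r["index_terms"])]

record = {"id": "rec1", "linked_id": "Q42", "index_terms": ["Hitchhiker's Guide"]}
print(search([record], "Douglas"))          # []: label not indexed yet
print(search([enrich(record)], "Douglas"))  # ['rec1']: found after enrichment
```

The trade-off from the abstract is visible here: the enriched copy of the labels improves retrieval (including multilingual search), but it duplicates data that must be kept in sync with the external source.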


International Journal of Metadata, Semantics and Ontologies | 2008

Assessing thesaurus-based annotations for semantic search applications

Kai Eckert; Magnus Pfeffer; Heiner Stuckenschmidt

Statistical methods for automated document indexing are becoming an alternative to the manual assignment of keywords. We argue that the quality of the thesaurus used as a basis for indexing in regard to its ability to adequately cover the contents to be indexed and as a basis for the specific indexing method used is of crucial importance in automatic indexing. We present an interactive tool for thesaurus evaluation that is based on a combination of statistical measures and appropriate visualisation techniques that supports the detection of potential problems in a thesaurus. We describe the methods used and show that the tool supports the detection and correction of errors, leading to a better indexing result.
