Michael Schuhmacher
University of Mannheim
Publications
Featured research published by Michael Schuhmacher.
International Semantic Web Conference | 2013
Kai Eckert; Robert Meusel; Hannes Mühleisen; Michael Schuhmacher; Johanna Völker
More and more websites embed structured data describing, for instance, products, reviews, blog posts, people, organizations, events, and cooking recipes into their HTML pages using markup standards such as Microformats, Microdata and RDFa. This development has accelerated in the last two years as major Web companies, such as Google, Facebook, Yahoo!, and Microsoft, have started to use the embedded data within their applications. In this paper, we analyze the adoption of RDFa, Microdata, and Microformats across the Web. Our study is based on a large public Web crawl dating from early 2012, consisting of 3 billion HTML pages that originate from over 40 million websites. The analysis reveals the deployment of the different markup standards, the main topical areas of the published data, as well as the different vocabularies that are used within each topical area to represent data. What distinguishes our work from earlier studies published by the large Web companies is that the analyzed crawl, as well as the extracted data, is publicly available. This allows our findings to be verified and used as starting points for further domain-specific investigations as well as for focused information extraction endeavors.
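As a rough illustration of what such an analysis involves, the sketch below checks a single HTML page for the three markup standards; the detection heuristics (attribute and class names) are simplified assumptions, not the extraction pipeline used in the paper.

```python
# Sketch: detecting embedded markup standards in one HTML page.
# The heuristics are deliberately minimal and illustrative only.
from bs4 import BeautifulSoup

def detect_markup(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    return {
        # Microdata marks items with itemscope/itemtype attributes.
        "microdata": bool(soup.find(attrs={"itemscope": True})),
        # RDFa annotates elements with property/typeof attributes.
        "rdfa": bool(soup.find(attrs={"property": True})
                     or soup.find(attrs={"typeof": True})),
        # Classic Microformats reuse the class attribute (e.g. hCard's "vcard").
        "microformats": bool(soup.find(class_="vcard")
                             or soup.find(class_="hrecipe")),
    }

print(detect_markup('<div itemscope itemtype="http://schema.org/Product"></div>'))
```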
Conference on Information and Knowledge Management | 2015
Michael Schuhmacher; Laura Dietz; Simone Paolo Ponzetto
When humans explain complex topics, they naturally talk about the entities involved, such as people, locations, or events. In this paper, we aim at automating this process by retrieving and ranking entities that are relevant for understanding free-text web-style queries like "Argentine British relations", which typically demand a set of heterogeneous entities with no specific target type, for instance Falklands_War or Margaret_Thatcher, as the answer. Standard approaches to entity retrieval rely purely on features from the knowledge base. We approach the problem from the opposite direction, namely by analyzing web documents that are found to be query-relevant. Our approach hinges on entity linking technology that identifies entity mentions and links them to a knowledge base like Wikipedia. We use a learning-to-rank approach and study different features that use documents, entity mentions, and knowledge base entities -- thus bridging document and entity retrieval. Since established benchmarks for this problem do not exist, we use TREC test collections for document ranking and collect custom relevance judgments for entities. Experiments on TREC Robust04 and TREC Web13/14 data show that: i) single entity features, like the frequency of occurrence within the top-ranked documents or the query retrieval score against a knowledge base, generally perform well; ii) the best overall performance is achieved when combining different features that relate an entity to the query, its document mentions, and its knowledge base representation.
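The following sketch illustrates two of the feature types mentioned above, mention frequency within the top-ranked documents and a knowledge-base retrieval score, combined linearly; the weights and data layout are illustrative assumptions, not the paper's learned model.

```python
# Sketch: two entity features and a simple linear combination.
# Real weights would be learned from relevance judgments.
from collections import Counter

def entity_frequency(top_docs_entity_links):
    """top_docs_entity_links: list of entity-ID lists, one per retrieved doc."""
    counts = Counter()
    for links in top_docs_entity_links:
        counts.update(links)
    return counts

def score_entity(entity, freq, kb_score, w_freq=0.5, w_kb=0.5):
    # Linear feature combination of mention frequency and KB retrieval score.
    return w_freq * freq[entity] + w_kb * kb_score.get(entity, 0.0)

docs = [["Falklands_War", "Margaret_Thatcher"], ["Falklands_War"]]
freq = entity_frequency(docs)
kb = {"Falklands_War": 0.9, "Margaret_Thatcher": 0.7}
print(sorted(kb, key=lambda e: score_entity(e, freq, kb), reverse=True))
```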
Conference on Information and Knowledge Management | 2013
Michael Schuhmacher; Simone Paolo Ponzetto
We present a knowledge-rich approach to Web search result clustering which exploits the output of an open-domain entity linker, as well as the types and topical concepts encoded within a wide-coverage ontology. Our results indicate that, thanks to an accurate and compact semantification of the search result snippets, we are able to achieve a competitive performance on a benchmarking dataset for this task.
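A minimal sketch of the underlying idea, assuming each snippet has already been reduced to a set of linked entities; the Jaccard overlap and single-pass threshold clustering stand in for the paper's richer ontology-based similarity.

```python
# Sketch: clustering search result snippets by linked-entity overlap.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_snippets(entity_sets, threshold=0.3):
    clusters = []
    for idx, ents in enumerate(entity_sets):
        for cluster in clusters:
            # Join the first cluster containing a sufficiently similar snippet.
            if any(jaccard(ents, entity_sets[j]) >= threshold for j in cluster):
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters

snippets = [{"Apple_Inc.", "IPhone"}, {"Apple_Inc.", "Steve_Jobs"}, {"Apple"}]
print(cluster_snippets(snippets))  # [[0, 1], [2]]
```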
European Conference on Information Retrieval | 2016
Michael Schuhmacher; Benjamin Roth; Simone Paolo Ponzetto; Laura Dietz
This work studies the combination of a document retrieval and a relation extraction system for the purpose of identifying query-relevant relational facts. On the TREC Web collection, we assess extracted facts separately for correctness and relevance. Despite some TREC topics not being covered by the relation schema, we find that this approach reveals relevant facts, and in particular those not yet known in the knowledge base DBpedia. The study confirms that mention frequency, document relevance, and entity relevance are useful indicators for fact relevance. Still, the task remains an open research problem.
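A small sketch of how the named indicators could be combined, assuming hypothetical fact extractions paired with the relevance score of their source document; this is not the study's actual scoring model.

```python
# Sketch: rank extracted facts by mention frequency weighted by the
# relevance of the documents they were extracted from.
from collections import defaultdict

def score_facts(extractions):
    """extractions: iterable of (fact_triple, doc_relevance_score) pairs."""
    scores = defaultdict(float)
    for fact, doc_rel in extractions:
        scores[fact] += doc_rel  # frequency and document relevance accumulate
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

facts = [(("Thatcher", "leaderOf", "UK"), 0.9),
         (("Thatcher", "leaderOf", "UK"), 0.7),
         (("Galtieri", "presidentOf", "Argentina"), 0.8)]
print(score_facts(facts))
```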
Exploiting Semantic Annotations in Information Retrieval | 2015
Laura Dietz; Michael Schuhmacher
We aim to augment textual knowledge resources such as Wikipedia with information from the World Wide Web while focusing on a given information need. We demonstrate a solution based on what we call knowledge portfolios. A knowledge portfolio is a query-specific collection of relevant entities together with associated passages from the Web that explain how each entity is relevant to the query. Knowledge portfolios are extracted through a combination of retrieval from the World Wide Web and Wikipedia with a reasoning process on mutual relevance. A key ingredient is entity link annotations that tie abstract entities from the knowledge base to their context on the Web. We demonstrate the results of our fully automated system Queripidia, which is capable of creating a knowledge portfolio for any web-style query, on data from the TREC Web track. The online demo is available via http://smart-cactus.org/~dietz/knowport/.
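A toy sketch of the portfolio idea under strong simplifications: entities are scored by the relevance of the passages mentioning them, and each retained entity keeps its supporting passages. The inputs and the one-pass scoring are illustrative assumptions, not the system's mutual-relevance reasoning.

```python
# Sketch: build a query-specific entity-to-passages "portfolio".
def build_portfolio(passages, passage_scores, entity_links):
    """passages: list of texts; entity_links: entities linked in each passage."""
    entity_score = {}
    for i, ents in enumerate(entity_links):
        for e in ents:
            # An entity gains relevance from every relevant passage mentioning it.
            entity_score[e] = entity_score.get(e, 0.0) + passage_scores[i]
    # Each entity keeps the passages that explain why it is relevant.
    return {e: [passages[i] for i, ents in enumerate(entity_links) if e in ents]
            for e in sorted(entity_score, key=entity_score.get, reverse=True)}

print(build_portfolio(["UK and Argentina fought in 1982."],
                      [0.9], [{"Falklands_War"}]))
```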
Applications of Natural Language to Data Bases | 2014
Arnab Dutta; Michael Schuhmacher
Open domain information extraction (OIE) projects like NELL or ReVerb are often impaired by their schema-poor structure. This severely limits their application domain in spite of their web-scale coverage. In this work, we disambiguate an OIE fact by linking its terms to unique instances from a structured knowledge base, DBpedia in our case. We propose a method that exploits frequency information and the semantic relatedness of all probable candidate pairs. We show that our combined linking method outperforms a strong baseline.
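A minimal sketch of the joint linking idea, assuming placeholder candidate priors and a relatedness function in place of the paper's DBpedia-derived statistics.

```python
# Sketch: jointly pick subject/object candidates by combining a
# frequency-based prior with the relatedness of the candidate pair.
from itertools import product

def link_pair(subj_cands, obj_cands, prior, relatedness, alpha=0.5):
    """subj_cands/obj_cands: candidate URIs; prior: URI -> probability."""
    def score(s, o):
        return alpha * (prior[s] + prior[o]) / 2 + (1 - alpha) * relatedness(s, o)
    return max(product(subj_cands, obj_cands), key=lambda p: score(*p))

prior = {"dbr:Paris": 0.8, "dbr:Paris_Hilton": 0.2, "dbr:France": 0.9}
rel = lambda a, b: 1.0 if (a, b) == ("dbr:Paris", "dbr:France") else 0.1
print(link_pair(["dbr:Paris", "dbr:Paris_Hilton"], ["dbr:France"], prior, rel))
```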
Semantic Web Evaluation Challenge | 2014
Michael Schuhmacher; Christian Meilicke
Within this paper we present our contribution to Task 2 of the ESWC’14 Recommender Systems Challenge. First, we describe an unpersonalized baseline approach that uses no linked data but applies a naive way to compute the overall popularity of the items observed in the training data. Despite being very simple and unpersonalized, it achieves a competitive F1 measure of 0.5583. Then we describe an algorithm that makes use of several features acquired from DBpedia, like author and type, as well as self-generated features, like abstract-based keywords, for item representation and comparison. Item recommendations are generated by a mixture model of individual classifiers that have been learned per feature on a user-neighborhood cluster, in combination with a global classifier learned on all training data. While our Linked-Data-based approach achieves an F1 measure of 0.5649, the increase over the popularity baseline remains surprisingly low.
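The popularity baseline is simple enough to sketch directly; the data layout below is a hypothetical stand-in for the challenge's training format.

```python
# Sketch: unpersonalized popularity baseline -- recommend the items
# rated positively most often in the training data, ignoring the user.
from collections import Counter

def popularity_baseline(training_ratings, top_k=5):
    """training_ratings: iterable of (user, item, liked) tuples."""
    popularity = Counter(item for _, item, liked in training_ratings if liked)
    return [item for item, _ in popularity.most_common(top_k)]

train = [("u1", "book_a", True), ("u2", "book_a", True), ("u2", "book_b", False)]
print(popularity_baseline(train))  # ['book_a']
```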
International Conference on Electronic Commerce | 2015
Petar Ristoski; Michael Schuhmacher; Heiko Paulheim
Linked Open Data has been recognized as a useful source of background knowledge for building content-based recommender systems. While many existing approaches transform that data into a propositional form, we investigate how the graph nature of Linked Open Data can be exploited when building recommender systems. In particular, we use path lengths, the K-Step Markov approach, as well as weighted NI paths to compute item relevance and perform a content-based recommendation. An evaluation on the three tasks of the 2015 LOD-RecSys challenge shows that the results are promising, and, for cross-domain recommendations, outperform collaborative filtering.
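A compact sketch of the K-Step Markov idea on a toy adjacency-list graph: an item's relevance is the probability mass a random walk started from the user's profile items accumulates within K steps. The graph fragment is invented for illustration and is not the paper's LOD data.

```python
# Sketch: K-Step Markov relevance over a small adjacency-list graph.
def k_step_markov(graph, roots, k=3):
    """graph: node -> list of neighbours; roots: the user's profile items."""
    prob = {n: 1.0 / len(roots) if n in roots else 0.0 for n in graph}
    relevance = {n: 0.0 for n in graph}
    for _ in range(k):
        nxt = {n: 0.0 for n in graph}
        for node, p in prob.items():
            for nb in graph[node]:  # spread mass uniformly over outgoing edges
                nxt[nb] += p / len(graph[node])
        prob = nxt
        for n, p in prob.items():
            relevance[n] += p  # accumulate visit probability per step
    return relevance

g = {"item1": ["dbr:Rock"], "dbr:Rock": ["item1", "item2"], "item2": ["dbr:Rock"]}
print(k_step_markov(g, roots=["item1"]))
```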
International Semantic Web Conference | 2013
Heiner Stuckenschmidt; Michael Schuhmacher; Christian Meilicke; Ansgar Scherp
Experimentation is an important way to validate results of Semantic Web and Computer Science research in general. In this paper, we investigate the development and current status of experimental work on the Semantic Web. Based on a corpus of 500 papers collected from the International Semantic Web Conferences (ISWC) over the past decade, we analyse the importance and quality of the experimental research conducted and compare it to general Computer Science. We observe that the amount and quality of experiments are steadily increasing over time. Contrary to what we hypothesised, we cannot confirm a statistically significant correlation between a paper's citations and the amount of experimental work reported. Our analysis, however, shows that papers comparing themselves to other systems are cited more often than other papers.
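For concreteness, a correlation check of this kind might look like the sketch below; the numbers are invented placeholders, not the study's corpus, and the actual test applied in the paper may differ.

```python
# Sketch: testing for a correlation between citation counts and the
# amount of experimental work, with made-up toy numbers.
from scipy.stats import pearsonr

citations = [3, 10, 45, 2, 18]                 # citations per paper (invented)
experiment_share = [0.1, 0.4, 0.3, 0.0, 0.5]   # share of experimental content (invented)

r, p = pearsonr(citations, experiment_share)
print(f"r = {r:.2f}, p = {p:.3f}")  # "significant" only if p falls below, e.g., 0.05
```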
Web Search and Data Mining | 2014
Michael Schuhmacher; Simone Paolo Ponzetto