
Publication


Featured research published by Benjamin Zapilko.


arXiv: Digital Libraries | 2013

TheSoz: A SKOS representation of the thesaurus for the social sciences

Benjamin Zapilko; Johann Schaible; Philipp Mayr; Brigitte Mathiak

The Thesaurus for the Social Sciences (TheSoz) is a Linked Dataset in SKOS format which serves as a crucial instrument for information retrieval, e.g. for document indexing or search term recommendation. Thesauri and similar controlled vocabularies build a linking bridge between datasets in the Linked Open Data cloud. In this article, the conversion of TheSoz to SKOS is described, including the analysis of the original dataset and its structure, the mapping to adequate SKOS classes and properties, and the technical conversion. In order to create a semantically complete representation of TheSoz in SKOS, extensions based on SKOS-XL had to be defined. These allow the modeling of special relations such as compound equivalences and ambiguous terms. Additionally, mappings to other datasets and applications of TheSoz are presented. Finally, limitations and modeling issues encountered during the creation process are discussed.
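To make the SKOS-XL modeling concrete, the following is a minimal sketch using Python's rdflib (not the authors' conversion pipeline), showing how a thesaurus term can be expressed both with a plain skos:prefLabel and as a reified skosxl:Label; the base namespace and term IDs are hypothetical, not taken from the published dataset.

```python
# Minimal sketch (not the authors' actual conversion): a thesaurus term
# modeled as a skos:Concept with both a plain SKOS label and a reified
# SKOS-XL label. The base namespace and identifiers are illustrative.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

SKOSXL = Namespace("http://www.w3.org/2008/05/skos-xl#")
THESOZ = Namespace("http://example.org/thesoz/")  # hypothetical base URI

g = Graph()
g.bind("skos", SKOS)
g.bind("skosxl", SKOSXL)

concept = THESOZ["concept_12345"]   # hypothetical term ID
label = THESOZ["label_12345_de"]    # reified label resource

g.add((concept, RDF.type, SKOS.Concept))
g.add((concept, SKOS.prefLabel, Literal("Migration", lang="de")))

# SKOS-XL turns the label itself into a resource, so that relations
# between labels (e.g. compound equivalences) become expressible.
g.add((label, RDF.type, SKOSXL.Label))
g.add((label, SKOSXL.literalForm, Literal("Migration", lang="de")))
g.add((concept, SKOSXL.prefLabel, label))

print(g.serialize(format="turtle"))
```

Reifying labels this way is what makes relations between the labels themselves, such as compound equivalences, representable at all.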


arXiv: Human-Computer Interaction | 2011

Web-based multi-view visualizations for aggregated statistics

Daniel Hienert; Benjamin Zapilko; Philipp Schaer; Brigitte Mathiak

With the rise of the open data movement, a lot of statistical data has been made publicly available by governments, statistical offices and other organizations. First efforts to visualize this data have been made by the data providers themselves. Data aggregators go a step further: they collect data from different open data repositories and make it comparable by providing data sets from different providers and showing different statistics in the same chart. Another approach is to visualize two different indicators in a scatter plot or on a map. The integration of several data sets in one graph can have several drawbacks: different scales and units are mixed, the graph becomes visually cluttered, and one cannot easily distinguish between different indicators. Our approach combines (1) the integration of live data from different data sources, (2) the presentation of different indicators in coordinated visualizations, and (3) the option to add user visualizations that enrich official statistics with personal data. Each indicator gets its own visualization, which best fits that individual indicator in terms of visualization type, scale, unit, etc. The different visualizations are linked, so that related items can easily be identified via mouse-over effects on data items.
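As an illustration of the coordinated-views idea, here is a minimal, hypothetical sketch in Python with matplotlib (not the authors' web-based implementation): two indicators are drawn in separate charts with their own scales, and a pick event highlights the same country in both views. All data values are made up.

```python
# Minimal sketch of linked views: each indicator keeps its own chart and
# scale; clicking a point highlights the same country in both charts.
import matplotlib.pyplot as plt

countries = ["DE", "FR", "IT", "ES"]          # illustrative data
gdp_growth = [1.1, 0.9, 0.4, 2.0]             # indicator 1 (percent)
unemployment = [5.0, 9.1, 11.2, 16.3]         # indicator 2 (percent)

fig, (ax1, ax2) = plt.subplots(1, 2)
s1 = ax1.scatter(range(len(countries)), gdp_growth, picker=True)
s2 = ax2.scatter(range(len(countries)), unemployment, picker=True)
ax1.set_title("GDP growth (%)")
ax2.set_title("Unemployment (%)")

def on_pick(event):
    # Highlight the picked country in *both* views instead of mixing two
    # differently scaled indicators in a single chart.
    idx = event.ind[0]
    for sc in (s1, s2):
        sc.set_edgecolors(["red" if i == idx else "none"
                           for i in range(len(countries))])
    fig.canvas.draw_idle()

fig.canvas.mpl_connect("pick_event", on_pick)
plt.show()
```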


European Semantic Web Conference | 2014

Object Property Matching utilizing the Overlap between Imported Ontologies

Benjamin Zapilko; Brigitte Mathiak

Large-scale Linked Data is often based on relational databases and thereby tends to be modeled with rich object properties, specifying the exact relationship between two objects rather than a generic is-a or part-of relationship. We study this phenomenon on government-issued statistical data, where a vested interest exists in matching such object properties for data integration. We leverage the fact that while the labeling of the properties is often heterogeneous, e.g. ex1:geo and ex2:location, they link to individuals of semantically similar code lists, e.g. country lists. State-of-the-art ontology matching tools do not exploit this effect and therefore tend to miss the possible correspondences. We enhance the state-of-the-art matching process by aligning the individuals of such imported ontologies separately and computing the overlap between them to improve the matching of the object properties. The matchers themselves are used as black boxes and are thus interchangeable. The new correspondences found with this method increase recall by up to 2.5 times on real-world data, with only a minor loss in precision.
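The core idea can be sketched in a few lines of Python: two object properties with heterogeneous labels are matched because the sets of individuals they point to overlap strongly. The URIs, value sets, and threshold below are illustrative, not taken from the paper's evaluation.

```python
# Minimal sketch: match object properties by the overlap of the
# individuals they link to (e.g. aligned country code lists), not by
# their labels. The instance alignment itself is assumed to come from
# an off-the-shelf matcher used as a black box.
def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

ex1_geo_values = {"country:DE", "country:FR", "country:IT"}
ex2_location_values = {"country:DE", "country:FR", "country:ES"}

THRESHOLD = 0.4  # illustrative cut-off
overlap = jaccard(ex1_geo_values, ex2_location_values)
if overlap >= THRESHOLD:
    print(f"ex1:geo <-> ex2:location (overlap {overlap:.2f})")
```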


International Conference on Web Information Systems and Technologies | 2011

VIZGR - Combining Data on a Visual Level

Daniel Hienert; Benjamin Zapilko; Philipp Schaer; Brigitte Mathiak

In this paper we present a novel method to connect data on the visualization level. In general, visualizations are a dead end when it comes to reusability. Yet, users prefer to work with visualizations, as evidenced by WYSIWYG editors. To enable users to work with their data in a way that is intuitive to them, we have created Vizgr. Vizgr.com offers basic visualization methods, like graphs, tag clouds, maps and timelines. But unlike normal data visualizations, these can be re-used and connected to each other and to web sites. We offer a simple way to combine diverse data structures, such as geo-locations and networks, with each other by a mouse click. In an evaluation, we found that over 85% of the participants were able to use and understand this technology without any training or explicit instructions.


International Conference on Web Information Systems and Technologies | 2011

Vizgr: Linking Data in Visualizations

Daniel Hienert; Benjamin Zapilko; Philipp Schaer; Brigitte Mathiak

Working with data can be very abstract without a proper visualization. Yet, once the data is visualized, it presents a dead end: the user has to return to the data level to make enrichments. With Vizgr (vizgr.org), we offer an elegant simplification of this workflow by providing the opportunity to enrich the data in the visualization itself. Data from various sources, e.g. statistical data, data entered by the user, or data from DBpedia, can be visualized as graphs, tag clouds, maps and timelines. The data points can be connected with each other, with data in other visualizations, and with any web address, regardless of the source. Once the data is in the system, users can create data presentations without switching back to the data level. In an evaluation, we found that over 85% of the participants were able to use and understand this technology without any training or explicit instructions.


Frontiers in Digital Humanities | 2017

Interlinking Large-scale Library Data with Authority Records

Felix Bensmann; Benjamin Zapilko; Philipp Mayr

In the area of Linked Open Data (LOD), meaningful and high-performance interlinking of different datasets is an ongoing challenge. The necessary tasks are supported by established standards and software, e.g. for the transformation, storage, interlinking and publication of data. Our use case, Swissbib, is a well-known provider of bibliographic data in Switzerland, representing various libraries and library networks. In this article, a case study is presented from the project linked.swissbib.ch, which focuses on the preparation and publication of the Swissbib data as LOD. Data available in MARC 21 XML is extracted from the Swissbib system and transformed into an RDF/XML representation. From the approx. 21 million monolithic records, the author information is extracted and interlinked with authority files from the Virtual International Authority File (VIAF) and DBpedia. The links are used to extract additional data from the counterpart corpora. Afterwards, the data is pushed into an Elasticsearch index to make it accessible to other components. As a demonstrator, a search portal is developed which presents the additional data and the generated links to users. In addition, a REST interface is provided so that other applications can access the data as well. A main obstacle in this project is the amount of data and the necessity of day-to-day (partial) updates. Currently, the data in Swissbib and in the external corpora is too large to be processed by established linking tools: the resulting memory footprint prevents these tools from functioning correctly. Triple stores are also unwieldy, incurring a massive overhead for import and update operations. Hence, we have developed procedures for extracting and shaping the data into a more suitable form, e.g. the data is reduced to the necessary properties and blocked, using sorted N-Triples as an intermediate format. As our preliminary results show, this method is very promising: our approach established 30,773 links to DBpedia and 20,714 links to VIAF, and both link sets show high precision and could be generated in a reasonable amount of time.
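To illustrate the blocking step described above, here is a minimal, hypothetical Python sketch (not the project's actual code): author records from two corpora are grouped by a cheap key so that only records within the same block are compared, which keeps the memory footprint bounded regardless of corpus size.

```python
# Minimal sketch of blocking before linkage: group records by a cheap
# key, then compare only within blocks. Record IDs, names, the key
# function, and the comparison rule are all illustrative toys.
from collections import defaultdict

swissbib = [("sb:1", "Dürrenmatt, Friedrich"), ("sb:2", "Frisch, Max")]
viaf = [("viaf:111", "Dürrenmatt, Friedrich"), ("viaf:222", "Frisch, M.")]

def block_key(name: str) -> str:
    # First four letters of the surname: crude, but cheap to compute.
    return name.split(",")[0][:4].lower()

blocks = defaultdict(lambda: ([], []))
for rid, name in swissbib:
    blocks[block_key(name)][0].append((rid, name))
for rid, name in viaf:
    blocks[block_key(name)][1].append((rid, name))

for left, right in blocks.values():
    for lid, lname in left:
        for rid, rname in right:
            if lname.split(",")[0] == rname.split(",")[0]:  # toy match rule
                print(f"{lid} owl:sameAs {rid}")
```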


IEEE International Conference on Semantic Computing | 2016

Validating RDF Data Quality Using Constraints to Direct the Development of Constraint Languages

Thomas Hartmann; Benjamin Zapilko; Joachim Wackerow; Kai Eckert

For research institutes, data libraries, and data archives, RDF data validation according to predefined constraints is a much sought-after feature, particularly as this is taken for granted in the XML world. Based on our work in the DCMI RDF Application Profiles Task Group and in cooperation with the W3C Data Shapes Working Group, we have identified and published 81 types of constraints to date that are required by various stakeholders for data applications. In this paper, in collaboration with several domain experts, we formulate 115 constraints on three different vocabularies (DDI-RDF, QB, and SKOS) and classify them according to (1) the severity of an occurring violation and (2) the complexity of the constraint expression in common constraint languages. We evaluate the data quality of 15,694 data sets (4.26 billion triples) of research data for the social, behavioral, and economic sciences obtained from 33 SPARQL endpoints. Based on the results, we formulate several findings to direct the further development of constraint languages.
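A single constraint of this kind can, for example, be expressed as a SPARQL query. The sketch below (illustrative, not one of the paper's 115 constraints verbatim) uses rdflib to check that every skos:Concept carries at least one skos:prefLabel and reports each violation.

```python
# Minimal sketch: one RDF constraint expressed as a SPARQL query.
# The constraint and the test data are illustrative.
from rdflib import Graph

data = """
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
<urn:c1> a skos:Concept ; skos:prefLabel "Migration"@en .
<urn:c2> a skos:Concept .
"""
g = Graph().parse(data=data, format="turtle")

violations = g.query("""
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?concept WHERE {
        ?concept a skos:Concept .
        FILTER NOT EXISTS { ?concept skos:prefLabel ?label }
    }
""")
for (concept,) in violations:
    print(f"violation: {concept} has no skos:prefLabel")
```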


European Semantic Web Conference | 2018

A LOD Backend Infrastructure for Scientific Search Portals

Benjamin Zapilko; Katarina Boland; Dagmar Kern

In recent years, Linked Data has become a key technology for organizations that want to publish their data collections on the web and connect them with other data sources. With the ongoing change in the research infrastructure landscape, where an integrated search for comprehensive research information is gaining importance, organizations are challenged to connect their historically unconnected databases with each other. In this article, we present a Linked Open Data based backend infrastructure for a scientific search portal, which is set as an additional layer on top of unconnected non-RDF data collections and makes the links between datasets visible and usable for retrieval. In addition, Linked Data technologies are used to organize different versions and aggregations of datasets. We evaluate the in-use application of our approach in a scientific search portal for the social sciences by investigating the benefit of links between different data sources in a user study.
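The following is a minimal, hypothetical sketch of the idea of such a link layer: links between records of otherwise unconnected collections are stored separately and consulted at retrieval time. Collection names, identifiers, and the lookup function are all invented for illustration.

```python
# Minimal sketch: a link layer on top of unconnected collections.
# Links are kept separately from the collections and joined in at
# retrieval time. All names and IDs are illustrative.
LINKS = {
    ("publications", "pub-42"): [("datasets", "ds-7"), ("projects", "pr-3")],
}

COLLECTIONS = {
    "publications": {"pub-42": "Survey methodology paper"},
    "datasets": {"ds-7": "National social survey, wave 2016"},
    "projects": {"pr-3": "Social cohesion panel study"},
}

def retrieve(collection: str, record_id: str):
    """Return a record plus everything the link layer connects it to."""
    record = COLLECTIONS[collection][record_id]
    related = [COLLECTIONS[c][r]
               for c, r in LINKS.get((collection, record_id), [])]
    return record, related

print(retrieve("publications", "pub-42"))
```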


Künstliche Intelligenz | 2016

Applying Linked Data Technologies in the Social Sciences

Benjamin Zapilko; Johann Schaible; Timo Wandhöfer; Peter Mutschke

In recent years, Linked Open Data (LOD) has matured and gained acceptance across various communities and domains. Linked Data technologies are seen as having large potential for application in scientific disciplines. In this article, we present use cases and applications of Linked Data in the social sciences. They focus on (a) interlinking domain-specific information and (b) linking social science data to external LOD sources (e.g. authority data) from other domains. However, several technical and research challenges arise when applying Linked Data technologies to a scientific domain with its specific data, information needs and use cases. We discuss these challenges and show how they can be addressed.


International Journal of Semantic Computing | 2016

Directing the Development of Constraint Languages by Checking Constraints on RDF Data

Thomas Hartmann; Benjamin Zapilko; Joachim Wackerow; Kai Eckert

For research institutes, data libraries, and data archives, validating RDF data according to predefined constraints is a much sought-after feature, particularly as this is taken for granted in the XML world. Based on our work in two international working groups on RDF validation and jointly identified requirements to formulate constraints and validate RDF data, we have published 81 types of constraints that are required by various stakeholders for data applications. In this paper, we evaluate the usability of the identified constraint types for assessing RDF data quality by (1) collecting and classifying 115 constraints on vocabularies commonly used in the social, behavioral, and economic sciences, either from the vocabularies themselves or from domain experts, and (2) validating 15,694 data sets (4.26 billion triples) of research data against these constraints. We classify each constraint according to (1) the severity of occurring violations and (2) which types of constraint languages are able to express its constraint type. Based on this large-scale evaluation, we formulate several findings to direct the further development of constraint languages.

Collaboration


Dive into Benjamin Zapilko's collaborations.

Top Co-Authors

Kai Eckert

University of Mannheim


York Sure

Karlsruhe Institute of Technology
