Petar Ristoski | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Petar Ristoski is active.

Explore More

Publication

Featured researches published by Petar Ristoski.

Journal of Web Semantics | 2016

Semantic Web in data mining and knowledge discovery

Petar Ristoski; Heiko Paulheim

Data Mining and Knowledge Discovery in Databases (KDD) is a research field concerned with deriving higher-level insights from data. The tasks performed in that field are knowledge intensive and can often benefit from using additional knowledge from various sources. Therefore, many approaches have been proposed in this area that combine Semantic Web data with the data mining and knowledge discovery process. This survey article gives a comprehensive overview of those approaches in different stages of the knowledge discovery process. As an example, we show how Linked Open Data can be used at various stages for building content-based recommender systems. The survey shows that, while there are numerous interesting research works performed, the full potential of the Semantic Web and Linked Open Data for data mining and KDD is still to be unlocked.

extended semantic web conference | 2013

I See a Car Crash: Real-Time Detection of Small Scale Incidents in Microblogs

Axel Schulz; Petar Ristoski; Heiko Paulheim

Microblogs are increasingly gaining attention as an important information source in emergency management. Nevertheless, it is still difficult to reuse this information source during emergency situations, because of the sheer amount of unstructured data. Especially for detecting small scale events like car crashes, there are only small bits of information, thus complicating the detection of relevant information.

international semantic web conference | 2016

RDF2Vec: RDF Graph Embeddings for Data Mining

Petar Ristoski; Heiko Paulheim

Linked Open Data has been recognized as a valuable source for background information in data mining. However, most data mining tools require features in propositional form, i.e., a vector of nominal or numerical features associated with an instance, while Linked Open Data sources are graphs by nature. In this paper, we present RDF2Vec, an approach that uses language modeling approaches for unsupervised feature extraction from sequences of words, and adapts them to RDF graphs. We generate sequences by leveraging local information from graph sub-structures, harvested by Weisfeiler-Lehman Subtree RDF Graph Kernels and graph walks, and learn latent numerical representations of entities in RDF graphs. Our evaluation shows that such vector representations outperform existing techniques for the propositionalization of RDF graphs on a variety of different predictive machine learning tasks, and that feature vector representations of general knowledge graphs such as DBpedia and Wikidata can be easily reused for different tasks.

Journal of Web Semantics | 2015

Mining the Web of Linked Data with RapidMiner

Petar Ristoski; Heiko Paulheim

Lots of data from different domains are published as Linked Open Data (LOD). While there are quite a few browsers for such data, as well as intelligent tools for particular purposes, a versatile tool for deriving additional knowledge by mining the Web of Linked Data is still missing. In this system paper, we introduce the RapidMiner Linked Open Data extension. The extension hooks into the powerful data mining and analysis platform RapidMiner, and offers operators for accessing Linked Open Data in RapidMiner, allowing for using it in sophisticated data analysis workflows without the need for expert knowledge in SPARQL or RDF. The extension allows for autonomously exploring the Web of Data by following links, thereby discovering relevant datasets on the fly, as well as for integrating overlapping data found in different datasets. As an example, we show how statistical data from the World Bank on scientific publications, published as an RDF data cube, can be automatically linked to further datasets and analyzed using additional background knowledge from ten different LOD datasets.

Semantic Web Evaluation Challenge | 2014

A Hybrid Multi-strategy Recommender System Using Linked Open Data

Petar Ristoski; Eneldo Loza Mencía; Heiko Paulheim

In this paper, we discuss the development of a hybrid multi-strategy book recommendation system using Linked Open Data. Our approach builds on training individual base recommenders and using global popularity scores as generic recommenders. The results of the individual recommenders are combined using stacking regression and rank aggregation. We show that this approach delivers very good results in different recommendation settings and also allows for incorporating diversity of recommendations.

discovery science | 2014

Feature Selection in Hierarchical Feature Spaces

Petar Ristoski; Heiko Paulheim

Feature selection is an important preprocessing step in data mining, which has an impact on both the runtime and the result quality of the subsequent processing steps. While there are many cases where hierarchic relations between features exist, most existing feature selection approaches are not capable of exploiting those relations. In this paper, we introduce a method for feature selection in hierarchical feature spaces. The method first eliminates redundant features along paths in the hierarchy, and further prunes the resulting feature set based on the features’ relevance. We show that our method yields a good trade-off between feature space compression and classification accuracy, and outperforms both standard approaches as well as other approaches which also exploit hierarchies.

Journal of Web Semantics | 2015

The Mannheim Search Join Engine

Oliver Lehmberg; Dominique Ritze; Petar Ristoski; Robert Meusel; Heiko Paulheim

A Search Join is a join operation which extends a user-provided table with additional attributes based on a large corpus of heterogeneous data originating from the Web or corporate intranets. Search Joins are useful within a wide range of application scenarios: Imagine you are an analyst having a local table describing companies and you want to extend this table with attributes containing the headquarters, turnover, and revenue of each company. Or imagine you are a film enthusiast and want to extend a table describing films with attributes like director, genre, and release date of each film. This article presents the Mannheim Search Join Engine which automatically performs such table extension operations based on a large corpus of Web data. Given a local table, the Mannheim Search Join Engine searches the corpus for additional data describing the entities contained in the input table. The discovered data are joined with the local table and are consolidated using schema matching and data fusion techniques. As a result, the user is presented with an extended table and given the opportunity to examine the provenance of the added data. We evaluate the Mannheim Search Join Engine using heterogeneous data originating from over one million different websites. The data corpus consists of HTML tables, as well as Linked Data and Microdata annotations which are converted into tabular form. Our experiments show that the Mannheim Search Join Engine achieves a coverage close to 100% and a precision of around 90% for the tasks of extending tables describing cities, companies, countries, drugs, books, films, and songs.

international semantic web conference | 2016

A collection of benchmark datasets for systematic evaluations of machine learning on the Semantic Web

Petar Ristoski; Gerben Klaas Dirk de Vries; Heiko Paulheim

In the recent years, several approaches for machine learning on the Semantic Web have been proposed. However, no extensive comparisons between those approaches have been undertaken, in particular due to a lack of publicly available, acknowledged benchmark datasets. In this paper, we present a collection of 22 benchmark datasets of different sizes. Such a collection of datasets can be used to conduct quantitative performance testing and systematic comparisons of approaches.

web intelligence, mining and semantics | 2017

Biased graph walks for RDF graph embeddings

Michael Cochez; Petar Ristoski; Simone Paolo Ponzetto; Heiko Paulheim

Knowledge Graphs have been recognized as a valuable source for background information in many data mining, information retrieval, natural language processing, and knowledge extraction tasks. However, obtaining a suitable feature vector representation from RDF graphs is a challenging task. In this paper, we extend the RDF2Vec approach, which leverages language modeling techniques for unsupervised feature extraction from sequences of entities. We generate sequences by exploiting local information from graph substructures, harvested by graph walks, and learn latent numerical representations of entities in RDF graphs. We extend the way we compute feature vector representations by comparing twelve different edge weighting functions for performing biased walks on the RDF graph, in order to generate higher quality graph embeddings. We evaluate our approach using different machine learning, as well as entity and document modeling benchmark data sets, and show that the naive RDF2Vec approach can be improved by exploiting Biased Graph Walks.

international semantic web conference | 2017

Global RDF Vector Space Embeddings

Michael Cochez; Petar Ristoski; Simone Paolo Ponzetto; Heiko Paulheim

Vector space embeddings have been shown to perform well when using RDF data in data mining and machine learning tasks. Existing approaches, such as RDF2Vec, use local information, i.e., they rely on local sequences generated for nodes in the RDF graph. For word embeddings, global techniques, such as GloVe, have been proposed as an alternative. In this paper, we show how the idea of global embeddings can be transferred to RDF embeddings, and show that the results are competitive with traditional local techniques like RDF2Vec.

Explore More