Filip Ilievski | Researchain

Publication

Featured research published by Filip Ilievski.

Sprachwissenschaft | 2017

Literally better: Analyzing and improving the quality of literals

Wouter Beek; Filip Ilievski; Jeremy Debattista; Stefan Schlobach; Jan Wielemaker

Quality is a complicated and multifarious topic in contemporary Linked Data research. The aspect of literal quality in particular has not yet been rigorously studied. Nevertheless, analyzing and improving the quality of literals is important since literals form a substantial (one in seven statements) and crucial part of the Semantic Web. Specifically, literals allow infinite value spaces to be expressed and they provide the linguistic entry point to the LOD Cloud. We present a toolchain that builds on the LOD Laundromat data cleaning and republishing infrastructure and that allows us to analyze the quality of literals on a very large scale, using a collection of quality criteria we specify in a systematic way. We illustrate the viability of our approach by lifting out two particular aspects in which the current LOD Cloud can be immediately improved by automated means: value canonization and language tagging. Since not all quality aspects can be addressed algorithmically, we also give an overview of other problems that can be used to guide future endeavors in tooling, training, and best practice formulation.
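
As a rough illustration of what value canonization can mean for typed literals, the sketch below maps several valid lexical forms of the same XSD value to a single canonical form. It is not the toolchain described in the abstract: the function name `canonical_lexical_form` and the choice to cover only `xsd:boolean` and `xsd:integer` are assumptions made for this example.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

def canonical_lexical_form(lexical, datatype):
    """Return the canonical lexical form of a literal.

    Returns None when the lexical form is invalid for the datatype.
    Literals with a datatype this sketch does not cover are returned unchanged.
    (Hypothetical helper, for illustration only.)
    """
    # Both datatypes below collapse surrounding XML whitespace.
    text = lexical.strip(" \t\r\n")
    if datatype == XSD + "boolean":
        # Valid lexical forms are true, false, 1 and 0; canonical are true/false.
        return {"true": "true", "1": "true",
                "false": "false", "0": "false"}.get(text)
    if datatype == XSD + "integer":
        # Optional sign followed by ASCII digits only.
        if re.fullmatch(r"[+-]?[0-9]+", text) is None:
            return None
        # int() drops the plus sign and leading zeros; "-0" becomes "0".
        return str(int(text))
    return lexical

print(canonical_lexical_form("+007", XSD + "integer"))
print(canonical_lexical_form(" 1 ", XSD + "boolean"))
print(canonical_lexical_form("yes", XSD + "boolean"))
```

A literal for which the function returns None has no value in the datatype's value space, which is the kind of defect a quality analysis would report instead of repairing silently.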

international semantic web conference | 2016

LOTUS: Adaptive Text Search for Big Linked Data

Filip Ilievski; Wouter Beek; Marieke van Erp; Laurens Rietveld; Stefan Schlobach

Finding relevant resources on the Semantic Web today is a dirty job: no centralized query service exists and the support for natural language access is limited. We present LOTUS: Linked Open Text UnleaShed, a text-based entry point to a massive subset of today's Linked Open Data Cloud. Recognizing the use case dependency of resource retrieval, LOTUS provides an adaptive framework in which a set of matching and ranking algorithms are made available. Researchers and developers are able to tune their own LOTUS index by choosing and combining the matching and ranking algorithms that suit their use case best. In this paper, we explain the LOTUS approach, its implementation and the functionality it provides. We demonstrate the ease with which LOTUS enables text-based resource retrieval at an unprecedented scale in concrete and domain-specific scenarios. Finally, we provide evidence for the scalability of LOTUS with respect to the LOD Laundromat, the largest collection of easily accessible Linked Open Data currently available.

language data and knowledge | 2017

Hunger for Contextual Knowledge and a Road Map to Intelligent Entity Linking

Filip Ilievski; Piek Vossen; Marieke van Erp

The task of entity linking (EL) is often perceived as an algorithmic problem, where the novelty of systems lies in the decision-making process, while the knowledge is relatively fixed. As a consequence, we lack an understanding of the importance and relevance of diverse knowledge types in EL. However, knowledge and relevance are crucial: following the Gricean maxim, an author relies on assumptions about the knowledge of the reader and uses the most efficient and scarce, yet understandable, level of detail when conveying a message. In this paper, we seek to understand the EL task from a knowledge and relevance perspective. We define four categories of contextual knowledge relevant for EL and observe that two of these are systematically absent in existing entity linkers. Consequently, many contextual cases, in particular long-tail entities, can never be interpreted by existing systems. Finally, we present our ideas on developing knowledge-intensive systems and long-tail datasets.

empirical methods in natural language processing | 2016

Moving away from semantic overfitting in disambiguation datasets

Marten Postma; Filip Ilievski; Piek Vossen; M.G.J. van Erp

Entities and events in the world have no frequency, but our communication about them and the expressions we use to refer to them do have a strong frequency profile. Language expressions and their meanings follow a Zipfian distribution, featuring a small number of very frequent observations and a very long tail of low-frequency observations. Since our NLP datasets sample texts but do not sample the world, they are no exception to Zipf’s law. This causes a lack of representativeness in our NLP tasks, leading to models that can capture the head phenomena in language, but fail when dealing with the long tail. We therefore propose a referential challenge for semantic NLP that reflects a higher degree of ambiguity and variance and captures a large range of small real-world phenomena. To perform well, systems would have to show deep understanding of the linguistic tail.
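
To make the head-versus-tail imbalance concrete, the sketch below computes how much of the total probability mass falls on the most frequent items under an idealized Zipf's law, where the item at rank r has weight proportional to 1 / r^s. The function name `zipf_head_share`, the exponent s = 1 and the item counts are assumptions chosen for illustration; they are not taken from the paper.

```python
def zipf_head_share(num_items, head_size, exponent=1.0):
    """Share of total mass held by the `head_size` top-ranked items
    when the item at rank r has weight 1 / r**exponent.
    (Hypothetical helper, for illustration only.)
    """
    weights = [1.0 / rank ** exponent for rank in range(1, num_items + 1)]
    return sum(weights[:head_size]) / sum(weights)

# Share of observations covered by the 100 most frequent of 10,000 items.
share = zipf_head_share(num_items=10_000, head_size=100)
print(f"{share:.2f}")
```

Under these assumptions the top 1% of items accounts for roughly 53% of all observations, so a dataset sampled from text can score well on the head while saying little about the remaining 99% of items.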

12th International Summer School on Reasoning Web Summer School, RW 2016 | 2016

LOD Lab: Scalable Linked Data Processing

Wouter Beek; Laurens Rietveld; Filip Ilievski; Stefan Schlobach

With tens if not hundreds of billions of logical statements, Linked Open Data (LOD) is one of the biggest knowledge bases ever built. As such, it is a gigantic source of information for applications in various domains, but also, given its size, heterogeneous nature, and complexity, an ideal test-bed for knowledge representation and reasoning.

language resources and evaluation | 2016