Vít Nováček
National University of Ireland, Galway
Publication
Featured research published by Vít Nováček.
Journal of Biomedical Informatics | 2008
Vít Nováček; Loredana Laera; Siegfried Handschuh; Brian Davis
We present a novel ontology integration technique that explicitly takes the dynamics and data-intensiveness of e-health and biomedicine application domains into account. Changing and growing knowledge, possibly contained in unstructured natural language resources, is handled by applying cutting-edge Semantic Web technologies. In particular, we employ semi-automatic integration of ontology learning results into a manually developed ontology. This integration is based on automatic negotiation of agreed alignments, inconsistency resolution and natural language generation methods. Their novel combination reduces, to a large extent, the end-user effort needed to incorporate new knowledge. This allows for efficient application in many practical use cases, as we show in the paper.
International Semantic Web Conference | 2011
Vít Nováček; Siegfried Handschuh; Stefan Decker
We aim to provide a complementary layer for web semantics, catering for bottom-up phenomena that are empirically observable on the Semantic Web rather than being merely asserted by it. We focus on meaning that is not associated with particular semantic descriptions, but emerges from the multitude of explicit and implicit links on the web of data. We claim that the current approaches are mostly top-down and thus lack proper mechanisms for capturing the emergent aspects of web meaning. To fill this gap, we have proposed a framework based on distributional semantics (a successful bottom-up approach to meaning representation in computational linguistics) that is nevertheless still compatible with the top-down Semantic Web principles due to its inherent support of rules. We evaluated our solution in a knowledge consolidation experiment, which confirmed the promising potential of our approach.
Journal of Web Semantics | 2010
Vít Nováček; Tudor Groza; Siegfried Handschuh; Stefan Decker
Search engines used in contemporary online scientific publishing mostly exploit raw publication data (bags of words) and shallow metadata (authors, keywords, citations, etc.). The knowledge contained implicitly in published texts remains largely unexploited. Following our long-term ambition to take advantage of such knowledge, we have implemented CORAAL (COntent extended by emeRgent and Asserted Annotations of Linked publication data), an enhanced-search prototype and the second-prize winner of the Elsevier Grand Challenge. CORAAL extracts asserted publication metadata together with the knowledge implicitly present in the relevant text, integrates the emergent content, and displays it using a multiple-perspective search-and-browse interface. This way we enable semantic querying for individual publications and convenient exploration of the knowledge contained within them. In other words, recalling the metaphor in the article title, we let users dive into publications more easily, and allow them to freely bathe in the related unlocked knowledge.
PeerJ | 2014
Vít Nováček; Gully A. P. C. Burns
Background. Unlike full reading, ‘skim-reading’ involves looking quickly over information in an attempt to cover more material whilst still retaining a superficial view of the underlying content. Within this work, we specifically emulate this natural human activity by providing a dynamic graph-based view of entities automatically extracted from text. For the extraction, we use shallow parsing, co-occurrence analysis and semantic similarity computation techniques. Our main motivation is to assist biomedical researchers and clinicians in coping with the increasingly large number of potentially relevant articles that are continually being published in the life sciences. Methods. To construct the high-level network overview of articles, we extract weighted binary statements from the text. We consider two types of these statements, co-occurrence and similarity, both organised in the same distributional representation (i.e., in a vector-space model). For the co-occurrence weights, we use point-wise mutual information, which indicates the degree of non-random association between two co-occurring entities. For computing the similarity statement weights, we use cosine distance based on the relevant co-occurrence vectors. These statements are used to build fuzzy indices of terms, statements and provenance article identifiers, which support fuzzy querying and subsequent result ranking. These indexing and querying processes are then used to construct a graph-based interface for searching and browsing entity networks extracted from articles, as well as articles relevant to the networks being browsed. Last but not least, we describe a methodology for automated experimental evaluation of the presented approach. The method uses formal comparison of the graphs generated by our tool to relevant gold standards based on manually curated PubMed, TREC challenge and MeSH data. Results. We provide a web-based prototype (called ‘SKIMMR’) that generates a network of inter-related entities from a set of documents, which a user may explore through our interface. When a particular area of the entity network looks interesting to a user, the tool displays the documents that are most relevant to the entities of interest currently shown in the network. We present this as a methodology for browsing a collection of research articles. To illustrate the practical applicability of SKIMMR, we present examples of its use in the domains of Spinal Muscular Atrophy and Parkinson’s Disease. Finally, we report on the results of experimental evaluation using the two domains and one additional dataset based on the TREC challenge. The results show how the presented method for machine-aided skim reading outperforms tools like PubMed regarding focused browsing and informativeness of the browsing context.
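The two statement weights named in the Methods part can be illustrated with a minimal Python sketch. It is not the SKIMMR implementation: the sentence-level co-occurrence window, the base-2 logarithm, the sparse dictionary vectors and all function names are assumptions made for illustration. The sketch computes cosine similarity, from which a cosine distance follows as one minus that value.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_weights(sentences):
    """Return {(a, b): PMI} for every pair of entities sharing a sentence.

    Assumption: each sentence is a list of already extracted entity names,
    and probabilities are estimated as sentence-level relative frequencies.
    """
    entity_counts = Counter()
    pair_counts = Counter()
    for entities in sentences:
        unique = sorted(set(entities))  # sorted so each pair has one key
        entity_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    n = len(sentences)
    weights = {}
    for (a, b), joint in pair_counts.items():
        p_ab = joint / n
        p_a = entity_counts[a] / n
        p_b = entity_counts[b] / n
        weights[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return weights


def cooccurrence_vectors(weights):
    """Turn pair weights into one sparse vector (dict) per entity."""
    vectors = {}
    for (a, b), w in weights.items():
        vectors.setdefault(a, {})[b] = w
        vectors.setdefault(b, {})[a] = w
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy input: entity mentions per sentence (hypothetical data).
sentences = [
    ["SMN1", "motor neuron"],
    ["SMN2", "motor neuron"],
    ["dopamine", "substantia nigra"],
    ["dopamine", "substantia nigra"],
]
weights = pmi_weights(sentences)
vectors = cooccurrence_vectors(weights)
print(cosine_similarity(vectors["SMN1"], vectors["SMN2"]))
print(cosine_similarity(vectors["SMN1"], vectors["dopamine"]))
```

In this toy input, SMN1 and SMN2 never co-occur, yet they receive a high similarity because both co-occur with the same third entity. This is the kind of indirect link that a similarity statement can add on top of plain co-occurrence statements.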
Artificial Intelligence in Medicine in Europe | 2009
Vít Nováček; Tudor Groza; Siegfried Handschuh
Prominent biomedical literature search tools like ScienceDirect, PubMed Central or MEDLINE allow for efficient retrieval of resources based on keywords. Due to the vast amounts of data available in the life sciences, however, keyword search is not always sufficient. One would often welcome a more intelligent search for knowledge, i.e., for concepts and their mutual relations. This is still a major challenge, since acquiring the necessary machine-readable knowledge manually is virtually impossible at large scale, while its automatic extraction is not particularly reliable. We have nevertheless developed a novel framework that enables practical exploitation of automatically extracted knowledge. On top of the framework, we implemented CORAAL, a prototype for knowledge-based biomedical literature search. This paper describes its essential principles, innovative capabilities and current results.
International Semantic Web Conference | 2009
Vít Nováček; Stefan Decker
We present a lightweight framework for processing uncertain emergent knowledge that comes from multiple resources with varying relevance. The framework is essentially RDF-compatible, but also allows for direct representation of contextual features (e.g., provenance). We support soft integration and robust querying of the represented content based on well-founded notions of aggregation, similarity and ranking. A proof-of-concept implementation is presented and evaluated within large-scale knowledge-based search in life science articles.
Language, Data and Knowledge | 2017
Sameh K. Mohamed; Emir Muñoz; Vít Nováček; Pierre-Yves Vandenbussche
Relation paths are sequences of relations with inverse that allow for complete exploration of knowledge graphs in a two-way unconstrained manner. They are powerful enough to encode complex relationships between entities and are crucial in several contexts, such as knowledge base verification, rule mining, and link prediction. However, fundamental forms of reasoning such as containment and equivalence of relation paths have hitherto been ignored. Intuitively, two relation paths are equivalent if they share the same extension, i.e., set of source and target entity pairs. In this paper, we study the problem of containment as a means to find equivalent relation paths and show that it is very expensive in practice to enumerate paths between entities. We characterize the complexity of containment and equivalence of relation paths and propose a domain-independent and unsupervised method to obtain approximate equivalences ranked by a tri-criteria ranking function. We evaluate our algorithm using test cases over real-world data and show that we are able to find semantically meaningful equivalences efficiently.
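The notions of extension, containment and equivalence used above can be sketched in a few lines of Python. This is an illustration only, not the approximate, ranked method proposed in the paper: it computes exact extensions by joining relations over a small in-memory triple set. The `^-1` suffix for inverse relations, the function names and the toy triples are all assumptions.

```python
from collections import defaultdict


def build_index(triples):
    """Index (subject, relation, object) triples by relation.

    Assumption: the inverse of relation r is stored under the name r + "^-1".
    """
    index = defaultdict(set)
    for s, r, o in triples:
        index[r].add((s, o))
        index[r + "^-1"].add((o, s))
    return index


def extension(path, index):
    """Return the set of (source, target) pairs connected by a relation path."""
    if not path:
        return set()
    pairs = set(index.get(path[0], ()))
    for rel in path[1:]:
        # Group the next relation by source entity, then join on the
        # intermediate entity reached so far.
        by_source = defaultdict(set)
        for s, o in index.get(rel, ()):
            by_source[s].add(o)
        pairs = {(s, t) for s, mid in pairs for t in by_source.get(mid, ())}
    return pairs


def is_contained(p, q, index):
    """Path p is contained in path q if every pair of p is also a pair of q."""
    return extension(p, index) <= extension(q, index)


def are_equivalent(p, q, index):
    """Two paths are equivalent if they share the same extension."""
    return extension(p, index) == extension(q, index)


# Toy knowledge graph (hypothetical data).
triples = [
    ("alice", "parentOf", "bob"),
    ("bob", "parentOf", "carol"),
    ("bob", "childOf", "alice"),
    ("carol", "childOf", "bob"),
    ("alice", "grandparentOf", "carol"),
    ("dave", "grandparentOf", "erin"),
]
index = build_index(triples)

print(are_equivalent(["parentOf"], ["childOf^-1"], index))
print(is_contained(["parentOf", "parentOf"], ["grandparentOf"], index))
print(are_equivalent(["parentOf", "parentOf"], ["grandparentOf"], index))
```

Each additional relation in a path requires another join over the intermediate entities, which hints at why exhaustive path enumeration becomes expensive on real-world graphs and why approximate equivalences are attractive.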
International Conference on Move to Meaningful Internet Systems | 2010
Vít Nováček; Siegfried Handschuh
The paper presents CORAAL, a novel solution for life science publication search that exploits knowledge locked within unstructured publication texts (in addition to offering the traditional full-text functionalities). In contrast to most related state-of-the-art solutions, CORAAL integrally addresses acquisition (i.e., extraction), processing (i.e., integration and extension) and dissemination (i.e., convenient exposure) of the publication knowledge. After detailing the motivations of our research, we outline the representation and processing framework that allows CORAAL to tackle the rather noisy and sparse automatically extracted knowledge. We then describe the architecture and core features of the CORAAL prototype itself. Most importantly, we report on an extensive evaluation of the CORAAL tool performed with the assistance of actual sample users. The evaluation illustrates the practical benefits brought by our solution even at the early research prototype stage.
Semantic e-Science | 2010
Vít Nováček; Tudor Groza; Siegfried Handschuh
Despite being a flourishing field, contemporary online scientific publishing exploits mostly raw publication data (rather meaningless bags of words) and shallow metadata (authors, keywords, citations, etc.) for search. The much-needed economical mass exploitation of the knowledge implicitly contained in publication texts is still largely uncharted territory. The way towards filling this gap leads through (1) extraction of asserted publication metadata together with the knowledge implicitly present in the respective text; (2) integration, refinement and extension of the emergent content; (3) release of the processed content via a meaning-sensitive search-and-browse interface catering for services complementary to the current full-text search. This chapter addresses the scientific and engineering challenges related to the suggested approach and introduces a particular solution that tackles them: CORAAL, a prototype for knowledge-based life science publication search.
International Conference on Enterprise Information Systems | 2007
Vít Nováček
We present new results of our research on the integration of ontologies created automatically by means of Human Language Technologies. The research is related to OLE (Ontology LEarning), a project aimed at bottom-up generation and merging of ontologies. It utilises a proposed expressive framework for uncertain knowledge representation called ANUIC (Adaptive Net of Universally Interrelated Concepts). We discuss our recent achievements in taxonomy acquisition and show how even a simple application of the principles of ANUIC can improve the results of initial knowledge extraction methods. We also suggest an algorithm for large-scale automatic annotation of natural language documents that applies uncertain knowledge bases created using our approach.