Stefan Dlugolinsky | Researchain

Publication

Featured research published by Stefan Dlugolinsky.

international conference on intelligent engineering systems | 2013

Evaluation of named entity recognition tools on microposts

Stefan Dlugolinsky; Marek Ciglan; Michal Laclavik

In this paper we evaluate eight well-known Information Extraction (IE) tools on the task of Named Entity Recognition (NER) in microposts. We have chosen six NLP tools and two Wikipedia concept extractors for the evaluation. Our intent was to see how these tools would perform on the relatively short texts of microposts. The evaluation dataset was adopted from the MSM 2013 IE Challenge. This dataset contained manually annotated microposts with classification restricted to four entity types: PER, LOC, ORG and MISC.

international conference on enterprise information systems | 2010

Tools for Email Based Recommendation in Enterprise

Michal Laclavik; Martin Seleng; Stefan Dlugolinsky; Emil Gatial; Ladislav Hluchý

Even in the Web 2.0 era, email is still the most popular application on the internet. Although beset by many problems, such as spam and information overload, it yields significant benefits, especially to enterprise users when communicating, collaborating or solving business tasks. Email standards, content, services and clients have improved a lot, but integration with the environment and enterprise context has remained much the same. We believe that this can be improved by introducing our work in progress, the Acoma context-sensitive recommendation tool. Acoma processes emails on the server or desktop side and attaches relevant information from various sources to the email messages. It can be used with any email client or mobile device since it is hooked up to email as a proxy to email protocols. In order to provide useful recommendations, emails need to be processed and business objects need to be identified. Thus the paper also discusses object identification using information extraction techniques based on the Ontea tool, as well as its customization in the enterprise context.

international acm sigir conference on research and development in information retrieval | 2014

A search based approach to entity recognition: magnetic and IISAS team at ERD challenge

Michal Laclavik; Marek Ciglan; Alex Dorman; Stefan Dlugolinsky; Sam Steingold; Martin Seleng

ERD 2014 was a research challenge focused on the task of recognition and disambiguation of knowledge base entities in short and long texts. This write-up describes the Magnetic-IISAS team's approach to entity recognition in search queries, with which we participated in the ERD 2014 challenge. Our approach combines techniques of information retrieval, gazetteer-based annotation and entity link graph analysis to identify and disambiguate candidate entities. We built a search index with multiple structured fields extracted from Wikipedia, Freebase and DBpedia. When processing a query, we first retrieve the top matching entities from the index. For all retrieved entities, we gather plausible verbalizations (surface forms) by which the retrieved entities may be referred to. We match the gathered entity surface forms against the original query to confirm the entity's relevance to the query. Finally, we exploit the Wikipedia link graph to assess the similarity of candidate entities for the purpose of disambiguation and further candidate filtering. In the paper we discuss successful as well as unsuccessful attempts to improve the quality of system results that we tried during the course of the challenge.
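
The retrieve-then-confirm steps described in this abstract can be illustrated with a minimal Python sketch. It is not the team's implementation: the `SURFACE_FORMS` table, the entity identifiers and the function names are hypothetical, a tiny in-memory dictionary stands in for the search index built from Wikipedia, Freebase and DBpedia, and the link-graph disambiguation step is omitted.

```python
import re

# Hypothetical toy data standing in for the real search index.
SURFACE_FORMS = {
    "New_York_City": ["new york city", "new york", "nyc"],
    "The_New_York_Times": ["the new york times", "new york times", "nyt"],
    "York": ["york"],
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def retrieve_candidates(query):
    """Step 1 (assumed): return entities sharing at least one token with the query."""
    query_tokens = set(tokenize(query))
    return [
        entity
        for entity, forms in SURFACE_FORMS.items()
        if any(query_tokens & set(tokenize(form)) for form in forms)
    ]


def confirm(query, candidates):
    """Step 2 (assumed): keep candidates with a surface form found in the query.

    A form counts as found only if it appears as a whole-token span.
    Returns a dict mapping each confirmed entity to its longest matched form.
    """
    normalized = " " + " ".join(tokenize(query)) + " "
    confirmed = {}
    for entity in candidates:
        matched = [f for f in SURFACE_FORMS[entity] if f" {f} " in normalized]
        if matched:
            confirmed[entity] = max(matched, key=len)
    return confirmed
```

For the query "new york times subscription" this sketch confirms all three toy entities with overlapping surface forms, which is the kind of ambiguity the link-graph step in the abstract is meant to resolve.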

database and expert systems applications | 2010

Towards a Search System for the Web Exploiting Spatial Data of a Web Document

Stefan Dlugolinsky; Michal Laclavik; Ladislav Hluchy

In this paper, we describe our work in progress in the scope of information retrieval exploiting the spatial data extracted from web documents. We discuss problems of searching for web documents by geographic distance, where the geographic distance of a document is determined automatically using information extraction methods. We present our approach to building a distributed search system that deals with several problems in this area. Search by geographic distance is useful, for example, if we are looking for the nearest restaurant, hotel or any other business near our location (reference point). Almost every company today presents its business on the Internet, sharing business information along with contact information. Miscellaneous geographic information can be extracted from the contact information (but not only from it) and used to compute the geographic distance of a document. By a document's geographic distance, we understand the distance between a search reference point and a geographic location related to the document. In our approach, we chose postal addresses and GPS coordinates for spatial data extraction. The reference point can be dynamically changed and one document can be related to more than one geographic location. Geographic locations are automatically discovered in the document's textual content. A document is then indexed by all its known geographic locations, so later when searching, the document can be found near the different geographic locations to which it is related. The search domain is automatically built by crawling through linked web documents.
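
As a rough illustration of searching by geographic distance when one document has several locations, the following Python sketch uses the haversine great-circle formula. The formula choice, the function names, the in-memory `index` layout and the rule of taking the nearest location per document are all assumptions made for this example; the abstract does not specify them.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def document_distance(reference, locations):
    """Distance from the reference point to the nearest location of a document.

    Using the minimum over all locations is an assumption of this sketch.
    """
    return min(haversine_km(*reference, *loc) for loc in locations)


def search_nearby(reference, index, max_km):
    """Return ids of documents within max_km of the reference, nearest first.

    `index` maps a document id to a list of (lat, lon) tuples.
    """
    hits = []
    for doc_id, locations in index.items():
        if not locations:
            continue  # no spatial data was extracted for this document
        distance = document_distance(reference, locations)
        if distance <= max_km:
            hits.append((distance, doc_id))
    return [doc_id for _, doc_id in sorted(hits)]
```

Because the reference point is passed in at query time, it can change between searches without re-indexing, which matches the dynamic reference point mentioned in the abstract.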

world congress on information and communication technologies | 2013

Character gazetteer for Named Entity Recognition with linear matching complexity

Stefan Dlugolinsky; Giang T. Nguyen; Michal Laclavik; Martin Seleng

A large amount of unstructured data is produced daily through numerous media around us. Although computer systems, even commodity hardware, are becoming more powerful, processing such data and extracting useful information in a time-efficient manner remains a problem. One of the domains in unstructured data processing is Natural Language Processing (NLP). NLP covers areas like information extraction, machine translation, word sense disambiguation, automated question answering, etc. All of these areas require fast and precise Named Entity Recognition (NER), which is not a trivial task because of the size and heterogeneity of the processed data. Our effort in this research area is to provide fast tokenization and precise NER with linear complexity. In this paper, we present a character gazetteer with linear tokenization as well as NER and compare its two tree data structure representations: a multiway tree implemented with hash maps and a first-child/next-sibling binary tree. Our measurements show that one outperforms the other in processing time, while the other outperforms it in memory consumption efficiency.
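
To make the "multiway tree implemented with hash maps" representation concrete, here is a minimal Python sketch of a character-level gazetteer. It is an illustration under stated assumptions, not the paper's data structure: the class name, the use of a `None` key as an end-of-entry marker and the greedy longest-match policy are choices made for this example, and word-boundary checks are left out.

```python
class CharGazetteer:
    """Character trie in which every node is a dict (hash map) of children."""

    def __init__(self):
        self.root = {}

    def add(self, phrase, label):
        """Insert a gazetteer entry one character at a time."""
        node = self.root
        for ch in phrase:
            node = node.setdefault(ch, {})
        node[None] = label  # hypothetical convention: None marks end of an entry

    def longest_match(self, text, start):
        """Return (start, end, label) for the longest entry beginning at start, or None."""
        node = self.root
        best = None
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if None in node:
                best = (start, i + 1, node[None])
        return best

    def find_all(self, text):
        """Scan left to right, taking the longest match at each position."""
        matches = []
        i = 0
        while i < len(text):
            match = self.longest_match(text, i)
            if match:
                matches.append(match)
                i = match[1]  # continue after the matched span
            else:
                i += 1
        return matches
```

Each start position follows at most as many characters as the longest stored entry, so the scan cost in this sketch grows with the text length times that bound. A first-child/next-sibling variant would replace each dict with two pointers per node, trading lookup speed for memory.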

international symposium on applied machine intelligence and informatics | 2011

Combining object-oriented and ontology-based approaches in human behaviour modelling

Marcel Kvassay; Ladislav Hluchy; Bartosz Kryza; Jacek Kitowski; Martin Seleng; Stefan Dlugolinsky; Michal Laclavik

This article proposes a combination of object-oriented and ontology-based approaches for real-time interworking of human behaviour models in the context of agent-based simulation systems. We present a conceptual design of a semantic intermediation framework, including the split of the responsibilities between the intermediation ontology and software code. We illustrate our design in the context of the EDA project A-0938-RT-GC EUSAS, where it will be used for integrating various behaviour models and for virtual trainings running in real time. We also report the results of preliminary performance tests related to ontological queries, and conclude with our future plans concerning the intermediation infrastructure.

Computer Science | 2012

DISTRIBUTED WEB-SCALE INFRASTRUCTURE FOR CRAWLING, INDEXING AND SEARCH WITH SEMANTIC SUPPORT

Stefan Dlugolinsky; Martin Seleng; Michal Laclavik; Ladislav Hluchy

In this paper, we describe our work in progress in the scope of web-scale information extraction and information retrieval utilizing distributed computing. We present a distributed architecture built on top of the MapReduce paradigm for information retrieval, information processing and intelligent search supported by spatial capabilities. The proposed architecture focuses on crawling documents in several different formats, information extraction, lightweight semantic annotation of the extracted information, indexing of the extracted information and, finally, indexing of documents based on the geo-spatial information found in a document. We demonstrate the architecture on two use cases, where the first is search in job offers retrieved from the LinkedIn portal and the second is search in BBC news feeds, and discuss several problems we had to face during the implementation. We also discuss spatial search applications for both cases because both LinkedIn job offer pages and BBC news feeds contain a lot of spatial information to extract and process.

international world wide web conferences | 2015

Search Query Categorization at Scale

Michal Laclavik; Marek Ciglan; Sam Steingold; Martin Seleng; Alex Dorman; Stefan Dlugolinsky

State-of-the-art query categorization methods usually exploit web search services to retrieve the best matching web documents and map them to a given taxonomy of categories. This is effective but impractical when one does not own a web corpus and has to use a third-party web search engine API. The problem lies in performance and in financial costs. In this paper, we present a novel, fast and scalable approach to categorization of search queries based on a limited intermediate corpus: we use Wikipedia as the knowledge base. The presented solution relies on two steps: first, a query is mapped to the relevant Wikipedia pages; second, the retrieved documents are categorized into a given taxonomy. We approach the first step as an entity search problem and present a new document categorization approach for the second step. On a standard data set, our approach achieves results comparable to the state-of-the-art approaches while maintaining high performance and scalability.

international conference on intelligent engineering systems | 2015

Lightweight Semantic approach for enterprise interoperability issues

Martin Seleng; Stefan Dlugolinsky; Martin Tomášek; Karol Furdik; Ladislav Hluchy

In this paper we present the ongoing FP7 project VENIS, in which we focus on interoperability problems between Large Enterprises (LEs) and Small/Medium Enterprises (SMEs) as well as Micro Enterprises (MEs). We propose a solution for enterprise search and enterprise interoperability that follows the lightweight semantic approach. This approach is more suitable for SMEs and MEs, which are not always able to adopt the heavier and more expensive solutions used by LEs. We try to discover hidden knowledge stored in each of the involved enterprise ecosystems, mainly in emails and CMS (documents). Our Lightweight Semantics approach (LWS) is able to provide semantic search and recommendation to support users inside business workflows in fulfilling interoperability tasks.

#MSM | 2013