Martin Seleng
Slovak Academy of Sciences
Publication
Featured research published by Martin Seleng.
international conference on computational science | 2008
Michal Laclavik; Martin Seleng; Ladislav Hluchý
Automated annotation of web documents is a key challenge of the Semantic Web effort. Web documents are structured, but their structure is understandable only to humans, which is the major problem of the Semantic Web. The Semantic Web can be exploited only if computer-understandable metadata reaches critical mass. Semantic metadata can be created manually or by using automated annotation or tagging tools. The automated semantic annotation tools with the best results are built on various machine learning algorithms that require training sets. Another approach is to use pattern-based semantic annotation solutions built on NLP, information retrieval or information extraction methods. Most of the developed methods are tested and evaluated on hundreds of documents, which cannot prove their usability on large-scale data such as the web or email communication in an enterprise or community environment. In this paper we present how a pattern-based annotation tool can benefit from Google's MapReduce architecture to process large amounts of text data.
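The idea of running pattern-based annotation as a map step followed by a reduce step can be sketched in plain Python. This is only an illustrative, single-process sketch: the patterns, the function names (`map_document`, `reduce_annotations`, `annotate`) and the output layout are assumptions made here, not the tool or the MapReduce job described in the paper.

```python
import re
from collections import defaultdict

# Hypothetical patterns: each annotation type is paired with a regular expression.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
}


def map_document(doc_id, text):
    """Map step: emit ((type, value), doc_id) for every pattern match in one document."""
    for ann_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            yield (ann_type, match.group(0)), doc_id


def reduce_annotations(pairs):
    """Reduce step: group the ids of documents that share the same annotation key."""
    grouped = defaultdict(set)
    for key, doc_id in pairs:
        grouped[key].add(doc_id)
    return {key: sorted(ids) for key, ids in grouped.items()}


def annotate(corpus):
    """Run the map step over every document, then reduce all emitted pairs."""
    pairs = (
        pair
        for doc_id, text in corpus.items()
        for pair in map_document(doc_id, text)
    )
    return reduce_annotations(pairs)


if __name__ == "__main__":
    corpus = {
        "d1": "Contact bob@example.com in 2008.",
        "d2": "Write to bob@example.com",
    }
    for key, doc_ids in annotate(corpus).items():
        print(key, doc_ids)
```

Because each document is mapped independently, the map step could in principle be distributed across many workers, which is the property a MapReduce framework exploits.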
adaptive agents and multi-agents systems | 2011
Michal Laclavik; Štefan Dlugolinský; Martin Seleng; Marcel Kvassay; Bernhard Schneider; Holger Bracker; Michał Wrzeszcz; Jacek Kitowski; Ladislav Hluchý
In this paper we provide a brief survey of agent-based simulation (ABS) platforms and evaluate two of them, NetLogo and MASON, by implementing an exemplary scenario in the context of human behavior modeling. We define twelve evaluation points, which we discuss for both of the evaluated systems. The purpose of our evaluation is to identify the best ABS platform for parametric studies (data farming) of human behavior, but we also intend to use the system for training purposes. That is why we also discuss VBS2, a representative of serious game platforms.
international world wide web conferences | 2012
Michal Laclavik; Štefan Dlugolinský; Martin Seleng; Marek Ciglan; Ladislav Hluchý
In this paper, we present an approach for representing an email archive in the form of a network, capturing the communication among users and relations among the entities extracted from the textual part of the email messages. We showcase the method on the Enron email corpus, from which we extract various entities and a social network. The extracted named entities (NEs), such as people, email addresses and telephone numbers, are organized in a graph along with the emails in which they were found. The edges in the graph indicate relations between NEs and represent a co-occurrence in the same email part, paragraph or sentence, or within a composite NE. We study mathematical properties of the graphs so created and describe our hands-on experience with the processing of such structures. The Enron Graph corpus contains a few million nodes and is large enough for experimenting with various graph-querying techniques, e.g. graph traversal or spreading activation. Due to its size, the use of traditional graph processing libraries might be problematic, as they keep the whole structure in memory. We describe our experience with the management of such data and with relation discovery among the extracted entities. The described experience might be valuable for practitioners and highlights several research challenges.
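A toy version of such a graph, with a basic form of activation spreading over it, might look as follows. This is a simplified sketch under stated assumptions: co-occurrence is computed per whole email only (not per paragraph or sentence), the graph is held entirely in memory, and the function names, the `decay` parameter and the activation rule are illustrative choices made here, not the method used in the paper.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(emails):
    """Build an undirected graph linking each email to its entities,
    and entities to each other when they co-occur in the same email."""
    graph = defaultdict(set)
    for email_id, entities in emails.items():
        for entity in entities:
            graph[email_id].add(entity)
            graph[entity].add(email_id)
        for a, b in combinations(sorted(set(entities)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def spread_activation(graph, start, decay=0.5, steps=2):
    """Assumed activation rule: at each step, every active node passes
    `decay` times its energy to its neighbours, split equally among them."""
    activation = {start: 1.0}
    frontier = {start: 1.0}
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            neighbours = graph.get(node, ())
            if not neighbours:
                continue
            share = energy * decay / len(neighbours)
            for neighbour in neighbours:
                next_frontier[neighbour] += share
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation


if __name__ == "__main__":
    # Hypothetical extracted entities per email.
    emails = {
        "e1": ["alice", "555-1234"],
        "e2": ["alice", "bob"],
    }
    graph = build_graph(emails)
    scores = spread_activation(graph, "555-1234", steps=2)
    for node, score in sorted(scores.items(), key=lambda item: -item[1]):
        print(node, round(score, 4))
```

Nodes that end up with high activation are candidates for being related to the start node, even when no direct edge connects them.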
international conference on enterprise information systems | 2010
Michal Laclavik; Martin Seleng; Stefan Dlugolinsky; Emil Gatial; Ladislav Hluchý
Even in the Web 2.0 era, email is still the most popular application on the internet. Although beset by many problems, such as spam or information overload, it yields significant benefits, especially to enterprise users when communicating, collaborating or solving business tasks. Email standards, content, services and clients have improved a lot, but the integration with the environment and enterprise context has remained much the same. We believe that this can be improved by introducing our work in progress: the Acoma context-sensitive recommendation tool. Acoma processes emails on the server or desktop side and attaches relevant information from various sources to the email messages. It can be used with any email client or mobile device since it is hooked up to email as a proxy to email protocols. In order to provide useful recommendations, emails need to be processed and business objects need to be identified. Thus the paper also discusses object identification using information extraction techniques based on the Ontea tool, as well as its customization in the enterprise context.
international acm sigir conference on research and development in information retrieval | 2014
Michal Laclavik; Marek Ciglan; Alex Dorman; Stefan Dlugolinsky; Sam Steingold; Martin Seleng
ERD 2014 was a research challenge focused on the task of recognition and disambiguation of knowledge base entities in short and long texts. This write-up describes the Magnetic-IISAS team's approach to entity recognition in search queries, with which we participated in the ERD 2014 challenge. Our approach combines techniques of information retrieval, gazetteer-based annotation and entity link graph analysis to identify and disambiguate candidate entities. We built a search index with multiple structured fields extracted from Wikipedia, Freebase and DBpedia. When processing a query, we first retrieve the top matching entities from the index. For all retrieved entities, we gather plausible verbalizations (surface forms) by which the retrieved entities may be referred to. We match the gathered entity surface forms against the original query to confirm the entity's relevance to the query. Finally, we exploit the Wikipedia link graph to assess the similarity of candidate entities for the purpose of disambiguation and further candidate filtering. In the paper we discuss successful as well as unsuccessful attempts to improve the quality of system results that we tried during the course of the challenge.
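The step of matching surface forms against the query can be illustrated with a small sketch. It assumes that candidate entities and their surface forms have already been retrieved; the entity identifiers, the whitespace tokenization and the function name `confirm_candidates` are hypothetical and are not taken from the team's system.

```python
def confirm_candidates(query, candidates):
    """Keep only candidates that have a surface form occurring in the query
    as a contiguous run of whole tokens; record the longest such form."""
    tokens = query.lower().split()
    confirmed = {}
    for entity, surface_forms in candidates.items():
        for form in surface_forms:
            form_tokens = form.lower().split()
            n = len(form_tokens)
            if n == 0:
                continue
            found = any(
                tokens[i:i + n] == form_tokens
                for i in range(len(tokens) - n + 1)
            )
            if found:
                best = confirmed.get(entity)
                if best is None or len(form) > len(best):
                    confirmed[entity] = form
    return confirmed


if __name__ == "__main__":
    # Hypothetical candidates, as if returned by a search index.
    candidates = {
        "Total_Recall_(1990_film)": ["Total Recall", "Recall"],
        "Recall_(memory)": ["recall", "memory recall"],
        "Movie_Star": ["movie star"],
    }
    print(confirm_candidates("total recall movie", candidates))
```

In this sketch both "Total Recall" readings survive confirmation, which is why a later disambiguation step, such as the link graph analysis mentioned in the abstract, is still needed.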
parallel processing and applied mathematics | 2007
Michal Laclavik; Marek Ciglan; Martin Seleng; Ladislav Hluchý
Nowadays, capturing knowledge in ontological structures is one of the primary focuses of semantic web research. To exploit in ontologies the knowledge contained in the vast quantity of existing unstructured natural language texts, tools for automatic semantic annotation (ASA) are badly needed. In this paper, we present the ASA tool Ontea and the enhancement of the method with Grid technology to increase performance, which helps us deliver formalized semantic data in a shorter time. We have adjusted the Ontea annotation algorithm to be executable in a distributed grid environment. We also give a performance evaluation of the Ontea algorithm and experimental results from cluster and grid implementations.
world congress on information and communication technologies | 2013
Stefan Dlugolinsky; Giang T. Nguyen; Michal Laclavik; Martin Seleng
A large amount of unstructured data is produced daily through numerous media around us. Although computer systems, even commodity hardware, are becoming more powerful, processing such data and gaining useful information in a time-efficient manner remains a problem. One of the domains in unstructured data processing is Natural Language Processing (NLP). NLP covers areas like information extraction, machine translation, word sense disambiguation, automated question answering, etc. All of these areas require fast and precise Named Entity Recognition (NER), which is not a trivial task because of the size and heterogeneity of the processed data. Our effort in this research area is to provide fast tokenization and precise NER with linear complexity. In this paper, we present a character gazetteer with linear tokenization as well as NER, and compare its two tree data structure representations, i.e. a multiway tree implemented by hash maps and a first child-next sibling binary tree. Our measurements show that one outperforms the other in processing time, while the other outperforms it in memory consumption efficiency.
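The two tree representations named above can be sketched as character tries sharing one lookup routine. This is a minimal illustration under assumptions: the class names, the `step` and `insert` methods and the longest-match scan are choices made here, the scan restarts at every text position and ignores token boundaries, and it therefore does not reproduce the linear tokenization or the measurements reported in the paper.

```python
class HashTrie:
    """Multiway tree: each node maps a character to its child through a dict."""

    def __init__(self):
        self.children = {}
        self.label = None

    def step(self, ch):
        return self.children.get(ch)

    def insert(self, phrase, label):
        node = self
        for ch in phrase:
            node = node.children.setdefault(ch, HashTrie())
        node.label = label


class SiblingTrie:
    """First child-next sibling binary tree: each node holds one character,
    a link to its first child and a link to its next sibling."""

    def __init__(self, ch=None):
        self.ch = ch
        self.first_child = None
        self.next_sibling = None
        self.label = None

    def step(self, ch):
        # Walk the sibling chain until the character is found or the chain ends.
        node = self.first_child
        while node is not None and node.ch != ch:
            node = node.next_sibling
        return node

    def insert(self, phrase, label):
        node = self
        for ch in phrase:
            child = node.step(ch)
            if child is None:
                child = SiblingTrie(ch)
                child.next_sibling = node.first_child
                node.first_child = child
            node = child
        node.label = label


def find_entities(root, text):
    """Return (start, end, label) for the longest gazetteer match found
    at each position, scanning the text left to right."""
    results = []
    i = 0
    while i < len(text):
        node, best, j = root, None, i
        while j < len(text):
            node = node.step(text[j])
            if node is None:
                break
            j += 1
            if node.label is not None:
                best = (i, j, node.label)
        if best is not None:
            results.append(best)
            i = best[1]
        else:
            i += 1
    return results


if __name__ == "__main__":
    for trie_class in (HashTrie, SiblingTrie):
        root = trie_class()
        root.insert("New York", "LOC")
        root.insert("New York Times", "ORG")
        root.insert("Enron", "ORG")
        print(trie_class.__name__, find_entities(root, "Enron in New York Times"))
```

The dict-based node gives constant-time child lookup at the cost of one hash map per node, whereas the sibling-linked node stores only two links but has to walk the sibling chain on every step. That is the kind of time versus memory trade-off the abstract refers to.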
international symposium on applied machine intelligence and informatics | 2011
Marcel Kvassay; Ladislav Hluchy; Bartosz Kryza; Jacek Kitowski; Martin Seleng; Stefan Dlugolinsky; Michal Laclavik
This article proposes a combination of object-oriented and ontology-based approaches for real-time interworking of human behaviour models in the context of agent-based simulation systems. We present a conceptual design of a semantic intermediation framework, including the split of the responsibilities between the intermediation ontology and software code. We illustrate our design in the context of the EDA project A-0938-RT-GC EUSAS, where it will be used for integrating various behaviour models and for virtual trainings running in real time. We also report the results of preliminary performance tests related to ontological queries, and conclude with our future plans concerning the intermediation infrastructure.
fuzzy systems and knowledge discovery | 2010
Ladislav Hluchy; Martin Seleng; Ondrej Habala; Peter Krammer
We present data mining methods which are used in hydro-meteorological scenarios within the FP7 project ADMIRE. The scenarios use data mining techniques instead of the more common physical models in order to predict phenomena which are not ordinarily addressed in Slovakia: water temperature, discharge wave propagation downstream of a major water reservoir, and short-term rainfall prediction by analyzing radar imagery. These scenarios are part of a set of use cases which form the Flood Forecasting Simulation Cascade, a pilot application of the ADMIRE project. We describe the variables used in data mining training for these scenarios and also introduce the data integration methodology we have devised.
international conference on intelligent engineering systems | 2014
Martin Seleng; Michal Laclavik; Štefan Dlugolinský; Marek Ciglan; Martin Tomašek; Ladislav Hluchý
In this paper we describe a solution for enterprise search and interoperability using a lightweight semantic approach, which is suitable for small and micro enterprises. Our approach is based on discovering and reusing existing knowledge hidden in the enterprise infrastructure ecosystem, such as emails and content management systems (documents). Using the lightweight semantic approach, our solution is able to support lightweight semantic search and recommendation in order to fulfill interoperability tasks.