Michal Konkol
University of West Bohemia
Publication
Featured research published by Michal Konkol.
text speech and dialogue | 2013
Michal Konkol; Miloslav Konopík
In this paper, we present our effort to consolidate and advance named entity recognition (NER) research for the Czech language. Research on Czech NER has so far lacked a common basis: some systems are constructed to provide hierarchical outputs, whereas the rest give flat entities. Direct comparison among these systems is therefore impossible. Our first goal is to tackle this issue. We build our own NER system based on a conditional random fields (CRF) model. It is constructed to output either flat or hierarchical named entities, thus enabling an evaluation against all the known systems for the Czech language. We show a 3.5–11% absolute performance increase when compared to previously published results. As a last step, we put our system in the context of the research for other languages. We show results for English, Spanish and Dutch corpora. We conclude that our system provides solid results when compared to the foreign state of the art.
Expert Systems With Applications | 2015
Michal Konkol; Tomáš Brychcín; Miloslav Konopík
Language-independent Named Entity Recognition system. Novel features based on latent semantics. Experiments on multiple languages: English, Spanish, Dutch, Czech. State-of-the-art results. In this paper, we propose new features for Named Entity Recognition (NER) based on latent semantics. Furthermore, we explore the effect of unsupervised morphological information on these methods and on the NER system in general. The newly created NER system is fully language-independent thanks to the unsupervised nature of the proposed features. We evaluate the system on English, Spanish, Dutch and Czech corpora and study the difference between weakly and highly inflectional languages. Our system achieves the same or even better results than state-of-the-art language-dependent systems. The proposed features proved to be very useful and are the main reason for our promising results.
international conference on artificial intelligence and soft computing | 2014
Michal Konkol
Brainy is a newly created cross-platform machine learning library written in Java. It defines interfaces for common types of machine learning tasks and provides implementations of the most popular algorithms. Brainy utilizes a complex mathematical infrastructure which is also part of the library. The main difference compared to other ML libraries is the sophisticated system for feature definition and management. The design of the library is focused on efficiency, reliability, extensibility and simple usage. Brainy has been extensively used for research as well as commercial projects for major companies in the Czech Republic and the USA. Brainy is released under the GPL license and is freely available from the project web page.
international conference on computational linguistics | 2014
Tomáš Brychcín; Michal Konkol; Josef Steinberger
This paper describes our system participating in the aspect-based sentiment analysis task of SemEval 2014. The goal was to identify the aspects of given target entities and the sentiment expressed towards each aspect. We first introduce a system based on supervised machine learning, which is strictly constrained and uses the training data as the only source of information. This system is then extended by unsupervised methods for latent semantics discovery (LDA and semantic spaces) as well as an approach based on sentiment vocabularies. The evaluation was done on two domains, restaurants and laptops. We show that our approach leads to very promising results.
text speech and dialogue | 2011
Michal Konkol; Miloslav Konopík
Named Entity Recognition (NER) is an important preprocessing tool for many Natural Language Processing tasks like Information Retrieval, Question Answering or Machine Translation. This paper is focused on NER for the Czech language. The proposed NER system is based on knowledge and experience acquired from other languages and adapted for Czech. Our recognizer outperforms the previously introduced recognizers for Czech. The article is also focused on the use of semantic spaces for NER. Although no significant improvement has yet been achieved in this way, we believe that the research is worth sharing.
meeting of the association for computational linguistics | 2014
Josef Steinberger; Tomáš Brychcín; Michal Konkol
This paper presents pioneering research on aspect-level sentiment analysis in Czech. The main contribution of the paper is the newly created Czech aspect-level sentiment corpus, based on data from restaurant reviews. We annotated the corpus with two variants of aspect-level sentiment: aspect terms and aspect categories. The corpus consists of 1,244 sentences and 1,824 annotated aspects and is freely available to the research community. Furthermore, we propose a baseline system based on supervised machine learning. Our system detects the aspect terms with an F-measure of 68.65% and their polarities with an accuracy of 66.27%. The categories are recognized with an F-measure of 74.02% and their polarities with an accuracy of 66.61%.
text speech and dialogue | 2014
Michal Konkol; Miloslav Konopík
In this paper, we study the effects of various lemmatization and stemming approaches on the named entity recognition (NER) task for Czech, a highly inflectional language. Lemmatizers are seen as a necessary component for Czech NER systems and they have been used in all published papers about Czech NER so far. Thus, it is of utmost importance to explore their benefits and limits, as well as the differences between simple and complex methods. Our experiments are evaluated on the standard Czech Named Entity Corpus 1.1 as well as the newly created 2.0 version.
text speech and dialogue | 2015
Michal Konkol; Miloslav Konopík
In this paper, we study the effects of various segment representations in the named entity recognition (NER) task. The segment representation is responsible for mapping multi-word entities into classes used in the chosen machine learning approach. Usually, the choice of a segment representation in the NER system is arbitrary and made without proper tests. Some authors have compared different segment representations such as BIO, BIEO and BILOU, but usually only two at a time. Our goal is to show that the segment representation problem is more complex and that the proper selection of the best approach is not straightforward. We provide experiments with a wide set of segment representations. All the representations are tested using two popular machine learning algorithms: Conditional Random Fields and Maximum Entropy. Furthermore, the tests are done on four languages, namely English, Spanish, Dutch and Czech.
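To make the notion of a segment representation concrete, the following minimal Python sketch encodes the same entity spans under two of the schemes named in the abstract, BIO and BILOU. It is an illustration only and is not taken from the paper: the `encode` function, the `(start, end, label)` span format with an exclusive end index, and the example sentence are all assumptions made for this sketch.

```python
from typing import List, Tuple

# Hypothetical span format for illustration: (start, end_exclusive, label).
Span = Tuple[int, int, str]


def encode(n_tokens: int, spans: List[Span], scheme: str = "BIO") -> List[str]:
    """Map non-overlapping entity spans to per-token tags (BIO or BILOU)."""
    if scheme not in ("BIO", "BILOU"):
        raise ValueError(f"unsupported scheme: {scheme}")
    tags = ["O"] * n_tokens  # tokens outside any entity
    for start, end, label in spans:
        if scheme == "BILOU" and end - start == 1:
            tags[start] = f"U-{label}"  # unit-length entity
            continue
        tags[start] = f"B-{label}"  # beginning of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # inside the entity
        if scheme == "BILOU":
            tags[end - 1] = f"L-{label}"  # last token of the entity
    return tags


# Made-up example sentence with one two-token and one single-token entity.
tokens = ["John", "Smith", "visited", "Prague", "yesterday"]
spans = [(0, 2, "PER"), (3, 4, "LOC")]

print(encode(len(tokens), spans, "BIO"))
# ['B-PER', 'I-PER', 'O', 'B-LOC', 'O']
print(encode(len(tokens), spans, "BILOU"))
# ['B-PER', 'L-PER', 'O', 'U-LOC', 'O']
```

The two schemes label the same spans with different tag sets, so the classifier sees a different number of classes for each. That difference in the learning problem is what a comparison of segment representations evaluates.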
north american chapter of the association for computational linguistics | 2016
Tomáš Hercig; Tomáš Brychcín; Lukáš Svoboda; Michal Konkol
This paper describes our system used in the Aspect Based Sentiment Analysis (ABSA) task of SemEval 2016. Our system uses a Maximum Entropy classifier for the aspect category detection and for the sentiment polarity task. Conditional Random Fields (CRF) are used for opinion target extraction. We achieve state-of-the-art results in 9 experiments among the constrained systems and in 2 experiments among the unconstrained systems.
international conference on artificial intelligence and soft computing | 2015
Michal Konkol
In this paper, we describe fuzzy agglomerative clustering, a brand new fuzzy clustering algorithm. The basic idea of the proposed algorithm is based on the well-known hierarchical clustering methods. To achieve the soft or fuzzy output of the hierarchical clustering, we combine the single-linkage and complete-linkage strategies together with a fuzzy distance. As the algorithm was created recently, we cover only some basic experiments on synthetic data to show some properties of the algorithm. The reference implementation is freely available.
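The abstract does not specify the fuzzy distance or how the two linkage strategies are combined, so the sketch below does not attempt to reproduce the proposed algorithm. It only illustrates the two classical linkage criteria the abstract names, using Euclidean distance on made-up 2D points. The function names and data are assumptions for illustration.

```python
from itertools import product
from math import dist
from typing import List, Tuple

Point = Tuple[float, float]


def single_linkage(a: List[Point], b: List[Point]) -> float:
    """Cluster distance as the distance between the closest cross-cluster pair."""
    return min(dist(p, q) for p, q in product(a, b))


def complete_linkage(a: List[Point], b: List[Point]) -> float:
    """Cluster distance as the distance between the farthest cross-cluster pair."""
    return max(dist(p, q) for p, q in product(a, b))


# Made-up clusters on a line, chosen so the distances are easy to check by hand.
left = [(0.0, 0.0), (1.0, 0.0)]
right = [(4.0, 0.0), (6.0, 0.0)]

print(single_linkage(left, right))    # 3.0
print(complete_linkage(left, right))  # 6.0
```

Single linkage gives the lower bound and complete linkage the upper bound on the pairwise distances between two clusters.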