Dimitrios P. Lyras | Researchain

Publication

Featured research published by Dimitrios P. Lyras.

International Journal on Artificial Intelligence Tools | 2008

APPLYING SIMILARITY MEASURES FOR AUTOMATIC LEMMATIZATION: A CASE STUDY FOR MODERN GREEK AND ENGLISH

Dimitrios P. Lyras; Kyriakos N. Sgarbas; Nikolaos Fakotakis

This paper addresses the problem of automatically inducing the normalized form (lemma) of regular and mildly irregular words with no direct supervision, using language-independent algorithms. More specifically, two string distance metric models (the Levenshtein Edit Distance algorithm and the Dice Coefficient similarity measure) were employed for the automatic word lemmatization task by combining two alignment models based on string similarity and on the most frequent inflectional suffixes. The performance of the proposed model has been evaluated quantitatively and qualitatively. Experiments were performed for Modern Greek and English, and the results, which are in line with the state of the art, have shown that the proposed model is robust (for a variety of languages) and computationally efficient. The proposed model may be useful as a pre-processing tool for various language engineering and text mining applications such as spell-checkers, electronic dictionaries and morphological analyzers.
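As a rough illustration of the second similarity measure named in this abstract, the sketch below computes a Dice coefficient between two words. The abstract does not say which units the coefficient is computed over; character bigrams are assumed here, and the function names `char_bigrams` and `dice` are hypothetical, not taken from the paper.

```python
from collections import Counter


def char_bigrams(word: str) -> Counter:
    """Multiset of adjacent character pairs (assumed unit of comparison)."""
    return Counter(word[i:i + 2] for i in range(len(word) - 1))


def dice(a: str, b: str) -> float:
    """Dice coefficient 2*|X & Y| / (|X| + |Y|) over character bigrams."""
    x, y = char_bigrams(a), char_bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        # Both words are shorter than two characters: fall back to equality.
        return 1.0 if a == b else 0.0
    overlap = sum((x & y).values())  # multiset intersection
    return 2 * overlap / total


print(dice("night", "nacht"))  # one shared bigram ("ht") out of 4 + 4
```

A score of 1.0 means the two bigram multisets are identical and 0.0 means they share nothing, so an inflected form and its lemma would typically score high when they differ only in a short suffix.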

International Conference on Tools with Artificial Intelligence | 2007

Using the Levenshtein Edit Distance for Automatic Lemmatization: A Case Study for Modern Greek and English

Dimitrios P. Lyras; Kyriakos N. Sgarbas; Nikolaos Fakotakis

In the present work we have implemented the Edit Distance (also known as Levenshtein Distance) in a dictionary-based algorithm in order to achieve the automatic induction of the normalized form (lemma) of regular and mildly irregular words with no direct supervision. The algorithm combines two alignment models based on string similarity and on the most frequent inflectional suffixes. In our experiments, we have also examined the language independence (i.e. independence from the specific grammar and inflectional rules of the language) of the presented algorithm by evaluating its performance on Modern Greek and English. The results were very promising, as we achieved more than 95% accuracy for Greek and more than 96% for English. This algorithm may be useful for various text mining and linguistic applications such as spell-checkers, electronic dictionaries, morphological analyzers and search engines.
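The abstract gives no implementation details, so the following is only a minimal sketch of how an edit distance and a list of frequent suffixes might be combined in a dictionary lookup. The way the two signals are combined, the toy lemma list, the suffix list and the names `levenshtein` and `lemmatize` are all assumptions made for illustration, not the authors' algorithm.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def lemmatize(word: str, lemmas: list[str], suffixes: list[str]) -> str:
    """Return the dictionary lemma closest to the word or to one of its
    suffix-stripped forms (assumed combination of the two signals)."""
    candidates = {word}
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix):
            candidates.add(word[:-len(suffix)])
    # Ties are broken alphabetically so the result is deterministic.
    return min(lemmas, key=lambda lemma: (min(levenshtein(c, lemma) for c in candidates), lemma))


# Toy data, purely illustrative.
LEMMAS = ["walk", "talk", "study"]
SUFFIXES = ["ing", "ed", "ies", "s"]

print(lemmatize("walking", LEMMAS, SUFFIXES))
print(lemmatize("studies", LEMMAS, SUFFIXES))
```

Because nothing in the sketch depends on language-specific rules beyond the suffix list, swapping in a different dictionary and suffix list is all that would be needed to try it on another language.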

International Journal on Artificial Intelligence Tools | 2012

BAYESIAN RETRIEVAL USING A SIMILARITY-BASED LEMMATIZER

Manolis Maragoudakis; Dimitrios P. Lyras; Kyriakos N. Sgarbas

The present paper describes a Bayesian network approach to Information Retrieval (IR) from Web documents. The network structure provides an intuitive representation of uncertainty relationships, and the embedded conditional probability table is used by inference algorithms in an attempt to identify documents that are relevant to the users' needs, expressed in the form of Boolean queries. Our research has been directed at constructing a probabilistic IR framework that focuses on assisting users in performing ad-hoc retrieval of documents from various domains such as economics, news and sports. Furthermore, users can integrate feedback regarding the relevance of the retrieved documents in an attempt to improve performance on upcoming requests. Towards these goals, we have expanded the traditional Bayesian network IR system and tested it on several Greek web corpora from different application domains. We have developed two different approaches with regard to the structure: a simple one, where the structure is manually provided, and an automated one, where data mining is used to extract the network's structure. Results have shown competitive performance against successful IR models of different theoretical backgrounds, such as the vector space model utilizing tf-idf and the probabilistic model BM25, in terms of precision-recall curves. In order to further improve the performance of the IR system, we have implemented a novel similarity-based lemmatization framework, thus reducing the ambiguity posed by the plethora of morphological variations of the languages in question. The employed lemmatization framework comprises three core components (the word segregation, data cleansing and lemmatization modules) and is language-independent (i.e. it can be applied to other languages with morphological peculiarities and thus improve ad-hoc retrieval), since it maps an input word to its normalized form by employing two state-of-the-art language-independent distance metric models, namely the Levenshtein Edit Distance and the Dice coefficient similarity measure, combined with a language model describing the most frequent inflectional suffixes of the examined language. Experimental results support our claim on the significance of this incorporation for web retrieval of Greek texts, as results improve by 4% to 11%.

Hellenic Conference on Artificial Intelligence | 2010

A stochastic greek-to-greeklish transcriber modeled by real user data

Dimitrios P. Lyras; Ilias Kotinas; Kyriakos N. Sgarbas; Nikos Fakotakis

Greek to Greeklish transcription does not appear to be a difficult task, since it can be achieved by directly mapping each Greek character to a corresponding symbol of the Latin alphabet. Nevertheless, such transliteration systems do not efficiently simulate the human way of writing Greeklish, since Greeklish users do not follow a standardized way of transliteration. In this paper a stochastic Greek to Greeklish transcriber modeled by real user data is presented. The proposed transcriber employs knowledge derived from the analytical processing of 9,288 Greek-Greeklish word pairs annotated by real users and achieves the automatic transcription of any Greek word into a valid Greeklish form in a stochastic way (i.e. each Greek symbol set corresponds to a variety of Latin symbols according to the processed data), thus simulating human-like behavior. This transcriber could be used as a real-time Greek-to-Greeklish transcriber and/or as a data generator engine for the performance evaluation of Greeklish-to-Greek transliteration systems.
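To make the idea of a stochastic mapping concrete, here is a minimal sketch in which each Greek symbol set has several weighted Latin alternatives and one is sampled per occurrence. The mapping table, its weights and the greedy longest-match segmentation are all hypothetical; they are not derived from the 9,288 word pairs mentioned in the abstract, and `transcribe` is an illustrative name.

```python
import random

# Hypothetical alternatives and weights, for illustration only.
# They are NOT the statistics learned in the paper.
MAPPING = {
    "ου": [("ou", 0.8), ("u", 0.2)],
    "θ": [("th", 0.7), ("8", 0.3)],
    "η": [("i", 0.6), ("h", 0.4)],
    "ω": [("o", 0.7), ("w", 0.3)],
    "α": [("a", 1.0)],
    "λ": [("l", 1.0)],
}


def transcribe(word: str, mapping: dict, rng: random.Random) -> str:
    """Sample one Greeklish rendering, matching the longest known symbol set first."""
    longest = max(len(key) for key in mapping)
    out = []
    i = 0
    while i < len(word):
        for size in range(longest, 0, -1):
            chunk = word[i:i + size]
            if chunk in mapping:
                options, weights = zip(*mapping[chunk])
                out.append(rng.choices(options, weights=weights, k=1)[0])
                i += len(chunk)
                break
        else:
            # Symbols missing from the table are copied through unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


rng = random.Random(42)  # seeded only to make runs repeatable
print([transcribe("θω", MAPPING, rng) for _ in range(5)])
```

Repeated calls on the same word can yield different outputs, which is the property that would make such a transcriber usable as a test-data generator for Greeklish-to-Greek systems.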

Text, Speech and Dialogue | 2007

Detection of dialogue acts using perplexity-based word clustering

Iosif Mporas; Dimitrios P. Lyras; Kyriakos N. Sgarbas; Nikos Fakotakis

In the present work we used a word clustering algorithm based on the perplexity criterion in a Dialogue Act detection framework, in order to model the structure of the speech of a user of a dialogue system. Specifically, we constructed an n-gram based model for each target Dialogue Act, computed over the word classes. We then evaluated the performance of our dialogue system on ten different types of dialogue acts, using an annotated database which contains 1,403,985 unique words. The results were very promising, since we achieved about 70% accuracy using trigram-based models.
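The sketch below illustrates one way per-act n-gram models over word classes could be used for detection: train one model per dialogue act and pick the act whose model scores an utterance highest. It uses bigrams with add-one smoothing for brevity (the abstract reports trigram models), takes the word classes as given rather than deriving them by perplexity-based clustering, and all names and training data are hypothetical.

```python
import math
from collections import Counter


class BigramModel:
    """Add-one smoothed bigram model over sequences of word-class labels."""

    def __init__(self, sequences, vocab_size):
        self.bigrams = Counter()
        self.contexts = Counter()
        self.vocab_size = vocab_size
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_prob(self, seq):
        padded = ["<s>"] + list(seq) + ["</s>"]
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.contexts[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def classify(seq, models):
    """Return the dialogue act whose model gives the sequence the highest log-probability."""
    return max(sorted(models), key=lambda act: models[act].log_prob(seq))


# Toy training data: utterances already mapped to hypothetical word classes.
TRAINING = {
    "question": [["WH", "BE", "DET", "N"], ["WH", "BE", "N"]],
    "confirm": [["AFF", "POL"], ["AFF"]],
}
VOCAB_SIZE = 7  # six class labels plus the end-of-sequence marker
MODELS = {act: BigramModel(seqs, VOCAB_SIZE) for act, seqs in TRAINING.items()}

print(classify(["WH", "BE", "DET", "N"], MODELS))
print(classify(["AFF"], MODELS))
```

Modeling class sequences instead of raw words keeps the number of n-gram parameters small, which matters when the vocabulary is as large as the one reported in the abstract.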

Bioinformatics and Bioengineering | 2010