Yannis Plegas
University of Patras
Publication
Featured research published by Yannis Plegas.
ACM Symposium on Applied Computing | 2013
Christos Makris; Yannis Plegas; Evangelos Theodoridis
Text annotation is the procedure of first identifying, in a segment of text, a set of words that dominate its meaning, and then attaching to them extra information (usually drawn from a concept ontology, implemented as a catalog) that expresses their conceptual content in the current context. Attaching additional semantic information and structure helps represent, in a machine-interpretable way, the topic of the text and is a fundamental preprocessing step for many Information Retrieval tasks such as indexing, clustering, classification, text summarization, and cross-referencing content on web pages, posts, tweets, etc. In this paper, we deal with the automatic annotation of text documents with entities of Wikipedia, the largest online knowledge base; a process commonly known as Wikification. As in previous approaches, the cross-referencing of words in the text to Wikipedia articles is based on local compatibility between the text around the term and textual information embedded in the article. The main contribution of this paper is a set of disambiguation techniques that enhance previously published approaches by employing both the WordNet lexical database and the Wikipedia articles' PageRank scores in the disambiguation process. The experimental evaluation shows that exploiting these additional sources of semantic information leads to more accurate text annotation.
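The disambiguation idea described above can be sketched as a scoring function that blends local context compatibility with the candidate article's PageRank. This is a minimal illustration only: the function name, the linear combination, and the weight `alpha` are assumptions, not the paper's actual method.

```python
# Hypothetical sketch: pick the Wikipedia entity for an ambiguous mention
# by combining local context similarity with the article's PageRank score.
# Both scores are assumed pre-computed and normalized to [0, 1].

def disambiguate(candidates, alpha=0.7):
    """candidates: list of (entity, context_similarity, pagerank) tuples."""
    best_entity, best_score = None, float("-inf")
    for entity, similarity, pagerank in candidates:
        # Linear combination: local compatibility plus global importance.
        score = alpha * similarity + (1 - alpha) * pagerank
        if score > best_score:
            best_entity, best_score = entity, score
    return best_entity

# Two candidate senses for the mention "Java" in a programming context:
candidates = [
    ("Java_(programming_language)", 0.9, 0.6),
    ("Java_(island)", 0.2, 0.4),
]
print(disambiguate(candidates))  # → Java_(programming_language)
```

In practice the similarity term would come from comparing the mention's surrounding text with the article's text, and WordNet relations would refine the candidate set; both are elided here.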
Journal of the Association for Information Science and Technology | 2012
Christos Makris; Yannis Plegas; Sofia Stamou
In this article, we propose new word sense disambiguation strategies for resolving the senses of polysemous query terms issued to Web search engines, and we explore the application of those strategies when used in a query expansion framework. The novelty of our approach lies in the exploitation of the Web page PageRank values as indicators of the significance the different senses of a term carry when employed in search queries. We also aim at scalable query sense resolution techniques that can be applied without loss of efficiency to large data sets such as those on the Web. Our experimental findings validate that the proposed techniques perform more accurately than do the traditional disambiguation strategies and improve the quality of the search results, when involved in query expansion.
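The core intuition, PageRank values as sense-significance indicators, can be sketched as follows. This is an illustrative reduction, not the article's exact strategy: the sense-to-page mapping and the mass function are assumptions.

```python
# Illustrative sketch: resolve a polysemous query term by weighting each
# candidate sense with the total PageRank mass of the result pages that
# match that sense.

def resolve_sense(sense_to_pages, pagerank):
    """sense_to_pages: {sense: [urls matching that sense]};
    pagerank: {url: pagerank score}. Returns the highest-mass sense."""
    def mass(sense):
        return sum(pagerank.get(url, 0.0) for url in sense_to_pages[sense])
    return max(sense_to_pages, key=mass)

# Example: "bank" with two senses over three result pages.
senses = {"bank_finance": ["a.com", "b.com"], "bank_river": ["c.com"]}
scores = {"a.com": 0.3, "b.com": 0.4, "c.com": 0.5}
print(resolve_sense(senses, scores))  # → bank_finance
```

The chosen sense would then seed query expansion, e.g. by appending terms characteristic of that sense.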
ACM Symposium on Applied Computing | 2013
Yannis Plegas; Sofia Stamou
It is well known that the web contains many duplicate and near-duplicate documents. Despite the effort that has been put into equipping search engines with duplicate-detection algorithms, there are still cases where the documents retrieved in response to web queries contain redundant information. In this paper, we are concerned with effectively identifying and reducing redundant information in search results. In particular, we describe how we automatically detect content that is lexically and/or semantically duplicated across search results, and we introduce a novel algorithm that, upon detecting significant (i.e., above a given threshold) content duplication, filters out the redundant information. Information filtering takes place in two steps, depending on whether we are dealing with documents of (nearly) identical lexical content or with documents of lexically distinct but semantically equivalent content. In the first case, our algorithm retains in the result list the document that is most relevant to the query intention and removes the duplicates. In the second case, our algorithm merges the documents with redundant information into a single text, which we call a SuperText, in such a way that every document contributes diverse semantic content to the generated SuperText. Additionally, the algorithm re-ranks the remaining documents based on their contextual relevance to the query intention. The experimental evaluation of our approach demonstrates that it is very effective in identifying lexical and semantic information redundancy across search results. In addition, we have found that our algorithm successfully filters duplicated content out of the result list, and that the SuperTexts it generates to reduce information redundancy are syntactically and semantically coherent texts.
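The lexical step of such a pipeline is commonly built on shingling and Jaccard similarity; a minimal sketch is shown below. The shingle size and threshold are illustrative defaults, not the paper's parameters, and the semantic step is not modeled here.

```python
# Minimal sketch of lexical near-duplicate detection using word shingles
# (k-grams of words) and Jaccard similarity between shingle sets.

def shingles(text, k=3):
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def jaccard(a, b):
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def near_duplicates(docs, threshold=0.8):
    """Return index pairs of documents whose similarity meets the threshold."""
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(docs[i], docs[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

For result lists this O(n²) comparison is affordable; at web scale one would switch to MinHash or SimHash sketches.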
ACM Symposium on Applied Computing | 2008
Christos Makris; Yannis Panagis; Yannis Plegas; Evangelos Sakkopoulos
Generic web searching often turns out impersonal and frustrating due to a lack of adaptivity to user preferences. These problems can be alleviated by a solution that assists personalization in an effective manner. Such a tool is presented within the context of this paper: it enables personalization in the client's browser and is supported by a Web Service based backend system that implements a number of different personalization approaches as options. Our aim is to provide a generic platform based on web technologies a) for end-user personalization and b) for assisting the research & development evaluation of existing or novel personalization techniques. The solution is further underpinned with novel personalization techniques. The latter have emerged as fine-grained and improved alternatives to the provably efficient personalization methods previously presented in [10]. The solution as a whole has been experimentally evaluated and proved effective.
Database and Expert Systems Applications | 2014
Christos Makris; Yannis Plegas; Yannis C. Stamatiou; Elias C. Stavropoulos; Athanasios K. Tsakalidis
It is widely accepted that many Web documents contain identical or near-identical information. Modern search engines have developed duplicate-detection algorithms to eliminate this problem in the search results, but difficulties remain, mainly because the structure and content of the results cannot be changed. In this work we propose an effective methodology for removing redundant information from search results. Using previous methodologies, we extract from the search results a set of composite documents called SuperTexts and then, by applying novel approximation algorithms, we select the SuperTexts that best reduce the redundant information. The final results are then ranked according to their relevance to the initial query. We give some complexity results and experimentally evaluate the proposed algorithms.
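Selecting a small set of SuperTexts that covers the results' information content resembles a covering problem, for which the standard greedy approximation is sketched below. The mapping of SuperTexts to the "information units" they contain is hypothetical; the paper's actual approximation algorithms are not reproduced here.

```python
# Greedy set-cover sketch: repeatedly pick the SuperText that covers the
# most still-uncovered information units. This classic greedy strategy
# gives a ln(n)-factor approximation to the optimal cover.

def greedy_select(supertexts, universe):
    """supertexts: {name: set of information units it covers};
    universe: all units to cover. Returns the chosen SuperText names."""
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(supertexts, key=lambda s: len(supertexts[s] & uncovered))
        gained = supertexts[best] & uncovered
        if not gained:
            break  # remaining units cannot be covered by any SuperText
        chosen.append(best)
        uncovered -= gained
    return chosen
```

A subsequent ranking pass would order the chosen SuperTexts by relevance to the original query.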
International Conference on Web Information Systems and Technologies | 2013
Christos Makris; Yannis Plegas; Giannis Tzimas; Emmanouil Viennas
Users mainly search for diverse information across several topics, and their needs are difficult to satisfy with the techniques currently employed in commercial search engines without intervention from the user. In this paper, a novel framework is presented for re-ranking the results of a search engine based on feedback from the user. The proposed scheme smoothly combines techniques from the area of Inference Networks with data from semantic knowledge bases. The novelty lies in the construction of a probabilistic network for each query, which takes as input the user's belief in each result (initially, all are equivalent) and produces as output a new ranking of the search results. We have implemented a prototype that supports different Web search engines and can be extended to support any search engine. Finally, extensive experiments were performed with the proposed methods, demonstrating the improvement in the ranking of the search engine results.
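The feedback loop can be caricatured in a few lines: each result carries a belief, a user selection raises the belief of similar results, and the list is re-sorted. This heuristic update is only a stand-in for the paper's probabilistic network; the similarity function and update rate are assumptions.

```python
# Reduced sketch of feedback-driven re-ranking: a click on one result
# propagates belief to similar results, then the list is re-sorted.

def rerank(beliefs, similarities, clicked, rate=0.5):
    """beliefs: {doc: belief score}; similarities: {(clicked, doc): sim in
    [0, 1]}; clicked: the doc the user selected. Returns re-ordered docs."""
    updated = dict(beliefs)
    for doc in updated:
        sim = 1.0 if doc == clicked else similarities.get((clicked, doc), 0.0)
        updated[doc] += rate * sim  # belief propagates along similarity
    return sorted(updated, key=updated.get, reverse=True)

# Clicking "c" lifts both "c" and the related "b":
order = rerank({"a": 1.0, "b": 0.5, "c": 0.7}, {("c", "b"): 0.8}, "c")
print(order)  # → ['c', 'a', 'b']
```

An inference network would instead encode query, senses, and documents as nodes and recompute posteriors after each observation; this sketch keeps only the input/output behavior.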
International Journal on Artificial Intelligence Tools | 2016
Andreas Kanavos; Christos Makris; Yannis Plegas; Evangelos Theodoridis
It is widely known that search engines are the dominant tools for finding information on the web. In most cases, these engines return web page references in a global ranking, taking into account either the importance of the web site or the relevance of the web pages to the identified topic. In this paper, we focus on the problem of determining distinct thematic groups in the results that existing web search engines provide. We additionally address the problem of dynamically adapting their ranking according to user selections, incorporating user judgments as implicitly registered in their selection of relevant documents. Our system exploits a state-of-the-art semantic web data mining technique that identifies Wikipedia's semantic entities in order to group the result set into different topic groups, according to the various meanings of the provided query. Moreover, we propose a novel probabilistic network scheme that employs the aforementioned topic identification method in order to modify the ranking of the results as the user selects documents. We evaluated our implemented prototype in practice through extensive experiments on the ClueWeb09 dataset using TREC's 2009, 2010, 2011 and 2012 Web Tracks, where we observed improved retrieval performance compared to current state-of-the-art re-ranking methods.
Artificial Intelligence Applications and Innovations | 2014
Xenophon Evangelopoulos; Christos Makris; Yannis Plegas
In recent years, predicting user behavior has drawn much attention in the field of information retrieval. To that end, many models and even more evaluation metrics have been proposed, aiming at the accurate evaluation of the information retrieval process. Most of the proposed metrics, including the well-known nDCG and ERR, rely on the assumption that the probability R that a user finds a document relevant depends only on its relevance grade. In this paper, we adopt the assumption that this probability is a function of two factors: the document's relevance grade and its popularity grade. Popularity, as we define it from daily page views, can be considered the users' vote for a document, and by incorporating this factor into the probability R we can capture user behavior more accurately. We present a new evaluation metric called Reciprocal Rank using Webpage Popularity (RRP), which takes into account not only the document's relevance judgment but also its popularity, and as a result correlates better with click metrics than the other evaluation metrics do.
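An ERR-style cascade with a blended stopping probability illustrates the idea. The grade-to-probability mapping below is the standard ERR one, but the linear blend of relevance and popularity and the weight `beta` are illustrative assumptions, not the paper's exact definition of RRP.

```python
# Sketch of an ERR-like cascade metric where the probability R that the
# user is satisfied at a rank blends relevance and popularity grades.

def rrp(relevances, popularities, g_max=4, beta=0.5):
    """relevances, popularities: per-rank integer grades in [0, g_max]."""
    def prob(g):
        # Standard ERR mapping from a graded judgment to a probability.
        return (2 ** g - 1) / (2 ** g_max)

    score, not_stopped = 0.0, 1.0
    for rank, (rel, pop) in enumerate(zip(relevances, popularities), start=1):
        r = beta * prob(rel) + (1 - beta) * prob(pop)
        score += not_stopped * r / rank   # user reaches rank, stops here
        not_stopped *= (1 - r)            # user continues past this rank
    return score
```

With `beta = 1.0` the popularity grades are ignored and the metric reduces to plain ERR, which makes the comparison in the paper's experiments easy to reproduce.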
International Conference on Engineering Applications of Neural Networks | 2013
Andreas Kanavos; Christos Makris; Yannis Plegas; Evangelos Theodoridis
Nowadays, search engines are definitely the dominant web tool for finding information on the web. However, web search engines usually return web page references in a global ranking, making it difficult for users to browse the different topics captured in the result set. Recently, meta-search engine systems have appeared that discover knowledge in these web search results, providing the user with the possibility to browse the different topics contained in the result set. In this paper, we focus on the problem of determining different thematic groups in the results that existing web search engines provide. We propose a novel system that exploits the semantic entities of Wikipedia to group the result set into different topic groups, according to the various meanings of the provided query. The proposed method utilizes a number of semantic annotation techniques using knowledge bases, such as WordNet and Wikipedia, in order to perceive the different senses of each query term. Finally, the method annotates the extracted topics using information derived from the clusters, which are then presented to the end user.
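Once each result has been annotated with a Wikipedia entity, the grouping step itself is a simple bucketing operation, sketched below. The entity annotation is assumed to have been produced upstream (e.g. by a Wikification step); the input format is an assumption for illustration.

```python
# Illustrative grouping step: bucket results by the Wikipedia entity
# (query sense) detected in each one.
from collections import defaultdict

def group_by_entity(annotated_results):
    """annotated_results: list of (doc, entity) pairs, where entity is the
    Wikipedia sense assigned to the doc. Returns {entity: [docs]}."""
    groups = defaultdict(list)
    for doc, entity in annotated_results:
        groups[entity].append(doc)
    return dict(groups)

pairs = [("d1", "Jaguar_(car)"), ("d2", "Jaguar_(animal)"), ("d3", "Jaguar_(car)")]
print(group_by_entity(pairs))
```

Labeling each group for display would then draw on the entity's title and the cluster's most characteristic terms, as the abstract describes.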
Journal of Systems and Software | 2012
Dimitris Antoniou; Yannis Plegas; Athanasios K. Tsakalidis; Giannis Tzimas; Emmanouil Viennas