Athena Stassopoulou
University of Nicosia
Publication
Featured research published by Athena Stassopoulou.
Computer Communications | 2005
Marios D. Dikaiakos; Athena Stassopoulou; Loizos Papageorgiou
In this paper, we present a characterization study of search-engine crawlers. For the purposes of our work, we use Web-server access logs from five academic sites in three different countries. Based on these logs, we analyze the activity of different crawlers that belong to five search engines: Google, AltaVista, Inktomi, FastSearch and CiteSeer. We compare crawler behavior to the characteristics of general World-Wide Web traffic and to general characterization studies. We analyze crawler requests to derive insights into the behavior and strategy of crawlers. We propose a set of simple metrics that describe qualitative characteristics of crawler behavior: a crawler's preference for resources of a particular format, the frequency of its visits to a Web site, and the pervasiveness of its visits to a particular site. To the best of our knowledge, this is the first extensive and in-depth characterization of search-engine crawlers. Our results and observations provide useful insights into crawler behavior and serve as the basis of our ongoing work on the automatic detection of Web crawlers.
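As a rough illustration of the kind of metrics described above, the sketch below computes a per-crawler resource-format preference and visit frequency from parsed access-log records. The record schema and the crawler user-agent signatures are assumptions for the example, not the paper's actual definitions.

```python
from collections import Counter, defaultdict

# Illustrative user-agent substrings; the study identifies crawlers
# from the agents actually observed in the logs.
CRAWLER_SIGNATURES = {
    "Google": "Googlebot",
    "AltaVista": "Scooter",
    "Inktomi": "Slurp",
    "FastSearch": "FAST-WebCrawler",
    "CiteSeer": "CiteSeerBot",
}

def crawler_metrics(records):
    """records: dicts with 'agent' (str), 'path' (str), and
    'timestamp' (datetime.datetime) keys -- an assumed schema."""
    formats = defaultdict(Counter)   # crawler -> file-extension counts
    times = defaultdict(list)        # crawler -> request times
    for rec in records:
        for name, signature in CRAWLER_SIGNATURES.items():
            if signature in rec["agent"]:
                path = rec["path"]
                ext = path.rsplit(".", 1)[-1].lower() if "." in path else "(none)"
                formats[name][ext] += 1
                times[name].append(rec["timestamp"])
    report = {}
    for name, ts in times.items():
        ts.sort()
        span_days = max((ts[-1] - ts[0]).days, 1)
        report[name] = {
            "requests_per_day": len(ts) / span_days,       # visit frequency
            "top_formats": formats[name].most_common(3),   # format preference
        }
    return report
```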
Computer Networks | 2009
Athena Stassopoulou; Marios D. Dikaiakos
In this paper, we introduce a probabilistic modeling approach for addressing the problem of Web robot detection from Web-server access logs. More specifically, we construct a Bayesian network that automatically classifies access-log sessions as crawler- or human-induced, by combining various pieces of evidence proven to characterize crawler and human behavior. Our approach uses an adaptive-threshold technique to extract Web sessions from access logs. Then, we apply machine learning techniques to determine the parameters of the probabilistic model. The resulting classification is based on the maximum posterior probability of all classes given the available evidence. We apply our method to real Web-server logs and obtain results that demonstrate the robustness and effectiveness of probabilistic reasoning for crawler detection.
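For concreteness, here is a minimal sketch of maximum-posterior classification over binary evidence, in the naive-Bayes spirit of the model described above. The evidence variables, priors, and likelihoods are invented placeholders, not the learned parameters of the paper's network.

```python
import math

PRIOR = {"crawler": 0.2, "human": 0.8}

# P(evidence = True | class) for a few illustrative binary features.
LIKELIHOOD = {
    "requests_robots_txt": {"crawler": 0.9, "human": 0.01},
    "fetches_images":      {"crawler": 0.1, "human": 0.85},
    "night_activity":      {"crawler": 0.6, "human": 0.2},
}

def classify_session(evidence):
    """Return the class with maximum posterior probability.

    evidence: dict mapping feature name -> bool observation.
    """
    log_post = {}
    for cls, prior in PRIOR.items():
        score = math.log(prior)
        for feature, observed in evidence.items():
            p = LIKELIHOOD[feature][cls]
            score += math.log(p if observed else 1.0 - p)
        log_post[cls] = score
    return max(log_post, key=log_post.get)

print(classify_session({"requests_robots_txt": True,
                        "fetches_images": False,
                        "night_activity": True}))   # -> 'crawler'
```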
Artificial Intelligence in Medicine | 2014
Kalia Orphanou; Athena Stassopoulou; Elpida Keravnou
OBJECTIVES: Temporal abstraction (TA) of clinical data aims to abstract and interpret clinical data into meaningful higher-level interval concepts. Abstracted concepts are used for diagnostic, prediction and therapy-planning purposes. Temporal Bayesian networks (TBNs), on the other hand, are temporal extensions of the well-known probabilistic graphical models, Bayesian networks. TBNs can represent temporal relationships between events and their state changes, or the evolution of a process, through time. This paper offers a survey of techniques/methods from these two areas that have been used independently in many clinical domains (e.g. diabetes, hepatitis, cancer) for various clinical tasks (e.g. diagnosis, prognosis). A main objective of this survey, in addition to presenting the key aspects of TA and TBNs, is to point out important benefits from a potential integration of TA and TBNs in medical domains and tasks. The motivation for integrating these two areas is their complementary function: TA provides clinicians with high-level views of data, while TBNs serve as a knowledge representation and reasoning tool under uncertainty, which is inherent in all clinical tasks.

METHODS: Key publications from these two areas of relevance to clinical systems, mainly from the last two decades, are reviewed and classified. TA techniques are compared on the basis of: (a) knowledge acquisition and representation for deriving TA concepts and (b) methodology for deriving basic and complex temporal abstractions. TBNs are compared on the basis of: (a) representation of time, (b) knowledge representation and acquisition, (c) inference methods and the computational demands of the network, and (d) their applications in medicine.

RESULTS: The survey performs an extensive comparative analysis to illustrate the separate merits and limitations of the various TA and TBN techniques used in clinical systems, with the purpose of anticipating potential gains from an integration of the two techniques, thus leading to a unified methodology for clinical systems. The surveyed contributions are evaluated using frameworks of respective key features. In addition, for the evaluation of TBN methods, a unifying clinical domain (diabetes) is used.

CONCLUSION: The main conclusion emerging from this review is that techniques/methods from these two areas, which so far have largely been used independently of each other in clinical domains, could be effectively integrated in the context of medical decision-support systems. The anticipated key benefits of the integration are: (a) during problem solving, reasoning can be directed at different levels of temporal and/or conceptual abstraction, since the nodes of a TBN can be temporally and structurally complex entities, and (b) during model building, knowledge generated in the form of basic and/or complex abstractions can be deployed in a TBN.
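To make the idea of basic (state) temporal abstraction concrete, the sketch below merges timestamped glucose readings into qualitative interval concepts. The thresholds and the gap-merging rule are invented for the example; clinical TA systems derive them from domain knowledge.

```python
from datetime import datetime, timedelta

def state(value):
    """Map a raw glucose reading (mg/dL) to a qualitative state."""
    if value < 70:
        return "low"
    if value <= 180:
        return "normal"
    return "high"

def abstract_intervals(readings, max_gap=timedelta(hours=2)):
    """Merge consecutive same-state readings into (state, start, end) intervals."""
    intervals = []
    for t, v in sorted(readings):
        s = state(v)
        if intervals and intervals[-1][0] == s and t - intervals[-1][2] <= max_gap:
            intervals[-1] = (s, intervals[-1][1], t)   # extend current interval
        else:
            intervals.append((s, t, t))                # start a new interval
    return intervals

t0 = datetime(2014, 1, 1, 8)
readings = [(t0 + timedelta(hours=i), v)
            for i, v in enumerate([110, 120, 210, 230, 220, 130])]
for s, start, end in abstract_intervals(readings):
    print(s, start.time(), "-", end.time())
```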
international conference on electronic commerce | 2003
Marios D. Dikaiakos; Athena Stassopoulou; Loizos Papageorgiou
In this paper, we present a study of crawler behavior based on Web-server access logs. To this end, we use logs from five different academic sites in three countries. Based on these logs, we analyze the activity of different crawlers that belong to five Search Engines: Google, AltaVista, Inktomi, FastSearch and CiteSeer. We compare crawler behavior to the characteristics of general World-Wide Web traffic, and to general characterization studies based on Web-server access logs. We analyze crawler requests to derive insights into the behavior and strategy of crawlers. Our results and observations provide useful insights into crawler behavior and serve as the basis of our ongoing work on the automatic detection of WWW robots.
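Studies of this kind start from raw access-log lines. As a hedged sketch of that preprocessing step, the snippet below parses Combined Log Format lines (which include the user-agent field the crawler analysis needs) into structured records; real logs often require a more forgiving parser than this regex.

```python
import re
from datetime import datetime

COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_combined(line):
    """Return a dict of log fields, or None if the line does not match."""
    m = COMBINED.match(line)
    if not m:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec

line = ('66.249.64.1 - - [10/Oct/2003:13:55:36 +0200] '
        '"GET /index.html HTTP/1.0" 200 2326 "-" '
        '"Googlebot/2.1 (+http://www.googlebot.com/bot.html)"')
print(parse_combined(line))
```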
international conference on image and signal processing | 2006
Athena Stassopoulou; Marios D. Dikaiakos
In this paper, we introduce a probabilistic modeling approach for addressing the problem of Web robot detection from Web-server access logs. More specifically, we construct a Bayesian network that automatically classifies access-log sessions as crawler- or human-induced, by combining various pieces of evidence proven to characterize crawler and human behavior. Our approach uses machine learning techniques to determine the parameters of the probabilistic model. We apply our method to real Web-server logs and obtain results that demonstrate the robustness and effectiveness of probabilistic reasoning for crawler detection.
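The parameter-learning step can be sketched as smoothed frequency counting over labeled sessions, as below. The features and the tiny labeled set are illustrative assumptions; the paper's actual training procedure and feature set are richer.

```python
from collections import defaultdict

def learn_parameters(sessions, labels, features, alpha=1.0):
    """Laplace-smoothed maximum-likelihood estimates of P(feature=True | class)."""
    counts = defaultdict(lambda: defaultdict(float))
    totals = defaultdict(float)
    for session, cls in zip(sessions, labels):
        totals[cls] += 1
        for f in features:
            counts[cls][f] += 1.0 if session.get(f) else 0.0
    return {cls: {f: (counts[cls][f] + alpha) / (totals[cls] + 2 * alpha)
                  for f in features}
            for cls in totals}

sessions = [{"requests_robots_txt": True,  "fetches_images": False},
            {"requests_robots_txt": False, "fetches_images": True},
            {"requests_robots_txt": True,  "fetches_images": False}]
labels = ["crawler", "human", "crawler"]
print(learn_parameters(sessions, labels,
                       ["requests_robots_txt", "fetches_images"]))
```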
web-age information management | 2007
Athena Stassopoulou; Marios D. Dikaiakos
In this paper we introduce a probabilistic-reasoning approach to detect Web robots (crawlers) from human visitors of Web sites. Our approach employs a Naive Bayes network to classify the HTTP sessions of a Web-server access log as crawler- or human-induced. The Bayesian network combines various pieces of evidence that were shown to distinguish between crawler and human HTTP traffic. The parameters of the Bayesian network are determined with machine learning techniques, and the resulting classification is based on the maximum posterior probability of all classes, given the available evidence. Our method is applied to real Web logs and provides a classification accuracy of 95%. The high accuracy with which our system detects crawler sessions proves the robustness and effectiveness of the proposed methodology.
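An end-to-end stand-in for this kind of setup is scikit-learn's Bernoulli naive Bayes over binary session evidence, shown below on synthetic data. The feature biases are invented; the snippet demonstrates only the train/evaluate workflow, not the paper's data or its 95% result.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)                      # 1 = crawler, 0 = human
# Three binary evidence features, biased differently per class.
p = np.where(y[:, None] == 1, [0.9, 0.1, 0.6], [0.01, 0.85, 0.2])
X = (rng.random((n, 3)) < p).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = BernoulliNB().fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))
```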
Journal of Network and Computer Applications | 2008
Eleni Georgiou; Marios D. Dikaiakos; Athena Stassopoulou
The main purpose of most spam e-mail messages distributed on the Internet today is to entice recipients into visiting World Wide Web pages that are advertised through spam. In essence, e-mail spamming is a campaign that advertises URL addresses at a massive scale and at minimum cost for the advertisers and those advertised. Nevertheless, the characteristics of URL addresses and of web sites advertised through spam have not been studied extensively. In this paper, we investigate the properties of URL dissemination through spam e-mail, and the characteristics of URL addresses disseminated through spam. We conclude that spammers advertise URL addresses non-repetitively and that spam-advertised URLs are short-lived, elusive, and therefore hard to detect and filter. We also observe that reputable URL addresses are sometimes used as decoys against e-mail users and spam filters. These observations can be valuable for configuring spam filters and for driving the development of new techniques to fight spam.
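A first-cut version of the URL-repetition analysis can be sketched in a few lines: extract URLs from each message and measure how often each one recurs. The regex and the toy corpus are assumptions for illustration.

```python
import re
from collections import Counter

URL_RE = re.compile(r'https?://[^\s"<>]+')

def url_repetition(messages):
    """Summarize how repetitively URLs appear across a message corpus."""
    counts = Counter(u for msg in messages for u in URL_RE.findall(msg))
    singletons = sum(1 for c in counts.values() if c == 1)
    return {
        "unique_urls": len(counts),
        "fraction_seen_once": singletons / len(counts) if counts else 0.0,
        "most_repeated": counts.most_common(3),
    }

corpus = [
    "Cheap meds at http://pharma-x1.example/buy now!",
    "Visit http://pharma-x2.example/buy for deals",
    "Trusted site: http://pharma-x2.example/buy",
]
print(url_repetition(corpus))
```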
Computer Communications | 2001
Marios D. Dikaiakos; Athena Stassopoulou
In this paper we study satellite-caching, that is, the employment of satellite multicasting for the dissemination of prefetched content to WWW caches. This approach is currently being deployed by major satellite operators and ISPs around the world. We introduce a theoretical framework to study satellite-caching and formalize the notions of Utility and Quality of Service. We explore two charging schemes, Usage- and Subscription-based pricing, and propose a framework for negotiating the provision of the satellite-caching service between a satellite operator and its potential clients. We use this negotiation framework to compare theoretically the two pricing schemes at hand. We apply our modeling to formulate the selection of Web-content for satellite-multicasting as a combinatorial optimization problem. We study the complexity of Web-content selection and prove it is NP-complete. Finally, we propose and implement an approximation algorithm for content selection, and conduct experiments to assess its efficiency, validity and applicability.
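The content-selection problem here is knapsack-like: choose Web objects to multicast so that total utility is maximized under a capacity constraint. The classic utility-per-byte greedy heuristic below is a stand-in sketch under that reading, not the paper's actual approximation algorithm; the catalog and utilities are invented.

```python
def select_content(objects, capacity_bytes):
    """Greedy density heuristic for knapsack-style content selection.

    objects: list of (name, size_bytes, utility). Returns chosen names.
    """
    chosen, used = [], 0
    # Consider objects in decreasing order of utility per byte.
    for name, size, utility in sorted(objects,
                                      key=lambda o: o[2] / o[1],
                                      reverse=True):
        if used + size <= capacity_bytes:
            chosen.append(name)
            used += size
    return chosen

catalog = [("news.html", 40_000, 0.9), ("video.mpg", 5_000_000, 0.7),
           ("logo.gif", 8_000, 0.2), ("paper.pdf", 300_000, 0.6)]
print(select_content(catalog, capacity_bytes=400_000))
```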
international conference on telecommunications | 2011
Andoena Balla; Athena Stassopoulou; Marios D. Dikaiakos
In this paper we present a methodology for detecting web crawlers in real time. We use decision trees to classify requests, while their sessions are still ongoing, as originating from a crawler or a human. For this purpose we used machine learning techniques to identify the most important features that differentiate humans from crawlers. The method was tested in real time with the help of an emulator, using only a small number of requests. Our results demonstrate the effectiveness and applicability of our approach.
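A hedged sketch of the classification step with scikit-learn's decision tree is shown below on synthetic session features. The three features and their distributions are assumptions; the paper identifies its own feature set via machine learning.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 500
labels = rng.integers(0, 2, n)                 # 1 = crawler, 0 = human
# Features per partial session: [request_rate, image_ratio, robots_txt_hit]
X = np.column_stack([
    np.where(labels == 1, rng.normal(5.0, 1.0, n), rng.normal(0.5, 0.2, n)),
    np.where(labels == 1, rng.uniform(0, 0.2, n), rng.uniform(0.3, 0.9, n)),
    np.where(labels == 1, rng.integers(0, 2, n), np.zeros(n)),
])
tree = DecisionTreeClassifier(max_depth=3).fit(X, labels)
# Classify an ongoing session after only a few requests:
print(tree.predict([[4.2, 0.05, 1]]))          # likely [1], i.e. crawler
```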
Contexts | 2015
Alexandros Theodotou; Athena Stassopoulou
Twitter is a widely used online social networking site where users post short messages limited to 140 characters. The small length of these messages is a challenge when it comes to classifying them into categories. In this paper we propose a system that automatically classifies Twitter messages into a set of predefined categories. The system takes into account not only the tweet text, but also external features such as words from linked URLs, mentioned user profiles, and Wikipedia articles. The system is evaluated using various combinations of feature sets. According to our results, the highest accuracy, 90.8%, is achieved when the original tweet terms are combined with user-profile terms and terms extracted from linked URLs. Including terms from Wikipedia pages found specifically for each tweet is shown to decrease accuracy on the original test set; however, accuracy increases on the subset of the test set containing only tweets without URLs.
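One simple way to realize the feature-set combinations described above is to concatenate tweet text with terms from linked URLs and user profiles before vectorizing, as in this sketch. The tiny corpus, the categories, and the classifier choice are illustrative assumptions, not the paper's system.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def combine(tweet, url_terms="", profile_terms=""):
    """Merge a tweet with terms from its external feature sets."""
    return " ".join([tweet, url_terms, profile_terms])

train_docs = [
    combine("great goal in the match", "football league scores", "sports fan"),
    combine("new phone released today", "smartphone specs review", "tech blogger"),
]
train_labels = ["sports", "technology"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_docs, train_labels)
print(clf.predict([combine("late goal wins the match")]))  # -> ['sports']
```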