Theodoros Lappas | Researchain

Publication

Featured research published by Theodoros Lappas.

knowledge discovery and data mining | 2010

Finding effectors in social networks

Theodoros Lappas; Evimaria Terzi; Dimitrios Gunopulos; Heikki Mannila

Assume a network (V,E) where a subset of the nodes in V are active. We consider the problem of selecting a set of k active nodes that best explain the observed activation state, under a given information-propagation model. We call these nodes effectors. We formally define the k-Effectors problem and study its complexity for different types of graphs. We show that for arbitrary graphs the problem is not only NP-hard to solve optimally, but also NP-hard to approximate. We also show that, for some special cases, the problem can be solved optimally in polynomial time using a dynamic-programming algorithm. To the best of our knowledge, this is the first work to consider the k-Effectors problem in networks. We experimentally evaluate our algorithms using the DBLP co-authorship graph, where we search for effectors of topics that appear in research papers.

knowledge discovery and data mining | 2009

On burstiness-aware search for document sequences

Theodoros Lappas; Benjamin Arai; Manolis Platakis; Dimitrios Kotsakos; Dimitrios Gunopulos

As the number and size of large timestamped collections (e.g. sequences of digitized newspapers, periodicals, blogs) increase, the problem of efficiently indexing and searching such data becomes more important. Term burstiness has been extensively researched as a mechanism to address event detection in the context of such collections. In this paper, we explore how burstiness information can be further utilized to enhance the search process. We present a novel approach to model the burstiness of a term, using discrepancy theory concepts. This allows us to build a parameter-free, linear-time approach to identify the time intervals of maximum burstiness for a given term. Finally, we describe the first burstiness-driven search framework and thoroughly evaluate our approach in the context of different scenarios.
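The idea of a parameter-free, linear-time search for a term's interval of maximum burstiness can be illustrated with a maximum-sum segment scan. The sketch below is not taken from the paper: it assumes, purely for illustration, that each timestamp is scored as the term's observed share of its total frequency minus a uniform expected share, and the function name `max_burst_interval` is hypothetical.

```python
from typing import List, Tuple


def max_burst_interval(freqs: List[float]) -> Tuple[int, int, float]:
    """Return (start, end, score) of the contiguous interval whose summed
    per-timestamp scores are largest.

    Assumption (not from the source): the score of timestamp i is
    freqs[i] / sum(freqs) - 1 / len(freqs), i.e. observed share minus the
    share expected if the term were spread uniformly over time.
    """
    n = len(freqs)
    total = sum(freqs)
    if n == 0 or total == 0:
        raise ValueError("need a non-empty sequence with non-zero total frequency")

    baseline = 1.0 / n
    best = (0, 0, float("-inf"))
    cur_sum = 0.0
    cur_start = 0

    # Single left-to-right pass (Kadane-style maximum-sum segment).
    for i, f in enumerate(freqs):
        score = f / total - baseline
        if cur_sum <= 0:
            # A non-positive running sum cannot help; restart the interval here.
            cur_sum = score
            cur_start = i
        else:
            cur_sum += score
        if cur_sum > best[2]:
            best = (cur_start, i, cur_sum)
    return best


# Hypothetical daily counts for one term; the burst sits at indices 2..3.
start, end, score = max_burst_interval([1, 1, 8, 9, 1, 0])
print(start, end, round(score, 4))
```

The scan needs no window-size or threshold parameter, which is the property the abstract emphasizes; the scoring function itself is the part most likely to differ from the authors' formulation.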

very large data bases | 2012

On the spatiotemporal burstiness of terms

Theodoros Lappas; Marcos R. Vieira; Dimitrios Gunopulos; Vassilis J. Tsotras

Thousands of documents are made available to users via the web on a daily basis. One of the most extensively studied problems in the context of such document streams is burst identification. Given a term t, a burst is generally exhibited when an unusually high frequency is observed for t. While spatial and temporal burstiness have been studied individually in the past, our work is the first to simultaneously track and measure spatiotemporal term burstiness. In addition, we use the mined burstiness information toward an efficient document-search engine: given a user's query of terms, our engine returns a ranked list of documents discussing influential events with a strong spatiotemporal impact. We demonstrate the efficiency of our methods with an extensive experimental evaluation on real and synthetic datasets.

european conference on machine learning | 2010

Efficient confident search in large review corpora

Theodoros Lappas; Dimitrios Gunopulos

Given an extensive corpus of reviews on an item, a potential customer goes through the expressed opinions and collects information, in order to form an educated opinion and, ultimately, make a purchase decision. This task is often hindered by false reviews that fail to capture the true quality of the item's attributes. These reviews may be based on insufficient information or may even be fraudulent, submitted to manipulate the item's reputation. In this paper, we formalize the Confident Search paradigm for review corpora. We then present a complete search framework which, given a set of item attributes, is able to efficiently search through a large corpus and select a compact set of high-quality reviews that accurately captures the overall consensus of the reviewers on the specified attributes. We also introduce CREST (Confident REview Search Tool), a user-friendly implementation of our framework and a valuable tool for any person dealing with large review corpora. The efficacy of our framework is demonstrated through a rigorous experimental evaluation.

web search and data mining | 2014

Customized tour recommendations in urban areas

Aristides Gionis; Theodoros Lappas; Konstantinos Pelechrinis; Evimaria Terzi

The ever-increasing urbanization coupled with the unprecedented capacity to collect and process large amounts of data have helped to create the vision of intelligent urban environments. One key aspect of such environments is that they allow people to effectively navigate through their city. While GPS technology and route-planning services have undoubtedly helped towards this direction, there is room for improvement in intelligent urban navigation. This vision can be fostered by the proliferation of location-based social networks, such as Foursquare or Path, which record the physical presence of users in different venues through check-ins. This information can then be used to enhance intelligent urban navigation, by generating customized path recommendations for users. In this paper, we focus on the problem of recommending customized tours in urban settings. These tours are generated so that they consider (a) the different types of venues that the user wants to visit, as well as the order in which the user wants to visit them, (b) limitations on the time to be spent or distance to be covered, and (c) the merit of visiting the included venues. We capture these requirements in a generic definition that we refer to as the TourRec problem. We then introduce two instances of the TourRec problem, study their complexity, and propose efficient algorithmic solutions. Our experiments on real data collected from Foursquare demonstrate the efficacy of our algorithms and the practical utility of the reported recommendations.

Social Network Data Analytics | 2011

A survey of algorithms and systems for expert location in social networks

Theodoros Lappas; Kun Liu; Evimaria Terzi

Given a particular task and a set of candidates, one often wants to identify the right expert (or set of experts) that can perform the given task. We call this problem the expert-location problem and we survey its different aspects as they arise in practice. For example, given the activities of candidates within a context (e.g., authoring a document, answering a question), we first describe methods for evaluating the level of expertise for each of them. Often, experts are organized in networks that correspond to social networks or organizational structures of companies. We next devote part of the chapter to describing algorithms that compute the expertise level of individuals by taking into account their position in such a network. Finally, complex tasks often require the collective expertise of more than one expert. In such cases, it is more realistic to require a team of experts that can collaborate towards a common goal. We describe algorithms that identify effective expert teams within a network of experts. The chapter is a survey of different algorithms for expertise evaluation and team identification. We highlight the basic algorithmic problems and give some indicative algorithms that have been developed in the literature. We conclude the chapter by providing a comprehensive overview of real-life systems for expert location.

applications of natural language to data bases | 2012

Fake reviews: the malicious perspective

Theodoros Lappas

Product reviews have been the focus of numerous research efforts. In particular, the problem of identifying fake reviews has recently attracted significant interest. Writing fake reviews is a form of attack, performed to purposefully harm or boost an item's reputation. The effective identification of such reviews is a fundamental problem that affects the performance of virtually every application based on review corpora. While recent work has explored different aspects of the problem, no effort has been made to view the problem from the attacker's perspective. In this work, we perform an analysis that emulates an actual attack on a real review corpus. We discuss different attack strategies, as well as the various contributing factors that determine the attack's impact. These factors determine, among others, the authenticity of a fake review, evaluated based on its linguistic features and its ability to blend in with the rest of the corpus. Our analysis and experimental evaluation provide interesting findings on the nature of fake reviews and the vulnerability of online review corpora.

knowledge discovery and data mining | 2012

Efficient and domain-invariant competitor mining

Theodoros Lappas; George Valkanas; Dimitrios Gunopulos

In any competitive business, success is based on the ability to make an item more appealing to customers than the competition. A number of questions arise in the context of this task: how do we formalize and quantify the competitiveness relationship between two items? Who are the true competitors of a given item? What are the features of an item that most affect its competitiveness? Despite the impact and relevance of this problem to many domains, only a limited amount of work has been devoted toward an effective solution. In this paper, we present a formal definition of the competitiveness between two items. We present efficient methods for evaluating competitiveness in large datasets and address the natural problem of finding the top-k competitors of a given item. Our methodology is evaluated against strong baselines via a user study and experiments on multiple datasets from different domains.

knowledge discovery and data mining | 2013

On the importance of temporal dynamics in modeling urban activity

Ke Zhang; Qiuye Jin; Konstantinos Pelechrinis; Theodoros Lappas

The vast amount of available spatio-temporal data of human activities and mobility has given rise to the rapidly emerging field of urban computing/informatics. Central to the latter is understanding the dynamics of the activities that take place in an urban area (e.g., a city). This can significantly enhance functionalities such as resource and service allocation within a city. Existing literature has paid a lot of attention to spatial dynamics, with the temporal ones often being neglected and left out. However, this can lead to non-negligible implications. For instance, while two areas can appear to exhibit similar activity when the latter is aggregated in time, they can be significantly different when introducing the temporal dimension. Furthermore, even when considering a specific area X alone, the transitions of the activity that takes place within X are important themselves. Using data from the most prevalent location-based social network (LBSN for short), Foursquare, we analyze the temporal dynamics of activities in New York City and San Francisco. Our results clearly show that considering the temporal dimension provides us with a different and more detailed description of urban dynamics. We envision this study to lead to more careful and detailed consideration of the temporal dynamics when analyzing urban activities.

knowledge discovery and data mining | 2014

Profit-maximizing cluster hires

Behzad Golshan; Theodoros Lappas; Evimaria Terzi

Team formation has been long recognized as a natural way to acquire a diverse pool of useful skills, by combining experts with complementary talents. This allows organizations to effectively complete beneficial projects from different domains, while also helping individual experts position themselves and succeed in highly competitive job markets. Here, we assume a collection of projects P, where each project requires a certain set of skills, and yields a different benefit upon completion. We are further presented with a pool of experts X, where each expert has his own skillset and compensation demands. Then, we study the problem of hiring a cluster of experts T ⊆ X, so that the overall compensation (cost) does not exceed a given budget B, and the total benefit of the projects that this team can collectively cover is maximized. We refer to this as the ClusterHire problem. Our work presents a detailed analysis of the computational complexity and hardness of approximation of the problem, as well as heuristic, yet effective, algorithms for solving it in practice. We demonstrate the efficacy of our approaches through experiments on real datasets of experts, and demonstrate their advantage over intuitive baselines. We also explore additional variants of the fundamental problem formulation, in order to account for constraints and considerations that emerge in realistic cluster-hiring scenarios. All variants considered in this paper have immediate applications in the cluster hiring process, as it emerges in the context of different organizational settings.
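To make the ClusterHire objective concrete, the following sketch evaluates it by exhaustive search on a toy instance. It is not one of the paper's algorithms (the abstract describes heuristics, and the problem is hard in general); it only restates the definition in code. It assumes a project counts as covered when the union of the team's skills contains every skill the project requires, and all names and data (`best_cluster`, `covered_benefit`, the experts and projects) are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Expert = Tuple[Set[str], float]   # (skills, compensation demand)
Project = Tuple[Set[str], float]  # (required skills, benefit)


def covered_benefit(team_skills: Set[str], projects: List[Project]) -> float:
    """Total benefit of the projects whose required skills the team covers."""
    return sum(benefit for skills, benefit in projects if skills <= team_skills)


def best_cluster(
    experts: Dict[str, Expert], projects: List[Project], budget: float
) -> Tuple[Tuple[str, ...], float]:
    """Exhaustively find a team T with cost(T) <= budget maximizing benefit.

    Exponential in the number of experts, so usable only on tiny instances;
    shown to illustrate the objective, not as a practical method.
    """
    best_team: Tuple[str, ...] = ()
    best_value = 0.0
    names = sorted(experts)
    for size in range(1, len(names) + 1):
        for team in combinations(names, size):
            cost = sum(experts[name][1] for name in team)
            if cost > budget:
                continue  # violates the budget constraint B
            skills = set().union(*(experts[name][0] for name in team))
            value = covered_benefit(skills, projects)
            if value > best_value:
                best_team, best_value = team, value
    return best_team, best_value


# Hypothetical toy instance.
experts = {
    "ana": ({"python", "sql"}, 5),
    "bo": ({"design"}, 3),
    "cy": ({"python", "design", "sql"}, 9),
}
projects = [
    ({"python", "sql"}, 10),
    ({"design"}, 4),
    ({"python", "design"}, 6),
]
team, value = best_cluster(experts, projects, budget=8)
print(team, value)
```

With a budget of 8, hiring "ana" and "bo" together covers all three toy projects, whereas "cy" alone has the same skills but exceeds the budget. This shows why the benefit of a team cannot be computed expert by expert: the third project is covered only by the combination.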
