Themis Palpanas | Researchain

Publication

Featured research published by Themis Palpanas.

Data Mining and Knowledge Discovery | 2012

Survey on mining subjective data on the web

Mikalai Tsytsarau; Themis Palpanas

In the past years we have witnessed Sentiment Analysis and Opinion Mining becoming increasingly popular topics in Information Retrieval and Web data analysis. With the rapid growth of the user-generated content represented in blogs, wikis and Web forums, such an analysis became a useful tool for mining the Web, since it allowed us to capture sentiments and opinions at a large scale. Opinion retrieval has established itself as an important part of search engines. Ratings, opinion trends and representative opinions enrich the search experience of users when combined with traditional document retrieval, by revealing more insights about a subject. Opinion aggregation over product reviews can be very useful for product marketing and positioning, exposing the customers’ attitude towards a product and its features along different dimensions, such as time, geographical location, and experience. Tracking how opinions or discussions evolve over time can help us identify interesting trends and patterns and better understand the ways that information is propagated on the Internet. In this study, we review the development of Sentiment Analysis and Opinion Mining during the last years, and also discuss the evolution of a relatively new research direction, namely, Contradiction Analysis. We give an overview of the proposed methods and recent advances in these areas, and we try to lay out the future research directions in the field.

International Conference on Data Mining | 2010

iSAX 2.0: Indexing and Mining One Billion Time Series

Alessandro Camerra; Themis Palpanas; Jin Shieh; Eamonn J. Keogh

There is an increasingly pressing need, by several applications in diverse domains, for developing techniques able to index and mine very large collections of time series. Examples of such applications come from astronomy, biology, the web, and other domains. It is not unusual for these applications to involve numbers of time series in the order of hundreds of millions to billions. However, all relevant techniques that have been proposed in the literature so far have not considered any data collections much larger than one-million time series. In this paper, we describe iSAX 2.0, a data structure designed for indexing and mining truly massive collections of time series. We show that the main bottleneck in mining such massive datasets is the time taken to build the index, and we thus introduce a novel bulk loading mechanism, the first of this kind specifically tailored to a time series index. We show how our method allows mining on datasets that would otherwise be completely untenable, including the first published experiments to index one billion time series, and experiments in mining massive data from domains as diverse as entomology, DNA and web-scale image collections.

Conference on Information and Knowledge Management | 2006

Maximizing the sustained throughput of distributed continuous queries

Ioana Stanoi; George A. Mihaila; Themis Palpanas; Christian A. Lang

Monitoring systems today often involve continuous queries over streaming data, in a distributed collaborative system. The distribution of query operators over a network of processors, and their processing sequence, form a query configuration with inherent constraints on the throughput it can support. In this paper we propose to optimize stream queries with respect to a version of throughput measure, the profiled input throughput. This measure is focused on matching the expected behavior of the input streams. To prune the search space we used hill-climbing techniques that proved to be efficient and effective.

Data and Knowledge Engineering | 2009

Frequent items in streaming data: An experimental evaluation of the state-of-the-art

Nishad Manerikar; Themis Palpanas

The problem of detecting frequent items in streaming data is relevant to many different applications across many domains. Several algorithms, diverse in nature, have been proposed in the literature for the solution of the above problem. In this paper, we review these algorithms, and we present the results of the first extensive comparative experimental study of the most prominent algorithms in the literature. The algorithms were comprehensively tested using a common test framework on several real and synthetic datasets. Their performance with respect to the different parameters (i.e., parameters intrinsic to the algorithms, and data related parameters) was studied. We report the results, and insights gained through these experiments.
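
To make the problem statement concrete, here is a minimal sketch of one classic counter-based approach, the Misra-Gries summary. The abstract does not name the algorithms that were evaluated, so the choice of this algorithm, the function name `misra_gries`, and the parameter `k` are assumptions made purely for illustration.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Return candidate frequent items using at most k - 1 counters.

    Illustrative only (not taken from the paper). Any item that occurs more
    than n / k times in a stream of length n is guaranteed to be among the
    returned keys; the stored counts are lower bounds on the true counts.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: increment its counter.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free counter is available: start tracking the item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the
            # ones that reach zero. The new item itself is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

The returned keys are only candidates: an infrequent item can survive in the summary, so a second pass over the data (when available) is the usual way to confirm exact counts.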

IEEE International Conference on Pervasive Computing and Communications | 2012

What does model-driven data acquisition really achieve in wireless sensor networks?

Usman Raza; Alessandro Camerra; Amy L. Murphy; Themis Palpanas; Gian Pietro Picco

Model-driven data acquisition techniques aim at reducing the amount of data reported, and therefore the energy consumed, in wireless sensor networks (WSNs). At each node, a model predicts the sampled data; when the latter deviate from the current model, a new model is generated and sent to the data sink. However, experiences in real-world deployments have not been reported in the literature. Evaluation typically focuses solely on the quantity of data reports suppressed at source nodes: the interplay between data modeling and the underlying network protocols is not analyzed. In contrast, this paper investigates in practice whether i) model-driven data acquisition works in a real application; ii) the energy savings it enables in theory are still worthwhile once the network stack is taken into account. We do so in the concrete setting of a WSN-based system for adaptive lighting in road tunnels. Our novel modeling technique, Derivative-Based Prediction (DBP), suppresses up to 99% of the data reports, while meeting the error tolerance of our application. DBP is considerably simpler than competing techniques, yet performs better in our real setting. Experiments in both an indoor testbed and an operational road tunnel also show that, once the network stack is taken into consideration, DBP triples the WSN lifetime, a remarkable result per se, but a far cry from the aforementioned 99% data suppression. This suggests that, to fully exploit the energy savings enabled by data modeling techniques, a coordinated operation of the data and network layers is necessary.
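
The general predict-and-suppress loop described above can be sketched as follows. This is not the authors' DBP algorithm: the linear model, the way its slope is estimated from the first and last samples of a learning window, and the names `suppress_reports`, `window`, and `tolerance` are all simplifying assumptions chosen for illustration.

```python
from typing import List, Tuple

# (index of the model's anchor sample, anchor value, slope per sample)
Model = Tuple[int, float, float]


def suppress_reports(samples: List[float], window: int, tolerance: float) -> List[Model]:
    """Return the models a node would send to the sink (hypothetical sketch).

    The node learns a straight line from a window of samples, then stays
    silent while new samples remain within `tolerance` of the prediction.
    The first deviating sample starts a new learning window.
    """
    if window < 2:
        raise ValueError("window must contain at least 2 samples")
    models: List[Model] = []
    i = 0
    n = len(samples)
    while i + window <= n:
        anchor = i + window - 1
        first, last = samples[i], samples[anchor]
        # Assumed slope estimate: difference between the window's endpoints.
        slope = (last - first) / (window - 1)
        models.append((anchor, last, slope))
        # Suppress reports while the prediction stays within tolerance.
        j = anchor + 1
        while j < n:
            predicted = last + slope * (j - anchor)
            if abs(samples[j] - predicted) > tolerance:
                break
            j += 1
        i = j  # relearn starting at the first deviating sample
    return models
```

In this sketch the number of models sent, compared with the number of samples taken, gives the data-suppression ratio only; as the abstract stresses, the energy actually saved also depends on the network stack.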

International Semantic Web Conference | 2013

Social Listening of City Scale Events Using the Streaming Linked Data Framework

Marco Balduini; Emanuele Della Valle; Daniele Dell'Aglio; Mikalai Tsytsarau; Themis Palpanas; Cristian Confalonieri

City-scale events may easily attract half a million visitors in hundreds of venues over just a few days. Which are the most attended venues? What do visitors think about them? How do they feel before, during and after the event? These are a few of the questions a city-scale event manager would like to see answered in real-time. In this paper, we report on our experience in social listening of two city-scale events (London Olympic Games 2012, and Milano Design Week 2013) using the Streaming Linked Data Framework.

IEEE Transactions on Knowledge and Data Engineering | 2013

A Blocking Framework for Entity Resolution in Highly Heterogeneous Information Spaces

George Papadakis; Ekaterini Ioannou; Themis Palpanas; Claudia Niederée; Wolfgang Nejdl

In the context of entity resolution (ER) in highly heterogeneous, noisy, user-generated entity collections, practically all block building methods employ redundancy to achieve high effectiveness. This practice, however, results in a high number of pairwise comparisons, with a negative impact on efficiency. Existing block processing strategies aim at discarding unnecessary comparisons at no cost in effectiveness. In this paper, we systemize blocking methods for clean-clean ER (an inherently quadratic task) over highly heterogeneous information spaces (HHIS) through a novel framework that consists of two orthogonal layers: the effectiveness layer encompasses methods for building overlapping blocks with small likelihood of missed matches; the efficiency layer comprises a rich variety of techniques that significantly restrict the required number of pairwise comparisons, having a controllable impact on the number of detected duplicates. We map to our framework all relevant existing methods for creating and processing blocks in the context of HHIS, and additionally propose two novel techniques: attribute clustering blocking and comparison scheduling. We evaluate the performance of each layer and method on two large-scale, real-world data sets and validate the excellent balance between efficiency and effectiveness that they achieve.
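
As a rough illustration of redundancy-based block building for clean-clean ER, the sketch below uses token blocking, where every token of every attribute value acts as a blocking key. Treating this as representative of the methods covered by the framework is an assumption, and the names `token_blocks` and `candidate_pairs` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Entity = Dict[str, str]
Block = Tuple[Set[int], Set[int]]  # (indices from left, indices from right)


def token_blocks(left: List[Entity], right: List[Entity]) -> Dict[str, Block]:
    """Build overlapping blocks keyed by token, ignoring attribute names."""
    blocks: Dict[str, Block] = defaultdict(lambda: (set(), set()))
    for side, collection in enumerate((left, right)):
        for idx, entity in enumerate(collection):
            for value in entity.values():
                for token in value.lower().split():
                    blocks[token][side].add(idx)
    # Clean-clean ER only compares across collections, so a block that
    # lacks entities from either side produces no comparisons.
    return {t: b for t, b in blocks.items() if b[0] and b[1]}


def candidate_pairs(blocks: Dict[str, Block]) -> Set[Tuple[int, int]]:
    """Collect distinct cross-collection pairs that share a block."""
    return {(i, j) for l, r in blocks.values() for i in l for j in r}


# Made-up example data with different attribute names on each side.
left = [{"name": "Alice Smith", "city": "Trento"}, {"name": "Bob Jones"}]
right = [{"label": "smith alice"}, {"title": "Carol White"}]
pairs = candidate_pairs(token_blocks(left, right))
```

Here the pair `(0, 0)` co-occurs in two blocks (`alice` and `smith`). That redundancy makes a missed match less likely, but without deduplication it would also cause the same comparison to be repeated, which is the kind of overhead that block processing techniques aim to remove.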

IEEE Transactions on Knowledge and Data Engineering | 2008

Streaming Time Series Summarization Using User-Defined Amnesic Functions

Themis Palpanas; Michail Vlachos; Eamonn J. Keogh; Dimitrios Gunopulos

The past decade has seen a wealth of research on time series representations. The vast majority of research has concentrated on representations that are calculated in batch mode and represent each value with approximately equal fidelity. However, the increasing deployment of mobile devices and real time sensors has brought home the need for representations that can be incrementally updated, and can approximate the data with fidelity proportional to its age. The latter property allows us to answer queries about the recent past with greater precision, since in many domains recent information is more useful than older information. We call such representations amnesic. While there has been previous work on amnesic representations, the class of amnesic functions possible was dictated by the representation itself. In this work, we introduce a novel representation of time series that can represent arbitrary, user-specified amnesic functions. We propose online algorithms for our representation, and discuss their properties. Finally, we perform an extensive empirical evaluation on 40 datasets, and show that our approach can efficiently maintain a high quality amnesic approximation.

Very Large Data Bases | 2014

Exemplar queries: give me an example of what you need

Davide Mottin; Matteo Lissandrini; Yannis Velegrakis; Themis Palpanas

Search engines are continuously employing advanced techniques that aim to capture user intentions and provide results that go beyond the data that simply satisfy the query conditions. Examples include the personalized results, related searches, similarity search, popular and relaxed queries. In this work we introduce a novel query paradigm that considers a user query as an example of the data in which the user is interested. We call these queries exemplar queries and claim that they can play an important role in dealing with the information deluge. We provide a formal specification of the semantics of such queries and show that they are fundamentally different from notions like queries by example, approximate and related queries. We provide an implementation of these semantics for graph-based data and present an exact solution with a number of optimizations that improve performance without compromising the quality of the answers. We also provide an approximate solution that prunes the search space and achieves considerably better time-performance with minimal or no impact on effectiveness. We experimentally evaluate the effectiveness and efficiency of these solutions with synthetic and real datasets, and illustrate the usefulness of exemplar queries in practice.

Knowledge and Information Systems | 2014