Dimitris Sacharidis
Vienna University of Technology
Publications
Featured research published by Dimitris Sacharidis.
IEEE Transactions on Services Computing | 2010
Dimitrios Skoutas; Dimitris Sacharidis; Alkis Simitsis; Timos K. Sellis
As the web is increasingly used not only to find answers to specific information needs but also to carry out various tasks, enhancing the capabilities of current web search engines with effective and efficient techniques for web service retrieval and selection becomes an important issue. Existing service matchmakers typically determine the relevance between a web service advertisement and a service request by computing an overall score that aggregates individual matching scores among the various parameters in their descriptions. Two main drawbacks characterize such approaches. First, there is no single matching criterion that is optimal for determining the similarity between parameters. Instead, there are numerous approaches, ranging from Information Retrieval similarity measures to semantic logic-based inference rules. Second, the reduction of individual scores to an overall similarity leads to significant information loss. Determining appropriate weights for these intermediate scores requires knowledge of user preferences, which is often difficult or impossible to acquire. Using a typical aggregation function instead, such as the average or the minimum of the degrees of match across the service parameters, introduces undesired bias, which often reduces the accuracy of the retrieval process. Consequently, several services, e.g., those having a single unmatched parameter, may be excluded from the result set while being potentially good candidates. In this work, we present two complementary approaches that overcome the aforementioned deficiencies. First, we propose a methodology for ranking the relevant services for a given request, introducing objective measures based on dominance relationships defined among the services. Second, we investigate methods for clustering the relevant services in a way that reveals and reflects the different trade-offs between the matched parameters. We demonstrate the effectiveness and the efficiency of our proposed techniques and algorithms through extensive experimental evaluation on both real requests and relevance sets, as well as on synthetic scenarios.
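To make the dominance relationships concrete, the sketch below (a minimal illustration, not the paper's algorithm) applies Pareto dominance to per-parameter matching scores and naively ranks services by how many competitors each one dominates; the service names and score vectors are hypothetical.

```python
from itertools import combinations

def dominates(a, b):
    """True if score vector a dominates b: at least as good on every
    matching criterion and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def dominance_rank(services):
    """Rank services by the number of other services they dominate.
    `services` maps a service id to its vector of per-parameter
    matching scores (higher is better)."""
    counts = {sid: 0 for sid in services}
    for (ida, a), (idb, b) in combinations(services.items(), 2):
        if dominates(a, b):
            counts[ida] += 1
        elif dominates(b, a):
            counts[idb] += 1
    return sorted(services, key=lambda sid: counts[sid], reverse=True)

# Hypothetical: three advertised services scored on two request parameters.
scores = {"s1": (0.9, 0.4), "s2": (0.8, 0.8), "s3": (0.5, 0.3)}
print(dominance_rank(scores))  # s3 is dominated by both others and ranks last
```

The appeal of dominance is that it needs no user-supplied weights: a service is penalized only when some competitor is at least as good on every parameter.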
Extending Database Technology | 2006
Graham Cormode; Minos N. Garofalakis; Dimitris Sacharidis
Recent years have seen growing interest in effective algorithms for summarizing and querying massive, high-speed data streams. Randomized sketch synopses provide accurate approximations for general-purpose summaries of the streaming data distribution (e.g., wavelets). The focus of existing work has typically been on minimizing the space requirements of the maintained synopsis; however, to effectively support high-speed data-stream analysis, a crucial practical requirement is to also optimize: (1) the update time for incorporating a streaming data element in the sketch, and (2) the query time for producing an approximate summary (e.g., the top wavelet coefficients) from the sketch. Such time costs must be small enough to cope with rapid stream-arrival rates and the real-time querying requirements of typical streaming applications (e.g., ISP network monitoring). With cheap and plentiful memory, space is often only a secondary concern after query/update time costs. In this paper, we propose the first fast solution to the problem of tracking wavelet representations of one-dimensional and multi-dimensional data streams, based on a novel stream synopsis, the Group-Count Sketch (GCS). By imposing a hierarchical structure of groups over the data and applying the GCS, our algorithms can quickly recover the most important wavelet coefficients with guaranteed accuracy. A tradeoff between query time and update time is established by varying the hierarchical structure of groups, allowing the right balance to be found for a specific data stream. Experimental analysis confirms this tradeoff and shows that all our methods significantly outperform previously known methods in terms of both update time and query time, while maintaining a high level of accuracy.
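The Group-Count Sketch itself is specific to the paper, but it builds on the same hash-into-buckets update pattern as the classic Count Sketch. Below is a minimal Count Sketch, offered only as a sketch of that pattern; the hash functions are simple salted stand-ins rather than the pairwise-independent families needed for formal guarantees, and Python's built-in hash is salted per process.

```python
import random

class CountSketch:
    """Minimal Count Sketch: a depth x width table of counters with
    per-row bucket and sign hashes. Update and query are O(depth)."""

    def __init__(self, depth=5, width=256, seed=42):
        rng = random.Random(seed)
        self.width = width
        self.table = [[0] * width for _ in range(depth)]
        # One (bucket-seed, sign-seed) pair per row.
        self.seeds = [(rng.getrandbits(32), rng.getrandbits(32))
                      for _ in range(depth)]

    def _bucket(self, row, key):
        return hash((self.seeds[row][0], key)) % self.width

    def _sign(self, row, key):
        return 1 if hash((self.seeds[row][1], key)) & 1 else -1

    def update(self, key, delta=1):
        """Incorporate one streaming element in O(depth) time."""
        for r in range(len(self.table)):
            self.table[r][self._bucket(r, key)] += self._sign(r, key) * delta

    def estimate(self, key):
        """Frequency estimate: median of the per-row estimates."""
        ests = sorted(self._sign(r, key) * self.table[r][self._bucket(r, key)]
                      for r in range(len(self.table)))
        return ests[len(ests) // 2]

sk = CountSketch()
for _ in range(1000):
    sk.update("a")
print(sk.estimate("a"))  # close to 1000 despite hash collisions
```

The GCS adds a hierarchy of groups on top of such counters, which is what lets it home in on the large wavelet coefficients quickly.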
Very Large Data Bases | 2009
Kyriakos Mouratidis; Dimitris Sacharidis; Hwee Hwa Pang
In the outsourced database model, a data owner publishes her database through a third-party server; i.e., the server hosts the data and answers user queries on behalf of the owner. Since the server may not be trusted, or may be compromised, users need a means to verify that answers received are both authentic and complete, i.e., that the returned data have not been tampered with, and that no qualifying results have been omitted. We propose a result verification approach for one-dimensional queries, called the Partially Materialized Digest (PMD) scheme, that applies to both static and dynamic databases. PMD uses separate indexes for the data and for their associated verification information, and only partially materializes the latter. In contrast with previous work, PMD avoids unnecessary costs when processing queries that do not request verification, achieving the performance of an ordinary index (e.g., a B+-tree). On the other hand, when an authenticity and completeness proof is required, PMD outperforms the existing state-of-the-art technique by a wide margin, as we demonstrate analytically and experimentally. Furthermore, we design two verification methods for spatial queries. The first, termed Merkle R-tree (MR-tree), extends the conventional approach of embedding authentication information into the data index (i.e., an R-tree). The second, called Partially Materialized KD-tree (PMKD), follows the PMD paradigm using separate data and verification indexes. An empirical evaluation with real data shows that the PMD methodology is superior to the traditional approach for spatial queries too.
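As a hedged illustration of the machinery behind the MR-tree (the generic Merkle-tree idea, not the PMD scheme itself): the owner signs the root hash of a tree built over the data, the server returns query results together with the sibling hashes needed to recompute the root, and the client accepts the answer only if the recomputed root matches the signed one.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root hash of a binary Merkle tree over the given leaf values."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical records; in the authenticated-index setting the owner
# signs merkle_root(records) once and ships the signature to the server.
records = [b"5", b"17", b"23", b"42"]
print(merkle_root(records).hex())
```

Completeness for range queries additionally requires returning the records just outside the query boundaries, so the client can check that nothing in between was dropped.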
Extending Database Technology | 2009
Dimitrios Skoutas; Dimitris Sacharidis; Alkis Simitsis; Verena Kantere; Timos K. Sellis
As we move from a Web of data to a Web of services, enhancing the capabilities of the current Web search engines with effective and efficient techniques for Web service retrieval and selection becomes an important issue. Traditionally, the relevance of a Web service advertisement to a service request is determined by computing an overall score that aggregates individual matching scores among the various parameters in their descriptions. Two drawbacks characterize such approaches. First, there is no single matching criterion that is optimal for determining the similarity between parameters. Instead, there are numerous approaches, ranging from Information Retrieval similarity metrics to semantic logic-based inference rules. Second, the reduction of individual scores to an overall similarity leads to significant information loss. Since there is no consensus on how to weight these scores, existing methods are typically pessimistic, adopting a worst-case scenario. As a consequence, several services, e.g., those having a single unrelated parameter, can be excluded from the result set, even though they are potentially good alternatives. In this work, we present a methodology that overcomes both deficiencies. Given a request, we introduce an objective measure that assigns a dominance score to each advertised Web service. This score takes into consideration all the available criteria for each parameter in the request. We investigate three distinct definitions of dominance score, and we devise efficient algorithms that retrieve the top-k most dominant Web services in each case. Extensive experimental evaluation on real requests and relevance sets, as well as on synthetically generated scenarios, demonstrates both the effectiveness of the proposed technique and the efficiency of the algorithms.
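To sketch what a dominance score can look like (one plausible definition, not necessarily any of the paper's three), the following scores each advertisement by dominated-minus-dominating counts over hypothetical per-criterion match scores and retrieves the top-k; the quadratic scan is a naive stand-in for the paper's optimized algorithms.

```python
import heapq

def dominates(a, b):
    """Pareto dominance over per-criterion match scores (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def top_k_dominant(adverts, k):
    """Top-k services under a simple dominance score: the number of
    services an advertisement dominates minus the number dominating it."""
    ids = list(adverts)
    score = dict.fromkeys(ids, 0)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if dominates(adverts[a], adverts[b]):
                score[a] += 1
                score[b] -= 1
            elif dominates(adverts[b], adverts[a]):
                score[b] += 1
                score[a] -= 1
    return heapq.nlargest(k, ids, key=score.get)

# Hypothetical advertisements scored under two matching criteria.
adverts = {"s1": (0.9, 0.4), "s2": (0.8, 0.8), "s3": (0.5, 0.3)}
print(top_k_dominant(adverts, 2))  # ['s1', 's2']
```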
International Conference on Data Engineering | 2009
Dimitris Sacharidis; Stavros Papadopoulos; Dimitris Papadias
The vast majority of work on skyline queries considers totally ordered domains, whereas in many applications some attributes are partially ordered, as is the case, for instance, for domains of set values, hierarchies, intervals, and preferences. The only work addressing this issue has limited progressiveness and pruning ability, and it is applicable only to static skylines. This paper overcomes these problems with the following contributions. (i) We introduce a generic framework, termed TSS, for handling partially ordered domains using topological sorting. (ii) We propose a novel dominance check that eliminates false hits/misses, further enhancing progressiveness and pruning ability. (iii) We extend our methodology to dynamic skylines with respect to an input query. In this case, the dominance relationships change according to the query specification, and their computation is rather complex. We perform an extensive experimental evaluation demonstrating that TSS is up to 9 times and up to 2 orders of magnitude faster than existing methods in the static and the dynamic case, respectively.
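A minimal illustration of the linearization idea behind the topological-sorting approach, assuming a toy preference hierarchy (graphlib ships with Python 3.9+): topological sorting maps each partially ordered value to an integer rank that standard skyline machinery can use as a filter, and a second check on the original partial order weeds out the false orderings that the linearization introduces between incomparable values.

```python
from graphlib import TopologicalSorter

# Toy partial order on one attribute domain: an edge u -> v means
# u is preferred to v; values not connected are incomparable.
preferred_to = {"gold": {"silver"}, "silver": {"bronze", "basic"}}

# graphlib expects node -> predecessors, so invert the edges.
preds = {}
for better, worse_set in preferred_to.items():
    preds.setdefault(better, set())
    for worse in worse_set:
        preds.setdefault(worse, set()).add(better)

# Any topological order is consistent with the preferences, so integer
# ranks support cheap first-pass dominance checks.
rank = {v: i for i, v in enumerate(TopologicalSorter(preds).static_order())}
print(rank)  # e.g. {'gold': 0, 'silver': 1, 'bronze': 2, 'basic': 3}

def truly_preferred(u, v, order=preferred_to):
    """Reachability in the original partial order: needed because the
    linearization also orders incomparable pairs (bronze vs. basic)."""
    frontier, seen = set(order.get(u, ())), set()
    while frontier:
        w = frontier.pop()
        if w == v:
            return True
        if w not in seen:
            seen.add(w)
            frontier |= set(order.get(w, ()))
    return False

print(truly_preferred("gold", "basic"))    # True
print(truly_preferred("bronze", "basic"))  # False: incomparable
```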
Knowledge Discovery and Data Mining | 2007
Panagiotis Karras; Dimitris Sacharidis; Nikos Mamoulis
Summarization is an important task in data mining. A major challenge over the past years has been the efficient construction of fixed-space synopses that provide a deterministic quality guarantee, often expressed in terms of a maximum-error metric. Histograms and several hierarchical techniques have been proposed for this problem. However, their time and/or space complexities remain impractically high and depend not only on the data set size n, but also on the space budget B. These handicaps stem from a requirement to tabulate all allocations of synopsis space to different regions of the data. In this paper we develop an alternative methodology that dispels these deficiencies, thanks to a fruitful application of the solution to the dual problem: given a maximum allowed error, determine the minimum-space synopsis that achieves it. Compared to the state-of-the-art, our histogram construction algorithm reduces time complexity by (at least) a factor of B·log²n / log ε*, and our hierarchical synopsis algorithm reduces the complexity by (at least) a factor of log²B / (log ε* + log n) in time and B^(1 − log B / log n) in space, where ε* is the optimal error. These complexity advantages offer both space efficiency and scalability that previous approaches lacked. We verify the benefits of our approach in practice by experimentation.
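The dual-problem idea can be sketched for plain histograms under the maximum-absolute-error metric, with the caveat that this simplification ignores the hierarchical synopses and the precise complexity bounds of the paper: a linear greedy scan answers "how many buckets does error eps require?", and a binary search over eps then solves the original space-bounded problem.

```python
def buckets_needed(data, eps):
    """Greedy scan: fewest buckets such that representing each bucket
    by (min + max) / 2 keeps the maximum absolute error <= eps."""
    count, lo, hi = 0, None, None
    for x in data:
        if lo is None or (max(hi, x) - min(lo, x)) / 2 > eps:
            count, lo, hi = count + 1, x, x   # start a new bucket
        else:
            lo, hi = min(lo, x), max(hi, x)   # extend the current bucket
    return count

def min_error_for_budget(data, budget):
    """Binary-search the error-bounded dual to solve the space-bounded
    primal: the smallest eps whose greedy synopsis fits in `budget`."""
    lo, hi = 0.0, (max(data) - min(data)) / 2
    for _ in range(50):                       # enough iterations for floats
        mid = (lo + hi) / 2
        if buckets_needed(data, mid) <= budget:
            hi = mid
        else:
            lo = mid
    return hi

series = [1, 2, 9, 10, 11, 3, 2, 30]
print(min_error_for_budget(series, budget=3))
```

Each greedy check is O(n), so the overall cost is O(n) times the number of probed error values, which is where the savings over budget-tabulating dynamic programs come from.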
International Semantic Web Conference | 2008
Dimitrios Skoutas; Dimitris Sacharidis; Verena Kantere; Timos K. Sellis
Efficient and scalable discovery mechanisms are critical for enabling service-oriented architectures on the Semantic Web. Most existing approaches focus on centralized architectures and typically deal with efficiency by pre-computing and storing the results of the semantic matcher for all possible query concepts. Such approaches, however, fail to scale with respect to the number of service advertisements and the size of the ontologies involved. In contrast, this paper presents an efficient and scalable index-based method for Semantic Web service discovery that allows for fast selection of services at query time and is suitable for both centralized and P2P environments. We employ a novel encoding of the service descriptions, allowing the match between a request and an advertisement to be evaluated in constant time, and we index these representations to prune the search space, reducing the number of comparisons required. Given a desired ranking function, the search algorithm can retrieve the top-k matches progressively, i.e., better matches are computed and returned first, thereby further reducing the search engine's response time. We also show how this search can be performed efficiently in a suitable structured P2P overlay network. The benefits of the proposed method are demonstrated through experimental evaluation on both real and synthetic data.
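The flavor of constant-time matching can be sketched with integer bitmaps over a concept universe. This is a simplification under stated assumptions: the universe and service names are hypothetical, and the paper's encoding also captures degrees of match, whereas this only tests exact coverage.

```python
# Hypothetical concept universe; a real system would derive it from the
# ontology used in the service descriptions.
CONCEPTS = ["Vehicle", "Car", "Price", "Currency", "Location"]
BIT = {c: 1 << i for i, c in enumerate(CONCEPTS)}

def encode(concepts):
    """Encode a set of concepts as a single integer bitmap."""
    code = 0
    for c in concepts:
        code |= BIT[c]
    return code

def covers(request_code, advert_code):
    """Subset test on bitmaps: every requested concept is offered.
    One AND plus one comparison, i.e. constant time per candidate."""
    return request_code & advert_code == request_code

adverts = {
    "carRentalService": encode(["Car", "Price", "Location"]),
    "currencyService": encode(["Price", "Currency"]),
}
request = encode(["Car", "Price"])
print([s for s, code in adverts.items() if covers(request, code)])
```

An index over such codes then prunes candidates in bulk, instead of invoking a semantic matcher on every advertisement at query time.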
IEEE Transactions on Knowledge and Data Engineering | 2010
Dimitris Sacharidis; Kyriakos Mouratidis; Dimitris Papadias
The concept of k-anonymity has received considerable attention due to the need of several organizations to release microdata without revealing the identity of individuals. Although all previous k-anonymity techniques assume the existence of a public database (PD) that can be used to breach privacy, none utilizes PD during the anonymization process. Specifically, existing generalization algorithms create anonymous tables using only the microdata table (MT) to be published, independently of the external knowledge available. This omission leads to high information loss. Motivated by this observation, we first introduce the concept of k-join-anonymity (KJA), which permits more effective generalization to reduce the information loss. Briefly, KJA anonymizes a superset of MT, which includes selected records from PD. We propose two methodologies for adapting k-anonymity algorithms to their KJA counterparts. The first generalizes the combination of MT and PD, under the constraint that each group should contain at least one tuple of MT (otherwise, the group is useless and is discarded). The second anonymizes MT and then refines the resulting groups using PD. Finally, we evaluate the effectiveness of our contributions with an extensive experimental evaluation using real and synthetic data sets.
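A toy illustration of why joining in public records helps (a deliberately simplified model, not the paper's algorithms): with a single numeric quasi-identifier, padding each group with PD records keeps the generalized ranges narrow, and groups containing no MT tuple are discarded as useless.

```python
def kja_generalize(mt_ages, pd_ages, k):
    """Toy KJA-style grouping on one numeric quasi-identifier (age).
    Returns (generalized range, number of MT tuples) per kept group."""
    pool = sorted([(a, "MT") for a in mt_ages] + [(a, "PD") for a in pd_ages])
    groups, current = [], []
    for rec in pool:
        current.append(rec)
        if len(current) == k:
            groups.append(current)
            current = []
    if current:                        # fold leftovers into the last group
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    result = []
    for g in groups:
        if not any(src == "MT" for _, src in g):
            continue                   # no microdata tuple: group is useless
        ages = [a for a, _ in g]
        result.append((f"[{min(ages)}-{max(ages)}]",
                       sum(src == "MT" for _, src in g)))
    return result

microdata = [23, 45, 61]
public_db = [22, 24, 44, 46, 60, 62]
print(kja_generalize(microdata, public_db, k=3))
```

Anonymizing the three microdata ages alone with k = 3 would force the single wide range [23-61]; borrowing PD neighbors yields the far narrower [22-24], [44-46], and [60-62].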
International Conference on Management of Data | 2005
Mehrdad Jahangiri; Dimitris Sacharidis; Cyrus Shahabi
The Discrete Wavelet Transform is a proven tool for a wide range of database applications. However, despite broad acceptance, some of its properties have not been fully explored and thus not exploited, particularly for two common forms of multidimensional decomposition. We introduce two novel operations for wavelet-transformed data, termed SHIFT and SPLIT, which are based on the properties of wavelet trees and work directly in the wavelet domain. We demonstrate their significance and usefulness by analytically proving six important results in four common data maintenance scenarios, namely the transformation of massive datasets, appending of data, approximation of data streams, and partial data reconstruction, leading to significant I/O cost reduction in all cases. Furthermore, we show how these operations can be further improved in combination with the optimal coefficient-to-disk-block allocation strategy. Our exhaustive set of empirical experiments with real-world datasets verifies our claims.
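The SHIFT and SPLIT operators are specific to the paper, but both act on Haar wavelet coefficient trees, so a minimal unnormalized Haar decomposition and its inverse show the representation being manipulated.

```python
def haar(values):
    """Unnormalized 1-D Haar decomposition of a power-of-two-length
    signal: returns [overall average, coarse details, ..., fine details].
    Wavelet-domain operations work on these coefficients directly,
    avoiding reconstruction of the raw data."""
    coeffs, level = [], list(values)
    while len(level) > 1:
        averages = [(level[i] + level[i + 1]) / 2 for i in range(0, len(level), 2)]
        details = [(level[i] - level[i + 1]) / 2 for i in range(0, len(level), 2)]
        coeffs = details + coeffs       # finer levels sit to the right
        level = averages
    return level + coeffs

def inverse_haar(coeffs):
    """Exact inverse of haar(), doubling the resolution level by level."""
    level, pos = coeffs[:1], 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(level)]
        pos += len(level)
        level = [x for a, d in zip(level, details) for x in (a + d, a - d)]
    return level

signal = [2, 4, 6, 8]
print(haar(signal))                 # [5.0, -2.0, -1.0, -1.0]
print(inverse_haar(haar(signal)))   # recovers [2.0, 4.0, 6.0, 8.0]
```

Appending data, for example, perturbs only the coefficients on one root-to-leaf path of this tree, which is the kind of locality that wavelet-domain operators exploit.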
International Conference on Data Engineering | 2010
Dimitris Sacharidis; Anastasios Arvanitis; Timos K. Sellis
The skyline query returns the most interesting tuples according to a set of explicitly defined preferences among attribute values. This work relaxes this requirement and allows users to pose meaningful skyline queries without explicitly stating their preferences. To compensate for missing knowledge, we first determine a set of uncertain preferences based on user profiles, i.e., information collected for previous contexts. Then, we define a probabilistic contextual skyline query (p-CSQ) that returns the tuples which are interesting with high probability. We emphasize that, unlike past work, uncertainty lies within the query and not the data, i.e., it is in the relationships among tuples rather than in their attribute values. Furthermore, due to the nature of this uncertainty, popular skyline methods, which rely on a particular tuple visit order, do not apply to p-CSQs. Therefore, we present novel non-indexed and index-based algorithms for answering p-CSQs. Our experimental evaluation shows that the proposed techniques are significantly more efficient than a standard block nested loops approach.
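A toy reading of a p-CSQ, under a strong independence assumption that simplifies the paper's model: each profile-derived probability states how likely one tuple is to dominate another, and a tuple qualifies if its probability of surviving every potential dominator meets the threshold. Names and probabilities are hypothetical.

```python
def p_csq(tuples, pref_prob, threshold):
    """Toy probabilistic contextual skyline: keep tuples whose
    probability of being dominated by no one is >= threshold,
    treating dominance events as independent (a simplification).

    pref_prob(s, t) -> probability that tuple s dominates tuple t,
    derived (here hypothetically) from user profiles."""
    result = []
    for t in tuples:
        p_in_skyline = 1.0
        for s in tuples:
            if s is not t:
                p_in_skyline *= 1.0 - pref_prob(s, t)
        if p_in_skyline >= threshold:
            result.append((t, round(p_in_skyline, 3)))
    return result

# Hypothetical hotels; unlisted pairs never dominate each other.
hotels = ["h1", "h2", "h3"]
probs = {("h1", "h2"): 0.7, ("h1", "h3"): 0.2, ("h2", "h3"): 0.1}
print(p_csq(hotels, lambda s, t: probs.get((s, t), 0.0), threshold=0.5))
# [('h1', 1.0), ('h3', 0.72)]
```

Because these probabilities attach to tuple pairs rather than attribute values, no single sorted visit order settles a tuple's fate before it is visited, which is why order-dependent skyline algorithms do not carry over.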