Spyros Kotoulas
IBM
Publication
Featured research published by Spyros Kotoulas.
international semantic web conference | 2012
Vanessa Lopez; Spyros Kotoulas; Marco Luca Sbodio; Martin Stephenson; Aris Gkoulalas-Divanis; Pol Mac Aonghusa
In this paper, we present QuerioCity, a platform to catalog, index and query highly heterogeneous information coming from complex systems, such as cities. A series of challenges are identified: namely, the heterogeneity of the domain and the lack of a common model, the volume of information and the number of data sets, the requirement for a low entry threshold to the system, the diversity of the input data, in terms of format, syntax and update frequency (streams vs static data), and the sensitivity of the information. We propose an approach for incremental and continuous integration of static and streaming data, based on Semantic Web technologies. The proposed system is unique in the literature in terms of handling of multiple integrations of available data sets in combination with flexible provenance tracking, privacy protection and continuous integration of streams. We report on lessons learnt from building the first prototype for Dublin.
international semantic web conference | 2013
Simone Tallevi-Diotallevi; Spyros Kotoulas; Luca Foschini; Freddy Lécué; Antonio Corradi
Several sources of information, from people, systems and things, are already available in most modern cities. Processing these continuous flows of information and capturing insight poses unique technical challenges that span from response time constraints to data heterogeneity, in terms of format and throughput. To tackle these problems, we focus on a novel prototype to ease real-time monitoring and decision-making processes for the City of Dublin with three main original technical aspects: (i) an extension to SPARQL to support efficient querying of heterogeneous streams; (ii) a query execution framework and runtime environment based on IBM InfoSphere Streams, a high-performance, industrial-strength stream processing engine; (iii) a hybrid RDFS reasoner, optimized for our stream processing execution framework. Our approach has been validated with real data collected in the field, as shown in our Dublin City video demonstration. Results indicate that real-time processing of city information streams based on semantic technologies is indeed not only possible, but also efficient, scalable and low-latency.
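To make the idea of RDFS reasoning over a stream concrete, here is a minimal sketch of the standard rdfs9 rule (type propagation along rdfs:subClassOf) applied to triples as they arrive. It is not the hybrid reasoner described in the abstract and does not use IBM InfoSphere Streams. The `ex:` vocabulary, the function names and the sample events are all hypothetical.

```python
def subclass_closure(subclass_of):
    """Map each class to the set of all its transitive superclasses."""
    def visit(cls, seen):
        for parent in subclass_of.get(cls, ()):
            if parent not in seen:  # also guards against cycles in the schema
                seen.add(parent)
                visit(parent, seen)
        return seen
    return {cls: visit(cls, set()) for cls in subclass_of}


def infer_types(stream, closure):
    """Yield each incoming triple, followed by the rdf:type triples it entails."""
    for s, p, o in stream:
        yield (s, p, o)
        if p == "rdf:type":
            for parent in sorted(closure.get(o, ())):
                yield (s, "rdf:type", parent)


# Hypothetical static schema, precomputed once before the stream starts.
schema = {"ex:Bus": ["ex:Vehicle"], "ex:Vehicle": ["ex:Thing"]}
closure = subclass_closure(schema)

# Hypothetical stream of city events.
events = [("ex:bus42", "rdf:type", "ex:Bus"), ("ex:bus42", "ex:delay", "120")]
inferred = list(infer_types(events, closure))
```

Precomputing the class hierarchy once and applying it per event keeps the per-triple cost low, which is one plausible way to combine static schema knowledge with streaming data.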
IEEE Internet Computing | 2013
Irene Celino; Spyros Kotoulas
Research on smart cities has emerged as an interdisciplinary field covering IT infrastructures, crowdsourcing, and utility services optimization, among others. This special issue focuses on deployed technologies for smart cities based on Internet technologies.
conference on information and knowledge management | 2014
Long Cheng; Spyros Kotoulas; Tomas E. Ward; Georgios K. Theodoropoulos
The performance of joins in parallel database management systems is critical for data intensive operations such as querying. Since data skew is common in many applications, poorly engineered join operations result in load imbalance and performance bottlenecks. State-of-the-art methods designed to handle this problem offer significant improvements over naive implementations. However, performance could be further improved by removing the dependency on global skew knowledge and broadcasting. In this paper, we propose PRPQ (partial redistribution & partial query), an efficient and robust join algorithm for processing large-scale joins over distributed systems. We present the detailed implementation and a quantitative evaluation of our method. The experimental results demonstrate that the proposed PRPQ algorithm is indeed robust and scalable under a wide range of skew conditions. Specifically, compared to the state-of-the-art PRPD method, we achieve 16% - 167% performance improvement and 24% - 54% less network communication under different join workloads.
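The load imbalance that skew causes under plain hash redistribution can be illustrated with a small sketch. It shows only the problem, not PRPQ or PRPD. The modulo partitioner, the four-node setup and the synthetic key distributions are assumptions chosen for illustration.

```python
from collections import Counter


def hash_partition_load(keys, num_nodes):
    """Count how many tuples each node receives when tuples are routed by key."""
    loads = Counter()
    for key in keys:
        loads[key % num_nodes] += 1  # simple modulo partitioner (assumption)
    return [loads[node] for node in range(num_nodes)]


def imbalance(loads):
    """Ratio of the busiest node's load to the mean load (1.0 = balanced)."""
    return max(loads) / (sum(loads) / len(loads))


# 1000 tuples with distinct join keys versus 1000 tuples where one key dominates.
uniform_keys = list(range(1000))
skewed_keys = [7] * 900 + list(range(100))

uniform_loads = hash_partition_load(uniform_keys, 4)
skewed_loads = hash_partition_load(skewed_keys, 4)
```

With the skewed input, every tuple carrying the dominant key lands on the same node, so that node becomes the bottleneck while the others sit mostly idle.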
european conference on artificial intelligence | 2012
Ilias Tachmazidis; Grigoris Antoniou; Giorgos Flouris; Spyros Kotoulas; Lee McCluskey
We are currently experiencing an unprecedented explosion of available data coming from the Web, sensor readings, scientific databases, government authorities and more. Such datasets could benefit from the introduction of rule sets encoding commonly accepted rules or facts, application- or domain-specific rules, commonsense knowledge etc. This raises the question of whether, how, and to what extent knowledge representation methods are capable of handling huge amounts of data for these applications. In this paper, we consider inconsistency-tolerant reasoning in the form of defeasible logic, and analyze how parallelization, using the MapReduce framework, can be used to reason with defeasible rules over huge datasets. We extend previous work by dealing with predicates of arbitrary arity, under the assumption of stratification. Moving from unary to multi-arity predicates is a decisive step towards practical applications, e.g. reasoning with linked open (RDF) data. Our experimental results demonstrate that defeasible reasoning with millions of facts is performant, and has the potential to scale to billions of facts.
cluster computing and the grid | 2014
Long Cheng; Spyros Kotoulas; Tomas E. Ward; Georgios K. Theodoropoulos
Outer joins are ubiquitous in databases and big data systems. The question of how best to execute outer joins in large parallel systems is particularly challenging, as real-world datasets are characterized by data skew, leading to performance issues. Although skew handling techniques have been extensively studied for inner joins, there is little published work solving the corresponding problem for parallel outer joins. Conventional approaches to this problem, such as ones based on hash redistribution, often lead to load balancing problems, while duplication-based approaches incur significant overhead in terms of network communication. In this paper, we propose a new algorithm, query with counters (QC), for directly handling skew in outer joins on distributed architectures. We present an efficient implementation of our approach based on the asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate the performance of our approach on a cluster of 192 cores (16 nodes) and datasets of 1 billion tuples with different skew. Experimental results show that our method is scalable and, in cases of high skew, faster than the state-of-the-art.
high performance computing and communications | 2013
Long Cheng; Spyros Kotoulas; Tomas E. Ward; Georgios K. Theodoropoulos
The performance of parallel distributed data management systems becomes increasingly important with the rise of Big Data. Parallel joins have been widely studied both in the parallel processing and the database communities. Nevertheless, most of the algorithms developed so far do not consider data skew, which naturally exists in various applications. State-of-the-art methods designed to handle this problem are based on extensions to either of the two prevalent conventional approaches to parallel joins: the hash-based and duplication-based frameworks. In this paper, we introduce a novel parallel join framework, query-based distributed join (QbDJ), for handling data skew on distributed architectures. Further, we present an efficient implementation of the method based on the asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate the performance of our approach on a cluster of 192 cores (16 nodes) and datasets of 1 billion tuples with different skews. The results show that the method is scalable, and also runs faster with less network communication compared to the state-of-the-art PRPD approach in [1] under high data skew.
Journal of Web Semantics | 2014
Spyros Kotoulas; Vanessa Lopez; Raymond Lloyd; Marco Luca Sbodio; Freddy Lécué; Martin Stephenson; Elizabeth M. Daly; Veli Bicer; Aris Gkoulalas-Divanis; Giusy Di Lorenzo; Anika Schumann; Pol Mac Aonghusa
We present SPUD, a semantic environment for cataloging, exploring, integrating, understanding, processing and transforming urban information. A series of challenges are identified: namely, the heterogeneity of the domain and the impracticality of a common model, the volume of information and the number of data sets, the requirement for a low entry threshold to the system, the diversity of the input data, in terms of format, syntax and update frequency (streams vs static data), the complex data dependencies and the sensitivity of the information. We propose an approach for the incremental and continuous integration of static and streaming data, based on Semantic Web technologies and apply our technology to a traffic diagnosis scenario. We demonstrate our approach through a system operating on real data in Dublin and we show that semantic technologies can be used to obtain business results in an environment with hundreds of heterogeneous datasets coming from distributed data sources and spanning multiple domains.
international semantic web conference | 2016
Vanessa Lopez; Pierpaolo Tommasi; Spyros Kotoulas; Jiewen Wu
We present a domain-agnostic system for Question Answering over multiple semi-structured and possibly linked datasets without the need of a training corpus. The system is motivated by an industry use-case where Enterprise Data needs to be combined with a large body of Open Data to fulfill information needs not satisfied by prescribed application data models. Our proposed Question Answering pipeline combines existing components with novel methods to perform, in turn, linguistic analysis of a query, named entity extraction, entity/graph search, fusion and ranking of possible answers. We evaluate QuerioDALI with two open-domain benchmarks and a biomedical one over Linked Open Data sources, and show that our system produces comparable results to systems that require training data and are domain-dependent. In addition, we analyze the current challenges and shortcomings.
acm conference on hypertext | 2013
Vanessa Lopez; Spyros Kotoulas; Marco Luca Sbodio; Raymond Lloyd
Governments and enterprises are interested in the return-on-investment for exposing their data. This brings forth the problem of making data consumable, with minimal effort. Beyond search techniques, there is a need for effective methods to identify heterogeneous datasets that are closely related, as part of data integration or exploration tasks. The large number of datasets demands a new generation of Smarter Systems for data content aggregation that allows users to incrementally liberate, access and integrate information, in a manner that scales in terms of gain for the effort spent. In the context of such a pay-as-you-go system, we present a novel method for exploring and discovering relevant datasets based on semantic relatedness. We demonstrate a system for contextual knowledge mining on hundreds of real-world datasets from Dublin City. We evaluate our semantic approach, using query logs and domain expert judgments, to show that our approach effectively identifies related datasets and outperforms text-based recommendations.