Sandra de F. Mendes Sampaio
University of Manchester
Publication
Featured research published by Sandra de F. Mendes Sampaio.
conference on information and knowledge management | 2000
James Smith; Paul Watson; Sandra de F. Mendes Sampaio; Norman W. Paton
Object database management systems (ODBMS) are now established as the database management technology of choice for a range of challenging data intensive applications. Furthermore, the applications associated with object databases typically have stringent performance requirements, and some are associated with very large data sets. However, despite the demands made on object databases by applications, there has been surprisingly little work on parallel object databases. This paper presents the architecture and some preliminary performance results for the Polar ODMG compliant parallel object database. The architecture described has been implemented in a shared-nothing environment on a network of PCs. The paper describes how OQL queries are compiled, parallelized and executed in this environment, and includes some preliminary performance results for OQL queries using the OO7 benchmark.
web information systems engineering | 2005
Sandra de F. Mendes Sampaio; Chao Dong; Pedro Sampaio
Internet Query Systems (IQS) are information systems used to query the World Wide Web by finding data sources relevant to a given query and retrieving data taking into account issues such as the unpredictability of access and transfer rates, infinite streams of data, and the ability to produce partial results. Despite the wide availability of research focusing on query processing for IQS, there are surprisingly few contributions addressing data quality issues such as timeliness and accuracy of data resulting from Internet query processing. This paper provides an overview of an ongoing research effort to extend IQS with a data quality component to ensure timeliness of data resulting from Internet query processing. In particular, we illustrate the quality model, data source layer design and the quality aware algebraic query processing framework adopted in our implementation effort.
Journal of Data and Information Quality | 2009
Suzanne M. Embury; Paolo Missier; Sandra de F. Mendes Sampaio; R. Mark Greenwood; Alun David Preece
The range of information now available in queryable repositories opens up a host of possibilities for new and valuable forms of data analysis. Database query languages such as SQL and XQuery offer a concise and high-level means by which such analyses can be implemented, facilitating the extraction of relevant data subsets into either generic or bespoke data analysis environments. Unfortunately, the quality of data in these repositories is often highly variable. The data is still useful, but only if the consumer is aware of the data quality problems and can work around them. Standard query languages offer little support for this aspect of data management. In principle, however, it should be possible to embed constraints describing the consumer’s data quality requirements into the query directly, so that the query evaluator can take over responsibility for enforcing them during query processing. Most previous attempts to incorporate information quality constraints into database queries have been based around a small number of highly generic quality measures, which are defined and computed by the information provider. This is a useful approach in some application areas but, in practice, quality criteria are more commonly determined by the user of the information not by the provider. In this article, we explore an approach to incorporating quality constraints into database queries where the definition of quality is set by the user and not the provider of the information. Our approach is based around the concept of a quality view, a configurable quality assessment component into which domain-specific notions of quality can be embedded. We examine how quality views can be incorporated into XQuery, and draw from this the language features that are required in general to embed quality views into any query language. We also propose some syntactic sugar on top of XQuery to simplify the process of querying with quality constraints.
Concurrency and Computation: Practice and Experience | 2006
Sandra de F. Mendes Sampaio; Norman W. Paton; James Smith; Paul Watson
Object database management systems (ODBMSs) are now established as the database management technology of choice for a range of challenging data intensive applications. Furthermore, the applications associated with object databases typically have stringent performance requirements, and some are associated with very large data sets. An important feature for the performance of object databases is the speed at which relationships can be explored. In queries, this depends on the effectiveness of different join algorithms into which queries that follow relationships can be compiled. This paper presents a performance evaluation of the Polar parallel object database system, focusing in particular on the performance of parallel join algorithms. Polar is a parallel, shared‐nothing implementation of the Object Database Management Group (ODMG) standard for object databases. The paper presents an empirical evaluation of queries expressed in the ODMG Query Language (OQL), as well as a cost model for the parallel algebra that is used to evaluate OQL queries. The cost model is validated against the empirical results for a collection of queries using four different join algorithms, one that is value based and three that are pointer based.
Distributed and Parallel Databases | 2004
Jim Smith; Sandra de F. Mendes Sampaio; Paul Watson; Norman W. Paton
This paper describes the design, implementation and evaluation of a parallel object database server. While a number of research groups and companies now provide object database servers designed to run on uniprocessors, there has been surprisingly little work on the exploitation of parallelism to provide scalable performance in Object Database Management Systems (ODBMS). The work described in this paper takes as its starting-point the Object Database Management Group (ODMG) standard for object databases, thereby allowing the project to focus on research into parallelism, rather than on the ODBMS interfaces. The system is designed to run on a distributed memory parallel machine, and the paper describes the key issues and design decisions including: parallel query optimisation and execution, flow control, support for user-defined operations in queries, object distribution, cache management and navigational client access. The work shows that the significant differences between the object and relational database paradigms lead to significant differences in the designs of parallel servers to support these two paradigms. The paper presents an extensive performance analysis of the prototype system, which shows that good performance can be achieved on a cluster of Linux PCs.
Lecture Notes in Computer Science | 2002
Sandra de F. Mendes Sampaio; Norman W. Paton; James Smith; Paul Watson
Query cost models are widely used, both for performance analysis and for comparing execution plans during query optimisation. In essence, a cost model predicts where time is being spent during query evaluation. Although many cost models have been proposed, for serial, parallel and distributed database systems, surprisingly few of these have been validated against real systems. This paper presents cost models for the parallel evaluation of ODMG OQL queries, which have been compared with experimental results obtained using the Polar object database system. The paper describes the validation of the cost model for a collection of queries, using three join algorithms over the OO7 benchmark database. The results show that the cost model generally both ranks alternative plans appropriately, and gives a useful indication of the response times that can be expected from a plan. The paper also illustrates the application of the cost model by highlighting the contributions of different features and operations to query response times.
Expert Systems With Applications | 2015
Sandra de F. Mendes Sampaio; Chao Dong; Pedro Sampaio
Design of a data quality-aware information management framework and system. Users measure data quality based on an extensible set of data profiling algorithms. Query language, system architecture and heuristic optimization approach developed. System design based on seamless extensions to SQL and relational database systems. Applied in e-Business scenarios and potential for big data profiling discussed. This paper describes the design and implementation of the Data Quality Query System (DQ2S), a query processing framework and tool incorporating data quality profiling functionality in the processing of queries involving quality-aware query language extensions. DQ2S supports the combination of performance and quality-oriented query optimizations, and a query processing platform that enables advanced data profiling queries to be formulated based on well-established query language constructs, often used to interact with relational database management systems. DQ2S encompasses a declarative query language and a data model that provides users with the capability to express constraints on the quality of query results as well as query quality-related information; a set of algebraic operators for manipulating data quality-related information, and optimization heuristics. The proposed query language and algebra represent seamless extensions to SQL and relational database engines, respectively. The constructs of the proposed data model are implemented at the user's view level and are internally mapped into relational model constructs. The quality-aware extensions and features are extremely useful when users need to assess the quality of relational data sets and define quality constraints for acceptable data prior to using candidate data sources in decision support systems and conducting big data analytical tasks.
international conference on conceptual modeling | 2006
Chao Dong; Sandra de F. Mendes Sampaio; Pedro Sampaio
With the growing need for querying and combining data from multiple data sources, data analysts, database application programmers and advanced database users are increasingly facing the problem of filtering out low-quality data with regard to the intended use. This paper investigates the problem of expressing and processing data quality requests during quality-aware query formulation. The paper proposes the Data Quality Query Language (DQ2L), an extension of SQL aimed at enabling query language users to express data quality requests and a query processing framework (architecture, query processing stages, metadata support and quality model) aimed at extending relational query processing with quality-aware query processing structures and techniques. The paper focuses on the timeliness data quality dimension.
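As a rough illustration of what a timeliness constraint on query results might look like, the sketch below scores each row and filters on a threshold. It is not the paper's DQ2L syntax or quality model: the formula (one minus age divided by volatility, floored at zero) is a definition commonly used in the data quality literature and is assumed here, and the names `timeliness`, `filter_timely`, `last_updated` and `volatility` are hypothetical.

```python
from datetime import datetime, timedelta


def timeliness(last_updated, volatility, now):
    """Return a score in [0, 1]; 1 is freshly updated, 0 is past its shelf life.

    Assumed formula (not taken from the paper): max(0, 1 - currency / volatility),
    where currency is the age of the value and volatility is how long it stays valid.
    """
    currency = now - last_updated
    return max(0.0, 1.0 - currency / volatility)


def filter_timely(rows, threshold, now):
    """Keep only the rows whose timeliness score meets the requested threshold."""
    return [
        row
        for row in rows
        if timeliness(row["last_updated"], row["volatility"], now) >= threshold
    ]


# Hypothetical data: two price quotes with a shelf life of ten days each.
now = datetime(2006, 1, 11)
rows = [
    {"item": "a", "last_updated": datetime(2006, 1, 10), "volatility": timedelta(days=10)},
    {"item": "b", "last_updated": datetime(2005, 12, 1), "volatility": timedelta(days=10)},
]
fresh = filter_timely(rows, threshold=0.5, now=now)
```

Here only item `a` survives the filter, because item `b` is older than its assumed shelf life and scores zero.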
acm symposium on applied computing | 2012
Liping Zhao; Keletso Letsholo; Erol-Valeriu Chioasca; Sandra de F. Mendes Sampaio; Pedro Sampaio
Early efforts on bridging the communication gap between a business and its IT systems have resulted in several business analysis and modeling techniques. Most recently, BPMN is rapidly consolidating its position as the established standard for modeling business processes. Yet, research shows that BPMN still lacks comprehensive constructs for representing some core business concepts, including business goals, non-functional requirements and resources. The purpose of this paper is to use the Zachman Framework to assess BPMN's modeling capabilities and to identify its modeling gaps. The motivation of the paper is to provide a better understanding of the suitability of BPMN as a business process modeling language for bridging the gap between business and its information systems.
european conference on parallel processing | 2001
Sandra de F. Mendes Sampaio; James Smith; Norman W. Paton; Paul Watson
Parallel relational databases have been successful in providing scalable performance for data intensive applications, and much work has been carried out on query processing techniques in such systems. However, although many applications associated with object databases also have stringent performance requirements, there has been much less work investigating parallel object database systems. An important feature for the performance of object databases is the speed at which relationships can be explored. In queries, this depends upon the effectiveness of different join algorithms into which queries that follow relationships can be compiled. This paper presents the results of empirical evaluations of four parallel join algorithms, two value based and two pointer based. The experiments have been run on Polar, a parallel ODMG object database system.
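The distinction between value-based and pointer-based joins can be sketched as follows. This is a simplified, single-process illustration under stated assumptions, not the Polar implementation or any of the four algorithms evaluated in the paper: the value-based variant is a generic hash-partitioned join, the pointer-based variant simply dereferences stored object identifiers, and all function and field names are hypothetical.

```python
from collections import defaultdict


def partition(rows, key, n_nodes):
    """Hash-partition rows on a key, mimicking distribution over shared-nothing nodes."""
    parts = [[] for _ in range(n_nodes)]
    for row in rows:
        parts[hash(row[key]) % n_nodes].append(row)
    return parts


def value_based_join(left, right, left_key, right_key, n_nodes=2):
    """Repartition both inputs on the join value, then hash-join each partition locally."""
    results = []
    left_parts = partition(left, left_key, n_nodes)
    right_parts = partition(right, right_key, n_nodes)
    for lpart, rpart in zip(left_parts, right_parts):
        table = defaultdict(list)
        for r in rpart:                      # build phase
            table[r[right_key]].append(r)
        for l in lpart:                      # probe phase
            for r in table.get(l[left_key], []):
                results.append((l, r))
    return results


def pointer_based_join(left, ref_field, store):
    """Follow the object identifier held in each left object directly to its target."""
    return [(l, store[l[ref_field]]) for l in left if l[ref_field] in store]


# Hypothetical objects: parts that reference documents by object identifier.
parts = [{"id": 1, "doc": 10}, {"id": 2, "doc": 11}, {"id": 3, "doc": 10}]
docs = [{"oid": 10, "title": "A"}, {"oid": 11, "title": "B"}]
store = {d["oid"]: d for d in docs}          # stands in for the object store

by_value = value_based_join(parts, docs, "doc", "oid")
by_pointer = pointer_based_join(parts, "doc", store)
```

Both calls produce the same three pairs. The difference lies in how matches are found: the value-based join compares attribute values after repartitioning both inputs, while the pointer-based join navigates the stored relationship from one side only.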