Giuseppe Di Fatta
University of Reading
Publications
Featured research published by Giuseppe Di Fatta.
NeuroImage | 2015
Esther E. Bron; Marion Smits; Wiesje M. van der Flier; Hugo Vrenken; Frederik Barkhof; Philip Scheltens; Janne M. Papma; Rebecca M. E. Steketee; Carolina Patricia Mendez Orellana; Rozanna Meijboom; Madalena Pinto; Joana R. Meireles; Carolina Garrett; António J. Bastos-Leite; Ahmed Abdulkadir; Olaf Ronneberger; Nicola Amoroso; Roberto Bellotti; David Cárdenas-Peña; Andrés Marino Álvarez-Meza; Chester V. Dolph; Khan M. Iftekharuddin; Simon Fristed Eskildsen; Pierrick Coupé; Vladimir Fonov; Katja Franke; Christian Gaser; Christian Ledig; Ricardo Guerrero; Tong Tong
Algorithms for computer-aided diagnosis of dementia based on structural MRI have demonstrated high performance in the literature, but are difficult to compare as different data sets and methodologies were used for evaluation. In addition, it is unclear how the algorithms would perform on previously unseen data, and thus, how they would perform in clinical practice, where there is no real opportunity to adapt the algorithm to the data at hand. To address these comparability, generalizability and clinical applicability issues, we organized a grand challenge that aimed to objectively compare algorithms based on a clinically representative multi-center data set. Using clinical practice as the starting point, the goal was to reproduce the clinical diagnosis. Therefore, we evaluated algorithms for multi-class classification of three diagnostic groups: patients with probable Alzheimer's disease, patients with mild cognitive impairment and healthy controls. The diagnosis based on clinical criteria was used as the reference standard, as it was the best available reference despite its known limitations. For evaluation, a previously unseen test set was used, consisting of 354 T1-weighted MRI scans with the diagnoses blinded. Fifteen research teams participated with a total of 29 algorithms. The algorithms were trained on a small training set (n=30) and optionally on data from other sources (e.g., the Alzheimer's Disease Neuroimaging Initiative, the Australian Imaging Biomarkers and Lifestyle flagship study of aging). The best performing algorithm yielded an accuracy of 63.0% and an area under the receiver-operating-characteristic curve (AUC) of 78.8%. In general, the best performances were achieved using feature extraction based on voxel-based morphometry or a combination of features that included volume, cortical thickness, shape and intensity. The challenge is open for new submissions via the web-based framework: http://caddementia.grand-challenge.org.
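The two headline metrics above, multi-class accuracy and the area under the ROC curve, can be stated precisely in a few lines. Below is a minimal sketch (not the challenge's official evaluation code) that computes accuracy and a mean one-vs-rest AUC via the Mann-Whitney rank formulation; the toy arrays are illustrative placeholders, not CADDementia data.

```python
import numpy as np
from scipy.stats import rankdata

def accuracy(y_true, y_pred):
    """Fraction of correctly predicted diagnostic labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def auc_ovr(y_true, scores):
    """Mean one-vs-rest AUC from per-class scores (Mann-Whitney ranks)."""
    y_true = np.asarray(y_true)
    aucs = []
    for c in range(scores.shape[1]):
        pos = y_true == c
        n_pos, n_neg = pos.sum(), (~pos).sum()
        ranks = rankdata(scores[:, c])          # average ranks handle ties
        aucs.append((ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))
    return float(np.mean(aucs))

# toy example: 3 classes (AD, MCI, control) and 6 scans
y_true = np.array([0, 0, 1, 1, 2, 2])
scores = np.array([[.7, .2, .1], [.5, .3, .2], [.2, .6, .2],
                   [.3, .4, .3], [.1, .2, .7], [.2, .2, .6]])
print(accuracy(y_true, scores.argmax(axis=1)), auc_ovr(y_true, scores))
```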
Future Generation Computer Systems | 2014
Giancarlo Fortino; Daniele Parisi; Vincenzo Pirrone; Giuseppe Di Fatta
Body Sensor Networks (BSNs) have recently been introduced for the remote monitoring of human activities in a broad range of application domains, such as health care, emergency management, fitness and behavior surveillance. BSNs can be deployed in a community of people and can generate large amounts of contextual data that require a scalable approach for storage, processing and analysis. Cloud computing can provide a flexible storage and processing infrastructure to perform both online and offline analysis of the data streams generated in BSNs. This paper proposes BodyCloud, a SaaS approach for community BSNs that supports the development and deployment of Cloud-assisted BSN applications. BodyCloud is a multi-tier application-level architecture that integrates a Cloud computing platform with middleware for BSN data streams. BodyCloud provides programming abstractions that allow the rapid development of community BSN applications. This work describes the general architecture of the proposed approach and presents a case study for the real-time monitoring and analysis of the cardiac data streams of many individuals.
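BodyCloud's actual programming abstractions are not reproduced here, but the pattern the abstract describes, sensor-side readings streamed into a cloud-side online analysis component, can be sketched in a few lines. All names below (Reading, CardiacMonitor, the alert threshold) are illustrative assumptions, loosely modelled on the cardiac case study.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Reading:
    user_id: str
    heart_rate: float   # beats per minute
    timestamp: float

class CardiacMonitor:
    """Cloud-side analysis: sliding-window mean per user, with an alert rule."""
    def __init__(self, window_size=30, alert_bpm=120.0):
        self.windows = {}                # user_id -> recent heart rates
        self.window_size = window_size
        self.alert_bpm = alert_bpm

    def ingest(self, r: Reading):
        w = self.windows.setdefault(r.user_id, deque(maxlen=self.window_size))
        w.append(r.heart_rate)
        mean = sum(w) / len(w)
        if mean > self.alert_bpm:
            print(f"ALERT {r.user_id}: mean HR {mean:.1f} bpm over last {len(w)} samples")
        return mean

# toy usage: one community member streaming readings into the monitor
monitor = CardiacMonitor(window_size=3)
for hr in (80, 125, 130, 135):
    monitor.ingest(Reading("user-1", hr, 0.0))
```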
IEEE International Conference on Cloud Computing Technology and Science | 2012
Giancarlo Fortino; Mukaddim Pathan; Giuseppe Di Fatta
Spatially distributed sensor nodes can be used to monitor the conditions of systems and humans in a wide range of application domains. A network of body sensors deployed in a community of people generates large amounts of contextual data that require a scalable approach for storage and processing. Cloud computing can provide a powerful, scalable storage and processing infrastructure to perform both online and offline analysis and mining of body sensor data streams. This paper presents BodyCloud, a system architecture based on Cloud computing for the management and monitoring of body sensor data streams. It incorporates key concepts such as the scalability and flexibility of resources, sensor heterogeneity, and the dynamic deployment and management of user and community applications.
Proceedings of the 3rd International Workshop on Software Quality Assurance | 2006
Giuseppe Di Fatta; Stefan Leue; Evghenia Stegantova
We present a method to enhance fault localization for software systems based on a frequent pattern mining algorithm. Our method relies on a large set of test cases for a given set of programs in which faults can be detected. The test executions are recorded as function call trees. Based on test oracles, the tests are classified into successful and failing tests. A frequent pattern mining algorithm is used to identify frequent subtrees in successful and failing test executions. This information is then used to rank functions according to their likelihood of containing a fault. The ranking suggests an order in which to examine the functions during fault analysis. We validate our approach experimentally using a subset of the Siemens benchmark programs.
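The paper mines frequent subtrees of function call trees; the sketch below uses a deliberately simplified proxy, reducing each execution to the set of functions it called and scoring each function by how much more often it appears in failing runs than in passing ones. The function names in the toy example are made up.

```python
def rank_functions(passing_runs, failing_runs):
    """passing_runs / failing_runs: lists of sets of called function names."""
    funcs = set().union(*passing_runs, *failing_runs)
    scores = {}
    for f in funcs:
        fail = sum(f in run for run in failing_runs) / len(failing_runs)
        pas = sum(f in run for run in passing_runs) / len(passing_runs)
        # higher score = the function appears mostly in failing executions
        scores[f] = fail / (fail + pas) if fail + pas > 0 else 0.0
    return sorted(scores, key=scores.get, reverse=True)

passing = [{"parse", "eval"}, {"parse", "print"}]
failing = [{"parse", "eval", "div"}, {"eval", "div"}]
print(rank_functions(passing, failing))   # 'div' should rank first
```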
Journal of Parallel and Distributed Computing | 2013
Giuseppe Di Fatta; Francesco Blasa; Simone Cafiero; Giancarlo Fortino
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or by high latency in communication paths. The lack of scalable and fault-tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against state-of-the-art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed cluster distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
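A minimal single-machine simulation of the gossip-averaging idea can illustrate why no global communication step is needed: if every node repeatedly averages its per-cluster (weighted sum, count) state with a random peer, all nodes converge to the same global aggregate, from which each recovers the centroids locally. This is a simplified sketch, not the paper's full epidemic protocol.

```python
import random
import numpy as np

def gossip_kmeans_step(local_data, centroids, rounds=50):
    """One K-Means iteration where the M-step is computed by gossip."""
    k, d = centroids.shape
    # E-step: each node assigns its own points to the nearest centroid
    states = []
    for X in local_data:                         # one array per node
        labels = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
        s, c = np.zeros((k, d)), np.zeros(k)
        for j in range(k):
            s[j] = X[labels == j].sum(axis=0)
            c[j] = (labels == j).sum()
        states.append((s, c))
    # gossip M-step: random pairwise averaging of (sum, count) states,
    # which preserves the global mean while nodes converge to it
    for _ in range(rounds):
        a, b = random.sample(range(len(states)), 2)
        s = (states[a][0] + states[b][0]) / 2
        c = (states[a][1] + states[b][1]) / 2
        states[a], states[b] = (s, c), (s.copy(), c.copy())
    s, c = states[0]                             # every node ends near the same state
    return s / np.maximum(c, 1e-12)[:, None]     # estimated global centroids

rng = np.random.default_rng(0)
nodes = [rng.normal(loc, 0.1, (50, 2)) for loc in ([0, 0], [3, 3], [0, 3])]
print(gossip_kmeans_step(nodes, np.array([[0., 0.], [3., 3.]])))
```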
International Conference on Machine Learning and Applications | 2010
David Pettinger; Giuseppe Di Fatta
K-Means is a popular clustering algorithm which adopts an iterative refinement procedure to determine data partitions and to compute their associated centres of mass, called centroids. The straightforward implementation of the algorithm is often referred to as ‘brute force’, since it computes a proximity measure from each data point to each centroid at every iteration of the K-Means process. Efficient implementations of the K-Means algorithm have been predominantly based on multi-dimensional binary search trees (KD-Trees). The combination of an efficient data structure and geometrical constraints makes it possible to reduce the number of distance computations required at each iteration. In this work we present a general space partitioning approach for improving the efficiency and the scalability of the K-Means algorithm. We propose to adopt approximate hierarchical clustering methods to generate binary space partitioning trees, in contrast to KD-Trees. In the experimental analysis, we have tested the performance of the proposed Binary Space Partitioning K-Means (BSP-KM) when a divisive clustering algorithm is used. We have carried out extensive experimental tests to compare the proposed approach to the one based on KD-Trees (KD-KM) across a wide range of the parameter space. BSP-KM is more scalable than KD-KM, while keeping the deterministic nature of the ‘brute force’ algorithm. In particular, the proposed space partitioning approach has been shown to overcome the well-known limitation of KD-Trees in high-dimensional spaces and can also be adopted to improve the efficiency of other algorithms in which KD-Trees have been used.
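Under some simplifying assumptions, the space-partitioning idea can be sketched compactly: build the tree by recursive 2-means splits (one divisive variant, as tested in the paper), keep a bounding ball at each node, and use a triangle-inequality test to prune centroids that cannot be nearest for any point inside the ball. The code below is an illustrative sketch, not the paper's implementation.

```python
import numpy as np

class Node:
    """A BSP-tree node: a bounding ball around its points, split by 2-means."""
    def __init__(self, X, min_size=16):
        self.center = X.mean(axis=0)
        self.radius = np.linalg.norm(X - self.center, axis=1).max()
        self.X, self.left, self.right = X, None, None
        if len(X) > min_size:
            # crude divisive split: a few Lloyd iterations from two far-apart seeds
            c = X[[0, np.argmax(np.linalg.norm(X - X[0], axis=1))]].astype(float)
            for _ in range(5):
                lab = np.argmin(((X[:, None] - c) ** 2).sum(-1), axis=1)
                for j in (0, 1):
                    if (lab == j).any():
                        c[j] = X[lab == j].mean(axis=0)
            if (lab == 0).any() and (lab == 1).any():
                self.left = Node(X[lab == 0], min_size)
                self.right = Node(X[lab == 1], min_size)

def assign(node, centroids, idx, out):
    """Assign node's points to nearest centroids, pruning per bounding ball."""
    d = np.linalg.norm(centroids[idx] - node.center, axis=1)
    # a centroid cannot be nearest for ANY point in the ball if even its best
    # case (d - radius) beats the closest candidate's worst case (d.min() + radius)
    keep = idx[d - node.radius <= d.min() + node.radius]
    if len(keep) == 1 or node.left is None:      # resolve at the leaf
        lab = np.argmin(((node.X[:, None] - centroids[keep]) ** 2).sum(-1), axis=1)
        out.append((node.X, keep[lab]))
    else:
        assign(node.left, centroids, keep, out)
        assign(node.right, centroids, keep, out)

rng = np.random.default_rng(1)
X, cent, out = rng.normal(size=(500, 3)), rng.normal(size=(8, 3)), []
assign(Node(X), cent, np.arange(8), out)
print(sum(len(points) for points, _ in out))     # all 500 points assigned
```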
ACM Symposium on Applied Computing | 2007
Giuseppe Di Fatta; Giancarlo Fortino
We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design separates the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation has been carried out on a parallel frequent subgraph mining algorithm and has shown good scalability.
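The framework's agent messaging layer is not reproduced here, but the dynamic load-balancing policy it targets can be sketched as work stealing over irregular search tasks: an idle peer requests work from a busy peer, which donates half of its queue. The sketch below is a single-threaded simulation under that assumption; in the framework itself the exchange happens via asynchronous messages.

```python
import random
from collections import deque

def run_peers(initial_tasks, n_peers=4, expand=None):
    """Process an irregular search workload with work stealing among peers."""
    peers = [deque() for _ in range(n_peers)]
    peers[0].extend(initial_tasks)          # all work starts at one peer
    done = 0
    while any(peers):
        for q in peers:
            if not q:                       # idle: steal half from a busy peer
                busy = [p for p in peers if len(p) > 1]
                if busy:
                    victim = random.choice(busy)
                    for _ in range(len(victim) // 2):
                        q.append(victim.pop())
                continue
            task = q.popleft()
            done += 1
            if expand:                      # irregular: tasks may spawn subtasks
                q.extend(expand(task))
    return done

# toy workload: an irregular search tree where tasks spawn two children each
print(run_peers([(0,)],
                expand=lambda t: [t + (i,) for i in range(2)] if len(t) < 4 else []))
# processes all 15 tasks of the toy search tree
```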
International Conference on Data Mining | 2011
Giuseppe Di Fatta; Francesco Blasa; Simone Cafiero; Giancarlo Fortino
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or by high latency in communication paths. This work proposes a fully decentralised algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against state-of-the-art distributed K-Means algorithms based on sampling methods. The experimental analysis confirms that the proposed algorithm is a practical and accurate distributed K-Means implementation for networked systems of very large and extreme scale.
International Conference on e-Science | 2009
David Pettinger; Giuseppe Di Fatta
Clustering, the grouping of similar items in a set, is an important process within the field of data mining. As the amount of data for various applications continues to increase, in terms of both size and dimensionality, efficient clustering methods are needed. A popular clustering algorithm is K-Means, which adopts a greedy approach to produce a set of K clusters with associated centres of mass, and uses a squared error distortion measure to determine convergence. Methods for improving the efficiency of K-Means have largely been explored in two main directions. The amount of computation can be significantly reduced by adopting a more efficient data structure, notably a multi-dimensional binary search tree (KD-Tree), to store either centroids or data points. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance, which has so far limited the adoption of these efficient K-Means techniques in parallel computational environments. In this work, we provide a parallel formulation of the KD-Tree based K-Means algorithm and address its load balancing issues.
International Conference on Computational Science | 2002
Giuseppe Di Fatta; Giuseppe Lo Re; Alfonso Urso
In recent years, the unpredictable growth of the Internet has further exposed congestion, one of the problems that has historically affected the network. This paper presents the design and evaluation of a congestion control algorithm based on a fuzzy controller. The analogy between Proportional Integral (PI) regulators and fuzzy controllers is discussed, and a method to determine the scaling factors of the fuzzy controller is presented. It is shown that the fuzzy controller outperforms the PI regulator under traffic conditions which differ from those at the operating point considered in the design.
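The PI/fuzzy analogy can be made concrete with a small sketch: an incremental PI computes du = Kp*de + Ki*e, while a fuzzy controller maps the scaled error and error change through membership functions and a rule table to the same increment, reducing to PI behaviour where the rule surface is linear. The controller and the toy queue loop below are illustrative, not the paper's design; Ge, Gde and Gu stand in for the scaling factors the paper tunes.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function on [a, c], peaking at b."""
    return np.maximum(0.0, np.minimum((x - a) / (b - a), (c - x) / (c - b)))

def fuzzy_pi_step(e, de, Ge=1.0, Gde=1.0, Gu=1.0):
    """One control increment from error e and error change de (Sugeno-style)."""
    labels = [-1.0, 0.0, 1.0]                     # negative / zero / positive
    shape = {-1.0: (-2, -1, 0), 0.0: (-1, 0, 1), 1.0: (0, 1, 2)}
    xe, xd = np.clip(Ge * e, -1, 1), np.clip(Gde * de, -1, 1)
    num = den = 0.0
    for le in labels:
        for ld in labels:
            w = tri(xe, *shape[le]) * tri(xd, *shape[ld])   # rule firing strength
            num += w * np.clip(le + ld, -1, 1)              # singleton output per rule
            den += w
    return Gu * num / den if den > 0 else 0.0

# toy closed loop: drive a queue length toward a target of 50 packets
q, target, prev_e = 80.0, 50.0, 0.0
for _ in range(20):
    e = (target - q) / 50.0
    q += 10.0 * fuzzy_pi_step(e, e - prev_e)
    prev_e = e
print(round(q, 1))   # settles near the target of 50
```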