Simone Palazzo
University of Catania
Publications
Featured research published by Simone Palazzo.
Multimedia Tools and Applications | 2014
Isaak Kavasidis; Simone Palazzo; Roberto Di Salvo; Daniela Giordano; Concetto Spampinato
Large-scale labeled datasets are of key importance for the development of automatic video analysis tools: on the one hand, they allow multi-class classifier training, and on the other hand, they support the algorithms’ evaluation phase. This is widely recognized by the multimedia and computer vision communities, as witnessed by the growing number of available datasets; however, the field still lacks annotation tools able to meet user needs, since generating high-quality ground truth data demands considerable human concentration. Moreover, it is not feasible to collect large video ground truths, covering as many scenarios and object categories as possible, through the effort of isolated research groups alone. In this paper we present a collaborative web-based platform for video ground truth annotation. It features an easy and intuitive user interface that allows plain video annotation and instant sharing and integration of the generated ground truths, in order not only to alleviate a large part of the effort and time needed, but also to increase the quality of the generated annotations. The tool has been online for the last four months and, to date, we have collected about 70,000 annotations. A comparative performance evaluation has also shown that our system outperforms existing state-of-the-art methods in terms of annotation time, annotation quality and system usability.
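As an illustration of the kind of data such a platform exchanges, here is a minimal sketch of a shared video ground-truth record. The field names are hypothetical: the abstract does not specify the platform's storage schema, only that annotations are shared and integrated across users.

```python
# A hypothetical annotation record for a collaborative video labeling platform.
import json
from dataclasses import dataclass, asdict

@dataclass
class BoxAnnotation:
    video_id: str    # identifier of the annotated video
    frame: int       # frame index the box refers to
    label: str       # object class assigned by the annotator
    x: int           # top-left corner, pixels
    y: int
    w: int           # box width, pixels
    h: int           # box height, pixels
    annotator: str   # needed to merge contributions from many users

record = BoxAnnotation("vid_0001", frame=120, label="fish",
                       x=34, y=58, w=40, h=22, annotator="user42")
print(json.dumps(asdict(record)))  # serialized for instant sharing
```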
Computer Vision and Pattern Recognition | 2015
Daniela Giordano; Francesca Murabito; Simone Palazzo; Concetto Spampinato
In this paper we present an approach for segmenting objects in videos captured in complex scenes with multiple, diverse targets. The method does not make any specific assumptions about the videos and relies on how objects are perceived by humans according to Gestalt laws. Initially, we rapidly generate a coarse foreground segmentation, which provides predictions about motion regions by analyzing how superpixel segmentation changes in consecutive frames. We then exploit these location priors to refine the initial segmentation by optimizing an energy function based on appearance and perceptual organization, only on regions where motion is observed. We evaluated our method on complex and challenging video sequences and it showed significant performance improvements over recent state-of-the-art methods, being also fast enough to be used for “on-the-fly” processing.
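The coarse foreground step can be sketched as follows: segment the current frame into superpixels, then flag as "motion" the superpixels whose pixels change most between consecutive frames. The SLIC parameters and the change threshold below are illustrative assumptions, not the paper's actual settings.

```python
# A minimal sketch of a superpixel-based coarse motion prior.
import numpy as np
from skimage.segmentation import slic

def coarse_motion_mask(prev_gray, curr_gray, n_segments=300, thresh=10.0):
    """prev_gray, curr_gray: 2-D grayscale arrays of the same shape."""
    segments = slic(curr_gray, n_segments=n_segments, channel_axis=None)
    diff = np.abs(curr_gray.astype(np.float32) - prev_gray.astype(np.float32))
    mask = np.zeros_like(segments, dtype=bool)
    for label in np.unique(segments):
        region = segments == label
        # a superpixel is a motion region if its mean temporal change is large
        if diff[region].mean() > thresh:
            mask[region] = True
    return mask  # location prior for the appearance-based refinement stage
```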
Proceedings of the 1st International Workshop on Visual Interfaces for Ground Truth Collection in Computer Vision Applications | 2012
Isaak Kavasidis; Simone Palazzo; Roberto Di Salvo; Daniela Giordano; Concetto Spampinato
In this paper we present a tool for the generation of ground-truth data for object detection, tracking and recognition applications. Compared to state-of-the-art methods, such as ViPER-GT, our tool improves the user experience by providing edit shortcuts such as hotkeys and drag-and-drop, and by integrating computer vision algorithms to automate, under the supervision of the user, the extraction of contours and the identification of objects across frames. A comparison between our application and the ViPER-GT tool showed that our tool allows users to label a video in less time while producing higher-quality ground truth.
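The abstract does not detail which computer vision algorithms the tool integrates; a plausible stand-in for supervised contour extraction is GrabCut, where the annotator drags a rough box around the object and the algorithm proposes a contour to accept or correct. A minimal sketch under that assumption:

```python
# Supervised contour proposal from a user-drawn box (GrabCut is an assumption).
import cv2
import numpy as np

def propose_contour(image_bgr, box):
    """box: (x, y, w, h) rectangle drawn by the annotator."""
    mask = np.zeros(image_bgr.shape[:2], np.uint8)
    bgd, fgd = np.zeros((1, 65), np.float64), np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, box, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0)
    contours, _ = cv2.findContours(fg.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # return the largest contour as the proposed object boundary
    return max(contours, key=cv2.contourArea) if contours else None
```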
Medical Image Analysis | 2017
Concetto Spampinato; Simone Palazzo; Daniela Giordano; Marco Aldinucci; Rosalia Leonardi
Highlights: testing several convolutional neural networks for skeletal bone age assessment with X-ray images; BoNet, a CNN for automated skeletal age assessment able to cope with non-rigid deformation of the hand; the first automated skeletal bone age assessment work tested on a public dataset with source code publicly available; providing answers to more general questions about deep learning on medical images. Abstract: Skeletal bone age assessment is a common clinical practice to investigate endocrinology, genetic and growth disorders in children. It is generally performed by radiological examination of the left hand using either the Greulich and Pyle (G&P) method or the Tanner-Whitehouse (TW) one. However, both clinical procedures show several limitations, from the examination effort of radiologists to (most importantly) significant intra- and inter-operator variability. To address these problems, several automated approaches (especially relying on the TW method) have been proposed; nevertheless, none of them has been proved able to generalize to different races, age ranges and genders. In this paper, we propose and test several deep learning approaches to assess skeletal bone age automatically; the results showed an average discrepancy between manual and automatic evaluation of about 0.8 years, which is state-of-the-art performance. Furthermore, this is the first automated skeletal bone age assessment work tested on a public dataset and for all age ranges, races and genders, for which the source code is available, thus representing an exhaustive baseline for future research in the field. Besides the specific application scenario, this paper aims at providing answers to more general questions about deep learning on medical images: from the comparison between deep-learned features and manually-crafted ones, to the usage of deep-learning methods trained on general imagery for medical problems, to how to train a CNN with few images.
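One of the strategies the paper discusses, fine-tuning a CNN pretrained on general imagery, can be sketched as follows. This is not BoNet itself (its deformation-handling design is not reproduced here); the backbone choice and hyper-parameters are illustrative assumptions.

```python
# A minimal sketch of fine-tuning a pretrained CNN to regress bone age.
import torch
import torch.nn as nn
from torchvision import models

class BoneAgeRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = models.resnet18(weights="IMAGENET1K_V1")
        # replace the 1000-class head with a single regression output (years)
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 1)

    def forward(self, x):          # x: (N, 3, 224, 224) X-ray crops
        return self.backbone(x).squeeze(1)

model = BoneAgeRegressor()
criterion = nn.L1Loss()            # mean absolute discrepancy, in years
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

images = torch.randn(4, 3, 224, 224)      # stand-in batch
ages = torch.tensor([8.5, 12.0, 6.3, 15.1])
optimizer.zero_grad()
loss = criterion(model(images), ages)
loss.backward()
optimizer.step()
```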
Computer Vision and Image Understanding | 2014
Concetto Spampinato; Simone Palazzo; Isaak Kavasidis
Background modeling is a well-known approach to detecting moving objects in video sequences. In recent years, background modeling methods that adopt spatial and texture information have been developed to deal with complex scenarios. However, none of the investigated approaches has been tested under extreme conditions, such as the underwater domain, where effects compromising video quality negatively affect the performance of the background modeling process. To overcome such difficulties, more significant features and more robust methods must be found. In this paper, we present a kernel density estimation method which models background and foreground by exploiting textons to describe textures within small, low-contrast regions. Comparison with other texture descriptors, namely the local binary pattern (LBP) and the scale-invariant local ternary pattern (SILTP), showed improved performance. Furthermore, quantitative and qualitative performance evaluations carried out on three standard datasets featuring very complex conditions revealed that our method outperforms state-of-the-art methods that use different features and modeling techniques and, most importantly, is able to generalize over different scenarios and targets.
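Per-pixel kernel density estimation for background modeling can be sketched as below. For clarity the sketch uses raw intensities instead of the texton descriptors the paper builds for low-contrast regions; the Gaussian bandwidth and decision threshold are illustrative assumptions.

```python
# A minimal sketch of per-pixel KDE background subtraction.
import numpy as np

def kde_foreground(history, frame, bandwidth=15.0, thresh=1e-3):
    """history: (T, H, W) past gray frames; frame: (H, W) current frame."""
    diff = frame[None].astype(np.float32) - history.astype(np.float32)
    # Gaussian kernel evaluated at the current pixel value, per past sample
    k = np.exp(-0.5 * (diff / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    density = k.mean(axis=0)       # KDE estimate of p(pixel | background)
    return density < thresh        # low background likelihood => foreground
```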
Multimedia Tools and Applications | 2014
Concetto Spampinato; Simone Palazzo; Bastiaan Johannes Boom; Jacco van Ossenbruggen; Isaak Kavasidis; Roberto Di Salvo; Fang-Pang Lin; Daniela Giordano; Lynda Hardman; Robert B. Fisher
The study of fish populations in their natural environment is a task that has usually been tackled in invasive ways, which inevitably influence the behavior of the fish under observation. Recent projects involving the installation of permanent underwater cameras (e.g. the Fish4Knowledge (F4K) project, for the observation of Taiwan’s coral reefs) make it possible to gather huge quantities of video data without interfering with the observed environment, but at the same time require the development of automatic processing tools, since manual analysis would be impractical for such amounts of video. Event detection is one of the most interesting aspects from the biologists’ point of view, since it allows the analysis of fish activity during particular events, such as typhoons. To achieve this goal, in this paper we present an automatic video analysis approach for fish behavior understanding during typhoon events. The first step of the proposed system therefore involves the detection of “typhoon” events, based on video texture analysis and classification by means of Support Vector Machines (SVM). As part of our behavior understanding efforts, trajectory extraction and clustering have been performed to study the differences in behavior when disruptive events happen. The integration of event detection with fish behavior understanding goes beyond simply detecting events through low-level feature analysis, as it supports the full semantic comprehension of interesting events.
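The first stage, classifying frames as "typhoon" or "normal" from texture statistics with an SVM, can be sketched as follows. The paper does not specify its texture features; grey-level co-occurrence contrast and homogeneity are used here as a plausible stand-in.

```python
# A minimal sketch of texture-based event classification with an SVM.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

def texture_features(gray_frame):
    glcm = graycomatrix(gray_frame, distances=[1],
                        angles=[0, np.pi / 2], levels=256, symmetric=True)
    return np.hstack([graycoprops(glcm, "contrast").ravel(),
                      graycoprops(glcm, "homogeneity").ravel()])

# training: one feature vector per labeled frame (1 = typhoon, 0 = normal)
frames = [np.random.randint(0, 256, (120, 160), np.uint8) for _ in range(20)]
labels = np.random.randint(0, 2, 20)          # stand-in annotations
clf = SVC(kernel="rbf").fit([texture_features(f) for f in frames], labels)
print(clf.predict([texture_features(frames[0])]))
```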
Machine Vision and Applications | 2014
Concetto Spampinato; Emmanuelle Beauxis-Aussalet; Simone Palazzo; Cigdem Beyan; Jacco van Ossenbruggen; Jiyin He; Bas Boom; Xuan Huang
Understanding and analyzing fish behaviour is a fundamental task for biologists who study marine ecosystems, because changes in animal behaviour reflect environmental conditions such as pollution and climate change. To support investigators in addressing these complex questions, underwater cameras have recently been used. They can continuously monitor marine life while having almost no influence on the environment under observation, which is not the case with observations made by divers, for instance. However, the huge quantity of recorded data makes manual video analysis practically impossible, so machine vision approaches are needed to distill the information to be investigated. In this paper, we propose an automatic event detection system able to identify solitary and pairing behaviours of the most common fish species of the Taiwanese coral reef. More specifically, the proposed system employs robust low-level processing modules for fish detection, tracking and recognition that extract the raw data used in the event detection process. Each fish trajectory is then modeled and classified using hidden Markov models. The events of interest are detected by integrating end-user rules, specified through an ad hoc user interface, with the analysis of fish trajectories. The system was tested on 499 events of interest, divided into solitary and pairing events for each fish species. It achieved an average accuracy of 0.105, expressed in terms of normalized detection cost. The obtained results are promising, especially given the difficulties of underwater environments. Moreover, the system allows marine biologists to speed up the behaviour analysis process and to carry out their investigations reliably.
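The trajectory-classification stage can be sketched by fitting one hidden Markov model per behaviour class and labeling a new trajectory by whichever model scores it higher. The feature choice (per-step displacement) and the number of hidden states are illustrative assumptions, not the paper's settings.

```python
# A minimal sketch of per-class HMM trajectory classification.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def fit_class_hmm(trajectories, n_states=3):
    """trajectories: list of (T_i, 2) arrays of per-step (dx, dy)."""
    X = np.vstack(trajectories)
    lengths = [len(t) for t in trajectories]
    return GaussianHMM(n_components=n_states, n_iter=50).fit(X, lengths)

rng = np.random.default_rng(0)
solitary = [rng.normal(0, 1, (30, 2)) for _ in range(10)]   # stand-in data
pairing = [rng.normal(2, 1, (30, 2)) for _ in range(10)]
models = {"solitary": fit_class_hmm(solitary),
          "pairing": fit_class_hmm(pairing)}

new_track = rng.normal(2, 1, (30, 2))
# classify by the model with the highest log-likelihood
print(max(models, key=lambda k: models[k].score(new_track)))
```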
International Conference on Image Processing | 2012
Concetto Spampinato; Simone Palazzo; Daniela Giordano
Visual tracking is a topic on which a lot of scientific work has been carried out in recent years. An important aspect of tracking algorithms is performance evaluation, which has typically been carried out through hand-labeled ground-truth data. Since the manual generation of ground truth is a time-consuming, error-prone and tedious task, many researchers have recently focused their attention on self-evaluation techniques for performance analysis. In this paper we propose a novel tool that enables image processing researchers to test the performance of tracking algorithms without resorting to hand-labeled ground truth data. The proposed approach consists of computing a set of features describing shape, appearance and motion of the tracked objects and combining them through a naive Bayesian classifier, in order to obtain a probability score representing the overall evaluation of each tracking decision. The method was tested on three different targets (vehicles, humans and fish) with three different tracking algorithms, and the results show that this approach is able to reflect the quality of the performed tracking.
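The evaluation idea can be sketched as follows: describe each tracked object with shape, appearance and motion features, and let a naive Bayes classifier output the probability that a tracking decision is correct. The three features below are illustrative stand-ins for the descriptors used in the paper.

```python
# A minimal sketch of naive Bayes fusion for tracker self-evaluation.
import numpy as np
from sklearn.naive_bayes import GaussianNB

# each row: [area change ratio, colour-histogram similarity, motion smoothness]
good = np.random.default_rng(1).normal([1.0, 0.9, 0.9], 0.05, (50, 3))
bad = np.random.default_rng(2).normal([1.6, 0.4, 0.3], 0.2, (50, 3))
X = np.vstack([good, bad])
y = np.array([1] * 50 + [0] * 50)     # 1 = correct tracking decision

clf = GaussianNB().fit(X, y)
decision = [[1.05, 0.85, 0.88]]       # features of one new tracking decision
print(clf.predict_proba(decision)[0, 1])  # probability the decision is good
```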
Computer Vision and Pattern Recognition | 2017
Concetto Spampinato; Simone Palazzo; Isaak Kavasidis; Daniela Giordano; Nasim Souly; Mubarak Shah
What if we could effectively read the mind and transfer human visual capabilities to computer vision methods? In this paper, we aim at addressing this question by developing the first visual object classifier driven by human brain signals. In particular, we employ EEG data evoked by visual object stimuli combined with Recurrent Neural Networks (RNN) to learn a discriminative brain activity manifold of visual categories, in an effort to read the mind. Afterward, we transfer the learned capabilities to machines by training a Convolutional Neural Network (CNN)–based regressor to project images onto the learned manifold, thus allowing machines to employ human brain–based features for automated visual classification. We use a 128-channel EEG with active electrodes to record brain activity of several subjects while looking at images of 40 ImageNet object classes. The proposed RNN-based approach for discriminating object classes using brain signals reaches an average accuracy of about 83%, which greatly outperforms existing methods attempting to learn EEG visual object representations. As for automated object categorization, our human brain–driven approach obtains competitive performance, comparable to that achieved by powerful CNN models, and it is also able to generalize over different visual datasets.
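The two components described above can be sketched as an LSTM that encodes a 128-channel EEG sequence into an embedding, and an image regressor trained to land on the same embedding. Dimensions and losses are illustrative assumptions; the paper's exact architecture is not reproduced here.

```python
# A minimal sketch of EEG encoding plus image-to-manifold regression.
import torch
import torch.nn as nn
from torchvision import models

class EEGEncoder(nn.Module):
    def __init__(self, n_channels=128, emb_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, emb_dim, batch_first=True)

    def forward(self, eeg):            # eeg: (N, T, 128) EEG samples over time
        _, (h, _) = self.lstm(eeg)
        return h[-1]                   # last hidden state = brain embedding

eeg_enc = EEGEncoder()
img_reg = models.resnet18(weights=None)
img_reg.fc = nn.Linear(img_reg.fc.in_features, 128)  # image -> manifold

eeg = torch.randn(4, 440, 128)         # stand-in EEG segments
images = torch.randn(4, 3, 224, 224)   # the images the subjects were viewing
target = eeg_enc(eeg).detach()         # learned manifold coordinates
loss = nn.MSELoss()(img_reg(images), target)  # CNN regresses onto the manifold
loss.backward()
```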
International Conference on Image Processing | 2012
Marco Aldinucci; Concetto Spampinato; Maurizio Drocco; Massimo Torquati; Simone Palazzo
In this paper a two-phase filter for removing “salt and pepper” noise is proposed. In the first phase, an adaptive median filter is used to identify the set of the noisy pixels; in the second phase, these pixels are restored according to a regularization method, which contains a data-fidelity term reflecting the impulse noise characteristics. The algorithm, which exhibits good performance both in denoising and in restoration, can be easily and effectively parallelized to exploit the full power of multi-core CPUs and GPGPUs; the proposed implementation based on the FastFlow library achieves both close-to-ideal speedup and very good wall-clock execution figures.
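The two-phase idea can be sketched as below: phase one flags likely impulse-noise pixels with a median-based detector, phase two restores only those pixels. The variational restoration and the FastFlow parallelization of the paper are replaced here by a simple median fill, purely for illustration.

```python
# A minimal sketch of two-phase salt-and-pepper denoising.
import numpy as np
from scipy.ndimage import median_filter

def two_phase_denoise(img):
    """img: 2-D uint8 image corrupted by salt-and-pepper noise."""
    med = median_filter(img, size=3)
    # phase 1: a pixel is "noisy" if it is an extreme value and far from its
    # local median (a simplification of the adaptive median detector)
    noisy = ((img == 0) | (img == 255)) & (np.abs(img.astype(int) - med) > 30)
    # phase 2: restore only the flagged pixels, leaving clean pixels untouched
    out = img.copy()
    out[noisy] = med[noisy]
    return out
```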