
Publication


Featured research published by Antonio Pertusa.


International Conference on Acoustics, Speech, and Signal Processing | 2008

Multiple fundamental frequency estimation using Gaussian smoothness

Antonio Pertusa; José M. Iñesta

The goal of a polyphonic music transcription system is to extract a score from an audio signal. A multiple fundamental frequency estimator is the core component of such systems, while tempo detection and key estimation complement it to correctly extract the score. In this work, in order to detect the fundamental frequencies present in a signal, a set of candidates is selected from the spectrum and all their possible combinations are generated. The best combination is chosen in a frame-by-frame analysis by applying a set of rules that take into account the harmonic amplitudes and the spectral smoothness measure described in this work. The system was evaluated and compared to other approaches, yielding competitive results and performance.
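For illustration, a minimal sketch of this candidate-combination scheme is given below (the number of harmonics, tolerance, Gaussian kernel and polyphony limit are assumptions, not the parameters used in the paper): candidate f0s are taken from the spectrum, all combinations are generated, and each is scored by its harmonic amplitudes plus a smoothness term.

```python
# Illustrative sketch (not the authors' exact implementation): select f0
# candidates, generate their combinations, and score each one by harmonic
# amplitude and the smoothness of its hypothetical harmonic pattern.
import itertools
import numpy as np

def harmonic_pattern(spectrum, freqs, f0, n_harmonics=10, tol=0.03):
    """Collect the amplitude of the nearest bin to each harmonic of f0."""
    pattern = []
    for h in range(1, n_harmonics + 1):
        target = h * f0
        idx = np.argmin(np.abs(freqs - target))
        # Only accept bins reasonably close to the ideal harmonic frequency.
        pattern.append(spectrum[idx] if abs(freqs[idx] - target) < tol * target else 0.0)
    return np.array(pattern)

def smoothness(pattern, sigma=1.0):
    """Gaussian-smoothness score: penalize patterns far from their smoothed version."""
    kernel = np.exp(-0.5 * (np.arange(-2, 3) / sigma) ** 2)
    kernel /= kernel.sum()
    smoothed = np.convolve(pattern, kernel, mode="same")
    return -np.sum((pattern - smoothed) ** 2)

def best_combination(spectrum, freqs, candidates, max_polyphony=4):
    """Evaluate all candidate combinations and keep the highest-scoring one."""
    best, best_score = (), -np.inf
    for k in range(1, max_polyphony + 1):
        for combo in itertools.combinations(candidates, k):
            patterns = [harmonic_pattern(spectrum, freqs, f0) for f0 in combo]
            score = sum(p.sum() + smoothness(p) for p in patterns)
            if score > best_score:
                best, best_score = combo, score
    return best
```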


Lecture Notes in Computer Science | 2006

Modelling expressive performance: a regression tree approach based on strongly typed genetic programming

Amaury Hazan; Rafael Ramirez; Esteban Maestre; Alfonso Pérez; Antonio Pertusa

This paper presents a novel Strongly-Typed Genetic Programming approach for building Regression Trees in order to model expressive music performance. The approach consists of inducing a Regression Tree model from training data (monophonic recordings of Jazz standards) for transforming an inexpressive melody into an expressive one. The work presented in this paper is an extension of [1], where we induced general expressive performance rules explaining part of the training examples. Here, the emphasis is on inducing a generative model (i.e. a model capable of generating expressive performances) which covers all the training examples. We present our evolutionary approach for a one-dimensional regression task: predicting the performed note duration ratio. We then show the encouraging results of experiments with Jazz musical material, and sketch the milestones that will enable the system to generate expressive music performance in a broader sense.
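As a rough illustration of the kind of model induced (hand-written here, not produced by the authors' GP system), the sketch below encodes a tiny typed regression tree mapping note descriptors to a duration ratio; the feature names and thresholds are hypothetical.

```python
# Minimal sketch of the kind of regression tree the GP search would induce
# (hand-written for illustration; feature names and values are hypothetical).
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Leaf:
    value: float                     # predicted duration ratio (performed / nominal)

@dataclass
class Split:
    test: Callable[[dict], bool]     # boolean test on note descriptors
    yes: "Node"
    no: "Node"

Node = Union[Leaf, Split]

def predict(tree: Node, note: dict) -> float:
    """Walk the tree until a leaf is reached and return its duration ratio."""
    while isinstance(tree, Split):
        tree = tree.yes if tree.test(note) else tree.no
    return tree.value

# Example: long notes on strong beats are shortened, the rest are lengthened.
tree = Split(
    test=lambda n: n["duration"] > 0.5 and n["metrical_strength"] > 0.5,
    yes=Leaf(0.9),
    no=Leaf(1.1),
)
print(predict(tree, {"duration": 1.0, "metrical_strength": 0.8}))  # -> 0.9
```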


Pattern Recognition Letters | 2005

Polyphonic monotimbral music transcription using dynamic networks

Antonio Pertusa; José M. Iñesta

The automatic extraction of the notes that were played in a digital musical signal (automatic music transcription) is an open problem. A number of techniques have been applied to solve it without conclusive results. The monotimbral polyphonic version of the problem is posed here: a single instrument has been played and more than one note can sound at the same time. This work approaches it through the identification of the pattern of a given instrument in the frequency domain. This is achieved using time-delay neural networks that are fed with the band-grouped spectrogram of a polyphonic monotimbral music recording. Using an example-based learning scheme such as neural networks allows our system to avoid relying on an auditory model for this problem. A number of issues have to be faced to obtain a robust and powerful system, but promising results using synthesized instruments are presented.
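A minimal sketch of such a time-delay network, written as 1-D convolutions along the time axis of a band-grouped spectrogram, is shown below; the number of bands, context frames and layer sizes are assumptions rather than the paper's configuration.

```python
# Hedged sketch of a time-delay network over a band-grouped spectrogram,
# implemented as 1-D convolutions along time (band count and layer sizes
# are assumptions, not the paper's exact architecture).
import torch
import torch.nn as nn

class TimeDelayTranscriber(nn.Module):
    def __init__(self, n_bands=94, n_notes=88, context=5, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            # Each output frame sees `context` consecutive spectrogram frames.
            nn.Conv1d(n_bands, hidden, kernel_size=context, padding=context // 2),
            nn.ReLU(),
            nn.Conv1d(hidden, n_notes, kernel_size=1),
            nn.Sigmoid(),  # per-frame activation probability for each note
        )

    def forward(self, spectrogram):
        # spectrogram: (batch, n_bands, n_frames) -> (batch, n_notes, n_frames)
        return self.net(spectrogram)

model = TimeDelayTranscriber()
frames = torch.rand(1, 94, 200)       # one recording, 200 spectrogram frames
note_activations = model(frames)      # threshold to obtain note events
```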


Iberoamerican Congress on Pattern Recognition | 2005

Recognition of note onsets in digital music using semitone bands

Antonio Pertusa; Anssi Klapuri; José M. Iñesta

A simple note onset detection system for music is presented in this work. To detect onsets, a 1/12 octave filterbank is simulated in the frequency domain and the time derivatives of the bands are considered. The first harmonics of a tuned instrument are close to the center frequencies of these bands and, in most instruments, these harmonics are those with the highest amplitudes. The goal of this work is to build a musically motivated system that is sensitive to onsets in music but robust against spectral variations that occur at times that do not correspond to onsets. Therefore, the system tries to find semitone variations, which correspond to note onsets. Promising results are presented for this real-time onset detection system.
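The sketch below illustrates this pipeline under stated assumptions (band edges, threshold and peak picking are illustrative, not the paper's exact choices): STFT bins are grouped into 1/12-octave bands and onsets are reported where the summed positive band derivatives peak.

```python
# Simplified onset-detection sketch from semitone-band energies.
import numpy as np

def semitone_band_energies(stft_mag, sr, n_fft, fmin=55.0, n_bands=80):
    """Group linear-frequency STFT bins into 1/12-octave (semitone) bands."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = fmin * 2.0 ** (np.arange(n_bands + 1) / 12.0)
    bands = np.zeros((n_bands, stft_mag.shape[1]))
    for b in range(n_bands):
        mask = (freqs >= edges[b]) & (freqs < edges[b + 1])
        if mask.any():
            bands[b] = stft_mag[mask].sum(axis=0)
    return bands

def detect_onsets(bands, threshold=1.0):
    """Onset strength = sum of positive band derivatives; report peak frames."""
    diff = np.maximum(np.diff(bands, axis=1), 0.0).sum(axis=0)
    peaks = (diff[1:-1] > threshold) & (diff[1:-1] >= diff[:-2]) & (diff[1:-1] >= diff[2:])
    return np.where(peaks)[0] + 1
```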


EURASIP Journal on Advances in Signal Processing | 2012

Efficient methods for joint estimation of multiple fundamental frequencies in music signals

Antonio Pertusa; José M. Iñesta

This study presents efficient techniques for multiple fundamental frequency estimation in music signals. The proposed methodology can infer harmonic patterns from a mixture considering interactions with other sources and evaluate them in a joint estimation scheme. For this purpose, a set of fundamental frequency candidates is first selected at each frame, and several hypothetical combinations of them are generated. The combinations are evaluated independently, and the most likely one is selected taking into account the intensity and spectral smoothness of its inferred patterns. The method is extended by considering adjacent frames in order to smooth the detection in time, and a pitch tracking stage is finally performed to increase the temporal coherence. The proposed algorithms were evaluated in MIREX contests, yielding state-of-the-art results with a very low computational burden.
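The following sketch illustrates only the temporal-smoothing idea, assuming per-frame scores for each candidate combination are already available; the window size and score aggregation are assumptions.

```python
# Illustrative sketch: average each combination's score over adjacent frames
# before selecting the winner at every frame.
def smooth_and_select(scores_per_frame, window=3):
    """
    scores_per_frame: list of dicts, one per frame, mapping a candidate
    combination (tuple of f0s) to its salience score in that frame.
    Returns the selected combination for each frame.
    """
    selections = []
    half = window // 2
    for t in range(len(scores_per_frame)):
        merged = {}
        for u in range(max(0, t - half), min(len(scores_per_frame), t + half + 1)):
            for combo, score in scores_per_frame[u].items():
                merged[combo] = merged.get(combo, 0.0) + score
        selections.append(max(merged, key=merged.get) if merged else ())
    return selections
```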


Machine Vision and Applications | 2017

Staff-line detection and removal using a convolutional neural network

Jorge Calvo-Zaragoza; Antonio Pertusa; Jose Oncina

Staff-line removal is an important preprocessing stage for most optical music recognition systems. Common procedures to solve this task involve image processing techniques. In contrast to these traditional methods based on hand-engineered transformations, the problem can also be approached as a classification task in which each pixel is labeled as either staff or symbol, so that only those that belong to symbols are kept in the image. In order to perform this classification, we propose the use of convolutional neural networks, which have demonstrated an outstanding performance in image retrieval tasks. The initial features of each pixel consist of a square patch from the input image centered at that pixel. The proposed network is trained by using a dataset which contains pairs of scores with and without the staff lines. Our results in both binary and grayscale images show that the proposed technique is very accurate, outperforming both other classifiers and the state-of-the-art strategies considered. In addition, several advantages of the presented methodology with respect to traditional procedures proposed so far are discussed.
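A minimal sketch of this pixel-classification scheme is given below, assuming a 25x25 grayscale patch per pixel and an arbitrary small CNN; it is not the architecture evaluated in the paper.

```python
# Hedged sketch: a small CNN labels the patch centered at each pixel as
# staff or symbol (patch size and layer sizes are assumptions).
import torch
import torch.nn as nn

class PixelClassifier(nn.Module):
    def __init__(self, patch=25):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * (patch // 4) * (patch // 4), 2)  # staff vs symbol

    def forward(self, patches):
        # patches: (batch, 1, patch, patch) grayscale windows around each pixel
        x = self.features(patches)
        return self.classifier(x.flatten(1))

model = PixelClassifier()
patches = torch.rand(8, 1, 25, 25)
logits = model(patches)  # keep only pixels predicted as "symbol" in the output image
```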


Neurocomputing | 2018

MirBot: A collaborative object recognition system for smartphones using convolutional neural networks

Antonio Pertusa; Antonio-Javier Gallego; Marisa Bernabeu

MirBot is a collaborative application for smartphones that allows users to perform object recognition. This app can be used to take a photograph of an object, select the region of interest and obtain the most likely class (dog, chair, etc.) by means of similarity search using features extracted from a convolutional neural network (CNN). The answers provided by the system can be validated by the user so as to improve the results for future queries. All the images are stored together with a series of metadata, thus enabling a multimodal incremental dataset labeled with synset identifiers from the WordNet ontology. This dataset grows continuously thanks to the users' feedback, and is publicly available for research. This work details the MirBot object recognition system, analyzes the statistics gathered after more than four years of usage, describes the image classification methodology, and performs an exhaustive evaluation using handcrafted features, neural codes, different transfer learning techniques, PCA compression and metadata, which can be used to improve the image classifier results. The app is freely available at the Apple and Google Play stores.
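The sketch below illustrates the neural-code plus similarity-search idea with a generic pretrained backbone; ResNet-18 from torchvision (version 0.13 or later for the `weights` argument), the placeholder labels and the value of k are all assumptions.

```python
# Sketch of classification by similarity search over CNN "neural codes".
import torch
import torch.nn as nn
from torchvision import models
from sklearn.neighbors import KNeighborsClassifier

# Pretrained CNN with the final classification layer removed: its output
# (a 512-dim vector for ResNet-18) is used as the image's neural code.
backbone = models.resnet18(weights="IMAGENET1K_V1")  # downloads ImageNet weights
encoder = nn.Sequential(*list(backbone.children())[:-1]).eval()

def neural_codes(images):
    # images: (batch, 3, 224, 224), already normalized
    with torch.no_grad():
        return encoder(images).flatten(1).numpy()

# Reference database labeled by users; new queries are classified by k-NN
# over the stored codes, so results improve as validated images accumulate.
db_codes = neural_codes(torch.rand(100, 3, 224, 224))
db_labels = ["dog"] * 50 + ["chair"] * 50            # placeholder labels
knn = KNeighborsClassifier(n_neighbors=5).fit(db_codes, db_labels)
print(knn.predict(neural_codes(torch.rand(1, 3, 224, 224))))
```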


International Conference on Multimodal Interfaces | 2011

A multimodal music transcription prototype: first steps in an interactive prototype development

Tomás Pérez-García; José M. Iñesta; Pedro J. Ponce de León; Antonio Pertusa

Music transcription consists of transforming an audio signal encoding a music performance into a symbolic representation such as a music score. In this paper, a multimodal and interactive prototype to perform music transcription is presented. The system is oriented to monotimbral transcription; that is, its working domain is music played by a single instrument. This prototype uses three different sources of information to detect notes in a musical audio excerpt. It has been developed to allow a human expert to interact with the system and improve its results. In its current implementation, it offers a limited range of interaction and multimodality. Further development aimed at full interactivity and multimodal interaction is discussed.


Remote Sensing | 2018

Automatic Ship Classification from Optical Aerial Images with Convolutional Neural Networks

Antonio-Javier Gallego; Antonio Pertusa; Pablo Gil

The automatic classification of ships from aerial images is a considerable challenge. Previous works have usually applied image processing and computer vision techniques to extract meaningful features from visible-spectrum images in order to use them as the input for traditional supervised classifiers. We present a method for determining whether an aerial image of the visible spectrum contains a ship or not. The proposed architecture is based on Convolutional Neural Networks (CNN), and it combines neural codes extracted from a CNN with a k-Nearest Neighbor (kNN) method so as to improve performance. The kNN results are compared to those obtained with the CNN softmax output. Several CNN models have been configured and evaluated in order to seek the best hyperparameters, and the most suitable setting for this task was found by using transfer learning at different levels. A new dataset (named MASATI) composed of aerial imagery with more than 6000 samples has also been created to train and evaluate our architecture. The experiments show a success rate of over 99% for our approach, in contrast with the 79% obtained with traditional methods for ship image classification, also outperforming other CNN-based methods. A dataset of images (NWPU VHR-10) used in previous works was additionally used to evaluate the proposed approach. Our best setup achieves a success ratio of 86% with these data, significantly outperforming previous state-of-the-art ship classification methods.
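The two decision rules being compared can be sketched as follows, using random placeholder arrays in place of the CNN outputs on MASATI; the code dimensionality and k are assumptions.

```python
# Hedged sketch of the two decision rules: the CNN's own softmax output versus
# k-NN over its penultimate-layer neural codes (all arrays below are placeholders).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def softmax_decision(logits):
    """Class with the highest softmax probability (ship vs. no ship)."""
    return np.argmax(logits, axis=1)

def knn_decision(codes_train, y_train, codes_test, k=3):
    """Classify test images by the majority label of their nearest neural codes."""
    knn = KNeighborsClassifier(n_neighbors=k).fit(codes_train, y_train)
    return knn.predict(codes_test)

# Placeholder data standing in for CNN outputs on the aerial images.
rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 2))
codes_train, y_train = rng.normal(size=(200, 512)), rng.integers(0, 2, 200)
codes_test = rng.normal(size=(10, 512))
print(softmax_decision(logits), knn_decision(codes_train, y_train, codes_test))
```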


Iberian Conference on Pattern Recognition and Image Analysis | 2013

MirBot: A Multimodal Interactive Image Retrieval System

Antonio Pertusa; Antonio-Javier Gallego; Marisa Bernabeu

This study presents a multimodal interactive image retrieval system for smartphones (MirBot). The application is designed as a collaborative game in which users can categorize photographs according to the WordNet hierarchy. After taking a picture, the region of interest of the target can be selected, and the image information is sent with a set of metadata to a server in order to classify the object. The user can validate the category proposed by the system to improve future queries. The result is a labeled database with a structure similar to ImageNet, but with contents selected by the users, fully annotated with regions of interest, and with novel metadata that can be useful to constrain the search space in future work. The MirBot app is freely available on the Apple App Store.
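For illustration, the mapping from a category name to an ImageNet-style WordNet synset identifier might look like the sketch below, assuming NLTK with the WordNet corpus downloaded; the chosen sense is simply the most common noun sense, which is an assumption rather than MirBot's labeling procedure.

```python
# Sketch of labeling with WordNet synset identifiers (ImageNet-style IDs).
# Assumes: pip install nltk, then nltk.download("wordnet").
from nltk.corpus import wordnet as wn

def synset_id(category: str) -> str:
    """Map a category name to an ImageNet-style synset identifier, e.g. 'n02084071'."""
    synset = wn.synsets(category, pos=wn.NOUN)[0]   # most common noun sense
    return f"{synset.pos()}{synset.offset():08d}"

print(synset_id("dog"))  # -> 'n02084071'
```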

Collaboration


Dive into Antonio Pertusa's collaborations.

Top Co-Authors


Pablo Gil

University of Alicante


David Rizo

University of Alicante
