Jonatas Wehrmann
Pontifícia Universidade Católica do Rio Grande do Sul
Publications
Featured research published by Jonatas Wehrmann.
International Joint Conference on Neural Networks (IJCNN) | 2016
Gabriel S. Simoes; Jonatas Wehrmann; Rodrigo C. Barros; Duncan D. Ruiz
Automatic pattern recognition from videos is a high-complexity task, and well-established Machine Learning algorithms have difficulties in handling it in an efficient and effective fashion. Convolutional Neural Networks are the state-of-the-art method for supervised image classification, borrowing concepts from image processing in order to ensure some degree of scale and position invariance. They are capable of detecting primary features, which are then combined by subsequent layers of the CNN architecture, resulting in the detection of higher-order complex and relevant novel features. Considering that a video is a set of ordered images in time, we propose in this paper to explore CNNs in the context of movie trailers genre classification. Our contributions are twofold. First, we have developed a novel movie trailers dataset with more than 3500 trailers whose genres are known, and we make it publicly available for the interested reader. Second, we detail a novel classification method that encapsulates a CNN architecture to perform movie trailer genre classification, namely CNN-MoTion, and we compare it with state-of-the-art feature extraction techniques for movie classification such as Gist, CENTRIST, w-CENTRIST, and low-level feature extraction. Results show that our novel method significantly outperforms the current state-of-the-art approaches.
Brazilian Conference on Intelligent Systems (BRACIS) | 2016
Jonatas Wehrmann; Rodrigo C. Barros; Gabriel S. Simoes; Thomas S. Paula; Duncan D. Ruiz
Learning content from videos is not an easy task and traditional machine learning approaches for computer vision have difficulties in doing it satisfactorily. However, in the past couple of years the machine learning community has seen the rise of deep learning methods that significantly improve the accuracy of several computer vision applications, e.g., Convolutional Neural Networks (ConvNets). In this paper, we explore the suitability of ConvNets for the movie trailers genre classification problem. Assigning genres to movies is particularly challenging because genre is an immaterial feature that is not physically present in a movie frame, so off-the-shelf image detection models cannot be directly applied to this context. Hence, we propose a novel classification method that encapsulates multiple distinct ConvNets to perform genre classification, namely CoNNECT, where each ConvNet learns features that capture distinct aspects from the movie frames. We compare our novel approach with the current state-of-the-art techniques for movie classification, which make use of well-known image descriptors and low-level handcrafted features. Results show that CoNNECT significantly outperforms the state-of-the-art approaches in this task, moving towards effectively solving the genre classification problem.
Applied Soft Computing | 2017
Jonatas Wehrmann; Rodrigo C. Barros
The task of labeling movies according to their corresponding genre is a challenging classification problem, given that genre is an immaterial feature that cannot be directly pinpointed in any of the movie frames. Hence, off-the-shelf image classification approaches are not capable of handling this task in a straightforward fashion. Moreover, movies may belong to multiple genres at the same time, making movie genre assignment a typical multi-label classification problem, which is per se much more challenging than standard single-label classification. In this paper, we propose a novel deep neural architecture based on convolutional neural networks (ConvNets) for performing multi-label movie-trailer genre classification. It encapsulates an ultra-deep ConvNet with residual connections, and it makes use of a special convolutional layer to extract temporal information from image-based features prior to performing the mapping of movie trailers to genres. We compare the proposed approach with the current state-of-the-art methods for movie classification that employ well-known image descriptors and other low-level handcrafted features. Results show that our method substantially outperforms the state-of-the-art for this task, improving classification performance for all movie genres.
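The multi-label setup described in this abstract can be made concrete with a small sketch. Unlike softmax, which forces a single winner, each genre gets an independent sigmoid score, so a trailer can receive several genres at once. The function and genre names below are illustrative, not taken from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_genres(logits, genres, threshold=0.5):
    """Multi-label decision: each genre gets an independent sigmoid score,
    so a trailer may be assigned several genres at once (unlike softmax)."""
    probs = sigmoid(np.asarray(logits, dtype=float))
    return [g for g, p in zip(genres, probs) if p >= threshold]

genres = ["Action", "Comedy", "Drama", "Horror"]
print(predict_genres([2.1, -0.4, 1.3, -3.0], genres))  # → ['Action', 'Drama']
```

Training such a head typically minimizes a per-genre binary cross-entropy rather than a single categorical loss.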
ACM Symposium on Applied Computing (SAC) | 2017
Jonatas Wehrmann; Rodrigo C. Barros
In this paper, we explore the suitability of employing Convolutional Neural Networks (ConvNets) for multi-label movie trailer genre classification. Assigning genres to movies is a particularly challenging task because genre is an immaterial feature that is not physically present in a movie frame, so off-the-shelf image detection models cannot be easily adapted to this context. Moreover, multi-label classification is more challenging than single-label classification considering that one instance can be assigned to multiple classes at once. We propose a novel classification method that encapsulates an ultra-deep ConvNet with residual connections. Our approach extracts temporal information from image-based features prior to performing the mapping of trailers to genres. We compare our novel approach with the current state-of-the-art techniques for movie classification, which make use of well-known image descriptors and low-level handcrafted features. Results show that our method significantly outperforms the state-of-the-art in this task, improving the classification accuracy for all genres.
Pattern Recognition Letters | 2018
Jonatas Wehrmann; Anderson Mattjie; Rodrigo C. Barros
With the novel and fast advances in the area of deep neural networks, several challenging image-based tasks have been recently approached by researchers in pattern recognition and computer vision. In this paper, we address one of these tasks, which is to match image content with natural language descriptions, sometimes referred to as multimodal content retrieval. Such a task is particularly challenging considering that we must find a semantic correspondence between captions and the respective image, a challenge for both the computer vision and natural language processing areas. To that end, we propose a novel multimodal approach based solely on convolutional neural networks for aligning images with their captions by directly convolving raw characters. Our proposed character-based textual embeddings allow the replacement of both word-embeddings and recurrent neural networks for text understanding, saving processing time and requiring fewer learnable parameters. Our method is based on the idea of projecting both visual and textual information into a common embedding space. To train such embeddings, we optimize a contrastive loss function that is computed to minimize order-violations between images and their respective descriptions. We achieve state-of-the-art performance in the largest and most well-known image-text alignment dataset, namely Microsoft COCO, with a method that is conceptually much simpler and that possesses considerably fewer parameters than current approaches.
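The order-violation idea mentioned in the abstract can be sketched as follows. In order-embedding style losses, a penalty measures how much one vector fails to dominate another coordinate-wise, and a hinge term pushes matching image-caption pairs to violate the order less than mismatched ones. Direction conventions and names here are illustrative, not the paper's exact formulation:

```python
import numpy as np

def order_violation(x, y):
    """Penalty for how much y fails to lie 'below' x coordinate-wise:
    zero when y <= x everywhere, positive otherwise."""
    return float(np.sum(np.maximum(0.0, y - x) ** 2))

def contrastive_loss(img, pos_cap, neg_cap, margin=1.0):
    """Hinge loss: the matching caption should violate the order less than
    a non-matching caption by at least `margin`."""
    return max(0.0, margin + order_violation(img, pos_cap) - order_violation(img, neg_cap))

img = np.array([1.0, 1.0])
pos = np.array([0.5, 0.5])   # fully below img: zero violation
neg = np.array([1.5, 1.0])   # exceeds img in one coordinate
```

In practice both vectors come from learned encoders (a visual ConvNet and the character-level text ConvNet), and negatives are sampled within each mini-batch.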
ACM Symposium on Applied Computing (SAC) | 2017
Jonatas Wehrmann; Rodrigo C. Barros; Silvia N. das Dôres; Ricardo Cerri
In classification tasks, an object usually belongs to one class within a set of disjoint classes. In more complex tasks, an object can belong to more than one class, in what is conventionally termed multi-label classification. Moreover, there are cases in which the set of classes is organised in a hierarchical fashion, and an object must be associated with a single path in this hierarchy, defining the so-called hierarchical classification. Finally, in even more complex scenarios, the classes are organised in a hierarchical structure and the object can be associated with multiple paths of this hierarchy, defining the problem investigated in this article: hierarchical multi-label classification (HMC). We address a typical problem of HMC, which is protein function prediction, and to that end we propose an approach that chains multiple neural networks, performing both local and global optimisation in order to provide the final prediction: one or multiple paths in the hierarchy of classes. We experiment with four variations of this chaining process, and we compare these strategies with the state-of-the-art HMC algorithms for protein function prediction, showing that our novel approach significantly outperforms these methods.
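A key constraint in HMC is that predicted paths must be consistent with the hierarchy: a class should not score higher than its ancestors. A minimal post-processing sketch of that consistency step (names and the exact rule are illustrative, not necessarily the paper's method):

```python
def enforce_hierarchy(probs, parent):
    """Cap each class score at its ancestors' scores so predicted paths
    are consistent with the class hierarchy (parent=None marks roots)."""
    def capped(c):
        p = parent[c]
        return probs[c] if p is None else min(probs[c], capped(p))
    return {c: capped(c) for c in probs}

probs = {"root": 0.9, "A": 0.95, "B": 0.3}          # A scored above its parent
parent = {"root": None, "A": "root", "B": "A"}
print(enforce_hierarchy(probs, parent))              # A is capped at 0.9
```

After this step, thresholding the scores yields one or more root-to-leaf paths rather than isolated classes.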
International Symposium on Neural Networks | 2017
Ralph Jose Rassweiler Filho; Jonatas Wehrmann; Rodrigo C. Barros
The movie domain is one of the most common scenarios to test and evaluate recommender systems. These systems are often implemented through a collaborative filtering model, which relies exclusively on the users' feedback on items, ignoring content features. Content-based filtering models are nevertheless a potentially good strategy for recommendation, even though identifying relevant semantic representations of items is not a trivial task. Several techniques have been employed to continuously improve the content representation of items in content-based recommender systems, including low-level and high-level features, text analysis, and social tags. Recent advances in deep learning, particularly on convolutional neural networks, are paving the way for better representations to be extracted from unstructured data. In this work, our main goal is to understand whether these networks can extract sufficient semantic representation from items so we can better recommend movies in content-based recommender systems. For that, we propose DeepRecVis, a novel approach that represents items through features extracted from keyframes of the movie trailers, leveraging these features in a content-based recommender system. Experiments show that our proposed approach outperforms systems that are based on low-level feature representations.
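The content-based recommendation step can be sketched simply: once each movie is represented by a feature vector (here, hypothetically derived from trailer keyframes), unseen items are ranked by similarity to the user's profile vector. All names below are illustrative:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def recommend(profile, catalog, k=2):
    """Rank movies by cosine similarity between the user profile vector
    and each movie's trailer-derived feature vector; return the top k."""
    ranked = sorted(catalog, key=lambda m: cosine(profile, catalog[m]), reverse=True)
    return ranked[:k]

catalog = {"a": [1.0, 0.1], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(recommend([1.0, 0.0], catalog))  # → ['a', 'c']
```

The profile vector itself is typically an aggregate (e.g., a mean) of the feature vectors of items the user has liked.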
ACM Symposium on Applied Computing (SAC) | 2018
Jonatas Wehrmann; Willian Becker; Rodrigo C. Barros
In this paper, we propose a novel approach for classifying both the sentiment and the language of tweets. Our proposed architecture comprises a convolutional neural network (ConvNet) with two distinct outputs, each of which is designed to minimize the classification error of either sentiment assignment or language identification. Results show that our method outperforms both single-task and multi-task state-of-the-art approaches for classifying multilingual tweets.
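The two-output design described here is a standard multi-task pattern: one shared representation feeds two task-specific heads, and training minimizes the sum of both per-task losses. A minimal numerical sketch, with illustrative linear heads standing in for the ConvNet outputs:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(logits, label):
    """Negative log-probability of the true label under softmax."""
    return float(-np.log(softmax(logits)[label]))

def multitask_loss(shared_feat, W_sent, W_lang, y_sent, y_lang):
    """Two task-specific heads on one shared representation; the training
    signal is the sum of the sentiment and language cross-entropy terms."""
    return (cross_entropy(shared_feat @ W_sent, y_sent)
            + cross_entropy(shared_feat @ W_lang, y_lang))

feat = np.array([1.0, 0.0])      # shared ConvNet features (toy)
W = np.eye(2)                    # toy head weights
total = multitask_loss(feat, W, W, y_sent=0, y_lang=1)
```

Gradients from both heads flow back into the shared trunk, which is what lets each task regularize the other.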
The Florida AI Research Society (FLAIRS) | 2017
Willian Becker; Jonatas Wehrmann; Henry E. L. Cagnini; Rodrigo C. Barros
Workshop on Applications of Computer Vision | 2018
Jonatas Wehrmann; Mauricio A. Lopes; Martin D. More; Rodrigo C. Barros