
Publication


Featured research published by Rasmus Rothe.


International Conference on Computer Vision (ICCV) | 2015

DEX: Deep EXpectation of Apparent Age from a Single Image

Rasmus Rothe; Radu Timofte; Luc Van Gool

In this paper we tackle the estimation of apparent age in still face images with deep learning. Our convolutional neural networks (CNNs) use the VGG-16 architecture [13] and are pretrained on ImageNet for image classification. In addition, due to the limited number of apparent age annotated images, we explore the benefit of finetuning over crawled Internet face images with available age. We crawled 0.5 million images of celebrities from IMDB and Wikipedia that we make public. This is the largest public dataset for age prediction to date. We pose the age regression problem as a deep classification problem followed by a softmax expected value refinement and show improvements over direct regression training of CNNs. Our proposed method, Deep EXpectation (DEX) of apparent age, first detects the face in the test image and then extracts the CNN predictions from an ensemble of 20 networks on the cropped face. The CNNs of DEX were finetuned on the crawled images and then on the provided images with apparent age annotations. DEX does not use explicit facial landmarks. Our DEX is the winner (1st place) of the ChaLearn LAP 2015 challenge on apparent age estimation with 115 registered teams, significantly outperforming the human reference.
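The softmax expected-value refinement the abstract describes can be sketched as follows. Treating the output as 101 discrete age bins (0–100) matches the commonly reported DEX setup, but the bin layout and the example logits below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def expected_age(logits: np.ndarray, ages: np.ndarray) -> float:
    """Softmax expected value: treat age estimation as classification
    over discrete age bins, then refine with the probability-weighted mean."""
    # Numerically stable softmax over the age bins.
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    # Expected value over the bin centers gives a continuous age estimate.
    return float(np.dot(probs, ages))

# Example: 101 bins for ages 0..100 (an illustrative setup).
ages = np.arange(101, dtype=float)
logits = -0.5 * ((ages - 30.0) / 5.0) ** 2  # synthetic logits peaking near age 30
print(round(expected_age(logits, ages), 1))  # prints 30.0
```

The expected value over the class probabilities yields a continuous estimate even though the network is trained with a classification loss, which is the refinement the paper reports as outperforming direct regression.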


International Journal of Computer Vision (IJCV) | 2018

Deep Expectation of Real and Apparent Age from a Single Image Without Facial Landmarks

Rasmus Rothe; Radu Timofte; Luc Van Gool

In this paper we propose a deep learning solution to age estimation from a single face image without the use of facial landmarks and introduce the IMDB-WIKI dataset, the largest public dataset of face images with age and gender labels. While real age estimation research spans decades, the study of apparent age estimation, the age as perceived by other humans from a face image, is a recent endeavor. We tackle both tasks with our convolutional neural networks (CNNs) of VGG-16 architecture which are pre-trained on ImageNet for image classification. We pose the age estimation problem as a deep classification problem followed by a softmax expected value refinement. The key factors of our solution are: deep learned models from large data, robust face alignment, and expected value formulation for age regression. We validate our methods on standard benchmarks and achieve state-of-the-art results for both real and apparent age estimation.


Asian Conference on Computer Vision (ACCV) | 2014

Non-Maximum Suppression for Object Detection by Passing Messages between Windows

Rasmus Rothe; Matthieu Guillaumin; Luc Van Gool

Non-maximum suppression (NMS) is a key post-processing step in many computer vision applications. In the context of object detection, it is used to transform a smooth response map that triggers many imprecise object window hypotheses into, ideally, a single bounding-box for each detected object. The most common approach for NMS for object detection is a greedy, locally optimal strategy with several hand-designed components (e.g., thresholds). Such a strategy inherently suffers from several shortcomings, such as the inability to detect nearby objects. In this paper, we try to alleviate these problems and explore a novel formulation of NMS as a well-defined clustering problem. Our method builds on the recent Affinity Propagation Clustering algorithm, which passes messages between data points to identify cluster exemplars. Contrary to the greedy approach, our method is solved globally and its parameters can be automatically learned from training data. In experiments, we show in two contexts – object class and generic object detection – that it provides a promising solution to the shortcomings of the greedy NMS.
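For contrast with the clustering formulation, a minimal sketch of the greedy NMS baseline the abstract argues against; the (x1, y1, x2, y2) box format and the 0.5 IoU threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """IoU between one box `a` and an array of boxes `b`, format (x1, y1, x2, y2)."""
    x1 = np.maximum(a[0], b[:, 0]); y1 = np.maximum(a[1], b[:, 1])
    x2 = np.minimum(a[2], b[:, 2]); y2 = np.minimum(a[3], b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes: np.ndarray, scores: np.ndarray, thresh: float = 0.5) -> list:
    """Greedy NMS: repeatedly keep the highest-scoring box and suppress
    all remaining boxes overlapping it by more than `thresh` IoU."""
    order = scores.argsort()[::-1]  # indices by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(greedy_nms(boxes, scores))  # prints [0, 2]
```

The hard threshold is exactly the hand-designed component the abstract criticizes: two genuinely nearby objects whose boxes overlap above `thresh` collapse into one detection, which the message-passing clustering formulation is designed to avoid.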


IEEE International Conference on Automatic Face & Gesture Recognition (FG) | 2017

Apparent and Real Age Estimation in Still Images with Deep Residual Regressors on Appa-Real Database

Eirikur Agustsson; Radu Timofte; Sergio Escalera; Xavier Baró; Isabelle Guyon; Rasmus Rothe

After decades of research, the real (biological) age estimation from a single face image reached maturity thanks to the availability of large public face databases and impressive accuracies achieved by recently proposed methods. The estimation of “apparent age” is a related task concerning the age perceived by human observers. Significant advances have also been made in this new research direction with the recent Looking At People challenges. In this paper we make several contributions to age estimation research. (i) We introduce APPA-REAL, a large face image database with both real and apparent age annotations. (ii) We study the relationship between real and apparent age. (iii) We develop a residual age regression method to further improve the performance. (iv) We show that real age estimation can be successfully tackled as an apparent age estimation followed by an apparent to real age residual regression. (v) We graphically reveal the facial regions on which the CNN focuses in order to perform apparent and real age estimation tasks.


International Conference on Computer Vision (ICCV) | 2015

DLDR: Deep Linear Discriminative Retrieval for Cultural Event Classification from a Single Image

Rasmus Rothe; Radu Timofte; Luc Van Gool

In this paper we tackle the classification of cultural events from a single image with a deep learning based method. We use convolutional neural networks (CNNs) with VGG-16 architecture [17], pretrained on ImageNet or the Places205 dataset for image classification, and fine-tuned on cultural events data. CNN features are robustly extracted at 4 different layers in each image. At each layer Linear Discriminant Analysis (LDA) is employed for discriminative dimensionality reduction. An image is represented by the concatenated LDA-projected features from all layers or by the concatenation of CNN pooled features at each layer. The classification is then performed through the Iterative Nearest Neighbors-based Classifier (INNC) [20]. Classification scores are obtained for different image representation setups at train and test. The average of the scores is the output of our deep linear discriminative retrieval (DLDR) system. With 0.80 mean average precision (mAP) DLDR is a top entry for the ChaLearn LAP 2015 cultural event recognition challenge.
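The per-layer LDA projection and concatenation step described above can be sketched with scikit-learn; the feature dimensions, number of layers, class count, and random stand-in features below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_samples, n_classes = 200, 5
y = rng.integers(0, n_classes, n_samples)

# Stand-in CNN features "extracted" at two different layers (illustrative dims).
feats_layer1 = rng.normal(size=(n_samples, 64)) + y[:, None]
feats_layer2 = rng.normal(size=(n_samples, 128)) + y[:, None]

# Per-layer LDA for discriminative dimensionality reduction
# (at most n_classes - 1 components per layer), then concatenation.
projected = [
    LinearDiscriminantAnalysis(n_components=n_classes - 1).fit_transform(f, y)
    for f in (feats_layer1, feats_layer2)
]
representation = np.concatenate(projected, axis=1)
print(representation.shape)  # prints (200, 8)
```

Each layer contributes at most `n_classes - 1` discriminative dimensions, so the concatenated representation stays compact regardless of how wide the original CNN feature maps are.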


Asian Conference on Computer Vision (ACCV) | 2016

From face images and attributes to attributes

Robert Torfason; Eirikur Agustsson; Rasmus Rothe; Radu Timofte

The face is an important part of the identity of a person. Numerous applications benefit from the recent advances in prediction of face attributes, including biometrics (like age, gender, ethnicity) and accessories (eyeglasses, hat). We study the attributes’ relations to other attributes and to face images and propose prediction models for them. We show that handcrafted features can be as good as deep features, that the attributes themselves are powerful enough to predict other attributes and that clustering the samples according to their attributes can mitigate the training complexity for deep learning. We set new state-of-the-art results on two of the largest datasets to date, CelebA and Facebook BIG5, by predicting attributes either from face images, from other attributes, or from both face and other attributes. Particularly, on Facebook dataset, we show that we can accurately predict personality traits (BIG5) from tens of ‘likes’ or from only a profile picture and a couple of ‘likes’ comparing positively to human reference.


International Conference on Machine Vision | 2015

Discriminative learning of apparel features

Rasmus Rothe; Marko Ristin; Matthias Dantone; Luc Van Gool

Fashion is a major segment in e-commerce with growing importance and a steadily increasing number of products. Since manual annotation of apparel items is very tedious, the product databases need to be organized automatically, e.g. by image classification. Common image classification approaches are based on features engineered for general purposes which perform poorly on specific images of apparel. We therefore propose to learn discriminative features based on a small set of annotated images. We experimentally evaluate our method on a dataset with 30,000 images containing apparel items, and compare it to other engineered and learned sets of features. The classification accuracy of our features is significantly superior to designed HOG and SIFT features (43.7% and 16.1% relative improvement, respectively). Our method allows for fast feature extraction and training, is easy to implement and, unlike deep convolutional networks, does not require powerful dedicated hardware.


Lecture Notes in Computer Science | 2015

Non-maximum suppression for object detection by passing messages between windows

Rasmus Rothe; Matthieu Guillaumin; Luc Van Gool

Non-maximum suppression (NMS) is a key post-processing step in many computer vision applications. In the context of object detection, it is used to transform a smooth response map that triggers many imprecise object window hypotheses into, ideally, a single bounding-box for each detected object. The most common approach for NMS for object detection is a greedy, locally optimal strategy with several hand-designed components (e.g., thresholds). Such a strategy inherently suffers from several shortcomings, such as the inability to detect nearby objects. In this paper, we try to alleviate these problems and explore a novel formulation of NMS as a well-defined clustering problem. Our method builds on the recent Affinity Propagation Clustering algorithm, which passes messages between data points to identify cluster exemplars. Contrary to the greedy approach, our method is solved globally and its parameters can be automatically learned from training data. In experiments, we show in two contexts – object class and generic object detection – that it provides a promising solution to the shortcomings of the greedy NMS.


Computer Vision and Pattern Recognition (CVPR) | 2016

Seven Ways to Improve Example-Based Single Image Super Resolution

Radu Timofte; Rasmus Rothe; Luc Van Gool


Computer Vision and Pattern Recognition (CVPR) | 2016

Some Like It Hot — Visual Guidance for Preference Prediction

Rasmus Rothe; Radu Timofte; Luc Van Gool

Collaboration


Dive into Rasmus Rothe's collaborations.
