Publication


Featured research published by Marc Bolaños.


IEEE Transactions on Human-Machine Systems | 2017

Toward Storytelling From Visual Lifelogging: An Overview

Marc Bolaños; Mariella Dimiccoli; Petia Radeva

Visual lifelogging consists of acquiring images that capture the daily experiences of the user by wearing a camera over a long period of time. The pictures taken offer considerable potential for knowledge mining concerning how people live their lives; hence, they open up new opportunities for many potential applications in fields including healthcare, security, leisure, and the quantified self. However, automatically building a story from a huge collection of unstructured egocentric data presents major challenges. This paper provides a thorough review of advances made so far in egocentric data analysis and, in view of the current state of the art, indicates new lines of research to move us toward storytelling from visual lifelogging.


Computer Vision and Image Understanding | 2017

SR-clustering: Semantic regularized clustering for egocentric photo streams segmentation

Mariella Dimiccoli; Marc Bolaños; Estefania Talavera; Maedeh Aghaei; Stavri G. Nikolov; Petia Radeva

While wearable cameras are becoming increasingly popular, locating relevant information in large unstructured collections of egocentric images is still a tedious and time-consuming process. This paper addresses the problem of organizing egocentric photo streams acquired by a wearable camera into semantically meaningful segments, hence making an important step towards the goal of automatically annotating these photos for browsing and retrieval. In the proposed method, first, contextual and semantic information is extracted for each image by employing a Convolutional Neural Network approach. Next, a vocabulary of concepts is defined in a semantic space by relying on linguistic information. Finally, by exploiting the temporal coherence of concepts in photo streams, images which share contextual and semantic attributes are grouped together. The resulting temporal segmentation is particularly suited for further analysis, ranging from event recognition to semantic indexing and summarization. Experimental results over an egocentric set of nearly 31,000 images show the superiority of the proposed approach over state-of-the-art methods.
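
As a rough sketch of the final grouping step, the snippet below segments a photo stream by thresholding the cosine similarity between consecutive per-image feature vectors. It assumes precomputed CNN features; the function name, threshold value, and similarity test are illustrative stand-ins, not the paper's semantic regularization.

```python
# Minimal sketch: temporal segmentation of a photo stream by the
# similarity of consecutive per-image feature vectors (assumed to be
# precomputed CNN features). Illustrative only, not SR-clustering itself.
import numpy as np

def segment_photostream(features: np.ndarray, threshold: float = 0.5):
    """Start a new segment whenever the cosine similarity between
    consecutive frames drops below `threshold` (hypothetical value)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    boundaries = [0]
    for i in range(1, len(unit)):
        if float(unit[i] @ unit[i - 1]) < threshold:
            boundaries.append(i)
    boundaries.append(len(unit))
    return [(boundaries[k], boundaries[k + 1])
            for k in range(len(boundaries) - 1)]

# Toy stream: two visually distinct "events" of five frames each.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(1.0, 0.05, size=(5, 64)),
                   rng.normal(-1.0, 0.05, size=(5, 64))])
print(segment_photostream(feats))  # [(0, 5), (5, 10)]
```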


International Conference on Pattern Recognition | 2016

Simultaneous food localization and recognition

Marc Bolaños; Petia Radeva

The development of automatic nutrition diaries, which would make it possible to objectively keep track of everything we eat, could enable a whole new world of possibilities for people concerned about their nutrition patterns. With this purpose, in this paper we propose the first method for simultaneous food localization and recognition. Our method is based on two main steps: first, producing a food activation map on the input image (i.e., a heat map of probabilities) to generate bounding box proposals and, second, recognizing each of the food types or food-related objects present in each bounding box. We demonstrate that our proposal, compared to the most similar existing problem, object localization, is able to obtain high precision and reasonable recall levels with only a few bounding boxes. Furthermore, we show that it is applicable to both conventional and egocentric images.
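
To make the first step concrete, here is a hedged sketch of turning a food activation map into bounding-box proposals by thresholding and extracting connected components. The threshold and helper name are assumptions; the paper's CNN-based proposal generation is more involved.

```python
# Hedged sketch: bounding-box proposals from a food activation map
# (a per-pixel probability heat map) via thresholding and connected
# components. Threshold and function name are illustrative.
import numpy as np
from scipy import ndimage

def boxes_from_activation_map(act: np.ndarray, thresh: float = 0.5):
    """Return (row_min, col_min, row_max, col_max) for each connected
    region of the thresholded map."""
    labeled, _ = ndimage.label(act >= thresh)
    return [(s[0].start, s[1].start, s[0].stop, s[1].stop)
            for s in ndimage.find_objects(labeled)]

# Toy map with two high-probability food regions.
act = np.zeros((8, 8))
act[1:3, 1:3] = 0.9
act[5:7, 4:7] = 0.8
print(boxes_from_activation_map(act))  # [(1, 1, 3, 3), (5, 4, 7, 7)]
```

Each proposed box would then be cropped and passed to a food-type classifier for the recognition step.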


International Conference on Multimedia and Expo | 2015

Visual summary of egocentric photostreams by representative keyframes

Marc Bolaños; Ricard Mestre; Estefania Talavera; Xavier Giró i Nieto; Petia Radeva

Building a visual summary from an egocentric photostream captured by a lifelogging wearable camera is of high interest for different applications (e.g., memory reinforcement). In this paper, we propose a new summarization method based on keyframe selection that uses visual features extracted by means of a convolutional neural network. Our method applies unsupervised clustering to divide the photostream into events and then extracts the most relevant keyframe for each event. We assess the results through a blind taste test in which a group of 20 people rated the quality of the summaries.
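
A minimal sketch of the keyframe-selection idea, assuming precomputed CNN features per image: cluster the frames into events with k-means (a stand-in for the paper's unsupervised clustering) and keep the frame closest to each cluster centre.

```python
# Minimal sketch of keyframe selection from precomputed CNN features.
# k-means is a stand-in for the paper's event clustering; n_events is
# an illustrative parameter.
import numpy as np
from sklearn.cluster import KMeans

def select_keyframes(features: np.ndarray, n_events: int = 3):
    km = KMeans(n_clusters=n_events, n_init=10, random_state=0).fit(features)
    keyframes = []
    for c in range(n_events):
        idx = np.where(km.labels_ == c)[0]
        # The most representative frame: nearest to the cluster centre.
        dists = np.linalg.norm(features[idx] - km.cluster_centers_[c], axis=1)
        keyframes.append(int(idx[np.argmin(dists)]))
    return sorted(keyframes)

feats = np.random.default_rng(1).normal(size=(30, 128))
print(select_keyframes(feats, n_events=3))
```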


Iberian Conference on Pattern Recognition and Image Analysis | 2015

R-Clustering for Egocentric Video Segmentation

Estefania Talavera; Mariella Dimiccoli; Marc Bolaños; Maedeh Aghaei; Petia Radeva

In this paper, we present a new method for egocentric video temporal segmentation based on integrating a statistical mean-change detector and agglomerative clustering (AC) within an energy-minimization framework. Given the tendency of most AC methods to over-segment video sequences when clustering their frames, we combine the clustering with a concept drift detection technique (ADWIN) that has rigorous performance guarantees. ADWIN serves as a statistical upper bound for the clustering-based video segmentation. We integrate both techniques in an energy-minimization framework that serves to disambiguate their decisions and to complete the segmentation taking into account the temporal continuity of video frame descriptors. We present experiments over egocentric sets of more than 13,000 images acquired with different wearable cameras, showing that our method outperforms state-of-the-art clustering methods.
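
For intuition on the change-detection side, here is a simplified mean-change detector in the spirit of ADWIN: split a window at every position and flag a change when the two sub-window means differ by more than a Hoeffding-style bound. This is a didactic simplification, not the ADWIN implementation the paper uses.

```python
# Simplified ADWIN-style mean-change detector (didactic sketch).
# Scans every split of the window and reports a change when the
# sub-window means differ beyond a Hoeffding-style bound.
import math

def detect_change(window, delta=0.01):
    """Return the first cut index with a significant mean change,
    else None. `window` holds floats in [0, 1]."""
    n = len(window)
    for cut in range(1, n):
        n0, n1 = cut, n - cut
        m0 = sum(window[:cut]) / n0
        m1 = sum(window[cut:]) / n1
        m_h = 1.0 / (1.0 / n0 + 1.0 / n1)      # harmonic size term
        eps = math.sqrt(math.log(4.0 * n / delta) / (2.0 * m_h))
        if abs(m0 - m1) > eps:
            return cut
    return None

# A clear shift in the frame-descriptor statistic around index 20.
print(detect_change([0.1] * 20 + [0.9] * 20))  # a cut index near 20
```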


International Conference on Artificial Neural Networks | 2016

Video Description Using Bidirectional Recurrent Neural Networks

Álvaro Peris; Marc Bolaños; Petia Radeva; Francisco Casacuberta

Although traditionally used in the machine translation field, the encoder-decoder framework has recently been applied to the generation of video and image descriptions. The combination of Convolutional and Recurrent Neural Networks in these models has proven to outperform the previous state of the art, obtaining more accurate video descriptions. In this work we propose pushing this model further by introducing two contributions into the encoding stage: first, producing richer image representations by combining object and location information from Convolutional Neural Networks, and second, introducing Bidirectional Recurrent Neural Networks to capture both forward and backward temporal relationships in the input frames.
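
A minimal PyTorch sketch of the bidirectional encoding idea, assuming per-frame CNN features as input; dimensions and names are illustrative, not the authors' architecture.

```python
# Minimal sketch of a bidirectional recurrent video encoder over
# per-frame CNN features. Dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class BiRNNVideoEncoder(nn.Module):
    def __init__(self, feat_dim: int = 2048, hidden: int = 512):
        super().__init__()
        # Reads the frame sequence both forward and backward.
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True,
                           bidirectional=True)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, feat_dim)
        outputs, _ = self.rnn(frames)
        return outputs  # (batch, time, 2 * hidden), both directions

enc = BiRNNVideoEncoder()
video = torch.randn(2, 16, 2048)   # 2 clips, 16 frames each
print(enc(video).shape)            # torch.Size([2, 16, 1024])
```

A decoder would then attend over these concatenated forward/backward states when generating each word of the description.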


Articulated Motion and Deformable Objects | 2014

Video Segmentation of Life-Logging Videos

Marc Bolaños; Maite Garolera; Petia Radeva

Life-logging devices are characterized by easily collecting huge amounts of images. One of the challenges of lifelogging is how to organize the large amount of acquired image data into semantically meaningful segments. In this paper, we propose an energy-based approach for motion-based event segmentation of life-logging sequences of low temporal resolution. The segmentation is achieved by integrating different kinds of image features and classifiers into a graph-cut framework to ensure consistent treatment of the sequence. The results show that the proposed method is promising for creating summaries of a person's everyday life.
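
The paper minimizes the energy with graph cuts; on a 1-D chain of frames the analogous unary-plus-pairwise energy can be minimized exactly by dynamic programming, as in this simplified sketch (the binary labels, costs, and smoothness weight are all illustrative).

```python
# Simplified stand-in for the graph-cut step: exact minimisation of a
# unary + pairwise (smoothness) energy on a 1-D chain of frames via
# dynamic programming. Labels 0/1 might mean "static"/"moving".
def segment_labels(unary, smooth=0.5):
    """unary[t][l]: cost of giving frame t label l (0 or 1).
    Returns the minimum-energy label sequence."""
    T = len(unary)
    cost = [unary[0][0], unary[0][1]]
    back = [[0, 0] for _ in range(T)]
    for t in range(1, T):
        new_cost = [0.0, 0.0]
        for l in (0, 1):
            cands = [cost[p] + (smooth if p != l else 0.0) for p in (0, 1)]
            back[t][l] = 0 if cands[0] <= cands[1] else 1
            new_cost[l] = cands[back[t][l]] + unary[t][l]
        cost = new_cost
    lab = 0 if cost[0] <= cost[1] else 1
    labels = [lab]
    for t in range(T - 1, 0, -1):
        lab = back[t][lab]
        labels.append(lab)
    return labels[::-1]

unary = [[0.1, 0.9]] * 5 + [[0.9, 0.1]] * 5   # noisy classifier costs
print(segment_labels(unary))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```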


ACM Multimedia | 2013

Active labeling application applied to food-related object recognition

Marc Bolaños; Maite Garolera; Petia Radeva

Every day, lifelogging devices, available for recording different aspects of our daily life, increase in number, quality and functions, just like the multiple applications that we give to them. Applying wearable devices to analyze the nutritional habits of people is a challenging application based on acquiring and analyzing life records over long periods of time. However, to extract the information of interest related to the eating patterns of people, we need automatic methods to process large amounts of life-logging data (e.g. recognition of food-related objects). Creating a rich set of manually labeled samples to train the algorithms is slow, tedious and subjective. To address this problem, we propose a novel method in the framework of Active Labeling for constructing a training set of thousands of images. Inspired by the hierarchical sampling method for active learning [6], we propose an Active forest that organizes the data hierarchically for easy and fast labeling. Moreover, introducing a classifier into the hierarchical structures, as well as transforming the feature space for better data clustering, additionally improves the algorithm. Our method is successfully tested to label 89,700 food-related objects and achieves a significant reduction in expert labeling time.
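
As a toy illustration of the hierarchical-labeling idea (not the paper's Active forest), the sketch below clusters the data, asks an expert stub for one label per cluster, and propagates it to the remaining samples, which is where the reduction in expert time comes from.

```python
# Toy sketch of hierarchical active labeling: one expert query per
# cluster, propagated to the cluster's members. A simplification of
# hierarchical sampling / the paper's Active forest.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def propagate_labels(features, n_clusters, ask_expert):
    Z = linkage(features, method="ward")
    clusters = fcluster(Z, t=n_clusters, criterion="maxclust")
    labels = np.empty(len(features), dtype=object)
    for c in np.unique(clusters):
        idx = np.where(clusters == c)[0]
        labels[idx] = ask_expert(idx[0])   # label one representative
    return labels

# Two well-separated blobs; the "expert" is a stub function.
rng = np.random.default_rng(2)
feats = np.vstack([rng.normal(m, 0.1, size=(20, 8)) for m in (0.0, 5.0)])
print(propagate_labels(feats, 2, lambda i: "food" if i >= 20 else "other"))
```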


arXiv: Computer Vision and Pattern Recognition | 2017

Semantic Summarization of Egocentric Photo Stream Events

Aniol Lidon; Marc Bolaños; Mariella Dimiccoli; Petia Radeva; Maite Garolera; Xavier Giro-i-Nieto

With the rapid increase in recent years of users of wearable cameras and of the amount of data they produce, there is a strong need for automatic retrieval and summarization techniques. This work addresses the problem of automatically summarizing egocentric photo streams captured through a wearable camera by taking an image retrieval perspective. After removing non-informative images with a new CNN-based filter, images are ranked by relevance to ensure semantic diversity and finally re-ranked by a novelty criterion to reduce redundancy. To assess the results, a new evaluation metric is proposed which takes into account the non-uniqueness of the solution. Experimental results on a database of 7,110 images from 6 different subjects, evaluated by experts, gave 95.74% expert satisfaction and a Mean Opinion Score of 4.57 out of 5.0.
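
The rank-then-re-rank pattern can be sketched as a greedy trade-off between relevance and novelty, in the spirit of maximal marginal relevance; the weighting and scoring below are assumptions, not the paper's exact criteria.

```python
# Hedged sketch of relevance ranking followed by novelty re-ranking
# (maximal-marginal-relevance style); `lam` and the cosine novelty
# term are illustrative, not the paper's criteria.
import numpy as np

def rerank_by_novelty(features, relevance, k=5, lam=0.7):
    """Greedily pick k images, trading relevance against similarity
    to the images already selected (to reduce redundancy)."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    selected, candidates = [], list(range(len(features)))
    while candidates and len(selected) < k:
        def score(i):
            novelty_penalty = max(
                (float(unit[i] @ unit[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1.0 - lam) * novelty_penalty
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

rng = np.random.default_rng(3)
feats, rel = rng.normal(size=(20, 64)), rng.uniform(size=20)
print(rerank_by_novelty(feats, rel, k=5))
```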


Journal of Visual Communication and Image Representation | 2018

Egocentric video description based on temporally-linked sequences

Marc Bolaños; Álvaro Peris; Francisco Casacuberta; Sergi Soler; Petia Radeva

Egocentric vision consists of acquiring images throughout the day from a first-person point of view using wearable cameras. The automatic analysis of this information makes it possible to discover daily patterns for improving the quality of life of the user. A natural topic that arises in egocentric vision is storytelling, that is, how to understand and tell the story lying behind the pictures. In this paper, we tackle storytelling as an egocentric sequence description problem. We propose a novel methodology that exploits information from temporally neighboring events, matching precisely the nature of egocentric sequences. Furthermore, we present a new method for multimodal data fusion consisting of a multi-input attention recurrent network. We also publish the first dataset for egocentric image sequence description, consisting of 1,339 events with 3,991 descriptions, from 55 days acquired by 11 people. Furthermore, we prove that our proposal outperforms classical attentional encoder-decoder methods for video description.
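
A minimal PyTorch sketch of multi-input attention fusion: a decoder state attends separately over two modality sequences and the resulting context vectors are concatenated. Dimensions and the dot-product scoring are illustrative assumptions, not the paper's exact network.

```python
# Minimal sketch of multi-input attention fusion for two modalities
# (e.g., frame features and object features). Dimensions and scoring
# are illustrative assumptions.
import torch
import torch.nn as nn

class MultiInputAttention(nn.Module):
    def __init__(self, dim_a: int, dim_b: int, dim_state: int):
        super().__init__()
        self.proj_a = nn.Linear(dim_state, dim_a)
        self.proj_b = nn.Linear(dim_state, dim_b)

    def _attend(self, query: torch.Tensor, seq: torch.Tensor):
        # Dot-product scores between the query and each time step.
        scores = torch.softmax((seq * query.unsqueeze(1)).sum(-1), dim=1)
        return (scores.unsqueeze(-1) * seq).sum(1)   # weighted context

    def forward(self, state, seq_a, seq_b):
        ctx_a = self._attend(self.proj_a(state), seq_a)
        ctx_b = self._attend(self.proj_b(state), seq_b)
        return torch.cat([ctx_a, ctx_b], dim=-1)

fusion = MultiInputAttention(dim_a=256, dim_b=128, dim_state=512)
state = torch.randn(2, 512)                      # decoder state
out = fusion(state, torch.randn(2, 10, 256), torch.randn(2, 7, 128))
print(out.shape)                                 # torch.Size([2, 384])
```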

Collaboration


Dive into Marc Bolaños's collaborations.

Top Co-Authors

Petia Radeva

University of Barcelona

Francisco Casacuberta

Polytechnic University of Valencia

Xavier Giró i Nieto

Polytechnic University of Catalonia

Álvaro Peris

Polytechnic University of Valencia

Aniol Lidon

Polytechnic University of Catalonia
