Publication


Featured research published by Gloria Zen.


Computer Vision and Pattern Recognition | 2011

Earth mover's prototypes: A convex learning approach for discovering activity patterns in dynamic scenes

Gloria Zen; Elisa Ricci

We present a novel approach for automatically discovering spatio-temporal patterns in complex dynamic scenes. Similarly to recent non-object-centric methods, we use low-level visual cues to detect atomic activities and then construct clip histograms. Differently from previous works, we formulate the task of discovering high-level activity patterns as a prototype learning problem where the correlation among atomic activities is explicitly taken into account when grouping clip histograms. Interestingly, at the core of our approach is a convex optimization problem which allows us to efficiently extract patterns at multiple levels of detail. The effectiveness of our method is demonstrated on publicly available datasets.


ACM Multimedia | 2014

We are not All Equal: Personalizing Models for Facial Expression Analysis with Transductive Parameter Transfer

Enver Sangineto; Gloria Zen; Elisa Ricci; Nicu Sebe

Previous works on facial expression analysis have shown that person-specific models are advantageous with respect to generic ones for recognizing facial expressions of new users added to the gallery set. This finding is not surprising, given the often significant inter-individual variability: different persons have different morphological aspects and express their emotions in different ways. However, acquiring person-specific labeled data for learning models is a very time-consuming process. In this work we propose a new transfer learning method to compute personalized models without labeled target data. Our approach is based on learning multiple person-specific classifiers for a set of source subjects and then directly transferring knowledge about the parameters of these classifiers to the target individual. The transfer process is obtained by learning a regression function which maps the data distribution associated with each source subject to the corresponding classifier's parameters. We tested our approach on two different application domains, Action Units (AUs) detection and spontaneous pain recognition, using publicly available datasets and showing its advantages with respect to the state of the art both in terms of accuracy and computational cost.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013

A Prototype Learning Framework Using EMD: Application to Complex Scenes Analysis

Elisa Ricci; Gloria Zen; Nicu Sebe; Stefano Messelodi

In the last decades, many efforts have been devoted to developing methods for automatic scene understanding in the context of video surveillance applications. This paper presents a novel non-object-centric approach for complex scene analysis. Similarly to previous methods, we use low-level cues to detect atomic activities and create clip histograms. Differently from recent works, the task of discovering high-level activity patterns is formulated as a convex prototype learning problem. This problem results in a simple linear program that can be solved efficiently with standard solvers. The main advantage of our approach is that, by using the Earth Mover's Distance (EMD) as the objective function, the similarity among elementary activities is taken into account in the learning phase. To improve scalability we also consider some variants of EMD adopting L1 as the ground distance for 1D and 2D, linear and circular histograms. In these cases, only the similarity between neighboring atomic activities, corresponding to adjacent histogram bins, is taken into account. Therefore, we also propose an automatic strategy for sorting atomic activities. Experimental results on publicly available datasets show that our method compares favorably with state-of-the-art approaches, often outperforming them.
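For the 1D variant with L1 ground distance mentioned above, the EMD between two normalized histograms admits a well-known closed form: the L1 distance between their cumulative distributions. A minimal sketch of that special case (not the authors' code; the linear-program formulation for general ground distances is not shown):

```python
import numpy as np

def emd_1d_l1(h1, h2):
    """EMD between two 1D histograms with L1 ground distance.
    For this special case the EMD equals the L1 distance between
    the cumulative distributions (a standard closed-form result)."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    # Normalize so both histograms carry unit mass.
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    return float(np.abs(np.cumsum(h1) - np.cumsum(h2)).sum())

# Shifting all the mass by one bin costs a distance of 1.
print(emd_1d_l1([1, 0, 0], [0, 1, 0]))  # 1.0
```

Because all the cost structure collapses into cumulative sums, this variant scales linearly in the number of bins, which is what makes the L1-ground-distance versions attractive for scalability.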


International Conference on Multimodal Interfaces | 2014

Unsupervised Domain Adaptation for Personalized Facial Emotion Recognition

Gloria Zen; Enver Sangineto; Elisa Ricci; Nicu Sebe

The way in which human beings express emotions depends on their specific personality and cultural background. As a consequence, person-independent facial expression classifiers usually fail to accurately recognize emotions which vary between different individuals. On the other hand, training a person-specific classifier for each new user is a time-consuming activity which involves collecting hundreds of labeled samples. In this paper we present a personalization approach in which only unlabeled target-specific data are required. The method is based on our previous paper [20], in which a regression framework is proposed to learn the relation between the user's specific sample distribution and the parameters of her/his classifier. Once this relation is learned, a target classifier can be constructed using only the new user's sample distribution to transfer the personalized parameters. The novelty of this paper with respect to [20] is the introduction of a new method to represent the source sample distribution based on using only the Support Vectors of the source classifiers. Moreover, we present here a simplified regression framework which achieves the same or even slightly superior experimental results with respect to [20] while being much easier to reproduce.


IEEE Transactions on Multimedia | 2016

Learning Personalized Models for Facial Expression Analysis and Gesture Recognition

Gloria Zen; Lorenzo Porzi; Enver Sangineto; Elisa Ricci; Nicu Sebe

Facial expression and gesture recognition algorithms are key enabling technologies for human-computer interaction (HCI) systems. State-of-the-art approaches for automatic detection of body movements and for analyzing emotions from facial features heavily rely on advanced machine learning algorithms. Most of these methods are designed for the average user, but the "one-size-fits-all" assumption ignores diversity in cultural background, gender, ethnicity, and personal behavior, and limits their applicability in real-world scenarios. A possible solution is to build personalized interfaces, which practically implies learning person-specific classifiers and usually collecting a significant amount of labeled samples for each novel user. As data annotation is a tedious and time-consuming process, in this paper we present a framework for personalizing classification models which does not require labeled target data. Personalization is achieved by devising a novel transfer learning approach. Specifically, we propose a regression framework which exploits auxiliary (source) annotated data to learn the relation between person-specific sample distributions and the parameters of the corresponding classifiers. Then, when considering a new target user, the classification model is computed by simply feeding the associated (unlabeled) sample distribution into the learned regression function. We evaluate the proposed approach in different applications: pain recognition and action unit detection using visual data, and gesture classification using inertial measurements, demonstrating the generality of our method with respect to different input data types and basic classifiers. We also show the advantages of our approach in terms of accuracy and computational time both with respect to user-independent approaches and to previous personalization techniques.
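The regression-based transfer described above can be sketched end to end. This is a simplified illustration under stated assumptions, not the authors' implementation: least-squares linear classifiers stand in for the per-subject classifiers, each subject's sample distribution is summarized by its feature mean, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(X, y, reg=1e-3):
    """Least-squares linear classifier (a stand-in for per-subject SVMs)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])          # append bias feature
    A = Xb.T @ Xb + reg * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)                # [weights..., bias]

d, n, n_sources = 5, 200, 10

# Synthetic "source subjects": each subject's data is shifted differently,
# mimicking inter-individual variability.
sources = []
for _ in range(n_sources):
    shift = rng.normal(size=d)
    X = rng.normal(size=(n, d)) + shift
    y = np.sign(X[:, 0] - shift[0] + 0.1 * rng.normal(size=n))
    sources.append((X, y))

# Step 1: one classifier per source subject, plus a distribution
# descriptor per subject (here simply the feature mean).
W = np.stack([fit_linear(X, y) for X, y in sources])   # (S, d+1) parameters
M = np.stack([X.mean(axis=0) for X, _ in sources])     # (S, d) descriptors

# Step 2: ridge regression mapping distribution descriptors to parameters.
Mb = np.hstack([M, np.ones((n_sources, 1))])
T = np.linalg.solve(Mb.T @ Mb + 1e-2 * np.eye(d + 1), Mb.T @ W)

# Step 3: personalize for a new target subject using only UNLABELED data:
# feed the target's sample distribution into the learned regression.
shift_t = rng.normal(size=d)
X_t = rng.normal(size=(n, d)) + shift_t
y_t = np.sign(X_t[:, 0] - shift_t[0])                  # ground truth (unseen)
w_t = np.hstack([X_t.mean(axis=0), 1.0]) @ T           # transferred parameters
pred = np.sign(np.hstack([X_t, np.ones((n, 1))]) @ w_t)
print("target accuracy:", (pred == y_t).mean())
```

The point of the design is visible in step 3: no target labels are ever used; personalization costs only a mean over the target's samples and one matrix product.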


International Conference on Multimedia Retrieval | 2016

Mouse Activity as an Indicator of Interestingness in Video

Gloria Zen; Paloma de Juan; Yale Song; Alejandro Jaimes

Automatic detection of interesting moments in video has many real-world applications such as video summarization and efficient online video browsing. In this paper, we present a lightweight and scalable solution to this problem based on user mouse activity while watching video. Unlike previous approaches that analyze video content to infer the interestingness, we leverage the implicit user feedback obtained from thousands of online video watching sessions. This makes our method computationally efficient and scalable to billions of videos. Most importantly, our approach can handle a variety of video genres because we make no assumption on what constitutes interestingness: we let the crowd tell us through their mouse activity. By analyzing 106,212 user sessions collected from a popular online video website, we show that mouse activity is highly indicative of interestingness, and that our approach has competitive performance to several state-of-the-art methods.


International Conference on Image Analysis and Processing | 2013

Daily Living Activities Recognition via Efficient High and Low Level Cues Combination and Fisher Kernel Representation

Negar Rostamzadeh; Gloria Zen; Ionut Mironica; Jasper R. R. Uijlings; Nicu Sebe

In this work we propose an efficient method for activity recognition in a daily living scenario. At the feature level, we propose a method to extract and combine low- and high-level information, and we show that the performance of body pose estimation (and consequently of activity recognition) can be significantly improved. In particular, we propose an approach extending state-of-the-art pictorial deformable models for body pose estimation. We show that including low-level cues (e.g. optical flow and foreground) together with an off-the-shelf body part detector allows us to reach better performance without the need to re-train the detectors. Finally, we apply the Fisher Kernel representation, which takes the temporal variation into account, and we show that we outperform state-of-the-art methods on a public dataset with daily living activities.


ACM/IEEE International Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Stream | 2013

Nobody likes Mondays: foreground detection and behavioral patterns analysis in complex urban scenes

Gloria Zen; John Krumm; Nicu Sebe; Eric Horvitz; Ashish Kapoor

Streams of images from large numbers of surveillance webcams are available via the web. The continuous monitoring of activities at different locations provides a great opportunity for research on the use of vision systems for detecting actors, objects, and events, and for understanding patterns of activity and anomaly in real-world settings. In this work we show how images available on the web from surveillance webcams can be used as sensors in urban scenarios for monitoring and interpreting states of interest such as traffic intensity. We highlight the power of the cyclical aspect of the lives of people and of cities. We extract from long-term streams of images typical patterns of behavior and anomalous events and situations, based on considerations of day of the week and time of day. The analysis of typical and atypical patterns required a robust method for background subtraction. For this purpose, we present a method based on sparse coding which outperforms state-of-the-art methods on complex and crowded scenes.


acm multimedia | 2016

Are Safer Looking Neighborhoods More Lively?: A Multimodal Investigation into Urban Life

Marco De Nadai; Radu L. Vieriu; Gloria Zen; Stefan Dragicevic; Nikhil Naik; Michele Caraviello; César A. Hidalgo; Nicu Sebe; Bruno Lepri

Policy makers, urban planners, architects, sociologists, and economists are interested in creating urban areas that are both lively and safe. But are the safety and liveliness of neighborhoods independent characteristics? Or are they just two sides of the same coin? In a world where people avoid unsafe-looking places, neighborhoods that look unsafe will be less lively, and will fail to harness the natural surveillance of human activity. But in a world where the preference for safe-looking neighborhoods is small, the connection between the perception of safety and liveliness will be either weak or nonexistent. In this paper we explore the connection between the levels of activity and the perception of safety of neighborhoods in two major Italian cities by combining mobile phone data (as a proxy for activity or liveliness) with scores of perceived safety estimated using a Convolutional Neural Network trained on a dataset of Google Street View images scored using a crowdsourced visual perception survey. We find that: (i) safer-looking neighborhoods are more active than what is expected from their population density, employee density, and distance to the city centre; and (ii) the correlation between appearance of safety and activity is positive, strong, and significant for females and people over 50, but negative for people under 30, suggesting that the behavioral impact of perception depends on the demographics of the population. Finally, we use occlusion techniques to identify the urban features that contribute to the appearance of safety, finding that greenery and street-facing windows contribute to a positive appearance of safety (in agreement with Oscar Newman's defensible space theory). These results suggest that urban appearance modulates levels of human activity and, consequently, a neighborhood's rate of natural surveillance.


International Conference on Pattern Recognition | 2014

Simultaneous Ground Metric Learning and Matrix Factorization with Earth Mover's Distance

Gloria Zen; Elisa Ricci; Nicu Sebe

Non-negative matrix factorization (NMF) is widely used in pattern recognition, as it has proven to be an effective method for dimensionality reduction and clustering. We propose a novel approach for matrix factorization which is based on the Earth Mover's Distance (EMD) as a measure of reconstruction error. Differently from previous works on EMD matrix decomposition, we consider a semi-supervised learning setting and we also propose to learn the ground distance parameters. While a few previous works have addressed the problem of ground distance computation, these methods do not learn the optimal metric and the reconstruction matrices simultaneously. We demonstrate the effectiveness of the proposed approach both in synthetic data experiments and in a real-world scenario, i.e. the problem of complex video scene analysis in the context of video surveillance applications. Our experiments show that our method allows us not only to achieve state-of-the-art performance on video segmentation, but also to learn the relationships among the elementary activities which characterize the high-level events in the video scene.
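For reference, the classical NMF formulation that this work builds on minimizes the Frobenius reconstruction error; the paper replaces that error with the EMD and additionally learns the ground metric. A minimal sketch of the classical baseline only (Lee-Seung multiplicative updates; not the authors' EMD-based method):

```python
import numpy as np

rng = np.random.default_rng(1)

def nmf(V, k, iters=200, eps=1e-9):
    """Standard NMF with Frobenius loss via Lee-Seung multiplicative updates.
    Multiplicative updates keep both factors non-negative by construction."""
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update coefficients
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis
    return W, H

V = rng.random((30, 20))                        # toy non-negative data
W, H = nmf(V, k=5)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", round(err, 3))
```

Under the EMD objective, columns of V would be histograms (e.g. clip histograms of atomic activities) and the reconstruction cost would depend on the ground distance between bins, which is exactly the quantity the paper proposes to learn.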

Collaboration

Top co-authors of Gloria Zen:

Bruno Lepri (Fondazione Bruno Kessler)
Lorenzo Porzi (Fondazione Bruno Kessler)