Ferda Ofli
University of California, Berkeley
Publications
Featured research published by Ferda Ofli.
International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) | 2012
Stepán Obdrzálek; Gregorij Kurillo; Ferda Ofli; Ruzena Bajcsy; Edmund Seto; Holly Jimison; Michael Pavel
The Microsoft Kinect camera is becoming increasingly popular in many areas beyond entertainment, including human activity monitoring and rehabilitation. Many people, however, fail to consider the reliability and accuracy of Kinect human pose estimation when they depend on it as a measuring system. In this paper, we compare the Kinect pose estimation (skeletonization) with more established techniques for pose estimation from motion capture data, examining the accuracy of joint localization and the robustness of pose estimation with respect to orientation and occlusions. We evaluated six physical exercises aimed at coaching the elderly population. Experimental results present pose estimation accuracy rates and corresponding error bounds for the Kinect system.
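The paper's evaluation code is not published; a minimal sketch of a per-joint localization error against motion-capture ground truth, assuming temporally synchronized skeletons in a common coordinate frame (array shapes and the function name are illustrative, not from the paper), might look like:

```python
import numpy as np

def joint_localization_error(kinect_joints, mocap_joints):
    """Per-joint mean Euclidean error between Kinect estimates and
    motion-capture ground truth, plus an overall RMS figure.

    kinect_joints, mocap_joints: arrays of shape (frames, joints, 3),
    assumed synchronized and expressed in the same coordinate frame.
    """
    # Euclidean distance per frame and per joint -> (frames, joints)
    diffs = np.linalg.norm(kinect_joints - mocap_joints, axis=2)
    per_joint_mean = diffs.mean(axis=0)
    overall_rms = np.sqrt((diffs ** 2).mean())
    return per_joint_mean, overall_rms
```

In practice the two systems would first need temporal alignment and a rigid registration between coordinate frames, which this sketch assumes has already been done.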
Journal of Visual Communication and Image Representation | 2014
Ferda Ofli; Rizwan Chaudhry; Gregorij Kurillo; René Vidal; Ruzena Bajcsy
Much of the existing work on action recognition combines simple features (e.g., joint angle trajectories, optical flow, spatio-temporal video features) with somewhat complex classifiers or dynamical models (e.g., kernel SVMs, HMMs, LDSs, deep belief networks). Although successful, these approaches represent an action with a set of parameters that usually do not have any physical meaning. As a consequence, such approaches do not provide any qualitative insight that relates an action to the actual motion of the body or its parts. For example, it is not necessarily the case that clapping can be correlated to hand motion or that walking can be correlated to a specific combination of motions from the feet, arms and body. In this paper, we propose a new representation of human actions called Sequence of the Most Informative Joints (SMIJ), which is extremely easy to interpret. At each time instant, we automatically select a few skeletal joints that are deemed to be the most informative for performing the current action. The selection of joints is based on highly interpretable measures such as the mean or variance of joint angles, maximum angular velocity of joints, etc. We then represent an action as a sequence of these most informative joints. Our experiments on multiple databases show that the proposed representation is very discriminative for the task of human action recognition and performs better than several state-of-the-art algorithms.
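The SMIJ idea can be sketched in a few lines: split the sequence into temporal windows and, in each window, keep the IDs of the joints whose angle trajectories vary the most. This sketch uses variance as the sole ranking measure and fixed window counts, a simplification of the paper's formulation; all names are illustrative.

```python
import numpy as np

def smij(joint_angles, n_segments=4, top_k=2):
    """Sequence of the Most Informative Joints (simplified sketch).

    joint_angles: (frames, joints) array of joint-angle trajectories.
    Splits the sequence into n_segments temporal windows; in each,
    joints are ranked by the variance of their angle trajectory, and
    the action is represented by the per-window top-k joint IDs.
    """
    windows = np.array_split(joint_angles, n_segments, axis=0)
    seq = []
    for w in windows:
        variances = w.var(axis=0)
        # joints with the largest angle variance are "most informative"
        ranked = np.argsort(variances)[::-1][:top_k]
        seq.append(tuple(int(j) for j in ranked))
    return seq
```

The resulting sequence of joint-ID tuples is directly human-readable, which is the interpretability the abstract emphasizes.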
Workshop on Applications of Computer Vision (WACV) | 2013
Ferda Ofli; Rizwan Chaudhry; Gregorij Kurillo; René Vidal; Ruzena Bajcsy
Over the years, a large number of methods have been proposed to analyze human pose and motion information from images, videos, and recently from depth data. Most methods, however, have been evaluated on datasets that were too specific to each application, limited to a particular modality, and more importantly, captured under unknown conditions. To address these issues, we introduce the Berkeley Multimodal Human Action Database (MHAD) consisting of temporally synchronized and geometrically calibrated data from an optical motion capture system, multi-baseline stereo cameras from multiple views, depth sensors, accelerometers and microphones. This controlled multimodal dataset provides researchers an inclusive testbed to develop and benchmark new algorithms across multiple modalities under known capture conditions in various research domains. To demonstrate possible use of MHAD for action recognition, we compare results using the popular Bag-of-Words algorithm adapted to each modality independently with the results of various combinations of modalities using the Multiple Kernel Learning. Our comparative results show that multimodal analysis of human motion yields better action recognition rates than unimodal analysis.
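As a rough illustration of the per-modality Bag-of-Words baseline mentioned above, the encoding step reduces to assigning local features to their nearest codewords and histogramming the assignments (codebook learning, e.g. by k-means, is omitted; names are my own):

```python
import numpy as np

def bow_histogram(features, codebook):
    """Bag-of-Words encoding: assign each local feature to its nearest
    codeword and return the normalized histogram of assignments.

    features: (n, d) local descriptors; codebook: (k, d) codewords.
    """
    # squared Euclidean distance from every feature to every codeword
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assignments = d2.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

Each modality would get its own codebook and histogram; combining them with Multiple Kernel Learning, as in the paper, happens at the classifier stage.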
Computer Vision and Pattern Recognition (CVPR) | 2013
Rizwan Chaudhry; Ferda Ofli; Gregorij Kurillo; Ruzena Bajcsy; René Vidal
Over the last few years, with the immense popularity of the Kinect, there has been renewed interest in developing methods for human gesture and action recognition from 3D data. A number of approaches have been proposed that extract representative features from 3D depth data, a reconstructed 3D surface mesh or more commonly from the recovered estimate of the human skeleton. Recent advances in neuroscience have discovered a neural encoding of static 3D shapes in primate infero-temporal cortex that can be represented as a hierarchy of medial axis and surface features. We hypothesize a similar neural encoding might also exist for 3D shapes in motion and propose a hierarchy of dynamic medial axis structures at several spatio-temporal scales that can be modeled using a set of Linear Dynamical Systems (LDSs). We then propose novel discriminative metrics for comparing these sets of LDSs for the task of human activity recognition. Combined with simple classification frameworks, our proposed features and corresponding hierarchical dynamical models provide the highest human activity recognition rates as compared to state-of-the-art methods on several skeletal datasets.
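The paper's discriminative metrics on sets of LDSs are more involved than can be shown here, but the core object, a linear dynamical system fitted to a feature sequence, can be sketched with a least-squares estimate of the transition matrix; the naive Frobenius distance between two fits below is a stand-in, not the paper's metric:

```python
import numpy as np

def fit_lds(X):
    """Fit the state-transition matrix A of a first-order linear
    dynamical system x_{t+1} ~ A x_t to a feature sequence X of
    shape (frames, dim), via least squares."""
    X0, X1 = X[:-1], X[1:]
    # solves X0 @ M = X1 row-wise, so x_{t+1}^T = x_t^T M
    M, *_ = np.linalg.lstsq(X0, X1, rcond=None)
    return M.T  # so that x_{t+1} ~ A @ x_t

def lds_distance(X, Y):
    """Crude sequence distance: Frobenius norm between the fitted
    transition matrices (the paper proposes principled discriminative
    metrics over *sets* of LDSs instead)."""
    return np.linalg.norm(fit_lds(X) - fit_lds(Y))
```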
Computer Vision and Pattern Recognition (CVPR) | 2012
Ferda Ofli; Rizwan Chaudhry; Gregorij Kurillo; René Vidal; Ruzena Bajcsy
Much of the existing work on action recognition combines simple features (e.g., joint angle trajectories, optical flow, spatio-temporal video features) with somewhat complex classifiers or dynamical models (e.g., kernel SVMs, HMMs, LDSs, deep belief networks). Although successful, these approaches represent an action with a set of parameters that usually do not have any physical meaning. As a consequence, such approaches do not provide any qualitative insight that relates an action to the actual motion of the body or its parts. For example, it is not necessarily the case that clapping can be correlated to hand motion or that walking can be correlated to a specific combination of motions from the feet, arms and body. In this paper, we propose a new representation of human actions called Sequence of the Most Informative Joints (SMIJ), which is extremely easy to interpret. At each time instant, we automatically select a few skeletal joints that are deemed to be the most informative for performing the current action. The selection of joints is based on highly interpretable measures such as the mean or variance of joint angles, maximum angular velocity of joints, etc. We then represent an action as a sequence of these most informative joints. Our experiments on multiple databases show that the proposed representation is very discriminative for the task of human action recognition and performs better than several state-of-the-art algorithms.
IEEE International Conference on Healthcare Informatics | 2015
Qifei Wang; Gregorij Kurillo; Ferda Ofli; Ruzena Bajcsy
The Microsoft Kinect camera and its skeletal tracking capabilities have been embraced by many researchers and commercial developers in various applications of real-time human movement analysis. In this paper, we evaluate the accuracy of the human kinematic motion data in the first and second generations of the Kinect system and compare the results with an optical motion capture system. We collected motion data for 12 exercises performed by 10 different subjects, captured from three different viewpoints. We report on the accuracy of joint localization and bone length estimation of Kinect skeletons in comparison to the motion capture data. We also analyze the distribution of joint localization offsets by fitting a mixture of Gaussian and uniform distributions to identify outliers in the Kinect motion data. Our analysis shows that, overall, Kinect 2 tracks human pose more robustly and more accurately than Kinect 1.
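The Gaussian-plus-uniform outlier model described above can be fitted with a short EM loop. This sketch works on 1-D offsets and treats the support [lo, hi] of the uniform (outlier) component as given; the initialization choices and names are my own, not the paper's:

```python
import numpy as np

def gauss_uniform_em(x, lo, hi, n_iter=50):
    """Fit a mixture of a Gaussian (inliers) and a uniform density on
    [lo, hi] (outliers) to 1-D offsets x via EM, returning the
    posterior inlier probability of each point and the parameters."""
    mu, sigma, w = x.mean(), x.std() + 1e-9, 0.9  # w = inlier weight
    u = 1.0 / (hi - lo)                           # uniform density
    for _ in range(n_iter):
        # E-step: responsibility of the Gaussian for each point
        g = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        r = w * g / (w * g + (1 - w) * u)
        # M-step: re-estimate weight, mean, and standard deviation
        w = r.mean()
        mu = (r * x).sum() / r.sum()
        sigma = np.sqrt((r * (x - mu) ** 2).sum() / r.sum()) + 1e-9
    return r, (mu, sigma, w)
```

Points whose posterior inlier probability falls below a threshold would then be flagged as Kinect tracking outliers.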
IEEE Journal of Biomedical and Health Informatics | 2016
Ferda Ofli; Gregorij Kurillo; Stepán Obdrzálek; Ruzena Bajcsy; Holly Jimison; Misha Pavel
Although the positive effects of exercise on the well-being and quality of independent living for older adults are well accepted, many elderly individuals lack access to exercise facilities, or the skills and motivation to perform exercise at home. To provide a more engaging environment that promotes physical activity, various fitness applications have been proposed. Many of the available products, however, are geared toward a younger population and are not appropriate or engaging for an older population. To address these issues, we developed an automated interactive exercise coaching system using the Microsoft Kinect. The coaching system guides users through a series of video exercises, tracks and measures their movements, provides real-time feedback, and records their performance over time. Our system consists of exercises to improve balance, flexibility, strength, and endurance, with the aim of reducing fall risk and improving performance of daily activities. In this paper, we report on the development of the exercise system, discuss the results of our recent field pilot study with six independently living elderly individuals, and highlight the lessons learned relating to the in-home system setup, user tracking, feedback, and exercise performance evaluation.
computer vision and pattern recognition | 2017
Amaia Salvador; Nicholas Hynes; Yusuf Aytar; Javier Marin; Ferda Ofli; Ingmar Weber; Antonio Torralba
In this paper, we introduce Recipe1M, a new large-scale, structured corpus of over one million cooking recipes and 800,000 food images. As the largest publicly available collection of recipe data, Recipe1M affords the ability to train high-capacity models on aligned, multi-modal data. Using these data, we train a neural network to find a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. Additionally, we demonstrate that regularization via the addition of a high-level classification objective both improves retrieval performance to rival that of humans and enables semantic vector arithmetic. We postulate that these embeddings will provide a basis for further exploration of the Recipe1M dataset and of food and cooking in general. Code, data, and models are publicly available.
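Once a joint embedding is trained, the image-recipe retrieval task reduces to nearest-neighbor search by cosine similarity in the shared space. A minimal sketch, assuming precomputed embeddings and using illustrative names (the paper's actual network and training objective are not shown):

```python
import numpy as np

def retrieve(query_emb, gallery_embs, k=5):
    """Rank gallery items (e.g. recipe embeddings) by cosine similarity
    to a query embedding (e.g. an image embedding) in the shared space.

    query_emb: (d,); gallery_embs: (n, d). Returns top-k indices and
    their similarity scores, best first.
    """
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity to each item
    order = np.argsort(-sims)[:k]     # descending
    return order, sims[order]
```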
International Conference on 3D Vision | 2015
Qifei Wang; Gregorij Kurillo; Ferda Ofli; Ruzena Bajcsy
In this paper, we propose a method for temporal segmentation of human repetitive actions based on frequency analysis of kinematic parameters, zero-velocity crossing detection, and adaptive k-means clustering. Since human motion data may be captured with different modalities that have different temporal sampling rates and accuracy (e.g., optical motion capture systems vs. the Microsoft Kinect), we first apply a generic full-body kinematic model with an unscented Kalman filter to convert the motion data into a unified representation that is robust to noise. We then extract the most representative kinematic parameters via primary frequency analysis. The sequences are segmented based on zero-velocity crossings of the selected parameters, followed by adaptive k-means clustering to identify the repetition segments. Experimental results demonstrate that, for motion data captured by both the motion capture system and the Microsoft Kinect, the proposed algorithm obtains robust segmentation of repetitive action sequences.
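The zero-velocity-crossing step can be sketched as follows for a single 1-D kinematic parameter: compute a finite-difference velocity and cut the sequence wherever it changes sign. The paper additionally selects parameters by frequency analysis and clusters the resulting segments, which this sketch omits.

```python
import numpy as np

def zero_velocity_segments(signal):
    """Split a 1-D kinematic parameter at frames where its finite-
    difference velocity crosses zero (a simple proxy for the
    zero-velocity-crossing detection described above).

    Returns a list of (start, end) frame-index pairs.
    """
    v = np.diff(signal)
    # indices where consecutive velocities have opposite signs
    crossings = np.where(np.sign(v[:-1]) * np.sign(v[1:]) < 0)[0] + 1
    bounds = [0, *crossings.tolist(), len(signal) - 1]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]
```

Real sensor data would need smoothing (e.g. the paper's Kalman-filtered representation) before differencing, since noise produces spurious sign changes.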
IEEE Transactions on Multimedia | 2012
Ferda Ofli; Engin Erzin; Yücel Yemez; A.M. Tekalp
We propose a novel framework for learning many-to-many statistical mappings from musical measures to dance figures towards generating plausible music-driven dance choreographies. We obtain music-to-dance mappings through use of four statistical models: 1) musical measure models, representing a many-to-one relation, each of which associates different melody patterns to a given dance figure via a hidden Markov model (HMM); 2) exchangeable figures model, which captures the diversity in a dance performance through a one-to-many relation, extracted by unsupervised clustering of musical measure segments based on melodic similarity; 3) figure transition model, which captures the intrinsic dependencies of dance figure sequences via an n-gram model; 4) dance figure models, capturing the variations in the way particular dance figures are performed, by modeling the motion trajectory of each dance figure via an HMM. Based on the first three of these statistical mappings, we define a discrete HMM and synthesize alternative dance figure sequences by employing a modified Viterbi algorithm. The motion parameters of the dance figures in the synthesized choreography are then computed using the dance figure models. Finally, the generated motion parameters are animated synchronously with the musical audio using a 3-D character model. Objective and subjective evaluation results demonstrate that the proposed framework is able to produce compelling music-driven choreographies.
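The decoding step, finding the most likely dance-figure sequence under the discrete HMM, is standard Viterbi; a minimal log-domain sketch (the paper employs a modified variant for diversity, which is not reproduced here; names and shapes are illustrative):

```python
import numpy as np

def viterbi(log_init, log_trans, log_obs):
    """Viterbi decoding of the most likely hidden-state (dance-figure)
    sequence, in log domain.

    log_init: (S,) initial log-probabilities; log_trans: (S, S)
    transition log-probabilities; log_obs: (T, S) per-frame observation
    log-likelihoods under each state. Returns the best state path.
    """
    T, S = log_obs.shape
    delta = log_init + log_obs[0]          # best score ending in each state
    back = np.zeros((T, S), dtype=int)     # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # (from_state, to_state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_obs[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```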