Hedvig Kjellström
Royal Institute of Technology
Publications
Featured research published by Hedvig Kjellström.
Computer Vision and Image Understanding | 2011
Hedvig Kjellström; Javier Romero; Danica Kragic
This paper investigates object categorization according to function, i.e., learning the affordances of objects from human demonstration. Object affordances (functionality) are inferred from observations of humans using the objects in different types of actions. The intended application is learning from demonstration, in which a robot learns to employ objects in household tasks from observing a human performing the same tasks with the objects. We present a method for categorizing manipulated objects and human manipulation actions in the context of each other. The method is able to simultaneously segment and classify human hand actions, and to detect and classify the objects involved in the action. This can serve as an initial step in a learning from demonstration method. Experiments show that the contextual information improves the classification of both objects and actions.
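As a minimal illustration of how contextual information can couple object and action classification, the sketch below combines independent action and object scores with a learned action-object co-occurrence prior and picks the jointly most likely pair; all class names, scores, and the prior are hypothetical, and this is not the paper's method.

```python
# Minimal sketch (not the authors' implementation): joint action-object
# classification, where an action-object co-occurrence prior couples two
# independent classifiers. All names and numbers below are hypothetical.
import numpy as np

actions = ["pour", "drink", "open"]
objects = ["cup", "bottle", "box"]

# Per-class log-likelihoods from independent action / object classifiers.
action_ll = np.log(np.array([0.5, 0.3, 0.2]))
object_ll = np.log(np.array([0.2, 0.3, 0.5]))

# Assumed co-occurrence prior P(action, object), e.g. estimated from training data.
cooc = np.array([[0.30, 0.10, 0.01],
                 [0.25, 0.20, 0.01],
                 [0.01, 0.02, 0.10]])
cooc = cooc / cooc.sum()

# Joint score: the contextual term couples the two labeling problems.
joint = action_ll[:, None] + object_ll[None, :] + np.log(cooc)
a_idx, o_idx = np.unravel_index(np.argmax(joint), joint.shape)
print("action:", actions[a_idx], "object:", objects[o_idx])
```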
European Conference on Computer Vision | 2008
Hedvig Kjellström; Javier Romero; David Martínez; Danica Kragic
The visual analysis of human manipulation actions is of interest for, e.g., human-robot interaction applications, where a robot learns how to perform a task by watching a human. This paper presents a method for classifying manipulation actions in the context of the objects manipulated, and for classifying objects in the context of the actions used to manipulate them. Hand and object features are extracted from the video sequence using a segmentation-based approach. A shape-based representation is used for both the hand and the object. Experiments show that this representation is suitable for representing generic shape classes. The action-object correlation over time is then modeled using conditional random fields. Experimental comparisons show a large improvement in classification rate when the action-object correlation is taken into account, compared to separate classification of manipulation actions and manipulated objects.
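The abstract above mentions conditional random fields over the action-object correlation in time. The toy below sketches the underlying idea with a Viterbi decoder over joint (action, object) states, combining unary classifier scores, an action-object compatibility term, and a temporal smoothness term; all labels, scores, and potentials are hypothetical, and this is not the paper's model.

```python
# Toy Viterbi decoding over joint (action, object) states, illustrating
# chain-structured joint labeling; not the paper's CRF.
import itertools
import numpy as np

actions = ["reach", "grasp", "pour"]
objects = ["cup", "bottle"]
states = list(itertools.product(range(len(actions)), range(len(objects))))

T = 5
rng = np.random.default_rng(0)
# Hypothetical per-frame unary scores from hand and object classifiers.
action_scores = rng.random((T, len(actions)))
object_scores = rng.random((T, len(objects)))
# Assumed action-object compatibility (higher = more plausible together).
compat = np.array([[1.0, 1.0],
                   [1.5, 1.0],
                   [2.0, 0.5]])

def unary(t, s):
    a, o = s
    return action_scores[t, a] + object_scores[t, o] + compat[a, o]

def pairwise(s_prev, s_next):
    # Temporal smoothness: reward keeping the same joint label.
    return 1.0 if s_prev == s_next else 0.0

# Viterbi recursion over the joint state space.
delta = np.array([unary(0, s) for s in states])
back = []
for t in range(1, T):
    trans = np.array([[pairwise(sp, sn) for sn in states] for sp in states])
    scores = delta[:, None] + trans
    back.append(scores.argmax(axis=0))
    delta = scores.max(axis=0) + np.array([unary(t, s) for s in states])

# Backtrack the jointly most likely label sequence.
path = [int(delta.argmax())]
for bp in reversed(back):
    path.append(int(bp[path[-1]]))
path.reverse()
print([(actions[states[i][0]], objects[states[i][1]]) for i in path])
```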
International Conference on Robotics and Automation | 2010
Javier Romero; Hedvig Kjellström; Danica Kragic
This paper presents a method for vision-based estimation of the pose of human hands in interaction with objects. Although most robotics applications of human hand tracking involve grasping and manipulation of objects, the majority of methods in the literature assume a free hand, isolated from the surrounding environment. Our hand tracking method is non-parametric, performing a nearest neighbor search in a large database (100,000 entries) of hand poses with and without grasped objects. The system operates in real time, is robust to self-occlusions, object occlusions, and segmentation errors, and provides full hand pose reconstruction from markerless video. Temporal consistency in hand pose is taken into account without explicitly tracking the hand in the high-dimensional pose space.
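As a rough illustration of the non-parametric lookup described above, the sketch below estimates a hand pose by nearest neighbor search over a synthetic database; the database size follows the abstract (100,000 entries), but the feature and pose dimensions, descriptors, and the averaging over k neighbors are assumptions rather than the authors' implementation.

```python
# Minimal sketch (assumed, not the authors' code) of non-parametric hand
# pose estimation by nearest-neighbor lookup in a pose database.
import numpy as np

rng = np.random.default_rng(1)
N, FEAT_DIM, POSE_DIM = 100_000, 64, 31    # database size as in the abstract; dims hypothetical
db_features = rng.random((N, FEAT_DIM))    # appearance descriptors of database entries
db_poses = rng.random((N, POSE_DIM))       # corresponding joint-angle configurations

def estimate_pose(query_feature, k=5):
    """Return the average pose of the k nearest database entries."""
    dists = np.linalg.norm(db_features - query_feature, axis=1)
    idx = np.argpartition(dists, k)[:k]
    return db_poses[idx].mean(axis=0)

query = rng.random(FEAT_DIM)
print(estimate_pose(query).shape)
```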
IEEE-RAS International Conference on Humanoid Robots | 2009
Javier Romero; Hedvig Kjellström; Danica Kragic
Markerless, vision-based estimation of human hand pose over time is a prerequisite for a number of robotics applications, such as learning by demonstration (LbD), health monitoring, teleoperation, and human-robot interaction. It is of special interest for humanoid platforms, where the number of degrees of freedom makes conventional programming challenging. Our primary application is LbD in natural environments, where the humanoid robot learns how to grasp and manipulate objects by observing a human performing a task. This paper presents a method for continuous vision-based estimation of human hand pose. The method is non-parametric, performing a nearest neighbor search in a large database (100,000 entries) of hand pose examples. The main contribution is a real-time system, robust to partial occlusions and segmentation errors, that provides full hand pose recognition from markerless data. An additional contribution is the modeling of constraints based on temporal consistency in hand pose, without explicitly tracking the hand in the high-dimensional pose space. The pose representation is rich enough to enable a descriptive human-to-robot mapping. Experiments show the pose estimation to be more robust and accurate than a non-parametric method without temporal constraints.
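The sketch below illustrates one plausible way to impose the temporal consistency mentioned above without tracking in the high-dimensional pose space: appearance-based nearest neighbor candidates are re-scored by their closeness to the previous pose estimate. The weighting scheme and all parameters are assumptions, not the paper's exact method.

```python
# Hedged sketch: temporal consistency as a re-scoring of nearest-neighbor
# candidates, rather than explicit tracking in pose space.
import numpy as np

rng = np.random.default_rng(2)
N, FEAT_DIM, POSE_DIM = 100_000, 64, 31
db_features = rng.random((N, FEAT_DIM))
db_poses = rng.random((N, POSE_DIM))

def estimate_pose_temporal(query_feature, prev_pose, k=50, lam=0.5):
    """Pick, among k appearance neighbors, the pose closest to the previous estimate."""
    dists = np.linalg.norm(db_features - query_feature, axis=1)
    idx = np.argpartition(dists, k)[:k]                 # appearance-based candidates
    pose_dists = np.linalg.norm(db_poses[idx] - prev_pose, axis=1)
    scores = dists[idx] + lam * pose_dists              # appearance + temporal term (assumed weighting)
    return db_poses[idx[np.argmin(scores)]]

prev_pose = rng.random(POSE_DIM)
print(estimate_pose_temporal(rng.random(FEAT_DIM), prev_pose).shape)
```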
Image and Vision Computing | 2005
Dirk Ormoneit; Michael J. Black; Trevor Hastie; Hedvig Kjellström
We present a robust automatic method for modeling cyclic 3D human motion, such as walking, using motion-capture data. The pose of the body is represented by a time series of joint angles, which is automatically segmented into a sequence of motion cycles. The mean and the principal components of these cycles are computed using a new algorithm that enforces smooth transitions between the cycles by operating in the Fourier domain. Key to this method is its ability to automatically deal with noise and missing data. A learned walking model is then exploited for Bayesian tracking of 3D human motion.
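A minimal sketch of the general approach, assuming cycles that are already segmented and resampled to a common length: each joint-angle cycle is represented by a few Fourier coefficients (which enforces smooth, periodic reconstructions), and the collection is summarized by its mean and principal components. This is illustrative only, not the paper's algorithm.

```python
# Illustrative sketch: a cyclic motion model as mean + principal components
# of joint-angle cycles represented in the Fourier domain.
import numpy as np

rng = np.random.default_rng(3)
n_cycles, cycle_len, n_joints = 20, 64, 12
# Hypothetical stack of segmented walking cycles (cycles x time x joints).
cycles = np.sin(np.linspace(0, 2 * np.pi, cycle_len))[None, :, None] \
         + 0.05 * rng.standard_normal((n_cycles, cycle_len, n_joints))

# Keep a few harmonics per joint: smooth, periodic cycle representation.
n_harmonics = 5
coeffs = np.fft.rfft(cycles, axis=1)[:, :n_harmonics, :]
flat = coeffs.reshape(n_cycles, -1)

# Mean cycle and principal components in the Fourier domain
# (real and imaginary parts stacked for the PCA).
mean = flat.mean(axis=0)
centered = np.concatenate([flat.real - mean.real, flat.imag - mean.imag], axis=1)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
principal_components = Vt[:3]
print(principal_components.shape)
```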
Conference on Computers and Accessibility | 2005
Olle Bälter; Olov Engwall; Anne-Marie Öster; Hedvig Kjellström
This study was performed to test the human-machine interface of a computer-based speech training aid named ARTUR, whose main feature is that it can give suggestions on how to improve articulation. Two user groups were involved: three children aged 9-14 with extensive experience of speech training, and three children aged 6. All children had general language disorders. The study indicates that the present interface is usable without prior training or instructions, even for the younger children, although it needs some improvement to accommodate children who cannot yet read. The granularity of the mesh that classifies mispronunciations was satisfactory, but can be developed further.
Conference on Computers and Accessibility | 2006
Olov Engwall; Olle Bälter; Anne-Marie Öster; Hedvig Kjellström
This study was performed to evaluate a prototype of the human-computer interface of a computer-based speech training aid named ARTUR. The main feature of the aid is that it can give suggestions on how to improve articulations. Two user groups were involved: three children aged 9-14 with extensive experience of speech training with therapists and computers, and three children aged 6 with little or no prior experience of computer-based speech training. All children had general language disorders. The study indicates that the present interface is usable without prior training or instructions, even for the younger children, but that more motivational factors should be introduced. The granularity of the mesh that classifies mispronunciations was satisfactory, but the flexibility and level of detail of the feedback should be developed further.
Information Fusion | 2007
Simon Ahlberg; Pontus Hörling; Katarina Johansson; Karsten Jored; Hedvig Kjellström; Christian Mårtenson; Göran Neider; Johan Schubert; Pontus Svenson; Per Svensson; Johan Walter
The Swedish Defence Research Agency (FOI) has developed a concept demonstrator called the Information Fusion Demonstrator 2003 (IFD03) for demonstrating information fusion methodology suitable for a future Network Based Defence (NBD) C4ISR system. The focus of the demonstrator is on real-time tactical intelligence processing at the division level in a ground warfare scenario. The demonstrator integrates novel force aggregation, particle filtering, and sensor allocation methods to create, dynamically update, and maintain components of a tactical situation picture. This is achieved by fusing physically modelled and numerically simulated sensor reports from several different sensor types with realistic a priori information sampled from both a high-resolution terrain model and an enemy organizational and behavioral model. This represents a key step toward the goal of creating, in real time, a dynamic, high-fidelity representation of a moving battalion-sized organization, based on sensor data as well as a priori intelligence and terrain information, employing fusion, tracking, aggregation, and resource allocation methods all built on well-founded theories of uncertainty. The motives behind this project, the fusion methods developed for the system, and its scenario model and simulator architecture are described. The main services of the demonstrator are discussed, and early experience from using the system is shared.
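As a small illustration of the particle filtering component mentioned above, the toy below runs a one-dimensional bootstrap particle filter that fuses noisy position reports on a moving unit; it is a generic textbook sketch, not the IFD03 implementation.

```python
# Toy bootstrap particle filter: track a unit moving at constant speed
# from noisy position reports. Illustrative only.
import numpy as np

rng = np.random.default_rng(4)
n_particles, n_steps = 500, 20
true_pos = 0.0
particles = rng.normal(0.0, 1.0, n_particles)        # 1D positions for simplicity
weights = np.full(n_particles, 1.0 / n_particles)

for t in range(n_steps):
    true_pos += 1.0                                   # the unit advances one step
    report = true_pos + rng.normal(0.0, 0.5)          # noisy sensor report
    particles += 1.0 + rng.normal(0.0, 0.3, n_particles)            # motion model
    weights *= np.exp(-0.5 * ((report - particles) / 0.5) ** 2)     # measurement likelihood
    weights /= weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < n_particles / 2:
        idx = rng.choice(n_particles, n_particles, p=weights)
        particles = particles[idx]
        weights = np.full(n_particles, 1.0 / n_particles)

print("estimate:", np.sum(weights * particles), "truth:", true_pos)
```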
Computer Vision and Pattern Recognition | 2010
Hedvig Kjellström; Danica Kragic; Michael J. Black
While the problem of tracking 3D human motion has been widely studied, most approaches have assumed that the person is isolated and not interacting with the environment. Environmental constraints, however, can greatly constrain and simplify the tracking problem. The most studied constraints involve gravity and contact with the ground plane. We go further to consider interaction with objects in the environment. In many cases, tracking rigid environmental objects is simpler than tracking high-dimensional human motion. When a human is in contact with objects in the world, their poses constrain the pose of the body, essentially removing degrees of freedom. Thus what would appear to be a harder problem, combining object and human tracking, is actually simpler. We use a standard formulation of the body tracking problem but add an explicit model of contact with objects. We find that constraints from the world make it possible to track complex articulated human motion in 3D from a monocular camera.
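A hedged sketch of how a contact constraint can enter a standard pose-scoring step: candidate poses are evaluated by an image likelihood plus a penalty on the distance between the contacting hand and the tracked object. The likelihood, forward kinematics, and parameters below are hypothetical placeholders, not the paper's formulation.

```python
# Illustrative contact-constrained pose scoring: all functions are stand-ins.
import numpy as np

def image_likelihood(pose):
    # Placeholder for a real likelihood of the pose given the image.
    return -np.sum(pose ** 2)

def hand_position(pose):
    # Placeholder forward kinematics: map pose parameters to a 3D hand point.
    return pose[:3]

def contact_score(pose, object_point, sigma=0.02):
    # Contact constraint: the contacting hand should coincide with the object.
    d = np.linalg.norm(hand_position(pose) - object_point)
    return -(d ** 2) / (2 * sigma ** 2)

rng = np.random.default_rng(5)
object_point = np.array([0.1, 0.0, 0.9])          # position of the tracked object
candidates = rng.normal(0.0, 0.5, (200, 40))      # hypothetical body pose samples
scores = [image_likelihood(p) + contact_score(p, object_point) for p in candidates]
best_pose = candidates[int(np.argmax(scores))]
print(best_pose.shape)
```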
International Conference on Robotics and Automation | 2013
Alessandro Pieropan; Carl Henrik Ek; Hedvig Kjellström
The ability to learn from human demonstration is essential for robots in human environments. The activity models that the robot builds from observation must take both the human motion and the objects involved into account. Object models designed for this purpose should reflect the role of the object in the activity - its function, or affordances. The main contribution of this paper is to represent objects directly in terms of their interaction with human hands, rather than in terms of appearance. This enables the direct representation of object affordances/function, while being robust to intra-class differences in appearance. Object hypotheses are first extracted from a video sequence as tracks of associated image segments. The object hypotheses are encoded as strings, where the vocabulary corresponds to different types of interaction with human hands. The similarity between two such object descriptors can be measured using a string kernel. Experiments show these functional descriptors to capture differences and similarities in object affordances/function that are not represented by appearance.
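To illustrate the string-kernel idea, the sketch below encodes object hypotheses as strings over an assumed vocabulary of hand-interaction symbols and compares them with a simple p-spectrum kernel; the symbols and the kernel choice are illustrative assumptions, not the paper's exact descriptor.

```python
# Illustrative sketch: object hypotheses as strings of hand-interaction
# symbols, compared with a p-spectrum string kernel.
from collections import Counter

# Assumed vocabulary: 'i' = idle, 'a' = approached, 'g' = grasped, 'm' = moved
obj_a = "iiaaggmmgg"   # e.g. an object that is grasped and moved
obj_b = "iiaagggmmg"
obj_c = "iiiiiiaaaa"   # e.g. an object only approached, never grasped

def spectrum_kernel(s, t, p=3):
    """Count matching substrings of length p (the p-spectrum kernel)."""
    cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    ct = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    return sum(cs[k] * ct[k] for k in cs)

print(spectrum_kernel(obj_a, obj_b))  # functionally similar objects score high
print(spectrum_kernel(obj_a, obj_c))  # different affordances score low
```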