Atsuhiro Kojima
Osaka Prefecture University
Publications
Featured research published by Atsuhiro Kojima.
International Journal of Computer Vision | 2002
Atsuhiro Kojima; Takeshi Tamura; Kunio Fukunaga
We propose a method for describing human activities from video images based on concept hierarchies of actions. The major difficulty in transforming video images into textual descriptions is bridging the semantic gap between them, a task also known as the inverse Hollywood problem. In general, the concepts of human events and actions can be classified by semantic primitives. By associating these concepts with the semantic features extracted from video images, appropriate syntactic components such as verbs and objects are determined and then translated into natural language sentences. We demonstrate the performance of the proposed method through several experiments.
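The step from semantic features to syntactic components can be pictured as filling a case frame. Below is a minimal Python sketch of that idea; the feature names, the tiny action-concept table, and the surface templates are illustrative assumptions, not the paper's actual representation.

    # Minimal sketch (not the authors' code): mapping semantic features extracted
    # from video to a case frame and rendering a simple English sentence. The
    # feature names and the action-concept table are assumptions for illustration.

    # Each action concept is keyed by semantic primitives and carries a verb
    # plus the syntactic slots it expects.
    ACTION_CONCEPTS = {
        ("move", "object_in_hand"):    {"verb": "carries", "slots": ["agent", "object"]},
        ("move", "no_object"):         {"verb": "walks",   "slots": ["agent"]},
        ("contact", "object_in_hand"): {"verb": "puts",    "slots": ["agent", "object", "location"]},
    }

    def describe(semantic_features, entities):
        """Select a case frame from semantic primitives and fill its slots."""
        key = (semantic_features["motion_type"], semantic_features["hand_state"])
        frame = ACTION_CONCEPTS.get(key)
        if frame is None:
            return "The person does something."
        words = [entities.get("agent", "the person"), frame["verb"]]
        if "object" in frame["slots"]:
            words.append(entities.get("object", "something"))
        if "location" in frame["slots"]:
            words.append("on " + entities.get("location", "the table"))
        return " ".join(words) + "."

    print(describe({"motion_type": "contact", "hand_state": "object_in_hand"},
                   {"agent": "the person", "object": "a cup", "location": "the desk"}))
    # -> "the person puts a cup on the desk."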
International Conference on Pattern Recognition | 2000
Atsuhiro Kojima; Masao Izumi; Takeshi Tamura; Kunio Fukunaga
In visual surveillance applications, it is becoming popular to perceive video images and to interpret them using natural language concepts. We propose an approach to generating natural language descriptions of human behavior appearing in real video images. First, the head region of a human, standing in for the whole body, is extracted from each frame. Using a model-based method, the three-dimensional pose and position of the head are estimated. Next, the trajectory of these parameters is divided into segments of monotonic motion. For each segment, we evaluate conceptual features such as the degree of change in pose and position and in the relative distance to surrounding objects. By computing the product of these feature values, the most suitable verb is selected and the other syntactic elements are supplied. Finally, natural language text is generated using machine translation techniques.
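The verb-selection step can be illustrated with a small sketch: each candidate verb is scored by the product of its conceptual feature values against the observed segment, and the best-scoring verb is chosen. The verb table and feature names below are assumptions for illustration only.

    # Minimal sketch (assumed, not the paper's implementation): each verb concept
    # is scored by the product of conceptual feature agreements evaluated on one
    # motion segment; the verb with the highest product is selected.

    VERB_FEATURES = {
        # verb: expected degree of each conceptual feature, in [0, 1]
        "approach": {"position_change": 0.9, "pose_change": 0.2, "distance_decrease": 0.9},
        "turn":     {"position_change": 0.1, "pose_change": 0.9, "distance_decrease": 0.3},
        "stop":     {"position_change": 0.1, "pose_change": 0.1, "distance_decrease": 0.1},
    }

    def select_verb(segment_features):
        """Pick the verb whose feature product best matches the observed segment."""
        best_verb, best_score = None, -1.0
        for verb, expected in VERB_FEATURES.items():
            score = 1.0
            for name, observed in segment_features.items():
                # agreement between the observed degree and the verb's expected degree
                score *= 1.0 - abs(observed - expected.get(name, 0.5))
            if score > best_score:
                best_verb, best_score = verb, score
        return best_verb, best_score

    print(select_verb({"position_change": 0.8, "pose_change": 0.3, "distance_decrease": 0.85}))
    # -> ('approach', ...)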
International Conference on Pattern Recognition | 2002
Atsuhiro Kojima; Takeshi Tamura; Kunio Fukunaga
We propose a method for describing human activities from video images by tracking human skin regions: facial and hand regions. To detect skin regions robustly, three kinds of probabilistic information are extracted and integrated using Dempster-Shafer theory. The main difficulty in transforming video images into textual descriptions is bridging the semantic gap between them. By associating visual features of head and hand motion with natural language concepts, appropriate syntactic components such as verbs, objects, etc. are determined and translated into natural language.
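The fusion of the three probabilistic cues follows Dempster's rule of combination. The sketch below shows that rule over a two-element frame {skin, background}; the choice of cues and the mass values are illustrative assumptions, not taken from the paper.

    # Minimal sketch of Dempster's rule of combination for fusing skin-region
    # cues. Only the combination rule itself follows Dempster-Shafer theory;
    # the three cues and their mass values are assumptions for illustration.

    from itertools import product

    FRAME = frozenset({"skin", "bg"})  # frame of discernment

    def combine(m1, m2):
        """Dempster's rule for two basic mass assignments over subsets of FRAME."""
        combined, conflict = {}, 0.0
        for (a, wa), (b, wb) in product(m1.items(), m2.items()):
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb
        if conflict >= 1.0:
            raise ValueError("total conflict; masses cannot be combined")
        return {s: w / (1.0 - conflict) for s, w in combined.items()}

    # Masses from three independent cues for one region (illustrative numbers).
    colour   = {frozenset({"skin"}): 0.6, frozenset({"bg"}): 0.1, FRAME: 0.3}
    motion   = {frozenset({"skin"}): 0.4, frozenset({"bg"}): 0.2, FRAME: 0.4}
    position = {frozenset({"skin"}): 0.5, frozenset({"bg"}): 0.1, FRAME: 0.4}

    fused = combine(combine(colour, motion), position)
    print(fused[frozenset({"skin"})])  # belief committed exactly to "skin"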
International Conference on Pattern Recognition | 2004
Mirai Higuchi; Shigeki Aoki; Atsuhiro Kojima; Kunio Fukunaga
In this paper, we propose a novel method for scene recognition from video images through the analysis of human activities. We aim to recognize three kinds of entities: human activities, objects, and the environment. In previous methods, the locations and orientations of objects are estimated using shape models, which are often dependent on the individual scene. Instead of shape models, we employ conceptual knowledge about the function and usage of objects as well as about human actions. In our method, the location and usage of objects can be identified by observing how humans interact with them.
International Conference on Innovative Computing, Information and Control | 2008
Atsuhiro Kojima; Mamoru Takaya; Shigeki Aoki; Takao Miyamoto; Kunio Fukunaga
In this paper, we propose a method for recognizing human actions and objects and translating them into natural language text. First, a 3D environmental map is constructed by accumulating range maps captured by a 3D range sensor mounted on a mobile robot. Then, the pose of a person in the scene is estimated by fitting an articulated cylindrical model, and objects are recognized by matching 3D models. When the person handles an object, the interaction with that object is classified. Finally, using a conceptual model representing human actions and related objects, the natural language expression that best explains the person's action is generated.
IEEE Conference on Cybernetics and Intelligent Systems | 2004
Shigeki Aoki; Masaki Onishi; Atsuhiro Kojima; Kunio Fukunaga
In general, it is possible to find certain behavioral patterns in human daily activity. Such patterns are called daily behavioral patterns. The purpose of this research is to learn and recognize such behavioral patterns. Previous methods find it difficult to recognize in detail how a person acts in a room because they recognize only the sequence of the person's positions, using information from infrared sensors or from the on/off switching of electrical appliances. On the other hand, many methods have been proposed for recognizing human motions from image sequences, most of which require motion models to be prepared in advance. In this paper, we propose a method for learning and recognizing human motions without any motion models. In addition, we propose methods for recognizing behavioral patterns that take into account not only the sequence of positions but also the sequence of motions. Experiments show that our approach is able to learn and recognize human behavior and confirm the effectiveness of our method.
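One way to realize model-free learning of motions, sketched below under assumptions that go beyond the abstract, is to quantize motion features into symbols by online clustering (so no motion models are prepared in advance) and to compare behavioral patterns as symbol sequences.

    # Minimal sketch (an assumption, not the paper's algorithm): motion features
    # are quantized into symbols by simple online clustering, and behavioral
    # patterns are compared as symbol sequences using edit distance.

    import math

    def online_quantize(features, threshold=1.0):
        """Assign each feature vector to the nearest existing cluster,
        creating a new cluster (symbol) when nothing is close enough."""
        centers, symbols = [], []
        for f in features:
            dists = [math.dist(f, c) for c in centers]
            if dists and min(dists) < threshold:
                symbols.append(dists.index(min(dists)))
            else:
                centers.append(list(f))
                symbols.append(len(centers) - 1)
        return symbols, centers

    def sequence_distance(a, b):
        """Edit distance between two symbol sequences (positions or motions)."""
        dp = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
              for i in range(len(a) + 1)]
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1,
                               dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
        return dp[len(a)][len(b)]

    # Two motion-feature sequences quantized with the same clusters can then be
    # compared; a small distance indicates the same daily behavioral pattern.
    symbols, _ = online_quantize([(0.0, 0.1), (0.1, 0.0), (2.0, 2.1), (2.1, 2.0)])
    print(symbols)                                 # [0, 0, 1, 1]
    print(sequence_distance(symbols, [0, 1, 1]))   # 1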
International Conference on Pattern Recognition | 2006
Masakatsu Mitani; Mamoru Takaya; Atsuhiro Kojima; Kunio Fukunaga
In this paper, we propose a novel method for environment recognition by a mobile robot based on the relationship between human actions and objects. Most previous work on environment recognition for robots has focused on generating obstacle maps for path planning. In addition, model-based object recognition techniques are used for finding particular objects. It is, however, difficult in practice to prepare many models in advance for recognizing the various objects in unknown environments. On the other hand, humans can often recognize objects not from their appearance but by watching another person act on them, because the function and usage of objects are closely related to human actions. In our previous work, we introduced conceptual models of human actions and objects for classifying objects by observing human activities. In this paper, we apply this idea to a mobile robot. We also demonstrate that the arrangement of objects can be recognized by analyzing human actions.
International Conference on Innovative Computing, Information and Control | 2006
Masahiro Saitou; Atsuhiro Kojima; Tadahiro Kitahashi; Kunio Fukunaga
In the field of image understanding, it is becoming popular to recognize human actions and objects in a scene to improve human-computer interaction. Various methods for scene recognition have recently been proposed, and most of them recognize human actions and objects separately. We consider it important, however, to recognize them in a complementary manner, because human actions are closely related to objects. The relationships between actions and objects are represented by hierarchical models. First, the movements of the human head and hands are tracked by stereo vision. Action features such as the position and direction of the head and hands are extracted and input to dynamic Bayesian networks to classify actions coarsely. Then the actions and related objects are refined using conceptual models of human actions and objects. Finally, detailed actions and objects are recognized by referring to the models cooperatively.
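As a rough stand-in for the dynamic Bayesian network stage, the sketch below uses a simple HMM-style forward filter over a few coarse action states; the state set, transition matrix, and observation likelihoods are illustrative assumptions rather than the paper's model.

    # Minimal sketch: the dynamic Bayesian network is reduced here to an
    # HMM-style forward filter over coarse action states, updated from
    # quantized head/hand observations. All numbers are illustrative.

    STATES = ["reach", "grasp", "release"]

    # P(next state | current state)
    TRANS = {
        "reach":   {"reach": 0.7, "grasp": 0.25, "release": 0.05},
        "grasp":   {"reach": 0.1, "grasp": 0.7,  "release": 0.2},
        "release": {"reach": 0.3, "grasp": 0.1,  "release": 0.6},
    }

    # P(observation symbol | state), observations from quantized hand motion
    OBS = {
        "reach":   {"hand_moving": 0.7, "hand_still": 0.2, "hand_near_object": 0.1},
        "grasp":   {"hand_moving": 0.2, "hand_still": 0.2, "hand_near_object": 0.6},
        "release": {"hand_moving": 0.4, "hand_still": 0.5, "hand_near_object": 0.1},
    }

    def forward_filter(observations):
        """Return the posterior over coarse action states after each observation."""
        belief = {s: 1.0 / len(STATES) for s in STATES}
        history = []
        for o in observations:
            predicted = {s: sum(belief[p] * TRANS[p][s] for p in STATES) for s in STATES}
            unnorm = {s: predicted[s] * OBS[s][o] for s in STATES}
            z = sum(unnorm.values())
            belief = {s: v / z for s, v in unnorm.items()}
            history.append(belief)
        return history

    for step in forward_filter(["hand_moving", "hand_near_object", "hand_still"]):
        print(max(step, key=step.get), step)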
International Conference on Pattern Recognition | 2010
Atsuhiro Kojima; Hiroshi Miki; Koichi Kise
In this paper, we propose a novel method for recognizing objects by observing human actions, based on a bag-of-features representation. The key contribution of our method is that human actions are represented as n-grams of symbols and used to identify specific object categories. First, features of the human actions taken on an object are extracted from video images and encoded as symbols. Then, n-grams are generated from the sequence of symbols and registered for the corresponding object category. In the recognition phase, the actions taken on an object are converted into a set of n-grams in the same way and compared with those representing each object category. We performed experiments on object recognition in an office environment and confirmed the effectiveness of our method.
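The registration and recognition phases can be sketched in a few lines: action symbols observed around an object are turned into n-grams, stored per category, and a query sequence is matched by set overlap. The symbol alphabet, the example categories, and the Jaccard score below are assumptions for illustration.

    # Minimal sketch of the n-gram idea: action-feature symbols observed around
    # an object are turned into n-grams, registered per object category, and a
    # query sequence is matched by set overlap (Jaccard similarity here).

    def ngrams(symbols, n=3):
        """Set of n-grams from a sequence of action symbols."""
        return {tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)}

    # Registration phase: n-gram sets per object category (illustrative sequences).
    REGISTERED = {
        "chair":    ngrams(list("ASDSA")),   # e.g. approach, sit, stay, stand, leave
        "keyboard": ngrams(list("ARTTTR")),  # e.g. approach, reach, type, ...
    }

    def recognize(observed_symbols, n=3):
        """Compare the query n-gram set with each registered category."""
        query = ngrams(observed_symbols, n)
        scores = {}
        for category, reference in REGISTERED.items():
            union = query | reference
            scores[category] = len(query & reference) / len(union) if union else 0.0
        return max(scores, key=scores.get), scores

    print(recognize(list("ARTTR")))  # -> ('keyboard', {...})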
International Conference on Innovative Computing, Information and Control | 2007
Atsuhiro Kojima; Shigeki Aoki; Takao Miyamoto; Kunio Fukunaga
In this paper, we propose a method for automatically generating natural language annotations from video sequences taken with a handheld camera. In general, video images and natural language are completely different forms of information. The main challenge is bridging the gap between them and translating one into the other. In our method, an appropriate predicate for expressing the video content is selected by combining semantic features extracted from the video sequence. It is, however, difficult to select the most appropriate words for expressing the content, so we construct a concept hierarchy of motions and situations to select verbs from coarse to fine. We performed experiments to test the effectiveness of the proposed method and confirmed that appropriate annotations are generated.
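The coarse-to-fine verb selection can be pictured as a descent through the concept hierarchy that stops when no finer concept matches confidently. The hierarchy, feature names, and threshold in the sketch below are illustrative assumptions, not the paper's lexicon.

    # Minimal sketch of coarse-to-fine verb selection over a concept hierarchy.
    # Only the descent strategy follows the idea described above; the hierarchy
    # and the numbers are assumptions for illustration.

    HIERARCHY = {
        "move": {"children": ["walk", "run"], "features": {"speed": 0.5}},
        "walk": {"children": [],              "features": {"speed": 0.3}},
        "run":  {"children": [],              "features": {"speed": 0.9}},
    }

    def match(node, observed):
        """Similarity between observed features and a concept node's features."""
        feats = HIERARCHY[node]["features"]
        return 1.0 - sum(abs(observed.get(k, 0.5) - v) for k, v in feats.items()) / len(feats)

    def select_verb(observed, root="move", threshold=0.8):
        """Descend from a coarse concept to finer ones while the match stays high;
        fall back to the coarser (but safer) verb otherwise."""
        node = root
        while True:
            children = HIERARCHY[node]["children"]
            if not children:
                return node
            best = max(children, key=lambda c: match(c, observed))
            if match(best, observed) < threshold:
                return node  # stay coarse rather than risk a wrong specific verb
            node = best

    print(select_verb({"speed": 0.85}))  # -> "run"
    print(select_verb({"speed": 0.6}))   # -> "move" (neither child is confident)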