
Publications


Featured research published by Ryosuke Kojima.


Intelligent Robots and Systems | 2016

Semi-automatic bird song analysis by spatial-cue-based integration of sound source detection, localization, separation, and identification

Ryosuke Kojima; Osamu Sugiyama; Reiji Suzuki; Kazuhiro Nakadai; Charles E. Taylor

This paper addresses bird song analysis based on semi-automatic annotation. Research in animal behavior, especially with birds, would be aided by automated (or semi-automated) systems that can localize sounds, measure their timing, and identify their source. This is difficult to achieve in real environments, where several birds may be singing from different locations at the same time. Analysis of recordings from the wild has in the past typically required manual annotation. Such annotation is not always accurate or even consistent, as it may vary both within and between observers. Here we propose a system that uses automated methods from robot audition, including sound source detection, localization, separation, and identification. In robot audition these technologies have typically been studied separately; combining them often leads to poor performance in real-world applications. We suggest that integration is aided by placing a primary focus on spatial cues and then combining other features within a Bayesian framework. A second problem has been that supervised machine learning methods typically require a pre-trained model, which may demand a large training set of annotated labels. We employed a semi-automatic annotation approach that requires much less pre-annotation. Preliminary experiments with recordings of bird songs from the wild revealed that our system outperformed a method based on conventional robot audition in identification accuracy.
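As a rough illustration of the spatial-cue-first integration described above, the sketch below combines a direction-of-arrival likelihood with an acoustic-feature likelihood in a naive Bayes fashion. All names, distributions, and parameters here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gaussian_likelihood(x, mean, std):
    # Simple Gaussian model for a scalar cue.
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def posterior_over_birds(doa_deg, feature, birds):
    """Posterior over candidate birds, combining a spatial cue
    (direction of arrival) with one acoustic feature, assuming the
    two are conditionally independent given the bird."""
    scores = []
    for b in birds:
        p_spatial = gaussian_likelihood(doa_deg, b["doa_mean"], b["doa_std"])
        p_feature = gaussian_likelihood(feature, b["feat_mean"], b["feat_std"])
        scores.append(b["prior"] * p_spatial * p_feature)
    scores = np.array(scores)
    return scores / scores.sum()  # normalize to a posterior

# Two hypothetical birds with distinct song posts and feature profiles.
birds = [
    {"prior": 0.5, "doa_mean": 30.0, "doa_std": 10.0, "feat_mean": 4.0, "feat_std": 1.0},
    {"prior": 0.5, "doa_mean": 120.0, "doa_std": 10.0, "feat_mean": 7.0, "feat_std": 1.0},
]
print(posterior_over_birds(doa_deg=35.0, feature=4.2, birds=birds))
```

The spatial term dominates when song posts are well separated, which is the intuition behind putting spatial cues first and letting other features refine the decision.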


Journal of Robotics and Mechatronics | 2017

Acoustic Monitoring of the Great Reed Warbler Using Multiple Microphone Arrays and Robot Audition

Shiho Matsubayashi; Reiji Suzuki; Fumiyuki Saito; Tatsuyoshi Murate; Tomohisa Masuda; Koichi Yamamoto; Ryosuke Kojima; Kazuhiro Nakadai; Hiroshi G. Okuno

This paper reports the results of our field test of HARKBird, a portable system that consists of robot audition software, a laptop PC, and omnidirectional microphone arrays. We assessed its localization accuracy in monitoring songs of the great reed warbler (Acrocephalus arundinaceus) in time and two-dimensional space by comparing locational and temporal data collected by human observers and by HARKBird. Our analysis revealed that the stationarity of the singing individual affected spatial accuracy. Temporally, HARKBird successfully captured exact song durations in seconds, which cannot easily be achieved by human observers. The data derived from HARKBird suggest that one of the warbler males dominated the sound space. Under the assumption that the cost of singing activity is represented by song duration relative to the total recording session, this particular male paid a higher singing cost, possibly to win the territory of the best quality. Overall, this study demonstrated the high potential of HARKBird as an effective alternative to the point-count method for surveying bird songs in the field.
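The singing-cost assumption stated above can be written compactly as a ratio; this is a restatement of the abstract's assumption, not a formula taken from the paper:

```latex
c_i = \frac{\sum_{k} d_{i,k}}{T}
```

where $d_{i,k}$ is the duration of the $k$-th song of male $i$ and $T$ is the length of the recording session.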


Intelligent Robots and Systems | 2016

Partially Shared Deep Neural Network in sound source separation and identification using a UAV-embedded microphone array

Takayuki Morito; Osamu Sugiyama; Ryosuke Kojima; Kazuhiro Nakadai

This paper addresses sound source separation and identification for noise-contaminated acoustic signals recorded with a microphone array embedded in an Unmanned Aerial Vehicle (UAV), aiming at quick and wide-area detection of people's voices in disaster situations. The key approach is a Deep Neural Network (DNN), but it is well known that training a DNN needs a huge dataset to achieve good performance. In practical applications, building such a dataset is often unrealistic owing to the cost of manual data annotation. We therefore propose a Partially-Shared Deep Neural Network (PS-DNN) that can learn multiple tasks at the same time from a small amount of annotated data. Preliminary results show that the PS-DNN outperforms conventional DNN-based approaches, which require fully annotated training data, in terms of identification accuracy. In addition, it maintains performance even when noise-suppressed signals are used for sound source separation training and only partially annotated data is used for sound source identification training.
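A minimal sketch of the partially-shared idea, written with PyTorch: a trunk shared between a separation head and an identification head, so that data from either task improves the common representation. The layer sizes and head designs are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PSDNN(nn.Module):
    def __init__(self, n_freq=256, n_classes=4):
        super().__init__()
        # Shared trunk: trained by both tasks, so separation data also
        # improves the representation used for identification.
        self.shared = nn.Sequential(
            nn.Linear(n_freq, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
        )
        # Task-specific heads, each trained only on its own (possibly
        # partially annotated) data.
        self.separation_head = nn.Sequential(nn.Linear(256, n_freq), nn.Sigmoid())
        self.identification_head = nn.Linear(256, n_classes)

    def forward(self, x):
        h = self.shared(x)
        return self.separation_head(h), self.identification_head(h)

model = PSDNN()
spectrum = torch.randn(8, 256)       # batch of noisy spectral frames
mask, logits = model(spectrum)
print(mask.shape, logits.shape)      # torch.Size([8, 256]) torch.Size([8, 4])
```

During training one would backpropagate the separation loss through the separation head and the identification loss through the identification head, with both gradients flowing into the shared trunk.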


Inductive Logic Programming | 2014

Goal and Plan Recognition via Parse Trees Using Prefix and Infix Probability Computation

Ryosuke Kojima; Taisuke Sato

We propose new methods for goal and plan recognition based on prefix and infix probability computation in a probabilistic context-free grammar (PCFG), both of which are applicable to incomplete data. We define goal recognition as the task of identifying a goal from an action sequence, and plan recognition as that of discovering a plan for the goal consisting of a goal-subgoal structure. To achieve these tasks, in particular from incomplete data such as partial sentences of a PCFG, which often occur in applications, we introduce prefix and infix probability computation via parse trees in PCFGs and compute the most likely goal and plan from incomplete data by treating the data as prefixes and infixes. We applied our approach to web session logs taken from the Internet Traffic Archive, for which goal and plan recognition is important for improving websites. We tackled the problem of goal recognition from incomplete logs and empirically demonstrated the superiority of our approach over approaches that do not use parsing. We also showed that it is possible to estimate the most likely plans from incomplete logs. All prefix and infix probability computation, together with the computation of the most likely goal and plan, is carried out in this paper using the logic-based modeling language PRISM.
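For reference, the prefix probability of a PCFG $G$ over terminal alphabet $\Sigma$ is standardly defined as the total probability of all complete strings extending the observed prefix:

```latex
P_{\mathrm{prefix}}(w_1 \cdots w_k) \;=\; \sum_{v \in \Sigma^{*}} P_G(w_1 \cdots w_k \, v)
```

The infix case analogously sums over unseen material on both sides of the observed segment (with care needed for strings that contain the segment more than once). Both sums are infinite but can be computed by dynamic programming over the grammar, which is what makes recognition from incomplete logs tractable.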


Ecology and Evolution | 2018

A spatiotemporal analysis of acoustic interactions between great reed warblers (Acrocephalus arundinaceus) using microphone arrays and robot audition software HARK

Reiji Suzuki; Shiho Matsubayashi; Fumiyuki Saito; Tatsuyoshi Murate; Tomohisa Masuda; Koichi Yamamoto; Ryosuke Kojima; Kazuhiro Nakadai; Hiroshi G. Okuno

Acoustic interactions are important for understanding intra- and interspecific communication in songbird communities from the viewpoint of soundscape ecology. It has been suggested that birds may divide up sound space to increase communication efficiency, in such a manner that they tend to avoid overlapping with other birds when they sing. We are interested in clarifying the dynamics underlying this process as an example of a complex system based on short-term behavioral plasticity. However, it is very difficult to manually collect spatiotemporal patterns of acoustic events in natural habitats from a standard single-channel recording of several species singing simultaneously. Our purpose here was to investigate fine-scale spatiotemporal acoustic interactions of the great reed warbler. We surveyed spatial and temporal patterns of several vocalizing color-banded great reed warblers (Acrocephalus arundinaceus) using HARK (Honda Research Institute Japan Audition for Robots with Kyoto University), an open-source software for robot audition, and three new 16-channel, stand-alone, water-resistant microphone arrays, named DACHO, spread out in the birds' habitat. We first show that our system estimated the locations of two color-banded individuals' song posts with a mean error distance of 5.5 ± 4.5 m from the observed song posts. We then evaluated the temporal localization accuracy of the songs by comparing the durations of localized songs around the song posts with those annotated by human observers, obtaining an average accuracy score of 0.89 for one bird that stayed at one song post. We further found significant temporal overlap avoidance and an asymmetric relationship between the songs of the two singing individuals, using transfer entropy. We believe that our system and analytical approach contribute to a better understanding of fine-scale acoustic interactions in time and space in bird communities.
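As an illustration of the transfer entropy analysis mentioned above, the sketch below estimates TE(X → Y) from two binary song-activity sequences by plug-in counting. The estimator, the binarization, and the one-step history length are assumptions for illustration; the paper's exact procedure is not reproduced here.

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y):
    """TE(X -> Y) in bits for equal-length binary sequences
    (1 = singing), with one step of history on each side."""
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))   # (y_{t+1}, y_t, x_t)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))         # (y_t, x_t)
    pairs_yy = Counter(zip(y[1:], y[:-1]))          # (y_{t+1}, y_t)
    singles_y = Counter(y[:-1])
    n = len(x) - 1
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_joint = c / n
        p_cond_full = c / pairs_yx[(y0, x0)]              # p(y_{t+1} | y_t, x_t)
        p_cond_hist = pairs_yy[(y1, y0)] / singles_y[y0]  # p(y_{t+1} | y_t)
        te += p_joint * np.log2(p_cond_full / p_cond_hist)
    return te

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 1000).tolist()
y = x[:1] + x[:-1]                 # y copies x with a one-step delay
print(transfer_entropy(x, y))      # high: x's singing predicts y's
print(transfer_entropy(y, x))      # near zero: no influence back
```

The asymmetry between TE(X → Y) and TE(Y → X) is what lets this statistic suggest a directed (leader-follower) relationship between two singing individuals.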


Journal of Robotics and Mechatronics | 2017

Bird Song Scene Analysis Using a Spatial-Cue-Based Probabilistic Model

Ryosuke Kojima; Osamu Sugiyama; Kotaro Hoshiba; Kazuhiro Nakadai; Reiji Suzuki; Charles E. Taylor

Author affiliations: Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Tokyo, Japan; Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology; Honda Research Institute Japan Co., Ltd., Wako, Saitama, Japan; Graduate School of Information Science, Nagoya University, Nagoya, Aichi, Japan; Department of Ecology and Evolutionary Biology, UCLA, Los Angeles, CA, USA.


Intelligent Robots and Systems | 2015

Audio-visual scene understanding utilizing text information for a cooking support robot

Ryosuke Kojima; Osamu Sugiyama; Kazuhiro Nakadai

This paper addresses multimodal “scene understanding” for a robot using audio-visual and text information. Scene understanding is defined as extracting six-W information, namely What, When, Where, Who, Why, and hoW, about the surrounding environment. Although scene understanding for robots has been studied in the fields of robot vision and robot audition, only the first four Ws have been considered, leaving out why and how information. We therefore focus on extracting how information, in particular in cooking scenes. In cooking scenes, we define how information as a cooking procedure, which is useful for a robot to give appropriate advice on cooking. To realize such cooking support, we propose a multimodal cooking-procedure recognition framework consisting of a Convolutional Neural Network (CNN) and a Hierarchical Hidden Markov Model (HHMM). The CNN, known as one of the most advanced classifiers, is applied to recognize cooking events from audio and visual information. The HHMM models a cooking procedure represented as a sequence of cooking events, defined as relationships between cooking events using text data obtained from the web and the cooking events classified by the CNN. Our proposed framework thus integrates these three types of modalities. We constructed an interactive cooking support system based on the proposed framework, which advises on the next step in the current cooking procedure through human-robot communication. Preliminary results with simulated and real recorded multimodal scenes showed the robustness of the proposed framework in noisy and/or occluded situations.
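The division of labor described above (a CNN scores events per frame, a Markov model over event transitions decodes the procedure) can be sketched with a flat HMM and Viterbi decoding. The paper uses a hierarchical HMM with transitions informed by web text; the events, transition matrix, and posteriors below are illustrative assumptions.

```python
import numpy as np

events = ["cut", "fry", "serve"]
# Transition probabilities over cooking events; in the paper's setting
# these relationships would be estimated from web recipe text.
trans = np.array([[0.60, 0.35, 0.05],   # cut   -> cut/fry/serve
                  [0.10, 0.60, 0.30],   # fry   -> ...
                  [0.05, 0.05, 0.90]])  # serve tends to be terminal
prior = np.array([0.80, 0.15, 0.05])

def viterbi(frame_posteriors):
    """frame_posteriors: (T, n_events) CNN-style event probabilities."""
    T, n = frame_posteriors.shape
    logp = np.log(frame_posteriors + 1e-12)
    score = np.log(prior) + logp[0]
    back = np.zeros((T, n), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + np.log(trans)   # (from_state, to_state)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + logp[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):               # backtrack best path
        path.append(back[t][path[-1]])
    return [events[i] for i in reversed(path)]

obs = np.array([[0.70, 0.20, 0.10],
                [0.50, 0.40, 0.10],
                [0.20, 0.70, 0.10],
                [0.05, 0.15, 0.80]])
print(viterbi(obs))   # ['cut', 'cut', 'fry', 'serve']
```

Given the decoded position in the procedure, the support system can then suggest the most probable next event.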


PLOS ONE | 2018

Computer-aided diagnosis of lung nodule using gradient tree boosting and Bayesian optimization

Mizuho Nishio; Mitsuo Nishizawa; Osamu Sugiyama; Ryosuke Kojima; Masahiro Yakami; Tomohiro Kuroda; Kaori Togashi

We aimed to evaluate a computer-aided diagnosis (CADx) system for lung nodule classification, focusing on (i) the usefulness of a conventional CADx system (hand-crafted imaging features + machine learning algorithm), (ii) a comparison between support vector machine (SVM) and gradient tree boosting (XGBoost) as machine learning algorithms, and (iii) the effectiveness of parameter optimization using Bayesian optimization versus random search. Data on 99 lung nodules (62 lung cancers and 37 benign lung nodules) were obtained from public databases of CT images. A variant of the local binary pattern was used to calculate a feature vector. SVM or XGBoost was trained using the feature vectors and corresponding labels. The Tree Parzen Estimator (TPE) was used as the Bayesian optimization method for the parameters of SVM and XGBoost; random search was performed for comparison with TPE. Leave-one-out cross-validation was used for optimizing and evaluating the performance of our CADx system. Performance was evaluated using the area under the curve (AUC) of receiver operating characteristic analysis; the AUC was calculated 10 times and averaged. The best averaged AUCs of SVM and XGBoost were 0.850 and 0.896, respectively, both obtained using TPE. XGBoost was generally superior to SVM, and optimal parameters for achieving a high AUC were obtained in fewer trials with TPE than with random search; that is, Bayesian optimization of SVM and XGBoost parameters was more efficient than random search. In an observer study, the AUC values of two board-certified radiologists were 0.898 and 0.822. These results show that the diagnostic accuracy of our CADx system was comparable to that of radiologists with respect to classifying lung nodules.
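The tuning loop described above can be sketched with hyperopt's TPE optimizer and XGBoost, using leave-one-out AUC as the objective. The search space, the synthetic data standing in for the local-binary-pattern features, and the evaluation budget are placeholders, not the study's settings.

```python
import numpy as np
from hyperopt import fmin, hp, tpe
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from xgboost import XGBClassifier

# Synthetic stand-in for the 99-nodule feature matrix.
X, y = make_classification(n_samples=99, n_features=20, random_state=0)

space = {
    "max_depth": hp.choice("max_depth", [2, 3, 4, 5]),
    "learning_rate": hp.loguniform("learning_rate", np.log(0.01), np.log(0.3)),
    "n_estimators": hp.choice("n_estimators", [50, 100, 200]),
}

def objective(params):
    clf = XGBClassifier(eval_metric="logloss", **params)
    # Leave-one-out CV: each nodule is scored by a model trained on the rest.
    proba = cross_val_predict(clf, X, y, cv=LeaveOneOut(),
                              method="predict_proba")[:, 1]
    return -roc_auc_score(y, proba)   # hyperopt minimizes, so negate AUC

best = fmin(objective, space, algo=tpe.suggest, max_evals=25)
print(best)
```

TPE spends its trial budget near previously good regions of the search space, which is why it tends to reach a high AUC in fewer trials than uniform random search.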


International Journal of Approximate Reasoning | 2018

Learning to rank in PRISM

Ryosuke Kojima; Taisuke Sato

Learning parameters associated with propositions is one of the main tasks of probabilistic logic programming (PLP), and learning algorithms for PLP have primarily been developed based on maximum likelihood estimation or the optimization of discriminative criteria. This paper explores yet another approach to parameter learning, learning to rank (rank learning), which has been studied mainly in the field of preference learning. We combine learning to rank with techniques developed in PLP to make the latter applicable to a variety of ranking problems such as information retrieval. We implement our approach in PRISM, a PLP system based on the distribution semantics. PRISM supports many parameter learning algorithms, such as the expectation-maximization algorithm, the variational Bayes algorithm, and an algorithm for Viterbi training, efficiently by mapping them onto a single data structure called an explanation graph. To ensure the same efficiency for parameter learning by learning to rank as in the current PRISM, we introduce a gradient-based learning method that takes advantage of dynamic programming on the explanation graph. This paper also presents three experimental results. The first uses synthetic data to check the learning behavior of the proposed approach. The second uses a knowledge base (knowledge graph) and applies rank learning to a DistMult model for the task of deciding whether relations over entities exist. The last tackles the problem of parsing with a probabilistic context-free grammar whose parameters are learned from a tree corpus by rank learning. These experiments successfully demonstrate the potential and effectiveness of learning to rank in PLP. We plan to release a new version of PRISM augmented with the ability to learn to rank in the near future.
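As a generic illustration of gradient-based rank learning (not PRISM's explanation-graph method), the sketch below minimizes a RankNet-style pairwise logistic loss with stochastic gradient steps over a linear scoring function.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(5)  # learned scoring weights

def pair_grad(x_pos, x_neg, w):
    """Gradient of log(1 + exp(-(s_pos - s_neg))) w.r.t. w,
    where s = x @ w; pushes the preferred item's score above the other's."""
    margin = (x_pos - x_neg) @ w
    coeff = -1.0 / (1.0 + np.exp(margin))
    return coeff * (x_pos - x_neg)

# Toy data: items whose true relevance follows a hidden weight vector.
w_true = rng.normal(size=5)
items = rng.normal(size=(100, 5))
relevance = items @ w_true

for _ in range(200):
    i, j = rng.integers(0, 100, size=2)
    if relevance[i] == relevance[j]:
        continue
    pos, neg = (i, j) if relevance[i] > relevance[j] else (j, i)
    w -= 0.1 * pair_grad(items[pos], items[neg], w)

# Learned scores should now rank items almost like the true relevance.
print(np.corrcoef(items @ w, relevance)[0, 1])
```

The point of the paper is that gradients of exactly this kind of pairwise objective can be computed by dynamic programming on PRISM's explanation graphs, preserving the efficiency of its existing learners.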


Intelligent Robots and Systems | 2017

Development of microphone-array-embedded UAV for search and rescue task

Kazuhiro Nakadai; Makoto Kumon; Hiroshi G. Okuno; Kotaro Hoshiba; Mizuho Wakabayashi; Kai Washizaki; Takahiro Ishiki; Daniel Gabriel; Yoshiaki Bando; Takayuki Morito; Ryosuke Kojima; Osamu Sugiyama

This paper addresses online outdoor sound source localization using a microphone array embedded in an unmanned aerial vehicle (UAV). In addition to sound source localization, sound source enhancement and a robust communication method are also described. This system is one deployment instance of our continuously developed open-source robot audition software HARK (Honda Research Institute Japan Audition for Robots with Kyoto University). To improve robustness against outdoor acoustic noise, we propose combining two sound source localization methods based on MUSIC (multiple signal classification) to cope with the trade-off between latency and noise robustness: standard eigenvalue-decomposition-based MUSIC (SEVD-MUSIC) has smaller latency but less noise robustness, whereas incremental generalized-singular-value-decomposition-based MUSIC (iGSVD-MUSIC) has higher noise robustness but larger latency. A UAV operator can choose the appropriate method according to the situation. A sound enhancement method called online robust principal component analysis (ORPCA) enables the operator to detect a target sound source more easily. To improve the stability of wireless communication and the robustness of the UAV system against weather changes, we developed data compression based on the free lossless audio codec (FLAC), extended to support a 16-channel audio data stream via UDP, and developed a water-resistant microphone array. The resulting system successfully worked in an outdoor search and rescue task at the ImPACT Tough Robotics Challenge in November 2016.
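A compact sketch of standard SEVD-MUSIC, the lower-latency of the two variants: the spatial covariance is eigendecomposed, the noise subspace is taken from the smallest eigenvalues, and the pseudospectrum peaks where steering vectors are nearly orthogonal to that subspace. The uniform linear array geometry and all parameters below are illustrative assumptions, not the UAV system's configuration.

```python
import numpy as np

def steering_vector(theta, n_mics=8, d=0.05, freq=2000.0, c=343.0):
    # Narrowband steering vector for a uniform linear array.
    k = 2 * np.pi * freq / c
    return np.exp(-1j * k * d * np.arange(n_mics) * np.sin(theta))

def music_spectrum(R, n_sources, angles):
    # Ascending eigenvalues: the smallest ones span the noise subspace.
    eigvals, eigvecs = np.linalg.eigh(R)
    En = eigvecs[:, : R.shape[0] - n_sources]
    p = []
    for th in angles:
        a = steering_vector(th)
        p.append(1.0 / np.linalg.norm(En.conj().T @ a) ** 2)
    return np.array(p)

# Simulate one source at 20 degrees plus sensor noise, then scan.
rng = np.random.default_rng(0)
a_true = steering_vector(np.deg2rad(20))
snapshots = (a_true[:, None] * rng.normal(size=(1, 500))
             + 0.1 * (rng.normal(size=(8, 500)) + 1j * rng.normal(size=(8, 500))))
R = snapshots @ snapshots.conj().T / 500
angles = np.deg2rad(np.linspace(-90, 90, 181))
print(np.rad2deg(angles[np.argmax(music_spectrum(R, 1, angles))]))  # ~20
```

iGSVD-MUSIC replaces the eigendecomposition with an incrementally updated generalized SVD against a noise-correlation estimate, which buys noise robustness (e.g., against rotor noise) at the cost of latency.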

Collaboration


Dive into Ryosuke Kojima's collaboration.

Top Co-Authors

Osamu Sugiyama (Tokyo Institute of Technology)
Kotaro Hoshiba (Tokyo Institute of Technology)
Satoshi Uemura (Tokyo Institute of Technology)
Takayuki Morito (Tokyo Institute of Technology)
Akihide Nagamine (Tokyo Institute of Technology)
Taisuke Sato (Tokyo Institute of Technology)