Shoji Kajita
Nagoya University
Publication
Featured research published by Shoji Kajita.
international conference on acoustics, speech, and signal processing | 2000
Satoshi Kurita; Hiroshi Saruwatari; Shoji Kajita; Kazuya Takeda; Fumitada Itakura
This paper describes a new blind signal separation method using the directivity patterns of a microphone array. In this method, to handle the arrival lags among the microphones, the inverses of the mixing matrices are calculated in the frequency domain so that the separated signals are mutually independent. Since the calculations are carried out in each frequency bin independently, two problems arise: (1) permutation of the sound sources and (2) arbitrariness of each source gain. In this paper, we propose a new solution in which directivity patterns are explicitly used to estimate the direction of each sound source. Signal separation experiments show that the proposed method improves the SNR of degraded speech by about 16 dB under non-reverberant conditions. The proposed method also improves the SNR by 8.7 dB when the reverberation time is 184 ms and by 5.1 dB when the reverberation time is 322 ms.
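As an illustration of the permutation problem described above, the following minimal sketch reorders per-frequency separation filters using the null directions of their directivity patterns. It is not the paper's implementation: the microphone spacing, the angle grid, and the assumption that the unmixing matrices W[f] have already been estimated by frequency-domain ICA are all illustrative.

```python
# Minimal sketch (assumed setup, not the paper's code): resolve the per-frequency
# permutation ambiguity via the null direction of each separation filter row.
import numpy as np

C = 340.0                     # speed of sound [m/s]
D = np.array([0.0, 0.04])     # assumed microphone positions along a line [m]

def directivity(w_row, freq, angles):
    """Magnitude response of one separation filter row versus arrival angle."""
    delays = D[None, :] * np.sin(angles)[:, None] / C      # (angles, mics)
    steering = np.exp(2j * np.pi * freq * delays)           # (angles, mics)
    return np.abs(steering @ w_row)                         # (angles,)

def null_direction(w_row, freq, angles):
    """Angle at which the filter places its spatial null (the rejected source)."""
    return angles[np.argmin(directivity(w_row, freq, angles))]

def align_permutations(W, freqs):
    """Reorder the rows of each unmixing matrix W[f] so that every output keeps
    a consistent null direction (i.e. the same source) across frequency bins."""
    angles = np.linspace(-np.pi / 2, np.pi / 2, 181)
    ref = [null_direction(row, freqs[0], angles) for row in W[0]]
    aligned = []
    for Wf, f in zip(W, freqs):
        nulls = [null_direction(row, f, angles) for row in Wf]
        order = [int(np.argmin([abs(n - r) for n in nulls])) for r in ref]
        aligned.append(Wf[order])
    return aligned
```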
ubiquitous intelligence and computing | 2007
Zhiwen Yu; Yuichi Nakamura; Seiie Jang; Shoji Kajita; Kenji Mase
Nowadays, e-learning systems are widely used for education and training in universities and companies because they provide electronic access to course content and virtual classroom participation. However, with the rapid increase of learning content on the Web, it is time-consuming for learners to find the content they really want and need to study. Aiming to enhance the efficiency and effectiveness of learning, we propose an ontology-based approach for semantic content recommendation towards context-aware e-learning. The recommender takes into consideration knowledge about the learner (user context), knowledge about the content, and knowledge about the domain being learned. Ontologies are used to model and represent these kinds of knowledge. The recommendation consists of four steps: semantic relevance calculation, recommendation refining, learning path generation, and recommendation augmentation. As a result, a personalized, complete, and augmented learning program is suggested for the learner.
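The following is a minimal sketch of how the four-step pipeline could be structured. The concept sets, the overlap-based relevance score, and the threshold are invented for illustration and merely stand in for the ontology-based knowledge described in the abstract.

```python
# Hedged sketch of the four-step recommendation structure (illustrative only).
from dataclasses import dataclass, field

@dataclass
class Content:
    name: str
    concepts: set                                   # domain concepts the item covers
    prerequisites: set = field(default_factory=set)

def semantic_relevance(learner_interests, item):
    """Step 1: overlap between learner-context concepts and content concepts."""
    return len(learner_interests & item.concepts) / len(item.concepts) if item.concepts else 0.0

def recommend(learner_interests, known_concepts, items, threshold=0.3):
    # Step 2: refine the recommendation by dropping low-relevance items.
    kept = sorted((it for it in items
                   if semantic_relevance(learner_interests, it) >= threshold),
                  key=lambda it: -semantic_relevance(learner_interests, it))
    # Step 3: generate a learning path by scheduling prerequisites first.
    path, covered = [], set(known_concepts)
    while kept:
        ready = [it for it in kept if it.prerequisites <= covered]
        if not ready:          # Step 4 would augment the path with missing prerequisites here.
            break
        path.append(ready[0])
        covered |= ready[0].concepts
        kept.remove(ready[0])
    return path
```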
international conference on acoustics, speech, and signal processing | 2000
Yasuhiro Shimizu; Shoji Kajita; Kazuya Takeda; Fumitada Itakura
This paper proposes a space diversity speech recognition technique that uses multiple microphones distributed in a room, as a new paradigm of speech recognition. The key technologies for realizing the system are (1) distant-talking speech recognition and (2) a method for integrating the multiple inputs. In this paper, we propose the use of a distant speech model for distant-talking speech recognition, and feature-based and likelihood-based integration methods for the microphones distributed in the room. The distant speech model is a set of HMMs trained on speech data convolved with impulse responses measured at several points in the room. Experimental results on simulated distant-talking speech recognition show that the proposed space diversity speech recognition system attains about 80% accuracy, while the performance of conventional HMMs based on close-talking microphones is less than 50%. These results indicate that the space diversity approach is promising for robust speech recognition in real acoustic environments.
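A minimal sketch of the likelihood-based integration idea: each distributed channel is assumed to have already been scored against the distant speech model, and the per-channel log-likelihoods are combined before picking a hypothesis. The hypothesis scores below are toy numbers, not experimental data.

```python
# Hedged sketch: combine per-microphone log-likelihoods across hypotheses.
import numpy as np

def integrate_likelihoods(loglik, mode="sum"):
    """loglik: (num_channels, num_hypotheses) log-likelihoods.
    Returns the index of the winning hypothesis and the combined scores."""
    loglik = np.asarray(loglik)
    if mode == "sum":        # product of likelihoods = sum of log-likelihoods
        combined = loglik.sum(axis=0)
    else:                    # "max": trust the single best channel per hypothesis
        combined = loglik.max(axis=0)
    return int(np.argmax(combined)), combined

scores = [[-120.0, -118.5, -130.2, -125.0],   # microphone 1
          [-119.0, -121.0, -128.0, -124.5],   # microphone 2
          [-121.5, -117.8, -129.5, -126.0]]   # microphone 3
best, combined = integrate_likelihoods(scores)
print(best, combined)
```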
workshop on applications of signal processing to audio and acoustics | 1999
Takanori Nishino; Shoji Kajita; Kazuya Takeda; Fumitada Itakura
This paper describes the interpolation of head related transfer functions (HRTFs) for all directions in the median plane. Interpolation of HRTFs makes it possible to reduce the number of measurements needed for a new user's HRTFs, and also to reduce the amount of HRTF data stored in virtual auditory systems. In this paper, a simple linear interpolation method and a spline interpolation method are evaluated and the advantages of each are clarified. In the experiments, the interpolation methods are applied to HRTFs measured with a dummy head. The experimental results show that the two methods are comparable in the best case, with a minimum spectral distortion of about 2 dB for both. The results clarify that linear interpolation is effective for a set of elevations selected based on cross correlation, and that spline interpolation is effective at large, equal intervals. These results indicate that HRTFs in the median plane can be interpolated by these methods.
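The comparison between linear and spline interpolation can be sketched as a leave-one-out test over elevation. The magnitude responses below are random stand-ins for dummy-head measurements, and the distortion measure is the RMS difference in dB, so the numbers are only illustrative of the procedure.

```python
# Hedged sketch: leave-one-out comparison of linear vs. cubic-spline HRTF interpolation.
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(0)
elevations = np.arange(-40.0, 91.0, 10.0)                     # measured elevations [deg]
hrtf_db = rng.uniform(-20.0, 0.0, (len(elevations), 128))     # |HRTF| in dB per elevation (stand-in)

def spectral_distortion_db(ref_db, est_db):
    """RMS spectral distortion in dB between two magnitude responses."""
    return float(np.sqrt(np.mean((est_db - ref_db) ** 2)))

# Drop the 30-degree measurement and re-estimate it from the remaining ones.
idx = int(np.where(elevations == 30.0)[0][0])
ref = hrtf_db[idx]
kept_elev = np.delete(elevations, idx)
kept_db = np.delete(hrtf_db, idx, axis=0)

linear = np.array([np.interp(30.0, kept_elev, kept_db[:, k])
                   for k in range(kept_db.shape[1])])          # linear interpolation
spline = CubicSpline(kept_elev, kept_db, axis=0)(30.0)         # spline interpolation

print(spectral_distortion_db(ref, linear), spectral_distortion_db(ref, spline))
```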
IEEE Pervasive Computing | 2008
Zhiwen Yu; Yuichi Nakamura; Daqing Zhang; Shoji Kajita; Kenji Mase
In this article, the authors present an approach for context-aware and QoS-enabled learning content provisioning, one of the essential elements of ubiquitous learning. The essence of the system is recommending the right content, in the right form, to the right learner, based on a wide range of user context information and QoS requirements. To facilitate knowledge interoperability and sharing, they model the learner context, content knowledge, and domain knowledge using ontologies. They first propose a knowledge-based semantic recommendation method to acquire the content the user really wants and needs to learn. Then, a fuzzy logic-based decision-making strategy and an adaptive QoS mapping mechanism determine the appropriate presentation according to the user's QoS requirements and device/network capability.
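As a toy illustration of the fuzzy decision step, the sketch below maps available bandwidth to a presentation form with triangular membership functions. The linguistic terms, break-points, and rule base are invented for illustration and are not the paper's actual strategy.

```python
# Toy fuzzy decision sketch (illustrative membership functions and rules).
def tri(x, a, b, c):
    """Triangular membership function rising from a to its peak at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def choose_presentation(bandwidth_kbps):
    low = tri(bandwidth_kbps, -1, 0, 256)            # degree of "bandwidth is low"
    med = tri(bandwidth_kbps, 128, 512, 1024)        # degree of "bandwidth is medium"
    high = tri(bandwidth_kbps, 512, 2048, 10**9)     # degree of "bandwidth is high"
    # Rule base: low -> text, medium -> audio and slides, high -> video.
    scores = {"text": low, "audio+slides": med, "video": high}
    return max(scores, key=scores.get)

print(choose_presentation(96), choose_presentation(700), choose_presentation(3000))
```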
international conference on multimodal interfaces | 2007
Tomoyuki Morita; Kenji Mase; Yasushi Hirano; Shoji Kajita
In this paper, we investigate the reciprocal attention modality in remote communication. A remote meeting system with a humanoid robot avatar is proposed to overcome the invisible wall of a video conferencing system. Our experimental results show that a tangible robot avatar provides more effective reciprocal attention than video communication. The subjects in the experiment are asked to determine whether a remote participant represented by the avatar is actively listening to the local presenter's talk. In this system, the head motion of a remote participant is transferred to and expressed by the head motion of a humanoid robot. While the presenter has difficulty determining the extent of a remote participant's attention with a video conferencing system, he or she can sense remote attentive states better with the robot. Based on the evaluation results, we propose a vision system for the remote user that integrates omni-directional camera and robot-eye camera images to provide a wide view with delay compensation.
international conference on acoustics, speech, and signal processing | 1999
Hiroshi Saruwatari; Shoji Kajita; Kazuya Takeda; Fumitada Itakura
This paper describes an improved spectral subtraction method that uses a complementary beamforming microphone array to enhance noisy speech signals for speech recognition. The complementary beamforming is based on two types of beamformers designed to have directivity patterns complementary to each other. It is shown that nonlinear subtraction processing with complementary beamforming results in a kind of spectral subtraction that does not require speech pause detection. In addition, the design of the optimization algorithm for the directivity pattern is described. To evaluate its effectiveness, speech enhancement and speech recognition experiments are performed in computer simulations. In comparison with an optimized conventional delay-and-sum array, it is shown that the proposed array improves the signal-to-noise ratio of degraded speech by about 2 dB and achieves about 10% higher word recognition rates under heavily noisy conditions.
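The core subtraction idea can be sketched as follows, treating the complementary (target-blocking) beam as a running noise estimate that is subtracted, frame by frame, from the primary beam's power spectrum without any speech-pause detection. This is a simplified stand-in for the paper's nonlinear array processing; the over-subtraction factor and spectral floor are illustrative parameters.

```python
# Hedged, simplified stand-in for spectral subtraction driven by a complementary beam.
import numpy as np

def complementary_spectral_subtraction(primary_stft, complementary_stft,
                                       over_sub=1.0, floor=0.01):
    """primary_stft, complementary_stft: complex STFTs of shape (frames, bins)."""
    p_pow = np.abs(primary_stft) ** 2
    n_pow = np.abs(complementary_stft) ** 2
    clean_pow = np.maximum(p_pow - over_sub * n_pow, floor * p_pow)  # spectral floor
    gain = np.sqrt(clean_pow / np.maximum(p_pow, 1e-12))
    return gain * primary_stft        # keep the primary beam's phase
```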
international conference on multimodal interfaces | 2005
Tomoyuki Morita; Yasushi Hirano; Yasuyuki Sumi; Shoji Kajita; Kenji Mase
This paper proposes a novel mining method for multimodal interactions that extracts important patterns of group activities. The extracted patterns can be used as machine-readable event indices in developing an interaction corpus from a huge collection of human interaction data captured by various sensors. The event indices can be used, for example, to summarize a set of events or to search for particular events, because they contain various pieces of context information. The proposed method extracts simultaneously occurring patterns of primitive interaction events, such as gaze and speech, that occur together more consistently than they would at random. It thereby provides a statistically plausible definition of interaction events that is not possible through intuitive top-down definitions. We demonstrate the effectiveness of the method on data captured in an experimental poster-exhibition setting. Several interesting patterns are extracted by the method, and we examine their interpretations.
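A minimal sketch of the statistical idea, reduced to pairwise co-occurrence within time windows: keep pairs of primitive events that are observed together more often than independence would predict. The event labels, the lift measure, and the thresholds are illustrative; the paper's actual mining procedure handles richer patterns than pairs.

```python
# Hedged sketch: pairwise co-occurrence mining with a simple lift test.
from collections import Counter
from itertools import combinations

def cooccurring_patterns(windows, min_lift=1.5, min_count=2):
    """windows: list of sets of primitive-event labels seen in each time window."""
    n = len(windows)
    single = Counter(e for w in windows for e in set(w))
    pairs = Counter(p for w in windows for p in combinations(sorted(set(w)), 2))
    patterns = {}
    for (a, b), count in pairs.items():
        expected = single[a] * single[b] / n          # expected count if independent
        if count >= min_count and expected and count / expected >= min_lift:
            patterns[(a, b)] = count / expected
    return patterns

windows = [{"gaze_at_poster", "speech_presenter"},
           {"gaze_at_poster", "speech_presenter", "nod"},
           {"speech_visitor"},
           {"gaze_at_poster", "speech_presenter"},
           {"gaze_at_visitor", "speech_visitor"}]
print(cooccurring_patterns(windows))
```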
international conference on spoken language processing | 1996
Daisuke Kobayashi; Shoji Kajita; Kazuya Takeda; Fumitada Itakura
Human speech-like noise (HSLN) is a kind of babble noise generated by superimposing independent speech signals, typically more than one thousand times. Since the basic character of HSLN varies from overlapped speech to stationary noise while its long-term spectrum keeps the same shape, we investigate the perceptual discrimination of speech from stationary noise, and its acoustic correlates, using HSLN with various numbers of superpositions. First, subjective tests confirm that the perceptual score, i.e., how much the HSLN sounds like stationary noise, is proportional to the number of superpositions. Then, we show that the amplitude distribution of the difference signal of HSLN approaches a Gaussian distribution from a Gamma distribution as the number of superpositions increases. A further subjective test on three HSLN signals with different dynamic characteristics clarifies that the temporal change of the spectral envelope plays an important role in discriminating speech from noise.
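The trend toward Gaussianity with increasing superposition can be illustrated with a simple simulation: summing more independent heavy-tailed signals drives the amplitude distribution toward Gaussian, visible as the excess kurtosis falling toward 0. The Laplacian stand-in for a single talker's amplitude distribution and the sample count are assumptions of this sketch, not the paper's data.

```python
# Illustrative simulation only: excess kurtosis shrinks as superpositions increase.
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
n_samples = 100_000

for n_talkers in (1, 4, 16, 64, 256, 1024):
    mix = np.zeros(n_samples)
    for _ in range(n_talkers):            # superimpose independent "talkers"
        mix += rng.laplace(size=n_samples)
    mix /= np.sqrt(n_talkers)             # keep the variance comparable across cases
    print(f"{n_talkers:5d} superpositions: excess kurtosis = {kurtosis(mix):+.3f}")
```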
Journal of the Acoustical Society of America | 1996
Takanori Nishino; Sumie Mase; Shoji Kajita; Kazuya Takeda; Fumitada Itakura
Two (linear and nonlinear) interpolation methods for the head-related transfer function (HRTF) are explored in order to realize virtual auditory localization. In both methods, the HRTFs of the left and right ears are represented by a delay time and a common impulse response, where the delay time is determined so that the cross-correlation of the two HRTFs is maximized. A three-layer neural network is trained for the nonlinear method, whereas basic linear interpolation is used for the linear method. Evaluation tests are performed using HRTF prototypes published on the Web by the MIT Media Lab. The signal-to-deviation ratios (SDR) of the measured and interpolated HRTFs are calculated for objective evaluation of the methods. The SDR of the nonlinear method (50 to 70 dB) is much better than that of the linear method (5 to 30 dB). On the other hand, there is no significant difference in the subjective evaluation of localizing the earphone-presented sounds generated by the two interpolated HRTFs. Further...
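Two pieces of this abstract are easy to sketch: the delay taken as the lag of maximum cross-correlation between the two impulse responses, and the signal-to-deviation ratio (SDR) used for objective evaluation. The neural-network interpolator itself is not reproduced, and the toy impulse response below is random data.

```python
# Hedged sketch: cross-correlation delay estimate and SDR, on toy data.
import numpy as np

def delay_samples(h_left, h_right):
    """Lag (in samples) of h_right relative to h_left at maximum cross-correlation."""
    corr = np.correlate(h_right, h_left, mode="full")
    return int(np.argmax(corr)) - (len(h_left) - 1)

def sdr_db(h_measured, h_interpolated):
    """Signal-to-deviation ratio in dB between measured and interpolated responses."""
    err = h_measured - h_interpolated
    return float(10.0 * np.log10(np.sum(h_measured ** 2) / np.sum(err ** 2)))

h = np.random.default_rng(1).standard_normal(64)
print(delay_samples(h, np.roll(h, 5)))     # recovers the known 5-sample shift
print(sdr_db(h, h + 0.01 * np.random.default_rng(2).standard_normal(64)))
```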
Collaboration
National Institute of Information and Communications Technology