
Publication


Featured research published by Sunao Hara.


Journal of the Acoustical Society of America | 2006

An online customizable music retrieval system with a spoken dialogue interface

Sunao Hara; Chiyomi Miyajima; Katsunobu Itou; Kazuya Takeda

In this paper, we introduce a spoken language interface for music information retrieval. In response to voice commands, the system searches for a song through an internet music shop or a ‘‘playlist’’ stored on the local PC and then plays it. To cope with the almost unlimited size of the vocabulary, we implemented a remote server program with which users can customize their recognition grammar and dictionary. When a user selects favorite artists, the server program automatically generates a minimal set of recognition grammars and a dictionary and sends them to the interface program. As a result, the vocabulary averages fewer than 1000 words per user. To perform a field test of the system, we implemented a speech collection capability, whereby speech utterances are compressed in Free Lossless Audio Codec (FLAC) format and sent back to the server program along with dialogue logs. Currently, the system is available to the public for experimental use. More than 100 users are involve...
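The per-user vocabulary pruning described above can be sketched as follows. This is only an illustration of the idea, not the authors' implementation: the catalog, artist names, and the whitespace tokenization are all made-up assumptions.

```python
# Hypothetical song catalog; in the real system this would come from the
# internet music shop or the user's local playlist.
CATALOG = {
    "Artist A": ["Song One", "Song Two"],
    "Artist B": ["Song Three"],
    "Artist C": ["Song Four", "Song Five"],
}

def build_user_dictionary(selected_artists):
    """Collect only the words this user's recognizer actually needs."""
    words = set()
    for artist in selected_artists:
        words.update(artist.split())          # artist-name words
        for title in CATALOG.get(artist, []):
            words.update(title.split())       # song-title words
    return sorted(words)

# A user who selects two artists gets a small, personal dictionary
# instead of the full, effectively unbounded vocabulary.
vocab = build_user_dictionary(["Artist A", "Artist B"])
print(len(vocab), vocab)
```

The point of the sketch is the scaling behavior: the dictionary grows with the user's selections, not with the catalog.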


international conference on pervasive computing | 2015

Sound collection and visualization system enabled participatory and opportunistic sensing approaches

Sunao Hara; Masanobu Abe; Noboru Sonehara

This paper presents a sound collection system that visualizes environmental sounds collected using a crowdsourcing approach. An analysis of physical features is generally used to characterize sound properties; however, human beings not only analyze sounds but also connect to them emotionally. If we want to visualize sounds according to the characteristics of the listener, we need to collect not only the raw sounds but also the subjective impressions associated with them. For this purpose, we developed a sound collection system that uses a crowdsourcing approach to collect physical sounds, their statistics, and subjective evaluations simultaneously. We then conducted a sound collection experiment with ten participants using the developed system. We collected 6,257 samples of equivalent loudness levels with their locations, and 516 samples of sounds with their locations. Subjective evaluations by the participants are also included in the data. Next, we visualized the sounds on a map: the loudness levels are shown as a color map, and the sounds are shown as icons that indicate the sound type. Finally, we conducted a sound discrimination experiment toward automatically converting sounds into appropriate icons. The classifier is trained with the GMM-UBM (Gaussian Mixture Model and Universal Background Model) method. Experimental results show an F-measure of 0.52 and an AUC of 0.79.
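The GMM-UBM scheme used here can be sketched in miniature: fit a universal background model on all sounds, derive a per-class model by mean-only MAP adaptation, and classify by the log-likelihood ratio against the UBM. Everything below is a toy, one-dimensional stand-in: the two class names, the feature values, and the UBM parameters are invented for illustration, and real systems use multi-dimensional features and EM-trained mixtures.

```python
import math

def log_gauss(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def log_mixture(x, comps):
    """Log-likelihood under a mixture, via log-sum-exp. comps: (weight, mean, var)."""
    terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in comps]
    hi = max(terms)
    return hi + math.log(sum(math.exp(t - hi) for t in terms))

# Toy UBM with two components covering "all environmental sounds".
ubm = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]

def map_adapt_means(comps, data, relevance=4.0):
    """Classic mean-only MAP adaptation of a mixture toward class data."""
    adapted = []
    for w, m, v in comps:
        # Posterior responsibility of this component for each sample.
        resp = [math.exp(math.log(w) + log_gauss(x, m, v) - log_mixture(x, comps))
                for x in data]
        n = sum(resp)
        ex = sum(r * x for r, x in zip(resp, data)) / n if n > 0 else m
        alpha = n / (n + relevance)
        adapted.append((w, alpha * ex + (1 - alpha) * m, v))
    return adapted

traffic = [6.2, 5.8, 6.0, 6.1]      # made-up 1-D features per class
voices = [-1.2, -0.8, -1.0, -1.0]
models = {"traffic": map_adapt_means(ubm, traffic),
          "voices": map_adapt_means(ubm, voices)}

def classify(x):
    """Pick the class with the highest log-likelihood ratio against the UBM."""
    return max(models, key=lambda c: log_mixture(x, models[c]) - log_mixture(x, ubm))

print(classify(6.0), classify(-1.0))
```

The ratio-against-UBM scoring is what distinguishes GMM-UBM from plain per-class GMMs: it normalizes away how generically "sound-like" a sample is.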


international conference on human-computer interaction | 2015

Algorithm to Estimate a Living Area Based on Connectivity of Places with Home

Yuji Matsuo; Sunao Hara; Masanobu Abe

We propose an algorithm to estimate a person’s living area from his/her collected Global Positioning System (GPS) data. The most important feature of the algorithm is the connectivity of places with the home: a living area must consist of a home, important places, and the routes that connect them. This definition is natural because people usually go to a place from home, and there can be several routes to that place. Experimental results show that the proposed algorithm can estimate a living area with a precision of 0.82 and a recall of 0.86 against the ground truth established by users. It is also confirmed that the connectivity of places with the home is necessary to estimate a reasonable living area.
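The connectivity idea above can be sketched as a flood fill from the home cell over frequently visited grid cells: a frequent place only joins the living area if it is reachable from home through other frequent cells. The grid, visit counts, and threshold below are invented for illustration; the paper's actual algorithm may differ in its place detection and route handling.

```python
from collections import deque

# Hypothetical stay cells derived from GPS logs: (col, row) -> visit count.
visits = {(0, 0): 120, (1, 0): 15, (2, 0): 9, (5, 5): 7, (2, 1): 4}
HOME = (0, 0)       # take the most-visited cell as home
MIN_VISITS = 5      # threshold for an "important place" (an assumption)

def estimate_living_area(visits, home, min_visits):
    """Keep only frequent cells reachable from home through frequent cells."""
    frequent = {c for c, n in visits.items() if n >= min_visits}
    area, queue = {home}, deque([home])
    while queue:
        x, y = queue.popleft()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in frequent and nb not in area:
                area.add(nb)
                queue.append(nb)
    return area

area = estimate_living_area(visits, HOME, MIN_VISITS)
print(sorted(area))
```

Note that cell (5, 5) is visited often enough but is disconnected from home, so it is excluded, which is exactly the behavior the connectivity constraint is meant to enforce.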


asia pacific signal and information processing association annual summit and conference | 2014

A hybrid text-to-speech based on sub-band approach

Takuma Inoue; Sunao Hara; Masanobu Abe

This paper proposes a sub-band speech synthesis approach to develop high-quality Text-to-Speech (TTS). Hidden Markov Model (HMM)-based speech synthesis is used for the low-frequency band, and waveform-based speech synthesis for the high-frequency band. Both methods are widely known to perform well, yet each has benefits and shortcomings from different points of view; our motivation is to apply the right synthesis method in the right frequency band. Experimental results show that the proposed approach outperforms waveform-based speech synthesis in terms of smoothness and outperforms HMM-based speech synthesis in terms of clarity. Consequently, the proposed approach combines the inherent benefits of both waveform-based and HMM-based speech synthesis.
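The recombination step can be sketched crudely: take the low band of one synthesizer's output and the high band (the residual above a low-pass) of the other's, then sum them. A moving average stands in for a proper filter bank here, and both signals are made-up placeholders, so this shows only the structure of the hybrid, not the paper's actual filtering.

```python
def low_pass(signal, width=5):
    """Crude moving-average low-pass filter (a stand-in for a real filter bank)."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def combine_subbands(hmm_like, waveform_like):
    """Low band from the first signal, high-band residual from the second."""
    low = low_pass(hmm_like)
    high = [x - y for x, y in zip(waveform_like, low_pass(waveform_like))]
    return [l + h for l, h in zip(low, high)]

smooth = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]          # stands in for HMM output
detailed = [0.0, 0.5, -0.2, 0.6, -0.1, 0.55, -0.15, 0.6]   # stands in for waveform units
hybrid = combine_subbands(smooth, detailed)
print(len(hybrid))
```

The design intent mirrors the abstract: the slowly varying (low-band) trajectory comes from the statistically smooth source, while the fine (high-band) detail comes from the waveform source.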


Proceedings of IWSDS 2011 | 2011

On-line detection of task incompletion for spoken dialog systems using utterance and behavior tag N-gram vectors

Sunao Hara; Norihide Kitaoka; Kazuya Takeda

We propose a method of detecting task incompletion in spoken dialog systems using N-gram-based dialog features. We used a database created during a field test in which inexperienced users operated a client-server music retrieval system with a spoken dialog interface on their own PCs. The dialog for a music retrieval task consists of a sequence of user and system tags describing their utterances and behaviors. The dialogs were manually classified into two classes: completed and uncompleted music retrieval tasks. We then detected dialogs that did not complete the task using a Support Vector Machine with N-gram feature vectors and interaction parameters, trained on the manually classified dialogs. Off-line and on-line detection experiments were conducted on a large amount of real data, and the results show that the proposed method achieves good classification performance.
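The feature extraction step can be sketched as follows: slide a window over the tag sequence and count the N-grams, yielding the vector an SVM would consume. The tag names are invented for illustration; the paper's actual tag inventory and interaction parameters are not reproduced here.

```python
from collections import Counter

# Hypothetical dialog: alternating user (U:) and system (S:) tags describing
# utterances and behaviors.
dialog = ["U:search", "S:ask-artist", "U:answer", "S:play", "U:stop", "S:ask-artist"]

def ngram_vector(tags, n=2):
    """Count tag N-grams; these counts form the feature vector for a classifier."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

features = ngram_vector(dialog)
print(features[("U:search", "S:ask-artist")])
```

In an on-line setting the same counts can be updated incrementally as each new tag arrives, which is what makes N-gram features convenient for detection during the dialog rather than only after it.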


international symposium on wearable computers | 2017

Prediction of subjective assessments for a noise map using deep neural networks

Shota Kobayashi; Masanobu Abe; Sunao Hara

In this paper, we investigate a method of creating noise maps that take human perception into account. Physical measurements alone are not enough to design our living environment; we also need subjective assessments. To predict subjective assessments from loudness values, we propose using metadata about where the recording was made, who made it, and what was recorded. The proposed method is implemented with deep neural networks because they can naturally handle a variety of information types. First, we evaluated its performance in predicting five-point subjective loudness levels from a combination of location-specific, participant-specific, and sound-specific features; the proposed method achieved a 16.3-point improvement over the baseline method. Next, we evaluated its performance through noise map visualizations generated from the predicted subjective loudness levels. Comparing the two visualizations, the proposed method made fewer errors than the baseline method.
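The input structure described above (concatenated location-, participant-, and sound-specific features feeding one network that outputs a five-point level) can be shown with a minimal forward pass. The weights below are arbitrary placeholders, not trained values, and the feature encodings are invented; this sketches only the data flow, not the paper's architecture.

```python
import math

def forward(x, w_hidden, w_out):
    """One tanh hidden layer, then a softmax over the five loudness levels."""
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w_hidden]
    logits = [sum(wi * hi for wi, hi in zip(row, hidden)) for row in w_out]
    exps = [math.exp(z - max(logits)) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Concatenate the three feature groups into one input vector (toy encodings).
location = [0.2, 0.8]     # e.g. area type
participant = [1.0]       # e.g. participant indicator
sound = [0.6, 0.1]        # e.g. loudness statistics
x = location + participant + sound

w_hidden = [[0.1, -0.2, 0.3, 0.5, -0.1],
            [0.2, 0.1, -0.4, 0.0, 0.3],
            [-0.3, 0.2, 0.1, 0.1, 0.2]]
w_out = [[0.5, -0.2, 0.1], [0.1, 0.3, -0.1], [-0.2, 0.1, 0.4],
         [0.3, 0.0, 0.2], [0.0, 0.2, 0.1]]

probs = forward(x, w_hidden, w_out)
print(max(range(5), key=lambda i: probs[i]) + 1)  # predicted level, 1..5
```

The practical point is that heterogeneous metadata only needs to be encoded numerically and concatenated; the network then learns the interactions between the groups.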


acm sigspatial workshop recommendations for location based services and social networks | 2017

New monitoring scheme for persons with dementia through monitoring-area adaptation according to stage of disease

Shigeki Kamada; Yuji Matsuo; Sunao Hara; Masanobu Abe

In this paper, we propose a new monitoring scheme for a person with dementia (PwD). The novel aspect of this scheme is that the size of the monitoring area changes with the stage of dementia, and the monitoring area is generated automatically from global positioning system (GPS) data collected by the PwD. The GPS data are quantized using the GeoHex code, which divides the map of the entire world into regular hexagons. The monitoring area is defined as a set of GeoHex codes, and its size is controlled by the granularity of the hexagons in the GeoHex code. The stage of dementia is estimated by analyzing the monitoring area to determine how frequently the PwD wanders. In this paper, we also examined two aspects of the implementation of the proposed scheme. First, we proposed an algorithm to estimate the monitoring area and evaluated its performance; the experimental results showed that the algorithm can estimate the monitoring area with a precision of 0.82 and a recall of 0.86 against the ground truth. Second, to investigate privacy considerations, we showed that different persons have different preferences for the granularity of the hexagons in the monitoring system. The results indicate that the size of the monitoring area should also be changed for each PwD.
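The quantization idea can be sketched with a simplified stand-in for GeoHex: a square grid whose cell size is set by a granularity level (real GeoHex uses hexagons and a specific encoding, which this deliberately does not reproduce). The monitoring area is then just the set of cell codes the GPS track covers, and coarser levels yield fewer, larger cells.

```python
def cell_code(lat, lon, level):
    """Square-grid stand-in for a GeoHex code; coarser level -> bigger cells."""
    size = 1.0 / (2 ** level)
    return (int(lat // size), int(lon // size), level)

def monitoring_area(track, level):
    """The monitoring area as the set of cells the GPS track covers."""
    return {cell_code(lat, lon, level) for lat, lon in track}

# Made-up GPS track (two nearby points and one farther away).
track = [(34.655, 133.919), (34.656, 133.920), (34.700, 133.950)]
coarse = monitoring_area(track, level=4)
fine = monitoring_area(track, level=10)
print(len(coarse), len(fine))
```

Choosing the level per person is exactly the privacy knob discussed in the abstract: a coarse level reveals only that the PwD is "somewhere in this large cell", while a fine level pins down the location much more precisely.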


ubiquitous computing | 2016

Safety vs. privacy: user preferences from the monitored and monitoring sides of a monitoring system

Shigeki Kamada; Masanobu Abe; Sunao Hara

In this study, to develop a monitoring system that takes privacy issues into account, we investigated user preferences regarding the levels of monitoring and privacy protection. People on the monitoring side wanted the system to allow detailed monitoring. Conversely, the more detailed the monitoring, the more intrusively surveilled the people being monitored felt. Evaluation experiments were performed using the location data of three people with different living areas. The results show that, by adjusting the quantization level of the location information, it is possible to control the levels of monitoring and privacy protection independently of the shape of a living area. Furthermore, it became clear that the granularity of location information that satisfies the monitored side differs from that which satisfies the monitoring side.


asia pacific signal and information processing association annual summit and conference | 2016

Enhancing a glossectomy patient's speech via GMM-based voice conversion

Kei Tanaka; Sunao Hara; Masanobu Abe; Shogo Minagi

In this paper, we describe the use of a voice conversion algorithm to improve the intelligibility of speech by patients with articulation disorders caused by a wide glossectomy and/or segmental mandibulectomy. As a first trial, to demonstrate the difficulty of the task, we implemented a conventional Gaussian mixture model (GMM)-based algorithm using a frame-by-frame approach. We compared voice conversion performance between normal speakers and a speaker with an articulation disorder while varying the number of training sentences, the number of GMM mixtures, and the variety of speaking styles in the training speech. According to our experimental results, the mel-cepstrum (MC) distance decreased by 40% for all speaker pairs compared with pre-conversion values; however, after conversion, the MC distance between a glossectomy speaker and a normal speaker was 28% larger than that between pairs of normal speakers. Analysis of the resulting spectrograms showed that the voice conversion algorithm successfully reconstructed high-frequency spectra in the phonemes /h/, /t/, /k/, /ts/, and /ch/; we also confirmed improvements in speech intelligibility via informal listening tests.
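The mel-cepstrum distance reported above is commonly computed as mel-cepstral distortion (MCD) in dB, summing squared coefficient differences per frame while excluding the 0th (energy) coefficient. The coefficient values below are made up for illustration; only the formula is standard.

```python
import math

def mel_cepstral_distortion(mc_ref, mc_conv):
    """Standard per-frame MCD in dB, excluding the 0th (energy) coefficient."""
    sq = sum((a - b) ** 2 for a, b in zip(mc_ref[1:], mc_conv[1:]))
    return (10.0 / math.log(10)) * math.sqrt(2.0 * sq)

# Toy mel-cepstral coefficient vectors for one frame of target and converted speech.
ref = [1.0, 0.50, -0.20, 0.10]
conv = [1.0, 0.45, -0.15, 0.12]
print(round(mel_cepstral_distortion(ref, conv), 3))
```

A full evaluation averages this quantity over all time-aligned frames of an utterance, which is how percentage reductions like those in the abstract are obtained.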


Journal of the Acoustical Society of America | 2016

A classification method for crowded situation using environmental sounds based on Gaussian mixture model-universal background model

Tomoyasu Tanaka; Sunao Hara; Masanobu Abe

This paper presents a method to classify crowded situations using environmental sounds collected by smartphones. The final goal of the research is to estimate “crowd density” using only environmental sounds collected by smartphones. The advantages of the approach are that (1) acoustic signals can be collected and processed at low cost, and (2) because many people carry smartphones, crowd density can be obtained not only in many places but also at any time. As a first step, in this paper, we tried to classify “a situation of being crowded with people.” We collected environmental sounds using smartphones in both residential and downtown areas; the total duration of the collected data is 77,900 seconds. The sound of “crowded with people” is defined as “buzz-buzz,” in which more than one person talks at the same time. Two kinds of classifiers were trained with the GMM-UBM (Gaussian Mixture Model and Universal Background Model) method. One was trained with acoustic features tha...
