Publication


Featured research published by Kaoru Hiramatsu.


International Conference on Acoustics, Speech, and Signal Processing | 2017

Generative adversarial network-based postfilter for statistical parametric speech synthesis

Takuhiro Kaneko; Hirokazu Kameoka; Nobukatsu Hojo; Yusuke Ijima; Kaoru Hiramatsu; Kunio Kashino

We propose a postfilter based on a generative adversarial network (GAN) to compensate for the differences between natural speech and speech synthesized by statistical parametric speech synthesis. In particular, we focus on the differences caused by over-smoothing, which makes the sounds muffled. Over-smoothing occurs in the time and frequency directions and is highly correlated in both directions, and conventional methods based on heuristics are too limited to cover all the factors (e.g., global variance was designed only to recover the dynamic range). To solve this problem, we focus on “spectral texture”, i.e., the details of the time-frequency representation, and propose a learning-based postfilter that captures the structures directly from the data. To estimate the true distribution, we utilize a GAN composed of a generator and a discriminator. This optimizes the generator to produce samples imitating the dataset according to the adversarial discriminator. This adversarial process encourages the generator to fit the true data distribution, i.e., to generate realistic spectral texture. Objective evaluation of experimental results shows that the GAN-based postfilter can compensate for detailed spectral structures including modulation spectrum, and subjective evaluation shows that its generated speech is comparable to natural speech.
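
As a rough, self-contained sketch of the adversarial training idea described above (the layer sizes, the 64x64 spectrogram patch shape, and the random tensors standing in for synthesized and natural speech are illustrative assumptions, not the paper's configuration), a GAN-based postfilter could be set up along these lines:

```python
# Minimal sketch of a GAN-based postfilter (illustrative only; the architecture,
# patch shape, and training data below are placeholders, not the paper's setup).
import torch
import torch.nn as nn

PATCH = (1, 64, 64)  # assumed (channels, frequency bins, frames) of a spectrogram patch

generator = nn.Sequential(          # maps a synthesized (over-smoothed) patch -> enhanced patch
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
discriminator = nn.Sequential(      # classifies natural vs. postfiltered patches
    nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Flatten(), nn.Linear(64 * 16 * 16, 1),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(100):
    synth = torch.randn(8, *PATCH)    # placeholder: synthesized speech patches
    natural = torch.randn(8, *PATCH)  # placeholder: natural speech patches

    # Discriminator update: natural -> 1, postfiltered -> 0.
    fake = generator(synth).detach()
    loss_d = bce(discriminator(natural), torch.ones(8, 1)) + \
             bce(discriminator(fake), torch.zeros(8, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator update: fool the discriminator so enhanced patches look natural,
    # i.e. recover realistic spectral texture.
    loss_g = bce(discriminator(generator(synth)), torch.ones(8, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

At synthesis time only the trained generator would be applied to the synthesized spectrogram before waveform generation.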


Computer Vision and Pattern Recognition | 2017

Generative Attribute Controller with Conditional Filtered Generative Adversarial Networks

Takuhiro Kaneko; Kaoru Hiramatsu; Kunio Kashino

We present a generative attribute controller (GAC), a novel functionality for generating or editing an image while intuitively controlling large variations of an attribute. This controller is based on a novel generative model called the conditional filtered generative adversarial network (CFGAN), which is an extension of the conventional conditional GAN (CGAN) that incorporates a filtering architecture into the generator input. Unlike the conventional CGAN, which represents an attribute directly using an observable variable (e.g., the binary indicator of attribute presence) so its controllability is restricted to attribute labeling (e.g., restricted to an ON or OFF control), the CFGAN has a filtering architecture that associates an attribute with a multi-dimensional latent variable, enabling latent variations of the attribute to be represented. We also define the filtering architecture and training scheme considering controllability, enabling the variations of the attribute to be intuitively controlled using typical controllers (radio buttons and slide bars). We evaluated our CFGAN on MNIST, CUB, and CelebA datasets and show that it enables large variations of an attribute to be not only represented but also intuitively controlled while retaining identity. We also show that the learned latent space has enough expressive power to conduct attribute transfer and attribute-based image retrieval.
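
The following is a minimal sketch of the filtering idea as read from the abstract: the attribute label selects a per-attribute transformation of a multi-dimensional latent code, and the filtered code is fed to the generator together with the usual noise vector. The layer sizes, the matrix-based filter, the MNIST-scale output, and the omission of the discriminator are all hypothetical simplifications; the actual CFGAN architecture may differ.

```python
# Illustrative sketch of a "filtered" conditional generator: the attribute label
# gates a multi-dimensional latent code before it reaches the generator.
import torch
import torch.nn as nn

Z_DIM, ATTR_DIM, LATENT_PER_ATTR, IMG_DIM = 64, 2, 8, 28 * 28   # assumed sizes

class FilteredGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        # one learned filter matrix per attribute value; the label selects which is applied
        self.filters = nn.Parameter(torch.randn(ATTR_DIM, LATENT_PER_ATTR, LATENT_PER_ATTR))
        self.net = nn.Sequential(
            nn.Linear(Z_DIM + LATENT_PER_ATTR, 256), nn.ReLU(),
            nn.Linear(256, IMG_DIM), nn.Tanh(),
        )

    def forward(self, z, attr_onehot, attr_code):
        # "filter" the attribute code with the matrix selected by the label
        w = torch.einsum('ba,aij->bij', attr_onehot, self.filters)
        filtered = torch.bmm(w, attr_code.unsqueeze(-1)).squeeze(-1)
        return self.net(torch.cat([z, filtered], dim=1))

g = FilteredGenerator()
z = torch.randn(4, Z_DIM)                                 # identity noise
attr = torch.eye(ATTR_DIM)[torch.tensor([0, 1, 0, 1])]    # ON/OFF attribute label
code = torch.randn(4, LATENT_PER_ATTR)                    # continuous within-attribute variation
imgs = g(z, attr, code)                                   # (4, 784) generated images
# At editing time, varying entries of `code` (e.g. via slide bars) changes the
# attribute's appearance while z keeps the identity fixed. The discriminator and
# adversarial training loop are omitted here.
```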


ACM Multimedia | 2016

Adaptive Visual Feedback Generation for Facial Expression Improvement with Multi-task Deep Neural Networks

Takuhiro Kaneko; Kaoru Hiramatsu; Kunio Kashino

While many studies in computer vision and pattern recognition have been actively conducted to recognize people's current states, few studies have tackled the problem of generating feedback on how people can improve their states, although there are many real-world applications such as in sports, education, and health care. In particular, it has been challenging to develop a system that can adaptively generate feedback for real-world situations, namely various input and target states, since doing so requires formulating many different rules of feedback. We propose a learning-based method to solve this problem. If we could obtain a large amount of feedback annotations, it would be possible to explicitly learn the rules, but this is difficult due to the subjective nature of the task. To mitigate this problem, our method implicitly learns the rules from training data consisting of input images, key-point annotations, and state annotations that do not require professional knowledge of feedback. Given such training data, we first learn a multi-task deep neural network with state recognition and key-point localization. Then, we apply a novel propagation method for extracting feedback information from the network. We evaluated our method on a facial expression improvement task using real-world data and clarified its characteristics and effectiveness.
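
A simplified, hypothetical sketch of the two-headed setup follows. The feedback read-out here is a plain input gradient of the target-state score, which only approximates the paper's propagation method; the architecture, the number of states and key points, and the random input are placeholders.

```python
# A shared trunk feeds a state-recognition head and a key-point head; the gradient
# of the target-state score w.r.t. the input indicates where changes would raise
# that score and could be summarized around the key points as feedback.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, n_states=3, n_keypoints=5):
        super().__init__()
        self.n_keypoints = n_keypoints
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
        )
        self.state_head = nn.Linear(16 * 8 * 8, n_states)              # facial states
        self.keypoint_head = nn.Linear(16 * 8 * 8, n_keypoints * 2)    # (x, y) per key point

    def forward(self, x):
        h = self.trunk(x)
        return self.state_head(h), self.keypoint_head(h).view(-1, self.n_keypoints, 2)

net = MultiTaskNet()
img = torch.rand(1, 1, 64, 64, requires_grad=True)   # placeholder face image
state_logits, keypoints = net(img)

target_state = 1                                      # the state the user wants to reach
net.zero_grad()
state_logits[0, target_state].backward()
feedback_map = img.grad[0, 0]                         # pixel-wise sensitivity to the target state
print(feedback_map.shape, keypoints.shape)
```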


International Journal of Computer Vision | 2018

Label Propagation with Ensemble of Pairwise Geometric Relations: Towards Robust Large-Scale Retrieval of Object Instances

Xiaomeng Wu; Kaoru Hiramatsu; Kunio Kashino

Spatial verification methods permit geometrically stable image matching, but still involve a difficult trade-off between robustness as regards incorrect rejection of true correspondences and discriminative power in terms of mismatches. To address this issue, we ask whether an ensemble of weak geometric constraints that correlates with visual similarity only slightly better than a bag-of-visual-words model performs better than a single strong constraint. We consider a family of spatial verification methods and decompose them into fundamental constraints imposed on pairs of feature correspondences. Encompassing such constraints leads us to propose a new method, which takes the best of existing techniques and functions as a unified Ensemble of pAirwise GEometric Relations (EAGER), in terms of both spatial contexts and between-image transformations. We also introduce a novel and robust reranking method, in which the object instances localized by EAGER in high-ranked database images are reissued as new queries. EAGER is extended to develop a smoothness constraint where the similarity between the optimized ranking scores of two instances should be maximally consistent with their geometrically constrained similarity. Reranking is newly formulated as two label propagation problems: one is to assess the confidence of new queries and the other to aggregate new independently executed retrievals. Extensive experiments conducted on four datasets show that EAGER and our reranking method outperform most of their state-of-the-art counterparts, especially when large-scale visual vocabularies are used.
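
As a toy illustration of formulating reranking as label propagation, the sketch below smooths ranking scores over an instance-affinity graph; the random affinity matrix merely stands in for the geometrically constrained similarities that EAGER would compute between localized object instances, and the damping factor and iteration count are arbitrary.

```python
# Minimal sketch of score refinement by label propagation over an affinity graph.
import numpy as np

def label_propagation(W, initial_scores, alpha=0.85, n_iter=50):
    """Iteratively smooth ranking scores over an instance-affinity graph."""
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    S = W / np.sqrt(np.outer(d, d))          # symmetric normalization D^-1/2 W D^-1/2
    f = initial_scores.copy()
    for _ in range(n_iter):
        # each instance takes a weighted average of its neighbours' scores,
        # anchored to its original retrieval score
        f = alpha * S @ f + (1 - alpha) * initial_scores
    return f

rng = np.random.default_rng(0)
W = rng.random((6, 6)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)   # toy affinities
scores = np.array([1.0, 0.8, 0.1, 0.7, 0.05, 0.0])                  # initial ranking scores
print(label_propagation(W, scores))
```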


International Conference on Acoustics, Speech, and Signal Processing | 2017

Deep salience map guided arbitrary direction scene text recognition

Xinhao Liu; Takahito Kawanishi; Xiaomeng Wu; Kaoru Hiramatsu; Kunio Kashino

Irregular scene text, such as curved, rotated, or perspective text, commonly appears in natural scene images due to different camera viewpoints, special design purposes, and so on. In this work, we propose a text salience map guided model to recognize these arbitrary-direction scene texts. We train a deep Fully Convolutional Network (FCN) to calculate a precise salience map for text. We then estimate the position and rotation of the text and use this information to guide the generation of CNN sequence features. Finally, the sequence is recognized with a Recurrent Neural Network (RNN) model. Experiments on various public datasets show that the proposed approach is robust to different distortions and performs better than or comparably to state-of-the-art techniques.
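
A small sketch of one intermediate step: estimating the text rotation from a salience map via the principal axis of the salient pixels and rectifying the region. The FCN and RNN stages are omitted, and the synthetic stripe, threshold, and angle convention are assumptions rather than the paper's procedure.

```python
# Estimate the dominant text direction from a (here synthetic) salience map and
# rotate the region upright before sequence-feature extraction.
import numpy as np
from scipy import ndimage

def estimate_text_angle(salience, thresh=0.5):
    ys, xs = np.nonzero(salience > thresh)
    pts = np.stack([xs, ys], axis=1).astype(float)
    pts -= pts.mean(axis=0)
    cov = np.cov(pts.T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, np.argmax(eigvals)]          # principal axis of the salient region
    return np.degrees(np.arctan2(major[1], major[0]))

# Synthetic salience map: a tilted bright stripe standing in for detected text.
h, w = 128, 128
yy, xx = np.mgrid[0:h, 0:w]
salience = (np.abs((yy - h / 2) - 0.5 * (xx - w / 2)) < 5).astype(float)

angle = estimate_text_angle(salience)
upright = ndimage.rotate(salience, angle, reshape=True)   # rectified region for the CNN+RNN
print(f"estimated rotation: {angle:.1f} degrees")
```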


International Conference on Acoustics, Speech, and Signal Processing | 2017

Edited film alignment via selective Hough transform and accurate template matching

Xiaomeng Wu; Takahito Kawanishi; Minoru Mori; Kaoru Hiramatsu; Kunio Kashino

Edited film alignment is the post-production process of finding small parts of unedited footage that temporally and spatially match an edited film. The huge amount of data to be processed makes significant downsampling of the videos essential in real-life applications. Simultaneously, professional users demand that the task be achieved with frame and pixel-level accuracy. We propose a novel selective Hough transform (SHT) and an accurate template matching method to address the difficult trade-off between accuracy and scalability. For robust temporal alignment, SHT investigates the selectivity of frame-level similarities and advantageously reduces the weights of mismatches. The template matching method encompasses spatial Hough transform and sum of squared differences (SSD) minimization. SSD is efficiently approximated by exploiting the second-order derivative of image intensity. Experiments conducted on real-world data show the superiority of our methods.
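
A toy sketch of Hough-style temporal alignment with selectivity-weighted votes is shown below. The synthetic similarity matrix and the best-versus-second-best weighting are only one plausible reading of "selective" voting, not the paper's exact formulation.

```python
# Each (edited frame, footage frame) match votes for a time offset; votes are
# weighted by how selective the match is, which downweights ambiguous frames.
import numpy as np

rng = np.random.default_rng(1)
n_edit, n_raw, true_offset = 40, 200, 57

# Toy frame-similarity matrix with a bright diagonal at the true offset.
sim = rng.random((n_edit, n_raw)) * 0.3
for i in range(n_edit):
    sim[i, i + true_offset] = 0.9 + 0.1 * rng.random()

votes = np.zeros(n_raw - n_edit + 1)
for i in range(n_edit):
    order = np.argsort(sim[i])[::-1]
    best, second = sim[i, order[0]], sim[i, order[1]]
    selectivity = (best - second) / (best + 1e-9)    # low for ambiguous (non-selective) frames
    offset = order[0] - i
    if 0 <= offset < votes.size:
        votes[offset] += selectivity * best          # weighted Hough vote for this offset

print("estimated offset:", int(np.argmax(votes)), "(true:", true_offset, ")")
```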


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2017

Information Retrieval Model using Generalized Pareto Distribution and Its Application to Instance Search

Masaya Murata; Kaoru Hiramatsu; Shin'ichi Satoh

We adopt the generalized Pareto distribution for the information-based retrieval model and show that its parameters can be estimated from the mean excess function. The proposed information retrieval model corresponds to an extension of the divergence from independence model and is designed to be data-driven. The proposed model is then applied to the specific object search task known as instance search, and its effectiveness is experimentally confirmed.
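
A worked sketch of the estimation idea: for a generalized Pareto distribution the mean excess function e(u) = E[X - u | X > u] = (sigma + xi*u) / (1 - xi) is linear in u, so fitting a line to empirical mean excesses yields the shape xi and scale sigma. The simulated data and the final -log survival weight below are illustrative stand-ins, not the paper's exact retrieval model.

```python
# Fit GPD parameters from the empirical mean excess function, then use the fitted
# tail to compute an information-style term weight -log P(X > tf).
import numpy as np

rng = np.random.default_rng(0)
xi_true, sigma_true = 0.2, 1.5
u = rng.uniform(size=20000)
x = sigma_true / xi_true * ((1 - u) ** (-xi_true) - 1)   # GPD samples via inverse CDF

thresholds = np.quantile(x, np.linspace(0.1, 0.8, 15))
mean_excess = np.array([(x[x > t] - t).mean() for t in thresholds])

slope, intercept = np.polyfit(thresholds, mean_excess, 1)
xi_hat = slope / (1 + slope)            # invert e(u) = (sigma + xi*u) / (1 - xi)
sigma_hat = intercept * (1 - xi_hat)
print(f"xi ~ {xi_hat:.3f}, sigma ~ {sigma_hat:.3f}")

def info_weight(tf, xi=xi_hat, sigma=sigma_hat):
    # -log of the GPD survival function, i.e. the "surprise" of seeing frequency tf
    return -np.log((1 + xi * tf / sigma) ** (-1 / xi))

print(info_weight(np.array([1.0, 5.0, 20.0])))
```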


Discovery Science | 2017

Recursive Extraction of Modular Structure from Layered Neural Networks Using Variational Bayes Method

Chihiro Watanabe; Kaoru Hiramatsu; Kunio Kashino

Deep neural networks have made a substantial contribution to the recognition and prediction of complex data in various fields, such as image processing, speech recognition and bioinformatics. However, it is very difficult to discover knowledge from the inference provided by a neural network, since its internal representation consists of many nonlinear and hierarchical parameters. To solve this problem, an approach has been proposed that extracts a global, simplified structure from a neural network. Although it can successfully detect such a hidden modular structure, its convergence is not sufficiently stable and is sensitive to the initial parameters. In this paper, we propose a new deep learning algorithm that consists of recursive back propagation, community detection using a variational Bayes method, and pruning of unnecessary connections. We show that the proposed method can appropriately detect a hidden inference structure and compress a neural network without increasing the generalization error.
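
A much-simplified illustration of the decomposition step: a variational Bayesian Gaussian mixture over unit connection patterns stands in for the paper's community detection, and a toy weight matrix with two planted modules replaces a trained network; the pruning threshold and component count are arbitrary choices.

```python
# Group hidden units into communities by their (pruned) incoming-weight patterns
# using a variational Bayes mixture model, after removing weak connections.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)

# Toy weight matrix of one layer (20 hidden units x 10 inputs) with two planted modules.
W = rng.normal(scale=0.05, size=(20, 10))
W[:10, :5] += rng.normal(scale=1.0, size=(10, 5))     # module 1: units 0-9 use inputs 0-4
W[10:, 5:] += rng.normal(scale=1.0, size=(10, 5))     # module 2: units 10-19 use inputs 5-9

# Prune connections with small magnitude, then cluster the units.
W_pruned = np.where(np.abs(W) > 0.1, W, 0.0)
vb = BayesianGaussianMixture(n_components=5, weight_concentration_prior=0.01,
                             covariance_type='diag', random_state=0)
communities = vb.fit_predict(np.abs(W_pruned))

print("detected community per unit:", communities)
```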


Computational Color Imaging Workshop | 2017

Visualizing Lost Designs in Degraded Early Modern Tapestry Using Infra-red Image

Masaru Tsuchida; Keiji Yano; Kaoru Hiramatsu; Kunio Kashino

This paper shows how to experimentally visualize lost designs in damaged early modern tapestries used in the Kyoto Gion festival. Unlike cloth weaving, tapestry is weft-faced weaving. As the surface weft threads become worn or turn over time, the design in a tapestry is gradually lost. On the other hand, weft threads hidden by warp threads still remain. In the tapestries of the Kyoto Gion festival, gold and silver threads were often used as weft, and they reflect infrared radiation. In the experiments, a tapestry woven in the seventeenth century was used. Six-band images were taken for accurate color reproduction, and infrared images were taken for visualizing the lost design. The viewing angle and image resolution of both types of images were the same. Superimposing the infrared image on the color image after correcting registration errors revealed the original design of the tapestry.
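
A minimal sketch of the final overlay step, assuming FFT-based phase correlation as a simple stand-in for the paper's registration-error correction and using single-channel placeholder images in place of the six-band color and infrared captures:

```python
# Align the infrared image to the colour image by estimating a translation with
# phase correlation, then superimpose the two with a simple alpha blend.
import numpy as np

def phase_correlation_shift(ref, moving):
    """Integer (dy, dx) shift to apply to `moving` so it aligns with `ref`."""
    F1, F2 = np.fft.fft2(ref), np.fft.fft2(moving)
    cross = F1 * np.conj(F2)
    corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-9)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    if dy > ref.shape[0] // 2: dy -= ref.shape[0]
    if dx > ref.shape[1] // 2: dx -= ref.shape[1]
    return dy, dx

rng = np.random.default_rng(0)
color = rng.random((256, 256))                         # placeholder colour image (one channel)
infrared = np.roll(color, shift=(3, -5), axis=(0, 1))  # IR image offset by a known shift

dy, dx = phase_correlation_shift(color, infrared)
aligned_ir = np.roll(infrared, shift=(dy, dx), axis=(0, 1))
overlay = 0.6 * color + 0.4 * aligned_ir               # superimposed visualization
print("estimated shift:", (dy, dx))
```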


Advances in Computing and Communications | 2016

Filter design based on multiple model estimation

Masaya Murata; Hidehisa Nagano; Kaoru Hiramatsu; Kunio Kashino

We show that well-known filtering algorithms such as the Gaussian sum filter (GSF) and the particle filter (PF) can be derived from multiple model estimation (MME). Based on the MME, we propose a new filter called the particle Gaussian sum filter (PGSF) to overcome the problems of the GSF and PF. To realize the PGSF algorithm, we also show that the ensemble Kalman filter (EnKF) asymptotically approaches the Gaussian filter (GF) when a sufficiently large number of ensemble members is used. The PGSF employing the EnKF achieves higher estimation accuracy than the one using the extended Kalman filter (EKF), while the latter approach is much faster in terms of processing time. We compare the proposed filter with several existing filters and demonstrate its effectiveness through a numerical simulation.
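
A minimal sketch of one building block, an ensemble Kalman filter update on a linear-Gaussian toy model; in a PGSF each Gaussian component would be handled by such a filter. The dynamics, noise levels, and ensemble size here are illustrative assumptions.

```python
# Ensemble Kalman filter on a constant-velocity model with position-only observations.
import numpy as np

rng = np.random.default_rng(0)
n_ens, dim = 500, 2
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition (position, velocity)
H = np.array([[1.0, 0.0]])               # observe position only
Q, R = 0.01 * np.eye(dim), np.array([[0.25]])

ensemble = rng.normal(size=(n_ens, dim))  # initial ensemble ~ N(0, I)
x_true = np.array([0.0, 1.0])

for t in range(20):
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(dim), Q)
    y = H @ x_true + rng.multivariate_normal(np.zeros(1), R)

    # Forecast: propagate each member through the dynamics with process noise.
    ensemble = ensemble @ A.T + rng.multivariate_normal(np.zeros(dim), Q, size=n_ens)

    # Analysis: Kalman gain from the ensemble covariance, perturbed-observation update.
    P = np.cov(ensemble.T)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    perturbed_y = y + rng.multivariate_normal(np.zeros(1), R, size=n_ens)
    ensemble = ensemble + (perturbed_y - ensemble @ H.T) @ K.T

print("true state:", x_true, " EnKF mean:", ensemble.mean(axis=0))
```

With a large ensemble the sample covariance approaches the exact one, which is the sense in which the EnKF approaches a Gaussian filter.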

Collaboration


Dive into Kaoru Hiramatsu's collaborations.

Top Co-Authors

Xiaomeng Wu (Nippon Telegraph and Telephone)
Takahito Kawanishi (Nippon Telegraph and Telephone)
Hidehisa Nagano (Nippon Telegraph and Telephone)
Kiyoshi Kogure (Kanazawa Institute of Technology)
Takayuki Kurozumi (Nippon Telegraph and Telephone)
Go Irie (Nippon Telegraph and Telephone)