Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Yoshitaka Ushiku is active.

Publication


Featured research published by Yoshitaka Ushiku.


computer vision and pattern recognition | 2011

Discriminative spatial pyramid

Tatsuya Harada; Yoshitaka Ushiku; Yuya Yamashita; Yasuo Kuniyoshi

Spatial Pyramid Representation (SPR) is a widely used method for embedding both global and local spatial information into a feature, and it shows good performance in generic image recognition. In SPR, the image is divided into a sequence of increasingly finer grids at each pyramid level. Features are extracted from all of the grid cells and concatenated to form one huge feature vector. As a result, expensive computational costs are incurred for both learning and testing. Moreover, because the strategy for partitioning the image at each pyramid level is designed by hand, there is little theoretical evidence for which partitioning strategy yields good categorization. In this paper, we propose discriminative SPR, a new representation that forms the image feature as a weighted sum of semi-local features over all pyramid levels. The weights are selected automatically to maximize discriminative power. The resulting feature is compact and preserves high discriminative power, even in low dimensions. Furthermore, discriminative SPR can identify the distinctive cells and pyramid levels simultaneously by observing the optimal weights assigned to the fine grid cells.
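The core idea, replacing SPR's concatenation with a weighted sum of per-cell features, can be sketched in a few lines. The sketch below assumes precomputed per-cell descriptors and uses uniform placeholder weights; the paper's actual weight optimization is not reproduced.

```python
# A minimal sketch of the weighted-sum idea behind discriminative SPR, assuming
# precomputed per-cell histograms; the weight-learning step is replaced by a
# simple uniform placeholder (not the paper's optimization).
import numpy as np

def spr_cells(level):
    """Number of grid cells at a pyramid level (level 0 = whole image)."""
    return (2 ** level) ** 2

def discriminative_spr_feature(cell_features, weights):
    """
    cell_features: list over pyramid levels; each entry has shape
                   (num_cells_at_level, feature_dim), one descriptor per cell.
    weights:       list over levels; each entry has shape (num_cells_at_level,)
                   of learned per-cell weights.
    Returns a single feature_dim vector: a weighted sum of all cell features,
    instead of the huge concatenation used by standard SPR.
    """
    feature_dim = cell_features[0].shape[1]
    pooled = np.zeros(feature_dim)
    for feats, w in zip(cell_features, weights):
        pooled += w @ feats          # weighted sum of the cells at this level
    return pooled

# Toy usage with random cell descriptors and uniform weights.
rng = np.random.default_rng(0)
levels = 3
cells = [rng.random((spr_cells(l), 128)) for l in range(levels)]
w = [np.full(spr_cells(l), 1.0 / spr_cells(l)) for l in range(levels)]
print(discriminative_spr_feature(cells, w).shape)   # (128,)
```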


international conference on multimedia and expo | 2017

DualNet: Domain-invariant network for visual question answering

Kuniaki Saito; Andrew Shin; Yoshitaka Ushiku; Tatsuya Harada

Visual question answering (VQA) tasks use two types of images: abstract (illustrations) and real. Domain-specific differences exist between the two types of images with respect to “objectness,” “texture,” and “color.” Therefore, achieving similar performance by applying methods developed for real images to abstract images, and vice versa, is difficult. This is a critical problem in VQA, because image features are crucial clues for correctly answering the questions about the images. However, an effective, domain-invariant method can provide insight into the high-level reasoning required for VQA. We thus propose a method called DualNet that demonstrates performance that is invariant to the differences in real and abstract scene domains. Experimental results show that DualNet outperforms state-of-the-art methods, especially for the abstract images category.


acm multimedia | 2011

Understanding images with natural sentences

Yoshitaka Ushiku; Tatsuya Harada; Yasuo Kuniyoshi

We propose a novel system that generates sentential captions for general images. For people to make effective use of the vast number of images on the web, technologies must be able to explain image contents and to search for the data that users need. Moreover, images must be described with natural sentences based not only on the names of the objects contained in an image but also on their mutual relations. The proposed system uses general images and captions available on the web as training data to generate captions for new images. Furthermore, because the learning cost is independent of the amount of data, the system is scalable, which makes it useful with large-scale data.


computer vision and pattern recognition | 2014

Three Guidelines of Online Learning for Large-Scale Visual Recognition

Yoshitaka Ushiku; Masatoshi Hidaka; Tatsuya Harada

In this paper, we evaluate online learning algorithms for large-scale visual recognition using state-of-the-art features that are preselected and held fixed. Today, combinations of high-dimensional features and linear classifiers are widely used for large-scale visual recognition. Numerous so-called mid-level features have been developed and compared with one another experimentally. Although various learning methods for linear classification have also been proposed in the machine learning and natural language processing literature, they have rarely been evaluated for visual recognition. Therefore, we give guidelines through investigations of state-of-the-art online learning methods for linear classifiers. Many of these methods have been evaluated only on toy data and natural language processing problems such as document classification, so we give them a unified interpretation from the viewpoint of visual recognition. Results of controlled comparisons indicate three guidelines that might change the pipeline for visual recognition.
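As a concrete example of the kind of method surveyed here, the sketch below shows a single passive-aggressive (PA-I) update of an online linear classifier on fixed, precomputed features. It is a generic illustration of one evaluated family, not the paper's experimental pipeline; the synthetic labels and hyperparameters are assumptions.

```python
# A minimal sketch of one online linear-classifier update (passive-aggressive,
# PA-I) on fixed high-dimensional features; data and labels are synthetic.
import numpy as np

def pa_update(w, x, y, C=1.0):
    """One PA-I step for a binary label y in {-1, +1}."""
    margin = y * (w @ x)
    loss = max(0.0, 1.0 - margin)            # hinge loss on this example
    if loss > 0.0:
        tau = min(C, loss / (x @ x))         # step size, capped by C
        w = w + tau * y * x                  # move the boundary just enough
    return w

# Toy usage: a stream of random high-dimensional features.
rng = np.random.default_rng(0)
dim = 1000
w = np.zeros(dim)
for _ in range(100):
    x = rng.random(dim)
    y = 1 if x[:10].sum() > 5 else -1        # synthetic label rule
    w = pa_update(w, x, y)
```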


british machine vision conference | 2016

Image Captioning with Sentiment Terms via Weakly-Supervised Sentiment Dataset.

Andrew Shin; Yoshitaka Ushiku; Tatsuya Harada

The image captioning task has become a highly competitive research area with the successful application of convolutional and recurrent neural networks, especially with the advent of the long short-term memory (LSTM) architecture. However, its primary focus has been the factual description of images, including the objects, movements, and their relations. While this focus has demonstrated competence, describing images along with non-factual elements, namely the sentiments of the images expressed via adjectives, has mostly been neglected. We attempt to address this issue by fine-tuning an additional convolutional neural network devoted solely to sentiments, where the sentiment dataset is built with a data-driven, multi-label approach. Our experimental results show that our method can generate image captions with sentiment terms that are more compatible with the images than those produced by relying solely on features devoted to object classification, while preserving the semantics.
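A minimal sketch of the fine-tuning step described above, training a separate CNN with a multi-label sentiment objective, is given below. The backbone (ResNet-18), the four example sentiment terms, and the optimizer settings are illustrative assumptions, and the step that feeds the sentiment features into the captioner is not shown.

```python
# A minimal sketch of fine-tuning a separate CNN for multi-label sentiment
# terms; backbone, label set, and training details are assumptions.
import torch
import torch.nn as nn
from torchvision import models

sentiment_terms = ["happy", "gloomy", "beautiful", "scary"]   # hypothetical label set

model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, len(sentiment_terms))

criterion = nn.BCEWithLogitsLoss()        # multi-label: each term is on/off independently
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

images = torch.randn(4, 3, 224, 224)      # a dummy mini-batch
labels = torch.tensor([[1., 0., 1., 0.]] * 4)

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```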


international conference on image processing | 2010

Improving image similarity measures for image browsing and retrieval through latent space learning between images and long texts

Yoshitaka Ushiku; Tatsuya Harada; Yasuo Kuniyoshi

The amount of multimedia data on personal devices and the Web is increasing daily. Image browsing and retrieval systems in a low-dimensional space have been widely studied to manage and view large numbers of images. It is essential for such systems to exploit an efficient similarity measure of the images when searching for them. Existing methods use the distance in a low-level image feature space as the similarity measure, and therefore images with different content may be treated as similar. In this paper, we propose a novel method to improve the similarity measures for images by considering the text surrounding them. If there is text describing the images, similarities can be measured more effectively by taking the text streams into account. The proposed method improves the image similarity measures based on the latent semantics obtained from the combination of image and text. It should be noted that the text does not need to be clear tags; indeed, any generic Web text is applicable. Moreover, our method can effectively improve the similarities even if only a small portion of the images include textual descriptions. Additionally, the proposed method is scalable, as its computational complexity is linear in the number of images. In the experiments, we compare our method with previous methods on an original dataset in which a portion of the images are annotated with long texts. We show that the proposed method can retrieve semantically similar images more precisely than existing methods.
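One common way to build such a joint image-text latent space is canonical correlation analysis (CCA); the sketch below uses it as a stand-in for the latent-semantics step and may differ from the paper's exact formulation. The feature dimensions and data are placeholders.

```python
# A minimal sketch of measuring image similarity in a joint image-text latent
# space, using CCA as a stand-in for the latent-semantics step.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_pairs, img_dim, txt_dim, latent_dim = 200, 256, 512, 32

img_feats = rng.random((n_pairs, img_dim))   # image features (e.g., visual descriptors)
txt_feats = rng.random((n_pairs, txt_dim))   # bag-of-words features from surrounding text

# Learn the latent space from the subset of images that have accompanying text.
cca = CCA(n_components=latent_dim)
cca.fit(img_feats, txt_feats)

# Any image (with or without text) can then be projected and compared there.
def latent_similarity(img_a, img_b):
    za = cca.transform(img_a[None, :])[0]
    zb = cca.transform(img_b[None, :])[0]
    return float(za @ zb / (np.linalg.norm(za) * np.linalg.norm(zb)))

print(latent_similarity(img_feats[0], img_feats[1]))
```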


acm multimedia | 2017

WebDNN: Fastest DNN Execution Framework on Web Browser

Masatoshi Hidaka; Yuichiro Kikura; Yoshitaka Ushiku; Tatsuya Harada

Recently, deep neural networks (DNNs) have been drawing a great deal of attention because of their applications. However, they require substantial computational resources, and setting up an execution environment based on hardware acceleration such as GPGPU takes a tremendous amount of work. Therefore, providing DNN applications to end-users is very hard. To solve this problem, we have developed WebDNN, an installation-free, web-browser-based DNN execution framework. WebDNN optimizes the trained DNN model to compress the model data and accelerate execution. It executes the DNN model with a novel JavaScript API to achieve zero-overhead execution. Empirical evaluations show that it achieves an acceleration of more than two hundred times. WebDNN is an open-source framework and can be downloaded from https://github.com/mil-tokyo/webdnn.


international conference on robotics and automation | 2014

Hard Negative Classes for Multiple Object Detection

Asako Kanezaki; Sho Inaba; Yoshitaka Ushiku; Yuya Yamashita; Hiroshi Muraoka; Yasuo Kuniyoshi; Tatsuya Harada

We propose an efficient method to train multiple object detectors simultaneously using a large-scale image dataset. The one-vs-all approach, which optimizes the boundary between positive samples from a target class and negative samples from the other classes, has been the most standard approach for object detection. However, because this approach trains each object detector independently, the scores are not balanced between object classes. The proposed method combines ideas from both detection and classification in order to balance the scores across all object classes. We optimize the boundary between target classes and their "hard negative" samples, just as in detection, while simultaneously balancing the detector scores across object classes, as done in multi-class classification. We evaluated the performance on multi-class object detection using a subset of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2011 dataset and showed that our method outperforms a de facto standard method.
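The hard-negative idea, keeping for each class the negatives that the current detector scores highest, can be sketched as below. This is a generic illustration with placeholder data; the paper's score-balancing objective across classes is not reproduced.

```python
# A minimal sketch of selecting "hard negative" samples for one-vs-all linear
# detectors: for each class, the highest-scoring negatives are kept for the
# next training round. The score-balancing step is not shown.
import numpy as np

def mine_hard_negatives(W, X, labels, target_class, num_hard=100):
    """
    W:       (num_classes, dim) current linear detectors.
    X:       (num_samples, dim) features.
    labels:  (num_samples,) integer class labels.
    Returns indices of the highest-scoring negatives for target_class.
    """
    scores = X @ W[target_class]                 # detector score for every sample
    negatives = np.flatnonzero(labels != target_class)
    hardest = negatives[np.argsort(scores[negatives])[::-1][:num_hard]]
    return hardest

# Toy usage with random detectors, features, and labels.
rng = np.random.default_rng(0)
W = rng.standard_normal((10, 64))
X = rng.standard_normal((1000, 64))
labels = rng.integers(0, 10, size=1000)
print(mine_hard_negatives(W, X, labels, target_class=3)[:5])
```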


european conference on computer vision | 2018

Open Set Domain Adaptation by Backpropagation

Kuniaki Saito; Shohei Yamamoto; Yoshitaka Ushiku; Tatsuya Harada

Numerous algorithms have been proposed for transferring knowledge from a label-rich domain (source) to a label-scarce domain (target). Most of them are designed for the closed-set scenario, where the source and target domains completely share the classes of their samples. In practice, however, a target domain can contain samples of classes that are not shared by the source domain. We call such classes the "unknown class," and algorithms that work well in this open-set situation are very practical. However, most existing distribution-matching methods for domain adaptation do not work well in this setting, because unknown target samples should not be aligned with the source. In this paper, we propose a method for the open-set domain adaptation scenario that utilizes adversarial training. This approach allows us to extract features that separate unknown target samples from known target samples. During training, we assign two options to the feature generator: aligning target samples with known source samples or rejecting them as unknown target samples. Our method was extensively evaluated and outperformed other methods by a large margin in most settings.
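A minimal sketch of the adversarial objective described above: a classifier with an extra "unknown" output is trained to hold the unknown probability of target samples at a boundary value, while reversed gradients push the feature generator to move that probability toward either alignment or rejection. The boundary value t = 0.5, the network sizes, and other details are assumptions for illustration.

```python
# A minimal sketch of the known-vs-unknown adversarial objective, with a
# gradient-reversal layer between the feature generator and the classifier.
# t = 0.5 and the network sizes are illustrative assumptions.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad                       # flip gradients: generator fights classifier

num_known, t = 10, 0.5
generator = nn.Sequential(nn.Linear(2048, 256), nn.ReLU())
classifier = nn.Linear(256, num_known + 1)  # extra output = "unknown" class

def open_set_adv_loss(target_images):
    feats = GradReverse.apply(generator(target_images))
    probs = classifier(feats).softmax(dim=1)
    p_unknown = probs[:, -1].clamp(1e-6, 1 - 1e-6)
    # The classifier tries to keep p_unknown at t; the reversed gradients push
    # the generator to move it toward 0 (align as known) or 1 (reject as unknown).
    return -(t * p_unknown.log() + (1 - t) * (1 - p_unknown).log()).mean()

loss = open_set_adv_loss(torch.randn(8, 2048))   # dummy target-domain features
loss.backward()
```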


acm multimedia | 2017

Multispectral Object Detection for Autonomous Vehicles

Karasawa Takumi; Kohei Watanabe; Qishen Ha; Antonio Tejero-de-Pablos; Yoshitaka Ushiku; Tatsuya Harada

Recently, researchers have been actively conducting studies on mobile robot technologies that involve autonomous driving. To deploy an autonomous mobile robot (e.g., an automated driving vehicle) in traffic, it is necessary to robustly detect various types of objects, such as cars, people, and bicycles, under various conditions such as daytime and nighttime. In this paper, we propose the use of multispectral images as input information for object detection in traffic. Multispectral images are composed of RGB images, near-infrared images, middle-infrared images, and far-infrared images, and together they carry complementary information. For example, some objects that cannot be visually recognized in the RGB image can be detected in the far-infrared image. To train our multispectral object detection system, we need a multispectral dataset for object detection in traffic. Since no such dataset currently exists, in this study we generated our own multispectral dataset. In addition, we propose a multispectral ensemble detection pipeline to fully use the features of multispectral images. The pipeline is divided into two parts: the single-spectral detection models and the ensemble part. We conducted two experiments in this work. In the first experiment, we evaluate our single-spectral object detection model; the results show that each component of the multispectral image is individually useful for object detection when applied to different types of objects. In the second experiment, we evaluate the entire multispectral object detection system and show that the mean average precision (mAP) of multispectral object detection is 13% higher than that of RGB-only object detection.
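The ensemble part can be sketched, under assumptions, as pooling the detections of the per-spectrum models and merging overlapping boxes with non-maximum suppression; the paper's exact ensemble rule may differ, and the boxes below are toy data.

```python
# A minimal sketch of an ensemble step: detections from per-spectrum models
# are pooled and overlapping boxes are merged by non-maximum suppression.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def ensemble_detections(per_spectrum_dets, iou_thresh=0.5):
    """per_spectrum_dets: list (one per spectrum) of (box, score) detections."""
    pooled = sorted((d for dets in per_spectrum_dets for d in dets),
                    key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in pooled:
        if all(iou(box, kb) < iou_thresh for kb, _ in kept):
            kept.append((box, score))
    return kept

# Toy usage: one RGB detection overlapping one far-infrared detection.
rgb_dets = [((10, 10, 50, 50), 0.9)]
fir_dets = [((12, 11, 52, 49), 0.8), ((100, 100, 140, 160), 0.7)]
print(ensemble_detections([rgb_dets, fir_dets]))
```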

Collaboration


Dive into Yoshitaka Ushiku's collaboration.
