Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Forrest N. Iandola is active.

Publication


Featured research published by Forrest N. Iandola.


Computer Vision and Pattern Recognition | 2015

Deformable part models are convolutional neural networks

Ross B. Girshick; Forrest N. Iandola; Trevor Darrell; Jitendra Malik

Deformable part models (DPMs) and convolutional neural networks (CNNs) are two widely used tools for visual recognition. They are typically viewed as distinct approaches: DPMs are graphical models (Markov random fields), while CNNs are “black-box” non-linear classifiers. In this paper, we show that a DPM can be formulated as a CNN, thus providing a synthesis of the two ideas. Our construction involves unrolling the DPM inference algorithm and mapping each step to an equivalent CNN layer. From this perspective, it is natural to replace the standard image features used in DPMs with a learned feature extractor. We call the resulting model a DeepPyramid DPM and experimentally validate it on PASCAL VOC object detection. We find that DeepPyramid DPMs significantly outperform DPMs based on histograms of oriented gradients (HOG) features and slightly outperform a comparable version of the recently introduced R-CNN detection system, while running significantly faster.
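As an illustrative sketch (not the authors' code) of the unrolling idea: one key step of DPM inference is the generalized distance transform, which the paper maps to a "distance transform pooling" CNN layer. A 1D version is a max over displacements minus a quadratic deformation penalty; the weight `w` below is a hypothetical example value.

```python
def distance_transform_1d(part_scores, w=0.1, max_disp=2):
    """For each location x, take the best displaced part score minus a
    quadratic deformation penalty: out[x] = max_dx score[x+dx] - w*dx^2."""
    n = len(part_scores)
    out = []
    for x in range(n):
        best = float("-inf")
        for dx in range(-max_disp, max_disp + 1):
            if 0 <= x + dx < n:
                best = max(best, part_scores[x + dx] - w * dx * dx)
        out.append(best)
    return out

scores = [0.0, 1.0, 0.2, 3.0, 0.5]
print(distance_transform_1d(scores))
```

Like max pooling, this is a local maximum over a window; the extra deformation term is what makes it a DPM-specific layer.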


International Conference on Computer Vision | 2013

Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction

Ning Zhang; Ryan Farrell; Forrest N. Iandola; Trevor Darrell

Recognizing objects in fine-grained domains can be extremely challenging due to the subtle differences between subcategories. Discriminative markings are often highly localized, leading traditional object recognition approaches to struggle with the large pose variation often present in these domains. Pose-normalization seeks to align training exemplars, either piecewise by part or globally for the whole object, effectively factoring out differences in pose and in viewing angle. Prior approaches relied on computationally-expensive filter ensembles for part localization and required extensive supervision. This paper proposes two pose-normalized descriptors based on computationally-efficient deformable part models. The first leverages the semantics inherent in strongly-supervised DPM parts. The second exploits weak semantic annotations to learn cross-component correspondences, computing pose-normalized descriptors from the latent parts of a weakly-supervised DPM. These representations enable pooling across pose and viewpoint, in turn facilitating tasks such as fine-grained recognition and attribute prediction. Experiments conducted on the Caltech-UCSD Birds 200 dataset and Berkeley Human Attribute dataset demonstrate significant improvements of our approach over state-of-the-art algorithms.


Computer Vision and Pattern Recognition | 2016

FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters

Forrest N. Iandola; Matthew W. Moskewicz; Khalid Ashraf; Kurt Keutzer

Long training times for high-accuracy deep neural networks (DNNs) impede research into new DNN architectures and slow the development of high-accuracy DNNs. In this paper we present FireCaffe, which successfully scales deep neural network training across a cluster of GPUs. We also present a number of best practices to aid in comparing advancements in methods for scaling and accelerating the training of deep neural networks. The speed and scalability of distributed algorithms are almost always limited by the overhead of communication between servers, and DNN training is no exception to this rule. Therefore, the key consideration here is to reduce communication overhead wherever possible, while not degrading the accuracy of the DNN models that we train. Our approach has three key pillars. First, we select network hardware that achieves high bandwidth between GPU servers - InfiniBand or Cray interconnects are ideal for this. Second, we consider a number of communication algorithms, and we find that reduction trees are more efficient and scalable than the traditional parameter server approach. Third, we optionally increase the batch size to reduce the total quantity of communication during DNN training, and we identify hyperparameters that allow us to reproduce the small-batch accuracy while training with large batch sizes. When training GoogLeNet and Network-in-Network on ImageNet, we achieve a 47x and 39x speedup, respectively, when training on a cluster of 128 GPUs.
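An illustrative sketch (not FireCaffe itself) of why reduction trees scale: summing per-worker gradient vectors pairwise, level by level, takes O(log N) communication steps for N workers, whereas serializing N transfers into a single parameter server takes O(N).

```python
def tree_allreduce(grads):
    """Pairwise-sum gradient vectors level by level; returns (total, steps)."""
    steps = 0
    level = [g[:] for g in grads]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append([a + b for a, b in zip(level[i], level[i + 1])])
        if len(level) % 2:  # odd worker out carries over to the next level
            nxt.append(level[-1])
        level = nxt
        steps += 1
    return level[0], steps

total, steps = tree_allreduce([[1, 2], [3, 4], [5, 6], [7, 8]])
print(total, steps)  # [16, 20] summed in 2 tree levels for 4 workers
```

Doubling the worker count adds only one tree level, which is the "near-linear" scaling behavior the paper's title refers to.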


Conference on Decision and Control | 2013

Optimal load management system for Aircraft Electric Power distribution

Mehdi Maasoumy; Pierluigi Nuzzo; Forrest N. Iandola; Maryam Kamgarpour; Alberto L. Sangiovanni-Vincentelli; Claire J. Tomlin

Aircraft Electric Power Systems (EPS) route power from generators to vital avionic loads by configuring a set of electronic control switches denoted as contactors. In this paper, we address the problem of designing a hierarchical optimal control strategy for the EPS contactors in the presence of system faults. We first formalize the system connectivity, safety and performance requirements in terms of mathematical constraints. We then show that the EPS control problem can be formulated as a Mixed-Integer Linear Program (MILP) and efficiently solved to yield load shedding, source allocation, contactor switching and battery charging policies, while optimizing a number of performance metrics, such as the number of used generators and shed loads. This solution is then integrated into a hierarchical control scheme consisting of two layers of controllers. The high-level controller provides control optimality by solving the MILP within a receding horizon approach. The low-level controller handles system faults, by directly actuating the EPS contactors, and implements the solution from the high-level controller only if it is safe. Simulation results confirm the effectiveness of the proposed approach.
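A toy sketch of the load-management optimization (all load names and numbers below are hypothetical; the paper formulates a real MILP and solves it within a receding-horizon controller). Here we brute-force the binary keep/shed decision per load: stay within generator capacity while shedding as little as possible, with critical loads weighted heavily.

```python
from itertools import product

# (name, power demand, is_critical) - illustrative values only
loads = [("avionics", 40, True), ("galley", 25, False),
         ("cabin_lights", 15, False), ("de_icing", 30, True)]
capacity = 80  # available generator power

best = None
for keep in product([0, 1], repeat=len(loads)):
    demand = sum(p for (_, p, _), k in zip(loads, keep) if k)
    if demand > capacity:
        continue  # infeasible: this configuration would overload the generator
    # objective: minimize shed power, critical loads weighted 100x
    shed_cost = sum((100 if crit else 1) * p
                    for (_, p, crit), k in zip(loads, keep) if not k)
    if best is None or shed_cost < best[0]:
        best = (shed_cost, keep)

kept = [name for (name, _, _), k in zip(loads, best[1]) if k]
print(kept)  # the two critical loads are kept; the others are shed
```

A MILP solver reaches the same optimum without enumerating all 2^n configurations, which is what makes the receding-horizon formulation tractable at realistic problem sizes.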


International Conference on Multimedia Retrieval | 2015

Audio-Based Multimedia Event Detection with DNNs and Sparse Sampling

Khalid Ashraf; Benjamin Elizalde; Forrest N. Iandola; Matthew W. Moskewicz; Julia Bernd; Gerald Friedland; Kurt Keutzer

This paper presents advances in analyzing audio content information to detect events in videos, such as a parade or a birthday party. We developed a set of tools for audio processing within the predominantly vision-focused deep neural network (DNN) framework Caffe. Using these tools, we show, for the first time, the potential of using only a DNN for audio-based multimedia event detection. Training DNNs for event detection using the entire audio track from each video causes a computational bottleneck. Here, we address this problem by developing a sparse audio frame-sampling method that improves event-detection speed and accuracy. We achieved a 10 percentage-point improvement in event classification accuracy, with a 200x reduction in the number of training input examples as compared to using the entire track. This reduction in input feature volume led to a 16x reduction in the size of the DNN architecture and a 300x reduction in training time. We applied our method using the recently released YLI-MED dataset and compared our results with a state-of-the-art system and with results reported in the literature for TRECVID MED. Our results show much higher MAP scores compared to a baseline i-vector system - at a significantly reduced computational cost. The speed improvement is relevant for processing videos on a large scale, and could enable more effective deployment in mobile systems.
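A minimal sketch of the sparse frame-sampling idea (the selection strategy here is an assumed, even-spacing variant, not necessarily the paper's exact method): rather than feeding every audio frame of a track to the DNN, keep a small, evenly spaced subset.

```python
def sparse_sample(n_frames, keep):
    """Return `keep` frame indices spread evenly over a track of n_frames."""
    if keep >= n_frames:
        return list(range(n_frames))
    step = n_frames / keep
    return [int(i * step) for i in range(keep)]

# A 200x reduction in training examples, in the spirit of the paper's result:
idx = sparse_sample(2000, 10)
print(idx)  # 10 of 2000 frames: [0, 200, 400, ..., 1800]
```

Fewer input examples shrink both the training set and, with a fixed-size input layer, the network itself, which is where the reported reductions in model size and training time come from.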


Computer Vision and Pattern Recognition | 2017

SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving

Bichen Wu; Forrest N. Iandola; Peter H. Jin; Kurt Keutzer

Object detection is a crucial task for autonomous driving. In addition to requiring high accuracy to ensure safety, object detection for autonomous driving also requires real-time inference speed to guarantee prompt vehicle control, as well as small model size and energy efficiency to enable embedded system deployment. In this work, we propose SqueezeDet, a fully convolutional neural network for object detection that aims to simultaneously satisfy all of the above constraints. In our network we use convolutional layers not only to extract feature maps, but also as the output layer to compute bounding boxes and class probabilities. The detection pipeline of our model only contains a single forward pass of a neural network, thus it is extremely fast. Our model is fully convolutional, which leads to small model size and better energy efficiency. Finally, our experiments show that our model is very accurate, achieving state-of-the-art accuracy on the KITTI [10] benchmark. The source code of SqueezeDet is open-source released.
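A simplified sketch of the convolutional output-layer idea: per grid cell and per anchor, the layer emits 4 box deltas, 1 confidence score, and C class scores, so it needs K*(4+1+C) output channels. The delta-decoding form below is an assumed anchor-offset parameterization for illustration.

```python
import math

def out_channels(num_anchors, num_classes):
    """Channels the conv output layer needs: K anchors x (4 deltas + 1 conf + C)."""
    return num_anchors * (4 + 1 + num_classes)

def decode_box(anchor, deltas):
    """Apply (dx, dy, dw, dh) deltas to an anchor box (cx, cy, w, h)."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = deltas
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

print(out_channels(9, 3))  # 9 anchors, 3 classes -> 72 channels
print(decode_box((50.0, 50.0, 20.0, 10.0), (0.1, -0.2, 0.0, 0.0)))
```

Because detection is just one more convolution over the feature map, the whole pipeline stays a single forward pass, which is what makes the model fast and fully convolutional.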


International Conference on Image Processing | 2013

Communication-minimizing 2D convolution in GPU registers

Forrest N. Iandola; David Sheffield; Michael J. Anderson; Phitchaya Mangpo Phothilimthana; Kurt Keutzer

2D image convolution is ubiquitous in image processing and computer vision problems such as feature extraction. Exploiting parallelism is a common strategy for accelerating convolution. Parallel processors keep getting faster, but algorithms such as image convolution remain memory bounded on parallel processors such as GPUs. Therefore, reducing memory communication is fundamental to accelerating image convolution. To reduce memory communication, we reorganize the convolution algorithm to prefetch image regions to register, and we do more work per thread with fewer threads. To enable portability to future architectures, we implement a convolution autotuner that sweeps the design space of memory layouts and loop unrolling configurations. We focus on convolution with small filters (2×2-7×7), but our techniques can be extended to larger filter sizes. Depending on filter size, our speedups on two NVIDIA architectures range from 1.2× to 4.5× over state-of-the-art GPU libraries.
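A back-of-the-envelope sketch of why "more work per thread" reduces memory communication: a thread that produces a t-by-t tile of outputs for a k-by-k filter can load a (t+k-1)^2 input patch once into registers, instead of paying k*k loads per output. (The tile sizes here are illustrative, not the paper's tuned configurations.)

```python
def loads_per_output(k, t):
    """Average image loads per output pixel when each thread computes a t x t tile."""
    return (t + k - 1) ** 2 / (t ** 2)

for t in (1, 2, 4):
    print(f"5x5 filter, {t}x{t} tile: {loads_per_output(5, t):.1f} loads/output")
# 1x1 tile: 25.0, 2x2 tile: 9.0, 4x4 tile: 4.0
```

The autotuner mentioned in the abstract effectively sweeps t (along with memory layouts and unrolling) to find the point where register pressure and reuse balance out on a given GPU.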


arXiv: Computer Vision and Pattern Recognition | 2017

Shallow Networks for High-Accuracy Road Object-Detection

Khalid Ashraf; Bichen Wu; Forrest N. Iandola; Matthew W. Moskewicz; Kurt Keutzer

The ability to automatically detect other vehicles on the road is vital to the safety of partially-autonomous and fully-autonomous vehicles. Most of the high-accuracy techniques for this task are based on R-CNN or one of its faster variants. In the research community, much emphasis has been applied to using 3D vision or complex R-CNN variants to achieve higher accuracy. However, are there more straightforward modifications that could deliver higher accuracy? Yes. We show that increasing input image resolution (i.e. upsampling) offers up to 12 percentage-points higher accuracy compared to an off-the-shelf baseline. We also find situations where earlier/shallower layers of a CNN provide higher accuracy than later/deeper layers. We further show that shallow models and upsampled images yield competitive accuracy. Our findings contrast with the current trend towards deeper and larger models to achieve high accuracy in domain-specific detection tasks.
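A minimal sketch of input upsampling (nearest-neighbor, pure Python; a real pipeline would use bilinear resizing in an image library): enlarging the input makes small, distant objects occupy more pixels before the CNN ever sees them.

```python
def upsample_nn(img, factor):
    """Nearest-neighbor upsample a 2D grid of pixel values by `factor`."""
    return [[img[r // factor][c // factor]
             for c in range(len(img[0]) * factor)]
            for r in range(len(img) * factor)]

img = [[1, 2],
       [3, 4]]
print(upsample_nn(img, 2))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

No new information is added, but the effective stride of the network relative to the scene shrinks, which is one plausible reading of why the simple modification helps small-object detection.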


International Conference on Hardware/Software Codesign and System Synthesis | 2017

Keynote: small neural nets are beautiful: enabling embedded systems with small deep-neural-network architectures

Forrest N. Iandola; Kurt Keutzer

Over the last five years, deep neural nets have offered more accurate solutions to many problems in speech recognition and computer vision, and these solutions have surpassed a threshold of acceptability for many applications. As a result, deep neural networks have supplanted other approaches to solving problems in these areas, and have enabled many new applications. While the design of deep neural nets is still something of an art form, in our work we have found basic principles of design space exploration used to develop embedded microprocessor architectures to be highly applicable to the design of deep neural net architectures. In particular, we have used these design principles to create a novel deep neural net called SqueezeNet that requires only 480KB of storage for its model parameters. We have further integrated all these experiences to develop something of a playbook for creating small deep neural nets for embedded systems.
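A rough parameter-count sketch of the Fire-module idea behind SqueezeNet: squeeze with 1x1 filters, then expand with a mix of 1x1 and 3x3 filters. The channel counts below approximate the fire2 configuration reported in the SqueezeNet paper; biases are ignored for simplicity.

```python
def fire_params(c_in, squeeze, e1x1, e3x3):
    """Weights in a Fire module: 1x1 squeeze, then 1x1 + 3x3 expand filters."""
    return c_in * squeeze + squeeze * e1x1 + squeeze * e3x3 * 9

def plain3x3_params(c_in, c_out):
    """Weights in an ordinary 3x3 conv layer with the same output width."""
    return c_in * c_out * 9

fire = fire_params(96, 16, 64, 64)    # squeeze to 16, expand to 64 + 64
plain = plain3x3_params(96, 128)
print(fire, plain)  # 11776 vs 110592: roughly 9x fewer weights
```

This kind of arithmetic, done systematically across a design space, is the microprocessor-style exploration the keynote describes, and it is how a full network's parameters fit in a few hundred kilobytes.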


Wireless and Mobile Computing, Networking and Communications | 2016

Boda-RTC: Productive generation of portable, efficient code for convolutional neural networks on mobile computing platforms

Matthew W. Moskewicz; Forrest N. Iandola; Kurt Keutzer

The popularity of neural networks (NNs) spans academia [1], industry [2], and popular culture [3]. In particular, convolutional neural networks (CNNs) have been applied to many image based machine learning tasks and have yielded strong results [4]. The availability of hardware/software systems for efficient training and deployment of large and/or deep CNN models is critical for the continued success of the field [5] [1]. Early systems for NN computation focused on leveraging existing dense linear algebra techniques and libraries [6] [7]. Current approaches use low-level machine specific programming [8] and/or closed-source, purpose-built vendor libraries [9]. In this work, we present an open source system that, compared to existing approaches, achieves competitive computational speed while achieving significantly greater portability. We achieve this by targeting the vendor-neutral OpenCL platform [10] using a code-generation approach. We argue that our approach allows for both: (1) the rapid development of new computational kernels for existing hardware targets, and (2) the rapid tuning of existing computational kernels for new hardware targets. Results are presented for a case study of targeting the Qualcomm Snapdragon 820 mobile computing platform [11] for CNN deployment.
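A toy sketch of the code-generation approach (the kernel text and parameter names are invented for illustration, not Boda-RTC's actual templates): emit an OpenCL-style kernel source string specialized for a given filter size and unroll factor, so new variants can be generated and tuned per hardware target rather than hand-written.

```python
KERNEL_TEMPLATE = """__kernel void conv{ksize}x{ksize}(__global const float* in,
                     __global const float* flt, __global float* out) {{
  // unroll factor {unroll} chosen by the autotuner for this target
  ...
}}"""

def generate_kernel(ksize, unroll):
    """Specialize the template for one (filter size, unroll factor) variant."""
    return KERNEL_TEMPLATE.format(ksize=ksize, unroll=unroll)

src = generate_kernel(3, 4)
print(src.splitlines()[0])
```

Because the generator, not a human, writes each specialized kernel, porting to a new OpenCL device becomes a matter of re-running the tuning sweep rather than rewriting low-level code.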

Collaboration


Dive into Forrest N. Iandola's collaborations.

Top Co-Authors

- Kurt Keutzer (University of California)
- Khalid Ashraf (University of California)
- Trevor Darrell (University of California)
- Bichen Wu (University of California)
- Peter H. Jin (University of California)
- Benjamin Elizalde (International Computer Science Institute)