Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Jürgen Schmidhuber is active.

Publications


Featured research published by Jürgen Schmidhuber.


Neural Computation | 1997

Long short-term memory

Sepp Hochreiter; Jürgen Schmidhuber

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
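
The core mechanism is easy to see in code. Below is a minimal NumPy sketch of the original 1997 cell as the abstract describes it: an additively updated cell state (the constant error carousel) guarded by multiplicative input and output gates. All weight names and sizes are illustrative, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W):
    """One step of the original (1997) LSTM cell: input and output
    gates around a constant error carousel (CEC); no forget gate yet.
    W is a dict of illustrative weight matrices applied to z = [x; h]."""
    z = np.concatenate([x, h])
    i = sigmoid(W["i"] @ z)          # input gate: opens/closes write access
    o = sigmoid(W["o"] @ z)          # output gate: opens/closes read access
    g = np.tanh(W["g"] @ z)          # candidate cell input
    c = c + i * g                    # CEC: purely additive state update, so
                                     # error flows back with constant weight 1
    h = o * np.tanh(c)               # gated cell output
    return h, c

# toy usage: 3-dim inputs, 5 memory cells, a length-10 sequence
rng = np.random.default_rng(0)
n_in, n_cell = 3, 5
W = {k: rng.normal(scale=0.1, size=(n_cell, n_in + n_cell)) for k in "iog"}
h, c = np.zeros(n_cell), np.zeros(n_cell)
for x in rng.normal(size=(10, n_in)):
    h, c = lstm_step(x, h, c, W)
```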


Computer Vision and Pattern Recognition | 2012

Multi-column deep neural networks for image classification

Dan C. Ciresan; Ueli Meier; Jürgen Schmidhuber

Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Our biologically plausible, wide and deep artificial neural network architectures can. Small (often minimal) receptive fields of convolutional winner-take-all neurons yield large network depth, resulting in roughly as many sparsely connected neural layers as found in mammals between retina and visual cortex. Only winner neurons are trained. Several deep neural columns become experts on inputs preprocessed in different ways; their predictions are averaged. Graphics cards allow for fast training. On the very competitive MNIST handwriting benchmark, our method is the first to achieve near-human performance. On a traffic sign recognition benchmark it outperforms humans by a factor of two. We also improve the state-of-the-art on a plethora of common image classification benchmarks.
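
To illustrate the multi-column idea, here is a hedged NumPy sketch: several independently trained "columns", each seeing a differently preprocessed view of the input, have their class posteriors averaged. The stand-in nets and preprocessing variants below are invented for the example, not the paper's trained models.

```python
import numpy as np

def mcdnn_predict(columns, preprocessors, image):
    """Average the softmax outputs of several 'columns', each a net
    paired with its own preprocessing of the input. `columns` and
    `preprocessors` are illustrative stand-ins for trained models."""
    probs = [net(prep(image)) for net, prep in zip(columns, preprocessors)]
    return np.mean(probs, axis=0)            # averaged class posterior

rng = np.random.default_rng(1)

def make_net():
    """Toy stand-in column: a fixed random linear map plus softmax."""
    w = rng.normal(scale=0.05, size=(10, 784))
    def net(x):
        logits = w @ x
        e = np.exp(logits - logits.max())    # numerically stable softmax
        return e / e.sum()
    return net

columns = [make_net() for _ in range(5)]
# hypothetical preprocessing variants: normalization at different contrasts
preprocessors = [lambda x, s=s: (x - x.mean()) / (x.std() + 1e-8) * s
                 for s in (0.8, 0.9, 1.0, 1.1, 1.2)]
avg = mcdnn_predict(columns, preprocessors, rng.normal(size=784))
print(avg.argmax(), avg.sum())               # predicted class; sums to 1
```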


Neural Computation | 2000

Learning to Forget: Continual Prediction with LSTM

Felix A. Gers; Jürgen Schmidhuber; Fred Cummins

Long short-term memory (LSTM; Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset. Without resets, the state may grow indefinitely and eventually cause the network to break down. Our remedy is a novel, adaptive forget gate that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources. We review illustrative benchmark problems on which standard LSTM outperforms other RNN algorithms. All algorithms (including LSTM) fail to solve continual versions of these problems. LSTM with forget gates, however, easily solves them, and in an elegant way.
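
Relative to the 1997 cell sketched above, the change is a single extra gate that multiplies the old cell state. A minimal sketch, with illustrative weights as before:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step_with_forget(x, h, c, W):
    """LSTM step with the adaptive forget gate of Gers et al.: the old
    cell state is scaled by f before the gated write, so the cell can
    learn to reset itself. Weight matrices in W are illustrative."""
    z = np.concatenate([x, h])
    f = sigmoid(W["f"] @ z)          # forget gate: f near 0 resets the cell
    i = sigmoid(W["i"] @ z)
    o = sigmoid(W["o"] @ z)
    g = np.tanh(W["g"] @ z)
    c = f * c + i * g                # was c + i*g; f releases stale state
    h = o * np.tanh(c)
    return h, c
```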


International Conference on Machine Learning | 2006

Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks

Alex Graves; Santiago Fernández; Faustino J. Gomez; Jürgen Schmidhuber

Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. In speech recognition, for example, an acoustic signal is transcribed into words or sub-word units. Recurrent neural networks (RNNs) are powerful sequence learners that would seem well suited to such tasks. However, because they require pre-segmented training data, and post-processing to transform their outputs into label sequences, their applicability has so far been limited. This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. An experiment on the TIMIT speech corpus demonstrates its advantages over both a baseline HMM and a hybrid HMM-RNN.
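
The labelling convention behind CTC can be shown compactly: the network emits per-frame label probabilities over an alphabet extended with a "blank", and an output path is decoded by collapsing repeated labels and removing blanks. A sketch of greedy (best-path) decoding under that rule, with made-up frame probabilities:

```python
import numpy as np

BLANK = 0  # CTC reserves one extra 'blank' label

def ctc_best_path_decode(frame_probs):
    """Greedy CTC decoding: take the most likely label per frame, then
    collapse repeats and drop blanks. frame_probs has shape
    (time_steps, num_labels) and each row sums to 1."""
    path = frame_probs.argmax(axis=1)          # best label per frame
    decoded, prev = [], None
    for label in path:
        if label != prev and label != BLANK:   # collapse repeats, drop blanks
            decoded.append(label)
        prev = label
    return decoded

# toy example: 6 frames over labels {blank, 1, 2}
probs = np.array([[0.1, 0.8, 0.1],     # 1
                  [0.1, 0.8, 0.1],     # 1 (repeat, collapsed)
                  [0.9, 0.05, 0.05],   # blank (separates repeated labels)
                  [0.1, 0.8, 0.1],     # 1 (new occurrence after blank)
                  [0.1, 0.1, 0.8],     # 2
                  [0.9, 0.05, 0.05]])  # blank
print(ctc_best_path_decode(probs))     # -> [1, 1, 2]
```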


Neural Networks | 2005

2005 Special Issue: Framewise phoneme classification with bidirectional LSTM and other neural network architectures

Alex Graves; Jürgen Schmidhuber

In this paper, we present bidirectional Long Short-Term Memory (LSTM) networks and a modified, full-gradient version of the LSTM learning algorithm. We evaluate Bidirectional LSTM (BLSTM) and several other network architectures on the benchmark task of framewise phoneme classification, using the TIMIT database. Our main findings are that bidirectional networks outperform unidirectional ones, and LSTM is much faster and also more accurate than both standard Recurrent Neural Nets (RNNs) and time-windowed Multilayer Perceptrons (MLPs). Our results support the view that contextual information is crucial to speech processing, and suggest that BLSTM is an effective architecture with which to exploit it.
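
The bidirectional idea itself is architecture-agnostic: run one recurrent layer over the sequence left to right, another right to left, and concatenate the per-frame outputs so every frame sees both past and future context. A minimal sketch (a plain tanh cell stands in for LSTM here; a real BLSTM trains separate weights per direction):

```python
import numpy as np

def bidirectional(step_fn, xs, n_hidden):
    """Run a recurrent step function over xs in both directions and
    concatenate per-frame outputs. step_fn(x, h) -> h is any recurrent
    cell; this sketch reuses one cell for both directions for brevity."""
    def run(seq):
        h, outs = np.zeros(n_hidden), []
        for x in seq:
            h = step_fn(x, h)
            outs.append(h)
        return outs
    fwd = run(xs)                  # left-to-right pass
    bwd = run(xs[::-1])[::-1]      # right-to-left pass, re-aligned
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

# toy usage: a simple tanh cell standing in for LSTM
rng = np.random.default_rng(2)
Wx = rng.normal(scale=0.1, size=(4, 3))
Wh = rng.normal(scale=0.1, size=(4, 4))
cell = lambda x, h: np.tanh(Wx @ x + Wh @ h)
frames = rng.normal(size=(7, 3))           # 7 frames of 3-dim features
outputs = bidirectional(cell, frames, n_hidden=4)
print(len(outputs), outputs[0].shape)      # 7 frames, each 8-dim (4 fwd + 4 bwd)
```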


Computer and Communications Security | 2010

Modeling attacks on physical unclonable functions

Ulrich Rührmair; Frank Sehnke; Jan Sölter; Gideon Dror; Srinivas Devadas; Jürgen Schmidhuber

We show in this paper how several proposed Physical Unclonable Functions (PUFs) can be broken by numerical modeling attacks. Given a set of challenge-response pairs (CRPs) of a PUF, our attacks construct a computer algorithm which behaves indistinguishably from the original PUF on almost all CRPs. This algorithm can subsequently impersonate the PUF, and can be cloned and distributed arbitrarily. This breaks the security of essentially all applications and protocols that are based on the respective PUF. The PUFs we attacked successfully include standard Arbiter PUFs and Ring Oscillator PUFs of arbitrary sizes, and XOR Arbiter PUFs, Lightweight Secure PUFs, and Feed-Forward Arbiter PUFs of up to a given size and complexity. Our attacks are based upon various machine learning techniques including Logistic Regression and Evolution Strategies. Our work leads to new design requirements for secure electrical PUFs, and will be useful to PUF designers and attackers alike.
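
The flavor of such an attack can be sketched for the standard Arbiter PUF, whose additive delay model makes the response linear in a parity feature vector of the challenge. The simulation below fits a logistic-regression clone to CRPs from a randomly drawn "ground truth" PUF; all sizes, the feature map, and the training loop are illustrative, not the paper's exact setup.

```python
import numpy as np

def arbiter_features(challenges):
    """Map n-bit challenges (0/1) to the standard linear-delay feature
    vector Phi: parities of challenge suffixes, plus a bias term. This
    additive delay model is what makes Arbiter PUFs learnable."""
    signs = 1 - 2 * challenges                         # 0/1 bits -> +1/-1
    phi = np.cumprod(signs[:, ::-1], axis=1)[:, ::-1]  # suffix parities
    return np.hstack([phi, np.ones((len(challenges), 1))])

rng = np.random.default_rng(3)
n_bits, n_crps = 32, 4000
w_true = rng.normal(size=n_bits + 1)                 # hidden stage delays
C = rng.integers(0, 2, size=(n_crps, n_bits))        # collected challenges
X = arbiter_features(C)
y = (X @ w_true > 0).astype(float)                   # simulated responses

# logistic-regression attack: plain gradient descent on the CRPs
w = np.zeros(n_bits + 1)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))                 # predicted P(response=1)
    w -= 0.1 * X.T @ (p - y) / n_crps                # cross-entropy gradient

# the clone should agree with the real PUF even on unseen challenges
C_new = rng.integers(0, 2, size=(2000, n_bits))
X_new = arbiter_features(C_new)
clone_acc = ((X_new @ w > 0) == (X_new @ w_true > 0)).mean()
print(f"clone agreement on fresh CRPs: {clone_acc:.3f}")
```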


International Joint Conference on Artificial Intelligence | 2011

Flexible, high performance convolutional neural networks for image classification

Dan C. Ciresan; Ueli Meier; Jonathan Masci; Luca Maria Gambardella; Jürgen Schmidhuber

We present a fast, fully parameterizable GPU implementation of Convolutional Neural Network variants. Our feature extractors are neither carefully designed nor pre-wired, but rather learned in a supervised way. Our deep hierarchical architectures achieve the best published results on benchmarks for object classification (NORB, CIFAR10) and handwritten digit recognition (MNIST), with error rates of 2.53%, 19.51%, 0.35%, respectively. Deep nets trained by simple back-propagation perform better than shallower ones. Learning is surprisingly rapid. NORB is completely trained within five epochs. Test error rates on MNIST drop to 2.42%, 0.97% and 0.48% after 1, 3 and 17 epochs, respectively.


International Conference on Artificial Neural Networks | 2011

Stacked convolutional auto-encoders for hierarchical feature extraction

Jonathan Masci; Ueli Meier; Dan C. Ciresan; Jürgen Schmidhuber

We present a novel convolutional auto-encoder (CAE) for unsupervised feature learning. A stack of CAEs forms a convolutional neural network (CNN). Each CAE is trained using conventional on-line gradient descent without additional regularization terms. A max-pooling layer is essential to learn biologically plausible features consistent with those found by previous approaches. Initializing a CNN with filters of a trained CAE stack yields superior performance on a digit (MNIST) and an object recognition (CIFAR10) benchmark.
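
Structurally, one CAE layer is a convolutional encoder with max pooling, followed by a decoder that upsamples and convolves with the flipped filters. A forward-only NumPy/SciPy sketch with random stand-in filters (gradient-descent training on the reconstruction error is omitted):

```python
import numpy as np
from scipy.signal import convolve2d

def max_pool2(m):
    """2x2 non-overlapping max pooling: the ingredient the paper finds
    essential for learning biologically plausible features."""
    h, w = m.shape
    return m.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def cae_reconstruct(img, kernels):
    """One CAE layer, forward only: encode (valid conv + ReLU + pool),
    then decode (2x upsampling + full convolution with flipped filters).
    Filters are random stand-ins, not trained weights."""
    encoded = [max_pool2(np.maximum(convolve2d(img, k, mode="valid"), 0))
               for k in kernels]
    upsampled = [np.kron(m, np.ones((2, 2))) for m in encoded]
    recon = sum(convolve2d(m, k[::-1, ::-1], mode="full")
                for m, k in zip(upsampled, kernels))
    return recon / len(kernels)

rng = np.random.default_rng(4)
img = rng.random((28, 28))                       # stand-in for an MNIST digit
kernels = rng.normal(scale=0.1, size=(8, 5, 5))  # 8 illustrative 5x5 filters
recon = cae_reconstruct(img, kernels)
print(recon.shape, float(np.mean((recon - img) ** 2)))  # (28, 28), recon MSE
```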


Neural Networks | 2012

2012 Special Issue: Multi-column deep neural network for traffic sign classification

Dan C. Ciresan; Ueli Meier; Jonathan Masci; Jürgen Schmidhuber

We describe the approach that won the final phase of the German traffic sign recognition benchmark. Our method is the only one that achieved a better-than-human recognition rate of 99.46%. We use a fast, fully parameterizable GPU implementation of a Deep Neural Network (DNN) that does not require careful design of pre-wired feature extractors, which are rather learned in a supervised way. Combining various DNNs trained on differently preprocessed data into a Multi-Column DNN (MCDNN) further boosts recognition performance, also making the system insensitive to variations in contrast and illumination.


International Conference on Document Analysis and Recognition | 2011

Convolutional Neural Network Committees for Handwritten Character Classification

Dan C. Ciresan; Ueli Meier; Luca Maria Gambardella; Jürgen Schmidhuber

In 2010, after many years of stagnation, the MNIST handwriting recognition benchmark record dropped from 0.40% error rate to 0.35%. Here we report 0.27% for a committee of seven deep CNNs trained on graphics cards, narrowing the gap to human performance. We also apply the same architecture to NIST SD 19, a more challenging dataset including lower and upper case letters. A committee of seven CNNs obtains the best results published so far for both NIST digits and NIST letters. The robustness of our method is verified by analyzing 78125 different 7-net committees.

Collaboration


Dive into Jürgen Schmidhuber's collaborations.

Top Co-Authors

Faustino J. Gomez (Dalle Molle Institute for Artificial Intelligence Research)
Alexander Förster (Dalle Molle Institute for Artificial Intelligence Research)
Jan Koutník (Dalle Molle Institute for Artificial Intelligence Research)
Jürgen Leitner (Dalle Molle Institute for Artificial Intelligence Research)
Matthew D. Luciw (Dalle Molle Institute for Artificial Intelligence Research)
Mikhail Frank (Dalle Molle Institute for Artificial Intelligence Research)
Dan C. Ciresan (Dalle Molle Institute for Artificial Intelligence Research)
Simon Harding (Dalle Molle Institute for Artificial Intelligence Research)
Daan Wierstra (École Polytechnique Fédérale de Lausanne)