Publication


Featured research published by Patrice Y. Simard.


IEEE Transactions on Neural Networks | 1994

Learning long-term dependencies with gradient descent is difficult

Yoshua Bengio; Patrice Y. Simard; Paolo Frasconi

Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production, or prediction problems. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. We show why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases. These results expose a trade-off between efficient learning by gradient descent and latching onto information for long periods. Based on an understanding of this problem, alternatives to standard gradient descent are considered.
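
A minimal sketch of the effect analyzed above, assuming a toy tanh recurrent network (not the paper's experimental setup; all sizes and weight scales below are arbitrary): the Jacobian of the hidden state at time t with respect to the initial state is a product of t per-step Jacobians, and its norm typically shrinks exponentially, which is why gradient signals about long-range dependencies vanish.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 20                                   # hidden-state size (arbitrary)
    W = rng.normal(scale=0.3 / np.sqrt(n), size=(n, n))  # recurrent weights

    h = rng.normal(size=n)                   # initial hidden state
    J = np.eye(n)                            # accumulated Jacobian dh_t/dh_0
    for t in range(1, 51):
        h = np.tanh(W @ h)                   # recurrence h_t = tanh(W h_{t-1})
        J = np.diag(1.0 - h**2) @ W @ J      # chain rule: one more per-step Jacobian
        if t % 10 == 0:
            print(f"t={t:3d}  ||dh_t/dh_0|| = {np.linalg.norm(J):.3e}")

The norm of dh_t/dh_0 collapses toward zero as t grows, so an error signal at the end of a long sequence carries almost no information about the initial inputs; this is the difficulty the paper analyzes.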


Statistical Science | 1998

Metrics and Models for Handwritten Character Recognition

Trevor Hastie; Patrice Y. Simard

Figure 1 shows some handwritten digits taken from US envelopes. Each image consists of 16 × 16 pixels of greyscale values ranging from 0–255. These 256 pixel values are regarded as a feature vector to be used as input to a classifier, which will automatically assign a digit class based on the pixel values.
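
As a rough illustration of this setup (with synthetic pixel values standing in for the envelope data), each image can be flattened into a 256-dimensional feature vector and classified by, for instance, a nearest-neighbour rule under Euclidean distance:

    import numpy as np

    rng = np.random.default_rng(1)
    # synthetic stand-ins for the 16 x 16 greyscale digit images
    train_imgs = rng.integers(0, 256, size=(100, 16, 16))
    train_lbls = rng.integers(0, 10, size=100)
    test_img = rng.integers(0, 256, size=(16, 16))

    # flatten each image into a 256-dimensional feature vector
    X = train_imgs.reshape(len(train_imgs), -1).astype(float)
    x = test_img.reshape(-1).astype(float)

    # 1-nearest-neighbour classification under Euclidean distance
    dists = np.linalg.norm(X - x, axis=1)
    print("predicted class:", train_lbls[np.argmin(dists)])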


international conference on pattern recognition | 1994

Memory-based character recognition using a transformation invariant metric

Patrice Y. Simard; Yann Le Cun; John S. Denker

Memory-based classification algorithms such as radial basis functions or K-nearest neighbors often rely on simple distances (Euclidean distance, Hamming distance, etc.), which are rarely meaningful on pattern vectors. More complex, better-suited distance measures are often expensive and rather ad hoc. We propose a new distance measure which: 1) can be made locally invariant to any set of transformations of the input; and 2) can be computed efficiently. We tested the method on large handwritten character databases provided by the US Post Office and NIST. Using invariances with respect to translation, rotation, scaling, skewing, and line thickness, the method outperformed all other systems on a small (less than 10,000 patterns) database and was competitive on our largest (60,000 patterns) database.
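
The core idea can be sketched briefly: approximate the manifold of transformed versions of a stored pattern by its tangent plane, and measure the distance from the test point to that plane rather than to the pattern itself. The sketch below is a one-sided version with a single finite-difference tangent vector (horizontal translation) and random data; the paper's method uses two-sided distances and several transformations.

    import numpy as np

    def shift_right(img):
        """Shift an image one pixel to the right (zero fill)."""
        out = np.zeros_like(img)
        out[:, 1:] = img[:, :-1]
        return out

    rng = np.random.default_rng(2)
    p_img = rng.random((16, 16))                 # stored prototype (illustrative)
    x_img = 0.9 * shift_right(p_img) + 0.05      # test point: roughly a shifted p

    p, x = p_img.ravel(), x_img.ravel()
    t = (shift_right(p_img) - p_img).ravel()     # translation tangent vector at p
    T = t[:, None]                               # (d, k) tangent basis, k = 1

    # distance from x to the tangent plane {p + T a}: least squares in a
    a, *_ = np.linalg.lstsq(T, x - p, rcond=None)
    tangent_dist = np.linalg.norm(x - (p + T @ a))

    print("Euclidean distance:        ", np.linalg.norm(x - p))
    print("one-sided tangent distance:", tangent_dist)

Because the shift separating x from p lies (approximately) in the tangent plane, the tangent distance comes out much smaller than the Euclidean one, which is exactly the invariance property the method exploits.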


international conference on microelectronics | 1994

Hardware implementation of the backpropagation without multiplication

Jocelyn Cloutier; Patrice Y. Simard

The backpropagation algorithm has been modified to work without any multiplications and to tolerate low-resolution computations, which makes it more attractive for hardware implementation. Numbers are represented in floating-point format with a 1-bit mantissa and a 2-bit exponent for the states, and a 1-bit mantissa and a 4-bit exponent for the gradients, while the weights are 16-bit fixed-point numbers. In this way, all the computations can be executed with shift and add operations. Large networks with over 100,000 weights were trained and demonstrated the same performance as networks trained with full precision. An estimate of a circuit implementation shows that a large network can be placed on a single chip, reaching more than 1 billion weight updates per second. A speedup is also obtained on any machine where a multiplication is slower than a shift operation.
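
The arithmetic trick underlying this is that a floating-point number with a 1-bit mantissa is a signed power of two, so multiplying a fixed-point weight by it reduces to a bit shift plus sign handling. A minimal sketch of that reduction (the Q8 scaling and values below are illustrative, not the paper's circuit):

    # Multiply a fixed-point weight by sign * 2**exponent using only shifts.
    def mul_by_power_of_two(weight_fixed, sign, exponent):
        if exponent >= 0:
            shifted = weight_fixed << exponent
        else:
            shifted = weight_fixed >> -exponent
        return shifted if sign > 0 else -shifted

    w = 3 << 8                                     # 16-bit weight, value 3.0 in Q8
    print(mul_by_power_of_two(w, +1, -2) / 256.0)  # 3.0 *  2**-2 ->  0.75
    print(mul_by_power_of_two(w, -1,  1) / 256.0)  # 3.0 * -2**1  -> -6.00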


international conference on pattern recognition | 1992

An efficient algorithm for learning invariance in adaptive classifiers

Patrice Y. Simard; Y. Le Cun; John S. Denker; B. Victorri

In many machine learning applications, one has not only training data but also some high-level information about certain invariances that the system should exhibit. In character recognition, for example, the answer should be invariant with respect to small spatial distortions in the input images (translations, rotations, scale changes, etc.). The authors have implemented a scheme that minimizes the derivative of the classifier outputs with respect to distortion operators. This not only produces tremendous speed advantages, but also provides a powerful language for specifying what generalizations the network can perform.
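
A minimal sketch of such a scheme, assuming a toy linear classifier and a finite-difference estimate of the output derivative along one distortion direction (the paper computes this derivative analytically from tangent vectors; everything below is illustrative):

    import numpy as np

    rng = np.random.default_rng(3)
    W = rng.normal(scale=0.1, size=(10, 256))    # toy 10-class linear classifier

    def f(x):
        """Classifier outputs for a flattened 16 x 16 image."""
        return np.tanh(W @ x)

    x = rng.random(256)                          # input pattern
    tangent = rng.normal(size=256)               # distortion direction at x
    tangent /= np.linalg.norm(tangent)

    eps = 1e-3
    # derivative of the outputs along the distortion, by central difference
    dfdx = (f(x + eps * tangent) - f(x - eps * tangent)) / (2 * eps)
    invariance_penalty = np.sum(dfdx**2)         # added to the usual training loss
    print("invariance penalty:", invariance_penalty)

Driving this penalty toward zero makes the outputs locally insensitive to the distortion, which is how the high-level invariance knowledge enters training.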


Journal of Electronic Imaging | 1998

High quality document image compression with “DjVu”

Leon Bottou; Patrick Haffner; Paul G. Howard; Patrice Y. Simard; Yoshua Bengio; Yann LeCun


neural information processing systems | 1992

Improving Performance in Neural Networks Using a Boosting Algorithm

Harris Drucker; Robert E. Schapire; Patrice Y. Simard


neural information processing systems | 1994

Learning Prototype Models for Tangent Distance

Trevor Hastie; Patrice Y. Simard


neural information processing systems | 1992

Automatic Learning Rate Maximization by On-Line Estimation of the Hessian's Eigenvectors

Yann LeCun; Patrice Y. Simard; Barak A. Pearlmutter


neural information processing systems | 1991

Reverse TDNN: An Architecture For Trajectory Generation

Patrice Y. Simard; Yann Le Cun
