Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Eric Cosatto is active.

Publication


Featured research published by Eric Cosatto.


international conference on image processing | 1998

An image transform approach for HMM based automatic lipreading

Gerasimos Potamianos; Hans Peter Graf; Eric Cosatto

This paper concentrates on the visual front end for hidden Markov model based automatic lipreading. Two approaches for extracting features relevant to lipreading, given image sequences of the speaker's mouth region, are considered: a lip-contour-based approach, which first obtains estimates of the speaker's lip contours and subsequently extracts features from them; and an image-transform-based approach, which obtains a compressed representation of the image pixel values that contain the speaker's mouth. Various possible features are considered in each approach, and experimental results on a number of visual-only recognition tasks are reported. The image-transform-based approach is shown to yield superior lipreading performance. In addition, feature mean subtraction is demonstrated to improve performance in multi-speaker and speaker-independent recognition tasks. Finally, the effects of video degradations on image-transform-based automatic lipreading are studied: lipreading performance deteriorates dramatically below a 10 Hz field rate, while the image transform features are robust to noise and compression artifacts.
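A minimal sketch of the image-transform front end described above, assuming a DCT-II transform of the grayscale mouth region with the low-frequency coefficients kept as features (the paper evaluates several transforms; the function names here are illustrative, not the authors' code):

```python
import math

def dct2(block):
    """Naive 2D DCT-II of a small grayscale mouth-region block (list of rows)."""
    n, m = len(block), len(block[0])
    out = [[0.0] * m for _ in range(n)]
    for u in range(n):
        for v in range(m):
            s = 0.0
            for x in range(n):
                for y in range(m):
                    s += (block[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                          * math.cos(math.pi * (2 * y + 1) * v / (2 * m)))
            out[u][v] = s
    return out

def transform_features(frame, k=3):
    """Keep the k x k low-frequency DCT coefficients as the visual feature vector."""
    c = dct2(frame)
    return [c[u][v] for u in range(k) for v in range(k)]

def mean_subtract(feature_seq):
    """Per-utterance feature mean subtraction, the normalization reported to
    help in multi-speaker and speaker-independent tasks."""
    n, d = len(feature_seq), len(feature_seq[0])
    mean = [sum(f[i] for f in feature_seq) / n for i in range(d)]
    return [[f[i] - mean[i] for i in range(d)] for f in feature_seq]
```

The resulting per-frame vectors would feed the HMM observation stream in place of lip-contour measurements.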


application specific systems architectures and processors | 2009

A Massively Parallel Coprocessor for Convolutional Neural Networks

Murugan Sankaradas; Venkata Jakkula; Srihari Cadambi; Srimat T. Chakradhar; Igor Durdanovic; Eric Cosatto; Hans Peter Graf

We present a massively parallel coprocessor for accelerating Convolutional Neural Networks (CNNs), a class of important machine learning algorithms. The coprocessor functional units, consisting of parallel 2D convolution primitives and programmable units performing sub-sampling and non-linear functions specific to CNNs, implement a “meta-operator” to which a CNN may be compiled. The coprocessor is serviced by distributed off-chip memory banks with large data bandwidth. As a key feature, we use low-precision data and further increase the effective memory bandwidth by packing multiple words in every memory operation, and leverage the algorithm’s simple data access patterns to use off-chip memory as a scratchpad for intermediate data, critical for CNNs. A CNN is mapped to the coprocessor hardware primitives with instructions to transfer data between the memory and coprocessor. We have implemented a prototype of the CNN coprocessor on an off-the-shelf PCI FPGA card with a single Xilinx Virtex5 LX330T FPGA and 4 DDR2 memory banks totaling 1 GB. The coprocessor prototype can process at the rate of 3.4 billion multiply-accumulates per second (GMACs) for CNN forward propagation, 31x faster than a software implementation on a 2.2 GHz AMD Opteron processor. For a complete face recognition application with the CNN on the coprocessor and the rest of the image processing tasks on the host, the prototype is 6-10x faster, depending on the host-coprocessor bandwidth.
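The word-packing idea above can be sketched in software: two low-precision operands share one memory word, so each fetch delivers twice the data to the multiply-accumulate units. This is an illustrative model only (16-bit fields in a 32-bit word are an assumption; the hardware's actual widths may differ):

```python
def pack16(a, b):
    """Pack two unsigned 16-bit operands into one 32-bit memory word."""
    assert 0 <= a < 1 << 16 and 0 <= b < 1 << 16
    return (b << 16) | a

def unpack16(w):
    """Recover the two 16-bit operands from a packed 32-bit word."""
    return w & 0xFFFF, (w >> 16) & 0xFFFF

def mac_packed(words_x, words_w):
    """Multiply-accumulate over packed streams: each fetch yields two
    operand pairs, halving the number of memory operations."""
    acc = 0
    for wx, ww in zip(words_x, words_w):
        x0, x1 = unpack16(wx)
        w0, w1 = unpack16(ww)
        acc += x0 * w0 + x1 * w1
    return acc
```

The same accumulation done on unpacked data would need twice as many memory words, which is the effective-bandwidth gain the abstract describes.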


Proceedings Computer Animation '98 (Cat. No.98EX169) | 1998

Sample-based synthesis of photo-realistic talking heads

Eric Cosatto; Hans Peter Graf

The paper describes a system that generates photo-realistic video animations of talking heads. First the system derives head models from existing video footage using image recognition techniques. It locates, extracts and labels facial parts such as mouth, eyes, and eyebrows into a compact library. Then, using these face models and a text-to-speech synthesizer, it synthesizes new video sequences of the head in which the lips are in synchrony with the accompanying soundtrack. Emotional cues and conversational signals are produced by combining the mouth animation with head movements, raised eyebrows, widened eyes, and so on. For these animations to be believable, care has to be taken to align the facial parts so that they blend smoothly into each other and produce seamless animations. Our system uses precise multi-channel facial recognition techniques to track facial parts, and it derives the exact 3D position of the head, enabling the automatic extraction of normalized face parts. Such talking head animations are useful because they generally increase the intelligibility of the human-machine interface in applications where content needs to be narrated to the user, such as educational software.


international conference on pattern recognition | 2008

Grading nuclear pleomorphism on histological micrographs

Eric Cosatto; Matthew L. Miller; Hans Peter Graf; John S. Meyer

A mainstay in cancer diagnostics is the classification or grading of cell nuclei based on their appearance. While the analysis of cytological samples has been automated successfully for a long time, the complexity of histological tissue samples has prevented a reliable classification with machine vision techniques. We approach this complex problem in multiple stages, first analyzing image quality, staining quality, and tissue appearance, before segmenting nuclei and finally classifying or grading areas of tissue. The key step is the training of a classifier to judge the quality of the nuclei segmentation. Using active learning techniques, we train this classifier to identify problems in the image as well as weaknesses of the image analysis tools. This way we obtain robust nuclear segmentation, allowing precise measurements of features that can be used safely for classification. We validate our findings on several hundred cases of breast cancer, demonstrating that automatic pleomorphism grading is possible with high accuracy. This technique can provide a stable and objective basis for what has been a subjective process that suffers from low reproducibility.
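The active-learning step above can be sketched as uncertainty sampling: train the segmentation-quality classifier, then route the least-confident cases to an expert for labeling. The threshold classifier and scalar quality feature here are toy stand-ins (the paper does not specify the classifier's form):

```python
class ThresholdClassifier:
    """Toy stand-in for the segmentation-quality classifier: one scalar
    quality feature scored against a learned threshold."""
    def fit(self, xs, ys):
        good = [x for x, y in zip(xs, ys) if y == 1]
        bad = [x for x, y in zip(xs, ys) if y == 0]
        self.t = (min(good) + max(bad)) / 2.0

    def confidence(self, x):
        # Signed margin: values near zero are the most uncertain segmentations.
        return x - self.t

def active_learning_round(clf, labeled, unlabeled, budget, oracle):
    """One round of uncertainty sampling: train on the labeled pool, query the
    expert (oracle) on the most uncertain segmentations, grow the pool."""
    clf.fit([x for x, _ in labeled], [y for _, y in labeled])
    ranked = sorted(unlabeled, key=lambda x: abs(clf.confidence(x)))
    queries = ranked[:budget]
    labeled.extend((x, oracle(x)) for x in queries)
    return [x for x in unlabeled if x not in queries]
```

Repeating such rounds concentrates the expert's labeling effort on exactly the cases where the segmentation tools are weakest, which is the point of the active-learning step.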


Journal of Pathology Informatics | 2013

Classification of mitotic figures with convolutional neural networks and seeded blob features.

Christopher Malon; Eric Cosatto

Background: The mitotic figure recognition contest at the 2012 International Conference on Pattern Recognition (ICPR) challenges a system to identify all mitotic figures in a region of interest of hematoxylin and eosin stained tissue, using each of three scanners (Aperio, Hamamatsu, and multispectral). Methods: Our approach combines manually designed nuclear features with the learned features extracted by convolutional neural networks (CNN). The nuclear features capture color, texture, and shape information of segmented regions around a nucleus. The use of a CNN handles the variety of appearances of mitotic figures and decreases sensitivity to the manually crafted features and thresholds. Results: On the test set provided by the contest, the trained system achieves F1 scores up to 0.659 on the color scanners and 0.589 on the multispectral scanner. Conclusions: We demonstrate a powerful technique combining segmentation-based features with CNN, identifying the majority of mitotic figures with fair precision. Further, we show that the approach accommodates information from the additional focal planes and spectral bands of a multispectral scanner without major redesign.
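The feature-combination step can be sketched as a simple concatenation of the learned and hand-designed vectors before the final classifier, with the reported F1 being the standard harmonic mean of precision and recall (function names here are illustrative, not the authors' code):

```python
def combine_features(cnn_feats, nuclear_feats):
    """Concatenate learned CNN features with hand-designed color/texture/shape
    features of the candidate nucleus into one vector for the final classifier."""
    return list(cnn_feats) + list(nuclear_feats)

def f1_score(tp, fp, fn):
    """Standard F1 metric used to score the contest results."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Concatenation lets the classifier fall back on the CNN's learned representation when a hand-crafted feature or threshold misbehaves, which is the robustness the abstract describes.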


field-programmable custom computing machines | 2009

A Massively Parallel FPGA-Based Coprocessor for Support Vector Machines

Srihari Cadambi; Igor Durdanovic; Venkata Jakkula; Murugan Sankaradass; Eric Cosatto; Srimat T. Chakradhar; Hans Peter Graf

We present a massively parallel FPGA-based coprocessor for Support Vector Machines (SVMs), a machine learning algorithm whose applications include recognition tasks such as learning scenes, situations and concepts, and reasoning tasks such as analyzing the recognized scenes and semantics. The coprocessor architecture, targeted at both SVM training and classification, is based on clusters of vector processing elements (VPEs) operating in single-instruction multiple-data (SIMD) mode to take advantage of the large amounts of data parallelism in the application. We use the FPGA’s DSP elements as parallel multiply-accumulators (MACs), a core computation in SVMs. A key feature of the architecture is that it is customized for low-precision arithmetic, which permits one DSP unit to perform two or more MACs in parallel. Low precision also reduces the required number of parallel off-chip memory accesses by packing multiple data words on the FPGA-memory bus. We have built a prototype using an off-the-shelf PCI-based FPGA card with a Xilinx Virtex 5 FPGA and 1 GB DDR2 memory. For SVM training, we observe application-level end-to-end computation speeds of over 9 billion multiply-accumulates per second (GMACs). For SVM classification, using data packing, the application speed increases to 14 GMACs. The FPGA-based system is about 20x faster than a dual Opteron 2.2 GHz processor CPU, and dissipates around 10 W of power.


international conference on multimedia and expo | 2000

Audio-visual unit selection for the synthesis of photo-realistic talking-heads

Eric Cosatto; Gerasimos Potamianos; Hans Peter Graf

This paper investigates audio-visual unit selection for the synthesis of photo-realistic, speech-synchronized talking-head animations. These animations are synthesized from recorded video samples of a subject speaking in front of a camera, resulting in a photo-realistic appearance. Lip synchronization is obtained by optimally selecting and concatenating variable-length video units of the mouth area. Synthesizing a new speech animation from these recorded units starts with audio speech and its phonetic annotation from a text-to-speech synthesizer. Then, optimal image units are selected from the recorded set using a Viterbi search through a graph of candidate image units. Costs attached to the nodes and arcs of the graph are computed from similarities in both the acoustic and visual domains. While acoustic similarities are computed by simple phonetic matching, visual similarities are estimated using a hierarchical metric that uses high-level features (positions and sizes of facial parts) and low-level features (projection of the image pixels on principal components of the database). This method preserves coarticulation and temporal coherence, producing smooth, lip-synched animations. Once the database has been prepared, this system can produce animations from ASCII text fully automatically.
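The Viterbi search over the candidate-unit graph can be sketched as a minimum-cost path through a lattice, with node costs for acoustic/target mismatch and arc costs for visual smoothness between consecutive units (a generic dynamic-programming sketch; the paper's actual cost functions are the hierarchical metrics described above):

```python
def viterbi_unit_selection(candidates, node_cost, arc_cost):
    """Select one image unit per time step, minimizing accumulated node costs
    (target mismatch) plus arc costs (concatenation smoothness).

    candidates: list over time steps; each step is a list of candidate units.
    Returns the best path and its total cost.
    """
    # Best cost and path ending at each candidate of the first step.
    best = {u: (node_cost(0, u), [u]) for u in candidates[0]}
    for t in range(1, len(candidates)):
        step = {}
        for u in candidates[t]:
            # Cheapest predecessor for this candidate.
            prev = min(best, key=lambda p: best[p][0] + arc_cost(p, u))
            cost = best[prev][0] + arc_cost(prev, u) + node_cost(t, u)
            step[u] = (cost, best[prev][1] + [u])
        best = step
    cost, path = min(best.values(), key=lambda v: v[0])
    return path, cost
```

With costs defined as above, the search trades off matching the target phonetic sequence against jumping between visually dissimilar mouth units, which is how coarticulation and temporal coherence are preserved.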


international conference on acoustics, speech, and signal processing | 2002

Triphone based unit selection for concatenative visual speech synthesis

Fu Jie Huang; Eric Cosatto; Hans Peter Graf

Concatenative visual speech synthesis selects frames from a large recorded video database of mouth shapes to generate photo-realistic talking head sequences. The synthesized sequence must exhibit precise lip-sound synchronization and smooth articulation. The selection process for finding the best lip shapes has been computationally expensive [1], limiting the speed of the synthesis to far less than real time. In this paper, we propose a rapid unit selection approach based on triphone units. Experiments show that this algorithm can make the synthesis, excluding the rendering, 50 times faster than real-time on a standard desktop PC. We also developed a metric to test the quality of the synthesis objectively, and show that this measurement is consistent with subjective measurement results.
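The speedup from triphone units comes from replacing a frame-level search with a direct lookup of precomputed phone-in-context units. A minimal sketch, assuming a dictionary index keyed by triphone and a center-phone backoff when the exact context was never recorded (the backoff rule and the `"triphone"` field name are assumptions for illustration):

```python
def build_triphone_index(recorded_units):
    """Index the recorded mouth-video units by their triphone key
    (previous phone, center phone, next phone)."""
    index = {}
    for unit in recorded_units:
        index.setdefault(unit["triphone"], []).append(unit)
    return index

def select_unit(index, prev_ph, ph, next_ph):
    """Exact triphone lookup, backing off to any recorded unit whose center
    phone matches when the requested context is missing from the database."""
    key = (prev_ph, ph, next_ph)
    if key in index:
        return index[key][0]
    for tri, units in index.items():
        if tri[1] == ph:
            return units[0]
    return None
```

Because each query is a hash lookup rather than a lattice search, selection cost no longer grows with the size of the recorded database, consistent with the reported faster-than-real-time synthesis.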


systems man and cybernetics | 1997

Robust recognition of faces and facial features with a multi-modal system

Hans Peter Graf; Eric Cosatto; Makis Potamianos

We use a combination of shape and texture analysis, color segmentation and motion information for finding the positions of whole faces plus the precise location and shape of the mouth. Combining several modalities improves the robustness of the analysis considerably and allows handling of a wide variety of conditions. Mouth shapes can enhance the accuracy of speech recognition systems, in particular under noisy conditions. However, finding the shape of the mouth precisely under unrestricted conditions is a challenging task. To be of practical value, a system must handle different complexions of people as well as variations in lighting, different head orientations, moustaches, beards, and glasses. To deal with such a diversity of conditions, our system includes several different models of the face and the mouth area. New faces are compared to these models and the most representative one is chosen for the analysis. We tested our system on samples from video sequences of 50 different speakers. When trained on a particular person, the mouth location is found correctly in more than 98% of the images. When trained on a random set of 10 people from the database, the system typically handles 87% of the other people correctly. In speaker-dependent lip reading experiments we observed 93% word accuracy on five-digit strings.


conference on soft computing as transdisciplinary science and technology | 2008

Identifying histological elements with convolutional neural networks

Christopher Malon; Matthew L. Miller; Harold Christopher Burger; Eric Cosatto; Hans Peter Graf

Histological analysis on stained biopsy samples requires recognizing many kinds of local and structural details, with some awareness of context. Machine learning algorithms such as convolutional networks can be powerful tools for such problems, but often there may not be enough training data to exploit them to their full potential. In this paper, we show how convolutional networks can be combined with appropriate image analysis to achieve high accuracies on three very different tasks in breast and gastric cancer grading, despite the challenge of limited training data. The three problems are to count mitotic figures in the breast, to recognize epithelial layers in the stomach, and to detect signet ring cells.

Collaboration


Dive into Eric Cosatto's collaborations.
