Hauke Schramm
Philips
Publications
Featured research published by Hauke Schramm.
IEEE Transactions on Medical Imaging | 2008
Olivier Ecabert; Jochen Peters; Hauke Schramm; Cristian Lorenz; J. von Berg; Matthew J. Walker; Mani Vembar; Mark E. Olszewski; K. Subramanyan; G. Lavi; Jürgen Weese
Automatic image processing methods are a prerequisite for efficiently analyzing the large amount of image data produced by computed tomography (CT) scanners during cardiac exams. This paper introduces a model-based approach for the fully automatic segmentation of the whole heart (four chambers, myocardium, and great vessels) from 3-D CT images. Model adaptation is done by progressively increasing the degrees of freedom of the allowed deformations. This improves convergence as well as segmentation accuracy. The heart is first localized in the image using a 3-D implementation of the generalized Hough transform. Pose misalignment is corrected by matching the model to the image using a global similarity transformation. The complex initialization of the multicompartment mesh is then addressed by assigning an affine transformation to each anatomical region of the model. Finally, a deformable adaptation is performed to accurately match the boundaries of the patient's anatomy. A mean surface-to-surface error of 0.82 mm was measured in a leave-one-out quantitative validation carried out on 28 images. Moreover, the piecewise affine transformation introduced for mesh initialization and adaptation characterizes interphase and interpatient shape variability better than the commonly used principal component analysis.
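To make the coarse-to-fine idea concrete, here is a minimal sketch (not the authors' implementation) of adaptation with progressively increasing degrees of freedom: a global similarity fit followed by one affine fit per anatomical region, using toy point sets and hypothetical region labels.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform (scale, rotation, translation)
    mapping src onto dst (both (N, 3) point arrays), via the classical
    Procrustes/Umeyama solution."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    s, d = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(d.T @ s)
    R = U @ Vt
    if np.linalg.det(R) < 0:           # rule out reflections
        U[:, -1] *= -1
        R = U @ Vt
    scale = S.sum() / (s ** 2).sum()
    t = mu_d - scale * R @ mu_s
    return scale, R, t

def fit_affine(src, dst):
    """Least-squares affine transform (12 degrees of freedom)."""
    X = np.hstack([src, np.ones((len(src), 1))])   # homogeneous coords
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A                                       # (4, 3) matrix

# Stage 1: global similarity; stage 2: one affine per region.
mesh = np.random.rand(200, 3)
target = 1.2 * mesh + 0.1                      # toy boundary targets
scale, R, t = fit_similarity(mesh, target)
mesh = scale * mesh @ R.T + t
regions = np.random.randint(0, 4, len(mesh))   # hypothetical labels
for r in range(4):
    sel = regions == r
    A = fit_affine(mesh[sel], target[sel])
    mesh[sel] = np.hstack([mesh[sel], np.ones((sel.sum(), 1))]) @ A
```

The final deformable adaptation would then move individual mesh vertices under a shape-regularizing penalty, which is omitted here for brevity.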
IEEE Transactions on Speech and Audio Processing | 2000
Bernd Souvignier; Andreas Kellner; Bernhard Rueber; Hauke Schramm; Frank Seide
We present technology used in spoken dialog systems for a wide range of applications, including tasks from the travel domain, automatic switchboards, and large-scale directory assistance. The overall goal in developing spoken dialog systems is to allow for a natural and flexible dialog flow similar to human-human interaction. This poses the challenging task of recognizing and interpreting user input in which the user may draw on an unrestricted vocabulary and an infinite set of possible formulations. We therefore put emphasis on strategies that make the system more robust while still maintaining a high level of naturalness and flexibility. In view of this paradigm, we found that two fundamental principles characterize many of the proposed methods: consider all available sources of information as early as possible, and keep alternative hypotheses and delay the decision for a single option as long as possible. We describe how our system architecture caters to incorporating application-specific knowledge, such as database constraints, into the determination of the best sentence hypothesis for a user turn. On the next higher level, we use the dialog history to assess the plausibility of a sentence hypothesis by applying consistency checks with information items from previous user turns. In particular, we demonstrate how combining decisions over several turns can be exploited to boost the recognition performance of the system.
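As an illustration of the second principle (keep alternatives, decide late), here is a toy sketch, with made-up names and log-scores, of accumulating N-best evidence across turns before committing to a hypothesis; the floor penalty for hypotheses absent from a turn is an assumption of this sketch.

```python
def combine_turns(nbest_per_turn, floor=-5.0):
    """Sum per-turn log-scores for every hypothesis seen in any turn,
    applying a floor penalty where a hypothesis is missing, and decide
    only after all turns have been seen."""
    hyps = {h for nbest in nbest_per_turn for h, _ in nbest}
    total = dict.fromkeys(hyps, 0.0)
    for nbest in nbest_per_turn:
        scores = dict(nbest)
        for h in hyps:
            total[h] += scores.get(h, floor)
    return max(total.items(), key=lambda kv: kv[1])

turn1 = [("Hamburg", -1.2), ("Homburg", -1.4)]    # N-best of turn 1
turn2 = [("Hamburg", -0.9), ("Marburg", -1.6)]    # N-best of turn 2
print(combine_turns([turn1, turn2]))              # -> ('Hamburg', -2.1)
```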
Medical Image Computing and Computer-Assisted Intervention | 2007
Jochen Peters; Olivier Ecabert; Carsten Meyer; Hauke Schramm; Reinhard Kneser; Alexandra Groth; Jürgen Weese
We present a fully automatic segmentation algorithm for the whole heart (four chambers, left ventricular myocardium and trunks of the aorta, the pulmonary artery and the pulmonary veins) in cardiac MR image volumes with nearly isotropic voxel resolution, based on shape-constrained deformable models. After automatic model initialization and reorientation to the cardiac axes, we apply a multi-stage adaptation scheme with progressively increasing degrees of freedom. Particular attention is paid to the calibration of the MR image intensities. Detailed evaluation results for the various anatomical heart regions are presented on a database of 42 patients. On calibrated images, we obtain an average segmentation error of 0.76 mm.
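The intensity calibration step could, for example, be a piecewise-linear histogram normalization; the sketch below maps image percentiles onto a fixed set of reference landmarks and is a generic stand-in, not necessarily the calibration used in the paper.

```python
import numpy as np

def calibrate_intensities(img, ref_landmarks, pcts=(2, 25, 50, 75, 98)):
    """Map the volume's intensity percentiles onto reference landmark
    values by piecewise-linear interpolation."""
    src = np.percentile(img, pcts)        # increasing by construction
    return np.interp(img, src, ref_landmarks)

volume = np.random.gamma(2.0, 50.0, size=(32, 32, 32))  # toy MR volume
reference = [0.0, 60.0, 120.0, 180.0, 255.0]            # assumed scale
calibrated = calibrate_intensities(volume, reference)
```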
Proceedings of the 1998 IEEE 4th Workshop on Interactive Voice Technology for Telecommunications Applications (IVTTA '98) | 1998
Andreas Kellner; Bernhard Rueber; Hauke Schramm
Recognition of large numbers of different names is the central problem in automatic directory assistance services and many other applications of spoken language dialogue systems. This paper investigates a methodology for stochastically combining N-best lists retrieved from multiple user utterances, with the telephone database as an additional knowledge source. This strategy is used in a prototype of a fully automated directory information system designed to cover a whole country. After the city has been selected, the user is asked to spell and say the name of the desired person and, if necessary, also the first name and street. The number of active database entries is reduced in every turn until only a single database entry is left. Results for different recognition strategies are presented on a real-life data collection for databases of various sizes with up to 1 million entries (city of Berlin). The experiments show that a substantial part of all simple requests can be automated with the presented strategy (>80% correctly recognized, 10% rejected).
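A toy sketch of the core idea, jointly scoring N-best lists from two utterances and using the database as a knowledge source; the entries and log-scores are made up, and the real system operates on spelled and spoken variants over directory databases with up to a million entries.

```python
def best_db_match(nbest_spelled, nbest_spoken, active_entries):
    """Jointly score N-best hypotheses from two utterances and keep
    only combinations supported by an active database entry."""
    candidates = [(name_a, sa + sb)
                  for name_a, sa in nbest_spelled
                  for name_b, sb in nbest_spoken
                  if name_a == name_b and name_a in active_entries]
    return max(candidates, key=lambda c: c[1]) if candidates else None

entries = {"meier", "mayer", "schmidt"}          # active DB entries
spelled = [("meier", -1.0), ("meyer", -1.1)]     # N-best: spelled name
spoken  = [("mayer", -0.8), ("meier", -0.9)]     # N-best: spoken name
print(best_db_match(spelled, spoken, entries))   # -> ('meier', -1.9)
```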
Speech Communication | 2006
Carsten Meyer; Hauke Schramm
Boosting algorithms have been successfully used to improve performance in a variety of classification tasks. Here, we suggest an approach to apply a popular boosting algorithm (called "AdaBoost.M2") to Hidden Markov Model based speech recognizers, at the level of utterances. In a variety of recognition tasks we show that boosting significantly improves the best test error rates obtained with standard maximum likelihood training. In addition, results in several isolated word decoding experiments show that boosting may also provide further performance gains over discriminative training when both training techniques are combined. In our experiments this also holds when comparing final classifiers with a similar number of parameters and when evaluating in decoding conditions with lexical and acoustic mismatch to the training conditions. Moreover, we present an extension of our algorithm to large vocabulary continuous speech recognition, allowing online recognition without further processing of N-best lists or word lattices. This is achieved by using a lexical approach for combining different acoustic models in decoding. In particular, we introduce a weighted summation over an extended set of alternative pronunciation models representing both the boosted models and the baseline model. In this way, arbitrarily long utterances can be recognized by the boosted ensemble in a single-pass decoding framework. Evaluation results are presented on two tasks: a real-life spontaneous speech dictation task with a 60k word vocabulary, and Switchboard.
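For a flavor of utterance-level boosting, here is a minimal one-round reweighting sketch in the AdaBoost spirit (the paper uses the full AdaBoost.M2 bookkeeping, which is more involved); the error flags and weights are toy values.

```python
import math

def boost_round(weights, errors):
    """One boosting round: compute the weighted error, derive beta,
    down-weight correctly recognized utterances, renormalize, and
    return the new weights plus the model's combination weight."""
    eps = sum(w for w, e in zip(weights, errors) if e) / sum(weights)
    eps = min(max(eps, 1e-6), 0.5 - 1e-6)   # keep beta in (0, 1)
    beta = eps / (1.0 - eps)
    new = [w * (beta if e == 0 else 1.0) for w, e in zip(weights, errors)]
    z = sum(new)
    return [w / z for w in new], math.log(1.0 / beta)

weights = [0.25, 0.25, 0.25, 0.25]
errors  = [0, 1, 0, 0]        # recognizer is wrong on utterance 2
weights, alpha = boost_round(weights, errors)
print(weights, alpha)         # misrecognized utterance gains weight
```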
Medical Imaging 2006: Image Processing | 2006
Hauke Schramm; Olivier Ecabert; Jochen Peters; Vasanth Philomin; Juergen Weese
An automatic procedure for detecting and segmenting anatomical objects in 3-D images is necessary for achieving a high level of automation in many medical applications. Since today's segmentation techniques typically rely on user input for initialization, they do not allow for a fully automatic workflow. In this work, the generalized Hough transform is used for detecting anatomical objects with well-defined shape in 3-D medical images. This well-known technique has frequently been used for object detection in 2-D images and is known to be robust and reliable. However, its computational and memory requirements are generally huge, especially for 3-D images with several free transformation parameters. Our approach limits the complexity of the generalized Hough transform to a reasonable amount by (1) using prior knowledge about the object during preprocessing to suppress unlikely regions in the image, (2) restricting the flexibility of the applied transformation to scaling and translation only, and (3) using a simple shape model which does not cover any inter-individual shape variability. Despite these limitations, the approach is demonstrated to allow for a coarse 3-D delineation of the femur, vertebra, and heart in a number of experiments. Additionally, it is shown that the quality of the object localization is in nearly all cases sufficient to initialize a successful segmentation using shape-constrained deformable models.
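A minimal sketch of 3-D GHT voting restricted to translation (scale would add one more accumulator axis, per restriction (2) above); the model offsets and edge voxels below are toy values, not a real shape model.

```python
import numpy as np

def ght_localize(edge_points, model_offsets, vol_shape):
    """Each edge voxel casts one vote per model offset; the accumulator
    cell with the most votes is the object reference point."""
    acc = np.zeros(vol_shape, dtype=np.int32)
    for p in edge_points:
        for o in model_offsets:
            c = p + o                         # candidate reference point
            if np.all(c >= 0) and np.all(c < vol_shape):
                acc[tuple(c)] += 1
    return np.unravel_index(acc.argmax(), vol_shape)

model = np.array([[0, 0, 2], [0, 2, 0], [2, 0, 0]])  # toy offsets
edges = np.array([[5, 5, 3], [5, 3, 5], [3, 5, 5]])  # toy edge voxels
print(ght_localize(edges, model, (10, 10, 10)))      # -> (5, 5, 5)
```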
Computer Science - Research and Development | 2011
Heike Ruppertshofen; Cristian Lorenz; Sarah Schmidt; Peter Beyerlein; Zein Salah; Georg Rose; Hauke Schramm
A fully automatic iterative training approach for the generation of discriminative shape models for use in the Generalized Hough Transform (GHT) is presented. The method aims at capturing the shape variability of the target object contained in the training data as well as identifying confusable structures (anti-shapes) and integrating this information into one model. To distinguish shape and anti-shape points and to determine their importance, an individual positive or negative weight is estimated for each model point by means of a discriminative training technique. The model is built from edge points surrounding the target point and the most confusable structure as identified by the GHT. Through an iterative approach, the performance of the model is gradually improved by extending the training dataset with images on which the current model failed to localize the target point. The proposed method is successfully tested on a set of 670 long-leg radiographs, where it achieves a localization rate of 74–97% for the respective tasks.
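The key difference to the standard GHT is that votes are signed: a minimal sketch of weighted voting with positive shape points and negative anti-shape points (the weights here are illustrative, not discriminatively trained).

```python
import numpy as np

def weighted_ght(edge_points, model_offsets, point_weights, vol_shape):
    """GHT voting where each model point contributes its trained
    weight; anti-shape points (negative weights) penalize cells that
    confusable structures would otherwise win."""
    acc = np.zeros(vol_shape)
    for p in edge_points:
        for o, w in zip(model_offsets, point_weights):
            c = p + o
            if np.all(c >= 0) and np.all(c < vol_shape):
                acc[tuple(c)] += w            # signed vote
    return np.unravel_index(acc.argmax(), vol_shape)

offsets = np.array([[0, 0, 1], [0, 1, 0]])
weights = np.array([1.0, -0.5])               # second point: anti-shape
edges = np.array([[4, 4, 4]])
print(weighted_ght(edges, offsets, weights, (8, 8, 8)))  # -> (4, 4, 5)
```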
International Conference on Acoustics, Speech, and Signal Processing | 2000
Hauke Schramm; Xavier L. Aubert
The paper describes the improved handling of multiple pronunciations achieved in the Philips research decoder by (1) incorporating prior information about their distributions and (2) combining the acoustic contributions of concurrent alternate word hypotheses. Starting from a baseline system where multiple pronunciations are treated as word copies without priors, an extension of the usual Viterbi decoding is presented which integrates unigram priors in a weighted sum of acoustic probabilities. Several approximations are discussed, leading to new decoding aspects. Experimental results are presented for US broadcast news recordings. It is shown that the use of unigram priors has a clear positive impact on both error rate and decoding cost, while the sum over multiple pronunciation contributions brings another small improvement. An overall 4% reduction of the error rate is achieved on the 1997 and 1998 HUB-4 evaluation sets.
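The weighted sum can be written as p(x|w) = Σ_v p(v|w) p(x|v) over the pronunciation variants v of word w; here is a small log-space sketch with illustrative numbers.

```python
import math

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def word_acoustic_logprob(variant_logprobs, variant_priors):
    """log p(x|w) = log sum_v p(v|w) * p(x|v), computed stably."""
    return logsumexp([math.log(pr) + lp
                      for lp, pr in zip(variant_logprobs, variant_priors)])

# A word with two pronunciation variants and unigram priors 0.7 / 0.3;
# the acoustic log-probabilities are made-up values.
print(word_acoustic_logprob([-42.0, -45.5], [0.7, 0.3]))
```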
IEEE Journal of Biomedical and Health Informatics | 2013
Markus Harmsen; Benedikt Fischer; Hauke Schramm; Thomas Seidl; Thomas Martin Deserno
Bone age assessment (BAA) on hand radiographs is a frequent and time-consuming task in radiology. We present a method for (semi)automatic BAA which proceeds in several steps: 1) extract 14 epiphyseal regions from the radiographs; 2) for each region, extract image features using the Image Retrieval in Medical Applications (IRMA) framework; 3) use these features to build a classifier model (training phase); 4) evaluate performance with cross-validation schemes (testing phase); 5) classify unknown hand images (application phase). In this paper, we combine a support vector machine (SVM) with cross correlation to a prototype image for each class. These prototypes are obtained by choosing one random hand per class. A systematic evaluation is presented comparing nominal- and real-valued SVMs with k-nearest-neighbor classification on 1097 hand radiographs of 30 diagnostic classes (0–19 years). The mean error in age prediction is 1.0 and 0.83 years for 5-NN and SVM, respectively. The accuracy of the nominal- and real-valued SVMs based on six prominent regions (prototypes) is 91.57% and 96.16%, respectively, when accepting an age range of about two years.
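One plausible reading of the prototype step, as a sketch: correlate a region against one prototype per class and use the correlation vector as features. The paper feeds such features to an SVM; a nearest-prototype decision stands in here to keep the sketch self-contained, and all data are synthetic.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross correlation of two equally sized patches."""
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float((a * b).mean())

def correlation_features(region, prototypes):
    """Correlation of one epiphyseal region with each class prototype."""
    return np.array([ncc(region, p) for p in prototypes])

rng = np.random.default_rng(0)
prototypes = [rng.random((16, 16)) for _ in range(30)]   # one per class
region = prototypes[7] + 0.05 * rng.random((16, 16))     # noisy class 7
feats = correlation_features(region, prototypes)
print(int(np.argmax(feats)))                             # -> 7
```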
Bildverarbeitung für die Medizin | 2011
Markus Brunk; Heike Ruppertshofen; Sarah Schmidt; Peter Beyerlein; Hauke Schramm
We present an approach for automatic bone age classification from hand X-ray images using the Discriminative Generalized Hough Transform (DGHT). To this end, a region characteristic for the bone age (e.g., an epiphyseal plate) is localized and subsequently classified to determine the corresponding age. Both steps are realized using the DGHT; the two steps differ only in the employed models. The localization model is able to localize the target region over a broad age range and therefore focuses on the features common to all ages. The classification model, in contrast, focuses on the age-discriminating features. It consists of several submodels, one for each age class, where each submodel contains information about its age characteristics as well as discriminating features. In a first test, the new method was applied to classify images into the two classes 11–12 and 14–15 years and achieved 95% correct classifications.
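Very schematically, the class decision can be thought of as running every age submodel over the localized region and taking the class whose model responds most strongly; in the toy sketch below the Hough voting is collapsed into a single dot-product score per class, with made-up submodel weights, so it illustrates only the decision rule, not the DGHT itself.

```python
import numpy as np

def classify(region_features, class_submodels):
    """Score the region with every age submodel and return the class
    with the strongest response (strongest Hough peak in the DGHT)."""
    scores = {age: float(np.dot(region_features, m))
              for age, m in class_submodels.items()}
    return max(scores, key=scores.get)

submodels = {"11-12": np.array([1.0, -0.5, 0.2]),
             "14-15": np.array([-0.3, 0.8, 0.6])}
features = np.array([0.1, 0.9, 0.7])     # toy features of the region
print(classify(features, submodels))     # -> '14-15'
```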