Manfred Mücke | Researchain

Publication

Featured research published by Manfred Mücke.

international conference on conceptual structures | 2011

Effects of Reduced Precision on Floating-Point SVM Classification Accuracy

Bernd Lesser; Manfred Mücke; Wilfried N. Gansterer

There is growing interest in performing ever more complex classification tasks on mobile and embedded devices in real time, which results in the need for efficient implementations of the respective algorithms. Support vector machines (SVMs) represent a powerful class of nonlinear classifiers, and reducing the working precision represents a promising approach to achieving efficient implementations of the SVM classification phase. However, the relationship between SVM classification accuracy and the arithmetic precision used is not yet sufficiently understood. We investigate this relationship in floating-point arithmetic and illustrate that often a large reduction in the working precision of the classification process is possible without loss in classification accuracy. Moreover, we investigate the adaptation of bounds on allowable SVM parameter perturbations in order to estimate the lowest possible working precision in floating-point arithmetic. Among the three representative data sets considered in this paper, none requires a precision higher than 15 bits, which is a considerable reduction from the 53 bits used in double precision floating-point arithmetic. Furthermore, we demonstrate analytic bounds on the working precision for SVMs with Gaussian kernel providing good predictions of possible reductions in the working precision without sacrificing classification accuracy.
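To make the notion of working precision concrete, the following minimal sketch emulates a p-bit significand by rounding after every operation. It is not the authors' experimental setup: the helper names `round_to_precision` and `decision_reduced` are hypothetical, the exponent range is left unrestricted, and a linear decision function stands in for the kernel SVMs studied in the paper.

```python
import math


def round_to_precision(x: float, p: int) -> float:
    """Round x to the nearest value with a p-bit significand.

    Assumption: only the significand is shortened; the exponent range
    of the underlying double is kept as-is.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)            # x = m * 2**e with 0.5 <= |m| < 1
    scaled = round(m * (1 << p))    # keep p bits of the significand
    return math.ldexp(scaled, e - p)


def decision_reduced(w, b, x, p):
    """Evaluate w.x + b, rounding every intermediate result to p bits.

    A linear decision function is used purely for illustration.
    """
    acc = round_to_precision(b, p)
    for wi, xi in zip(w, x):
        prod = round_to_precision(
            round_to_precision(wi, p) * round_to_precision(xi, p), p
        )
        acc = round_to_precision(acc + prod, p)
    return acc


if __name__ == "__main__":
    w, b, x = [0.3, -1.7, 2.2], 0.05, [1.1, 0.4, -0.9]
    for bits in (53, 24, 15, 8):
        print(bits, decision_reduced(w, b, x, bits))
```

Comparing the sign of the result across values of `bits` gives a rough feel for when a reduced precision starts to change the predicted class.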

european conference on machine learning | 2012

Bayesian network classifiers with reduced precision parameters

Sebastian Tschiatschek; Peter Reinprecht; Manfred Mücke; Franz Pernkopf

Bayesian network classifiers (BNCs) are probabilistic classifiers showing good performance in many applications. They consist of a directed acyclic graph and a set of conditional probabilities associated with the nodes of the graph. These conditional probabilities are also referred to as parameters of the BNCs. According to common belief, these classifiers are insensitive to deviations of the conditional probabilities under certain conditions. The first condition is that these probabilities are not too extreme, i.e. not too close to 0 or 1. The second is that the posterior over the classes is significantly different. In this paper, we investigate the effect of precision reduction of the parameters on the classification performance of BNCs. The probabilities are either determined generatively or discriminatively. Discriminative probabilities are typically more extreme. However, our results indicate that BNCs with discriminatively optimized parameters are almost as robust to precision reduction as BNCs with generatively optimized parameters. Furthermore, even large precision reduction does not decrease classification performance significantly. Our results allow the implementation of BNCs with less computational complexity. This supports application in embedded systems using floating-point numbers with small bit-width. Reduced bit-widths further make it possible to represent BNCs in the integer domain while maintaining the classification performance.

european conference on parallel processing | 2010

Peak performance model for a custom precision floating-point dot product on FPGAs

Manfred Mücke; Bernd Lesser; Wilfried N. Gansterer

FPGAs have the native feature that reduced resource usage of single operators can be directly translated into additional parallelism. For floating-point (FP) operators, such reduced resource usage can be achieved by reducing the mantissa bit width. The work presented here pursues two objectives: First, the maximum number of operands of a parallel dot product architecture is explored experimentally on an FPGA for different custom precision FP number formats. Given the resources of this FPGA, it is shown that based on non-pipelined basic FP operators, a dot product for input vector size 21, 57 and 123 can be implemented for double-, single- and half-precision, respectively. This corresponds to a respective peak performance of 1, 3.2 and 9.9 GFlop/s. Second, it is shown that the maximum dot product peak performance as a function of used precision can be modeled by a function of the form $P(p) = c_1 + c_2 \cdot p^{c_3}$, given a certain type of FPGA, library and synthesis settings. Fitting experimental data to this model reveals similarities as well as differences among generations of devices.
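As a small illustration of the model form quoted in the abstract, the sketch below evaluates a power-law curve P(p) = c1 + c2 * p**c3, reading c3 as an exponent. The function name `peak_performance_model` is hypothetical, and the coefficients are placeholders chosen only to produce a curve that falls as precision grows; they are not fitted values from the paper.

```python
def peak_performance_model(p: float, c1: float, c2: float, c3: float) -> float:
    """Evaluate P(p) = c1 + c2 * p**c3 for a precision p given in bits."""
    if p <= 0:
        raise ValueError("precision must be positive")
    return c1 + c2 * p ** c3


if __name__ == "__main__":
    # Placeholder coefficients (assumption, not taken from the paper).
    c1, c2, c3 = 0.5, 100.0, -1.0
    for bits in (11, 24, 53):
        print(bits, peak_performance_model(bits, c1, c2, c3))
```

With measured (precision, performance) pairs, the three coefficients could be estimated by any nonlinear least-squares routine; the abstract does not say which fitting procedure the authors used.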

international conference on acoustics, speech, and signal processing | 2011

Maximum margin structure learning of Bayesian network classifiers

Franz Pernkopf; Michael Wohlmayr; Manfred Mücke

Recently, the margin criterion has been successfully used for parameter optimization in graphical models. We introduce maximum margin based structure learning for Bayesian network classifiers and demonstrate its advantages in terms of classification performance compared to traditionally used discriminative structure learning methods. In particular, we provide empirical results for generative structure learning and two discriminative structure learning approaches on handwritten digit recognition tasks. We show that maximum margin structure learning outperforms other structure learning methods. Furthermore, we present classification results achieved with different bitwidth for representing the parameters of the classifiers.

Journal of Electrical and Computer Engineering | 2013

Holistic biquadratic IIR filter design for communication systems using differential evolution

Alexander Melzer; Andreas Pedross; Manfred Mücke

Digital IIR filter implementations are important building blocks of most communication systems. The chosen number format (fixed-point, floating-point; precision) has a major impact on achievable performance and implementation cost. Typically, filter design for communication systems is based on filter specifications in the frequency domain. We consider IIR filter design as an integral part of communication system optimisation with implicit filter specification in the time domain (via symbol/bit error rate). We present a holistic design flow with the system's bit error rate as the main objective. We consider a discrete search space spanned by the quantised filter coefficients. Differential Evolution is used for efficient sampling of this huge finite design space. We present communication system performance (based on bit-true simulations) and both measured and estimated receiver IIR chip areas. The results show that very small number formats are acceptable for complex filters and that the choice between fixed-point and floating-point number formats is nontrivial if precision is a free parameter.

ieee international symposium on parallel & distributed processing, workshops and phd forum | 2011

Native Double Precision LINPACK Implementation on a Hybrid Reconfigurable CPU

Thang Viet Huynh; Manfred Mücke; Wilfried N. Gansterer

Applications requiring double precision (DP) arithmetic executed on embedded CPUs without native DP support suffer from prohibitively low performance and power efficiency. Hybrid reconfigurable CPUs, allowing for reconfiguration of the instruction set at runtime, appear as a viable computing platform for applications requiring instructions not supported by existing fixed architectures. Our experiments on a Stretch S6 as prototypical platform show that limited reconfigurable resources on such architectures are sufficient for providing native support of DP arithmetic. Our design using a DP fused multiply-accumulate (FMA) extension instruction achieves a peak performance of 200 MFlop/s and a sustained performance of 22.7 MFlop/s at a clock frequency of 100 MHz. It outperforms LINPACK using software-emulated DP floating-point arithmetic on the S6 by a factor of 5.7 while achieving slightly higher numerical accuracy. In single precision, multiple floating-point operators can be implemented in parallel on the S6.
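The figures quoted in the abstract can be related with a few lines of arithmetic. The sketch below assumes that one FMA is issued per clock cycle and counted as two floating-point operations; the abstract does not state this, but it is consistent with 200 MFlop/s at 100 MHz. The function name `peak_mflops` is hypothetical.

```python
def peak_mflops(clock_mhz: float, flops_per_cycle: int) -> float:
    """Peak rate in MFlop/s for a unit issuing flops_per_cycle every cycle."""
    return clock_mhz * flops_per_cycle


# Assumption: one FMA per cycle, counted as a multiply plus an add.
peak = peak_mflops(100.0, 2)

sustained = 22.7                    # MFlop/s, from the abstract
speedup_over_emulation = 5.7        # from the abstract

efficiency = sustained / peak                           # roughly 11% of peak
emulated_rate = sustained / speedup_over_emulation      # roughly 4 MFlop/s implied

if __name__ == "__main__":
    print(f"peak: {peak} MFlop/s")
    print(f"sustained / peak: {efficiency:.3f}")
    print(f"implied software-emulated rate: {emulated_rate:.2f} MFlop/s")
```

The gap between sustained and peak performance is the kind of effect that an analysis of the data paths to and from the reconfigurable fabric would need to explain.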

autonomic and trusted computing | 2011

Error analysis and precision estimation for floating-point dot-products using affine arithmetic

Thang Viet Huynh; Manfred Mücke

One challenging task for VLSI and reconfigurable system design is the identification of the smallest number format possible to implement a given numerical algorithm guaranteeing some final accuracy while minimising area used, execution time and power. We apply affine arithmetic, an extension to interval arithmetic, to estimate the rounding error of different floating-point dot-product variants. The validity of the estimated error bounds is demonstrated using extensive simulations. We derive the analytical models for rounding errors over a wide range of parameters and show that affine arithmetic with a probabilistic bounding operator is able to provide a tighter bound compared to conventional forward error analysis. Due to the tight bounds, minimum mantissa bit width for hardware implementation can be determined and comparison of different dot-product variants is possible. Our presented models allow for an efficient design space exploration and are key to specialised code generators.
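For orientation, the sketch below checks a standard textbook forward error bound for a sequentially accumulated dot product, |fl(x·y) − x·y| ≤ γ_n Σ|x_i y_i| with γ_n = n·u / (1 − n·u) and unit roundoff u = 2^(−t) for a t-bit significand. This is the kind of conventional bound the abstract compares against, not the affine-arithmetic model itself; the function names are hypothetical, and overflow and underflow are assumed not to occur.

```python
from fractions import Fraction


def dot_float(x, y):
    """Dot product accumulated left to right in double precision."""
    acc = 0.0
    for a, b in zip(x, y):
        acc += a * b
    return acc


def dot_exact(x, y):
    """Exact dot product of the stored double values, as a rational number."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))


def forward_error_bound(x, y, mantissa_bits=53):
    """Worst-case bound gamma_n * sum(|x_i * y_i|) for a t-bit significand."""
    n = len(x)
    u = Fraction(1, 2 ** mantissa_bits)      # unit roundoff, round-to-nearest
    gamma = n * u / (1 - n * u)
    return gamma * sum(abs(Fraction(a) * Fraction(b)) for a, b in zip(x, y))


if __name__ == "__main__":
    x = [0.1 * (i + 1) for i in range(50)]
    y = [0.3 / (i + 1) for i in range(50)]
    error = abs(Fraction(dot_float(x, y)) - dot_exact(x, y))
    bound = forward_error_bound(x, y)
    print(float(error), float(bound))
```

The observed error is typically far below this worst-case bound, which is the motivation for tighter, probabilistic bounds when choosing a minimum mantissa width.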

international conference on acoustics, speech, and signal processing | 2010

Evidence-based custom-precision estimation with applications to solving nonlinear approximation problems

Dmitriy Shutin; Manfred Mücke

Reconfigurable logic (FPGA) allows the implementation of custom-precision arithmetic units. In this work we propose an algorithm that employs a Bayesian technique to determine the optimal number of bits for representing the involved continuous variables. We restrict ourselves to the problem of nonlinear approximation, where an assumed data model consists of superimposed signals with unknown parameters. By fitting such models using a variational Bayesian EM-based algorithm, we can determine the importance of each signal component using a technique inspired by the Bayesian evidence procedure. Due to the structure of the obtained variational update expressions, it becomes possible to show that the evidence value represents the combined effect of the relevance of a signal component for explaining the measurement data and the additive noise associated with this component. This insight allows the value of the evidence parameters to be interpreted in terms of a signal-to-noise ratio, which is then used to develop an optimal discretization scheme. The effectiveness of the proposed approach is demonstrated with two synthetic examples, showing a bitwidth reduction of more than 70% at the cost of a relative mean squared error of 0.0036 and 0.012, respectively.

european conference on parallel processing | 2017

Linking Application Description with Efficient SIMD Code Generation for Low-Precision Signed-Integer GEMM

Günther Schindler; Manfred Mücke; Holger Fröning

The need to implement demanding numerical algorithms within a constrained power budget has led to a renewed interest in low-precision number formats. Exploration of the degrees of freedom provided both by better support for low-precision number formats on computer architectures and by the respective application domain remains a most demanding task, though.

international conference on conceptual structures | 2012

Evaluation of the Stretch S6 Hybrid Reconfigurable Embedded CPU Architecture for Power-Efficient Scientific Computing

Thang Viet Huynh; Manfred Mücke; Wilfried N. Gansterer

Embedded CPUs typically use much less power than desktop or server CPUs but provide limited or no support for floating-point arithmetic. Hybrid reconfigurable CPUs combine fixed and reconfigurable computing fabrics to better balance execution performance and power consumption. We show how a Stretch S6 hybrid reconfigurable CPU (S6) can be extended to natively support double precision floating-point arithmetic. For lower precision number formats, multiple parallel arithmetic units can be implemented. We evaluate whether the superlinear performance improvement of floating-point multiplication on reconfigurable fabrics can be exploited in the framework of a hybrid reconfigurable CPU. We provide an in-depth investigation of data paths to and from the S6 reconfigurable fabric and present peak and sustained throughput as a function of wide registers used and total operand size. We demonstrate the effect of the given interface when using a floating-point fused multiply-accumulate (FMA) SIMD unit to accelerate the LINPACK benchmark. We identify a mismatch between the size of the S6's reconfigurable fabric and the available interface bandwidth as the major bottleneck limiting performance, which makes it a poor choice for scientific workloads relying on native support for floating-point arithmetic.
