Muhammad Ali Tahir
RWTH Aachen University
Publication
Featured research published by Muhammad Ali Tahir.
international conference on acoustics, speech, and signal processing | 2015
Zoltán Tüske; Muhammad Ali Tahir; Ralf Schlüter; Hermann Ney
In the hybrid approach, neural network output directly serves as hidden Markov model (HMM) state posterior probability estimates. In contrast to this, in the tandem approach neural network output is used as input features to improve classic Gaussian mixture model (GMM) based emission probability estimates. This paper shows that GMM can be easily integrated into the deep neural network framework. By exploiting its equivalence with the log-linear mixture model (LMM), GMM can be transformed to a large softmax layer followed by a summation pooling layer. Theoretical and experimental results indicate that the jointly trained and optimally chosen GMM and bottleneck tandem features cannot perform worse than a hybrid model. Thus, the question “hybrid vs. tandem” simplifies to optimizing the output layer of a neural network. Speech recognition experiments are carried out on a broadcast news and conversations task using up to 12 feed-forward hidden layers with sigmoid and rectified linear unit activation functions. The evaluation of the LMM layer shows recognition gains over the classic softmax output.
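The "large softmax layer followed by a summation pooling layer" described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the input features already make each mixture component's score linear in the parameters, as in a log-linear mixture model. The function name `lmm_posteriors`, its parameters, and the contiguous per-state layout of the units are hypothetical choices made for this example.

```python
import math

def lmm_posteriors(x, weights, biases, n_states, n_components):
    """Softmax over all (state, component) units, then sum-pool per state.

    Hypothetical layout: units for state s occupy indices
    s * n_components ... (s + 1) * n_components - 1.
    """
    # One linear score per (state, component) unit.
    scores = [sum(w_k * x_k for w_k, x_k in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    # Numerically stable softmax over the full set of units.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    joint = [e / total for e in exps]
    # Summation pooling: add up the component posteriors of each state.
    return [sum(joint[s * n_components:(s + 1) * n_components])
            for s in range(n_states)]
```

Because the softmax normalizes over all units before pooling, the pooled values are valid state posteriors that sum to one, so the layer can replace a classic softmax output.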
international conference on image and graphics | 2004
Rehan Hafiz; Muhammad Ali Tahir; O. Arshad; Shoab A. Khan
Aerial video imagery is widely used in mapping, surveillance and monitoring applications. An aerial video can give sufficient information, but it does not offer the freedom and flexibility of working with a geo-registered image or map. Our paper provides a cost-effective, robust and efficient solution for real-time video registration. We propose a fast, compact FPGA processing module that implements the computationally intensive routines of the algorithm. Three-level pipelining is incorporated to raise the clock speed to 30 MHz. We also present two applications of the proposed hardware that use the core registration algorithm. The first is an air-to-ground target tracker. The second is a stabilizer for jerky hand-held video, which can be used for improved targeting from telescope-mounted weapons (e.g. sniper rifles) while firing from a moving vehicle. C++ implementations of these algorithms have been tested and show encouraging results.
ieee automatic speech recognition and understanding workshop | 2011
Muhammad Ali Tahir; Ralf Schlüter; Hermann Ney
This paper presents a method to incorporate mixture density splitting into the discriminative log-linear training of the acoustic model. The standard method is to obtain a high-resolution model by maximum likelihood training and density splitting, and then to train this model further discriminatively. For a single Gaussian density per state, the log-linear MMI optimization is a global maximum problem, and by further splitting and discriminative training of this model we can obtain a higher-complexity model. The mixture training is not a global maximum problem; nevertheless, experimentally we achieve large gains in the objective function and corresponding moderate gains in the word error rate on a large vocabulary corpus.
2009 IEEE Workshop on Automatic Speech Recognition & Understanding | 2009
Muhammad Ali Tahir; Georg Heigold; Christian Plahl; Ralf Schlüter; Hermann Ney
In the past several decades, classifier-independent front-end feature extraction, where the derivation of acoustic features is lightly associated with the back-end model training or classification, has been prominently used in various pattern recognition tasks, including automatic speech recognition (ASR). In this paper, we present a novel discriminative feature transformation, named generalized likelihood ratio discriminant analysis (GLRDA), on the basis of the likelihood ratio test (LRT). It seeks a lower-dimensional feature subspace by making the most confusing situation, described by the null hypothesis, as unlikely as possible, without the homoscedastic assumption on class distributions. We also show that classical linear discriminant analysis (LDA) and its well-known extension, heteroscedastic linear discriminant analysis (HLDA), can be regarded as two special cases of the proposed method. The empirical class confusion information can be further incorporated into GLRDA for better recognition performance. Experimental results demonstrate that GLRDA and its variant can yield moderate performance improvements over HLDA and LDA for the large vocabulary continuous speech recognition (LVCSR) task.
international conference on acoustics, speech, and signal processing | 2015
Muhammad Ali Tahir; Simon Wiesler; Ralf Schlüter; Hermann Ney
A Gaussian or log-linear mixture model trained by maximum likelihood may be trained further using discriminative training. It is desirable that the mixture splitting is also done during the discriminative training, to achieve a better mixture density distribution. In previous work such a discriminative splitting approach was presented. Similarly, the resolution of a deep neural network may also be increased by splitting. In this paper, discriminative splitting is applied as a way of initializing a linear bottleneck between two layers of a DNN. Experiments for the single-hidden-layer and six-hidden-layer cases show the potential of this approach as an alternative method of pre-training linear bottlenecks for MLP hidden layers.
international conference on networking | 2004
Muhammad Ali Tahir; A. Munawar; I.A. Taj
The paper focuses on a microprocessor implementation of the Hamming distance for binary correlation. It uses the fact that the binary correlation result can be derived from binary convolution (i.e., modeled with AND gates instead of XOR); as a result, convolution of multiple bits with multiple bits can be computed by a single multiplication instruction. This follows from a general proof for base-n convolution that is presented. Furthermore, a hierarchical shift-addition approach reduces the number of additions in the subsequent step. The paper also shows that this approach can be used in the frequency domain, where an N×N point binary circular convolution can be modeled using an N×M double-precision FFT, where M is a sub-multiple of N depending on the kernel size. The time-domain and frequency-domain approaches are compared for different kernel/image sizes, with the help of benchmarking results.
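The idea that a multi-bit binary convolution can be obtained from a single multiplication can be sketched as follows. This is an illustration of the general principle only, not the paper's exact scheme. Each bit is placed in its own fixed-width field of an integer, so that the product of the two packed integers holds one convolution coefficient per field. The function names and the field-width choice are assumptions made for this example, Python's arbitrary-precision integers stand in for a machine word, and both inputs are assumed to be non-empty lists of 0/1 values.

```python
def binary_convolve_by_multiply(a_bits, b_bits):
    """Linear convolution of two 0/1 sequences using one integer multiply."""
    # Each coefficient is at most min(len(a), len(b)); pick a field width
    # wide enough that no coefficient overflows into its neighbour.
    width = min(len(a_bits), len(b_bits)).bit_length()

    def pack(bits):
        # Place bit i in the lowest position of field i.
        return sum(bit << (width * i) for i, bit in enumerate(bits))

    product = pack(a_bits) * pack(b_bits)  # the single multiplication
    mask = (1 << width) - 1
    n_out = len(a_bits) + len(b_bits) - 1
    # Field k of the product holds the sum of a[i] AND b[j] over i + j = k.
    return [(product >> (width * k)) & mask for k in range(n_out)]


def binary_convolve_direct(a_bits, b_bits):
    """Reference implementation with explicit AND and addition."""
    out = [0] * (len(a_bits) + len(b_bits) - 1)
    for i, a in enumerate(a_bits):
        for j, b in enumerate(b_bits):
            out[i + j] += a & b
    return out
```

On real hardware the field width and sequence lengths would have to be chosen so that the packed product fits the multiplier's result register. That constraint is not modeled here.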
Archive | 2009
Muhammad Ali Tahir; Georg Heigold; Christian Plahl; Ralf Schlüter; Hermann Ney
conference of the international speech communication association | 2011
Muhammad Ali Tahir; Ralf Schlüter; Hermann Ney
conference of the international speech communication association | 2015
M. Ali Basha Shaik; Zoltán Tüske; Muhammad Ali Tahir; Markus Nußbaum-Thom; Ralf Schlüter; Hermann Ney
conference of the international speech communication association | 2014
M. Ali Basha Shaik; Zoltán Tüske; Muhammad Ali Tahir; Markus Nußbaum-Thom; Ralf Schlüter; Hermann Ney