Publication


Featured research published by Milton Samirakshma Bepari.


International Conference Oriental COCOSDA held jointly with Conference on Asian Spoken Language Research and Evaluation | 2013

Blind source separation: A review and analysis

Madhab Pal; Rajib Roy; Joyanta Basu; Milton Samirakshma Bepari

Blind Source Separation (BSS) refers to a problem in which both the sources and the mixing methodology are unknown and only the mixture signals are available for separation. In many situations it is desirable to recover all individual sources from the mixed signal, or at least to segregate a particular source. Under laboratory conditions, most algorithms work well because the input signals, the number of sources present in the mixture, the mixing methodology, and so on are known to the separation process. In real-life scenarios the problem is much more complicated: it begins with an input mixture for which most of these parameters are unknown. This paper summarizes previous approaches to the problem and reports a source separation experiment in which source signals are mixed and then de-mixed, with Independent Component Analysis (ICA) as the primary approach.
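The mix-then-de-mix idea can be illustrated with a toy two-source case. The sketch below is not the experiment from the paper: it assumes two synthetic sources, a hypothetical 2x2 mixing matrix, and a simple kurtosis-based ICA contrast (whitening followed by a grid search over rotation angles). All function names are illustrative.

```python
import math

def center(x):
    """Subtract the mean so the signal has zero average."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]

def whiten(x1, x2):
    """Decorrelate two centered signals and scale each to unit variance."""
    n = len(x1)
    a = sum(u * u for u in x1) / n
    b = sum(u * v for u, v in zip(x1, x2)) / n
    d = sum(v * v for v in x2) / n
    # Closed-form eigen-decomposition of the symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    l1 = a * c * c + 2 * b * c * s + d * s * s
    l2 = a * s * s - 2 * b * c * s + d * c * c
    z1 = [(c * u + s * v) / math.sqrt(l1) for u, v in zip(x1, x2)]
    z2 = [(-s * u + c * v) / math.sqrt(l2) for u, v in zip(x1, x2)]
    return z1, z2

def excess_kurtosis(y):
    """Excess kurtosis of a zero-mean signal (0 for a Gaussian)."""
    n = len(y)
    m2 = sum(v * v for v in y) / n
    m4 = sum(v ** 4 for v in y) / n
    return m4 / (m2 * m2) - 3.0

def separate(x1, x2, steps=90):
    """Whiten, then pick the rotation that makes the outputs least Gaussian."""
    z1, z2 = whiten(center(x1), center(x2))
    best = None
    for k in range(steps):
        phi = (math.pi / 2) * k / steps
        c, s = math.cos(phi), math.sin(phi)
        y1 = [c * u + s * v for u, v in zip(z1, z2)]
        y2 = [-s * u + c * v for u, v in zip(z1, z2)]
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]

def abs_corr(a, b):
    """Absolute Pearson correlation between two signals."""
    a, b = center(a), center(b)
    num = sum(u * v for u, v in zip(a, b))
    den = math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))
    return abs(num / den)

# Two synthetic sources (assumed for illustration): a sine and a sawtooth.
n = 2000
s1 = [math.sin(0.05 * i) for i in range(n)]
s2 = [2.0 * ((0.0137 * i) % 1.0) - 1.0 for i in range(n)]

# Hypothetical mixing matrix; the separation step never sees it.
x1 = [0.8 * p + 0.3 * q for p, q in zip(s1, s2)]
x2 = [0.4 * p + 0.9 * q for p, q in zip(s1, s2)]

y1, y2 = separate(x1, x2)
```

As with any ICA method, the recovered signals match the sources only up to ordering, sign, and scale, so comparing them with `abs_corr` is more meaningful than comparing sample values directly.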


Archive | 2013

Real Time Challenges to Handle the Telephonic Speech Recognition System

Joyanta Basu; Milton Samirakshma Bepari; Rajib Roy; Soma Khan

This paper describes the real-time challenges of designing a telephonic Automatic Speech Recognition (ASR) system. Telephonic speech data are collected automatically from all geographical regions of West Bengal to cover the major dialectal variations of spoken Bangla. All incoming calls are handled by an Asterisk server, i.e. the Computer Telephony Interface (CTI). The system asks a set of queries, and the users’ spoken responses are stored and manually transcribed for ASR system training. When the telephonic ASR is in use, users’ voice queries first pass through the Signal Analysis and Decision (SAD) module; depending on its decision, the speech signal may enter the back-end ASR engine, and the relevant information is automatically delivered to the user. In real-time scenarios, telephonic speech contains channel drops, silence or no-speech events, truncated speech, noisy signals, and so on, along with the desired speech event. This paper presents techniques that handle such unwanted signals to a certain extent and deliver a signal close to the desired speech to the ASR system. Real-time telephonic ASR system performance increases by 8.91% after the SAD module is implemented.
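The abstract does not say how the SAD module reaches its decision. As a rough illustration of the kind of gate it describes, the sketch below uses plain short-time energy to reject no-speech input and to flag recordings where speech touches the start or end of the signal. The frame length, threshold, and decision labels are all assumptions, not values from the paper.

```python
import math

def frame_energies(samples, frame_len=160):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    return [
        sum(v * v for v in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def sad_decision(samples, frame_len=160, threshold=1e-3, min_speech_frames=5):
    """Return 'no_speech', 'truncated' or 'accept' (hypothetical labels)."""
    active = [e > threshold for e in frame_energies(samples, frame_len)]
    if sum(active) < min_speech_frames:
        return "no_speech"   # silence, channel drop, or too little signal
    if active[0] or active[-1]:
        return "truncated"   # activity runs into the edge of the recording
    return "accept"          # pass the signal on to the recognizer

# Synthetic stand-ins for telephone audio at 8 kHz.
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 200 * i / 8000) for i in range(1600)]

print(sad_decision(silence * 2))             # only silence
print(sad_decision(silence + tone + silence))  # speech with margins
print(sad_decision(tone + silence))          # speech starts at sample 0
```

A production gate would also need noise estimation and channel-drop detection, which a fixed energy threshold cannot provide.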


PerMIn'12 Proceedings of the First Indo-Japan conference on Perception and Machine Intelligence | 2012

Performance evaluation of PBDP based real-time speaker identification system with normal MFCC vs MFCC of LP residual features

Soma Khan; Joyanta Basu; Milton Samirakshma Bepari

This study compares Mel Frequency Cepstral Coefficients (MFCC) of Linear Predictive (LP) residuals with normal MFCC features, using both VQ and GMM based speaker modeling approaches, to evaluate real-time Automatic Speaker Identification systems in both co-operative and non co-operative speaking scenarios. The Pitch Based Dynamic Pruning (PBDP) technique is applied to optimize the speaker identification process. The system is trained and tested with voice samples of 62 speakers across different age groups. The residual of a signal contains information mostly about the source, which is speaker specific. Results show that, in co-operative speaking, MFCC of LP residuals outperform normal MFCC features for both VQ and GMM based speaker modeling, with improvements of 7.6% and 6.8% in average accuracy respectively. For non co-operative speaking in real time, however, combined modeling of both features (source and vocal tract) is required, as it raises the highest identification accuracy from 67.7% to 83%.
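For readers unfamiliar with LP residuals, the following sketch shows the general technique: estimate linear prediction coefficients with the Levinson-Durbin recursion, then inverse-filter the signal so that what remains is the part the all-pole (vocal tract) model cannot predict. It is a generic illustration on a synthetic autoregressive signal, not the feature pipeline used in the study; the prediction order and the test signal are assumptions, and MFCC extraction from the residual is not shown.

```python
import random

def lpc(x, order):
    """Levinson-Durbin recursion on the autocorrelation of x.

    Returns [1, a1, ..., ap] such that
    x[n] + a1*x[n-1] + ... + ap*x[n-p] is the prediction error.
    """
    n = len(x)
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                 # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k             # remaining prediction error power
    return a

def lp_residual(x, a):
    """Inverse-filter x with the prediction-error filter a."""
    order = len(a) - 1
    return [
        sum(a[k] * x[n - k] for k in range(order + 1) if n - k >= 0)
        for n in range(len(x))
    ]

# Synthetic first-order autoregressive signal (assumed test input).
rng = random.Random(0)
signal = [0.0]
for _ in range(1999):
    signal.append(0.9 * signal[-1] + rng.gauss(0.0, 1.0))

coeffs = lpc(signal, order=2)
residual = lp_residual(signal, coeffs)
energy_ratio = sum(e * e for e in residual) / sum(v * v for v in signal)
```

On real speech the residual is usually computed frame by frame on windowed segments, with a higher prediction order than the toy value used here.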


2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA) | 2014

Bengali Basic Travel Expression Corpus: A statistical analysis

Soma Khan; Joyanta Basu; Tulika Basu; Milton Samirakshma Bepari; Madhab Pal; Rajib Roy

The Japanese-English aligned Basic Travel Expression Corpus (BTEC) has been used as a basic dataset for the development of real-world Speech-to-Speech Translation (S2ST) systems in related prior studies. This paper presents a detailed statistical analysis of the Bengali-translated BTEC text and its phonetic transcriptions for the development of English-Bengali speech translation applications in the travel domain. At different levels of the analysis hierarchy, the study examines the lexical and phonetic status of the corpus based on frequency spectra, estimated population size, coverage ratio, goodness of fit of Large Number of Rare Events (LNRE) models, and transition patterns. The experimental observations provide insight into the sufficiency of the corpus for the travel domain as well as for building the basic components of an English-Bengali S2ST system.
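Two of the statistics named above are easy to show on a toy example. The sketch below computes a frequency spectrum (how many distinct types occur exactly m times) and a simple coverage figure. The paper's exact definition of coverage ratio is not given in the abstract, so the version here (share of tokens accounted for by the `top_k` most frequent types) is an assumption, as are the function names and the sample tokens.

```python
from collections import Counter

def frequency_spectrum(tokens):
    """Map each frequency m to V(m), the number of types seen exactly m times."""
    type_counts = Counter(tokens)
    return Counter(type_counts.values())

def coverage(tokens, top_k):
    """Share of all tokens covered by the top_k most frequent types."""
    counts = Counter(tokens)
    covered = sum(c for _, c in counts.most_common(top_k))
    return covered / len(tokens)

tokens = "a b a c a b d".split()
spectrum = frequency_spectrum(tokens)
print(dict(spectrum))        # one type seen 3 times, one seen twice, two seen once
print(coverage(tokens, 2))   # the two most frequent types cover 5 of 7 tokens
```

A spectrum dominated by types that occur once or twice is the situation LNRE models are designed for, which is why the spectrum is the usual starting point for such an analysis.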


2016 Conference of The Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA) | 2016

An overview of speaker diarization: Approaches, resources and challenges

Joyanta Basu; Soma Khan; Rajib Roy; Madhab Pal; Tulika Basu; Milton Samirakshma Bepari; Tapan Kumar Basu

The speaker diarization task consists of inferring “who spoke when” in an audio stream without any prior knowledge. It is an important task in audio processing and retrieval. This paper presents a concise overview of the speaker diarization problem and the available solutions. It summarizes different approaches and practices, highlighting existing resources such as toolkits and standard datasets, evaluation metrics, main application areas, and associated challenges. The study can serve as basic material that provides a preliminary idea of the topic. Though most of the related research initiatives have been reported in several prior studies, this study focuses on the real-time application development perspective.


International Conference Oriental COCOSDA held jointly with Conference on Asian Spoken Language Research and Evaluation | 2013

SATT: Semi-automatic transcription tool

Joyanta Basu; Milton Samirakshma Bepari; Sushmita Nandi; Soma Khan; Rajib Roy

After collection and proper storage of speech data, transcription is the next essential task for speech researchers and engineers. It is well known in the speech community that speech data transcription is a laborious and time-consuming job, though it is the most important part of building Automatic Speech Recognition (ASR) systems. An automated transcription tool is therefore needed. This paper presents a Semi-Automatic speech data Transcription Tool (SATT) developed mainly for resource creation for ASR system building. The tool facilitates manual checking and verification of speech data transcriptions, and includes a speech recognition engine. The detailed functionalities, transcription remarks, and working steps of the tool are discussed in this paper. SATT will speed up the transcription process, reduce manual errors and effort, and create rich resources for statistical ASR work in Indian languages.


Intelligent Human Computer Interaction | 2012

Pitch based selection of optimal search space at runtime: Speaker recognition perspective

Soma Khan; Joyanta Basu; Milton Samirakshma Bepari; Rajib Roy

Large-scale speaker recognition (SR) applications demand an efficient design strategy with smart optimization techniques to enhance real-time usability. Runtime selection of an optimal search space can reduce the computational cost involved. This paper describes a multilayer design layout with a novel Pitch Based Dynamic Pruning (PBDP) algorithm to optimize VQ and GMM based closed-set SR. The process selects the most likely speakers at runtime, based on the percentage of cumulative pitch occurrence frequencies within certain pitch ranges selected from the test utterance, and then performs spectral matching using MFCC features within the reduced search space. Experiments on the YOHO and NIST2008 corpora reveal that nearly 40% of the total identification time is saved, with a slight (below 0.5%) increase or even a decrease in average error rate. The proposed pruning method is also applicable to selecting the most likely flexible background in the unconstrained cohort normalization task of the verification problem.
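The abstract gives only the outline of PBDP, so the sketch below is one possible reading of it rather than the published algorithm. It bins pitch values into fixed-width ranges, picks the ranges that dominate the test utterance, and keeps only those enrolled speakers whose own pitch distribution places enough mass in those ranges. The bin width, coverage level, overlap threshold, and all names are assumptions; the MFCC matching that would follow is omitted.

```python
from collections import Counter

def pitch_histogram(pitches_hz, bin_width=20):
    """Share of voiced frames falling in each pitch range (bin index -> share)."""
    counts = Counter(int(p // bin_width) for p in pitches_hz)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}

def dominant_bins(hist, coverage=0.8):
    """Smallest set of most frequent bins whose shares reach `coverage`."""
    chosen, covered = set(), 0.0
    for b, share in sorted(hist.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage:
            break
        chosen.add(b)
        covered += share
    return chosen

def prune_speakers(test_pitches, enrolled_hists, coverage=0.8,
                   min_overlap=0.5, bin_width=20):
    """Keep speakers whose pitch mass inside the test's dominant bins is high enough."""
    bins = dominant_bins(pitch_histogram(test_pitches, bin_width), coverage)
    return [
        speaker for speaker, hist in enrolled_hists.items()
        if sum(hist.get(b, 0.0) for b in bins) >= min_overlap
    ]

# Hypothetical enrollment data: per-speaker pitch tracks in Hz.
enrolled = {
    "low": pitch_histogram([110, 115, 120, 125, 118, 112]),
    "high": pitch_histogram([210, 220, 230, 215]),
}
candidates = prune_speakers([112, 118, 121, 115, 119], enrolled)
print(candidates)  # only these speakers go on to spectral matching
```

The time saving reported in the abstract would come from running the expensive spectral comparison on `candidates` alone instead of on every enrolled speaker.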


International Conference Oriental COCOSDA held jointly with Conference on Asian Spoken Language Research and Evaluation | 2013

Evaluation and error recovery methods of an IVR based real time speech recognition application

Soma Khan; Joyanta Basu; Milton Samirakshma Bepari; Rajib Roy


Workshop on Spoken Language Technologies for Under-Resourced Languages | 2018

Assessing Performance of Bengali Speech Recognizers Under Real World Conditions using GMM-HMM and DNN based Methods

Soma Khan; Madhab Pal; Joyanta Basu; Milton Samirakshma Bepari; Rajib Roy


Workshop on Spoken Language Technologies for Under-Resourced Languages | 2018

Designing an IVR Based Framework for Telephony Speech Data Collection and Transcription in Under-Resourced Languages

Joyanta Basu; Soma Khan; Milton Samirakshma Bepari; Rajib Roy; Madhab Pal; Sushmita Nandi

Collaboration

Top Co-Authors

Joyanta Basu
Centre for Development of Advanced Computing

Rajib Roy
Centre for Development of Advanced Computing

Soma Khan
Centre for Development of Advanced Computing

Madhab Pal
Centre for Development of Advanced Computing

Sushmita Nandi
Centre for Development of Advanced Computing

Tulika Basu
Centre for Development of Advanced Computing

Swanirbhar Majumder
North Eastern Regional Institute of Science and Technology