Sabir Ismail
Shahjalal University of Science and Technology
Publication
Featured research published by Sabir Ismail.
international conference on electrical engineering and information communication technology | 2014
Sabir Ismail; M. Shahidur Rahman
In this paper, we describe a method for producing Bangla word clusters based on semantic and contextual similarity. Word clustering is important for part-of-speech (POS) tagging, word sense disambiguation, text classification, recommender systems, spell checkers, grammar checkers, knowledge discovery, and many other Natural Language Processing (NLP) applications. Computerization of Bangla language processing started long ago, but it is still at an early stage and suffers from resource scarcity. We propose an unsupervised machine learning technique to develop Bangla word clusters based on their semantic and contextual similarity using an N-gram language model. According to the N-gram model, a word can be predicted from the sequence of its previous and next words. The N-gram model has been applied successfully to word clustering in English and some other languages. As word clustering is a new direction in Bangla language processing research, we consider this approach a good starting point, and our results support that assumption. We produced 456 clusters using a locally available large Bangla corpus. Subjective scores derived from the clusters reveal strong similarity among the words in the same cluster.
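The idea of grouping words by the words that precede and follow them can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the cosine measure, the greedy grouping step, and the 0.5 threshold are all assumptions chosen for illustration, and the toy sentences are in English for readability.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences):
    """Map each word to a Counter of (previous word, next word) contexts."""
    contexts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            contexts[tokens[i]][(tokens[i - 1], tokens[i + 1])] += 1
    return contexts


def cosine(a, b):
    """Cosine similarity between two context Counters."""
    dot = sum(count * b[key] for key, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_words(sentences, threshold=0.5):
    """Greedy grouping (an assumption): a word joins the first cluster
    whose first member has a sufficiently similar context vector."""
    contexts = context_vectors(sentences)
    clusters = []
    for word in sorted(contexts):
        for cluster in clusters:
            if cosine(contexts[word], contexts[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


toy_corpus = ["I eat rice daily", "I eat bread daily", "I drink water daily"]
clusters = cluster_words(toy_corpus)
```

In this toy corpus, "rice" and "bread" share the context ("eat", "daily") and end up in the same cluster, while "water" does not because its preceding word differs.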
international conference on informatics electronics and vision | 2016
Tapashee Tabassum Urmi; Jasmine Jahan Jammy; Sabir Ismail
In this paper, we propose a contextual-similarity-based approach for identifying the stems, or root forms, of Bangla words using an N-gram language model. The core purpose of our work is to build a large corpus of Bangla stems with their corresponding inflectional forms. Identifying the stem form of a word is generally called stemming, and a tool that identifies stems is called a stemmer. Stemmers are important mainly in information retrieval systems, recommender systems, spell checkers, search engines, and other Natural Language Processing applications. We selected the N-gram model for stem detection based on the assumption that if two words exhibit a certain percentage of similarity in spelling and a certain percentage of contextual similarity across many sentences, then they have a higher probability of originating from the same root. We implemented a 6-gram model for the stem identification procedure and achieved 40.18% accuracy on our corpus.
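The assumption stated above, that two words sharing both spelling and context are likely to share a root, can be sketched as follows. This is an illustration only: the shared-prefix measure, the Jaccard overlap, the two-word window, and both thresholds are assumptions and not the paper's 6-gram procedure.

```python
from collections import defaultdict


def prefix_similarity(a, b):
    """Length of the common prefix divided by the longer word's length."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    longest = max(len(a), len(b))
    return common / longest if longest else 0.0


def context_sets(sentences, window=2):
    """Map each word to the set of (left words, right words) it occurs between."""
    contexts = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            left = tuple(tokens[max(0, i - window):i])
            right = tuple(tokens[i + 1:i + 1 + window])
            contexts[word].add((left, right))
    return contexts


def jaccard(a, b):
    """Share of contexts that two words have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def likely_same_root(w1, w2, contexts, spelling_min=0.6, context_min=0.3):
    """Both thresholds are hypothetical values chosen for this example."""
    return (prefix_similarity(w1, w2) >= spelling_min
            and jaccard(contexts[w1], contexts[w2]) >= context_min)


toy_corpus = ["the boy walks home", "the boy walked home", "the boy sleeps home"]
contexts = context_sets(toy_corpus)
```

Here `likely_same_root("walks", "walked", contexts)` holds because the two words share a long prefix and an identical context, whereas "walks" and "sleeps" share the context but not the spelling.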
computer and information technology | 2010
Sabir Ismail; Abu Fazal Md Shumon; Ruhul Amin
Grid computing organizes geographically distributed resources under a single platform and lets users access this combined power. In this paper, we discuss the application of a distributed memory caching system in the Grid computing environment to improve its computational performance. For our experiments, we used Alchemi, a .NET-based Grid computing framework, and Memcached, a distributed memory caching system. We ran a couple of experiments in this environment, and they demonstrated two important outcomes. The first is that distributed memory caching can provide fail-safe computation for the Grid environment. The second is a reduction in the total computation time of Grid applications. Based on the results of these experiments and of previous experiments completed in our distributed computing laboratory, we propose a new technique for the Grid computing environment that provides both improved performance and fail-safe operation.
Archive | 2019
Shamim Ehsan; Sadia Tasnim Swarna; Sabir Ismail
Automated part-of-speech (POS) tagging plays a vital role in natural language processing. For computational Bangla language processing, we do not have a large-scale POS-tagged corpus. There are two basic approaches to building such a corpus: written rules or automated tagging. Building a rule-based corpus requires experts in Bangla linguistics and is time-consuming. Automated tagging requires a trained corpus, which is currently not available. Crowdsourcing can play a vital role in meeting both requirements. In this paper, we therefore propose a crowdsourcing-based approach to building a Bangla POS-tagged corpus. We use a standard tag set for Bangla. Raw documents are collected from various newspapers, books, and online sites. We first give contributors some examples of parts of speech and then provide them with data for crowdsourcing. Finally, we analyze the resulting data, and its accuracy is 95%.
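The abstract does not say how contributor answers are combined or scored. One common option, shown here purely as an assumption, is to take the majority tag per word and measure agreement with a small expert-checked reference; the function names, tags, and data below are hypothetical.

```python
from collections import Counter


def aggregate_tags(annotations):
    """annotations maps each word to the list of tags given by contributors;
    the most frequent tag wins (majority vote, an assumed strategy)."""
    return {word: Counter(tags).most_common(1)[0][0]
            for word, tags in annotations.items() if tags}


def accuracy(predicted, gold):
    """Fraction of reference words whose aggregated tag matches the reference."""
    shared = [word for word in gold if word in predicted]
    correct = sum(predicted[word] == gold[word] for word in shared)
    return correct / len(shared) if shared else 0.0


# Hypothetical crowd answers and reference tags, in English for readability.
crowd = {
    "book": ["NOUN", "NOUN", "VERB"],
    "reads": ["VERB", "VERB", "VERB"],
    "quickly": ["ADJ", "ADJ", "ADV"],
}
reference = {"book": "NOUN", "reads": "VERB", "quickly": "ADV"}

final_tags = aggregate_tags(crowd)
score = accuracy(final_tags, reference)
```

With these made-up answers, two of the three aggregated tags match the reference, so the majority vote recovers from one dissenting contributor on "book" but not from the majority error on "quickly".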
2017 IEEE International Conference on Imaging, Vision & Pattern Recognition (icIVPR) | 2017
Shuvanon Razik; Evan Hossain; Sabir Ismail; Saiful Islam
This paper presents the development process of the SUST-Bangla Handwritten Numeral Database (SUST-BHND). We extracted handwritten Bengali digits from twenty-one hundred pre-designed forms filled in by different people. After data retrieval, cleaning, processing, and error analysis, we created a database consisting of 101065 sample images. It provides a basic database for research in Bangla OCR and script identification. Finally, a deep convolutional neural network was trained on the database, reaching an accuracy of around 99.4%.
international conference on informatics electronics and vision | 2016
Ahnaf Farhan Rownak; Md. Fazle Rabby; Sabir Ismail; Md. Saiful Islam
The main cause of poor output in Optical Character Recognition (OCR) for Bangla text is segmentation-related error. Varying character shapes, connected characters, modifiers above and below the characters, and overlapping regions between consecutive characters are the main obstacles to effective segmentation of printed Bangla text. In this paper, an efficient strategy is introduced to segment characters whose regions overlap with other characters. The proposed strategy achieved 99.8% accuracy in line segmentation, 99.5% accuracy in word segmentation, and 99% accuracy in character segmentation. Errors occur when two consecutive characters have multiple touching points.
international conference on electrical information and communication technologies | 2015
Prapti Das; Rishmita Tasmim; Sabir Ismail
Every writer has a distinct writing style. By analyzing various kinds of features, we can identify characteristics of a writer's writing, which is known as stylogenetics. In this paper, we gathered Bangla blogs written by four different Bangladeshi writers. Using machine learning methods, we tried to identify distinctive stylometric features in their writing styles. We analyzed various features of their writing, for example: percentage of unique words; word length; sentence length; frequency of some parts of speech; number of suffixes; frequency of the first, second, second-to-last, and last word of a sentence; average number of question marks per document; and frequency of a word by its position in a sentence. We gathered statistical data from analyzing those features and used it to find the variance among these writers.
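A few of the features listed above can be computed with plain string operations. The sketch below is illustrative rather than the authors' feature extractor: the function name, the punctuation set, the lowercasing step, and the sentence delimiters (including the Bangla danda "।") are assumptions, and the sample text is in English for readability.

```python
import re

PUNCTUATION = ".,!?;:\"'()।"


def stylometry_features(document):
    """Return a small dictionary of stylometric features for one document."""
    # Split on sentence-ending marks, including the Bangla danda.
    sentences = [s for s in re.split(r"[.!?।]+", document) if s.strip()]
    # Split on whitespace and trim punctuation; lowercasing only affects
    # scripts that have letter case.
    words = [w.strip(PUNCTUATION).lower() for w in document.split()]
    words = [w for w in words if w]
    if not words or not sentences:
        return {"unique_word_pct": 0.0, "avg_word_length": 0.0,
                "avg_sentence_length": 0.0, "question_marks": 0}
    return {
        "unique_word_pct": 100.0 * len(set(words)) / len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        "question_marks": document.count("?"),
    }


features = stylometry_features("Is it late? It is late. Go home.")
```

Computing such a dictionary per document and per writer gives the kind of statistical table from which differences between writers can then be examined.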
computer and information technology | 2014
Sabir Ismail; M. Shahidur Rahman; Abdullah Al Mumin
computer and information technology | 2017
M. Tahmid Hossain; Md. Moshiur Rahman; Sabir Ismail; Saiful Islam
computer and information technology | 2017
Urmee Pal; Ayesha Siddika Nipu; Sabir Ismail