Bipul Syam Purkayastha
Assam University
Publications
Featured research published by Bipul Syam Purkayastha.
International Journal of Computer Applications | 2012
Kh Raju Singha; Bipul Syam Purkayastha; Kh Dhiren Singha
The process of assigning morpho-syntactic categories to each morpheme, including punctuation marks, in a given text document according to its context is called Part of Speech (POS) tagging. In this paper we present a rule-based Part of Speech tagger for Manipuri that applies a set of hand-written linguistic rules of the Manipuri language. Classifying the lexical categories of Manipuri, an agglutinating Tibeto-Burman language of Northeast India, is very difficult, so this tagger uses an affix-stripping technique to segment the affixes from the root. Because only a limited POS-tagged corpus exists for Manipuri, the tagged output of this tagger will be very helpful for analyzing Manipuri parts of speech with statistical models. General Terms: Part of Speech Tagging, Manipuri Language, Natural Language Processing, Algorithms, Morpho-syntactic categories.
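The combination of hand-written rules and affix stripping can be pictured with a small sketch. It strips the longest matching suffix, looks the remaining root up in a lexicon, and falls back on the suffix rule when the root is unknown. The suffixes, tag names and lexicon entries are hypothetical romanized placeholders chosen for illustration. They are not the paper's Manipuri rule set.

```python
# Minimal sketch of a rule-based tagger with affix stripping.
# All rules and lexicon entries are HYPOTHETICAL placeholders, not the paper's rules.

# Root lexicon: root -> tag (hypothetical entries)
LEXICON = {"lairik": "NN", "chat": "VM"}

# Suffix rules: suffix -> tag assigned to the whole word (hypothetical)
SUFFIX_RULES = {"gi": "NN", "da": "NN", "li": "VM", "ba": "VM"}


def split_affix(word):
    """Strip the longest known suffix, returning (root, suffix or None)."""
    for suffix in sorted(SUFFIX_RULES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, None


def tag_word(word):
    """Apply the rules in order: whole-word lookup, root lookup, suffix rule."""
    if word in LEXICON:
        return LEXICON[word]
    root, suffix = split_affix(word)
    if root in LEXICON:
        return LEXICON[root]
    if suffix is not None:
        return SUFFIX_RULES[suffix]
    return "UNK"  # no rule fired


def tag_sentence(tokens):
    return [(tok, tag_word(tok)) for tok in tokens]


if __name__ == "__main__":
    print(tag_sentence(["lairikgi", "chatli", "yumda", "xyz"]))
```

In a real tagger the rule order matters: here a known root takes priority over the tag that its suffix alone would suggest.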
International Journal of Computer Applications | 2014
Amrita Bhattacharjee; Bipul Syam Purkayastha
The Parikh matrix is a numerical property of a word over an ordered alphabet. It is used to study a word in terms of its subwords. It was introduced by Mateescu et al. in 2000 and has since been studied for various ordered alphabets. In this paper, Parikh matrices over the tertiary alphabet are investigated. An algorithm is developed to display the Parikh matrices of words over the tertiary alphabet, and it proves to be a useful tool for further investigation of these matrices. A set of equations for finding tertiary words from their respective Parikh matrix is introduced, and examples are given. Some larger tertiary words are given with their Parikh matrices as result analysis. A distance, named the stepping distance, is defined on classes of M-ambiguous words over the tertiary ordered alphabet, and words can be compared using it.
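Consider an ordered alphabet a1 < a2 < ... < ak. The Parikh matrix of a word is the (k+1) x (k+1) upper-triangular matrix obtained by multiplying, letter by letter, the elementary matrices I + E(q, q+1) of the letters a_q. For 1 <= i <= j <= k, its entry in row i, column j+1 counts the occurrences of a_i a_(i+1) ... a_j as a scattered subword. The following Python sketch uses this standard product construction for any ordered alphabet. It is not the authors' display algorithm, and the function name `parikh_matrix` is our own. It also shows that "abba" and "baab" share a Parikh matrix, which makes them M-ambiguous.

```python
def parikh_matrix(word, alphabet):
    """Parikh matrix of `word` over the ordered `alphabet` (e.g. "abc" means a < b < c).

    Uses the product M(w) = M(w1) M(w2) ... M(wn), where M(a_q) = I + E(q, q+1).
    Right-multiplying by M(a_q) simply adds column q into column q+1.
    """
    index = {letter: q for q, letter in enumerate(alphabet)}
    size = len(alphabet) + 1
    m = [[1 if r == c else 0 for c in range(size)] for r in range(size)]
    for letter in word:
        q = index[letter]  # raises KeyError for letters outside the alphabet
        for r in range(size):
            m[r][q + 1] += m[r][q]
    return m


if __name__ == "__main__":
    print(parikh_matrix("abc", "abc"))  # [[1, 1, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
    # Two different words with the same matrix are M-ambiguous:
    print(parikh_matrix("abba", "ab") == parikh_matrix("baab", "ab"))  # True
```

To use it for a larger ordered alphabet, such as the tertiary case, pass the corresponding alphabet string in its order.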
Archive | 2014
Amrita Bhattacharjee; Bipul Syam Purkayastha
In this paper, the ratio property of words is investigated. The concepts of the ratio property and the weak ratio property are extended to alphabets of order n. A relationship between the ratio property and M-ambiguity is established. Various lemmas previously proved about the ratio property over the ternary alphabet are investigated for tertiary alphabets. M-ambiguous words are formed by concatenating words that satisfy the ratio property.
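For the binary alphabet a < b, we take the ratio property to mean |w|_a * |v|_b = |w|_b * |v|_a. This is an assumption about the exact formulation. Two identities connect it to concatenation: |wv|_ab = |w|_ab + |v|_ab + |w|_a * |v|_b, and |vw|_ab = |v|_ab + |w|_ab + |v|_a * |w|_b. The condition is therefore exactly what makes wv and vw share a Parikh matrix. The sketch covers only this binary case and does not implement the paper's nth-order or weak-ratio extensions. All helper names are our own.

```python
def count_ab(word, a="a", b="b"):
    """Number of occurrences of the scattered subword `ab` in `word`."""
    seen_a = total = 0
    for ch in word:
        if ch == a:
            seen_a += 1
        elif ch == b:
            total += seen_a  # every earlier `a` pairs with this `b`
    return total


def binary_parikh(word):
    """3 x 3 Parikh matrix over the ordered alphabet a < b."""
    return [[1, word.count("a"), count_ab(word)],
            [0, 1, word.count("b")],
            [0, 0, 1]]


def ratio_property(w, v):
    """Binary ratio property, taken here as |w|_a * |v|_b == |w|_b * |v|_a."""
    return w.count("a") * v.count("b") == w.count("b") * v.count("a")


if __name__ == "__main__":
    w, v = "ab", "aabb"
    print(ratio_property(w, v))                          # True
    print(binary_parikh(w + v) == binary_parikh(v + w))  # True: "abaabb" and "aabbab" are M-ambiguous
```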
2015 International Symposium on Advanced Computing and Communication (ISACC) | 2015
Abhijit Paul; Bipul Syam Purkayastha; Sunita Sarkar
Natural Language Processing (NLP) is mainly concerned with developing computational models and tools for aspects of human (natural) language processing. Part of Speech (POS) tagging is a well-studied topic and one of the most fundamental preprocessing steps for any language in NLP. Nepali still lacks significant NLP research effort in India. POS tagging is a necessary component of most NLP applications in Nepali. It analyzes the construction and behavior of the language and can be used to develop automated language-processing tools. The literature survey and related work show that little work has previously been done on POS tagging for Nepali in India, owing to the lack of a comprehensive tagged corpus or correct hand-written rules. In this paper, Hidden Markov Model (HMM) based POS tagging for the Nepali language is discussed. HMM is one of the most widely used statistical models for POS tagging. Apart from contextual information, it requires little knowledge of the language. The tagger has been evaluated using corpora collected from TDIL (Technology Development for Indian Languages) and the BIS tagset of 42 tags, which was designed to meet the morpho-syntactic requirements of the Nepali language. The tagger is implemented in the Python programming language with the NLTK (Natural Language Toolkit) library. It achieves over 96% accuracy on known words, and the handling of unknown words is still under investigation.
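The paper's tagger is built with NLTK. For a library-free illustration of what an HMM tagger does, the sketch below trains bigram transition counts and emission counts from a tagged corpus and decodes with the Viterbi algorithm. Several choices are our own assumptions rather than the authors' TDIL data or configuration: add-one smoothing, the toy training sentences (hypothetical romanized tokens) and the few BIS-style tag names. Unknown words simply receive the smoothed emission probability, which is one naive way of handling them.

```python
import math
from collections import defaultdict


def train_hmm(tagged_sents):
    """Count tag bigrams (transitions) and word/tag pairs (emissions)."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    tags, vocab = set(), set()
    for sent in tagged_sents:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tags.add(tag)
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(tags), vocab


def viterbi(words, model):
    """Most probable tag sequence under the bigram HMM (add-one smoothed, log space)."""
    if not words:
        return []
    trans, emit, tags, vocab = model

    def log_trans(prev, tag):
        row = trans[prev]
        return math.log((row[tag] + 1) / (sum(row.values()) + len(tags)))

    def log_emit(tag, word):
        row = emit[tag]
        # +1 in the denominator reserves probability mass for an unseen word
        return math.log((row[word] + 1) / (sum(row.values()) + len(vocab) + 1))

    # best[i][tag] = (log score of best path ending in tag at position i, backpointer)
    best = [{t: (log_trans("<s>", t) + log_emit(t, words[0]), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev, score = max(
                ((p, best[-1][p][0] + log_trans(p, t)) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + log_emit(t, words[i]), prev)
        best.append(column)

    # Follow backpointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


if __name__ == "__main__":
    train = [  # hypothetical toy corpus
        [("ma", "PR_PRP"), ("bhat", "N_NN"), ("khanchhu", "V_VM")],
        [("u", "PR_PRP"), ("pani", "N_NN"), ("khanchha", "V_VM")],
        [("ma", "PR_PRP"), ("pani", "N_NN"), ("piunchhu", "V_VM")],
    ]
    model = train_hmm(train)
    print(viterbi(["u", "bhat", "khanchhu"], model))  # ['PR_PRP', 'N_NN', 'V_VM']
```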
Archive | 2018
Md. Saiful Islam; Bipul Syam Purkayastha
Although Bodo is one of the scheduled languages of India and an official language of Assam, little significant research has yet been reported on machine translation for Bodo. The primary objective of this paper is to develop an English-to-Bodo phrase-based Statistical Machine Translation (SMT) system using Moses and tourism-domain English-Bodo parallel corpora. The proposed system achieves a BLEU score of 65.09.
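BLEU, the metric behind the reported 65.09, combines clipped n-gram precisions (usually n = 1 to 4) through a geometric mean and multiplies the result by a brevity penalty. Papers conventionally report it scaled to 0-100. The sketch below is a plain corpus-level version with one reference per sentence and no smoothing, written only for illustration. It is not necessarily the scorer the authors used, so small differences from standard tools should be expected.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Corpus BLEU in [0, 1] with a single reference per candidate sentence."""
    clipped = [0] * max_n  # matched n-grams, clipped by reference counts
    total = [0] * max_n    # n-grams in the candidates
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            c, r = ngrams(cand, n), ngrams(ref, n)
            clipped[n - 1] += sum(min(cnt, r[g]) for g, cnt in c.items())
            total[n - 1] += max(len(cand) - n + 1, 0)
    if min(clipped) == 0:
        return 0.0  # geometric mean is zero without smoothing
    log_precision = sum(math.log(clipped[i] / total[i]) for i in range(max_n)) / max_n
    brevity = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity * math.exp(log_precision)


if __name__ == "__main__":
    hyp = [["the", "cat", "sat", "on", "mat"]]
    ref = [["the", "cat", "sat", "on", "the", "mat"]]
    print(round(100 * corpus_bleu(hyp, ref), 2))  # reported on the 0-100 scale
```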
Archive | 2015
Amrita Bhattacharjee; Bipul Syam Purkayastha
In this paper, Parikh matrices over the ternary alphabet are investigated. An algorithm is developed to display the Parikh matrices of words over the ternary alphabet, and a set of equations for finding ternary words from their respective Parikh matrix is discussed. A theorem on the relations among the entries of the 4 × 4 Parikh matrices is proved, and some related results are also discussed. The significance of the graphical representation of binary amiable words is explained, and this notion is extended to ternary amiable words.
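For reference, over the ternary ordered alphabet a < b < c the 4 × 4 Parikh matrix of a word w has the standard form below, where |w|_u counts the occurrences of u as a scattered subword of w. The symbol Ψ is our notation, and the worked case w = acb is our own example rather than one from the paper.

```latex
\Psi(w) =
\begin{pmatrix}
1 & |w|_a & |w|_{ab} & |w|_{abc} \\
0 & 1     & |w|_b    & |w|_{bc}  \\
0 & 0     & 1        & |w|_c     \\
0 & 0     & 0        & 1
\end{pmatrix},
\qquad
\Psi(acb) =
\begin{pmatrix}
1 & 1 & 1 & 0 \\
0 & 1 & 1 & 0 \\
0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1
\end{pmatrix}
```

In acb the only b comes after the c, so |acb|_bc = |acb|_abc = 0, while |acb|_ab = 1.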
2015 International Symposium on Advanced Computing and Communication (ISACC) | 2015
S. Poireiton Meitei; Bipul Syam Purkayastha; H. Mamata Devi
Stemming is the process of removing affixes from inflected words without performing a complete morphological analysis. A stemming algorithm reduces all inflected words that share a stem to a common form. It is useful in many areas of computational linguistics and information retrieval. Search engines, for example, use it to match different inflected forms of a query term. The stemming algorithm is the basic building block of a stemmer, and stemmers are mainly used in information retrieval systems to improve performance. This paper presents a stemmer for Manipuri that uses a brute-force algorithm together with a suffix-stripping technique. The stemmer can serve as an important tool in information retrieval systems for the Manipuri language.
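A brute-force stemmer typically looks each inflected form up in a precomputed table and falls back on rules when the form is missing. The sketch below combines the two steps in that order. The table entries and the suffix list are hypothetical romanized placeholders, not the paper's Manipuri resources. The minimum stem length of 2 is an arbitrary assumption that prevents short words from being stripped away entirely.

```python
# Hypothetical lookup table for the brute-force step: inflected form -> stem.
LOOKUP = {"lairikki": "lairik"}

# Hypothetical suffix list for the stripping fallback, longest first.
SUFFIXES = sorted(["sing", "gi", "da", "na"], key=len, reverse=True)

MIN_STEM = 2  # assumed minimum length of a stem


def stem(word):
    """Brute-force lookup first; otherwise strip known suffixes repeatedly."""
    if word in LOOKUP:
        return LOOKUP[word]
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
                word = word[: -len(suffix)]
                stripped = True
                break  # restart with the longest suffixes on the shorter word
    return word


if __name__ == "__main__":
    for w in ["lairikki", "yumda", "lairiksingda", "da"]:
        print(w, "->", stem(w))
```

Stripping repeatedly lets stacked suffixes (here "sing" followed by "da") be removed one layer at a time.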
International Conference on Telecommunications | 2010
Subrata Sinha; Smriti Kumar Sinha; Bipul Syam Purkayastha
This paper presents and discusses the issue of synchronizing the authorization flow with the work object flow in a document production workflow environment. We show how a work object flow is synchronized with the authorization flow using a central arbiter in the Web service paradigm. The Web services are coordinated with WS-BPEL, which supports orchestration, while XACML provides authorization for them. Synchronization is achieved by exploiting the obligation provisions of XACML.
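The arbiter idea can be illustrated in a few lines. The work object moves to its next workflow step only when an authorization decision is Permit and the obligation attached to that decision is discharged, so the two flows cannot drift apart. Everything here is a hypothetical stand-in: the toy policy table, the role and step names, and the obligation identifier `advance_work_object`. It is neither a real XACML engine nor a WS-BPEL process, only the control logic in miniature.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    effect: str                                       # "Permit" or "Deny"
    obligations: list = field(default_factory=list)  # (obligation_id, argument) pairs


# Hypothetical policy: which role may act on which workflow step.
POLICY = {"author": "draft", "reviewer": "review", "approver": "approve"}


def evaluate(role, step):
    """Toy policy decision point standing in for an XACML PDP."""
    if POLICY.get(role) == step:
        # On Permit, attach an obligation telling the arbiter to advance the work object.
        return Decision("Permit", [("advance_work_object", step)])
    return Decision("Deny")


class Arbiter:
    """Central arbiter: the work object advances only when an obligation says so."""

    STEPS = ["draft", "review", "approve", "done"]

    def __init__(self):
        self.position = 0  # index of the work object's current step

    @property
    def step(self):
        return self.STEPS[self.position]

    def request(self, role):
        decision = evaluate(role, self.step)
        if decision.effect != "Permit":
            return False  # authorization flow blocks; the work object stays put
        for obligation_id, step in decision.obligations:
            if obligation_id == "advance_work_object" and step == self.step:
                self.position += 1  # discharge the obligation: move the work object
        return True


if __name__ == "__main__":
    arbiter = Arbiter()
    for role in ["reviewer", "author", "reviewer", "approver"]:
        print(role, arbiter.request(role), arbiter.step)
```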
Archive | 2018
Abhijit Paul; Bipul Syam Purkayastha
Machine Translation (MT), perhaps one of the earliest NLP applications, is the task of translating a sentence in one human language into another using a computer or other machine. The aim of this paper is to develop an MT system for Nepali that translates an English sentence into its most probable Nepali sentence using the Statistical Machine Translation (SMT) approach. The system is implemented with three tools: Moses for decoding, GIZA++ for generating the translation model and IRSTLM for estimating target language model probabilities. An English-Nepali parallel corpus is used for training and an English raw corpus for testing, and both corpora were collected from TDIL (Technology Development for Indian Languages). The system has been evaluated manually on two parameters, fluency and adequacy. It achieves an average score of 2.7 on a 4-level scale, i.e., approximately 68%. The handling of out-of-vocabulary (OoV) words is still under investigation. A brief comparison with an existing English-Nepali MT system is also made.
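GIZA++ learns word alignments with the IBM alignment models (and an HMM alignment model), and the phrase translation model is then extracted from those alignments. The sketch below shows only the simplest of these models, IBM Model 1 trained by expectation-maximization. It omits the NULL word and uses a classic three-sentence toy corpus in place of the TDIL data. The aim is to show what the translation-model step involves, not to reproduce GIZA++.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=20):
    """Estimate t[src_word][tgt_word] = P(tgt_word | src_word) with EM (IBM Model 1, no NULL)."""
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: defaultdict(lambda: uniform))  # uniform start
    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        # E-step: spread each target word over the source words that could generate it.
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[e][f] for e in src)
                for e in src:
                    share = t[e][f] / norm
                    count[e][f] += share
                    total[e] += share
        # M-step: renormalise the expected counts per source word.
        t = defaultdict(lambda: defaultdict(float))
        for e, row in count.items():
            for f, c in row.items():
                t[e][f] = c / total[e]
    return t


if __name__ == "__main__":
    corpus = [
        (["the", "house"], ["das", "Haus"]),
        (["the", "book"], ["das", "Buch"]),
        (["a", "book"], ["ein", "Buch"]),
    ]
    t = ibm_model1(corpus)
    for e in ["the", "house", "book", "a"]:
        best = max(t[e], key=t[e].get)
        print(e, "->", best, round(t[e][best], 3))
```

Even though no sentence pair is aligned by hand, co-occurrence across the three pairs is enough for EM to pull each source word toward its consistent translation.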
International Conference on Computing, Communication and Automation | 2015
Sunita Sarkar; Abhijit Paul; Arindam Roy; Bipul Syam Purkayastha
WordNet is a key resource that aids several NLP tasks, and it serves as the sense inventory for sense tagging a corpus. Sense tagging is the task of tagging each word in a sentence with its correct sense in the given context. Sense tagging also helps validate WordNet and improve its quality. It is one of the toughest annotation tasks. This paper discusses the sense tagging tool, the procedures involved in sense tagging the Nepali corpus, and the challenges involved. Nepali WordNet is used as the sense inventory for sense tagging the Nepali corpus. Accurately sense tagging large volumes of data requires a standard, definitive lexicon. The Nepali corpus used in this work is drawn from the newspaper domain.
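A sense tagging tool pairs each corpus word with the candidate senses the inventory lists for it and records a choice. Words with no entry expose gaps in the WordNet, which is one way tagging feeds back into validation. The sketch below models that loop with a tiny hypothetical English inventory. It proposes a default sense by simplified Lesk gloss overlap, a technique we substitute for illustration. In the paper's setting, a human annotator makes the final choice and Nepali WordNet supplies the senses.

```python
# Hypothetical toy sense inventory: lemma -> list of (sense_id, gloss).
INVENTORY = {
    "bank": [
        ("bank#1", "sloping land beside a river"),
        ("bank#2", "financial institution that accepts money deposits"),
    ],
    "money": [("money#1", "medium of exchange")],
}


def suggest_sense(word, context, inventory):
    """Simplified Lesk: pick the sense whose gloss shares most words with the context."""
    senses = inventory[word]
    context_words = set(context)
    return max(senses, key=lambda s: len(context_words & set(s[1].split())))[0]


def sense_tag(tokens, inventory):
    """Tag content-word tokens with a suggested sense; collect words missing from the inventory."""
    tagged, missing = [], []
    for i, word in enumerate(tokens):
        if word not in inventory:
            tagged.append((word, None))
            missing.append(word)  # candidate WordNet gap for review
            continue
        context = tokens[:i] + tokens[i + 1:]
        tagged.append((word, suggest_sense(word, context, inventory)))
    return tagged, missing


if __name__ == "__main__":
    print(sense_tag(["deposited", "money", "bank"], INVENTORY))
```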