Diganta Saha | Researchain

Publication

Featured research published by Diganta Saha.

arXiv: Computation and Language | 2015

WORD SENSE DISAMBIGUATION: A SURVEY

Alok Ranjan Pal; Diganta Saha

This paper surveys Word Sense Disambiguation (WSD). Research in WSD has been conducted, to varying extents, in nearly all major languages of the world. We survey the approaches adopted in different research works, the state of the art in performance in this domain, recent work in different Indian languages, and finally work in the Bengali language. We also survey the competitions in this field and the benchmark results obtained from them.

international conference on computational linguistics | 2005

Unsupervised text classification using Kohonen's self organizing network

Nirmalya Chowdhury; Diganta Saha

A text classification method using Kohonen's Self Organizing Network is presented here. The proposed method can classify a set of text documents into a number of classes depending on their contents, where the number of such classes is not known a priori. Text documents from various faculties of games are considered for experimentation. The method is found to provide satisfactory results for large data sets.

international conference on emerging applications of information technology | 2011

Web Text Classification Using a Neural Network

Diganta Saha

A web text classification method using a neural network is presented here. The proposed method can classify a set of English text documents into a number of given classes depending on their contents, where the number of such classes is not known a priori. Text documents from internet editions of newspapers, covering various faculties of games and sports, are considered for experimentation. The method is found to provide satisfactory results for the three sets of data considered.

ieee international conference on recent trends in information systems | 2015

Word sense disambiguation in Bengali: A lemmatized system increases the accuracy of the result

Alok Ranjan Pal; Diganta Saha; Sudip Kumar Naskar; Niladri Sekhar Dash

In the proposed approach, an attempt was made to disambiguate ambiguous Bengali words using the Naïve Bayes classification algorithm. The task was divided into two modules, each executing a specific task. In the first module, the algorithm was applied to regular text collected from the Bengali text corpus developed in the TDIL project of the Govt. of India, and the disambiguation accuracy obtained was around 80%. In the second module, the whole training data and the test data were lemmatized and, applying the same algorithm, an accuracy of around 85% was obtained. The output was verified against a previously tagged output file generated with the help of a Bengali lexical dictionary. The implicational relevance of this study was attested in automatic text classification, machine learning, information extraction, and word sense disambiguation.
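The two-module idea can be illustrated with a minimal sketch. It is not the authors' implementation: the English toy sentences, the sense labels, the add-one (Laplace) smoothing, and the `LEMMAS` lookup table standing in for a real Bengali lemmatizer are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical lemma table; a real system would use a Bengali lemmatizer.
LEMMAS = {"waters": "water", "deposits": "deposit", "loans": "loan"}

def lemmatize(words):
    """Map each surface form to its lemma, leaving unknown words unchanged."""
    return [LEMMAS.get(w, w) for w in words]

def train(examples):
    """Collect per-sense sentence counts and word counts from (words, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        for w in words:
            word_counts[sense][w] += 1
            vocab.add(w)
    return sense_counts, word_counts, vocab

def classify(words, model):
    """Return the sense with the highest Naive Bayes log-probability."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense in sorted(sense_counts):  # sorted for deterministic tie-breaking
        score = math.log(sense_counts[sense] / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in words:
            # Add-one smoothing (an assumption) keeps unseen words from zeroing the score.
            score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Toy training data for the ambiguous word "bank" (illustrative only).
TRAIN = [
    (["deposit", "money", "account"], "finance"),
    (["loan", "interest", "money"], "finance"),
    (["river", "water", "fishing"], "river"),
    (["river", "mud", "water"], "river"),
]
model = train([(lemmatize(words), sense) for words, sense in TRAIN])
```

In this toy setting the inflected form "waters" never occurs in the training data, so it carries no evidence until `lemmatize` maps it to "water". That is the kind of effect lemmatizing both the training and test data is meant to capture.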

International Journal of Artificial Intelligence & Applications | 2013

Detection of Slang Words in e-Data using semi-Supervised Learning

Alok Ranjan Pal; Diganta Saha

The proposed algorithmic approach deals with finding the sense of a word in electronic data. Nowadays, in communication media such as the internet and mobile services, people use some words that are slang in nature. This approach detects those abusive words using a supervised learning procedure. In real-life scenarios, however, slang words are not always used in their complete word forms. Most of the time they appear in abbreviated forms such as sound-alike forms, taboo morphemes, etc. The proposed approach can also detect those abbreviated forms using a semi-supervised learning procedure. Using the synset and concept analysis of the text, the probability that a suspicious word is a slang word is also evaluated.

Journal of Experimental and Theoretical Artificial Intelligence | 2018

Preorder-based triangle: a modified version of bilattice-based triangle for belief revision in nonmonotonic reasoning

Kumar S. Ray; Sandip Paul; Diganta Saha

Bilattice-based triangle provides an elegant algebraic structure for reasoning with vague and uncertain information. But the truth and knowledge ordering of intervals in bilattice-based triangle cannot handle repetitive belief revisions, which is an essential characteristic of nonmonotonic reasoning. Moreover, the ordering induced over the intervals by the bilattice-based triangle is sometimes not intuitive. In this work, we construct an alternative algebraic structure, namely the preorder-based triangle, and we formulate proper logical connectives for it. It is also demonstrated that the preorder-based triangle serves as a better alternative to the bilattice-based triangle for reasoning in application areas that involve nonmonotonic fuzzy reasoning with uncertain information.

ieee region humanitarian technology conference | 2017

Improvement of electronic governance and mobile governance in multilingual countries with digital etymology using Sanskrit grammar

Arijit Das; Diganta Saha

With the huge improvement in digital connectivity (Wifi, 3G, 4G) and digital devices, internet access has now reached the remotest corners. Rural people can easily access the web or apps from PDAs, laptops, smartphones, etc. This is an opportunity for the Government to reach citizens in large numbers, get their feedback, and involve them in policy decisions through e-governance without deploying huge manpower, material, or resources. But the governments of multilingual countries face many problems in successfully implementing Government to Citizen (G2C) and Citizen to Government (C2G) governance, as rural people tend and prefer to interact in their native languages. Presenting an equal experience over the web or an app to speakers of different language groups is a real challenge. In this research we have sorted out the problems faced by Indo-Aryan-speaking netizens, which in general also apply to any language family group or subgroup. We have then tried to give probable solutions using etymology, which is used to correlate words through their root forms. In the 5th century BC Panini wrote the Astadhyayi, in which he set out sutras, or rules, describing how a word changes according to person, tense, gender, number, etc. This book was later followed in Western countries as well to derive the grammars of comparatively new languages. We have trained our system for automatic root extraction from the surface-level or morphed forms of words using Paninian grammatical rules. We have tested our system on 10000 Bengali verbs and extracted the root form with 98% accuracy. We are now working to extend the program to successfully lemmatize any word of any language and correlate them by applying those rule sets in an Artificial Neural Network.

2017 Second International Conference on Electrical, Computer and Communication Technologies (ICECCT) | 2017

Word sense disambiguation in Bengali: A knowledge based approach using Bengali WordNet

Alok Ranjan Pal; Diganta Saha; Sudip Kumar Naskar

In this paper, a knowledge based approach for Word Sense Disambiguation (WSD) in the Bengali language is presented. The Bengali WordNet, developed at ISI Kolkata, is used as the knowledge base, and the input data set is prepared from the Bengali Text Corpus developed in the TDIL (Technology Development for Indian Language) project of the Government of India. The proposed approach resolves the exact sense of an ambiguous Bengali word based on the maximum overlap among the dictionary definitions of the ambiguous word, its collocating words in the sentence, and the synonymous words of these collocating words. The algorithm is tested on the 9 (nine) most used ambiguous Bengali words. The output achieves an accuracy of 75%, as verified by an expert. The challenges and pitfalls of this approach are discussed in detail in the report.
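The maximum-overlap idea can be sketched as follows. This is a simplified illustration, not the published algorithm: the English glosses in `GLOSSES`, the `SYNONYMS` table standing in for WordNet synsets, and the plain word-count overlap score are assumptions.

```python
def disambiguate(context_words, sense_glosses, synonyms):
    """Pick the sense whose gloss shares the most words with the expanded context."""
    # Expand the collocating words with their synonyms.
    expanded = set(context_words)
    for w in context_words:
        expanded.update(synonyms.get(w, ()))
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        overlap = len(expanded & set(gloss.split()))
        if overlap > best_overlap:  # ties keep the first-listed sense
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical glosses for the ambiguous word "bank".
GLOSSES = {
    "bank/finance": "institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a river or stream",
}
# Hypothetical synonym table standing in for WordNet synsets.
SYNONYMS = {
    "cash": ["money", "currency"],
    "brook": ["stream", "creek"],
}
```

Here neither "cash" nor "brook" appears in any gloss, so the decision in both cases comes from the synonym expansion: "money" matches the finance gloss and "stream" matches the river gloss.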

2017 Second International Conference on Electrical, Computer and Communication Technologies (ICECCT) | 2017

Word sense disambiguation in Bengali: An unsupervised approach

Alok Ranjan Pal; Diganta Saha

In the proposed approach, Word Sense Disambiguation (WSD) in the Bengali language has been done using an unsupervised methodology. The work consists of two sequential sub-tasks. The first is grouping Bengali sentences into a certain number of clusters, where a particular cluster contains sentences of similar meaning. The second is labeling the clusters with their inner meanings with the help of a linguistic expert, so that these sense-tagged clusters can be used as a knowledge reference for the WSD task. Clustering has been performed using the weka-3-6-13 tool. The test sentences are collected from the Bengali text corpus developed in the TDIL (Technology Development for Indian Language) project of the Govt. of India. Type-based and Token-based distributional approaches have been developed for Bengali sentence clustering. In the Type-based method, a feature vector of the words co-occurring with a target word in a sentence is considered; in the Token-based method, synsets of the collocating words are also considered. The synsets of the collocating words are retrieved from the Bengali WordNet, developed at ISI, Kolkata. The baseline result, the achieved result, and the pitfalls of the procedure are discussed in detail in the report.

Archive | 2016

Word Sense Disambiguation in Bengali: An Auto-updated Learning Set Increases the Accuracy of the Result

Alok Ranjan Pal; Diganta Saha

This work is implemented using the Naive Bayes probabilistic model. The whole task is implemented in two phases. First, the algorithm was tested on a dataset from the Bengali corpus, which was developed in the TDIL (Technology Development for the Indian Languages) project of the Govt. of India. In the first execution of the algorithm, the accuracy of the result was nearly 80%. In addition to the disambiguation task, the sense-evaluated sentences were inserted into the related learning sets to take part in the next executions. In the second phase, after a small manipulation of the learning sets, a new input data set was tested using the same algorithm, and in this second execution the algorithm produced a better result, around 83%. The results were verified with the help of a standard Bengali dictionary.
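A minimal sketch of the auto-update loop, under stated assumptions: the English toy sentences, the add-one smoothing, and the function names `predict` and `run_and_update` are illustrative and not taken from the paper. The "small manipulation" of the learning sets is not specified in the abstract and is not modeled here.

```python
import math
from collections import Counter

def predict(words, learning_sets):
    """Naive Bayes with add-one smoothing over per-sense learning sets."""
    vocab = {w for sents in learning_sets.values() for s in sents for w in s}
    total = sum(len(sents) for sents in learning_sets.values())
    best, best_score = None, float("-inf")
    for sense in sorted(learning_sets):  # sorted for deterministic tie-breaking
        sents = learning_sets[sense]
        counts = Counter(w for s in sents for w in s)
        denom = sum(counts.values()) + len(vocab)
        score = math.log(len(sents) / total)
        score += sum(math.log((counts[w] + 1) / denom) for w in words)
        if score > best_score:
            best, best_score = sense, score
    return best

def run_and_update(test_sentences, learning_sets):
    """Label each sentence, then insert it into the learning set of its predicted sense."""
    labels = []
    for words in test_sentences:
        sense = predict(words, learning_sets)
        labels.append(sense)
        # The sense-evaluated sentence takes part in all later predictions.
        learning_sets[sense].append(list(words))
    return labels

# Toy learning sets for the ambiguous word "bank" (illustrative only).
learning_sets = {
    "finance": [["money", "loan"]],
    "river": [["water", "mud"]],
}
first_run = run_and_update([["water", "boat"]], learning_sets)
```

After the first run, "boat" has entered the river learning set, so a later sentence such as ["boat", "trip"] gets evidence it would not have had initially. The same mechanism can also propagate a wrong label, which is one reason the results need external verification.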
