Publication


Featured research published by Somnath Banerjee.


Text, Speech and Dialogue | 2014

Bengali Named Entity Recognition Using Margin Infused Relaxed Algorithm

Somnath Banerjee; Sudip Kumar Naskar; Sivaji Bandyopadhyay

This work describes the automatic recognition of named entities using both language-independent and language-dependent features. The Margin Infused Relaxed Algorithm (MIRA) is applied, for the first time, to learning named entities in Bengali. We used openly available annotated corpora with the twelve-tag tagset defined in the IJCNLP-08 NERSSEAL shared task and obtained a precision of 91.23%, a recall of 87.29% and an F-measure of 89.69%. The proposed system outperforms the existing models by a satisfactory margin.
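The abstract names MIRA but does not give its update rule. As a rough illustration only, the sketch below implements a single-constraint (1-best) MIRA update for multiclass token classification over sparse binary features: when the prediction is wrong, the weights move just far enough (capped at `C`) for the gold label to beat the predicted label by the loss. The feature strings, the tag labels, the cap `C` and the 0/1 loss are all assumptions; this is not the authors' feature set or training setup.

```python
from collections import defaultdict


class MiraClassifier:
    """Multiclass 1-best MIRA over sparse binary features (illustrative sketch)."""

    def __init__(self, labels, C=1.0):
        self.labels = list(labels)
        self.C = C  # cap on the step size tau (assumed value)
        # One sparse weight vector per label: weights[label][feature] -> float
        self.weights = {y: defaultdict(float) for y in self.labels}

    def score(self, features, label):
        w = self.weights[label]
        return sum(w.get(f, 0.0) for f in features)

    def predict(self, features):
        # Ties go to the label listed first.
        return max(self.labels, key=lambda y: self.score(features, y))

    def update(self, features, gold):
        features = set(features)
        pred = self.predict(features)
        if pred == gold or not features:
            return
        loss = 1.0  # 0/1 loss (assumption)
        margin = self.score(features, gold) - self.score(features, pred)
        # ||f(x, gold) - f(x, pred)||^2 = 2 * number of active binary features
        tau = min(self.C, (loss - margin) / (2.0 * len(features)))
        for f in features:
            self.weights[gold][f] += tau
            self.weights[pred][f] -= tau

    def train(self, data, epochs=5):
        for _ in range(epochs):
            for features, gold in data:
                self.update(features, gold)


# Toy training data with hypothetical features (not the paper's feature set).
data = [
    ({"w=kolkata", "suf=ta", "prev=in"}, "LOC"),
    ({"w=rabindranath", "suf=th", "next=tagore"}, "PER"),
    ({"w=delhi", "suf=hi", "prev=in"}, "LOC"),
    ({"w=the", "suf=he"}, "O"),
]
clf = MiraClassifier(["O", "PER", "LOC"])
clf.train(data)
print(clf.predict({"w=mumbai", "suf=ai", "prev=in"}))  # → LOC
```

The unseen word is tagged LOC only through the shared context feature `prev=in`, which is the kind of generalization a feature-based learner such as MIRA relies on.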


International Workshop of the Initiative for the Evaluation of XML Retrieval | 2011

A Hybrid QA System with Focused IR and Automatic Summarization for INEX 2011

Pinaki Bhaskar; Somnath Banerjee; Snehasis Neogi; Sivaji Bandyopadhyay

This article presents the experiments carried out as part of our participation in the QA track of INEX 2011, for which we submitted two runs. The INEX QA task has two main subtasks: Focused IR and Automatic Summarization. In the Focused IR system, we first preprocess the Wikipedia documents and then index them using Nutch. Stop words are removed from each query tweet, and the remaining tweet words are stemmed using the Porter stemmer. The stemmed tweet words form the query used to retrieve the most relevant document from the index. The Automatic Summarization system takes as input the query tweet, along with the tweet's text and the title of the most relevant document. The most relevant sentences are retrieved from that document based on the TF-IDF of the matching query-tweet, tweet-text and title words. Each retrieved sentence is assigned a ranking score, and the answer passage consists of the top-ranked sentences up to a limit of 500 words. The two runs differ in how the relevant sentences are retrieved from the associated document. Our first run achieved the highest score among all participants, 432.2, on the Relaxed metric of the Readability evaluation.
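The sentence-selection step described above can be sketched roughly as follows. This is a minimal, standard-library illustration assuming simple regex tokenization, a toy stop-word list, IDF computed over the sentences of the retrieved document, and no stemming (the paper uses the Porter stemmer, omitted here). The function names and the exact scoring formula are hypothetical, not the authors' implementation.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "the", "of", "in", "is", "and", "to", "for", "on"}  # toy list (assumption)


def tokenize(text):
    """Lowercase word tokens with stop words removed (no stemming in this sketch)."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def rank_sentences(sentences, query_text):
    """Score each sentence by the summed TF-IDF of the query words it contains."""
    query = set(tokenize(query_text))
    bags = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    # Document frequency counted over the sentences of the retrieved document.
    df = Counter(term for bag in bags for term in bag)
    scored = []
    for idx, bag in enumerate(bags):
        score = sum(bag[t] * math.log(n / df[t]) for t in query if t in bag)
        scored.append((score, idx))
    # Highest score first; ties keep the original sentence order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


def build_answer(sentences, query_text, word_limit=500):
    """Concatenate top-ranked sentences without exceeding word_limit words."""
    chosen, used = [], 0
    for score, idx in rank_sentences(sentences, query_text):
        if score <= 0:
            break  # remaining sentences share no informative query word
        length = len(sentences[idx].split())
        if used + length > word_limit:
            continue  # skip sentences that would overflow the limit
        chosen.append(idx)
        used += length
    # Present the selected sentences in document order.
    return " ".join(sentences[i] for i in sorted(chosen))


sentences = [
    "Nutch is an open source web crawler.",
    "INEX 2011 ran a question answering track.",
    "The QA track used Wikipedia documents and tweets.",
]
print(build_answer(sentences, "INEX QA track", word_limit=10))
# → INEX 2011 ran a question answering track.
```

With a 10-word limit the second-ranked sentence is skipped because it would push the passage past the cap; the paper's 500-word limit plays the same role at a larger scale.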


Proceedings of the 7th Workshop on Asian Language Resources | 2009

Bengali Verb Subcategorization Frame Acquisition - A Baseline Model

Somnath Banerjee; Dipankar Das; Sivaji Bandyopadhyay

Acquiring verb subcategorization frames is important because verbs, unlike other parts of speech, generally take different types of relevant arguments associated with the phrases in a sentence. This paper presents the acquisition of different subcategorization frames for the Bengali verb Kara (do), which forms compound verbs in Bengali when combined with various noun phrases. The main hypothesis is that the subcategorization frames of a Bengali verb are the same as those of its equivalent English verb with an identical sense tag. Syntax plays the main role in acquiring Bengali verb subcategorization frames. The output frames of the Bengali verbs were compared with the frames of the equivalent English verbs, identified using a Bengali-English bilingual lexicon. The flexible ordering of phrases and the optional attachment of additional phrases in Bengali sentences make this frame acquisition task challenging. The system achieved precision and recall of 77.11% and 88.23%, respectively, on a test set of 100 sentences.
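The comparison step above (checking acquired Bengali frames against the frames of the English equivalent found through a bilingual lexicon) might be scored along the lines of the sketch below. The lexicon entries, the frame labels and the micro-averaged precision/recall formulation are hypothetical illustrations; the paper's actual frame inventory and its sentence-level scoring are not given in the abstract.

```python
def evaluate_frames(acquired, lexicon, english_frames):
    """Micro-averaged precision/recall of acquired Bengali frames (sketch).

    acquired:       Bengali verb -> set of frames acquired by the system
    lexicon:        Bengali verb -> equivalent English verb (same sense tag)
    english_frames: English verb -> set of reference frames
    """
    true_pos = predicted = gold = 0
    for verb, frames in acquired.items():
        # Under the paper's hypothesis, the English equivalent's frames act as the reference.
        reference = english_frames.get(lexicon.get(verb), set())
        true_pos += len(frames & reference)
        predicted += len(frames)
        gold += len(reference)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / gold if gold else 0.0
    return precision, recall


# Hypothetical compound verbs built with kara and hypothetical frame labels.
lexicon = {"kaj kara": "work", "snan kara": "bathe"}
english_frames = {"work": {"NP", "NP-PP"}, "bathe": {"NP"}}
acquired = {"kaj kara": {"NP", "NP-PP", "NP-S"}, "snan kara": {"NP"}}
p, r = evaluate_frames(acquired, lexicon, english_frames)
print(p, r)  # → 0.75 1.0
```

The spurious `NP-S` frame lowers precision while recall stays perfect, mirroring the precision-below-recall pattern reported in the abstract, though the numbers here are purely illustrative.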


Forum for Information Retrieval Evaluation | 2014

A Hybrid Approach for Transliterated Word-Level Language Identification: CRF with Post-Processing Heuristics

Somnath Banerjee; Alapan Kuila; Aniruddha Roy; Sudip Kumar Naskar; Paolo Rosso; Sivaji Bandyopadhyay

In this paper, we describe a hybrid approach to word-level language (WLL) identification of Bangla words written in Roman script and mixed with English words, developed for our participation in the shared task on transliterated search at the Forum for Information Retrieval Evaluation (FIRE) 2014. A CRF-based machine learning model and post-processing heuristics are employed for the WLL identification task. In addition, two transliteration systems were built to convert the detected Romanized Bangla words into native Bangla script. The system achieved an overall token-level language identification accuracy of 0.905, with token-level F-scores of 0.899 for Bangla and 0.920 for English. The two transliteration systems achieved accuracies of 0.062 and 0.037. The word-level language identification system presented in this paper obtained the best scores on almost all metrics among the systems participating for the Bangla-English language pair.
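The abstract does not list the CRF features. The sketch below shows one common way to turn a code-mixed sentence into per-token feature dictionaries for a linear-chain CRF: character affixes, membership in a small English word list, and neighbouring words. Every feature and the `ENGLISH_WORDS` list are assumptions for illustration; CRF training itself (for example with a toolkit such as CRFsuite) and the paper's post-processing heuristics are not shown.

```python
ENGLISH_WORDS = {"i", "am", "going", "to", "the", "market", "today"}  # toy lexicon (assumption)


def token_features(tokens, i):
    """Hypothetical feature dict for token i of a Romanized Bangla-English sentence."""
    word = tokens[i]
    lower = word.lower()
    features = {
        "bias": 1.0,
        "word.lower": lower,
        "prefix3": lower[:3],
        "suffix3": lower[-3:],
        "suffix2": lower[-2:],
        "is_digit": word.isdigit(),
        "in_english_lexicon": lower in ENGLISH_WORDS,
    }
    # Context features let the CRF use neighbouring words.
    if i > 0:
        features["prev.lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        features["next.lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True  # end of sentence
    return features


def sentence_features(tokens):
    """One feature dict per token, the usual input shape for a linear-chain CRF."""
    return [token_features(tokens, i) for i in range(len(tokens))]


features = sentence_features("ami market e jacchi today".split())
print(features[1]["in_english_lexicon"], features[3]["suffix3"])  # → True chi
```

A sequence model is a natural fit here because the language of a token in code-mixed text is strongly correlated with the language of its neighbours.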


Forum for Information Retrieval Evaluation | 2016

Overview of the Mixed Script Information Retrieval (MSIR) at FIRE-2016

Somnath Banerjee; Kunal Chakma; Sudip Kumar Naskar; Amitava Das; Paolo Rosso; Sivaji Bandyopadhyay; Monojit Choudhury

The shared task on Mixed Script Information Retrieval (MSIR) was organized for the fourth year at FIRE-2016. The track had two subtasks. Subtask-1 addressed question classification, where the questions were in code-mixed Bengali-English with Bengali written in transliterated Roman script. Subtask-2 addressed ad-hoc retrieval of Hindi film song lyrics, movie reviews and astrology documents, where both the queries and the documents were in Hindi, written either in Devanagari script or in Roman transliterated form. A total of 33 runs were submitted by 9 participating teams: 20 runs by 7 teams for Subtask-1 and 13 runs by 7 teams for Subtask-2. This overview presents a comprehensive report on the subtasks, the datasets and the performance of the submitted runs.


IEEE International Conference on Recent Trends in Information Systems | 2015

Text normalization in code-mixed social media text

Sukanya Dutta; Tista Saha; Somnath Banerjee; Sudip Kumar Naskar

This paper addresses text normalization, an often overlooked problem in natural language processing, in code-mixed social media text. The objective is to correct English spelling errors in code-mixed social media text that contains English words as well as Romanized transliterations of words from another language, in this case Bangla. This in turn requires solving a second problem: word-level language identification in code-mixed social media text. We employ a CRF-based machine learning approach followed by post-processing heuristics for word-level language identification. For spelling correction, we use the noisy channel model; in addition, the spell checker handles wordplay, contracted words and phonetic variations. Overall, word-level language identification achieved 90.5% accuracy, and the spell checker achieved 69.43% accuracy on the detected English words.
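The noisy channel model picks the correction c that maximizes P(c) · P(w | c) for an observed word w. The minimal sketch below assumes a toy unigram language model for P(c) and a crude error model in which P(w | c) decays by a hypothetical factor `EDIT_PROB` per edit (deletion, transposition, substitution or insertion, up to two edits), in the style of Peter Norvig's well-known spelling corrector. The paper's actual error model and its handling of wordplay, contractions and phonetic variants are not reproduced here.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"
EDIT_PROB = 0.01  # assumed channel probability factor per edit (toy error model)


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


class NoisyChannelSpeller:
    def __init__(self, corpus_text):
        self.counts = Counter(re.findall(r"[a-z]+", corpus_text.lower()))
        self.total = sum(self.counts.values())

    def prior(self, word):
        """Language model P(c): unigram relative frequency."""
        return self.counts[word] / self.total

    def channel(self, distance):
        """Error model P(w | c): decays with the number of edits (assumption)."""
        return EDIT_PROB ** distance

    def correct(self, word):
        word = word.lower()
        candidates = {}  # known word -> smallest edit distance found
        if word in self.counts:
            candidates[word] = 0
        one = edits1(word)
        for c in one:
            if c in self.counts:
                candidates.setdefault(c, 1)
        for e in one:
            for c in edits1(e):
                if c in self.counts:
                    candidates.setdefault(c, 2)
        if not candidates:
            return word  # nothing known within two edits: leave unchanged
        return max(candidates, key=lambda c: self.prior(c) * self.channel(candidates[c]))


speller = NoisyChannelSpeller(
    "i really like this movie the movie was really good and the songs were good too"
)
print(speller.correct("realy"), speller.correct("gud"))  # → really good
```

For "gud", both "good" and "and" are two edits away, so the channel term ties and the language model breaks the tie in favour of the more frequent "good".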


Text, Speech and Dialogue | 2014

BFQA: A Bengali Factoid Question Answering System

Somnath Banerjee; Sudip Kumar Naskar; Sivaji Bandyopadhyay

Question Answering (QA) research on factoid questions has recently achieved considerable success, and QA systems developed for European, Middle Eastern and Asian languages can now provide answers with reasonable accuracy. However, although Bengali is among the most widely spoken languages in the world, no factoid question answering system has been available for it to date. This paper describes the first attempt to build a factoid question answering system for Bengali. It discusses the challenges of developing a question answering system for Bengali, proposes methods for extracting and ranking relevant sentences, and suggests a strategy for extracting the ranked answers from those sentences.


International Conference on Natural Language Processing | 2014

The First Resource for Bengali Question Answering Research

Somnath Banerjee; Pintu Lohar; Sudip Kumar Naskar; Sivaji Bandyopadhyay

This paper reports the development of the first tagged resource for question answering research in Bengali, a less computerized Indian language. We developed a tagging scheme for annotating questions by type; the expected answer type and the question's topical target are also marked to facilitate answer search. Because canonical Bengali documents are scarce on the web, we could not use the web as a source, and most of the resource data was collected from authentic books. Six highly qualified annotators were involved in the annotation. At present, the resource contains 47 documents from three domains: history, geography and agriculture. QA-based annotation was performed to prepare more than 2,250 question-answer pairs. Inter-annotator agreement, measured with the non-weighted kappa statistic, is satisfactory.
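Non-weighted (Cohen's) kappa corrects raw agreement for chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected from each annotator's own label distribution. The small sketch below computes it for one pair of annotators. With six annotators the paper may have averaged pairwise scores or used a multi-rater variant; the abstract does not say, and the question-type labels here are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # p_o: fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: chance agreement from the product of the marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


annotator_1 = ["FACTOID", "FACTOID", "LIST", "LIST"]
annotator_2 = ["FACTOID", "LIST", "LIST", "LIST"]
print(cohen_kappa(annotator_1, annotator_2))  # → 0.5
```

Here the annotators agree on 75% of items, but half of that agreement is expected by chance, so kappa is 0.5.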


Mexican International Conference on Artificial Intelligence | 2016

CookingQA: A Question Answering System Based on Cooking Ontology

Riyanka Manna; Partha Pakray; Somnath Banerjee; Dipankar Das; Alexander F. Gelbukh

We present an approach to developing a Question Answering (QA) system over cooking recipes that makes use of a cooking ontology. QA systems are designed to satisfy a user's specific information need, whereas an ontology is a conceptualization of knowledge that exhibits a hierarchical structure. The system is Information Retrieval (IR) based and handles tasks such as question classification, answer pattern recognition, indexing and final answer generation. The proposed QA system uses Apache Lucene for document retrieval: all cooking-related documents are indexed with Lucene, stop words are removed from each cooking-related question, and the remaining query words are used to retrieve the most relevant document. Relevant paragraphs are then selected from the retrieved documents based on the TF-IDF of the matching query words together with the n-gram overlap between each paragraph and the original question. This paper also presents a way to build an ontology model so that queries can be processed with the help of the ontology knowledge base to generate the exact answer.
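Paragraph selection above combines the TF-IDF of matching query words with the n-gram overlap between a paragraph and the question. The abstract does not say how the two signals are combined, so the sketch below assumes a simple weighted sum with a hypothetical weight `alpha`, bigram overlap normalized by the number of question bigrams, and a smoothed IDF computed over the candidate paragraphs. It is not the CookingQA implementation, which uses Apache Lucene for indexing and retrieval.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "the", "how", "do", "i", "to", "of", "is", "what", "in"}  # toy list


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(question, paragraph, n=2):
    """Fraction of the question's n-grams that also occur in the paragraph."""
    q = ngrams(words(question), n)
    if not q:
        return 0.0
    return len(q & ngrams(words(paragraph), n)) / len(q)


def select_paragraph(question, paragraphs, alpha=1.0, n=2):
    """Index of the paragraph maximizing tf-idf + alpha * n-gram overlap (assumed combination)."""
    query = {w for w in words(question) if w not in STOP_WORDS}
    bags = [Counter(words(p)) for p in paragraphs]
    df = Counter(w for bag in bags for w in bag)
    total = len(paragraphs)

    def score(i):
        tfidf = sum(bags[i][w] * math.log(1 + total / df[w]) for w in query if w in bags[i])
        return tfidf + alpha * ngram_overlap(question, paragraphs[i], n)

    return max(range(total), key=score)


paragraphs = [
    "Boil the pasta in salted water for ten minutes.",
    "To make tomato sauce, simmer crushed tomatoes with garlic and olive oil.",
    "Knead the dough and let it rest for an hour.",
]
question = "How do I make tomato sauce?"
print(select_paragraph(question, paragraphs))  # → 1
```

The overlap term rewards paragraphs that preserve the question's word order (here "make tomato" and "tomato sauce"), which plain bag-of-words TF-IDF ignores.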


CLEF (Notebook Papers/Labs/Workshop) | 2011

A Hybrid Question Answering System based on Information Retrieval and Answer Validation

Partha Pakray; Pinaki Bhaskar; Somnath Banerjee; Bidhan Chandra Pal; Sivaji Bandyopadhyay; Alexander F. Gelbukh

Collaboration


Top Co-Authors


Paolo Rosso

Polytechnic University of Valencia


Alexander F. Gelbukh

Instituto Politécnico Nacional


Partha Pakray

National Institute of Technology


Kunal Chakma

National Institute of Technology Agartala
