
Soma Paul

International Institute of Information Technology, Hyderabad


Publication

Featured researches published by Soma Paul.

international conference on computer engineering and technology | 2010

Automatic extraction and incorporation of purpose data into PurposeNet

P. Kiran Mayee; Rajeev Sangal; Soma Paul

PurposeNet is a knowledge base of objects and actions in which the knowledge is organized around purpose. Such knowledge also connects with language, namely with verbs for related actions. It can be used with an embedded reasoner, resulting in an effective system for QA, topic-listing, summarization and other tasks. However, extracting PurposeNet-related data manually is time-consuming, labor-intensive, and expensive. This paper describes a framework for automatic purpose data extraction, given a corpus. It identifies a set of lexico-syntactic patterns that are easily recognizable, that occur frequently and across text genre boundaries, and that indisputably indicate the lexical relation of purpose data. It also deals with the subsequent automatic incorporation of this data into the PurposeNet resource. The results are used to augment and critique the structure of a large hand-built resource. The cases where purpose data is incomplete have also been analyzed. The extent of success achieved in the process, in terms of richness of the resource, is also discussed.
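As a rough illustration of pattern-based purpose extraction, the sketch below matches two hand-written lexico-syntactic patterns and merges the results into a small dictionary. The patterns, the function names `extract_purpose_pairs` and `incorporate`, and the dictionary layout are assumptions made for illustration; they are not the pattern set or the PurposeNet format used in the paper.

```python
import re

# Hypothetical patterns for illustration only; the paper's actual pattern
# set is not listed in the abstract.
PURPOSE_PATTERNS = [
    # "a knife is used for cutting" / "the key is used to open"
    re.compile(
        r"\b(?:an?|the)\s+(?P<object>\w+)\s+is\s+used\s+(?:for|to)\s+(?P<purpose>\w+)",
        re.IGNORECASE,
    ),
    # "the purpose of a pen is to write"
    re.compile(
        r"\bthe\s+purpose\s+of\s+(?:an?|the)\s+(?P<object>\w+)\s+is\s+to\s+(?P<purpose>\w+)",
        re.IGNORECASE,
    ),
]


def extract_purpose_pairs(text):
    """Return (object, purpose) pairs matched by any pattern, in pattern order."""
    pairs = []
    for pattern in PURPOSE_PATTERNS:
        for match in pattern.finditer(text):
            pairs.append((match.group("object").lower(), match.group("purpose").lower()))
    return pairs


def incorporate(resource, pairs):
    """Merge extracted pairs into a dict mapping each object to a set of purposes."""
    for obj, purpose in pairs:
        resource.setdefault(obj, set()).add(purpose)
    return resource


if __name__ == "__main__":
    sample = "A knife is used for cutting. The purpose of a pen is to write."
    print(incorporate({}, extract_purpose_pairs(sample)))
```

Real patterns would need part-of-speech information and multiword phrases; single-word captures are used here only to keep the sketch short.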

international conference on computational linguistics | 2014

A hybrid approach for automatic clause boundary identification in Hindi

Rahul Sharma; Soma Paul

A complex sentence, divided into clauses, can be analyzed more easily than the complex sentence itself. We present the task of clause identification in Hindi text. To the best of our knowledge, not much work has been done on clause boundary identification for Hindi, which makes this task more important. We have built a hybrid system that achieves F1-scores of 90.804% and 94.697% for identifying clause start and end boundaries, respectively.

international conference on computational linguistics | 2014

Constituency Parsing of Complex Noun Sequences in Hindi

Arpita Batra; Soma Paul; Amba Kulkarni

A complex noun sequence is one in which a head noun is recursively modified by one or more bare nouns and/or genitives. Constituency analysis of a complex noun sequence is a prerequisite for finding the dependency (semantic) relations between components of the sequence. Identification of dependency relations is useful for various applications such as question answering, information extraction, textual entailment, and paraphrasing. In Hindi, syntactic agreement rules can handle the parsing of recursive genitives to a large extent (Sharma, 2012) [12]. This paper implements frequency-based, corpus-driven approaches for parsing recursive genitive structures that syntactic rules cannot handle, as well as recursive compound nouns and combinations of genitive and compound noun sequences. Using syntactic rules and the dependency global algorithm, an accuracy of 92.85% is obtained.

world congress on information and communication technologies | 2011

Action semantics in PurposeNet

P. Kiran Mayee; Rajeev Sangal; Soma Paul

PurposeNet is a semantic network of artifacts and actions related to artifacts. Since people tend to classify the objects and artifacts around them in terms of some primary purpose attributed to them, purpose is taken as an organizing principle for the knowledge base. Actions play a pivotal role in describing artifacts. All processes that describe an artifact, namely its creation, utilization, maintenance, repair and so on, are describable through actions. Therefore, a schema of ‘action semantics’ has been made a constituent part of PurposeNet. This paper presents the architecture of the action ontology and the significance of its design. The action ontology has three major components: a) semantic roles, which describe the action; b) ‘precondition’, a set of criteria to be fulfilled for any action to take place, and ‘post-condition’, the effect of the action; c) sub-actions, a set of actions which, when performed in some order (sequential or parallel), lead to the action being executed.
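The three components of the action ontology can be pictured as a small record type. The sketch below is one possible encoding, not the PurposeNet schema itself: the class name `Action`, its field names, the `leaf_actions` helper, and the tea-making example are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Action:
    """Hypothetical record for one action; field names are assumptions."""
    name: str
    # a) semantic roles that describe the action, e.g. {"agent": "person"}
    semantic_roles: Dict[str, str] = field(default_factory=dict)
    # b) criteria that must hold before the action, and its effects afterwards
    preconditions: List[str] = field(default_factory=list)
    postconditions: List[str] = field(default_factory=list)
    # c) sub-actions and the order in which they are performed
    sub_actions: List["Action"] = field(default_factory=list)
    sub_action_order: str = "sequential"  # or "parallel"


def leaf_actions(action):
    """Return the names of the lowest-level sub-actions, in listed order."""
    if not action.sub_actions:
        return [action.name]
    names = []
    for sub in action.sub_actions:
        names.extend(leaf_actions(sub))
    return names


make_tea = Action(
    name="make_tea",
    semantic_roles={"agent": "person", "theme": "tea"},
    preconditions=["water is available"],
    postconditions=["tea is ready"],
    sub_actions=[Action("boil_water"), Action("steep_leaves")],
)

if __name__ == "__main__":
    print(leaf_actions(make_tea))
```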

international conference on computational linguistics | 2012

Integration of a noun compound translator tool with Moses for English-Hindi machine translation and evaluation

Prashant Mathur; Soma Paul

Noun compounds are a frequently occurring type of multiword expression in written English. English noun compounds are translated into varied syntactic constructs in Hindi. The performance of existing translation systems shows that no satisfactorily efficient noun compound translation tool exists for English to Hindi, although machine translation clearly needs one. In this paper we integrate Noun Compound Translator [13], a statistical tool for noun compound translation, with the state-of-the-art machine translation tool Moses [10]. We evaluate the integrated system on test data of 300 source-language sentences that contain noun compounds and have been translated manually into Hindi. A gain of 29% in BLEU score and 27% in human evaluation was observed on the test data.

international conference on asian language processing | 2011

Issues with the Unergative/Unaccusative Classification of the Intransitive Verbs

Nitesh Surtani; Khushboo Jha; Soma Paul

The paper abandons a strict two-way sub-classification of intransitive verbs into unaccusative and unergative for Hindi and proposes plotting their distribution in a diffusion chart. The diagnostic tests that Bhatt (2003) applied to Hindi data are ranked by how efficiently they assign the correct sub-class to verbs. The diffusion chart shows that a tripartite classification handles the classification of intransitive verbs better than the classical binary approach. The tripartite classification is as follows: (1) verbs that take an animate subject and are compatible with an adverb of volitionality; (2) verbs that take an animate subject but are not compatible with an adverb of volitionality; and (3) verbs that take an inanimate subject. The classification is of considerable advantage for various NLP tasks such as machine translation and natural language generation.

international conference natural language processing | 2010

Extraction of purpose data using surface text patterns

P. Kiran Mayee; Rajeev Sangal; Soma Paul

This paper presents the concept of surface text patterns for extracting purpose data from the web. In order to obtain an optimal set of patterns, we have developed a method for learning purpose patterns automatically. A corpus was downloaded from the Internet using bootstrapping by providing a few hand-crafted examples of each purpose pattern to a generic search engine. This corpus was then tagged and patterns were extracted from the returned documents by automated means and standardized. The precision of each pattern and the average precision for each group were computed. The extracted patterns were then used to extract purpose data. The results for extraction from the web have been reported.
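The per-pattern precision and group average mentioned above could be computed along the following lines. This is a minimal sketch that assumes precision means correct extractions divided by all extractions and that the group average is an unweighted mean; the abstract does not state either definition, and the function names and counts are hypothetical.

```python
def pattern_precision(correct, extracted):
    """Fraction of a pattern's extractions judged correct (0.0 if it extracted nothing)."""
    return correct / extracted if extracted else 0.0


def group_average_precision(counts):
    """Unweighted mean precision over (correct, extracted) pairs for one pattern group."""
    precisions = [pattern_precision(c, e) for c, e in counts]
    return sum(precisions) / len(precisions) if precisions else 0.0


if __name__ == "__main__":
    # Hypothetical counts for two patterns in one group.
    group = [(8, 10), (3, 6)]
    print(group_average_precision(group))
```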

pacific asia conference on language information and computation | 2013