Dipti Misra Sharma

International Institute of Information Technology, Hyderabad

Publication

Featured research published by Dipti Misra Sharma.

linguistic annotation workshop | 2009

A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu

Rajesh Bhatt; Bhuvana Narasimhan; Martha Palmer; Owen Rambow; Dipti Misra Sharma; Fei Xia

This paper describes the simultaneous development of dependency structure and phrase structure treebanks for Hindi and Urdu, as well as a PropBank. The dependency structure and the PropBank are manually annotated, and then the phrase structure treebank is produced automatically. To ensure successful conversion, the development of the guidelines for all three representations is carefully coordinated.

linguistic annotation workshop | 2009

The Hindi Discourse Relation Bank

Umangi Oza; Rashmi Prasad; Sudheer Kolachina; Dipti Misra Sharma; Aravind K. Joshi

We describe the Hindi Discourse Relation Bank project, aimed at developing a large corpus annotated with discourse relations. We adopt the lexically grounded approach of the Penn Discourse Treebank, and describe our classification of Hindi discourse connectives, our modifications to the sense classification of discourse relations, and some cross-linguistic comparisons based on some initial annotations carried out so far.

linguistic annotation workshop | 2009

Simple Parser for Indian Languages in a Dependency Framework

Akshar Bharati; Mridul Gupta; Vineet Yadav; Karthik Gali; Dipti Misra Sharma

This paper is an attempt to show that an intermediary level of analysis is an effective way of carrying out various NLP tasks for linguistically similar languages. We describe a process for developing a simple parser for doing such tasks. This parser uses a grammar-driven approach to annotate dependency relations (both inter-chunk and intra-chunk) at an intermediary level. The ease of identifying a particular dependency relation dictates the degree of analysis reached by the parser. To establish the efficiency of the simple parser, we show the improvement in its results over previous grammar-driven dependency parsing approaches for Indian languages like Hindi. We also suggest that the simple parser could be useful for other Indian languages that are similar in nature.

international conference natural language processing | 2010

Anusaaraka: An expert system based machine translation system

Sriram Chaudhury; Ankitha Rao; Dipti Misra Sharma

Most research in machine translation is about having computers completely bear the load of translating one human language into another. This paper looks at the machine translation problem afresh and observes that there is a need to share the load between man and machine, distinguish reliable knowledge from heuristics, provide a spectrum of outputs to serve different strata of people, and finally make use of existing resources instead of reinventing the wheel. This paper describes a unique approach to developing a machine translation system based on the insights of information dynamics from the Paninian Grammar Formalism. Anusaaraka is a Language Accessor cum Machine Translation system based on the fundamental premise of sharing the load, producing good-enough results according to the needs of the reader. The system promises a faithful representation of the translated text, no loss of information during translation, and graceful degradation (robustness) in case of failure. The layered output provides access to all the stages of translation, making the whole process transparent. Thus, Anusaaraka differs from other machine translation systems in two respects: (1) its commitment to faithfulness, providing a layer of 100% faithful output so that a user with some training can “access the source text” faithfully; and (2) its design, which lets a user contribute to it and participate in improving its quality. Further, Anusaaraka provides an eclectic combination of the Apertium architecture with a forward chaining expert system, allowing use of both the deep parser and shallow parser outputs to analyze the SL text. Existing language resources (parsers, taggers, chunkers) available under the GPL are used rather than rewritten. Language data and linguistic rules are independent of the core programme, making it easy for linguists to modify and experiment with different language phenomena to improve the system.
Users can become contributors by adding new word sense disambiguation (WSD) rules for ambiguous words through a web interface available over the internet. The system uses the forward chaining of the expert system to infer new language facts from the existing language data. It helps to handle the complex behavior of language translation by applying specific knowledge rather than a specific technique, creating a vast language knowledge base in electronic form. In other words, the expert system facilitates the transformation of subject matter expert (SME) knowledge held by humans into a computer-processable knowledge base.
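The forward chaining mentioned in this abstract can be illustrated with a minimal Python sketch. It is not Anusaaraka's rule engine: the function name `forward_chain`, the string encoding of facts, and the toy word-sense rules are all hypothetical, chosen only to show how new facts are inferred from existing ones until nothing more can be derived.

```python
def forward_chain(facts, rules):
    """Fire rules repeatedly until no new fact can be derived.

    facts: iterable of strings already known to be true.
    rules: list of (premises, conclusion) pairs; a rule fires when
           every premise is among the known facts.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical toy data, for illustration only (not taken from the system).
facts = {"word=bank", "context=river"}
rules = [
    (("word=bank", "context=river"), "sense=riverside"),
    (("sense=riverside",), "category=noun"),
]
derived = forward_chain(facts, rules)
print(sorted(derived))
```

The second rule fires only because the first one has already added its conclusion, which is the chaining behaviour the abstract refers to.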

international conference on asian language processing | 2009

A Modular Cascaded Approach to Complete Parsing

Samar Husain; Phani Gadde; Bharat Ram Ambati; Dipti Misra Sharma; Rajeev Sangal

In this paper, we propose a modular cascaded approach to data-driven dependency parsing. Each module or layer leading to the complete parse produces a linguistically valid partial parse. We do this by introducing an artificial root node in the dependency structure of a sentence and by catering to distinct dependency label sets that reflect the function of the set-internal labels vis-à-vis a distinct and identifiable linguistic unit, at different layers. The linguistic unit in our approach is a clause. Output (partial parse) from each layer can be accessed independently. We applied this approach to Hindi, a morphologically rich, free word order language, using MST Parser. We did all our experiments on a part of the Hyderabad Dependency Treebank. The final results show an increase of 1.35% in unlabeled attachment and 1.36% in labeled attachment accuracies over the state-of-the-art data-driven Hindi parser.
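The unlabeled and labeled attachment accuracies reported here are standard dependency parsing metrics: the share of tokens whose predicted head is correct, and the share whose head and relation label are both correct. The following Python sketch shows the computation under simple assumptions; the function name `attachment_scores`, the `(head, label)` tuple encoding, and the example labels are hypothetical and not drawn from the paper or the treebank.

```python
def attachment_scores(gold, predicted):
    """Return (unlabeled, labeled) attachment accuracy.

    gold, predicted: equal-length lists of (head_index, label) pairs,
    one per token, where head_index 0 stands for the artificial root.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("at least one token is required")
    total = len(gold)
    # Unlabeled: only the head has to match.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # Labeled: head and label both have to match.
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    return head_hits / total, full_hits / total


# Hypothetical four-token sentence; labels are placeholders.
gold = [(2, "subj"), (0, "root"), (2, "obj"), (2, "mod")]
predicted = [(2, "subj"), (0, "root"), (2, "subj"), (3, "mod")]
uas, las = attachment_scores(gold, predicted)
print(uas, las)  # 0.75 0.5
```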

international conference on computational linguistics | 2011

Identification of conjunct verbs in Hindi and its effect on parsing accuracy

Rafiya Begum; Karan Jindal; Ashish Jain; Samar Husain; Dipti Misra Sharma

This paper presents work on the identification of conjunct verbs in Hindi. The paper will first focus on investigating which noun-verb combinations form a conjunct verb in Hindi, using a set of linguistic diagnostics. We will then see which of these diagnostics can be used as features in a MaxEnt-based automatic identification tool. Finally, we will use this tool to incorporate certain features in a graph-based dependency parser and show an improvement over the previous best Hindi parsing accuracy.

international conference on computational linguistics | 2010

Issues in analyzing Telugu sentences towards building a Telugu treebank

Chaitanya Vempaty; Viswanatha Naidu; Samar Husain; Ravi Kiran; Lakshmi Bai; Dipti Misra Sharma; Rajeev Sangal

This paper describes an effort towards building a Telugu Dependency Treebank. We discuss the basic framework and the issues we encountered while annotating. 1487 sentences have been annotated in the Paninian framework. We also discuss how some of the annotation decisions would affect the development of a parser for Telugu.

international conference on information systems | 2011

Developing Oriya Morphological Analyzer Using Lt-Toolbox

Itisree Jena; Sriram Chaudhury; Himani Chaudhry; Dipti Misra Sharma

In this paper we present the work done on developing a Morphological Analyzer (MA) for the Oriya language, following the paradigm approach. A paradigm defines all the word forms of a given stem, and also provides a feature structure associated with every word. The analyzer consists of various paradigms under which nouns, adjectives, indeclinables (avyaya) and finite verbs of Oriya are classified. Further, we discuss the construction of paradigms and the thought process that goes into their construction. The paradigms have been created using an XML-based morphological dictionary from the Lt-toolbox package.

international conference on computational linguistics | 2013

An automatic approach to treebank error detection using a dependency parser

Bhasha Agrawal; Rahul Agarwal; Samar Husain; Dipti Misra Sharma

Treebanks play an important role in the development of various natural language processing tools. Amongst other things, they provide crucial language-specific patterns that are exploited by various machine learning techniques. Quality control in any treebanking project is therefore extremely important. Manual validation of the treebank is one of the steps that is generally necessary to ensure good annotation quality. Needless to say, manual validation requires a lot of human time and effort. In this paper, we present an automatic approach which helps in detecting potential errors in a treebank. We use a dependency parser to detect such errors. By using this tool, validators can validate a treebank in less time and with reduced human effort.
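The abstract does not spell out how the parser is used, so the following Python sketch shows only one plausible reading: tokens where a parser's output disagrees with the manual annotation are flagged for a validator to inspect. The function name `flag_potential_errors`, the `(head, label)` encoding, and the example data are assumptions made for illustration, not the authors' method.

```python
def flag_potential_errors(annotated, parsed):
    """List tokens whose manual annotation differs from the parser output.

    annotated, parsed: equal-length lists of (head_index, label) pairs.
    Returns (token_position, annotated_pair, parsed_pair) triples with
    1-based positions, so a validator can review just those tokens.
    """
    if len(annotated) != len(parsed):
        raise ValueError("both analyses must cover the same tokens")
    flagged = []
    for position, (gold, pred) in enumerate(zip(annotated, parsed), start=1):
        if gold != pred:
            flagged.append((position, gold, pred))
    return flagged


# Hypothetical four-token sentence; labels are placeholders.
annotated = [(2, "subj"), (0, "root"), (2, "obj"), (2, "mod")]
parsed = [(2, "subj"), (0, "root"), (2, "subj"), (3, "mod")]
flagged = flag_potential_errors(annotated, parsed)
for position, gold, pred in flagged:
    print(f"token {position}: annotated {gold}, parser {pred}")
```

A disagreement is only a candidate error: the parser may be the one that is wrong, which is why the flagged tokens still go to a human validator.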

meeting of the association for computational linguistics | 2007

Simple Preposition Correspondence: A Problem in English to Indian Language Machine Translation

Samar Husain; Dipti Misra Sharma; Manohar Reddy

The paper describes an approach to automatically selecting the appropriate lexical correspondence in an Indian language for an English simple preposition. The paper describes this task from a Machine Translation (MT) perspective. We use the properties of the head and complement of the preposition to select the appropriate sense in the target language. We later show that the results obtained from this approach are promising.
