Publications


Featured research published by Shashi Narayan.


Meeting of the Association for Computational Linguistics | 2014

Hybrid Simplification using Deep Semantics and Machine Translation

Shashi Narayan; Claire Gardent

We present a hybrid approach to sentence simplification which combines deep semantics and monolingual machine translation to derive simple sentences from complex ones. The approach differs from previous work in two main ways. First, it is semantics-based in that it takes as input a deep semantic representation rather than, e.g., a sentence or a parse tree. Second, it combines a simplification model for splitting and deletion with a monolingual translation model for phrase substitution and reordering. When compared against current state-of-the-art methods, our model yields significantly simpler output that is both grammatical and meaning preserving.
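
The pipeline below is a minimal sketch of the two-stage architecture this abstract describes: a simplification step that splits the input into one unit per semantic event and deletes optional material, followed by a monolingual substitution step. The event representation, realiser and phrase table are hypothetical toy stand-ins, not the authors' DRS-based system.

```python
# Toy stand-ins for the deep semantic representation, the split/delete
# model and the phrase table; not the authors' actual system.
from dataclasses import dataclass

@dataclass
class Event:
    verb: str               # main predicate of the event
    args: dict              # semantic roles -> surface strings
    optional: bool = False  # flagged material is a deletion candidate

def split_and_delete(events):
    """Stage 1 (simplification model): one output unit per event,
    with optional events deleted."""
    return [e for e in events if not e.optional]

def substitute(event, phrase_table):
    """Stage 2 (monolingual translation stand-in): phrase substitution
    mapping complex words to simpler paraphrases."""
    tokens = [event.args.get("agent", ""), event.verb,
              event.args.get("patient", "")]
    simple = [phrase_table.get(t, t) for t in tokens]
    return " ".join(t for t in simple if t).capitalize() + "."

phrase_table = {"utilises": "uses"}  # toy monolingual phrase table
events = [
    Event("utilises", {"agent": "the committee", "patient": "the report"}),
    Event("convened", {"agent": "a subpanel"}, optional=True),
]
for e in split_and_delete(events):
    print(substitute(e, phrase_table))
# -> The committee uses the report.
```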


Pattern Recognition Letters | 2010

A composite kernel for named entity recognition

Sujan Kumar Saha; Shashi Narayan; Sudeshna Sarkar; Pabitra Mitra

In this paper, we propose a novel kernel function for support vector machines (SVMs) that can be used for sequential labeling tasks like named entity recognition (NER). Machine learning methods like support vector machines, maximum entropy, hidden Markov models and conditional random fields are the most widely used methods for implementing NER systems. The features used in machine learning algorithms for NER are mostly string-based. The proposed kernel is based on calculating a novel distance function between these string-based features. In tasks like NER, the similarity between contexts as well as the semantic similarity between words play an important role; the goal is to capture both kinds of information. The proposed distance function makes use of statistics derived primarily from the training data and from hierarchical clustering information. The kernel function is applied to Hindi and biomedical NER tasks, and the results are quite promising.
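
As a rough illustration of how such a kernel plugs into an SVM, here is a minimal sketch using scikit-learn's precomputed-kernel interface. The cluster-aware similarity below is a hypothetical stand-in for the paper's statistics- and clustering-based distance, not the actual function.

```python
import numpy as np
from sklearn.svm import SVC

# Toy word-cluster ids standing in for hierarchical clustering output.
CLUSTER = {"London": 0, "Paris": 0, "Monday": 1, "Tuesday": 1}

def feature_similarity(a, b):
    """Exact match counts fully; same-cluster words count partially."""
    if a == b:
        return 1.0
    if a in CLUSTER and CLUSTER.get(a) == CLUSTER.get(b):
        return 0.5
    return 0.0

def composite_kernel(X, Y):
    """Gram matrix over sequences of string features (context windows)."""
    K = np.zeros((len(X), len(Y)))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            K[i, j] = sum(feature_similarity(a, b) for a, b in zip(x, y))
    return K

train = [["went", "to", "London"], ["went", "to", "Paris"],
         ["met", "on", "Monday"], ["met", "on", "Tuesday"]]
labels = [1, 1, 0, 0]  # 1 = location context, 0 = date context

clf = SVC(kernel="precomputed").fit(composite_kernel(train, train), labels)
test = [["flew", "to", "Paris"]]
print(clf.predict(composite_kernel(test, train)))  # -> [1]
```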


Meeting of the Association for Computational Linguistics | 2017

Creating Training Corpora for NLG Micro-Planners

Claire Gardent; Anastasia Shimorina; Shashi Narayan; Laura Perez-Beltrachini

In this paper, we focus on how to create data-to-text corpora which can support the learning of wide-coverage micro-planners, i.e., generation systems that handle lexicalisation, aggregation, surface realisation, sentence segmentation and referring expression generation. We start by reviewing common practice in designing training benchmarks for Natural Language Generation. We then present a novel framework for semi-automatically creating linguistically challenging NLG corpora from existing Knowledge Bases. We apply our framework to DBpedia data and compare the resulting dataset with that of Wen et al. (2016). We show that while the dataset of Wen et al. (2016) is more than twice as large as ours, it is less diverse both in terms of input and in terms of text. We thus propose our corpus generation framework as a novel method for creating challenging datasets from which NLG models can be learned that are capable of generating text from KB data.


Empirical Methods in Natural Language Processing | 2015

Diversity in Spectral Learning for Natural Language Parsing

Shashi Narayan; Shay B. Cohen

We describe an approach to creating a diverse set of predictions with spectral learning of latent-variable PCFGs (L-PCFGs). Our approach works by creating multiple spectral models, where noise is added to the underlying features in the training set before the estimation of each model. We describe three ways to decode with multiple models. In addition, we describe a simple variant of the spectral algorithm for L-PCFGs that is fast and leads to compact models. Our experiments on natural language parsing, for English and German, show that we get a significant improvement over baselines comparable to the state of the art. For English, we achieve an F1 score of 90.18, and for German an F1 score of 83.38.
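
A minimal sketch of the noise-injection ensemble idea follows. The "model" here is a trivial nearest-centroid classifier standing in for a spectral L-PCFG estimator, and the voting combination is only one plausible way to decode with multiple models.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate(features, labels, noise=0.1):
    """Fit one model from a noise-perturbed copy of the training
    features (the paper perturbs features before each estimation)."""
    noisy = features + rng.normal(0.0, noise, features.shape)
    return {c: noisy[labels == c].mean(axis=0) for c in set(labels)}

def decode(model, x):
    """Single-model decoding: nearest centroid."""
    return min(model, key=lambda c: np.linalg.norm(model[c] - x))

def decode_ensemble(models, x):
    """Majority-style combination across the noisy models."""
    votes = [decode(m, x) for m in models]
    return max(set(votes), key=votes.count)

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
models = [estimate(X, y) for _ in range(5)]  # diversity via fresh noise
print(decode_ensemble(models, np.array([0.1, 0.0])))  # -> 0
```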


International Conference on Natural Language Generation | 2016

Paraphrase Generation from Latent-Variable PCFGs for Semantic Parsing

Shashi Narayan; Siva Reddy; Shay B. Cohen

One of the limitations of semantic parsing approaches to open-domain question answering is the lexicosyntactic gap between natural language questions and knowledge base entries -- there are many ways to ask a question, all with the same answer. In this paper we propose to bridge this gap by generating paraphrases of the input question with the goal that at least one of them will be correctly mapped to a knowledge-base query. We introduce a novel grammar model for paraphrase generation that does not require any sentence-aligned paraphrase corpus. Our key idea is to leverage the flexibility and scalability of latent-variable probabilistic context-free grammars to sample paraphrases. We do an extrinsic evaluation of our paraphrases by plugging them into a semantic parser for Freebase. Our evaluation experiments on the WebQuestions benchmark dataset show that the performance of the semantic parser significantly improves over strong baselines.
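
The overall strategy can be sketched as follows, with a hypothetical paraphraser and semantic parser standing in for the paper's L-PCFG sampler and Freebase parser.

```python
def paraphrase(question):
    """Stand-in generator; the paper samples paraphrases from an L-PCFG."""
    return [question,
            question.replace("who wrote", "who is the author of"),
            question.replace("who wrote", "who authored")]

def parse(question):
    """Stand-in semantic parser: returns (KB query, confidence)."""
    lexicon = {"who is the author of": "book.written_work.author"}
    for phrase, relation in lexicon.items():
        if question.startswith(phrase):
            return relation, 0.9
    return None, 0.0

# Keep the paraphrase the parser maps most confidently to a KB query.
candidates = [parse(p) for p in paraphrase("who wrote Hamlet")]
query, score = max(candidates, key=lambda c: c[1])
print(query)  # -> book.written_work.author (via the covered paraphrase)
```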


International Conference on Natural Language Generation | 2016

The WebNLG Challenge: Generating Text from DBPedia Data

Emilie Colin; Claire Gardent; Yassine Mrabet; Shashi Narayan; Laura Perez-Beltrachini

With the emergence of the linked data initiative and the rapid development of RDF (Resource Description Framework) datasets, several approaches have recently been proposed for generating text from RDF data (Sun and Mellish, 2006; Duma and Klein, 2013; Bontcheva and Wilks, 2004; Cimiano et al., 2013; Lebret et al., 2016). To support the evaluation and comparison of such systems, we propose a shared task on generating text from DBPedia data. The training data will consist of Data/Text pairs where the data is a set of triples extracted from DBPedia and the text is a verbalisation of these triples. In essence, the task consists in mapping data to text. Specific subtasks include sentence segmentation (how to chunk the input data into sentences), lexicalisation (of the DBPedia properties), aggregation (how to avoid repetitions) and surface realisation (how to build a syntactically correct and natural-sounding text).
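
To make the task format concrete, here is a sketch of what one Data/Text pair might look like; the triples and the verbalisation are illustrative, not items from the released dataset.

```python
# One illustrative Data/Text pair: a DBpedia-style triple set and a
# human verbalisation (made up here, not from the released dataset).
pair = {
    "data": [  # (subject, property, object) triples
        ("John_E_Blaha", "birthDate", "1942-08-26"),
        ("John_E_Blaha", "birthPlace", "San_Antonio"),
        ("John_E_Blaha", "occupation", "Fighter_pilot"),
    ],
    "text": ("John E. Blaha, born on 26 August 1942 in San Antonio, "
             "worked as a fighter pilot."),
}

# The subtasks listed above are all visible in the pair: segmentation
# (all three triples fit one sentence), lexicalisation ("occupation" ->
# "worked as"), aggregation (the shared subject is stated once) and
# surface realisation (ordering, agreement, punctuation).
for s, p, o in pair["data"]:
    print(f"{s} --{p}--> {o}")
print(pair["text"])
```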


Meeting of the Association for Computational Linguistics | 2016

Optimizing Spectral Learning for Parsing

Shashi Narayan; Shay B. Cohen

We describe a search algorithm for optimizing the number of latent states when estimating latent-variable PCFGs with spectral methods. Our results show that, contrary to the common belief that the number of latent states for each nonterminal in an L-PCFG can be decided in isolation with spectral methods, parsing results significantly improve if the number of latent states for each nonterminal is globally optimized, taking into account interactions between the different nonterminals. In addition, we contribute an empirical analysis of spectral algorithms on eight morphologically rich languages: Basque, French, German, Hebrew, Hungarian, Korean, Polish and Swedish. Our results show that our estimation consistently performs better than, or close to, coarse-to-fine expectation-maximization techniques for these languages.
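
One way to picture a global search over latent-state counts is the coordinate-ascent loop below. Both the toy dev_score (standing in for estimating a grammar with those counts and scoring it on a development set) and the coordinate-ascent scheme itself are assumptions for illustration; the paper describes its own search algorithm.

```python
def dev_score(states):
    """Toy objective with interactions between nonterminals; the real
    objective would be dev-set accuracy of a grammar estimated with
    these latent-state counts."""
    return (-(states["NP"] - 3) ** 2
            - 0.5 * (states["VP"] - states["NP"]) ** 2
            - (states["PP"] - 2) ** 2)

def coordinate_ascent(nonterminals, max_states=8, sweeps=3):
    """Optimise the counts jointly: revisit each nonterminal in turn,
    holding the others fixed, so interactions are taken into account
    (rather than fixing each count in isolation)."""
    states = {nt: 1 for nt in nonterminals}
    for _ in range(sweeps):
        for nt in nonterminals:
            def score_with(m):
                return dev_score(dict(states, **{nt: m}))
            states[nt] = max(range(1, max_states + 1), key=score_with)
    return states

print(coordinate_ascent(["NP", "VP", "PP"]))
# -> {'NP': 3, 'VP': 3, 'PP': 2}
```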


Computational Linguistics | 2015

Multiple adjunction in feature-based tree-adjoining grammar

Claire Gardent; Shashi Narayan

In parsing with Tree Adjoining Grammar (TAG), independent derivations have been shown by Schabes and Shieber (1994) to be essential for correctly supporting syntactic analysis, semantic interpretation, and statistical language modeling. However, the parsing algorithm they propose is not directly applicable to Feature-Based TAG (FB-TAG). We provide a recognition algorithm for FB-TAG that supports both dependent and independent derivations. The resulting algorithm combines the benefits of independent derivations with those of Feature-Based grammars. In particular, we show that it accounts for a range of interactions between dependent versus independent derivations on the one hand, and syntactic constraints, linear ordering, and scopal versus nonscopal semantic dependencies on the other.


International Conference on Natural Language Generation | 2016

Unsupervised Sentence Simplification Using Deep Semantics

Shashi Narayan; Claire Gardent

We present a novel approach to sentence simplification which departs from previous work in two main ways. First, it requires neither hand-written rules nor a training corpus of aligned standard and simplified sentences. Second, sentence splitting operates on deep semantic structure. We show (i) that the unsupervised framework we propose is competitive with four state-of-the-art supervised systems and (ii) that our semantics-based approach allows for a principled and effective handling of sentence splitting.
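
A minimal sketch of semantics-driven splitting: the complex sentence's meaning is represented as a set of event nodes over shared variables, and each event becomes one simple sentence. The graph and the realiser below are hypothetical toy stand-ins, not the paper's semantic representation.

```python
# Shared-variable semantic graph (toy) for:
# "The committee, which met on Monday, approved the budget."
graph = {
    "e1": {"verb": "approved", "agent": "x1", "patient": "x2"},
    "e2": {"verb": "met", "agent": "x1", "mod": "on Monday"},
}
entities = {"x1": "the committee", "x2": "the budget"}

def realise(event):
    """Toy realiser: linearise one event as one simple sentence."""
    parts = [entities[event["agent"]], event["verb"],
             entities.get(event.get("patient"), ""), event.get("mod", "")]
    return " ".join(p for p in parts if p).capitalize() + "."

for event in graph.values():  # splitting: one sentence per event node
    print(realise(event))
# The committee approved the budget.
# The committee met on Monday.
```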


International Conference on Computational Linguistics | 2012

Proceedings of the 24th International Conference on Computational Linguistics (COLING)

Shashi Narayan; Claire Gardent

Collaboration


Dive into Shashi Narayan's collaborations.

Top Co-Authors

Claire Gardent

Centre national de la recherche scientifique

Shay B. Cohen

Carnegie Mellon University

Anastasia Shimorina

Centre national de la recherche scientifique

Ulrich Germann

University of Southern California
