
Publication


Featured research published by Manish Shrivastava.


Meeting of the Association for Computational Linguistics | 2016

Together we stand: Siamese Networks for Similar Question Retrieval

Arpita Das; Harish Yenala; Manoj Kumar Chinnakotla; Manish Shrivastava

Community Question Answering (cQA) services like Yahoo! Answers (https://answers.yahoo.com/), Baidu Zhidao (http://zhidao.baidu.com/), Quora (http://www.quora.com/), and StackOverflow (http://stackoverflow.com/) provide a platform for interaction with experts and help users obtain precise and accurate answers to their questions. The time lag between a user posting a question and receiving its answer can be reduced by retrieving similar historic questions from the cQA archives. The main challenge in this task is the “lexico-syntactic” gap between the current and the previous questions. In this paper, we propose a novel approach called “Siamese Convolutional Neural Network for cQA (SCQA)” to find the semantic similarity between the current and the archived questions. SCQA consists of twin convolutional neural networks with shared parameters and a contrastive loss function joining them. SCQA learns the similarity metric for question-question pairs by leveraging the question-answer pairs available in cQA forum archives. The model projects semantically similar question pairs nearer to each other and dissimilar question pairs farther away from each other in the semantic space. Experiments on a large-scale real-life “Yahoo! Answers” dataset reveal that SCQA outperforms current state-of-the-art approaches based on translation models, topic models, and deep neural network based models which use non-shared parameters.
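
The pairing of shared-weight networks with a contrastive loss can be sketched in a few lines of Python; this is a generic formulation of contrastive loss, not the paper's exact implementation, and the margin value and distance function are illustrative assumptions:

```python
import math

def contrastive_loss(dist, is_similar, margin=1.0):
    """Contrastive loss over a pair's distance in the shared embedding
    space: similar pairs are pulled together, dissimilar pairs are
    pushed at least `margin` apart (zero loss beyond the margin)."""
    if is_similar:
        return 0.5 * dist ** 2
    return 0.5 * max(0.0, margin - dist) ** 2

def euclidean(u, v):
    """Distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Similar pairs incur a loss that grows with their distance, while dissimilar pairs are penalized only when they fall inside the margin.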


International World Wide Web Conferences | 2015

Answer ka type kya he?: Learning to Classify Questions in Code-Mixed Language

Khyathi Chandu Raghavi; Manoj Kumar Chinnakotla; Manish Shrivastava

Code-Mixing (CM) is defined as the embedding of linguistic units such as phrases, words, and morphemes of one language into an utterance of another language. CM is a natural phenomenon observed in many multilingual societies. It helps in speeding up communication and allows a wider variety of expression, due to which it has become a popular mode of communication in social media forums like Facebook and Twitter. However, current Question Answering (QA) research and systems only support expressing a question in a single language, which is an unrealistic and hard proposition, especially for certain domains like health and technology. In this paper, we take the first step towards the development of a full-fledged QA system in CM language, which is building a Question Classification (QC) system. The QC system analyzes the user question and infers the expected Answer Type (AType). The AType helps in locating and verifying the answer as it imposes certain type-specific constraints. We learn a basic Support Vector Machine (SVM) based QC system for English-Hindi CM questions. Due to the inherent complexities involved in processing CM language and the unavailability of language processing resources such as POS taggers, chunkers, and parsers, we design our current system using only word-level resources such as language identification, transliteration, and lexical translation. To reduce data sparsity and leverage resources available in a resource-rich language, instead of extracting features directly from the original CM words, we translate them into a common language (English) and then perform featurization. We created an evaluation dataset for this task, and our system achieves an accuracy of 63% and 45% in the coarse-grained and fine-grained categories of the question taxonomy respectively. The idea of translating features into English indeed helps in improving accuracy over the unigram baseline.
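
The word-level pipeline described above (identify the token, translate it into English via a lexical dictionary, then featurize) can be sketched as follows; the tiny Hindi-English lexicon and the helper names are illustrative assumptions, not resources from the paper:

```python
# Toy Hindi -> English lexical dictionary (illustrative, not from the paper).
HI_EN = {"kya": "what", "he": "is", "ka": "of"}

def normalize(tokens):
    """Map code-mixed tokens into a common language (English);
    tokens not found in the lexicon are assumed to be English."""
    return [HI_EN.get(tok.lower(), tok.lower()) for tok in tokens]

def unigram_features(tokens):
    """Bag-of-words featurization over the normalized tokens,
    suitable as input to an SVM-style classifier."""
    feats = {}
    for tok in normalize(tokens):
        feats[tok] = feats.get(tok, 0) + 1
    return feats
```

Featurizing over the translated tokens, rather than the raw CM words, is what reduces data sparsity in this scheme.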


International Conference on Mining Intelligence and Knowledge Exploration | 2016

Multimodal Sentiment Analysis Using Deep Neural Networks

Harika Abburi; Rajendra Prasath; Manish Shrivastava; Suryakanth V. Gangashetty

Due to the increase in online product reviews posted daily through various modalities such as video, audio, and text, sentiment analysis has gained huge attention. Recent developments in web technologies have also enabled the increase of web content in Hindi. In this paper, an approach to detect the sentiment of online Hindi product reviews based on their multimodal nature (audio and text) is presented. For each audio input, Mel Frequency Cepstral Coefficient (MFCC) features are extracted. These features are used to develop sentiment models using Gaussian Mixture Model (GMM) and Deep Neural Network (DNN) classifiers. From the results, it is observed that the DNN classifier gives better results compared to GMM. Further, textual features are extracted from the transcript of the audio input using Doc2vec vectors. A Support Vector Machine (SVM) classifier is used to develop a sentiment model using these textual features. From the experimental results, it is observed that combining both the audio and text features improves the performance of detecting the sentiment of online product reviews.
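
Combining the audio and text modalities is commonly done with score-level (late) fusion; the sketch below is one plausible reading of such a combination, with toy class scores and a fusion weight that are illustrative assumptions rather than values from the paper:

```python
def fuse(audio_scores, text_scores, w=0.5):
    """Weighted late fusion: convex combination of per-class scores
    from the audio model and the text model."""
    return {c: w * audio_scores[c] + (1 - w) * text_scores[c]
            for c in audio_scores}

def predict(fused):
    """Pick the class with the highest fused score."""
    return max(fused, key=fused.get)
```

When one modality is noisier, lowering its weight `w` lets the cleaner modality dominate the fused decision.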


Knowledge Discovery and Data Mining | 2016

Mirror on the Wall: Finding Similar Questions with Deep Structured Topic Modeling

Arpita Das; Manish Shrivastava; Manoj Kumar Chinnakotla

Internet users today prefer getting precise answers to their questions rather than sifting through a bunch of relevant documents provided by search engines. This has led to the huge popularity of Community Question Answering (cQA) services like Yahoo! Answers, Baidu Zhidao, Quora, StackOverflow, etc., where forum users respond to questions with precise answers. Over time, such cQA archives become rich repositories of knowledge encoded in the form of questions and user-generated answers. In cQA archives, retrieval of similar questions, which have already been answered in some form, is important for improving the effectiveness of such forums. The main challenge while retrieving similar questions is the lexico-syntactic gap between the user query and the questions already present in the forum. In this paper, we propose a novel approach called Deep Structured Topic Model (DSTM) to bridge the lexico-syntactic gap between the question posed by the user and forum questions. DSTM employs a two-step process: initially retrieving similar questions that lie in the vicinity of the query in the latent topic vector space, and then re-ranking them using a deep layered semantic model. Experiments on a large-scale real-life cQA dataset show that our approach outperforms the state-of-the-art translation-based and topic-based baseline approaches.
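
The two-step process (coarse retrieval in a topic vector space, then re-ranking with a finer model) can be sketched generically; the toy vectors and the re-ranking scorer below are stand-ins for the paper's topic model and deep semantic model:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve_then_rerank(query_vec, docs, rerank_score, k=2):
    """docs: list of (doc_id, topic_vec) pairs.
    Step 1: shortlist the top-k docs by cosine in topic space.
    Step 2: reorder the shortlist with the finer scoring function."""
    shortlist = sorted(docs, key=lambda d: cosine(query_vec, d[1]),
                       reverse=True)[:k]
    return sorted((d[0] for d in shortlist), key=rerank_score, reverse=True)
```

The cheap first stage prunes the candidate set so the expensive second-stage model only scores a shortlist.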


Forum for Information Retrieval Evaluation | 2014

IIIT-H System Submission for FIRE2014 Shared Task on Transliterated Search

Irshad Ahmad Bhat; Vandan Mujadia; Aniruddha Tammewar; Riyaz Ahmad Bhat; Manish Shrivastava

This paper describes our submission for the FIRE 2014 Shared Task on Transliterated Search. The shared task features two sub-tasks: Query Word Labeling and Mixed-script Ad hoc Retrieval for Hindi Song Lyrics. Query Word Labeling involves token-level language identification of query words in code-mixed queries and back-transliteration of identified Indian-language words into their native scripts. We have developed letter-based language models for the token-level language identification of query words and a structured perceptron model for back-transliteration of Indic words. The second sub-task, Mixed-script Ad hoc Retrieval for Hindi Song Lyrics, is to retrieve a ranked list of songs from a corpus of Hindi song lyrics given an input query in Devanagari or transliterated Roman script. We have used edit-distance-based query expansion and language modeling followed by relevance-based re-ranking for the retrieval of relevant Hindi song lyrics for a given query.
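
Edit-distance-based query expansion of the kind described can be sketched with a classic Levenshtein distance; the toy lyric vocabulary and threshold below are illustrative assumptions:

```python
def edit_distance(a, b):
    """Levenshtein distance via a rolling dynamic-programming row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # deletion, insertion, substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

def expand(term, vocab, max_dist=1):
    """Expand a Roman-script query term to corpus spellings within
    a small edit distance, to absorb transliteration variation."""
    return [w for w in vocab if edit_distance(term, w) <= max_dist]
```

This absorbs the spelling variation typical of Roman transliterations (e.g. "pyar" vs "pyaar") before retrieval.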


International Symposium on Neural Networks | 2017

Significance of neural phonotactic models for large-scale spoken language identification

Brij Mohan Lal Srivastava; Hari Krishna Vydana; Anil Kumar Vuppala; Manish Shrivastava

Language identification (LID) is a vital front-end for spoken dialogue systems operating in diverse linguistic settings to reduce recognition and understanding errors. Existing LID systems which use low-level signal information for classification do not scale well due to the exponential growth of parameters as the number of classes increases. They also suffer performance degradation due to the inherent variabilities of the speech signal. In the proposed approach, we model the language-specific phonotactic information in speech using a recurrent neural network for developing an LID system. The input speech signal is tokenized into phone sequences by a common language-independent phone recognizer with varying phonetic coverage. We establish a causal relationship between phonetic coverage and LID performance. The phonotactics in the observed phone sequences are modeled using statistical and recurrent neural network language models to predict language-specific symbols from a universal phonetic inventory. The proposed approach is robust, computationally lightweight, and highly scalable. Experiments show that the convex combination of statistical and recurrent neural network language model (RNNLM) based phonotactic models significantly outperforms a strong Deep Neural Network (DNN) baseline, which is itself shown to surpass the performance of an i-vector based approach for LID. The proposed approach outperforms the baseline models in terms of mean F1 score over 176 languages. Further, we provide significant information-theoretic evidence to analyze the mechanism of the proposed approach.
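
The convex combination of the two phonotactic model families amounts to linear interpolation of their per-language scores; the probabilities and interpolation weight below are illustrative assumptions, not values from the paper:

```python
def interpolate(p_ngram, p_rnnlm, lam=0.5):
    """Convex combination of the statistical (n-gram) and RNNLM
    phonotactic estimates for one phone sequence."""
    return lam * p_ngram + (1 - lam) * p_rnnlm

def identify(lang_scores):
    """lang_scores: {language: (p_ngram, p_rnnlm)} for one utterance.
    Return the language with the highest interpolated score."""
    return max(lang_scores, key=lambda l: interpolate(*lang_scores[l]))
```

The interpolation weight `lam` would typically be tuned on held-out data so that neither model family dominates.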


Cross Language Evaluation Forum | 2017

WebShodh: A Code Mixed Factoid Question Answering System for Web

Khyathi Raghavi Chandu; Manoj Kumar Chinnakotla; Alan W. Black; Manish Shrivastava

Code-Mixing (CM) is a natural phenomenon observed in many multilingual societies and is becoming the preferred medium of expression and communication in online and social media fora. In spite of this, current Question Answering (QA) systems do not support CM and are only designed to work with a single interaction language. This assumption makes it inconvenient for multilingual users to interact naturally with the QA system, especially in scenarios where they do not know the right word in the target language. In this paper, we present WebShodh, an end-to-end web-based factoid QA system for CM languages. We demonstrate our system with two CM language pairs: Hinglish (Matrix language: Hindi, Embedded language: English) and Tenglish (Matrix language: Telugu, Embedded language: English). Lack of language resources such as annotated corpora, POS taggers, or parsers for CM languages poses a huge challenge for automated processing and analysis. In view of this resource scarcity, we only assume the existence of bilingual dictionaries from the matrix languages to English and use them for lexically translating the question into English. Later, we use this loosely translated question for downstream analysis such as Answer Type (AType) prediction, answer retrieval, and ranking. Evaluation of our system reveals that we achieve an MRR of 0.37 and 0.32 for Hinglish and Tenglish respectively. We hosted this system online and plan to leverage it for collecting more CM question-and-answer data for further improvement.


IEEE Region 10 Conference | 2016

Improved multimodal sentiment detection using stressed regions of audio

Harika Abburi; Manish Shrivastava; Suryakanth V. Gangashetty

Recent advancements in social media have led people to share product reviews through various modalities such as audio, text, and video. In this paper, an improved approach to detect the sentiment of online spoken reviews based on their multimodal nature (audio and text) is presented. To extract the sentiment from audio, Mel Frequency Cepstral Coefficient (MFCC) features are extracted at stressed regions, which are detected based on the strength of excitation. A Gaussian Mixture Model (GMM) classifier is employed to develop a sentiment model using these features. From the results, it is observed that MFCC features extracted at stressed regions perform better than features extracted from the whole audio input. Further, textual features are computed from the transcript of the audio input using Doc2vec vectors. A Support Vector Machine (SVM) classifier is used to develop a sentiment model using these textual features. From the experimental results, it is observed that combining both the audio and text features improves the performance of detecting the sentiment of a review.


International Conference on the Theory of Information Retrieval | 2018

Towards Word Embeddings for Improved Duplicate Bug Report Retrieval in Software Repositories

Amar Budhiraja; Kartik Dutta; Manish Shrivastava; Raghu Reddy

A key part of software maintenance is bug reporting and rectification. Bug reporting is a major activity and, due to its asynchronous nature, duplicate bug reporting is common. Detecting duplicate bug reports is an important task in software maintenance in order to avoid the assignment of the same bug to different developers. In this paper, we explore the notion of using word embeddings for retrieving duplicate bug reports in large software repositories. We discuss an approach to model each bug report as a dense vector and retrieve its top-k most similar reports for duplicate bug report detection. Through experiments on two real-world datasets, we show that word embeddings perform better than baselines and related approaches and have the potential to improve duplicate bug report retrieval.
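
The idea of modeling each report as a dense vector and retrieving its top-k most similar reports can be sketched as averaged word embeddings plus cosine ranking; the tiny 2-d "embeddings" below are toy values, not trained vectors:

```python
import math

# Toy 2-d word embeddings (illustrative stand-ins for trained vectors).
EMB = {"crash": [1.0, 0.0], "startup": [0.8, 0.2],
       "render": [0.0, 1.0], "font": [0.1, 0.9]}

def report_vector(tokens):
    """Represent a bug report as the average of its word vectors."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(2)] if n else [0.0, 0.0]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k(query_tokens, reports, k=1):
    """reports: {report_id: token list}. Rank candidate duplicates of
    the query report by cosine similarity of the averaged vectors."""
    q = report_vector(query_tokens)
    ranked = sorted(reports,
                    key=lambda r: cosine(q, report_vector(reports[r])),
                    reverse=True)
    return ranked[:k]
```

In practice the embeddings would come from a model trained on the repository's own reports, so domain terms land near each other.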


International Conference on Software Engineering | 2018

Poster: LWE: LDA Refined Word Embeddings for Duplicate Bug Report Detection

Amar Budhiraja; Raghu Reddy; Manish Shrivastava

Bug reporting is a major part of software maintenance and, due to its inherently asynchronous nature, duplicate bug reporting has become fairly common. Detecting duplicate bug reports is an important task in order to avoid the assignment of the same bug to different developers. Earlier approaches have improved duplicate bug report detection by using the notions of word embeddings, topic models, and other machine learning approaches. In this poster, we attempt to combine Latent Dirichlet Allocation (LDA) and word embeddings to leverage the strengths of both approaches for this task. As a first step towards this idea, we present an initial analysis and an approach which is able to outperform both word embeddings and LDA for this task. We validate our hypothesis on a real-world dataset from the Firefox project and show that there is potential in combining both LDA and word embeddings for duplicate bug report detection.

Collaboration


Dive into Manish Shrivastava's collaborations.

Top Co-Authors

Anil Kumar Vuppala (International Institute of Information Technology)
Hari Krishna Vydana (International Institute of Information Technology)
Harika Abburi (International Institute of Information Technology)
Suryakanth V. Gangashetty (International Institute of Information Technology)
Brij Mohan Lal Srivastava (International Institute of Information Technology)
Raghu Reddy (International Institute of Information Technology)
Arpita Das (International Institute of Information Technology)
Kartik Dutta (International Institute of Information Technology)