Manoj Kumar Chinnakotla

Publication

Featured research published by Manoj Kumar Chinnakotla.

international world wide web conferences | 2015

Answer ka type kya he?: Learning to Classify Questions in Code-Mixed Language

Khyathi Chandu Raghavi; Manoj Kumar Chinnakotla; Manish Shrivastava

Code-Mixing (CM) is defined as the embedding of linguistic units such as phrases, words, and morphemes of one language into an utterance of another language. CM is a natural phenomenon observed in many multilingual societies. It helps speed up communication and allows a wider variety of expression, which has made it a popular mode of communication in social media forums like Facebook and Twitter. However, current Question Answering (QA) research and systems only support expressing a question in a single language, which is an unrealistic and hard proposition, especially for certain domains like health and technology. In this paper, we take the first step towards the development of a full-fledged QA system for CM language, which is building a Question Classification (QC) system. The QC system analyzes the user question and infers the expected Answer Type (AType). The AType helps in locating and verifying the answer as it imposes certain type-specific constraints. We learn a basic Support Vector Machine (SVM) based QC system for English-Hindi CM questions. Due to the inherent complexities involved in processing CM language and the unavailability of language processing resources such as POS taggers, chunkers, and parsers, we design our current system using only word-level resources such as language identification, transliteration and lexical translation. To reduce data sparsity and leverage resources available in a resource-rich language, instead of extracting features directly from the original CM words, we translate them into English as a common language and then perform featurization. We created an evaluation dataset for this task, and our system achieves an accuracy of 63% and 45% on the coarse-grained and fine-grained categories of the question taxonomy, respectively. Translating features into English indeed improves accuracy over the unigram baseline.
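The translate-then-featurize idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon `TOY_LEXICON` and the helpers `translate_to_english` and `unigram_features` are hypothetical names, and the dictionary entries are invented for illustration. In the system described above, language identification, transliteration and a bilingual dictionary would fill this role, and the resulting features would be passed to an SVM classifier.

```python
from collections import Counter

# Hypothetical toy lexicon: romanized Hindi word -> English gloss.
# These entries are illustrative assumptions, not data from the paper.
TOY_LEXICON = {
    "ka": "of",
    "kya": "what",
    "he": "is",
    "kahan": "where",
}

def translate_to_english(tokens, lexicon=TOY_LEXICON):
    """Map each token to its English gloss; unknown tokens are kept as-is."""
    return [lexicon.get(tok.lower(), tok.lower()) for tok in tokens]

def unigram_features(tokens):
    """Bag-of-words counts over the translated tokens."""
    return Counter(tokens)

question = "Answer ka type kya he"
translated = translate_to_english(question.split())
features = unigram_features(translated)
print(translated)
```

Because every question is mapped into the same English vocabulary before counting, code-mixed and English-only phrasings of the same question share features, which is the sparsity reduction the abstract refers to.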

knowledge discovery and data mining | 2016

Mirror on the Wall: Finding Similar Questions with Deep Structured Topic Modeling

Arpita Das; Manish Shrivastava; Manoj Kumar Chinnakotla

Internet users today prefer getting precise answers to their questions rather than sifting through a set of relevant documents returned by search engines. This has led to the huge popularity of Community Question Answering (cQA) services like Yahoo! Answers, Baidu Zhidao, Quora, and StackOverflow, where forum users respond to questions with precise answers. Over time, such cQA archives become rich repositories of knowledge encoded in the form of questions and user-generated answers. In cQA archives, retrieval of similar questions, which have already been answered in some form, is important for improving the effectiveness of such forums. The main challenge while retrieving similar questions is the lexico-syntactic gap between the user query and the questions already present in the forum. In this paper, we propose a novel approach called Deep Structured Topic Model (DSTM) to bridge the lexico-syntactic gap between the question posed by the user and forum questions. DSTM employs a two-step process: it first retrieves similar questions that lie in the vicinity of the query in the latent topic vector space, and then re-ranks them using a deep layered semantic model. Experiments on a large-scale real-life cQA dataset show that our approach outperforms state-of-the-art translation-based and topic-based baseline approaches.
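The two-step retrieve-then-re-rank process described above can be sketched generically as follows. This is only an illustration of the pattern, not DSTM itself: the topic vectors, the `toy_semantic_scores` table (a stand-in for the deep semantic model's scores) and the function names are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_then_rerank(query_vec, candidates, rerank_score, k=2):
    """candidates is a list of (question_id, topic_vector) pairs."""
    # Step 1: keep the k candidates closest to the query in topic space.
    shortlist = sorted(
        candidates, key=lambda c: cosine(query_vec, c[1]), reverse=True
    )[:k]
    # Step 2: re-order only the shortlist with a second, costlier scorer.
    return sorted((qid for qid, _ in shortlist), key=rerank_score, reverse=True)

# Hypothetical topic vectors for three archived questions.
candidates = [("q1", [0.9, 0.1]), ("q2", [0.8, 0.2]), ("q3", [0.1, 0.9])]
# Hypothetical scores standing in for a learned semantic model.
toy_semantic_scores = {"q1": 0.4, "q2": 0.7, "q3": 0.9}

ranked = retrieve_then_rerank([1.0, 0.0], candidates, toy_semantic_scores.get, k=2)
print(ranked)
```

In this toy run, q3 never reaches the second step even though its semantic score is highest, because it is far from the query in topic space. The first step bounds how many candidates the costlier model has to score.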

pacific-asia conference on knowledge discovery and data mining | 2017

Convolutional Bi-directional LSTM for Detecting Inappropriate Query Suggestions in Web Search

Harish Yenala; Manoj Kumar Chinnakotla; Jay Kumar Goyal

A web search query is considered inappropriate if it may cause anger or annoyance to certain users, exhibits a lack of respect, rudeness or discourteousness towards certain individuals or communities, or may be capable of inflicting harm on oneself or others. A search engine should regulate its query completion suggestions by detecting and filtering such queries, as they may hurt user sentiments or lead to legal issues, thereby tarnishing the brand image. Hence, automatic detection and pruning of such inappropriate queries from completions and related search suggestions is an important problem for most commercial search engines. The problem is rendered difficult by unique challenges posed by search queries, such as lack of sufficient context, natural language ambiguity, and the presence of spelling mistakes and variations.

cross language evaluation forum | 2017

WebShodh: A Code Mixed Factoid Question Answering System for Web

Khyathi Raghavi Chandu; Manoj Kumar Chinnakotla; Alan W. Black; Manish Shrivastava

Code-Mixing (CM) is a natural phenomenon observed in many multilingual societies and is becoming the preferred medium of expression and communication in online and social media fora. In spite of this, current Question Answering (QA) systems do not support CM and are only designed to work with a single interaction language. This assumption makes it inconvenient for multilingual users to interact naturally with the QA system, especially in scenarios where they do not know the right word in the target language. In this paper, we present WebShodh, an end-to-end web-based factoid QA system for CM languages. We demonstrate our system with two CM language pairs: Hinglish (Matrix language: Hindi, Embedded language: English) and Tenglish (Matrix language: Telugu, Embedded language: English). Lack of language resources such as annotated corpora, POS taggers or parsers for CM languages poses a huge challenge for automated processing and analysis. In view of this resource scarcity, we only assume the existence of bilingual dictionaries from the matrix languages to English and use them for lexically translating the question into English. We then use this loosely translated question for our downstream analysis, such as Answer Type (AType) prediction, answer retrieval and ranking. Evaluation of our system shows that we achieve an MRR of 0.37 and 0.32 for Hinglish and Tenglish respectively. We hosted this system online and plan to leverage it for collecting more CM question and answer data for further improvement.
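For readers unfamiliar with the MRR figures quoted above, the following minimal sketch shows how Mean Reciprocal Rank is conventionally computed: each question contributes the reciprocal of the rank of its first correct answer, or 0 if no correct answer is returned. The function name and the toy questions are illustrative assumptions and are not taken from the WebShodh evaluation.

```python
def mean_reciprocal_rank(ranked_answers, gold_answers):
    """Average of 1/rank of the first correct answer (0 when it is missing)."""
    total = 0.0
    for ranked, gold in zip(ranked_answers, gold_answers):
        for rank, answer in enumerate(ranked, start=1):
            if answer == gold:
                total += 1.0 / rank
                break
    return total / len(gold_answers)

# Toy data: correct answer at rank 1, at rank 2, and not returned at all.
system_output = [
    ["Delhi", "Mumbai"],
    ["1950", "1947"],
    ["Ganga", "Yamuna"],
]
gold = ["Delhi", "1947", "Godavari"]

mrr = mean_reciprocal_rank(system_output, gold)
print(mrr)  # (1 + 1/2 + 0) / 3 = 0.5
```

Under this definition, an MRR of 0.37 means the reciprocal rank of the first correct answer averages 0.37 across the test questions.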

international acm sigir conference on research and development in information retrieval | 2018

Lessons from Building a Large-scale Commercial IR-based Chatbot for an Emerging Market

Manoj Kumar Chinnakotla; Puneet Agrawal

In this work, we highlight some interesting challenges faced when trying to build a large-scale commercial IR-based chatbot, Ruuh, for an emerging market like India, which has unique characteristics such as high linguistic and cultural diversity, a large young population, and the second largest mobile market in the world. We set out to build a human-like AI agent which aspires to become the trusted friend of every Indian youth. To meet this objective, we realised that we need to think beyond the utilitarian notion of merely generating relevant responses and enable the agent to comprehend and meet a wider range of user social needs, like expressing happiness when the user's favourite team wins, sharing a cute comment when shown pictures of the user's pet, and so on. The agent should also be well-versed in the informal language of urban Indian youth, which often includes slang and code-mixing across two or more languages (English and their native language). Finally, in order to be their trusted friend, the agent has to communicate with respect, without offending their sentiments and emotions. Some of the above objectives pose significant research challenges in the areas of NLP, IR and AI. We take the audience through our journey of how we tackled some of these challenges while building a large-scale commercial IR-based conversational agent. Our attempts to solve them have also resulted in some interesting research contributions in the form of publications and patents in these areas. Our chatbot currently has more than 1M users who have engaged in more than 70M conversations.

Journal of data science | 2017

Deep learning for detecting inappropriate content in text

Harish Yenala; Ashish Jhanwar; Manoj Kumar Chinnakotla; Jay Kumar Goyal

Today, there are a large number of online discussion fora on the internet which are meant for users to express, discuss and exchange their views and opinions on various topics. For example, news portals, blogs, and social media channels such as YouTube typically allow users to express their views through comments. In such fora, it has often been observed that user conversations quickly derail and become inappropriate, with users hurling abuse or passing rude and discourteous comments on individuals or certain groups/communities. Similarly, some virtual agents or bots have also been found to respond to users with inappropriate messages. As a result, inappropriate messages or comments are turning into an online menace, slowly degrading the effectiveness of user experiences. Hence, automatic detection and filtering of such inappropriate language has become an important problem for improving the quality of conversations with users as well as virtual agents. In this paper, we propose a novel deep learning-based technique for automatically identifying such inappropriate language. We especially focus on solving this problem in two application scenarios: (a) query completion suggestions in search engines and (b) user conversations in messengers. Detecting inappropriate language is challenging due to various natural language phenomena such as spelling mistakes and variations, polysemy, contextual ambiguity and semantic variations. For identifying inappropriate query suggestions, we propose a novel deep learning architecture called “Convolutional Bi-Directional LSTM” (C-BiLSTM), which combines the strengths of both Convolutional Neural Networks (CNN) and Bi-directional LSTMs (BLSTM). For filtering inappropriate conversations, we use LSTM and Bi-directional LSTM (BLSTM) sequential models. The proposed models do not rely on hand-crafted features, are trained end-to-end as a single model, and effectively capture both local features and their global semantics. Evaluating C-BiLSTM, LSTM and BLSTM models on real-world search queries and conversations reveals that they significantly outperform both pattern-based and other hand-crafted feature-based baselines.

international conference on artificial intelligence | 2015

Did you know?: Mining interesting trivia for entities from Wikipedia

Abhay Prakash; Manoj Kumar Chinnakotla; Dhaval Patel; Puneet Garg

international acm sigir conference on research and development in information retrieval | 2018

Lessons from Building a Large-scale Commercial IR-based Chatbot for an Emerging Market

Puneet Agrawal; Manoj Kumar Chinnakotla

arXiv: Computation and Language | 2018

Ruuh: A Deep Learning Based Conversational Social Agent.

Sonam Damani; Nitya Raviprakash; Umang Gupta; Ankush Chatterjee; Meghana Joshi; Khyatti Gupta; Kedhar Nath Narahari; Puneet Agrawal; Manoj Kumar Chinnakotla; Sneha Magapu; Abhishek Mathur

arXiv: Information Retrieval | 2017