Publication


Featured research published by Nitendra Rajput.


acm workshop on networked systems for developing regions | 2007

WWTW: the world wide telecom web

Arun Kumar; Nitendra Rajput; Dipanjan Chakraborty; Sheetal K. Agarwal; Amit Anil Nanavati

The World Wide Web (WWW) enabled quick and easy information dissemination and brought about fundamental changes to various aspects of our lives. However, a very large number of people, mostly in developing regions, are still untouched by this revolution. Compared to PCs, the primary access mechanism to WWW, mobile phones have made a phenomenal penetration into this population segment. Low cost of ownership, the simple user interface consisting of a small keyboard, limited menu and voice-based access contribute to the success of mobile phones with the less literate. However, apart from basic voice communication, these people are unable to exploit the benefits of information and services available to WWW users. In this paper, we present the World Wide Telecom Web (WWTW), our vision of a voice-driven ecosystem parallel to that of the WWW. WWTW is a network of interconnected voice sites that are voice-driven applications created by users and hosted in the network. It has the potential to enable the underprivileged population to become a part of the next generation converged networked world. We present a whole gamut of existing technology enablers for our vision as well as present research directions and open challenges that need to be solved to not only realize a WWTW but also to enable the two Webs to cross-leverage each other.


international conference on multimedia and expo | 2000

Translingual visual speech synthesis

Tanveer A. Faruquie; Chalapathy Neti; Nitendra Rajput; L.V. Subramaniam; Abhishek Verma

Audio-driven facial animation is an interesting and evolving technique for human-computer interaction. Based on an incoming audio stream, a face image is animated with full lip synchronization. This requires a speech recognition system in the language in which audio is provided to get the time alignment for the phonetic sequence of the audio signal. However, building a speech recognition system is data intensive and is a very tedious and time consuming task. We present a novel scheme to implement a language independent system for audio-driven facial animation given a speech recognition system for just one language, in our case, English. The method presented here can also be used for text to audio-visual speech synthesis.


human factors in computing systems | 2009

A comparative study of speech and dialed input voice interfaces in rural India

Neil Patel; Sheetal K. Agarwal; Nitendra Rajput; Amit Anil Nanavati; Paresh Dave; Tapan S. Parikh

In this paper we present a study comparing speech and dialed input voice user interfaces for farmers in Gujarat, India. We ran a controlled, between-subjects experiment with 45 participants. We found that the task completion rates were significantly higher with dialed input, particularly for subjects under age 30 and those with less than an eighth grade education. Additionally, participants using dialed input demonstrated a significantly greater performance improvement from the first to final task, and reported less difficulty providing input to the system.


IBM Journal of Research and Development | 2004

A large-vocabulary continuous speech recognition system for Hindi

Mohit Kumar; Nitendra Rajput; Ashish Verma

In this paper we present two new techniques that have been used to build a large-vocabulary continuous Hindi speech recognition system. We present a technique for fast bootstrapping of initial phone models of a new language. The training data for the new language is aligned using an existing speech recognition engine for another language. This aligned data is used to obtain the initial acoustic models for the phones of the new language. Following this approach requires less training data. We also present a technique for generating baseforms (phonetic spellings) for phonetic languages such as Hindi. As is inherent in phonetic languages, rules generally capture the mapping of spelling to phonemes very well. However, deep linguistic knowledge is required to write all possible rules, and there are some ambiguities in the language that are difficult to capture with rules. On the other hand, pure statistical techniques for baseform generation require large amounts of training data that are not readily available. We propose a hybrid approach that combines rule-based and statistical approaches in a two-step fashion. We evaluate the performance of the proposed approaches through various phonetic classification and recognition experiments.


information and communication technologies and development | 2009

Content creation and dissemination by-and-for users in rural areas

Sheetal K. Agarwal; Arun Kumar; Amit Anil Nanavati; Nitendra Rajput

83% of the world population does not have access to the Internet. Therefore, there is a need for a simple and affordable interaction technology that can enable easy content creation and dissemination for this population. In this paper, we present the design, development and usage pattern of a VoiKiosk system that provides a voice-based kiosk solution for people in rural areas. This system is accessible by phone and thus meets the affordability and low literacy requirements. We present usability results gathered from usage by more than 900 villagers during four months of the on-field deployment of the system. The on-field experiments suggest the importance of locally created content in their own language for this population. The system provides interesting insights about the manner in which this community can create and manage information. Based on the use of the system in the four months, the VoiKiosk also suggests a mechanism to enable social networking for the rural population.


international conference on acoustics, speech, and signal processing | 2012

The Spoken Web Search Task at MediaEval 2011

Florian Metze; Nitendra Rajput; Xavier Anguera; Marelie H. Davel; Guillaume Gravier; Charl Johannes van Heerden; Gautam Varma Mantena; Armando Muscariello; Kishore Prahallad; Igor Szöke; Javier Tejedor

In this paper, we describe the “Spoken Web Search” Task, which was held as part of the 2011 MediaEval benchmark campaign. The purpose of this task was to perform audio search with audio input in four languages, with very few resources being available in each language. The data was taken from “spoken web” material collected over mobile phone connections by IBM India. We present results from several independent systems, developed by five teams and using different approaches, compare them, and provide analysis and directions for future research.


world of wireless mobile and multimedia networks | 2007

VOISERV: Creation and Delivery of Converged Services through Voice for Emerging Economies

Arun Kumar; Nitendra Rajput; Dipanjan Chakraborty; Sheetal K. Agarwal; Amit Anil Nanavati

WWW has made information accessible to computer users in various ways not imagined before. However, there is a huge pool of people, especially in emerging economies, who are still untouched by this revolution and are either unaware of, or unable to join, this bandwagon. Mobile phones are increasingly empowering the under-privileged to utilize data and services beyond the basic voice communication. However, factors such as high illiteracy rate, cost sensitivity, and user interface issues prevent these users from deriving benefits of available infrastructure and services. We have developed a novel system, VOISERV, that enables ordinary telephone subscribers to create, deploy and offer their own customized voice-driven applications called VoiceSites. The generated VoiceSites get hosted in the network for low cost of ownership and maintenance, and are integrated with advanced services available in the converged networks of today.


international world wide web conferences | 2008

Organizing the unorganized - employing IT to empower the under-privileged

Arun Kumar; Nitendra Rajput; Sheetal K. Agarwal; Dipanjan Chakraborty; Amit Anil Nanavati

Various sectors in developing countries are typically dominated by the presence of a large number of small and micro-businesses that operate in an informal, unorganized manner. Many of these are single person run micro-businesses and cannot afford to buy and maintain their own IT infrastructure. For others, easy availability of cheap labour provides a convenient alternative even though it results in inefficiency, as little or no records are maintained, and only manual, paper-based processes are followed. This results in high response times for customers, no formal accountability and higher charges. For the businesses this translates to lower earnings and losses due to inefficiencies. In this paper, we look at a few such micro-business segments and explore their current models of operation, while identifying existing inefficiencies and pain points. We build upon the findings and propose an approach for delivering benefits of IT solutions to such micro-business segments. Finally, we present technology that realizes the proposed approach in the specific context of two such segments.


multimedia signal processing | 1999

Audio-visual large vocabulary continuous speech recognition in the broadcast domain

Sankar Basu; Chalapathy Neti; Nitendra Rajput; Andrew W. Senior; L. Subramaniam; Ashish Verma

Considers the problem of combining visual cues with audio signals for the purpose of improved automatic machine recognition of speech. Although significant progress has been made in the machine transcription of large-vocabulary continuous speech (LVCSR) over the last few years, the technology to date is most effective only under controlled conditions, such as low noise, speaker-dependent recognition, read speech (as opposed to conversational speech), etc. On the other hand, while augmenting the recognition of speech utterances with visual cues has attracted the attention of researchers over the last couple of years, most efforts in this domain can be considered to be only preliminary in the sense that, unlike LVCSR efforts, tasks have been limited to small vocabularies (e.g. commands, digits) and often to speaker-dependent training or isolated word speech, where word boundaries are artificially well-defined.


acm conference on hypertext | 2007

HSTP: hyperspeech transfer protocol

Sheetal K. Agarwal; Dipanjan Chakraborty; Arun Kumar; Amit Anil Nanavati; Nitendra Rajput

HTTP provides a mechanism to connect web sites. Almost all sites have a large amount of hypertext content that provides connection to other sites in the World Wide Web. The success of the WWW can be partly attributed to the seamlessly browsable web that is formed through this connectivity. However, navigation of hypermedia content through non-visual interfaces has not received as much attention. Specifically, telephony voice applications offer immense usability and penetration benefits and can act as alternate information access and delivery mechanisms. Connectivity across voice applications poses interesting and novel challenges. In this paper, we define Hyperspeech as a voice fragment in a voice application that is a hyperlink to a voice fragment in another voice application. Further, we present Hyperspeech Transfer Protocol (HSTP) - a protocol to seamlessly connect telephony voice applications. HSTP enables the users to browse across voice applications by navigating the Hyperspeech content in a voice application. HSTP can also be used for developing cross-enterprise applications that allow a user to transact across two or more voice applications.
