Publication


Featured research published by L. V. Subramaniam.


Proceedings of the 1st International Workshop on Multimodal Crowd Sensing | 2012

Harnessing the crowds for smart city sensing

Haggai Roitman; Jonathan Mamou; Sameep Mehta; Aharon Satt; L. V. Subramaniam

In this work we discuss the challenge of harnessing the crowd for smart city sensing. Within a city's context, reports from citizen or visitor eyewitnesses can provide important information to city officials, in addition to more traditional data gathered by other means (e.g., through the city's control center, emergency services, or sensors spread across the city). We present a high-level overview of a novel crowd sensing system that we are developing at IBM for the smart cities domain. As a proof of concept, we present some preliminary results using public safety as our example use case.


Extending Database Technology | 2013

Processing multi-way spatial joins on map-reduce

Himanshu Gupta; Bhupesh Chawda; Sumit Negi; Tanveer A. Faruquie; L. V. Subramaniam; Mukesh K. Mohania

In this paper we investigate the problem of processing multi-way spatial joins on a map-reduce platform. We look at two common spatial predicates: overlap and range. We address these two classes of join queries, discuss the challenges and outline novel approaches for executing them on a map-reduce framework. We then discuss how join queries involving both overlap and range predicates can be processed. Specifically, we present a Controlled-Replicate framework with which we design the approaches presented in this paper. The Controlled-Replicate framework is carefully engineered to minimize communication among cluster nodes. Through experimental evaluations we discuss the complexity of the problem under investigation, detail the Controlled-Replicate framework, and demonstrate that the proposed approaches comfortably outperform naive approaches.
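The core replicate-and-deduplicate idea behind such frameworks can be illustrated with a minimal single-process sketch: each rectangle is replicated to every grid cell it touches, pairs are joined locally per cell, and each overlapping pair is reported only in the cell containing the top-left corner of its intersection so it is emitted exactly once. This is an illustrative toy, not the paper's Controlled-Replicate framework; `cells_for` and the reporting rule are assumptions.

```python
from collections import defaultdict

def cells_for(rect, cell_size):
    """Grid cells that a rectangle (x1, y1, x2, y2) overlaps."""
    x1, y1, x2, y2 = rect
    return [(cx, cy)
            for cx in range(int(x1 // cell_size), int(x2 // cell_size) + 1)
            for cy in range(int(y1 // cell_size), int(y2 // cell_size) + 1)]

def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def spatial_join(r_set, s_set, cell_size=10.0):
    """Map step: replicate each rectangle to every cell it touches.
    Reduce step: join within each cell; report a pair only in the cell
    holding the top-left corner of its intersection, avoiding duplicates."""
    buckets = defaultdict(lambda: ([], []))
    for r in r_set:
        for c in cells_for(r, cell_size):
            buckets[c][0].append(r)
    for s in s_set:
        for c in cells_for(s, cell_size):
            buckets[c][1].append(s)
    out = set()
    for (cx, cy), (rs, ss) in buckets.items():
        for r in rs:
            for s in ss:
                if overlaps(r, s):
                    ix, iy = max(r[0], s[0]), max(r[1], s[1])
                    if int(ix // cell_size) == cx and int(iy // cell_size) == cy:
                        out.add((r, s))
    return out
```

Replicating to every touched cell is what drives the communication cost that Controlled-Replicate is engineered to minimize.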


Analytics for Noisy Unstructured Text Data | 2011

Adapting a WSJ trained part-of-speech tagger to noisy text: preliminary results

Phani Gadde; L. V. Subramaniam; Tanveer A. Faruquie

With the increase in the number of people communicating through the internet, there has been a steady increase in the amount of text available online. Much of this text differs from the standard language, as people use various kinds of short forms for words to save time and effort. We call this noisy text. Part-of-speech (POS) tagging has reached high levels of accuracy, enabling the use of automatic POS tags in various language processing tasks; however, tagging performance degrades very fast on noisy text. This paper is an attempt to adapt a state-of-the-art English POS tagger, trained on the Wall Street Journal (WSJ) corpus, to noisy text. We classify the noise in text into different types and evaluate the tagger with respect to each type of noise. We attack the problem of tagging noisy text in two ways: (a) overcoming noise as a post-processing step to tagging, and (b) cleaning the noise before tagging. We propose techniques for both approaches and critically compare them based on error analysis. We demonstrate the proposed models on a Short Message Service (SMS) dataset, achieving a significant improvement over the baseline accuracy of a state-of-the-art English POS tagger on noisy words.
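The "clean then tag" strategy can be sketched as a dictionary-based normalizer applied before handing the sentence to a standard POS tagger. The lexicon below is a hypothetical toy, not the paper's resource:

```python
# Toy pre-tagging cleanup: look up common SMS short forms before tagging.
# The mapping is illustrative only; real systems use much larger lexicons
# plus rules for the noise types the paper classifies.
SMS_LEXICON = {"u": "you", "r": "are", "gr8": "great", "2moro": "tomorrow",
               "pls": "please", "thx": "thanks"}

def normalize_sms(tokens):
    """Replace known short forms; leave unknown tokens untouched."""
    return [SMS_LEXICON.get(t.lower(), t) for t in tokens]
```

After normalization, a WSJ-trained tagger sees vocabulary much closer to its training distribution, which is the intuition behind the cleaning route.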


European Conference on Information Retrieval | 2013

Discovery and analysis of evolving topical social discussions on unstructured microblogs

Kanika Narang; Seema Nagar; Sameep Mehta; L. V. Subramaniam; Kuntal Dey

Social networks have emerged as hubs of user-generated content. Online social conversations can be used to infer users' interests in given topics and trends. Microblogging platforms like Twitter are primary examples of social networks with significant volumes of topical message exchanges between users. However, unlike traditional online discussion forums, blogs and social networking sites, explicit discussion threads are absent from microblogging networks like Twitter. This inherent absence of any conversation framework makes it challenging to distinguish conversations from mere topical interests. In this work, we explore semantic, social and temporal relationships among topical clusters formed on Twitter to identify conversations. We devise an algorithm comprising text clustering, topical similarity detection using TF-IDF and WordNet, and the intersection of social, semantic and temporal graphs to discover social conversations around topics. We further qualitatively show the presence of social localization of discussion threads. Our results suggest that discussion threads evolve significantly over social networks on Twitter. Our algorithm for finding social discussion threads can be used in settings such as social information spreading applications and information diffusion analyses on microblog networks.
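The TF-IDF part of the topical similarity step can be sketched as a generic cosine-over-TF-IDF computation; the WordNet expansion and the graph intersection steps of the paper's algorithm are omitted here:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per doc."""
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))                       # document frequency per term
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append({t: (c / len(d)) * math.log(n / df[t])
                     for t, c in tf.items()})
    return vecs

def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Clusters whose pairwise similarity exceeds a threshold would then be candidates for the same topical conversation.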


Analytics for Noisy Unstructured Text Data | 2011

Experiments with artificially generated noise for cleansing noisy text

Phani Gadde; Rahul Goutam; Rakshit Shah; Hemanth Sagar Bayyarapu; L. V. Subramaniam

Recent work shows that the problem of noisy text normalization can be treated as a machine translation (MT) problem, with convincing results. There have been supervised MT approaches that use noisy-regular parallel data for training an MT model, as well as unsupervised models that learn the translation probabilities in alternative ways and try to mimic the MT-based approach. While the supervised approaches suffer from data annotation and domain adaptation difficulties, the unsupervised models lack a holistic approach catering to all types of noise. In this paper, we propose an algorithm to artificially generate noisy text in a controlled way from any regular English text. We see this approach as an alternative to the unsupervised approaches while retaining the advantages of a parallel-corpus-based MT approach. We generate parallel noisy text from two widely used regular English datasets and test the MT-based approach for text normalization. We also tried semi-supervised approaches to explore ways of improving the parallel-corpus (manually annotated) MT approach using the generated noisy text. We present an extensive analysis comparing our approaches with both the supervised and unsupervised approaches.
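A controlled noise generator of this general kind might, as a rough sketch, apply common short-form substitutions and vowel deletion with a tunable probability. The substitution table and noise types below are illustrative assumptions, not the paper's algorithm:

```python
import random

def add_noise(text, p=0.3, seed=42):
    """Artificially noise regular text: apply short-form substitutions and
    word-internal vowel deletion, each with probability p. Illustrative only;
    pairing the output with the input yields a toy parallel corpus."""
    rng = random.Random(seed)
    subs = {"you": "u", "are": "r", "to": "2", "for": "4", "great": "gr8"}
    out = []
    for w in text.lower().split():
        if w in subs and rng.random() < p:
            out.append(subs[w])                 # known short form
        elif len(w) > 3 and rng.random() < p:
            # drop non-initial vowels, a common SMS-style compression
            out.append(w[0] + "".join(c for c in w[1:] if c not in "aeiou"))
        else:
            out.append(w)
    return " ".join(out)
```

Running the generator over a clean corpus gives aligned (noisy, clean) sentence pairs on which an MT model can be trained without manual annotation.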


Web Information Systems Engineering | 2013

Topical Discussions on Unstructured Microblogs: Analysis from a Geographical Perspective

Seema Nagar; Kanika Narang; Sameep Mehta; L. V. Subramaniam; Kuntal Dey

Social networks today have emerged as hotbeds of online user conversations. Social microblog sites like Twitter have become favorite portals for users to discuss and express opinions on events and topics. Established event detection techniques on microblog streams are capable of detecting events early in their lifecycle, amidst large volumes of user message exchanges. Techniques have been proposed in the literature to identify topical conversations on microblogging portals comprising unstructured data with no explicit discussion thread, distinguishing such conversations from isolated expressions of topical interest. However, the evolution of discussion topics has not been studied in a geographical context before. In the current work, we identify and characterize topical discussions at different geographical granularities, such as countries and cities. We observe geographical localization of the evolution of topical discussions. Experimental results suggest that these discussion threads tend to evolve more strongly over geographically finer granularities: they evolve more at city level than at country level, and more at country level than globally. Our algorithm for finding the geographical evolution of discussion sequences, and the derived insights, can be used for information spread analyses and related applications on microblogging networks.


International Conference on Service Operations and Logistics, and Informatics | 2013

Edge analytics as service — A service oriented framework for real time and personalised recommendation analytics

Soujanya Soni; Kanika Narang; Tanveer A. Faruquie; Vishal S. Batra; L. V. Subramaniam

With the advent of technology and the internet over the past few years, a significant number of customers have started shopping online and accessing their bank accounts through channels like net banking and mobile banking. In this paper, we describe an Edge Analytics framework that delivers analytics as a service, which can be hosted by a financial institution such as a bank to deliver personalized offers in real time on its net banking portal or other channels. The edge analytics service is accessed through edge APIs plugged into net banking portals. It uses the recent transactions in the log to determine offers for the customer, saving considerable resources and effort compared with fetching the data from the main warehouse. The edge analytics server is further enhanced by incorporating knowledge such as users' intent and interests from their social media profiles, obtained by identifying them on online social networks. This information is then fed into a rule engine to generate customised offers for each user using both enterprise and social information. The Edge Analytics service is hosted on the cloud, which makes it scalable and allows third-party providers to use the service easily.


Expert Systems With Applications | 2017

AnnoFin–A hybrid algorithm to annotate financial text

Ananda Swarup Das; Sameep Mehta; L. V. Subramaniam

In this work, we study the problem of annotating a large volume of financial text by learning from a small set of human-annotated training data. The training data is prepared by randomly selecting text sentences from the large corpus of financial text. Conventionally, a bootstrapping algorithm is used to annotate a large volume of unlabeled data by learning from a small set of annotated data; however, that small set has to be carefully chosen as seed data. Our approach thus departs from conventional bootstrapping in that we let users randomly select the seed data. We show that our proposed algorithm has an accuracy of 73.56% in classifying the financial texts into the different categories ("Accounting", "Cost", "Employee", "Financing", "Sales", "Investments", "Operations", "Profit", "Regulations" and "Irrelevant") even when the training data is just 30% of the total data set. The accuracy improves by roughly 2% on average for each 10% increase in training data, reaching 77.91% when the training data is about 50% of the total data set. Since a dictionary of hand-chosen keywords prepared by domain experts is often used for financial text extraction, we assumed the existence of almost linearly separable hyperplanes between the different classes, and therefore used a linear Support Vector Machine along with a modified version of the Label Propagation algorithm, which exploits the notion of neighborhood (in Euclidean space) for classification. We believe the proposed techniques will help Early Warning Systems used in banks, where large volumes of unstructured text need to be processed for better insights about a company.
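The neighborhood-based label propagation component can be illustrated with a toy stand-in: unlabeled points repeatedly adopt the majority label of their k nearest Euclidean neighbours. The paper's modified algorithm and the linear SVM stage are not reproduced; this is only the general idea:

```python
from collections import Counter

def propagate_labels(points, labels, k=3, iters=10):
    """Toy label propagation: points with label None repeatedly take the
    majority label of their k nearest Euclidean neighbours. Updates are
    synchronous per iteration."""
    labels = list(labels)

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for _ in range(iters):
        new = list(labels)
        for i, lab in enumerate(labels):
            if lab is None:
                nbrs = sorted((j for j in range(len(points)) if j != i),
                              key=lambda j: dist(points[i], points[j]))[:k]
                votes = Counter(labels[j] for j in nbrs if labels[j] is not None)
                if votes:
                    new[i] = votes.most_common(1)[0][0]
        labels = new
    return labels
```

With even a small labeled seed per class, labels spread outward through each Euclidean neighborhood, which is why a randomly selected seed set can suffice.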


European Conference on Artificial Intelligence | 2016

Leveraging Stratification in Twitter Sampling

Vikas Joshi; Deepak S. Padmanabhan; L. V. Subramaniam

With tweet volumes reaching 500 million a day, sampling is inevitable for any application using Twitter data. Realizing this, data providers such as Twitter, Gnip and Boardreader license sampled data streams priced in accordance with the sample size. Big Data applications working with sampled data are interested in a sample large enough to be representative of the universal dataset. Previous work on the representativeness issue has considered ensuring that global occurrence rates of key terms can be reliably estimated from the sample. Present technology allows sample size estimation with probabilistic bounds on occurrence rates for the case of uniform random sampling. In this paper, we consider the problem of further improving sample size estimates by leveraging stratification in Twitter data. We analyze our estimates through an extensive study using simulations and real-world data, establishing the superiority of our method over uniform random sampling. Our work provides the technical know-how for data providers to expand their portfolio to include stratified sampled datasets, while applications benefit by being able to monitor more topics and events at the same data and computing cost.
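Stratified estimation of a term's occurrence rate with proportional allocation can be sketched as follows; the strata definition, allocation scheme and the paper's probabilistic bound computations are all simplified away:

```python
import random

def stratified_rate_estimate(strata, sample_frac, seed=7):
    """strata: list of lists of 0/1 indicators (term present in tweet or not).
    Proportional allocation: sample the same fraction from each stratum and
    combine the stratum sample means weighted by stratum size."""
    rng = random.Random(seed)
    total = sum(len(s) for s in strata)
    est = 0.0
    for s in strata:
        k = max(1, int(len(s) * sample_frac))   # per-stratum sample size
        sample = rng.sample(s, k)
        est += (len(s) / total) * (sum(sample) / k)
    return est
```

When strata are internally homogeneous, the stratified estimator has much lower variance than uniform sampling at the same total sample size, which is the intuition the paper exploits to shrink sample size estimates.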


Empirical Methods in Natural Language Processing | 2010

Handling Noisy Queries in Cross Language FAQ Retrieval

Danish Contractor; Govind Kothari; Tanveer A. Faruquie; L. V. Subramaniam; Sumit Negi
