Anindya Datta

National University of Singapore

Publication

Featured research published by Anindya Datta.

Management Science | 2014

Simultaneously Discovering and Quantifying Risk Types from Textual Risk Disclosures

Yang Bao; Anindya Datta

Managers and researchers alike have long recognized the importance of corporate textual risk disclosures. Yet it is a nontrivial task to discover and quantify variables of interest from unstructured text. In this paper, we develop a variation of the latent Dirichlet allocation topic model and its learning algorithm for simultaneously discovering and quantifying risk types from textual risk disclosures. We conduct comprehensive evaluations in terms of both conventional statistical fit and substantive fit with respect to the quality of discovered information. Experimental results show that our proposed method outperforms all competing methods and can find more meaningful topics (risk types). By taking advantage of our proposed method for measuring risk types from textual data, we study how risk disclosures in 10-K forms affect the risk perceptions of investors. Unlike prior studies, our results provide support for all three competing arguments regarding whether and how risk disclosures affect the risk perceptions of investors, depending on the specific risk types disclosed. We find that around two-thirds of risk types lack informativeness and have no significant influence. Moreover, we find that the informative risk types do not necessarily increase the risk perceptions of investors: the disclosure of three types of systematic and liquidity risks will increase the risk perceptions of investors, whereas the other five types of unsystematic risks will decrease them. Data, as supplemental material, are available at http://dx.doi.org/10.1287/mnsc.2014.1930 . This paper was accepted by Alok Gupta, special issue on business analytics.

Management Information Systems Quarterly | 2012

A Cost-Based Database Request Distribution Technique for Online E-Commerce Applications

Debra E. VanderMeer; Kaushik Dutta; Anindya Datta

E-commerce is growing to represent an increasing share of overall sales revenue, and online sales are expected to continue growing for the foreseeable future. This growth translates into increased activity on the supporting infrastructure, leading to a corresponding need to scale the infrastructure. This is difficult in an era of shrinking budgets and increasing functional requirements. Increasingly, IT managers are turning to virtualized cloud providers, drawn by the pay-for-use business model. As cloud computing becomes more popular, it is important for data center managers to accomplish more with fewer dollars (i.e., to increase the utilization of existing resources). Advanced request distribution techniques can help ensure both high utilization and smart request distribution, where requests are sent to the service resources best able to handle them. While such request distribution techniques have been applied to the web and application layers of the traditional online application architecture, request distribution techniques for the data layer have focused primarily on online transaction processing scenarios. However, online applications often have a significant read-intensive workload, where read operations constitute a significant percentage of workloads (up to 95 percent or higher). In this paper, we propose a cost-based database request distribution (C-DBRD) strategy, a policy to distribute requests across a cluster of commercial, off-the-shelf databases, and discuss its implementation. We first develop the intuition behind our approach, and describe a high-level architecture for database request distribution. We then develop a theoretical model for database load computation, which we use to design a method for database request distribution and build a software implementation. Finally, following a design science methodology, we evaluate our artifacts through experimental evaluation. Our experiments, in the lab and in production-scale systems, show significant improvement of database layer resource utilization, demonstrating up to a 45 percent improvement over existing request distribution techniques.

ACM Transactions on Management Information Systems | 2013

Fast, Scalable, and Context-Sensitive Detection of Trending Topics in Microblog Post Streams

Nargis Pervin; Fang Fang; Anindya Datta; Kaushik Dutta; Debra E. VanderMeer

Social networks, such as Twitter, can quickly and broadly disseminate news and memes across both real-world events and cultural trends. Such networks are often the best sources of up-to-the-minute information, and are therefore of considerable commercial and consumer interest. The trending topics that appear first on these networks represent an answer to the age-old query “what are people talking about?” Given the incredible volume of posts (on the order of 45,000 or more per minute), and the vast number of stories about which users are posting at any given time, it is a formidable problem to extract trending stories in real time. In this article, we describe a method and implementation for extracting trending topics from a high-velocity real-time stream of microblog posts. We describe our approach and implementation, and a set of experimental results that show that our system can accurately find “hot” stories from high-rate Twitter-scale text streams.

Conference on Information and Knowledge Management | 2013

A Partially Supervised Cross-Collection Topic Model for Cross-Domain Text Classification

Yang Bao; Nigel Collier; Anindya Datta

Cross-domain text classification aims to automatically train a precise text classifier for a target domain by using labeled text data from a related source domain. To this end, one of the most promising ideas is to induce a new feature representation so that the distributional difference between domains can be reduced and a more accurate classifier can be learned in this new feature space. However, most existing methods do not explore the duality of the marginal distribution of examples and the conditional distribution of class labels given labeled training examples in the source domain. Besides, few previous works attempt to explicitly distinguish the domain-independent and domain-specific latent features and align the domain-specific features to further improve the cross-domain learning. In this paper, we propose a model called Partially Supervised Cross-Collection LDA topic model (PSCCLDA) for cross-domain learning with the purpose of addressing these two issues in a unified way. Experimental results on nine datasets show that our model outperforms two standard classifiers and four state-of-the-art methods, which demonstrates the effectiveness of our proposed model.

Mobile Computing Applications and Services | 2011

Mobilewalla: A Mobile Application Search Engine

Anindya Datta; Kaushik Dutta; Sangar Kajanan; Nargis Pervin

With the popularity of mobile apps on mobile devices based on iOS, Android, Blackberry and Windows Phone operating systems, the number of mobile apps in each of the respective native app stores is increasing in leaps and bounds. Currently there are almost 700,000 mobile apps across these four major native app stores. Due to such an enormous number of apps, both the constituents in the app ecosystem, consumers and app developers, face problems in terms of ‘app discovery’. For consumers, it is a daunting task to discover the apps they like and need among the huge number of available apps. Likewise, for developers, making it possible for users to discover their apps in the large number of available apps is a challenge. To address these issues, Mobilewalla (MW) provides an independent unbiased search engine for mobile apps with semantic search capabilities. It has also developed an objective scoring mechanism based on user and developer involvement with an app. The scoring mechanism enables MW to provide a number of other ways to discover apps, such as dynamically maintained ‘hot’ lists and ‘fast rising’ lists. In this paper, we describe the challenges of developing the MW platform and how these challenges have been mitigated. Lastly, we demonstrate some of the key functionalities of MW.

Asia Information Retrieval Symposium | 2013

Serendipitous Recommendation for Mobile Apps Using Item-Item Similarity Graph

Upasna Bhandari; Kazunari Sugiyama; Anindya Datta; Rajni Jindal

Recommender systems can provide users with relevant items based on each user’s preferences. However, in the domain of mobile applications (apps), existing recommender systems merely recommend apps that users have experienced (rated, commented, or downloaded) since this type of information indicates each user’s preference for the apps. Unfortunately, this prunes the apps which are relevant but are not featured in the recommendation lists since users have never experienced them. Motivated by this phenomenon, our work proposes a method for recommending serendipitous apps using graph-based techniques. Our approach can recommend apps even if users do not specify their preferences. In addition, our approach can discover apps that are highly diverse. Experimental results show that our approach can recommend highly novel apps and reduce over-personalization in a recommendation list.

IEEE Transactions on Knowledge and Data Engineering | 2013

Building a Scalable Database-Driven Reverse Dictionary

Ryan Shaw; Anindya Datta; Debra E. VanderMeer; Kaushik Dutta

In this paper, we describe the design and implementation of a reverse dictionary. Unlike a traditional forward dictionary, which maps from words to their definitions, a reverse dictionary takes a user input phrase describing the desired concept, and returns a set of candidate words that satisfy the input phrase. This work has significant application not only for the general public, particularly those who work closely with words, but also in the general field of conceptual search. We present a set of algorithms and the results of a set of experiments showing the retrieval accuracy of our methods and the runtime response time performance of our implementation. Our experimental results show that our approach can provide significant improvements in performance scale without sacrificing the quality of the result. Our experiments comparing the quality of our approach to that of currently available reverse dictionaries show that our approach can provide significantly higher quality than either of the other currently available implementations.

Mobile Networks and Applications | 2013

A Mobile App Search Engine

Anindya Datta; Sangaralingam Kajanan; Nargis Pervin

With the popularity of mobile apps on mobile devices based on iOS, Android, Blackberry and Windows Phone operating systems, the numbers of mobile apps in each of the respective native app stores are increasing in leaps and bounds. Currently there are close to one million mobile apps across these four major native app stores. Due to the enormous number of apps, both the constituents in the app ecosystem, consumers and app developers, face problems in ‘app discovery’. For consumers, it is a daunting task to discover the apps they like and need among the huge number of available apps. Likewise, for developers, enabling their apps to be discovered is a challenge. To address these issues, Mobilewalla (MW), an app search engine, provides an independent unbiased search for mobile apps with semantic search capabilities. It has also developed an objective scoring mechanism based on user and developer involvement with an app. The scoring mechanism enables MW to provide a number of other ways to discover apps, such as dynamically maintained ‘hot’ lists and ‘fast rising’ lists. In this paper, we describe the challenges of developing the MW platform and how these challenges have been mitigated. Lastly, we demonstrate some of the key functionalities of MW.

International Conference on Social Computing | 2015

Hashtag Popularity on Twitter: Analyzing Co-occurrence of Multiple Hashtags

Nargis Pervin; Tuan Quang Phan; Anindya Datta; Hideaki Takeda; Fujio Toriumi

Hashtags increase the reachability of a tweet manifold and consequently have the potential to create a wider market for brands. Frequent use of a hashtag features it in the Twitter trending list. In this study we want to understand what contributes to the popularity of a hashtag. Further, hashtags generally come in groups in a tweet. In fact, an investigation of a real-world dataset from the Great Eastern Japan Earthquake reveals that 50% of hashtags appear in a tweet with at least one other hashtag. How this co-occurrence affects a hashtag's popularity has not been addressed before and is the focus of this study. Results indicate that if a hashtag appears with one or more other similar hashtags, the popularity of the hashtag increases. In contrast, if a hashtag appears with dissimilar hashtags, the popularity of the focal hashtag decreases. The results reverse when dissimilar hashtags come along with a URL.

Information Systems Research | 2012

SOA Performance Enhancement Through XML Fragment Caching

Anindya Datta; Kaushik Dutta; Qianhui Liang; Debra E. VanderMeer

Organizations are increasingly choosing to implement service-oriented architectures to integrate distributed, loosely coupled applications. These architectures are implemented as services, which typically use XML-based messaging to communicate between service consumers and service providers across enterprise networks. We propose a scheme for caching fragments of service response messages to improve performance and service quality in service-oriented architectures. In our fragment caching scheme, we decompose responses into smaller fragments such that reusable components can be identified and cached in the XML routers of an XML overlay network within an enterprise network. Such caching mitigates processing requirements on providers and moves content closer to users, thus reducing bandwidth requirements on the network as well as improving service times. We describe the system architecture and caching algorithm details for our caching scheme, develop an analysis of the expected benefits of our scheme, and present the results of both simulation and case study-based experiments to show the validity and performance improvements provided by our caching scheme. Our simulation experimental results show up to a 60% reduction in bandwidth consumption and up to a 50% improvement in response time. Further, our case study experiments demonstrate that when there is no resource bottleneck, the cache-enabled case reduces average response times by 40% to 50% and increases throughput by 150% compared to the no-cache and full message caching cases. In experiments contrasting fragment caching and full message caching, we found that full message caching provides benefits when the number of possible unique responses is low while the benefits of fragment caching increase as the number of possible unique responses increases. These experimental results clearly demonstrate the benefits of our approach.
