Anwitaman Datta | Researchain

Publication

Featured research published by Anwitaman Datta.

international conference on computer communications | 2011

Self-repairing homomorphic codes for distributed storage systems

Frédérique E. Oggier; Anwitaman Datta

Erasure codes provide a storage efficient alternative to replication based redundancy in (networked) storage systems. They, however, entail high communication overhead for maintenance, when some of the encoded fragments are lost and need to be replenished. Such overheads arise from the fundamental need to recreate (or keep separately) first a copy of the whole object before any individual encoded fragment can be generated and replenished. There has recently been intense interest in exploring alternatives, the most prominent ones being regenerating codes (RGC) and hierarchical codes (HC). We propose as an alternative a new family of codes to improve the maintenance process, called self-repairing codes (SRC), with the following salient features: (a) encoded fragments can be repaired directly from other subsets of encoded fragments by downloading less data than the size of the complete object, ensuring that (b) a fragment is repaired from a fixed number of encoded fragments, the number depending only on how many encoded blocks are missing and independent of which specific blocks are missing. These properties allow for not only low communication overhead to recreate a missing fragment, but also independent reconstruction of different missing fragments in parallel, possibly in different parts of the network. The fundamental difference between SRCs and HCs is that different encoded fragments in HCs do not have symmetric roles (equal importance). Consequently, the number of fragments required to replenish a specific fragment in HCs depends on which specific fragments are missing, and not solely on how many. Likewise, object reconstruction may need a different number of fragments depending on which fragments are missing. RGCs apply network coding over (n, k) erasure codes, and provide network information flow based limits on the minimal maintenance overheads. RGCs need to communicate with at least k other nodes to recreate any fragment, and the minimal overhead is achieved if only one fragment is missing, and information is downloaded from all the other n−1 nodes. We analyze the static resilience of SRCs with respect to erasure codes, and observe that SRCs incur marginally larger storage overhead in order to achieve the aforementioned properties. The salient SRC properties naturally translate to low communication overheads for reconstruction of lost fragments, and allow reconstruction with lower latency by facilitating repairs in parallel. These desirable properties make SRC a practical candidate for networked distributed storage systems.
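The repair property described in this abstract (a lost fragment rebuilt from a small, fixed number of surviving fragments, whichever ones happen to be alive) can be illustrated with a toy XOR code. This is only a sketch under stated assumptions and not the construction from the paper: the functions `encode` and `repair`, the integer block representation, and the indexing of fragments by nonzero bit masks are all hypothetical choices made for illustration. Fragment `idx` is the XOR of the data blocks selected by the bits of `idx`, so the fragment at `a ^ b` equals the XOR of the fragments at `a` and `b`.

```python
from functools import reduce
from operator import xor


def encode(blocks):
    """Toy encoder: one fragment per nonzero bit mask over the k data blocks.

    Fragment `idx` is the XOR of every blocks[i] whose bit i is set in idx.
    Blocks are plain integers here purely to keep the sketch short.
    """
    k = len(blocks)
    return {
        idx: reduce(xor, (blocks[i] for i in range(k) if (idx >> i) & 1), 0)
        for idx in range(1, 2 ** k)
    }


def repair(survivors, lost):
    """Rebuild fragment `lost` from any surviving pair (a, b) with a ^ b == lost.

    `survivors` maps fragment index to fragment value and must not contain `lost`.
    """
    for a, frag_a in survivors.items():
        b = a ^ lost
        if b in survivors:
            return frag_a ^ survivors[b]
    raise ValueError("no surviving pair can repair this fragment")


if __name__ == "__main__":
    fragments = encode([0b1010, 0b0110, 0b1111])
    original = fragments.pop(5)          # simulate the loss of fragment 5
    print(repair(fragments, 5) == original)
```

In this toy code every fragment has several disjoint repair pairs, so repair still succeeds after further losses as long as one pair survives, and different lost fragments can be rebuilt independently of each other.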

conference on information and knowledge management | 2012

Twevent: segment-based event detection from tweets

Chenliang Li; Aixin Sun; Anwitaman Datta

Event detection from tweets is an important task to understand the current events/topics attracting a large number of common users. However, the unique characteristics of tweets (e.g. short and noisy content, diverse and fast changing topics, and large data volume) make event detection a challenging task. Most existing techniques proposed for well written documents (e.g. news articles) cannot be directly adopted. In this paper, we propose a segment-based event detection system for tweets, called Twevent. Twevent first detects bursty tweet segments as event segments and then clusters the event segments into events considering both their frequency distribution and content similarity. More specifically, each tweet is split into non-overlapping segments (i.e. phrases that possibly refer to named entities or semantically meaningful information units). The bursty segments are identified within a fixed time window based on their frequency patterns, and each bursty segment is described by the set of tweets containing the segment published within that time window. The similarity between a pair of bursty segments is computed using their associated tweets. After clustering bursty segments into candidate events, Wikipedia is exploited to identify the realistic events and to derive the most newsworthy segments to describe the identified events. We evaluate Twevent and compare it with the state-of-the-art method using 4.3 million tweets published by Singapore-based users in June 2010. In our experiments, Twevent outperforms the state-of-the-art method by a large margin in terms of both precision and recall. More importantly, the events detected by Twevent can be easily interpreted with little background knowledge because of the newsworthy segments. We also show that Twevent is efficient and scalable, leading to a desirable solution for event detection from tweets.
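The step of identifying bursty segments within a fixed time window from their frequency patterns can be sketched as follows. This is a minimal illustration, assuming a simple binomial background model and a z-score cutoff; the function names, the `threshold` value, and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def burstiness(count, window_size, p_background):
    """z-score of a segment's tweet count in one window.

    Assumes each of the `window_size` tweets contains the segment
    independently with probability `p_background` (0 < p < 1).
    """
    mean = window_size * p_background
    std = math.sqrt(window_size * p_background * (1 - p_background))
    return (count - mean) / std


def bursty_segments(window_counts, window_size, background, threshold=2.0):
    """Return, in sorted order, the segments whose count is unusually high."""
    return sorted(
        segment
        for segment, count in window_counts.items()
        if burstiness(count, window_size, background[segment]) > threshold
    )


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 tweets in the current window.
    background = {"world cup": 0.001, "good morning": 0.01}
    counts = {"world cup": 60, "good morning": 105}
    print(bursty_segments(counts, 10_000, background))
```

A segment that is merely frequent ("good morning") is not flagged, because its count is close to what its background probability predicts; only a count far above expectation marks a segment as bursty.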

wireless on demand network systems and service | 2009

A case for P2P infrastructure for social networks - opportunities & challenges

Sonja Buchegger; Anwitaman Datta

Online Social Networks like Facebook, MySpace, Xing, etc. have become extremely popular. Yet they have some limitations that we want to overcome for a next generation of social networks: privacy concerns and requirements of Internet connectivity, both of which are due to web-based applications on a central site whose owner has access to all data. To overcome these limitations, we envision a paradigm shift from client-server to a peer-to-peer infrastructure coupled with encryption so that users keep control of their data and can use the social network also locally, without Internet access. This shift gives rise to many research questions intersecting networking, security, distributed systems and social network analysis, leading to a better understanding of how technology can support social interactions. This paper is an attempt to identify the core functionalities necessary to build social networking applications and services, and the research challenges in realizing them in a decentralized setting. In the tradition of research-path defining papers in the peer-to-peer community [5, 14], we highlight some challenges and opportunities for peer-to-peer in the era of social networks. We also present our own approach to realizing peer-to-peer social networks.

Handbook of Social Network Technologies | 2010

Decentralized Online Social Networks

Anwitaman Datta; Sonja Buchegger; Le Hung Vu; Thorsten Strufe; Krzysztof Rzadca

Current online social networks (OSNs) are web services run on logically centralized infrastructure. Large OSN sites use content distribution networks and thus distribute some of the load by caching for performance reasons; nevertheless, there is a central repository for user and application data. This centralized nature of OSNs has several drawbacks including scalability, privacy, dependence on a provider, need for being online for every transaction, and a lack of locality. There have thus been several efforts toward decentralizing OSNs while retaining the functionalities offered by centralized OSNs. A decentralized online social network (DOSN) is a distributed system for social networking with no or limited dependency on any dedicated central infrastructure. In this chapter we explore the various motivations of a decentralized approach to online social networking, discuss several concrete proposals and types of DOSN as well as challenges and opportunities associated with decentralization.

IEEE Transactions on Knowledge and Data Engineering | 2004

Efficient, self-contained handling of identity in peer-to-peer systems

Karl Aberer; Anwitaman Datta; Manfred Hauswirth

Identification is an essential building block for many services in distributed information systems. The quality and purpose of identification may differ, but the basic underlying problem is always to bind a set of attributes to an identifier in a unique and deterministic way. Name/directory services, such as DNS, X.500, or UDDI, are a well-established concept to address this problem in distributed information systems. However, none of these services addresses the specific requirements of peer-to-peer systems with respect to dynamism, decentralization, and maintenance. We propose the implementation of directories using a structured peer-to-peer overlay network and apply this approach to support self-contained maintenance of routing tables with dynamic IP addresses in structured P2P systems. Thus, we keep routing tables intact without affecting the organization of the overlay networks, making it logically independent of the underlying network infrastructure. Even though the directory is self-referential, since it uses its own service to maintain itself, we show that it is robust due to a self-healing capability. For security, we apply a combination of PGP-like public key distribution and a quorum-based query scheme. We describe the algorithm as implemented in the P-Grid P2P lookup system (http://www.p-grid.org/) and give a detailed analysis and simulation results demonstrating the efficiency and robustness of our approach.

IEEE Transactions on Information Forensics and Security | 2016

Privacy-Preserving-Outsourced Association Rule Mining on Vertically Partitioned Databases

Lichun Li; Rongxing Lu; Kim-Kwang Raymond Choo; Anwitaman Datta; Jun Shao

Association rule mining and frequent itemset mining are two popular and widely studied data analysis techniques for a range of applications. In this paper, we focus on privacy-preserving mining on vertically partitioned databases. In such a scenario, data owners wish to learn the association rules or frequent itemsets from a collective data set and disclose as little information about their (sensitive) raw data as possible to other data owners and third parties. To ensure data privacy, we design an efficient homomorphic encryption scheme and a secure comparison scheme. We then propose a cloud-aided frequent itemset mining solution, which is used to build an association rule mining solution. Our solutions are designed for outsourced databases that allow multiple data owners to efficiently share their data securely without compromising on data privacy. Our solutions leak less information about the raw data than most existing solutions. In comparison to the only known solution achieving a similar privacy level as our proposed solutions, the performance of our proposed solutions is three to five orders of magnitude higher. Based on our experiment findings using different parameters and data sets, we demonstrate that the run time in each of our solutions is only one order higher than that in the best non-privacy-preserving data mining algorithms. Since both data and computing work are outsourced to the cloud servers, the resource consumption at the data owner end is very low.

international conference on peer-to-peer computing | 2011

An empirical study of availability in friend-to-friend storage systems

Rajesh Sharma; Anwitaman Datta; Matteo Dell'Amico; Pietro Michiardi

Friend-to-friend networks, i.e. peer-to-peer networks where data are exchanged and stored solely through nodes owned by trusted users, can guarantee dependability, privacy and uncensorability by exploiting social trust. However, the limitation of storing data only on friends can come to the detriment of data availability: if no friends are online, then data stored in the system will not be accessible. In this work, we explore the tradeoffs between redundancy (i.e., how many copies of data are stored on friends), data placement (the choice of which friend nodes to store data on) and data availability (the probability of finding data online). We show that the problem of obtaining maximal availability while minimizing redundancy is NP-complete; in addition, we perform an exploratory study on data placement strategies, and we investigate their performance in terms of redundancy needed and availability obtained. By performing a trace-based evaluation, we show that nodes with as few as 10 friends can already obtain good availability levels.
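The tradeoff between redundancy, data placement and availability can be made concrete with a small trace-based sketch. Everything below is an assumption made for illustration: the trace format (a set of online time slots per friend), the function names, and the greedy heuristic, which is a standard set-cover style approximation and not necessarily one of the placement strategies studied in the paper.

```python
def availability(placement, online, num_slots):
    """Fraction of time slots in which at least one replica holder is online.

    `online` maps each friend to the set of slot indices when they are online.
    """
    covered = set().union(*(online[friend] for friend in placement))
    return len(covered) / num_slots


def greedy_placement(online, num_slots, target):
    """Pick friends one by one, each time taking the friend who adds the most
    not-yet-covered slots, until `target` availability is met or no friend helps."""
    chosen, covered = [], set()
    while len(covered) / num_slots < target:
        candidates = [friend for friend in online if friend not in chosen]
        best = max(candidates, key=lambda f: len(online[f] - covered), default=None)
        if best is None or not online[best] - covered:
            break  # the target is unreachable with the friends available
        chosen.append(best)
        covered |= online[best]
    return chosen


if __name__ == "__main__":
    # Hypothetical trace: 8 time slots, 4 friends.
    trace = {"ann": {0, 1, 2, 3}, "bob": {2, 3, 4, 5}, "cat": {6, 7}, "dan": {0, 1}}
    holders = greedy_placement(trace, 8, target=1.0)
    print(holders, availability(holders, trace, 8))
```

Because finding the smallest placement that maximizes availability is NP-complete, a heuristic of this kind trades optimality for speed; the number of friends it selects is the redundancy paid for the availability obtained.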

information theory workshop | 2011

Self-Repairing Codes for distributed storage — A projective geometric construction

Frédérique E. Oggier; Anwitaman Datta

Self-Repairing Codes (SRC) are codes designed to suit the need of coding for distributed networked storage: they not only allow stored data to be recovered even in the presence of node failures, they also provide a repair mechanism where as few as two live nodes can be contacted to regenerate the data of a failed node. In this paper, we propose a new instance of self-repairing codes, based on constructions of spreads coming from projective geometry. We study some of their properties to demonstrate the suitability of these codes for distributed networked storage.

Foundations and Trends in Communications and Information Theory | 2013

Coding Techniques for Repairability in Networked Distributed Storage Systems

Frédérique E. Oggier; Anwitaman Datta

The most commonly deployed multi-storage device systems are RAID housed in a single computing unit. The idea of distributing data across multiple disks has been naturally extended to multiple storage nodes which are interconnected over a network and are called Networked Distributed Storage Systems (NDSS). The simplest coding techniques based on replication are often used to ensure redundancy in these systems, but given the sheer volume of data that needs to be stored and the overheads of replication, other coding techniques are being developed. Coding Techniques for Repairability in Networked Distributed Storage Systems (NDSS) surveys coding techniques for NDSS, which aim at achieving (1) fault tolerance efficiently and (2) good repairability characteristics to replenish the lost redundancy, and ensure data durability over time. This is a vibrant area of research and this book is the first overview which presents the background required to understand the problems as well as covering the most important techniques currently being developed. Coding Techniques for Repairability in Networked Distributed Storage Systems is essential reading for all researchers and engineers involved in designing and researching computer storage systems.

international conference of distributed computing and networking | 2011

GoDisco: selective gossip based dissemination of information in social community based overlays

Anwitaman Datta; Rajesh Sharma

We propose and investigate GoDisco, a gossip-based decentralized mechanism inspired by social principles and behavior, to disseminate information in online social community networks, using exclusively social links and exploiting semantic context to keep the dissemination process selective to relevant nodes. Such a dissemination scheme, using gossiping over an egocentric social network, is unique and is arguably a concept whose time has arrived: it emulates word-of-mouth behavior and can have interesting applications such as probabilistic publish/subscribe, decentralized recommendation and contextual advertisement systems, to name a few. Simulation-based experiments show that despite using only local knowledge and contacts, the system has good global coverage and behavior.