Soumyadeb Mitra

University of Illinois at Urbana–Champaign

Publication

Featured research published by Soumyadeb Mitra.

international parallel and distributed processing symposium | 2006

Bitmap indexes for large scientific data sets: a case study

Rishi Rakesh Sinha; Soumyadeb Mitra; Marianne Winslett

The data used by today's scientific applications are often very high in dimensionality and staggering in size. These characteristics necessitate the use of a good multidimensional indexing strategy to provide efficient access to the data. Researchers have previously proposed the use of bitmap indexes for high-dimension scientific data as a way of overcoming the drawbacks of traditional multidimensional indexes such as R-trees and KD-trees, which are bulky and whose performance does not scale well as the number of dimensions increases. However, the techniques proposed in previous work on bitmap indexes are not sufficient to address all problems that arise in practice. In experiments with real datasets, we experienced problems with index size and query performance. To overcome these shortcomings, we propose the use of adaptive, multilevel, multi-resolution bitmap indexes, and evaluate their performance in two scientific domains. Our preliminary experiments with a parallel query processor and index creator also show that it is very easy to parallelize a bitmap index.
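
To make the idea of a bitmap index concrete, here is a minimal sketch of a plain binned bitmap index over two attributes. It is not the adaptive, multilevel, multi-resolution scheme the paper proposes; the function names, bin edges, and sample values are all assumptions chosen for illustration. Each bin gets one bitmap (a Python integer used as a bitset), a range query ORs the bitmaps of the bins it covers, and a multidimensional query ANDs the per-attribute results.

```python
from bisect import bisect_right


def build_bitmap_index(values, bin_edges):
    """Return one bitmap (a Python int used as a bitset) per bin.

    Bin i holds the rows whose value v satisfies
    bin_edges[i] <= v < bin_edges[i + 1]. Values outside all bins are skipped.
    """
    bitmaps = [0] * (len(bin_edges) - 1)
    for row, v in enumerate(values):
        b = bisect_right(bin_edges, v) - 1
        if 0 <= b < len(bitmaps):
            bitmaps[b] |= 1 << row
    return bitmaps


def rows_in_bins(bitmaps, first_bin, last_bin):
    """OR together the bitmaps of bins first_bin..last_bin (inclusive)."""
    result = 0
    for b in range(first_bin, last_bin + 1):
        result |= bitmaps[b]
    return result


def bitmap_to_rows(bitmap):
    """Decode a bitset into a sorted list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical two-attribute data set with five rows.
temperature = [12.0, 35.5, 27.1, 8.3, 31.0]
pressure = [1.2, 0.4, 0.9, 1.8, 0.7]
temp_index = build_bitmap_index(temperature, [0, 10, 20, 30, 40])
pres_index = build_bitmap_index(pressure, [0.0, 0.5, 1.0, 1.5, 2.0])

# Query: temperature in [20, 40) AND pressure in [0.5, 1.0).
hits = rows_in_bins(temp_index, 2, 3) & rows_in_bins(pres_index, 1, 1)
matching_rows = bitmap_to_rows(hits)  # rows 2 and 4
```

Because each attribute is indexed independently and combined with bitwise operations, adding a dimension adds bitmaps rather than reshaping a tree, which is the property that motivates bitmap indexes for high-dimensional data.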

international conference on management of data | 2008

Query-based partitioning of documents and indexes for information lifecycle management

Soumyadeb Mitra; Marianne Winslett; Windsor W. Hsu

Regulations require businesses to archive many electronic documents for extended periods of time. Given the sheer volume of documents and the response time requirements, documents that are unlikely to ever be accessed should be stored on an inexpensive device (such as tape), while documents that are likely to be accessed should be placed on a more expensive, higher-performance device. Unfortunately, traditional data partitioning techniques either require substantial manual involvement, or are not suitable for read-rarely workloads. In this paper, we present a novel technique to address this problem. We estimate the future access likelihood for a document based on past workloads of keyword queries and the click-through behavior for top-K query answers, then use this information to drive partitioning decisions. Our overall best scheme, the document-split inverted index, does not require any parameter tuning and yet performs close to the optimal partitioning strategy. Experiments show that document-split partitioning improves performance on a large intranet query workload by a factor of 4 when we add a fast storage server that holds 20% of the data.
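
As a rough illustration of using past click-through behavior to drive placement, the sketch below ranks documents by how often they were clicked in a past query log and assigns the top fraction to fast storage. This is a deliberately simplified stand-in, not the paper's document-split inverted index; the function name, the click log, and the document identifiers are hypothetical.

```python
from collections import Counter


def partition_by_clicks(doc_ids, click_log, fast_fraction):
    """Split documents into (fast, slow) sets using past click counts.

    Documents clicked most often in click_log are treated as the most
    likely to be accessed again; ties are broken by document id.
    """
    clicks = Counter(click_log)
    ranked = sorted(doc_ids, key=lambda d: (-clicks[d], d))
    n_fast = int(len(ranked) * fast_fraction)
    return set(ranked[:n_fast]), set(ranked[n_fast:])


# Hypothetical archive of ten documents and a short click-through log.
docs = [f"d{i}" for i in range(10)]
log = ["d3", "d7", "d3", "d1", "d7", "d3"]
fast, slow = partition_by_clicks(docs, log, fast_fraction=0.2)
```

With a 20% fast tier, the two most-clicked documents (`d3` and `d7`) land on fast storage and the remaining eight go to the slow tier.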

workshop on storage security and survivability | 2006

Secure deletion from inverted indexes on compliance storage

Soumyadeb Mitra; Marianne Winslett

Recent litigation and intense regulatory focus on secure retention of electronic records have spurred a rush to introduce Write-Once-Read-Many (WORM) storage devices for retaining business records such as electronic mail. A file committed to a WORM device cannot be deleted even by a super-user and hence is secure from attacks originating from company insiders. Secure retention, however, is only a part of a document's lifecycle: it is often crucial to delete documents after their mandatory retention period is over. Since most modern WORM devices are built on top of magnetic media, they also support a secure deletion operation by associating an expiration time with files. However, for the deleted document to be truly unrecoverable, it must also be deleted from any index structure built over it. This paper studies the problem of securely deleting entries from an inverted index. We first formalize the concept of secure deletion by defining two deletion semantics: strongly and weakly secure deletions. We then analyze some of the deletion schemes that have been proposed in the literature and show that they only achieve weakly secure deletion. Furthermore, such schemes have poor space efficiency and/or are inflexible. We then propose a novel technique for hiding index entries for deleted documents, based on the concept of ambiguating deleted entries. The proposed technique also achieves weakly secure deletion, but is more space efficient and flexible.

extending database technology | 2008

Deleting index entries from compliance storage

Soumyadeb Mitra; Marianne Winslett; Nikita Borisov

In response to regulatory focus on secure retention of electronic records, businesses are using magnetic disks configured as write-once read-many (WORM) compliance storage devices to store business documents such as electronic mail for their mandated retention periods. A document committed to a compliance storage device cannot be altered or deleted even by a superuser until its retention period is over, and hence is secure from attacks originating from company insiders. Secure retention, however, is only a part of a document's lifecycle: it is often crucial to properly delete documents once their retention period ends. It is relatively simple to delete a document, but much harder to remove its index entries from WORM. Yet if these entries are not obliterated, the contents of the deleted document can often be reconstructed. In this paper, we formally define secure deletion of document entries from an inverted index on compliance storage. We show that previously proposed deletion schemes for compliance storage index entries do not meet the objectives of secure deletion. On the other hand, the naive approach to secure deletion results in very poor query performance. To provide secure deletion of index entries without compromising lookup efficiency, we propose a novel indexing technique that employs noise terms, merged posting lists, and deletion epochs. Experiments with real-life data show that lookups in our scheme are 5 times faster than the naive approach.

international conference on data engineering | 2009

An Architecture for Regulatory Compliant Database Management

Soumyadeb Mitra; Marianne Winslett; Richard T. Snodgrass; Shashank Yaduvanshi; Sumedh Ambokar

Spurred by financial scandals and privacy concerns, governments worldwide have moved to ensure confidence in digital records by regulating their retention and deletion. These requirements have led to a huge market for compliance storage servers, which ensure that data are not shredded or altered before the end of their mandatory retention period. These servers preserve unstructured and semi-structured data at a file-level granularity: email, spreadsheets, reports, instant messages. In this paper, we extend this level of protection to structured data residing in relational databases. We propose a compliant DBMS architecture and two refinements that illustrate the additional security that one can gain with only a slight performance penalty, with almost no modifications to the DBMS kernel. We evaluate our proposed architecture through experiments with TPC-C on a high-performance DBMS, and show that the runtime overhead for transaction processing is approximately 10% in typical configurations.

very large data bases | 2008

Trustworthy keyword search for compliance storage

Soumyadeb Mitra; Marianne Winslett; Windsor Wee Sun Hsu; Kevin Chen Chuan Chang

Intense regulatory focus on secure retention of electronic records has led to a need to ensure that records are trustworthy, i.e., able to provide irrefutable proof and accurate details of past events. In this paper, we analyze the requirements for a trustworthy index to support keyword-based search queries. We argue that trustworthy index entries must be durable—the index must be updated when new documents arrive, and not periodically deleted and rebuilt. To this end, we propose a scheme for efficiently updating an inverted index, based on judicious merging of the posting lists of terms. Through extensive simulations and experiments with two real world data sets and workloads, we demonstrate that the scheme achieves online update speed while maintaining good query performance. We also present and evaluate jump indexes, a novel trustworthy and efficient index for join operations on posting lists for multi-keyword queries. Jump indexes support insert, lookup and range queries in time logarithmic in the number of indexed documents.
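
The sketch below illustrates, under stated assumptions, what an append-only inverted index with merged posting lists might look like. It is not the paper's merging policy or its jump index: terms are simply hashed into a fixed number of shared lists (the class name, the CRC32-based slot choice, and the sample documents are all hypothetical). Each entry is tagged with its term so a lookup can filter out postings that belong to other terms sharing the list.

```python
import zlib


class MergedPostingIndex:
    """Append-only inverted index whose terms share a fixed set of lists."""

    def __init__(self, num_lists):
        self.lists = [[] for _ in range(num_lists)]

    def _slot(self, term):
        # Deterministic hash, so a term always maps to the same merged list.
        return zlib.crc32(term.encode("utf-8")) % len(self.lists)

    def add_document(self, doc_id, terms):
        # Entries are only ever appended, never rewritten or removed.
        for term in sorted(set(terms)):
            self.lists[self._slot(term)].append((term, doc_id))

    def lookup(self, term):
        # Scan the merged list and keep the entries for the requested term.
        return [d for t, d in self.lists[self._slot(term)] if t == term]


index = MergedPostingIndex(num_lists=2)
index.add_document(1, ["audit", "invoice"])
index.add_document(2, ["invoice", "memo"])
index.add_document(3, ["audit", "memo"])

audit_docs = index.lookup("audit")
both = sorted(set(index.lookup("audit")) & set(index.lookup("memo")))
```

Merging trades some lookup work (filtering out other terms' entries) for fewer, longer lists to append to, which is the kind of update-versus-query trade-off the abstract describes.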

Handbook of Database Security | 2008

Trustworthy Records Retention

Ragib Hasan; Marianne Winslett; Soumyadeb Mitra; Windsor W. Hsu; Radu Sion

Trustworthy retention of electronic records has become a necessity to ensure compliance with laws and regulations in business and the public sector. Among other features, these directives foster accountability by requiring organizations to secure the entire life cycle of their records, so that records are created, kept accessible for an appropriate period of time, and deleted, without tampering or interference from organizational insiders or outsiders. In this chapter, we discuss existing techniques for trustworthy records retention and explore the open problems in the area.

international conference on cluster computing | 2005

An Efficient, Nonintrusive, Log-Based I/O Mechanism for Scientific Simulations on Clusters

Soumyadeb Mitra; Rishi Rakesh Sinha; Marianne Winslett

Scientific simulations are often very I/O intensive, requiring high I/O bandwidth to store the data generated by the simulation. Traditional supercomputers have specialized I/O systems with multiple I/O nodes and specialized interconnects to handle such high I/O loads. However, with the increased availability of inexpensive clusters of workstations, more and more simulations are now run on clusters. Unfortunately, cluster supercomputers are usually not very well equipped for I/O, making I/O a serious bottleneck for such applications. To address this problem, we propose log-based I/O (LBIO), an approach that can substantially increase the I/O performance of simulations on clusters by utilizing free space on the cluster's local disks to stage data on its way to remote storage. LBIO uses local disks to create a log of all I/O calls, and uses a background thread to replay the log at the rate that best utilizes the server and network resources. LBIO is implemented as an easy-to-use, non-intrusive library: a user can turn on LBIO by adding a single initialization call to the simulation code. LBIO also works with existing scientific I/O libraries like HDF, as well as collective libraries like ROMIO. Our performance studies on microbenchmarks and a real-world scientific simulation code show that LBIO can provide up to 35% improvement in I/O performance for raw I/O and over 50% for I/O through libraries like ROMIO or HDF.
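
The log-then-replay pattern described above can be sketched as follows. This is an illustrative toy, not the LBIO library: the class name, the log record format, and the in-memory `fake_remote_write` target are assumptions, and the rate control mentioned in the abstract is omitted. Each write is appended to a local log file and returns immediately, while a background thread forwards the same records to the slower remote store (here from an in-memory queue rather than by re-reading the log).

```python
import os
import queue
import tempfile
import threading


class LogBasedWriter:
    """Stage writes in a local log and replay them in the background."""

    def __init__(self, log_path, remote_write):
        self._log = open(log_path, "ab")
        self._remote_write = remote_write
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._replay, daemon=True)
        self._worker.start()

    def write(self, name, offset, data):
        # Append the call to the local log, then return without waiting
        # for the remote store.
        header = f"{name} {offset} {len(data)}\n".encode("utf-8")
        self._log.write(header + data)
        self._log.flush()
        self._pending.put((name, offset, data))

    def _replay(self):
        # Background thread: forward logged calls in their original order.
        while True:
            item = self._pending.get()
            if item is None:
                break
            self._remote_write(*item)

    def close(self):
        # Drain all pending writes before shutting down.
        self._pending.put(None)
        self._worker.join()
        self._log.close()


remote = {}  # stand-in for remote storage: file name -> contents


def fake_remote_write(name, offset, data):
    buf = remote.setdefault(name, bytearray())
    end = offset + len(data)
    if len(buf) < end:
        buf.extend(b"\0" * (end - len(buf)))
    buf[offset:end] = data


with tempfile.TemporaryDirectory() as tmp:
    writer = LogBasedWriter(os.path.join(tmp, "io.log"), fake_remote_write)
    writer.write("out.dat", 0, b"hello")
    writer.write("out.dat", 5, b" world")
    writer.close()
```

After `close()` returns, the remote copy of `out.dat` holds the bytes of both writes in order; the simulation itself only ever waited on the local log append.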

field programmable gate arrays | 2005

SMPS: an FPGA-based prototyping environment for multiprocessor embedded systems (abstract only)

Ankit Mathur; Mayank Agarwal; Soumyadeb Mitra; Anup Gangwar; M. Balakrishnan; Subhashis Banerjee

Streaming media applications represent an important class of applications for embedded systems. Recent advances in design-space exploration of architectures for such applications have pointed towards the suitability of Multiprocessor System on Chip (SoC) solutions. Multiprocessor SoCs not only offer higher performance, but can also lead to cheaper solutions. A typical synthesis methodology for such architectures would require a validation stage at the end of final system integration. The wide availability of cheap and large FPGA devices, advances in automatic synthesis from VHDL/Verilog and the abundance of high-performance computing platforms enable the design of a generic validation system for such Multiprocessor SoCs. In this paper we present the design and implementation of the Srijan Multiprocessor Prototyping System (SMPS). SMPS is a system for rapid prototyping and validation of single-chip, application-specific multiprocessor systems. The individual computing elements are RISC processors, coprocessors which lie in the processor pipeline, and ASICs which connect directly to the system bus. The system is a tightly coupled multiprocessor with shared memory and shared address space. A Real-time Operating System (RTOS) provides task scheduling and access to shared resources. The system is presented as parameterized VHDL based on the open source SPARC V8 compliant LEON processor and a homegrown light-weight RTOS, RtKer-MP. The entire VHDL is configurable using a GUI, and has support for cache coherency, a choice of arbitration policy and easy integration of custom processing engines. RtKer-MP allows for a pluggable scheduler, dynamic and static scheduling policies, static and dynamic task migration domains and variable interruption frequencies for separate processors. The pluggable scheduler interface allows for quick exploration of various scheduling policies to provide feedback to the estimation systems.

extending database technology | 2008