

Publications


Featured research published by Niloy Mukherjee.


international conference on data engineering | 2015

Oracle Database In-Memory: A dual format in-memory database

Tirthankar Lahiri; Shasank Chavan; Maria Colgan; Dinesh Das; Amit Ganesh; Michael J. Gleeson; Sanket Hase; Allison L. Holloway; Jesse Kamp; Teck-Hua Lee; Juan R. Loaiza; Neil Macnaughton; Vineet Marwah; Niloy Mukherjee; Atrayee Mullick; Sujatha Muthulingam; Vivekanandhan Raja; Marty Roth; Ekrem Soylemez; Mohamed Zait

The Oracle Database In-Memory Option allows Oracle to function as the industry-first dual-format in-memory database. Row formats are ideal for OLTP workloads, which typically use indexes to limit their data access to a small set of rows, while column formats are better suited for analytic operations, which typically examine a small number of columns from a large number of rows. Since no single data format is ideal for all types of workloads, our approach was to allow data to be simultaneously maintained in both formats with strict transactional consistency between them.
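The dual-format approach can be pictured as a row store that remains the system of record while a columnar copy is kept transactionally in step with it. The following is a minimal, purely illustrative Python sketch of that idea (all class and method names are hypothetical; this is not Oracle's implementation):

```python
# Illustrative model of a dual-format table: the row store is authoritative,
# and a columnar shadow copy is updated alongside it so analytic scans can
# read individual columns without touching full rows. Hypothetical sketch only.

class DualFormatTable:
    def __init__(self, columns):
        self.columns = columns
        self.rows = []                              # row-major store (OLTP side)
        self.col_store = {c: [] for c in columns}   # column-major copy (analytics side)

    def insert(self, row):
        """OLTP-style insert: write the row, then mirror it into the columnar copy."""
        self.rows.append(row)
        for c in self.columns:
            self.col_store[c].append(row[c])        # kept consistent with the row store

    def scan_column(self, column):
        """Analytic scan touches only the requested column, not whole rows."""
        return self.col_store[column]


t = DualFormatTable(["id", "region", "amount"])
t.insert({"id": 1, "region": "EMEA", "amount": 120})
t.insert({"id": 2, "region": "APAC", "amount": 80})
print(sum(t.scan_column("amount")))   # 200: aggregate served from the column format
```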


international conference on management of data | 2011

Oracle database filesystem

Krishna Kunchithapadam; Wei Zhang; Amit Ganesh; Niloy Mukherjee

Modern enterprise, web, and multimedia applications are generating unstructured content at unforeseen volumes in the form of documents, texts, and media files. Such content is generally associated with relational data such as user names, location tags, and timestamps. Storing unstructured content in a relational database would guarantee the same robustness, transactional consistency, data integrity, data recoverability, and other data management features, consolidated across files and relational content. Although database systems are preferred for relational data management, poor performance of unstructured data storage, limited data transformation functionality, and the lack of interfaces based on filesystem standards may keep more than eighty-five percent of non-relational unstructured content out of databases in the coming decades.

We introduce Oracle Database Filesystem (DBFS) as a consolidated solution that unifies state-of-the-art network filesystem features with relational database management features. DBFS is a novel shared-storage network filesystem developed in the RDBMS kernel that allows content management applications to transparently store and organize files using standard filesystem interfaces, in the same database that stores the associated relational content. The server component of DBFS is based on Oracle SecureFiles, a novel unstructured data storage engine within the RDBMS that provides filesystem-like or better storage performance for files within the database while fully leveraging relational data management features such as transaction atomicity, isolation, read consistency, temporality, and information lifecycle management.

We present a preliminary performance evaluation of DBFS that demonstrates more than 10 TB/hr throughput of filesystem read and write operations sustained consistently over a period of 12 hours on an Oracle Exadata Database cluster of four server nodes. In terms of file storage, such extreme performance is equivalent to ingesting more than 2,500 million 100 KB document files in a single day. These initial results look very promising for DBFS as a universal storage solution for both relational and unstructured content.
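The quoted equivalence between throughput and daily file ingestion follows from simple unit arithmetic; a quick back-of-the-envelope check (assuming decimal units) is shown below:

```python
# Back-of-the-envelope check of the quoted equivalence (decimal units assumed).
tb_per_hour = 10                        # sustained filesystem throughput from the evaluation
bytes_per_day = tb_per_hour * 1e12 * 24
file_size = 100 * 1e3                   # 100 KB documents
files_per_day = bytes_per_day / file_size
print(f"{files_per_day:.2e} files/day")  # ~2.4e9, i.e. ~2,400 million files per day
# The paper's "more than 2,500 million" figure corresponds to the measured
# throughput being somewhat above 10 TB/hr.
```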


very large data bases | 2015

Distributed architecture of Oracle database in-memory

Niloy Mukherjee; Shasank Chavan; Maria Colgan; Dinesh Das; Michael J. Gleeson; Sanket Hase; Allison L. Holloway; Hui Jin; Jesse Kamp; Kartik Kulkarni; Tirthankar Lahiri; Juan R. Loaiza; Neil Macnaughton; Vineet Marwah; Atrayee Mullick; Andy Witkowski; Jiaqi Yan; Mohamed Zait

Over the last few years, the information technology industry has witnessed revolutions in multiple dimensions. Increasingly ubiquitous sources of data have posed two connected challenges to data management solutions: processing unprecedented volumes of data, and providing ad-hoc real-time analysis in mainstream production data stores without compromising regular transactional workload performance. In parallel, computer hardware systems are scaling out elastically, scaling up in the number of processors and cores, and increasing main memory capacity extensively. The data processing challenges, combined with the rapid advancement of hardware systems, have necessitated the evolution of a new breed of main-memory databases optimized for mixed OLTAP environments and designed to scale.

The Oracle RDBMS In-Memory Option (DBIM) is an industry-first distributed dual-format architecture that allows a database object to be stored in a columnar format in main memory, highly optimized to break performance barriers in analytic query workloads, while simultaneously maintaining transactional consistency with the corresponding OLTP-optimized row-major format persisted in storage and accessed through the database buffer cache. In this paper, we present the distributed, highly available, and fault-tolerant architecture of Oracle DBIM that enables the RDBMS to transparently scale out in a database cluster, both in terms of memory capacity and query processing throughput. We believe that the architecture is unique among all mainstream in-memory databases. It allows complete application-transparent, extremely scalable and automated distribution of Oracle RDBMS objects in memory across a cluster, as well as across multiple NUMA nodes within a single server. It seamlessly provides distribution awareness to the Oracle SQL execution framework through affinitized, fault-tolerant parallel execution within and across servers without explicit optimizer plan changes or query rewrites.
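One way to picture the scale-out scheme is that the in-memory columnar chunks of an object are spread across instances so that each instance scans only locally resident memory. A hypothetical Python sketch of such a placement (the chunk and node names are illustrative, not Oracle internals):

```python
# Hypothetical sketch: spread the columnar chunks of one object across cluster
# instances so each instance scans only the chunks it hosts locally.
from collections import defaultdict

def distribute(units, instances):
    """Round-robin placement of columnar chunks across instances."""
    placement = defaultdict(list)
    for i, unit in enumerate(units):
        placement[instances[i % len(instances)]].append(unit)
    return placement

units = [f"chunk-{n}" for n in range(8)]       # in-memory columnar chunks of one table
instances = ["node1", "node2", "node3"]
placement = distribute(units, instances)

# A distribution-aware scan fans out: each instance scans its local chunks only.
for node, local_units in placement.items():
    print(node, "scans", local_units)
```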


very large data bases | 2008

Oracle SecureFiles System

Niloy Mukherjee; Bharath Aleti; Amit Ganesh; Krishna Kunchithapadam; Scott Lynn; Sujatha Muthulingam; Kam Shergill; Shaoyu Wang; Wei Zhang

Over the last decade, the nature of content stored on computer storage systems has evolved from being relational to being semi-structured, i.e., unstructured data accompanied by relational metadata. Average data volumes have increased from a few hundred megabytes to hundreds of terabytes. Simultaneously, data feed rates have increased along with processor, storage, and network bandwidths. Data growth trends seem to be following Moore's law, implying an exponential explosion in content volumes and rates in the years to come. The near future requires data management systems to provide unlimited scalability in execution, availability, recoverability, and storage usage for semi-structured content.

Traditionally, filesystems have been preferred over database management systems for storing unstructured data, while databases have been the preferred choice for managing relational data. The lack of a consolidated semi-structured content management architecture compromises security, availability, recoverability, and manageability, among other features. We introduce a system without compromises, the Oracle SecureFiles System, designed to provide highly scalable storage and access execution of unstructured and structured content as first-class objects within the Oracle relational database management system. Oracle SecureFiles breaks the performance barrier that has kept such content out of databases. The architecture maximizes storage utilization through compression and de-duplication and achieves robustness by preserving the transactional atomicity, durability, availability, read-consistent queryability, and security of the database management system.
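Two of the storage-utilization features mentioned above, compression and de-duplication, can be made concrete with a toy content store; the sketch below is purely illustrative and is not the SecureFiles storage engine:

```python
# Toy content store illustrating de-duplication (by content hash) and
# compression. Purely illustrative; not the SecureFiles on-disk format.
import hashlib
import zlib

class ContentStore:
    def __init__(self):
        self.blobs = {}   # content hash -> compressed bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:          # de-duplicate identical content
            self.blobs[digest] = zlib.compress(data)
        return digest

    def get(self, digest: str) -> bytes:
        return zlib.decompress(self.blobs[digest])


store = ContentStore()
h1 = store.put(b"quarterly report" * 1000)
h2 = store.put(b"quarterly report" * 1000)    # duplicate file: stored only once
assert h1 == h2 and len(store.blobs) == 1
print(len(store.get(h1)))                     # original 16,000 bytes recovered
```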


very large data bases | 2015

Query optimization in Oracle 12c database in-memory

Dinesh Das; Jiaqi Yan; Mohamed Zait; Satyanarayana R. Valluri; Nirav Vyas; Ramarajan Krishnamachari; Prashant Gaharwar; Jesse Kamp; Niloy Mukherjee

Traditional on-disk row-major tables have been the dominant storage mechanism in relational databases for decades. Over the last decade, however, with explosive growth in data volume and demand for faster analytics, has come the recognition that a different data representation is needed. There is widespread agreement that in-memory column-oriented databases are best suited to meet the realities of this new world.

Oracle 12c Database In-Memory, the industry's first dual-format database, allows existing row-major on-disk tables to have complementary in-memory columnar representations. The new storage format brings new data processing techniques and query execution algorithms, and thus new challenges for the query optimizer. Execution plans that are optimal for one format may be sub-optimal for the other.

In this paper, we describe the changes made in the query optimizer to generate execution plans optimized for the specific format -- row major or columnar -- that will be scanned during query execution. With enhancements in several areas -- statistics, cost model, query transformation, access path and join optimization, parallelism, and cluster-awareness -- the query optimizer plays a significant role in unlocking the full promise and performance of Oracle Database In-Memory.
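At its core, the optimizer change described here means that scan cost now depends on which format will actually be scanned. A deliberately simplified, hypothetical cost comparison (the cost constants are invented for illustration only):

```python
# Hypothetical, simplified illustration of format-aware access-path costing:
# compare an index-driven row-store access against an in-memory columnar scan
# and pick the cheaper plan for the query at hand.

def cost_index_rowstore(matching_rows, io_cost_per_row=1.0):
    # Cheap when a selective index limits access to a few rows.
    return matching_rows * io_cost_per_row

def cost_inmemory_columnar(total_rows, cols_scanned, cpu_cost_per_value=0.001):
    # Scans every row, but only the referenced columns, entirely in memory.
    return total_rows * cols_scanned * cpu_cost_per_value

total_rows = 10_000_000

# Selective predicate: the index limits access to a handful of rows -> row store wins.
print(cost_index_rowstore(matching_rows=50),
      cost_inmemory_columnar(total_rows, cols_scanned=2))

# Non-selective analytic predicate: half the rows match -> columnar scan wins.
print(cost_index_rowstore(matching_rows=5_000_000),
      cost_inmemory_columnar(total_rows, cols_scanned=2))
```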


very large data bases | 2016

Operational analytics data management systems

Alexander Böhm; Jens Dittrich; Niloy Mukherjee; Ippokratis Pandis; Rajkumar Sen

Prior to the mid-2000s, the space of data analytics was mainly confined to the area of decision support systems. It was a long era of isolated enterprise data warehouses curating information from live data sources, and of business intelligence software used to query that information. Most data sets were small enough in volume and static enough in velocity to be segregated in warehouses for analysis. Data analysis was not ad-hoc; it required prerequisite knowledge of underlying data access patterns for the creation of specialized access methods (e.g., covering indexes, materialized views) in order to efficiently execute a small set of focused queries.


very large data bases | 2016

Accelerating analytics with dynamic in-memory expressions

Aurosish Mishra; Shasank Chavan; Allison L. Holloway; Tirthankar Lahiri; Zhen Hua Liu; Sunil Chakkappen; Dennis Lui; Vinita Subramanian; Ramesh Kumar; Maria Colgan; Jesse Kamp; Niloy Mukherjee; Vineet Marwah

Oracle Database In-Memory (DBIM) accelerates analytic workload performance by orders of magnitude through an in-memory columnar format utilizing techniques such as SIMD vector processing, in-memory storage indexes, and optimized predicate evaluation and aggregation. With Oracle Database 12.2, Database In-Memory is further enhanced to accelerate analytic processing through a novel lightweight mechanism known as Dynamic In-Memory Expressions (DIMEs). The DIME mechanism automatically detects frequently occurring expressions in a query workload, and then creates highly optimized, transactionally consistent, in-memory columnar representations of these expression results. At runtime, queries can directly access these DIMEs, thus avoiding costly expression evaluations. Furthermore, all the optimizations introduced in DBIM can apply directly to DIMEs. Since DIMEs are purely in-memory structures, no changes are required to the underlying tables. We show that DIMEs can reduce query elapsed times by several orders of magnitude without the need for costly pre-computed structures such as computed columns or materialized views or cubes.
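Conceptually, a DIME acts like a transactionally maintained, column-wise cache of a hot expression's results. The sketch below illustrates only the caching idea; the class name and threshold are hypothetical and this is not Oracle's implementation:

```python
# Illustrative sketch of the DIME idea: track how often an expression is
# evaluated over a table, and once it is "hot", materialize its results as an
# extra in-memory column so later queries skip re-evaluation.

class ExpressionCache:
    def __init__(self, hot_threshold=3):
        self.counts = {}
        self.materialized = {}        # expression text -> precomputed result column
        self.hot_threshold = hot_threshold

    def evaluate(self, expr_text, expr_fn, rows):
        if expr_text in self.materialized:           # cache hit: no re-evaluation
            return self.materialized[expr_text]
        self.counts[expr_text] = self.counts.get(expr_text, 0) + 1
        results = [expr_fn(r) for r in rows]
        if self.counts[expr_text] >= self.hot_threshold:
            self.materialized[expr_text] = results   # expression is hot: materialize it
        return results


rows = [{"price": p, "qty": q} for p, q in [(10, 3), (4, 7), (25, 1)]]
cache = ExpressionCache(hot_threshold=2)
for _ in range(3):
    totals = cache.evaluate("price*qty", lambda r: r["price"] * r["qty"], rows)
print(totals, "materialized:", list(cache.materialized))
```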


international conference on data engineering | 2016

Fault-tolerant real-time analytics with distributed Oracle Database In-memory

Niloy Mukherjee; Shasank Chavan; Maria Colgan; Michael J. Gleeson; Xiaoming He; Allison L. Holloway; Jesse Kamp; Kartik Kulkarni; Tirthankar Lahiri; Juan R. Loaiza; Neil Macnaughton; Atrayee Mullick; Sujatha Muthulingam; Vivekanandhan Raja; Raunak Rungta

Modern data management systems are required to address new breeds of OLTAP applications. These applications demand real-time analytical insights over massive data volumes, not only on dedicated data warehouses but also on live mainstream production environments where data gets continuously ingested and modified. Oracle introduced the Database In-Memory Option (DBIM) in 2014 as a unique dual row-and-column format architecture aimed at the emerging space of mixed OLTAP applications along with traditional OLAP workloads. The architecture allows both the row format and the column format to be maintained simultaneously with strict transactional consistency. While the row format is persisted in underlying storage, the column format is maintained purely in memory without incurring additional logging overheads in OLTP. Maintaining columnar data purely in memory creates the need for distributed data management architectures. In single-server architectures, analytic performance suffers severe regressions during server failures, as it takes non-trivial time to recover and rebuild terabytes of in-memory columnar format. A distributed and distribution-aware architecture therefore becomes necessary to provide real-time high availability of the columnar format for glitch-free in-memory analytic query execution across server failures and additions, besides providing scale-out of capacity and compute to address real-time throughput requirements over large volumes of in-memory data. In this paper, we present the high-availability aspects of the distributed architecture of Oracle DBIM, including an extremely scalable, application-transparent column-format duplication mechanism, distributed query execution on the duplicated in-memory columnar format, and several scenarios of fault-tolerant analytic query execution over the in-memory column format at various stages of redistribution of columnar data during cluster topology changes.
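The high-availability scheme can be pictured as each columnar unit having a home instance plus a duplicate, so that a scan can be redirected immediately instead of waiting for an in-memory rebuild after a failure. A hypothetical sketch of such duplicate-aware routing (names and placement policy are illustrative, not Oracle internals):

```python
# Hypothetical sketch of duplicate-aware scan routing: each columnar unit has a
# primary host and a duplicate host, and scans fall back to the duplicate when
# the primary instance is down, avoiding a lengthy in-memory rebuild.

def place_with_duplicates(units, nodes):
    placement = {}
    for i, unit in enumerate(units):
        primary = nodes[i % len(nodes)]
        duplicate = nodes[(i + 1) % len(nodes)]
        placement[unit] = (primary, duplicate)
    return placement

def route_scan(unit, placement, live_nodes):
    primary, duplicate = placement[unit]
    if primary in live_nodes:
        return primary
    if duplicate in live_nodes:
        return duplicate            # glitch-free: serve the scan from the duplicate copy
    raise RuntimeError("unit must be repopulated from the row format on disk")

nodes = ["node1", "node2", "node3"]
placement = place_with_duplicates([f"unit-{n}" for n in range(6)], nodes)
live = {"node1", "node3"}           # node2 has failed
print({u: route_scan(u, placement, live) for u in placement})
```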


very large data bases | 2009

Oracle SecureFiles: prepared for the digital deluge

Niloy Mukherjee; Amit Ganesh; Vinayagam Djegaradjane; Sujatha Muthulingam; Wei Zhang; Krishna Kunchithapadam; Scott Lynn; Bharath Aleti; Kam Shergill; Shaoyu Wang

Digital unstructured data volumes across enterprise, Internet, and multimedia applications are predicted to surpass 6.023×10²³ (Avogadro's number) bits a year within the next fifteen years. This poses tremendous scalability challenges for data management solutions in the coming decades. Filesystems seem to be preferred by data management application designers for providing storage solutions for such unstructured data volumes.

Oracle SecureFiles is emerging as the database solution to break the performance barrier that has kept unstructured content out of database management systems and to provide advanced filesystem functionality, while letting applications fully leverage the strengths of the RDBMS, from transactions to partitioning to rollforward recovery. A set of preliminary performance results was presented at the 34th International Conference on Very Large Data Bases (VLDB 2008), where it was claimed that SecureFiles would scale maximally as physical storage systems scale up. We substantiate those claims in this paper, presenting the scalability aspects of SecureFiles through a performance evaluation of I/O-bound, filesystem-like operations on one of the latest high-performance clusters of servers and storage.

We present benchmark results that we believe represent a world-record database insertion rate for any published result: over 4.4 GB/s using a cluster of seven servers. For 100-byte rows, that represents an insertion rate of roughly 45 million records a second in relational terms. In terms of unstructured data storage, that scale represents an insertion rate of more than 3.7 million 100 MB high-resolution multimedia videos a day.
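The quoted rates follow from unit conversion of the measured ingest throughput; a quick check (assuming decimal units) is shown below:

```python
# Back-of-the-envelope check of the quoted rates at 4.4 GB/s sustained ingest
# (decimal units assumed).
gb_per_s = 4.4
bytes_per_s = gb_per_s * 1e9

rows_per_s = bytes_per_s / 100                          # 100-byte rows
videos_per_day = bytes_per_s * 86_400 / (100 * 1e6)     # 100 MB videos

print(f"{rows_per_s:.1e} rows/s")        # ~4.4e7, i.e. roughly 44-45 million rows per second
print(f"{videos_per_day:.2e} videos/day")  # ~3.8e6, consistent with "more than 3.7 million"
```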


international conference on software engineering | 2015

How does Oracle Database In-Memory scale out?

Niloy Mukherjee; Kartik Kulkarni; Hui Jin; Jesse Kamp; Tirthankar Lahiri

The Oracle RDBMS In-Memory Option (DBIM), introduced in 2014, is an industry-first distributed dual-format in-memory RDBMS that allows a database object to be stored in columnar format purely in memory, simultaneously maintaining transactional consistency with the corresponding row-major format persisted in storage and accessed through the in-memory database buffer cache. The in-memory columnar format is highly optimized to break performance barriers in analytic query workloads, while the row format is best suited for OLTP workloads. In this paper, we present the distributed architecture of the Oracle Database In-Memory Option that enables the in-memory RDBMS to transparently scale out across a set of Oracle database server instances in an Oracle RAC cluster, both in terms of memory capacity and query processing throughput. The architecture allows complete application-transparent, extremely scalable and automated in-memory distribution of Oracle RDBMS objects across multiple instances in a cluster. It seamlessly provides distribution awareness to the Oracle SQL execution framework, ensuring completely local memory scans through affinitized, fault-tolerant parallel execution within and across servers without explicit optimizer plan changes or query rewrites.
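Distribution awareness in the SQL execution layer essentially means assigning each parallel scan worker the columnar pieces that are local to its instance. A small, hypothetical illustration of that affinity mapping (instance and piece names are invented):

```python
# Hypothetical illustration of affinitized parallel execution: scan work is
# assigned so that each server instance only scans the in-memory columnar
# pieces it hosts, keeping all memory accesses local.

home = {                       # which instance hosts each columnar piece
    "piece-0": "inst1", "piece-1": "inst2",
    "piece-2": "inst1", "piece-3": "inst3",
}

def affinitized_work_units(instance):
    return [p for p, h in home.items() if h == instance]

for inst in ("inst1", "inst2", "inst3"):
    print(inst, "locally scans", affinitized_work_units(inst))
```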
