Ani Thakar | Researchain

Publication

Featured research published by Ani Thakar.

International Conference on Management of Data | 2002

The SDSS SkyServer: Public Access to the Sloan Digital Sky Server Data

Alexander S. Szalay; Jim Gray; Ani Thakar; Peter Z. Kunszt; Tanu Malik; Jordan Raddick; Christopher Stoughton; Jan Vandenberg

The SkyServer provides Internet access to the public Sloan Digital Sky Survey (SDSS) data for both astronomers and for science education. This paper describes the SkyServer goals and architecture. It also describes our experience operating the SkyServer on the Internet. The SDSS data is public and well-documented so it makes a good test platform for research on database algorithms and performance.

Astronomical Telescopes and Instrumentation | 2002

Online scientific data curation, publication, and archiving

Jim Gray; Alexander S. Szalay; Ani Thakar; Christopher Stoughton; Jan Vandenberg

Science projects are data publishers. The scale and complexity of current and future science data changes the nature of the publication process. Publication is becoming a major project component. At a minimum, a project must preserve the ephemeral data it gathers. Derived data can be reconstructed from metadata, but metadata is ephemeral. Longer term, a project should expect some archive to preserve the data. We observe that published scientific data needs to be available forever -- this gives rise to the data pyramid of versions and to data inflation where the derived data volumes explode. As an example, this article describes the Sloan Digital Sky Survey (SDSS) strategies for data publication, data access, curation, and preservation.

Hawaii International Conference on System Sciences | 2009

GrayWulf: Scalable Clustered Architecture for Data Intensive Computing

Alexander S. Szalay; Gordon Bell; Jan Vandenberg; Alainna Wonders; Randal C. Burns; Dan Fay; J. N. Heasley; Tony Hey; Maria A. Nieto-Santisteban; Ani Thakar; Richard Wilton

Data-intensive computing presents a significant challenge for traditional supercomputing architectures that maximize FLOPS, since CPU speed has surpassed the I/O capabilities of HPC systems and Beowulf clusters. We present the architecture for GrayWulf, a three-tier commodity-component cluster designed for a range of data-intensive computations operating on petascale data sets. The design goal is a balanced system in terms of I/O performance and memory size, according to Amdahl's Laws. The hardware currently installed at JHU exceeds one petabyte of storage and has 0.5 bytes/sec of I/O and 1 byte of memory for each CPU cycle. The GrayWulf provides almost an order of magnitude better balance than existing systems. The paper covers its architecture and reference applications. The software design is presented in a companion paper.
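
The balance figures quoted in this abstract are ratios of I/O bandwidth and memory size to CPU rate. As a minimal sketch of how such ratios could be computed, the function below divides aggregate I/O bandwidth and memory by aggregate CPU cycles per second. The function name and the example cluster numbers are hypothetical, chosen only so that the result matches the ratios stated above; they are not the measured GrayWulf figures.

```python
def balance_ratios(cpu_cycles_per_sec, io_bytes_per_sec, memory_bytes):
    """Return (I/O bytes/sec per CPU cycle/sec, memory bytes per CPU cycle/sec).

    Hypothetical helper for illustration; not taken from the GrayWulf paper.
    """
    if cpu_cycles_per_sec <= 0:
        raise ValueError("cpu_cycles_per_sec must be positive")
    io_ratio = io_bytes_per_sec / cpu_cycles_per_sec
    mem_ratio = memory_bytes / cpu_cycles_per_sec
    return io_ratio, mem_ratio


# Assumed, made-up aggregate figures for an example cluster.
io_ratio, mem_ratio = balance_ratios(
    cpu_cycles_per_sec=2.0e12,  # aggregate CPU cycles per second
    io_bytes_per_sec=1.0e12,    # aggregate sequential I/O bandwidth
    memory_bytes=2.0e12,        # total memory
)
print(io_ratio, mem_ratio)
```

A system whose ratios fall well below these values would be CPU-rich but starved for I/O or memory, which is the imbalance the abstract attributes to FLOPS-oriented architectures.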

Computing in Science and Engineering | 2008

CasJobs and MyDB: A Batch Query Workbench

Nolan Li; Ani Thakar

Catalog archive server jobs (CasJobs) is an asynchronous query workbench service that lets users run unrestricted SQL queries against scientific catalog archives. After running queries in batch mode, users can save their results to a personal database called MyDB before downloading them, letting users manage their query workloads, results, and histories without causing network overloads.

High Performance Distributed Computing | 2010

Migrating a (large) science database to the cloud

Ani Thakar; Alexander S. Szalay

We report on attempts to put an existing scientific (astronomical) database -- the Sloan Digital Sky Survey (SDSS) science archive [1] -- in the cloud. Based on our experience, it is either very frustrating or impossible at this time to migrate an existing, complex SQL Server database into current cloud service offerings such as Amazon (EC2) and Microsoft (SQL Azure). Certainly it is impossible to migrate a large database in excess of a TB, but even with (much) smaller databases, the limitations of cloud services make it very difficult to migrate the data to the cloud without making changes to the schema and settings (for example, inability to migrate a spatial indexing library, and several other user-defined functions and stored procedures) that would invalidate performance comparisons between cloud and on-premise versions. So it is not surprising that our preliminary performance comparisons show a very large (an order of magnitude) performance discrepancy with the Amazon cloud version of the SDSS database. We have also not yet investigated the performance tweaks that could be possible within the cloud. Although we managed to successfully migrate (a subset of) the SDSS catalog database to Amazon EC2, we were not able to access the database in a meaningful way from the outside world. Even though this was advertised as a public dataset on the AWS blog, it was not clear how other users or the public would be able to access this data in a meaningful way, if at all. These difficulties suggest that much work and coordination need to occur between cloud service providers and their potential database clients before science databases can successfully and effectively be deployed in the cloud. This is true not just for large scientific databases but for all databases that make extensive use of advanced database management system (DBMS) features for performance and user convenience.

Computing in Science and Engineering | 2008

The Catalog Archive Server Database Management System

Ani Thakar; Alexander S. Szalay; George Fekete; Jim Gray

The multiterabyte Sloan Digital Sky Survey's (SDSS's) catalog data is stored in a commercial relational database management system with SQL query access and a built-in query optimizer. The SDSS catalog archive server adds advanced data mining features to the DBMS to provide fast online access to the data.

International Conference on e-Science | 2006

Distributing the Sloan Digital Sky Survey Using UDT and Sector

Yunhong Gu; Robert L. Grossman; Alexander S. Szalay; Ani Thakar

In this paper, we describe a peer-to-peer storage system called Sector that is designed to access and transport large data sets over wide area high performance networks. We also describe our recent experience using Sector to distribute the Sloan Digital Sky Survey BESTDR4 catalog data.

Proceedings of SPIE | 2006

Designing a Multi-Petabyte Database for LSST

Jacek Becla; Andrew Hanushevsky; Sergei Nikolaev; Ghaleb Abdulla; Alexander S. Szalay; Maria A. Nieto-Santisteban; Ani Thakar; Jim Gray

The 3.2-gigapixel LSST camera will produce approximately half a petabyte of archive images every month. These data need to be reduced in under a minute to produce real-time transient alerts, and then added to the cumulative catalog for further analysis. The catalog is expected to grow by about three hundred terabytes per year. The data volume, the real-time transient alerting requirements of the LSST, and its spatio-temporal aspects require innovative techniques to build an efficient data access system at reasonable cost. As currently envisioned, the system will rely on a database for catalogs and metadata. Several database systems are being evaluated to understand how they perform at these data rates, data volumes, and access patterns. This paper describes the LSST requirements, the challenges they impose, the data access philosophy, results to date from evaluating available database technologies against LSST requirements, and the proposed database architecture to meet the data challenges.
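
To give a rough sense of scale for the image volume quoted in this abstract, the sketch below converts half a petabyte per month into an average sustained rate and a yearly total. It assumes decimal units (1 PB = 10^15 bytes) and a 30-day month; both are illustrative assumptions rather than figures from the paper, and the helper name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def average_rate_bytes_per_sec(bytes_per_month, days_per_month=30):
    """Average sustained rate implied by a monthly data volume.

    Hypothetical helper; the 30-day month is an assumption for illustration.
    """
    return bytes_per_month / (days_per_month * SECONDS_PER_DAY)


monthly_images = 0.5e15             # half a petabyte, assuming 1 PB = 10**15 bytes
rate = average_rate_bytes_per_sec(monthly_images)
yearly_images = 12 * monthly_images

print(f"average rate: {rate / 1e6:.0f} MB/s")
print(f"yearly image volume: {yearly_images / 1e15:.0f} PB")
```

This is only an average over the whole month; the rate while the camera is actually observing would be higher.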

Computing in Science and Engineering | 2003

Migrating a multiterabyte archive from object to relational databases

Ani Thakar; Alexander S. Szalay; Peter Z. Kunszt; Jim Gray

A commercial, object-oriented database engine with custom tools for data-mining the multiterabyte Sloan Digital Sky Survey archive did not meet its performance objectives. We describe the problems, technical issues, and process of migrating this large data set project to relational database technology.

Computing in Science and Engineering | 2008

The Sloan Digital Sky Survey: Drinking from the Fire Hose

Ani Thakar

The Sloan Digital Sky Survey science archive represents a thousand-fold increase in the total amount of data that astronomers have collected to date. The pioneering instrumentation technology that made this possible is matched by groundbreaking tools that let anyone in the world access terabytes of SDSS data online.
