Sumita Barahmand | Researchain

Publication

Featured researches published by Sumita Barahmand.

conference on information and knowledge management | 2013

A comparison of two physical data designs for interactive social networking actions

Sumita Barahmand; Shahram Ghandeharizadeh; Jason Yap

This paper compares the performance of an SQL solution that implements a relational data model with a document store named MongoDB. We report on the performance of a single node configuration of each data store and assume the database is small enough to fit in main memory. We analyze utilization of the CPU cores and the network bandwidth to compare the two data stores. Our key findings are as follows. First, for those social networking actions that read and write a small amount of data, the join operator of the SQL solution is not slower than the JSON representation of MongoDB. Second, with a mix of actions, the SQL solution provides either the same performance as MongoDB or outperforms it by 20%. Third, a middle-tier cache enhances the performance of both data stores, as query result lookup is significantly faster than query processing with either system.

international workshop on testing database systems | 2013

D-Zipfian: a decentralized implementation of Zipfian

Sumita Barahmand; Shahram Ghandeharizadeh

Zipfian distribution is used extensively to generate workloads to test, tune, and benchmark data stores. This paper presents a decentralized implementation of this technique, named D-Zipfian, using N parallel generators to issue requests. A request is a reference to a data item from a fixed population of data items. The challenge is for each generator to reference a disjoint set of data items. Moreover, they should finish at approximately the same time by performing work proportional to their processing capability. Intuitively, D-Zipfian assigns a total probability of 1/N to each of the N generators and requires each generator to reference data items with a scaled probability. In the case of heterogeneous generators, the total probability of each generator is proportional to its processing capability. We demonstrate the effectiveness of D-Zipfian using empirical measurements of the chi-square statistic.
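The idea in this abstract can be illustrated with a minimal Python sketch for the homogeneous case. It is not the authors' implementation: the function names, the exponent parameter, and the greedy partitioning rule are all assumptions chosen for illustration. The sketch splits a fixed population into N disjoint sets whose total Zipfian probability is close to 1/N each, and lets each generator sample only its own set with probabilities rescaled to sum to 1.

```python
import heapq
import random


def zipf_probabilities(num_items, exponent=1.0):
    """Zipfian probabilities for ranks 1..num_items (index 0 is most popular)."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def partition_items(probs, num_generators):
    """Assign each item to exactly one generator (disjoint sets).

    Assumption: a greedy rule that hands the next most popular item to the
    generator with the smallest accumulated probability. The paper may use
    a different assignment.
    """
    heap = [(0.0, g) for g in range(num_generators)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_generators)]
    for item, p in enumerate(probs):  # probs are in descending order
        mass, g = heapq.heappop(heap)
        assignment[g].append(item)
        heapq.heappush(heap, (mass + p, g))
    return assignment


def make_generator(items, probs, seed):
    """Return a function that issues requests for this generator's items only."""
    rng = random.Random(seed)
    local_weights = [probs[i] for i in items]

    def next_request():
        # random.choices normalizes the weights, which rescales each
        # item's probability by the generator's total probability.
        return rng.choices(items, weights=local_weights, k=1)[0]

    return next_request
```

For heterogeneous generators, the abstract states that each generator's total probability is proportional to its processing capability; the sketch above does not cover that case.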

modeling, analysis, and simulation on computer and telecommunication systems | 2014

Benchmarking Correctness of Operations in Big Data Applications

Sumita Barahmand; Shahram Ghandeharizadeh

With a wide variety of big data applications, the past few years have witnessed an increasing number of data stores with novel design decisions that may sacrifice the correctness of an application's operations to enhance performance. This paper presents our work-in-progress on a framework that generates a validation component. The input to the framework is the characteristics of an application. Its output is a validation module that plugs in to either an application or a benchmark to measure the amount of unpredictable data produced by a data store.

conference on information and knowledge management | 2013

Expedited rating of data stores using agile data loading techniques

Sumita Barahmand; Shahram Ghandeharizadeh

To benchmark and rate a data store, one must repeat experiments that impose a different amount of load on the data store. Workloads that modify the benchmark database may require the same database to be loaded repeatedly. This may constitute a significant portion of the time to rate a data store. This paper presents several agile data loading techniques to expedite the rating process. These techniques include generating the disk image of the database once and re-using it, restoring the updated data items to their original value, maintaining in-memory state of the database across different experiments to avoid repeated loading of the database altogether, and a hybrid of the third technique in combination with the other two. These techniques are general purpose and apply to a variety of cloud benchmarks. We investigate their implementation and evaluation in the context of one, the BG benchmark. Obtained results show a factor of two to twelve speedup in the rating process. As an example, when evaluating MongoDB with a million member BG database, we show these techniques expedite BG's rating from 4 months (123 days) of continuous running to less than 11 days for the first rating experiment. Subsequent ratings of MongoDB with different workloads using the same database are much faster, on the order of hours.
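One of the listed techniques, restoring updated data items to their original value, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain dictionary stands in for the data store, and the class and method names (RestorableStore, write, restore) are hypothetical rather than taken from BG.

```python
_MISSING = object()  # marks keys that did not exist before the experiment


class RestorableStore:
    """Toy key-value store that can undo an experiment's writes.

    Assumption: values are immutable (or never mutated in place), so keeping
    a reference to the original value is enough to restore it.
    """

    def __init__(self, initial):
        self.data = dict(initial)
        self._undo = {}  # key -> value before its first modification

    def write(self, key, value):
        # Remember the original value only the first time a key is touched.
        if key not in self._undo:
            self._undo[key] = self.data.get(key, _MISSING)
        self.data[key] = value

    def restore(self):
        """Return the store to its state before the experiment began."""
        for key, original in self._undo.items():
            if original is _MISSING:
                self.data.pop(key, None)  # key was created by the experiment
            else:
                self.data[key] = original
        self._undo.clear()
```

The cost of restore is proportional to the number of items the experiment touched, not to the size of the database, which is why this kind of approach can be cheaper than reloading.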

Workshop on Big Data Benchmarks | 2013

A Mid-Flight Synopsis of the BG Social Networking Benchmark

Shahram Ghandeharizadeh; Sumita Barahmand

BG is a benchmark that rates the performance of a data store for processing interactive social networking actions such as view a member’s profile, invite a member to be friends, accept a friend request, and others. It is motivated by a proliferation of data stores from a variety of academic and industrial contributors including social networking companies, e.g., Voldemort by LinkedIn. BG is designed to provide a system architect with insights into alternative design principles such as the use of a weak consistency technique instead of a strong one, different physical data models such as relational and JSON, factors that impact vertical and horizontal scalability of a data store, and the consistency versus availability tradeoff in the CAP theorem, among others. While BG is a recently introduced benchmark (less than a year old as of this writing), it combines elements of more mature benchmarks and extends them to simplify its use by practitioners and experimentalists. This paper provides a synopsis of the BG benchmark by identifying its strengths and limitations in our daily use cases. The identified limitations shape our research activities and the obtained solutions shall be incorporated into future BG releases. Thus, this workshop paper is a mid-flight glimpse into our current research efforts with BG.

Technology Conference on Performance Evaluation and Benchmarking | 2014

An Evaluation of Alternative Physical Graph Data Designs for Processing Interactive Social Networking Actions

Shahram Ghandeharizadeh; Reihane Boghrati; Sumita Barahmand

This study quantifies the tradeoff associated with alternative physical representations of a social graph for processing interactive social networking actions. We conduct this evaluation using a graph data store named Neo4j deployed in a client-server (REST) architecture using the BG benchmark. In addition to the average response time of a design, we quantify its SoAR, defined as the highest observed throughput given the following service level agreement: 95% of actions to observe a response time of 100 ms or faster. For an action such as computing the shortest distance between two members, we observe a tradeoff between speed and accuracy of the computed result. With this action, a relational data design provides a significantly faster response time than a graph design. The graph designs provide a higher SoAR than a relational one when the social graph includes large member profile images stored in the data store.
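The SoAR definition above can be made concrete with a small Python sketch. It assumes each experiment is summarized as a pair of observed throughput and a list of per-action response times in milliseconds; the function names and that data layout are illustrative assumptions, not part of BG.

```python
def satisfies_sla(response_times_ms, max_latency_ms=100.0, required_fraction=0.95):
    """True if at least required_fraction of actions finish within max_latency_ms."""
    if not response_times_ms:
        return False  # no observations, so the SLA cannot be confirmed
    within = sum(1 for t in response_times_ms if t <= max_latency_ms)
    return within / len(response_times_ms) >= required_fraction


def soar(experiments, max_latency_ms=100.0, required_fraction=0.95):
    """Highest observed throughput among experiments that satisfy the SLA.

    experiments: iterable of (throughput, response_times_ms) pairs.
    Returns 0.0 when no experiment satisfies the SLA.
    """
    qualifying = [
        throughput
        for throughput, times in experiments
        if satisfies_sla(times, max_latency_ms, required_fraction)
    ]
    return max(qualifying, default=0.0)
```

An experiment with a higher throughput is ignored if more than 5% of its actions exceed 100 ms, so SoAR rewards throughput only while the response-time requirement holds.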

Proceedings of the 3rd ACM SIGMM international workshop on Social media | 2011

Chase display of social live streams (SOLISs)

Sumita Barahmand; Shahram Ghandeharizadeh

Advances in networking, processing, and mass storage devices have enabled social live streams (SOLISs) and their chase display. This paper focuses on public SOLISs and presents the user interface of RAYS and its system architecture. In addition, we present several novel memory management techniques that produce summary data to minimize the likelihood of cache misses. One technique, named Data-Aware CLRU (DA-CLRU), stands out for both enhancing the cache hit rate and utility of data. This technique is parallelizable and ideal for multi-core CPUs.

Transactions on Large-Scale Data- and Knowledge-Centered Systems XXV - Volume 9620 | 2015

On Expedited Rating of Data Stores

Sumita Barahmand; Shahram Ghandeharizadeh

To rate a data store is to compute a value that describes the performance of the data store with a database and a workload. A common performance metric of interest is the highest throughput provided by the data store given a pre-specified service level agreement such as 95% of requests observing a response time faster than 100 ms. This is termed the action rating of the data store. This paper presents a framework consisting of two search techniques with slightly different characteristics to compute the action rating. With both, to expedite the rating process, the framework employs agile data loading techniques and strategies that reduce the duration of conducted experiments. We show these techniques enhance the rating of a data store by one to two orders of magnitude. The rating framework and its optimization techniques are implemented using a social networking benchmark named BG.
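The abstract does not describe its two search techniques, so the following Python sketch shows only one plausible way to search for a rating and should not be read as either of them. It assumes a hypothetical run_experiment(threads) callback that returns the observed throughput and whether the service level agreement held, and it assumes the agreement holds for every load up to some threshold and fails beyond it.

```python
def rate(run_experiment, max_threads=1024):
    """Search for the highest throughput observed while the SLA holds.

    run_experiment(threads) -> (throughput, sla_ok) is a hypothetical callback.
    Assumption: sla_ok is True up to some thread count and False beyond it.
    """
    best = 0.0
    low = 0       # largest thread count known to satisfy the SLA
    high = None   # smallest thread count known to violate the SLA
    threads = 1

    # Phase 1: double the load until the SLA is violated.
    while threads <= max_threads:
        throughput, sla_ok = run_experiment(threads)
        if not sla_ok:
            high = threads
            break
        best = max(best, throughput)
        low = threads
        threads *= 2

    if high is None:
        return best  # SLA held for every load that was tried

    # Phase 2: binary search between the last success and the first failure.
    while high - low > 1:
        mid = (low + high) // 2
        throughput, sla_ok = run_experiment(mid)
        if sla_ok:
            best = max(best, throughput)
            low = mid
        else:
            high = mid
    return best
```

Under the stated assumption, the number of experiments grows logarithmically with the threshold load rather than linearly, which is the kind of saving a search-based rating aims for.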

Proceedings of the 2010 ACM workshop on Advanced video streaming techniques for peer-to-peer networks and social networking | 2010

Three highly available data streaming techniques and their tradeoffs

Sumita Barahmand; Shahram Ghandeharizadeh; Anurag Ojha; Jason Yap

The Recall All Your Senses (RAYS) project envisions a social networking system that empowers its users to store, retrieve, and share data produced by streaming devices. An example device is the popular Apple iPhone that produces continuous media, audio and video clips. This paper focuses on the stream manager of RAYS, RAYS-SM, and its peer-to-peer overlay network. For a request that streams data from a device, RAYS-SM initiates more than one stream in order to minimize loss of data when nodes in its network fail. We present the design of 3 data availability techniques, quantifying their throughput and Mean Time To Data Loss (MTTDL). These two metrics highlight the tradeoff between the resource usage of each technique during normal mode of operation in order to minimize loss of data in the presence of node failures.

Future Generation Computer Systems | 2018

BG: A scalable benchmark for interactive social networking actions

Yazeed Alabdulkarim; Sumita Barahmand; Shahram Ghandeharizadeh

BG is a benchmark that rates a data store for processing interactive social networking actions such as view a member profile and extend a friend invitation to a member. It elevates the amount of stale, inconsistent, and erroneous (termed unpredictable) data produced by a data store to a first class metric, quantifying it as a part of the benchmarking phase. It summarizes the performance of a data store in one metric, Social Action Rating (SoAR). SoAR is defined as the highest throughput provided by a data store while satisfying a pre-specified service level agreement, SLA. To rate the fastest data stores, BG scales both vertically and horizontally, generating a higher number of requests per second as a function of additional CPU cores and nodes. This is realized using a shared-nothing architecture in combination with two multi-node execution paradigms named Integrated DataBase (IDB) and Disjoint DataBase (D2B). An evaluation of these paradigms shows the following tradeoffs. While the D2B scales superlinearly as a function of nodes, it may not evaluate data stores that employ client-side caching objectively. IDB provides two alternative execution paradigms, Retain and Delegate, that might be more appropriate. However, they fail to scale as effectively as D2B. We show elements of these two paradigms can be combined to realize a hybrid framework that scales almost as effectively as D2B while exercising the capabilities of certain classes of data stores as objectively as IDB.
