Aravind Mohan
Wayne State University
Publications
Featured research published by Aravind Mohan.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2015
Zaihan Yang; Alexander Kotov; Aravind Mohan; Shiyong Lu
The popularity of Web 2.0 has resulted in a large number of publicly available online consumer reviews created by a demographically diverse user base. Information about the authors of these reviews, such as age, gender, and location, provided by many online consumer review platforms may allow companies to better understand the preferences of different market segments and improve their product design, manufacturing processes, and marketing campaigns accordingly. However, previous work in sentiment analysis has largely ignored this additional user metadata. To address this deficiency, we propose parametric and non-parametric User-aware Sentiment Topic Models (USTM) that incorporate the demographic information of review authors into the topic modeling process in order to discover associations between market segments, topical aspects, and sentiments. Qualitative examination of the topics discovered by the USTM framework on two datasets collected from popular online consumer review platforms, as well as quantitative evaluation of methods that utilize those topics for review sentiment classification and user attribute prediction, both indicate the utility of accounting for the demographic information of review authors in opinion mining.
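To make the modeling idea concrete, the following is a minimal sketch of a generative process in the spirit of a user-aware sentiment topic model: a reviewer's demographic segment biases the topic choice, the topic biases the sentiment, and the (topic, sentiment) pair selects words. The dimensions, priors, and parameter names are all illustrative assumptions, not the paper's actual USTM specification.

```python
# Hypothetical, simplified generative process for a user-aware
# sentiment topic model; all dimensions and priors are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_segments, n_topics, n_sentiments, vocab_size = 3, 5, 2, 100

# Segment-specific topic preferences: one Dirichlet draw per segment.
theta = rng.dirichlet(np.ones(n_topics), size=n_segments)
# Per-topic sentiment distributions.
pi = rng.dirichlet(np.ones(n_sentiments), size=n_topics)
# Word distributions indexed by (topic, sentiment).
phi = rng.dirichlet(np.ones(vocab_size), size=(n_topics, n_sentiments))

def generate_review(segment, length=20):
    """Sample word ids for one review authored by a user in `segment`."""
    words = []
    for _ in range(length):
        z = rng.choice(n_topics, p=theta[segment])          # topic | segment
        s = rng.choice(n_sentiments, p=pi[z])               # sentiment | topic
        words.append(rng.choice(vocab_size, p=phi[z, s]))   # word | topic, sentiment
    return words

print(generate_review(segment=1))
```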
International Conference on Big Data | 2015
Mahdi Ebrahimi; Aravind Mohan; Andrey Kashlev; Shiyong Lu
In this new era of Big Data, there is a growing need to enable scientific workflows to perform computations at a scale far exceeding a single workstation's capabilities. When such data-intensive workflows run in a cloud distributed across several physical locations, the execution time and resource utilization efficiency depend heavily on the initial placement and distribution of the input datasets across the multiple virtual machines. In this paper, we propose BDAP (Big DAta Placement strategy), a strategy that improves workflow performance by minimizing data movement across multiple virtual machines. In this work, we 1) formalize the data placement problem in scientific workflows, 2) propose a data placement algorithm that considers both the initial input datasets and the intermediate datasets generated during workflow runs, and 3) perform extensive experiments in a distributed environment to verify that our strategy provides an effective data placement solution that distributes and places big datasets on the appropriate virtual machines in the cloud within a reasonable time.
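To illustrate the kind of decision a data placement strategy makes, here is a toy greedy heuristic that places each dataset on the virtual machine already holding the most data read by the same tasks, subject to capacity. The task/dataset structure, sizes, and tie-breaking rule are invented for illustration; this is not the actual BDAP algorithm.

```python
# Toy greedy data placement: put each dataset where its co-read data
# already lives, within VM capacity. Workload and rules are invented.
from collections import defaultdict

tasks = {  # task -> datasets it reads, as (dataset, size_gb) pairs
    "t1": [("d1", 4), ("d2", 1)],
    "t2": [("d2", 1), ("d3", 6)],
    "t3": [("d1", 4), ("d3", 6)],
}
vm_capacity = {"vm1": 8, "vm2": 8}

placement = {}
used = defaultdict(float)
datasets = {d: s for deps in tasks.values() for d, s in deps}

# Place larger datasets first; prefer the VM that already holds data
# read by the same tasks, then the VM with the most free capacity.
for d, size in sorted(datasets.items(), key=lambda x: -x[1]):
    def affinity(vm):
        return sum(1 for deps in tasks.values()
                   if any(e == d for e, _ in deps)
                   and any(placement.get(e) == vm for e, _ in deps))
    fits = [vm for vm in vm_capacity if used[vm] + size <= vm_capacity[vm]]
    best = max(fits, key=lambda vm: (affinity(vm), vm_capacity[vm] - used[vm]))
    placement[d] = best
    used[best] += size

print(placement)  # e.g. {'d3': 'vm1', 'd1': 'vm2', 'd2': 'vm2'}
```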
IEEE International Conference on Services Computing | 2014
Aravind Mohan; Shiyong Lu; Alexander Kotov
A substantial amount of research has been done recently to address the shimming problem in scientific workflows, in which special adaptors, called shims, are inserted between workflow tasks to resolve data type incompatibilities. Scientific workflows are increasingly used for big data analysis and processing, which poses additional challenges, such as the volume, velocity, and variety of data, to the shimming problem. One issue is scaling the registration and configuration procedure to a large number of workflow tasks. Another is the ease of integrating a large number of remote Web services and other heterogeneous task components, which consume and produce data in various formats and models, into a uniform and interoperable workflow. Existing approaches fall short in usability and scalability in addressing these issues. In this paper, we 1) propose a new, simplified single-component task model based on extensive experience and lessons learned from our original multiple-component task model; the new model separates registration from configuration and eases the process of registering external functional components (such as Web services) into workflows, 2) propose a shim generation algorithm that elegantly solves the shimming problem raised by Web-service-based scientific workflows, and 3) integrate MongoDB, a NoSQL document-oriented database system, for storing and managing large-scale unstructured documents. A new version of the DATAVIEW system has been developed to support the proposed techniques, and a case study has been conducted to show their feasibility and usability.
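A minimal sketch of the shimming idea, assuming a registry that maps (producer type, consumer type) pairs to converter functions and inserts one automatically when port types disagree; the SHIMS registry and connect helper are hypothetical, not DATAVIEW's API.

```python
# Hypothetical shim registry: converters keyed by (from_type, to_type).
import csv, io, json

SHIMS = {
    ("csv", "json"): lambda text: json.dumps(
        list(csv.DictReader(io.StringIO(text)))),
}

def connect(producer_type, consumer_type):
    """Return a shim when port types differ, identity when they match."""
    if producer_type == consumer_type:
        return lambda x: x
    try:
        return SHIMS[(producer_type, consumer_type)]
    except KeyError:
        raise TypeError(f"no shim from {producer_type} to {consumer_type}")

shim = connect("csv", "json")  # inserted between two incompatible tasks
print(shim("name,score\nada,10\nbob,7"))
```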
IEEE International Conference on Services Computing | 2012
Dong Ruan; Shiyong Lu; Aravind Mohan; Xubo Fei; Jia Zhang
With the advances of e-Science, scientific workflows have become an important tool for researchers to explore scientific discoveries. Although several scientific workflow management systems (SWFMSs) have been developed, their support for exception handling is still limited. In this paper, we introduce our approach to exception handling in the VIEW scientific workflow management system. We propose an exception handling language for scientific workflows based on our workflow model, and present both its syntax and semantics. Different exception handling primitives, such as retry, alternative, and repeat, are supported in our language and can be composed flexibly to provide a sophisticated exception handling mechanism. Moreover, two exception handling algorithms and the architecture design for exception handling in VIEW are also presented.
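To show how such primitives can compose, here is a small Python sketch of retry and alternative as higher-order functions; the API is invented for illustration and is not the actual VIEW exception handling language.

```python
# Illustrative retry/alternative combinators; not the VIEW language.
import time

def retry(task, attempts=3, delay=0.1):
    """Re-run `task` up to `attempts` times before giving up."""
    def wrapped():
        last = None
        for _ in range(attempts):
            try:
                return task()
            except Exception as exc:
                last = exc
                time.sleep(delay)
        raise last
    return wrapped

def alternative(primary, fallback):
    """Run `fallback` if `primary` fails; primitives compose freely."""
    def wrapped():
        try:
            return primary()
        except Exception:
            return fallback()
    return wrapped

def flaky_service():
    raise IOError("service down")

# Composition: retry the primary task twice, then fall back.
fetch = alternative(retry(flaky_service, attempts=2, delay=0.0),
                    lambda: "cached result")
print(fetch())  # -> cached result
```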
International Conference on Big Data | 2015
Mahdi Ebrahimi; Aravind Mohan; Shiyong Lu; Robert G. Reynolds
Workflow makespan is the total execution time for running a workflow in the cloud. The makespan depends significantly on how workflow tasks and datasets are allocated and placed in a distributed computing environment such as the cloud, so incorporating data and task allocation strategies that minimize makespan helps scientific users receive their results on time. The main goal of a task placement algorithm is to minimize the total amount of data movement between virtual machines during the execution of a workflow. In this paper, we 1) formalize the task placement problem in big data workflows; 2) propose a task placement strategy (TPS) that considers both initial input datasets and intermediate datasets to calculate the dependency between workflow tasks; and 3) perform extensive experiments in a distributed environment to demonstrate that the proposed strategy provides an effective task distribution and placement tool.
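The following toy heuristic conveys the flavor of such a strategy: it weights each pair of tasks by the total size of the datasets they share and greedily co-locates the heaviest pairs on the same virtual machine. The workload, slot limit, and tie-breaking are invented; this is not the published TPS algorithm.

```python
# Toy task placement: co-locate task pairs that share the most data.
from itertools import combinations

io_map = {  # task -> {dataset: size_gb} it reads or writes
    "t1": {"d1": 4, "d2": 1},
    "t2": {"d2": 1, "d3": 6},
    "t3": {"d3": 6},
}
vms = {"vm1": set(), "vm2": set()}
SLOTS = 2  # tasks per VM

def weight(a, b):
    """Dependency weight = total size of datasets shared by a and b."""
    return sum(io_map[a][d] for d in io_map[a].keys() & io_map[b].keys())

assigned = {}
for a, b in sorted(combinations(io_map, 2), key=lambda p: -weight(*p)):
    for t in (a, b):
        if t in assigned:
            continue
        # Prefer the VM already hosting the partner task, then the emptier one.
        partner_vm = assigned.get(b if t == a else a)
        for v in sorted(vms, key=lambda v: (v != partner_vm, len(vms[v]))):
            if len(vms[v]) < SLOTS:
                vms[v].add(t)
                assigned[t] = v
                break

print(vms)  # e.g. {'vm1': {'t2', 't3'}, 'vm2': {'t1'}}
```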
International Congress on Big Data | 2016
Aravind Mohan; Mahdi Ebrahimi; Shiyong Lu; Alexander Kotov
While big data workflows have recently been proposed as the next-generation data-centric workflow paradigm for processing and analyzing data of ever-increasing scale, complexity, and rate of acquisition, a scalable distributed data model is still missing that abstracts and automates data distribution, parallelism, and scalable processing. Meanwhile, although NoSQL systems have emerged as a new category of data models, they are optimized for storing and querying large datasets, not for ad-hoc data analysis, where data placement and data movement are necessary for optimized workflow execution. In this paper, we propose a NoSQL data model that: 1) supports high-performance MapReduce-style workflows that automate data partitioning and data-parallel execution; in contrast to the traditional MapReduce framework, our MapReduce-style workflows are fully composable with other workflows, enabling dataflow applications with a richer structure, 2) automates virtual machine provisioning and deprovisioning on demand according to the sizes of the input datasets, and 3) enables a flexible framework for workflow executors that take advantage of the proposed NoSQL data model to improve the performance of workflow execution. Our case studies and experiments show the competitive advantages of the proposed data model, which is implemented in a new release of DATAVIEW, one of the most usable big data workflow systems in the community.
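A minimal sketch of a composable MapReduce-style stage over partitioned collections, in the spirit described above: because each stage returns another partitioned collection, stages chain into richer dataflows. The mapreduce helper and the word-count example are invented for illustration, not DATAVIEW's data model.

```python
# Invented mapreduce helper over partitioned collections; stages chain
# because each returns another (single-partition) collection.
from collections import defaultdict
from functools import reduce

def mapreduce(partitions, mapper, reducer):
    """Map over every record of every partition, group by key, reduce."""
    groups = defaultdict(list)
    for part in partitions:
        for record in part:
            for key, value in mapper(record):
                groups[key].append(value)
    return [[(k, reduce(reducer, vs)) for k, vs in groups.items()]]

docs = [["big data workflows", "big data"], ["data placement"]]

# Stage 1: word count across partitions.
counts = mapreduce(docs,
                   mapper=lambda line: [(w, 1) for w in line.split()],
                   reducer=lambda a, b: a + b)
# Stage 2 composes with stage 1: keep only frequent words.
frequent = mapreduce(counts,
                     mapper=lambda kv: [kv] if kv[1] > 1 else [],
                     reducer=lambda a, b: a + b)
print(frequent[0])  # -> [('big', 2), ('data', 3)]
```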
European Conference on Information Retrieval | 2016
Mehedi Hasan; Alexander Kotov; Aravind Mohan; Shiyong Lu; Paul M. Stieg
Consumer reviews provide a wealth of information about products and services that, if properly identified and extracted, could be of immense value to businesses. While classification of reviews according to sentiment polarity has been extensively studied in previous work, more focused types of review analysis are needed to assist companies in making business decisions. In this work, we introduce a novel text classification problem: separating post-purchase from pre-purchase review fragments, which can facilitate the identification of immediately actionable insights based on feedback from customers who actually purchased and own a product. To address this problem, we propose features based on dictionaries and part-of-speech (POS) tags. Experimental results on a publicly available gold standard indicate that the proposed features achieve nearly 75% accuracy for this problem and improve classifier performance relative to using only lexical features.
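As a rough illustration of dictionary-based features for this task, the sketch below counts cue words and a crude past-tense proxy and applies a hand-set threshold; the word lists, weights, and threshold are invented and are not the paper's actual feature set or classifier.

```python
# Invented cue-word features and threshold; a crude stand-in for the
# dictionary- and POS-based features described above.
POST_CUES = {"bought", "owned", "returned", "using", "installed",
             "arrived", "broke"}
PRE_CUES = {"considering", "wondering", "plan", "hoping", "deciding"}

def features(fragment):
    tokens = fragment.lower().split()
    return {
        "post_cues": sum(t in POST_CUES for t in tokens),
        "pre_cues": sum(t in PRE_CUES for t in tokens),
        "past_tense_ish": sum(t.endswith("ed") for t in tokens),  # crude POS proxy
    }

def classify(fragment):
    f = features(fragment)
    score = f["post_cues"] + 0.5 * f["past_tense_ish"] - f["pre_cues"]
    return "post-purchase" if score > 0 else "pre-purchase"

print(classify("I bought this phone last month and it already broke"))
print(classify("I am considering this model and wondering about battery life"))
```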
International Conference on Big Data | 2016
Aravind Mohan; Mahdi Ebrahimi; Shiyong Lu; Alexander Kotov
Big data is fast becoming a ubiquitous term in both academia and industry, and there is a strong need for new data-centric workflow tools and techniques to process and analyze large-scale, complex datasets that are growing exponentially. At the same time, the unbounded resource-leasing capability of the cloud enables data scientists to wring actionable insights from data in a time- and cost-efficient manner. In a data-centric workflow environment, the scheduling of data processing tasks onto appropriate resources is often driven by user-provided constraints, and enforcing a constraint while executing a workflow in the cloud adds a new optimization challenge: how to meet the objective while satisfying the given constraint. In this paper, we propose a new Big dAta woRkflow schEduler uNder budgeT constraint, known as BARENTS, which supports high-performance workflow scheduling in a heterogeneous cloud computing environment with a single objective: to minimize the workflow makespan under a given budget constraint. Our case study and experiments show the competitive advantages of the proposed scheduler, which is implemented in a new release of DATAVIEW, one of the most usable big data workflow systems in the community.
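The following toy heuristic illustrates the shape of budget-constrained scheduling: start every task on the cheapest VM type, then greedily upgrade wherever an extra dollar buys the largest makespan reduction, stopping at the budget. The two VM types, prices, runtimes, and the serial makespan model are illustrative assumptions, not the BARENTS algorithm.

```python
# Toy budget-constrained scheduler: upgrade tasks greedily by
# hours-saved-per-extra-dollar until the budget is exhausted.
tasks = {  # task -> (hours on small VM, hours on large VM)
    "t1": (4.0, 1.5),
    "t2": (3.0, 2.0),
    "t3": (5.0, 2.5),
}
PRICE = {"small": 0.10, "large": 0.40}  # $ per VM-hour
BUDGET = 1.50

def hours(task, vm_type):
    return tasks[task][0 if vm_type == "small" else 1]

def cost(plan):
    return sum(PRICE[plan[t]] * hours(t, plan[t]) for t in tasks)

def makespan(plan):  # serial execution, for simplicity
    return sum(hours(t, plan[t]) for t in tasks)

plan = {t: "small" for t in tasks}
while True:
    def gain(t):  # hours saved per extra dollar when t is upgraded
        extra = PRICE["large"] * hours(t, "large") - PRICE["small"] * hours(t, "small")
        return (hours(t, "small") - hours(t, "large")) / extra
    upgradable = [t for t in tasks if plan[t] == "small"]
    best = max(upgradable, key=gain, default=None)
    if best is None:
        break
    trial = {**plan, best: "large"}
    if cost(trial) > BUDGET:
        break
    plan = trial

print(plan, f"makespan={makespan(plan)}h cost=${cost(plan):.2f}")
```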
IEEE International Conference on Services Computing | 2015
Aravind Mohan; Mahdi Ebrahimi; Shiyong Lu
IEEE International Conference on Services Computing | 2015
Aravind Mohan; Shiyong Lu; Ke Zhang