Mohak Shah | Researchain

Publication

Featured research published by Mohak Shah.

international conference on big data | 2016

An architecture for the deployment of statistical models for the big data era

Juergen Heit; Jiayi Liu; Mohak Shah

Statistical models are commonly fit to bulk datasets, and they are applied in quasi real-time to previously unseen data. Challenges lie not only in fitting these models to data, but also in keeping track of their development and deployment process. It is common practice to re-engineer data pre-processing functions that were created during model development in order to build a version for deployment that works on streams of data. This approach is error-prone and inefficient. In this paper, we present our Model Deployment and Execution Framework (MDEF), to tackle these challenges in response to the volume, velocity, and variety of big data.
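The re-engineering problem described in this abstract can be illustrated with a minimal sketch in which one per-record preprocessing function serves both the bulk (training) path and the streaming (deployment) path. This is not MDEF's API, which the abstract does not describe; the function names and the record fields `temp_f` and `weekday` are hypothetical.

```python
def preprocess(record):
    """Single-record transform shared by model development and deployment.

    The field names are hypothetical and chosen only for illustration.
    """
    return {
        "temp_c": (record["temp_f"] - 32.0) * 5.0 / 9.0,
        "is_weekend": record["weekday"] >= 5,
    }


def preprocess_batch(records):
    """Bulk path: apply the shared transform to a whole dataset."""
    return [preprocess(r) for r in records]


def preprocess_stream(stream):
    """Streaming path: apply the same transform lazily, one record at a time."""
    for record in stream:
        yield preprocess(record)
```

Because both paths call the same `preprocess` function, there is no second implementation that can drift from the one used during model development.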

international conference on big data | 2015

ADMM based scalable machine learning on Spark

Sauptik Dhar; Congrui Yi; Naveen Ramakrishnan; Mohak Shah

Most machine learning algorithms involve solving a convex optimization problem. Traditional in-memory convex optimization solvers do not scale well as the volume of data grows. This paper identifies a generic convex problem underlying most machine learning algorithms and solves it using the Alternating Direction Method of Multipliers (ADMM). The resulting ADMM problem reduces to an iterative system of linear equations, which can be easily solved at scale in a distributed fashion. We implement this framework in Apache Spark and compare it with the widely used Machine Learning Library (MLlib) in Apache Spark 1.3.
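To make the "iterative system of linear equations" concrete, here is a minimal sketch of the standard textbook ADMM iteration for the lasso, written with NumPy rather than Spark. It is not the paper's formulation or code. The function names, the choice of the lasso as the example problem, and the list-of-partitions input are assumptions made for illustration; the per-partition sums stand in for what a distributed reduce step would compute.

```python
import numpy as np


def soft_threshold(v, kappa):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def admm_lasso(partitions, lam, rho=1.0, iters=200):
    """Minimize 0.5 * ||Ax - b||^2 + lam * ||x||_1 with ADMM.

    `partitions` is a list of (A_i, b_i) row blocks. Only the small
    matrices A_i^T A_i and vectors A_i^T b_i are summed across blocks,
    which is the part a cluster would compute in parallel.
    """
    n_features = partitions[0][0].shape[1]
    gram = sum(A.T @ A for A, _ in partitions)
    atb = sum(A.T @ b for A, b in partitions)
    lhs = gram + rho * np.eye(n_features)

    x = np.zeros(n_features)
    z = np.zeros(n_features)
    u = np.zeros(n_features)
    for _ in range(iters):
        # x-update: a linear system with a fixed left-hand side.
        x = np.linalg.solve(lhs, atb + rho * (z - u))
        # z-update: soft-thresholding enforces sparsity.
        z = soft_threshold(x + u, lam / rho)
        # Scaled dual update.
        u = u + x - z
    return z
```

With an identity design matrix the lasso solution is simply the soft-thresholded target vector, which gives a quick sanity check for the iteration.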

Archive | 2015

Performance Evaluation in Machine Learning

Nathalie Japkowicz; Mohak Shah

Performance evaluation is an important aspect of the machine learning process. However, it is a complex task that must be conducted carefully if the application of machine learning to radiation oncology or other domains is to be reliable. This chapter introduces the issue and discusses some of the most commonly used techniques that have been applied to it. The focus is on the three main subtasks of evaluation: measuring performance, resampling the data, and assessing the statistical significance of the results. In the context of the first subtask, the chapter discusses some of the confusion matrix-based measures (accuracy, precision, recall or sensitivity, and false alarm rate) as well as receiver operating characteristic (ROC) analysis. In the context of the second subtask, it covers several error estimation or resampling techniques belonging to the cross-validation family, as well as bootstrapping. Finally, a number of nonparametric statistical tests, including McNemar’s test, Wilcoxon’s signed-rank test, and Friedman’s test, are covered in the context of the third subtask. The chapter concludes with a discussion of the limitations of the evaluation process.
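As an illustration of the first and third subtasks, the sketch below computes the confusion matrix-based measures named in the abstract and a McNemar test with the usual continuity correction. It uses only the Python standard library, and the function names are hypothetical. The p-value relies on the fact that the survival function of a chi-square variable with one degree of freedom equals erfc(sqrt(x / 2)).

```python
import math


def confusion_measures(y_true, y_pred):
    """Accuracy, precision, recall, and false alarm rate for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


def mcnemar_test(only_a_correct, only_b_correct):
    """McNemar's test with continuity correction.

    Arguments are the counts of test examples that exactly one of the
    two classifiers labels correctly. Returns (statistic, p_value).
    """
    n = only_a_correct + only_b_correct
    if n == 0:
        return 0.0, 1.0
    stat = (abs(only_a_correct - only_b_correct) - 1) ** 2 / n
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

The continuity-corrected statistic is only an approximation when the number of disagreements is small; an exact binomial test is the usual alternative in that case.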

international conference on systems for energy efficient built environments | 2017

A clustering-based rule-mining approach for monitoring long-term energy use and understanding system behavior

Seyed Hamid Mirebrahim; Mohammad Shokoohi-Yekta; Unmesh Kurup; Torsten Welfonder; Mohak Shah

We describe a data mining approach to discover possible explanations for long-term energy consumption patterns in commercial and residential buildings. Our approach uses clustering to identify interesting patterns in energy data and correlates these patterns with other sensor information. These correlations, written in the form of rules, provide potential explanations for the patterns. Our approach differs from existing approaches in a number of ways. First, we apply these techniques to produce explanatory rules for long-term energy usage in large datasets. Second, we use clustering to find interesting patterns and provide explanatory rules about these patterns by applying rule mining to a dataset made up of secondary information (including temporal ranges and other building sensors) together with the cluster IDs. Finally, we include in our analysis the list of rules that are exclusive to each cluster. We show that our approach is capable of finding useful explanatory rules for a real dataset.
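The rule-mining step can be sketched as follows, assuming the cluster IDs have already been produced by some clustering method. This is a simplified stand-in, not the authors' algorithm: it only considers rules with a single antecedent of the form "attribute = value implies cluster k" and filters them by support and confidence. The function name, thresholds, and attribute names are hypothetical.

```python
from collections import Counter


def mine_rules(records, cluster_ids, min_support=0.2, min_confidence=0.8):
    """Find single-antecedent rules (attribute, value) -> cluster.

    `records` holds the secondary information for each time window as a
    dict, and `cluster_ids` holds the cluster assigned to that window.
    Returns sorted tuples (antecedent, cluster, support, confidence).
    """
    n = len(records)
    antecedent_counts = Counter()
    joint_counts = Counter()
    for record, cluster in zip(records, cluster_ids):
        for item in record.items():
            antecedent_counts[item] += 1
            joint_counts[(item, cluster)] += 1

    rules = []
    for (item, cluster), joint in joint_counts.items():
        support = joint / n
        confidence = joint / antecedent_counts[item]
        if support >= min_support and confidence >= min_confidence:
            rules.append((item, cluster, support, confidence))
    return sorted(rules)
```

A rule with confidence 1.0 has an antecedent that occurs in only one cluster, which is one simple way to read the abstract's notion of rules exclusive to a cluster.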

international conference on acoustics, speech, and signal processing | 2017

Deep learning on symbolic representations for large-scale heterogeneous time-series event prediction

Shengdong Zhang; Soheil Bahrampour; Naveen Ramakrishnan; Lukas Schott; Mohak Shah

In this paper, we consider the problem of event prediction with multivariate time series data consisting of heterogeneous (continuous and categorical) variables. The complex dependencies between the variables, combined with the asynchronicity and sparsity of the data, make the event prediction problem particularly challenging. Most state-of-the-art approaches address this either by designing hand-engineered features or by breaking up the problem over homogeneous variates. In this work, we formulate the (rare) event prediction task as a classification problem with a novel asymmetric loss function and propose an end-to-end deep learning algorithm over symbolic representations of time series. Symbolic representations are fed into an embedding layer and a Long Short-Term Memory (LSTM) neural network layer, which are trained to learn discriminative features. We also propose a simple sequence chopping technique to speed up the training of the LSTM for long temporal sequences. Experiments on real-world industrial datasets demonstrate the effectiveness of the proposed approach.
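The abstract does not specify the symbolization scheme, the chopping rule, or the form of the asymmetric loss, so the sketch below shows one plausible reading of each, using only the standard library. Binning continuous values by fixed breakpoints, splitting sequences into fixed-length chunks, and weighting missed events more heavily in a log loss are all assumptions, and every name here is hypothetical.

```python
import math
from bisect import bisect_right


def symbolize(values, breakpoints, prefix):
    """Map continuous values to discrete symbols by binning on sorted breakpoints."""
    return [f"{prefix}{bisect_right(breakpoints, v)}" for v in values]


def chop(sequence, chunk_len):
    """Split a long sequence into consecutive chunks of at most chunk_len items."""
    return [sequence[i:i + chunk_len] for i in range(0, len(sequence), chunk_len)]


def asymmetric_log_loss(y_true, p_event, miss_weight=5.0):
    """Log loss that penalizes a missed event miss_weight times more than a false alarm.

    This weighted cross-entropy is a stand-in, not the paper's loss.
    """
    eps = 1e-12
    p = min(max(p_event, eps), 1.0 - eps)
    if y_true == 1:
        return -miss_weight * math.log(p)
    return -math.log(1.0 - p)
```

Symbols produced this way can be indexed into a vocabulary and passed to an embedding layer, and shorter chunks reduce the number of time steps the LSTM must be unrolled over during training.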

machine learning and data mining in pattern recognition | 2016

Can Software Project Maturity Be Accurately Predicted Using Internal Source Code Metrics?

Mark Grechanik; Nitin Prabhu; Daniel Graham; Denys Poshyvanyk; Mohak Shah

Predicting the level of maturity (LoM) of a software project is important for multiple reasons, including planning resource allocation, evaluating cost, and suggesting delivery dates for software applications. It is not clear how well LoM can actually be predicted: the mixed results reported so far are based on studies of small numbers of subject software applications and internal software metrics. Thus, a fundamental question of software engineering is whether LoM can be accurately predicted using internal software metrics alone.

arXiv: Learning | 2015