Mohak Shah
Bosch
Publication
Featured research published by Mohak Shah.
international conference on big data | 2016
Juergen Heit; Jiayi Liu; Mohak Shah
Statistical models are commonly fit to bulk datasets and then applied in quasi real-time to previously unseen data. Challenges lie not only in fitting these models to data, but also in keeping track of their development and deployment process. It is common practice to re-engineer data pre-processing functions that were created during model development in order to build a version for deployment that works on streams of data. This approach is error-prone and inefficient. In this paper, we present our Model Deployment and Execution Framework (MDEF) to tackle these challenges in response to the volume, velocity, and variety of big data.
international conference on big data | 2015
Sauptik Dhar; Congrui Yi; Naveen Ramakrishnan; Mohak Shah
Most machine learning algorithms involve solving a convex optimization problem. Traditional in-memory convex optimization solvers do not scale well as the data grows. This paper identifies a generic convex problem for most machine learning algorithms and solves it using the Alternating Direction Method of Multipliers (ADMM). Finally, such an ADMM problem transforms into an iterative system of linear equations, which can be easily solved at scale in a distributed fashion. We implement this framework in Apache Spark and compare it with the widely used Machine Learning Library (MLlib) in Apache Spark 1.3.
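To make the idea of ADMM reducing to repeated small linear solves concrete, here is a minimal sketch of consensus ADMM on a toy scalar least-squares problem. It is not the formulation or code from the paper: the function name `consensus_admm`, the data layout, and the choice of a one-parameter model are all assumptions made for illustration, and plain Python lists stand in for Spark partitions.

```python
# Toy consensus ADMM for scalar least squares:
#   minimize over x:  sum over workers i of 0.5 * sum_j (a_ij * x - b_ij)^2
# Illustrative assumption only; not the paper's generic convex problem or its Spark code.

def consensus_admm(partitions, rho=1.0, iterations=500):
    """partitions: list of lists of (a, b) pairs, one inner list per worker."""
    n = len(partitions)
    # Per-worker sufficient statistics, computed once from local data.
    s = [sum(a * a for a, _ in part) for part in partitions]
    c = [sum(a * b for a, b in part) for part in partitions]
    x = [0.0] * n  # local estimates
    u = [0.0] * n  # scaled dual variables
    z = 0.0        # shared consensus variable
    for _ in range(iterations):
        # Local update: each worker solves a small (here 1x1) linear system.
        x = [(c[i] + rho * (z - u[i])) / (s[i] + rho) for i in range(n)]
        # Global update: average the local estimates plus duals.
        z = sum(x[i] + u[i] for i in range(n)) / n
        # Dual update: accumulate each worker's disagreement with the consensus.
        u = [u[i] + x[i] - z for i in range(n)]
    return z


# Two hypothetical workers, each holding part of the data.
parts = [[(1.0, 2.0), (2.0, 4.1)], [(3.0, 5.9), (4.0, 8.2)]]
estimate = consensus_admm(parts)
```

On this toy problem the consensus value can be checked against the closed-form least-squares solution, sum(a*b) / sum(a*a), taken over all pairs.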
Archive | 2015
Nathalie Japkowicz; Mohak Shah
Performance evaluation is an important aspect of the machine learning process. However, it is a complex task. It therefore needs to be conducted carefully for the application of machine learning to radiation oncology or other domains to be reliable. This chapter introduces the issue and discusses some of the most commonly used techniques that have been applied to it. The focus is on the three main subtasks of evaluation: measuring performance, resampling the data, and assessing the statistical significance of the results. In the context of the first subtask, the chapter discusses some of the confusion matrix-based measures (accuracy, precision, recall or sensitivity, and false alarm rate) as well as receiver operating characteristic (ROC) analysis. In the context of the second subtask, it discusses several error estimation or resampling techniques belonging to the cross-validation family, as well as bootstrapping. Finally, a number of nonparametric statistical tests, including McNemar’s test, Wilcoxon’s signed-rank test, and Friedman’s test, are covered in the context of the third subtask. The chapter concludes with a discussion of the limitations of the evaluation process.
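As a small illustration of the first and third subtasks, the sketch below computes the confusion matrix-based measures named above for binary labels, plus the McNemar statistic for comparing two classifiers on the same test set. The function names are hypothetical, labels are assumed to be 0/1 with 1 as the positive class, and the McNemar statistic is the version without continuity correction.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, with 1 as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def evaluation_measures(y_true, y_pred):
    """Accuracy, precision, recall (sensitivity), and false alarm rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "false_alarm_rate": ratio(fp, fp + tn),
    }


def mcnemar_statistic(y_true, pred_a, pred_b):
    """McNemar chi-square statistic (no continuity correction), 1 degree of freedom."""
    # b: A correct and B wrong; c: A wrong and B correct.
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    if b + c == 0:
        return 0.0  # the classifiers never disagree in correctness
    return (b - c) ** 2 / (b + c)
```

The statistic is compared against a chi-square distribution with one degree of freedom; with very few disagreements between the classifiers, an exact binomial test is the usual alternative.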
international conference on systems for energy efficient built environments | 2017
Seyed Hamid Mirebrahim; Mohammad Shokoohi-Yekta; Unmesh Kurup; Torsten Welfonder; Mohak Shah
We describe a data mining approach to discover possible explanations for long-term energy consumption patterns in commercial and residential buildings. Our approach uses clustering to identify interesting patterns in energy data and correlates these patterns with other sensor information. These correlations, written in the form of rules, provide potential explanations for the patterns. Our approach differs from existing approaches in a number of ways. First, we apply these techniques to produce explanatory rules for long-term energy usage in large datasets. Second, we use clustering to find interesting patterns and provide explanatory rules about these patterns by applying rule mining to a dataset made up of secondary information (including temporal ranges and other building sensors) together with the cluster IDs. Finally, we include in our analysis the list of rules that are exclusive to each cluster. We show that our approach is capable of finding useful explanatory rules for a real dataset.
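The rule-mining step can be sketched as follows, assuming the cluster IDs have already been assigned by some clustering algorithm. This is a simplified stand-in rather than the authors' method: it only considers rules with a single attribute on the left-hand side, and the function name, thresholds, and record layout are hypothetical.

```python
from collections import Counter


def mine_cluster_rules(records, min_support=0.2, min_confidence=0.8):
    """Find rules of the form (attribute = value) -> cluster_id.

    records: list of (attributes_dict, cluster_id) pairs, where the attributes
    are secondary information such as time ranges or other sensor readings.
    Returns a sorted list of (item, cluster_id, support, confidence) tuples.
    """
    n = len(records)
    antecedent = Counter()  # how often each attribute=value pair occurs
    joint = Counter()       # how often it occurs together with a cluster
    for attrs, cluster in records:
        for item in attrs.items():
            antecedent[item] += 1
            joint[(item, cluster)] += 1

    rules = []
    for (item, cluster), count in joint.items():
        support = count / n
        confidence = count / antecedent[item]
        if support >= min_support and confidence >= min_confidence:
            rules.append((item, cluster, support, confidence))
    return sorted(rules)


# Hypothetical records: secondary attributes plus a precomputed cluster ID.
records = [
    ({"day": "weekend", "hvac": "off"}, 0),
    ({"day": "weekend", "hvac": "off"}, 0),
    ({"day": "weekday", "hvac": "on"}, 1),
    ({"day": "weekday", "hvac": "on"}, 1),
    ({"day": "weekday", "hvac": "off"}, 0),
]
rules = mine_cluster_rules(records)
```

A rule with confidence 1.0 has a left-hand side that occurs in only one cluster, which is one simple way to read "exclusive to a cluster" in this toy setting.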
international conference on acoustics, speech, and signal processing | 2017
Shengdong Zhang; Soheil Bahrampour; Naveen Ramakrishnan; Lukas Schott; Mohak Shah
In this paper, we consider the problem of event prediction with multivariate time series data consisting of heterogeneous (continuous and categorical) variables. The complex dependencies between the variables, combined with the asynchronicity and sparsity of the data, make the event prediction problem particularly challenging. Most state-of-the-art approaches address this either by designing hand-engineered features or by breaking up the problem over homogeneous variates. In this work, we formulate the (rare) event prediction task as a classification problem with a novel asymmetric loss function and propose an end-to-end deep learning algorithm over symbolic representations of time series. Symbolic representations are fed into an embedding layer and a Long Short-Term Memory (LSTM) neural network layer, which are trained to learn discriminative features. We also propose a simple sequence chopping technique to speed up the training of the LSTM for long temporal sequences. Experiments on real-world industrial datasets demonstrate the effectiveness of the proposed approach.
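The abstract does not specify the asymmetric loss or the chopping scheme, so the following is only a sketch of the general ideas under stated assumptions: a generic class-weighted log loss stands in for the paper's loss, and fixed-length, non-overlapping chunks stand in for its sequence chopping. All names and default weights are hypothetical.

```python
import math


def chop_sequence(tokens, chunk_len):
    """Split a long symbolic sequence into consecutive chunks of at most chunk_len.

    Assumption: non-overlapping, fixed-length chunks; the paper may chop differently.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    return [tokens[i:i + chunk_len] for i in range(0, len(tokens), chunk_len)]


def weighted_log_loss(y_true, p_event, miss_weight=5.0, false_alarm_weight=1.0,
                      eps=1e-12):
    """Class-weighted binary cross-entropy (a stand-in, not the paper's loss).

    Missing a rare event (label 1 with low predicted probability) is penalized
    miss_weight times as heavily as an equally confident false alarm.
    """
    total = 0.0
    for y, p in zip(y_true, p_event):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total -= miss_weight * math.log(p)
        else:
            total -= false_alarm_weight * math.log(1.0 - p)
    return total / len(y_true)


# Hypothetical symbolic sequence chopped into chunks of length 3.
chunks = chop_sequence(list("abcdefg"), 3)
```

Shorter chunks reduce the number of time steps the LSTM must backpropagate through per training example, at the cost of dependencies that span chunk boundaries.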
machine learning and data mining in pattern recognition | 2016
Mark Grechanik; Nitin Prabhu; Daniel Graham; Denys Poshyvanyk; Mohak Shah
Predicting the level of maturity (LoM) of a software project is important for multiple reasons, including planning resource allocation, evaluating cost, and suggesting delivery dates for software applications. It is not clear how well LoM can actually be predicted: the mixed results reported so far are based on studies of small numbers of subject software applications and internal software metrics. Thus, a fundamental question of software engineering is whether LoM can be accurately predicted using internal software metrics alone.
arXiv: Learning | 2015
Soheil Bahrampour; Naveen Ramakrishnan; Lukas Schott; Mohak Shah
arXiv: Learning | 2016
Soheil Bahrampour; Naveen Ramakrishnan; Lukas Schott; Mohak Shah
arXiv: Learning | 2016
Sauptik Dhar; Naveen Ramakrishnan; Vladimir Cherkassky; Mohak Shah
arXiv: Learning | 2018
Jiayi Liu; Samarth Tripathi; Unmesh Kurup; Mohak Shah