Varun Mithal
University of Minnesota
Publication
Featured research published by Varun Mithal.
international conference on data mining | 2008
Varun Chandola; Varun Mithal; Vipin Kumar
We present a comparative evaluation of a large number of anomaly detection techniques on a variety of publicly available as well as artificially generated data sets. Many of these are existing techniques while some are slight variants and/or adaptations of traditional anomaly detection techniques to sequence data.
ACM Transactions on Intelligent Systems and Technology | 2011
Varun Mithal; Ashish Garg; Shyam Boriah; Michael Steinbach; Vipin Kumar; Christopher Potter; Steven A. Klooster; Juan Carlos Castilla-Rubio
Forests are a critical component of the planet's ecosystem. Unfortunately, there has been significant degradation in forest cover over recent decades as a result of logging, conversion to crop, plantation, and pasture land, or disasters (natural or man-made) such as forest fires, floods, and hurricanes. As a result, significant attention is being given to the sustainable use of forests. A key to effective forest management is quantifiable knowledge about changes in forest cover. This requires identification and characterization of changes and the discovery of the relationship between these changes and natural and anthropogenic variables. In this article, we present our preliminary efforts and achievements in addressing some of these tasks along with the challenges and opportunities that need to be addressed in the future. At a higher level, our goal is to provide an overview of the exciting opportunities and challenges in developing and applying data mining approaches to provide critical information for forest and land use management.
conference on intelligent data understanding | 2012
James H. Faghmous; Luke Styles; Varun Mithal; Shyam Boriah; Stefan Liess; Vipin Kumar; Frode Vikebø; Michel D. S. Mesquita
Rotating coherent structures of water known as ocean eddies are the oceanic analog of storms in the atmosphere and a crucial component of ocean dynamics. In addition to dominating the ocean's kinetic energy, eddies play a significant role in the transport of water, salt, heat, and nutrients. Therefore, understanding current and future eddy activity is a central challenge in addressing the future sustainability of marine ecosystems. The emergence of sea surface height observations from satellite radar altimeters has recently enabled researchers to track eddies at a global scale. The majority of studies that identify eddies from observational data employ highly parametrized connected component algorithms using expert-filtered data, effectively making reproducibility and scalability challenging. In this paper, we improve upon the state-of-the-art connected component eddy monitoring algorithms to track eddies globally. This work makes three main contributions. First, we do not pre-process the data, thereby minimizing the risk of wiping out important signals within the data. Second, we employ a physically consistent convexity requirement on eddies, based on theoretical and empirical studies, which improves the accuracy of our method and reduces its computational complexity from quadratic to linear time in the size of each eddy. Finally, we accurately separate eddies that are in close spatial proximity, something existing methods cannot accomplish. We compare our results to those of the state of the art and discuss the impact of our improvements on the difference in results.
Computational Sustainability | 2016
Ankush Khandelwal; Xi Chen; Varun Mithal; James H. Faghmous; Vipin Kumar
Inland water is an important natural resource that is critical for sustaining marine and terrestrial ecosystems as well as supporting a variety of human needs. Monitoring the dynamics of inland water bodies at a global scale is important for: (a) devising effective water management strategies, (b) assessing the impact of human actions on water security, (c) understanding the interplay between the spatio-temporal dynamics of surface water and climate change, and (d) near-real-time mitigation and management of disaster events such as floods. Remote sensing datasets provide opportunities for global-scale monitoring of the extent or surface area of inland water bodies over time. We present a survey of existing remote sensing-based approaches for monitoring the extent of inland water bodies and discuss their strengths and limitations. We further present an outline of the major challenges that need to be addressed for monitoring the extent and dynamics of water bodies at a global scale. Potential opportunities for overcoming some of these challenges are discussed using illustrative examples, laying the foundations for promising directions of future research in global monitoring of water dynamics.
conference on intelligent data understanding | 2012
Xi C. Chen; Yashu Chamber; Varun Mithal; Michael Lau; Karsten Steinhaeuser; Shyam Boriah; Michael Steinbach; Vipin Kumar; Christopher Potter; Steven A. Klooster; Teji Abraham; J.D. Stanley; Juan Carlos Castilla-Rubio
Forests are an important natural resource that supports economic activity and plays a significant role in regulating the climate and the carbon cycle, yet forest ecosystems are increasingly threatened by fires caused by a range of natural and anthropogenic factors. Mapping these fires, which can range in size from less than an acre to hundreds of thousands of acres, is an important task for supporting climate and carbon cycle studies as well as informing forest management. Currently, there are two primary approaches to fire mapping: field- and aerial-based surveys, which are costly and limited in their extent; and remote sensing-based approaches, which are more cost-effective but pose several interesting methodological and algorithmic challenges. In this paper, we introduce a new framework for mapping forest fires based on satellite observations. Specifically, we develop unsupervised spatio-temporal data mining methods for Moderate Resolution Imaging Spectroradiometer (MODIS) data to generate a history of forest fires. A systematic comparison with alternate approaches in two diverse geographic regions demonstrates that our algorithmic paradigm is able to overcome some of the limitations in both data and methods employed by prior efforts.
international geoscience and remote sensing symposium | 2011
Ashish Garg; Varun Mithal; Yashu Chamber; Ivan Brugere; Vijay Chaudhari; Marc Dunham; Vikrant Krishna; Sairam Krishnamurthy; Sruthi Vangala; Shyam Boriah; Michael Steinbach; Vipin Kumar; Albert Cho; J.D. Stanley; Teji Abraham; Juan Carlos Castilla-Rubio; Christopher Potter; Steven A. Klooster
Land cover change, especially deforestation, is a priority issue for policymakers at the local, national and international scale. Deforestation's contribution of up to 20% of global greenhouse gas emissions is already well known; the loss of biodiversity from land conversion is also well established [7, 13]. Policymakers at the UN Framework Convention on Climate Change negotiations are addressing land use change by developing a framework for Reducing Emissions from Deforestation and Degradation (REDD). In parallel, a number of technical groups have been working to identify strategies for reliably monitoring, reporting and verifying land use change and emissions [3, 2].
conference on intelligent data understanding | 2012
Varun Mithal; Zachary O'Connor; Karsten Steinhaeuser; Shyam Boriah; Vipin Kumar; Christopher Potter; Steven A. Klooster
Segmentation of a time series attempts to divide it into homogeneous subsequences, such that each segment is different from the others. A typical segmentation framework involves selecting a model that is used to represent each segment. In this paper, we investigate segmentation scores based on the difference between models and propose two approaches for normalizing the difference-based score. The first approach uses permutation testing to assign a p-value to the model difference. The second approach builds on the bootstrapping methodology used in statistics, which estimates the null distribution of complex statistics whose standard errors are not analytically derivable by generating alternative versions of the data through a resampling strategy. More specifically, given a time series with either one or two segments, we propose a method to estimate the distribution of the model difference statistic for each segment. The proposed approach allows normalizing the model difference statistic when complex models are used in the segmentation algorithm. We study the strengths and weaknesses of the two normalizing approaches in the context of characteristics of land cover data, such as seasonality and noise, using synthetic and real data sets. We show that the relative performance of the normalization approaches can vary significantly depending on the characteristics of the data. We illustrate the utility of these approaches for detection of deforestation in Mato Grosso (Brazil).
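As a rough illustration of the first normalization idea in the abstract above (assigning a p-value to a model difference by permutation testing), the following minimal sketch may help. It is not the authors' implementation: the segment "model" is assumed to be a plain mean, the function names `model_difference` and `permutation_p_value` are hypothetical, and the toy series are made up for the example.

```python
import random
from statistics import fmean


def model_difference(series, split):
    """Absolute difference between the two segment means.

    Assumption: a mean is used as a stand-in for the segment model.
    """
    return abs(fmean(series[:split]) - fmean(series[split:]))


def permutation_p_value(series, split, n_permutations=2000, seed=0):
    """Estimate a p-value for the observed model difference at `split`.

    Values are shuffled across the whole series, which destroys any real
    segment boundary; the statistic is then recomputed at the same split.
    Note that plain shuffling also destroys seasonality and autocorrelation.
    """
    if not 0 < split < len(series):
        raise ValueError("split must leave both segments non-empty")
    rng = random.Random(seed)
    observed = model_difference(series, split)
    shuffled = list(series)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if model_difference(shuffled, split) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Toy data (made up): a level shift halfway through versus no shift at all.
shifted = [0.8, 0.9, 0.85, 0.9, 0.8, 0.88, 0.3, 0.25, 0.35, 0.3, 0.28, 0.32]
flat = [0.8, 0.9, 0.85, 0.9, 0.8, 0.88, 0.86, 0.82, 0.9, 0.84, 0.87, 0.81]
p_shifted = permutation_p_value(shifted, split=6)
p_flat = permutation_p_value(flat, split=6)
```

In this sketch a small p-value for the shifted series and a large one for the flat series put the raw difference scores on a common scale, which is the sense of "normalizing" assumed here.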
international conference on data mining | 2015
Ankush Khandelwal; Varun Mithal; Vipin Kumar
Classification of instances into different categories in various real-world applications suffers from inaccuracies due to a lack of representative training data, limitations of classification models, and noise and outliers in the input data. In this paper, we propose a new post-classification label refinement method for scenarios where data instances have an inherent ordering among them that can be leveraged to correct inconsistencies in class labels. We show that by using the ordering constraint, more robust algorithms can be developed than traditional methods. Moreover, in most applications where this ordering among instances exists, it is not directly observed. The proposed approach simultaneously estimates the latent ordering among instances and corrects the class labels. We demonstrate the utility of the approach for the application of monitoring the dynamics of lakes and reservoirs. The proposed approach has been evaluated on synthetic datasets with different noise structures and noise levels.
Managing and Mining Sensor Data | 2013
James H. Faghmous; Jaya Kawale; Luke Styles; Mace Blank; Varun Mithal; Xi C. Chen; Ankush Khandelwal; Shyam Boriah; Karsten Steinhaeuser; Michael Steinbach; Vipin Kumar; Stefan Liess
Advances in earth observation technologies have led to the acquisition of vast volumes of accurate, timely and reliable environmental data which encompass a multitude of information about the land, ocean and atmosphere of the planet. Earth science sensor datasets capture multiple facets of information about natural processes and human activities that shape the physical landscape and environmental quality of our planet, and thus offer an opportunity to monitor and understand the diverse phenomena affecting earth’s complex system. The monitoring, analysis and understanding of these rich sensor datasets is thus of prime importance for the efficient planning and management of critical resources, since the societal costs of mitigation or adaptation decisions for natural or human-induced adverse events are significant. Hence, a thorough understanding of earth science sensor datasets has a direct impact on a range of societally relevant issues. Moreover, earth science sensor datasets possess unique domain-specific properties that distinguish them from sensor datasets used in other domains, and thus demand novel tools and techniques for their analysis that address their characteristic issues and challenges.
IEEE Transactions on Knowledge and Data Engineering | 2017
Varun Mithal; Guruprasad Nayak; Ankush Khandelwal; Vipin Kumar; Nikunj C. Oza; Ramakrishna R. Nemani
Many real-world problems involve learning models for rare classes in situations where there are no gold standard labels for training samples but imperfect labels are available for all instances. In this paper, we present RAPT, a three-step predictive modeling framework for classifying a rare class in such problem settings. The first step of the proposed framework learns a classifier that jointly optimizes precision and recall by using only imperfectly labeled training samples. We also show that, under certain assumptions on the imperfect labels, the quality of this classifier is almost as good as that of one constructed using perfect labels. The second and third steps of the framework make use of the fact that imperfect labels are available for all instances to further improve the precision and recall of the rare class. We evaluate the RAPT framework on two real-world applications: mapping forest fires and mapping urban extent from earth-observing satellite data. The experimental results indicate that RAPT can be used to identify forest fires and urban areas with high precision and recall by using imperfect labels, even though obtaining expert-annotated samples on a global scale is infeasible in these applications.