Sunil Gandhi
University of Maryland, Baltimore County
Publication
Featured research published by Sunil Gandhi.
European Conference on Machine Learning | 2014
Pavel Senin; Jessica Lin; Xing Wang; Tim Oates; Sunil Gandhi; Arnold P. Boedihardjo; Crystal Chen; Susan Frankenstein; Manfred Lerner
The problem of frequent and anomalous pattern discovery in time series has received a lot of attention in the past decade. Addressing a common limitation of existing techniques, which require the pattern length to be known in advance, we recently proposed grammar-based algorithms for efficient discovery of variable-length frequent and rare patterns. In this paper we present GrammarViz 2.0, an interactive tool that, based on our previous work, implements algorithms for grammar-driven mining and visualization of variable-length time series patterns.
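The symbolic discretization underlying this pipeline can be illustrated with a minimal SAX-style sketch (the breakpoint table and function below are illustrative, not the GrammarViz implementation):

```python
import numpy as np

# Equiprobable breakpoints for a standard normal (standard SAX tables)
BREAKPOINTS = {3: [-0.43, 0.43], 4: [-0.67, 0.0, 0.67], 5: [-0.84, -0.25, 0.25, 0.84]}

def sax(series, n_segments, alphabet_size):
    """Discretize a series into a SAX word (assumes len divisible by n_segments)."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()                      # z-normalize
    paa = x.reshape(n_segments, -1).mean(axis=1)      # Piecewise Aggregate Approximation
    cuts = BREAKPOINTS[alphabet_size]
    return "".join("abcde"[np.searchsorted(cuts, v)] for v in paa)

print(sax(range(16), n_segments=4, alphabet_size=4))  # -> "abcd" (rising ramp)
```

Grammar induction then runs over such symbolic words, so repeated substructure in the raw series becomes repeated symbols the grammar can compress.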
International Conference on Computational Linguistics | 2014
Abhay L. Kashyap; Lushan Han; Roberto Yus; Jennifer Sleeman; Taneeya W. Satyapanich; Sunil Gandhi; Tim Finin
We describe UMBC’s systems developed for the SemEval 2014 tasks on Multilingual Semantic Textual Similarity (Task 10) and Cross-Level Semantic Similarity (Task 3). Our best submission in the Multilingual task ranked second in both the English and Spanish subtasks using an unsupervised approach. Our best systems for the Cross-Level task ranked second in the Paragraph-Sentence subtask and first in both the Sentence-Phrase and Word-Sense subtasks. The system ranked first for the Phrase-Word subtask but was not included in the official results due to a late submission.
Language Resources and Evaluation | 2016
Abhay L. Kashyap; Lushan Han; Roberto Yus; Jennifer Sleeman; Taneeya W. Satyapanich; Sunil Gandhi; Tim Finin
Semantic textual similarity is a measure of the degree of semantic equivalence between two pieces of text. We describe the SemSim system and its performance in the *SEM 2013 and SemEval-2014 tasks on semantic textual similarity. At the core of our system lies a robust distributional word similarity component that combines latent semantic analysis and machine learning augmented with data from several linguistic resources. We used a simple term alignment algorithm to handle longer pieces of text. Additional wrappers and resources were used to handle task specific challenges that include processing Spanish text, comparing text sequences of different lengths, handling informal words and phrases, and matching words with sense definitions. In the *SEM 2013 task on Semantic Textual Similarity, our best performing system ranked first among the 89 submitted runs. In the SemEval-2014 task on Multilingual Semantic Textual Similarity, we ranked a close second in both the English and Spanish subtasks. In the SemEval-2014 task on Cross-Level Semantic Similarity, we ranked first in Sentence–Phrase, Phrase–Word, and Word–Sense subtasks and second in the Paragraph–Sentence subtask.
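The term alignment idea can be sketched as follows; the word vectors here are toy two-dimensional stand-ins, an assumption made for illustration in place of the LSA-based distributional model described in the paper:

```python
import numpy as np

# Toy word vectors (assumption: the paper's system uses an LSA-based model)
VEC = {
    "cat":  np.array([0.9, 0.1]),
    "dog":  np.array([0.8, 0.2]),
    "car":  np.array([0.1, 0.9]),
    "auto": np.array([0.15, 0.85]),
}

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def align_similarity(text_a, text_b):
    """Average best-match word similarity, aligning the shorter text to the longer."""
    a, b = sorted((text_a.split(), text_b.split()), key=len)
    scores = [max(cos(VEC[w], VEC[v]) for v in b) for w in a]
    return sum(scores) / len(scores)

print(round(align_similarity("cat car", "dog auto"), 3))  # high: both pairs align well
```

Each word in the shorter text contributes its single best match, so texts of different lengths remain comparable, one of the task-specific challenges the abstract mentions.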
Conference on Information and Knowledge Management | 2013
Tim Oates; Arnold P. Boedihardjo; Jessica Lin; Crystal Chen; Susan Frankenstein; Sunil Gandhi
Spatial trajectory analysis is crucial to uncovering insights into the motives and nature of human behavior. In this work, we study the problem of discovering motifs in trajectories based on symbolically transformed representations and context free grammars. We propose a fast and robust grammar induction algorithm called mSEQUITUR to infer a grammar rule set from a trajectory for motif generation. Second, we designed the Symbolic Trajectory Analysis and VIsualization System (STAVIS), the first of its kind trajectory analytical system that applies grammar inference to derive trajectory signatures and enable mining tasks on the signatures. Third, an empirical evaluation is performed to demonstrate the efficiency and effectiveness of mSEQUITUR for generating trajectory signatures and discovering motifs.
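A much-simplified flavor of grammar induction by digram replacement can be sketched as below. This is a batch, BPE-like simplification for illustration; the actual mSEQUITUR algorithm maintains digram uniqueness incrementally as symbols arrive:

```python
from collections import Counter

def induce_grammar(symbols, max_rules=10):
    """Repeatedly replace the most frequent adjacent pair with a new rule
    (a batch-style simplification of Sequitur's digram uniqueness rule)."""
    seq = list(symbols)
    rules = {}
    for i in range(max_rules):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:                      # no repeated digram left
            break
        name = f"R{i}"
        rules[name] = pair
        out, j = [], 0
        while j < len(seq):                # left-to-right, non-overlapping rewrite
            if j + 1 < len(seq) and (seq[j], seq[j + 1]) == pair:
                out.append(name)
                j += 2
            else:
                out.append(seq[j])
                j += 1
        seq = out
    return seq, rules

seq, rules = induce_grammar("abcabcabc")
print(seq, rules)  # the repeated "abc" motif is compressed into nested rules
```

Repeated sub-trajectories (after symbolic transformation) surface as grammar rules, which is what makes the rule set usable as a trajectory signature for motif discovery.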
International Symposium on Circuits and Systems | 2017
Ali Jafari; Sunil Gandhi; Sri Harsha Konuru; W. David Hairston; Tim Oates; Tinoosh Mohsenin
Electroencephalogram (EEG) data is used for a variety of purposes, including brain-computer interfaces, disease diagnosis, and determining cognitive states. Yet EEG signals are susceptible to noise from many sources, such as muscle and eye movements, and motion of electrodes and cables. Traditional approaches to this problem involve supervised training to identify signal components corresponding to noise so that they can be removed. However, these approaches are artifact specific. In this paper, we present a novel software-hardware system that uses a weak supervisory signal to indicate that some noise is occurring, but not what the source of the noise is or how it is manifested in the EEG signal. The EEG data is decomposed into independent components using ICA, and these components form bags that are labeled and classified by a multi-instance learning algorithm that can identify the noise components for removal to reconstruct a clean EEG signal. We also performed extensive hyperparameter optimization for the model with the goal of improving accuracy without increasing execution time. This reduced the execution time from 282 s to 8.8 s when running the model on an embedded ARM CPU at a 1.6 GHz clock frequency. In this paper, we present the overall system, which includes ICA, SAX, and MIL, along with preliminary results for software and hardware implementations using real EEG data from 64 electrodes. The proposed system consumes 909 mW of power during processing above a baseline of 2.32 W idle, while achieving 91.2% artifact identification accuracy.
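Once components have been flagged as noise, cleanup amounts to zeroing them and re-mixing. The sketch below uses a known synthetic mixing matrix in place of components estimated by ICA on real EEG, so the reconstruction step can be shown exactly:

```python
import numpy as np

# Synthetic example: 2 independent sources mixed into 3 "electrodes".
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 500)
brain = np.sin(2 * np.pi * 10 * t)                    # wanted signal
blink = (rng.random(500) > 0.98).astype(float)        # artifact-like spikes
S = np.stack([brain, blink])                          # components x time
A = np.array([[1.0, 0.5], [0.8, 1.2], [0.3, 2.0]])    # electrodes x components
X = A @ S                                             # observed multichannel signal

# Suppose a multi-instance classifier flagged component 1 as noise:
S_clean = S.copy()
S_clean[1] = 0.0                                      # drop the artifact component
X_clean = A @ S_clean                                 # reconstruct denoised channels
```

In the actual system the components and mixing matrix come from ICA, and the flagging comes from the weakly supervised multi-instance classifier rather than being hand-picked as here.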
ACM Transactions on Knowledge Discovery From Data | 2018
Pavel Senin; Jessica Lin; Xing Wang; Tim Oates; Sunil Gandhi; Arnold P. Boedihardjo; Crystal Chen; Susan Frankenstein
The problems of recurrent and anomalous pattern discovery in time series, e.g., motifs and discords, respectively, have received a lot of attention from researchers in the past decade. However, since the pattern search space is usually intractable, most existing detection algorithms require that the patterns have discriminative characteristics and that their length be known in advance and provided as input, which is an unreasonable requirement for many real-world problems. In addition, patterns of similar structure but of different lengths may co-exist in a time series. Addressing these issues, we have developed algorithms for variable-length time series pattern discovery that are based on symbolic discretization and grammar inference—two techniques whose combination enables the structured reduction of the search space and discovery of the candidate patterns in linear time. In this work, we present GrammarViz 3.0—a software package that provides implementations of the proposed algorithms and a graphical user interface for interactive variable-length time series pattern discovery. The current version of the software provides an alternative grammar inference algorithm that improves the time series motif discovery workflow, and introduces an experimental procedure for automated discretization parameter selection that builds upon the minimum cardinality maximum cover principle and aids recurrent and anomalous pattern discovery.
Conference on Information and Knowledge Management | 2015
Sunil Gandhi; Tim Oates; Arnold P. Boedihardjo; Crystal Chen; Jessica Lin; Pavel Senin; Susan Frankenstein; Xing Wang
Discretization is a crucial first step in several time series mining applications. Our research proposes a novel method to discretize time series data and develops a similarity score based on the discretized representation. The similarity score allows us to compare two time series sequences and enables us to perform pattern learning tasks such as clustering, classification, and anomaly detection. We propose a generative model for discretization based on multiple normal distributions and create an optimization technique to learn parameters of these normal distributions. To show the effectiveness of our approach, we perform comprehensive experiments in classifying datasets from the UCR time series repository.
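A minimal version of discretization with multiple normal distributions can be sketched with a tiny 1-D EM fit. This is illustrative only; the paper develops its own generative model and optimization technique:

```python
import numpy as np

def em_gmm_1d(x, k, iters=50):
    """Tiny EM for a 1-D Gaussian mixture (illustrative, not the paper's optimizer)."""
    x = np.asarray(x, dtype=float)
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))   # spread-out initialization
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        d = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = d * pi
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances
        n = r.sum(axis=0)
        pi = n / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n + 1e-9
    return mu, var, pi

def discretize(x, mu, var, pi):
    """Map each value to the index of its most likely component (its symbol)."""
    d = np.exp(-(np.asarray(x, float)[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return (d * pi).argmax(axis=1)

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 0.3, 100), rng.normal(5, 0.3, 100)])
mu, var, pi = em_gmm_1d(x, k=2)
symbols = discretize(x, mu, var, pi)
```

Values drawn from the same component receive the same symbol, and a similarity score over the symbol sequences can then drive clustering, classification, or anomaly detection.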
Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2018
Sunil Gandhi; Tim Oates; Tinoosh Mohsenin; W. David Hairston
Denoising is a preprocessing step for several time series mining algorithms. This step is especially important if the noise in the data originates from diverse sources. Consequently, it is commonly used in biomedical applications involving electroencephalography (EEG) data, where noise can arise from ocular, muscular, and cardiac activity. In this paper, we explicitly learn to remove noise from time series data without assuming a prior distribution of the noise. We propose an online, fully automated, end-to-end system for denoising time series data. Our model is trained on unpaired corpora and needs no information about the source of the noise or how it is manifested in the time series. We propose a new architecture called AsymmetricGAN that uses a generative adversarial network for denoising time series data. To analyze our approach, we create a synthetic dataset that is easy to visualize and interpret. We also evaluate and demonstrate the effectiveness of our approach on an existing EEG dataset.
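The unpaired adversarial setup can be summarized with the standard GAN objective adapted to denoising (a conventional formulation given as an illustration; the AsymmetricGAN loss in the paper may differ in detail):

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\text{clean}}}\!\left[\log D(x)\right]
+ \mathbb{E}_{z \sim p_{\text{noisy}}}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```

Here G maps a noisy series to a denoised one and D tries to distinguish real clean series from G's outputs; because the two expectations range over separate corpora, no aligned noisy/clean pairs are needed.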
International Symposium on Neural Networks | 2017
Neha Tilak; Sunil Gandhi; Tim Oates
Entity linking is the task of identifying entities like people and places in textual data and linking them to corresponding entities in a knowledge base. In this paper we solve a visual equivalent of this task, called visual entity linking, whose goal is to link regions of images to corresponding entities in knowledge bases. Visual entity linking will enable computers to better understand visual content and can thus be used in tasks like image retrieval and visual question answering. More specifically, we propose a novel approach for linking image regions to entities in DBpedia and Freebase. First, we select candidate entities using an automatic image description generation algorithm. We then extract image regions using object detection methods and compare them to depictions of entities in the knowledge base. We evaluate our approach on the Flickr8k dataset through surveys on Amazon Mechanical Turk, and present an extensive analysis to identify the sources of errors in our system.
Extending Database Technology | 2016
Xing Wang; Jessica Lin; Pavel Senin; Tim Oates; Sunil Gandhi; Arnold P. Boedihardjo; Crystal Chen; Susan Frankenstein