
Publication


Featured research published by Ashwin Satyanarayana.


Information Interaction in Context | 2010

Evaluating search systems using result page context

Peter Bailey; Nick Craswell; Ryen W. White; Liwei Chen; Ashwin Satyanarayana; Seyed M. M. Tahaghoghi

We introduce a method for evaluating the relevance of all visible components of a Web search results page, in the context of that results page. Contrary to Cranfield-style evaluation methods, our approach recognizes that a user's initial search interaction is with the result page produced by a search system, not the landing pages linked from it. Our key contribution is that the method allows us to investigate aspects of component relevance that are difficult or impossible to judge in isolation. Such contextual aspects include component-level information redundancy and cross-component coherence. We report on how the method complements traditional document relevance measurement and how it supports comparative relevance assessment across multiple search engines. We also study possible issues with applying the method, including brand presentation effects, inter-judge agreement, and comparisons with document-based relevance judgments. Our findings show this is a useful method for evaluating the dominant user experience in interacting with search systems.
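The method itself is a human-judgment protocol rather than code, but a small hypothetical sketch may help fix ideas: store per-component contextual judgments, including a redundancy flag, and aggregate them into a page-level score. The component names, the 0-3 relevance scale, and the redundancy penalty below are illustrative assumptions, not the paper's actual instrument.

```python
# Hypothetical data model for page-context evaluation: every visible component
# of a results page is judged in the context of the whole page, and redundant
# components (repeating information shown elsewhere) are discounted.
from dataclasses import dataclass

@dataclass
class ComponentJudgment:
    component_id: str   # e.g. "answer-box", "organic-3" (assumed component names)
    relevance: int      # contextual relevance on an assumed 0..3 scale
    redundant: bool     # True if it duplicates information elsewhere on the page

def page_score(judgments: list[ComponentJudgment],
               redundancy_penalty: float = 0.5) -> float:
    """Average contextual relevance, discounting redundant components."""
    if not judgments:
        return 0.0
    total = sum(j.relevance * (redundancy_penalty if j.redundant else 1.0)
                for j in judgments)
    return total / len(judgments)

page = [ComponentJudgment("answer-box", 3, False),
        ComponentJudgment("organic-1", 3, True),   # repeats the answer box
        ComponentJudgment("organic-2", 1, False)]
print(f"page-context score: {page_score(page):.2f}")   # -> 1.83
```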


Knowledge Discovery and Data Mining | 2004

A general approach to incorporate data quality matrices into data mining algorithms

Ian Davidson; Ashish Grover; Ashwin Satyanarayana; Giri Kumar Tayi

Data quality is a central issue for many information-oriented organizations. Recent advances in the data quality field reflect the view that a database is the product of a manufacturing process. While routine errors, such as non-existent zip codes, can be detected and corrected using traditional data cleansing tools, many errors systemic to the manufacturing process cannot be addressed. The product of the data manufacturing process is therefore an imprecise recording of information about the entities of interest (e.g. customers, transactions or assets); the database is only one (flawed) version of the entities it is supposed to represent. Quality assurance systems such as Motorola's Six Sigma and other continuous improvement methods document the data manufacturing process's shortcomings, and a widespread method of documentation is quality matrices. In this paper, we explore the use of these readily available data quality matrices for the data mining classification task. We first illustrate that if we do not factor in these quality matrices, our predictive results are sub-optimal. We then suggest a general-purpose ensemble approach that perturbs the data according to the quality matrices to improve predictive accuracy, and show that the improvement is due to a reduction in variance.
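As a rough illustration of the ensemble idea (not the paper's exact algorithm), the sketch below treats a quality matrix as per-feature noise estimates, re-perturbs the training data according to those estimates for each ensemble member, and averages the members' votes. The Gaussian noise model and decision-tree base learner are assumptions made for the example.

```python
# Illustrative quality-matrix ensemble: perturb the data according to documented
# per-feature noise, train one model per perturbation, and vote. Averaging over
# many perturbations is what reduces the variance component of the error.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# Assumed "quality matrix": estimated noise std-dev per feature, as might be
# documented by a Six Sigma style quality-assurance process.
quality = np.array([0.3, 0.1, 0.5, 0.2, 0.4])

ensemble = []
for _ in range(25):
    X_perturbed = X + rng.normal(0.0, quality, size=X.shape)
    ensemble.append(DecisionTreeClassifier().fit(X_perturbed, y))

def predict(X_new):
    # Majority vote over the perturbation-trained models.
    votes = np.stack([m.predict(X_new) for m in ensemble])
    return (votes.mean(axis=0) >= 0.5).astype(int)

print("train accuracy:", (predict(X) == y).mean())
```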


Canadian Conference on Electrical and Computer Engineering | 2014

Intelligent sampling for big data using bootstrap sampling and Chebyshev inequality

Ashwin Satyanarayana

The amount of data being generated and stored is growing exponentially, owing in part to continuing advances in computer technology. These data present tremendous opportunities in data mining, a burgeoning field in computer science that focuses on the development of methods that can extract knowledge from data. In many real-world problems, data mining algorithms have access to massive amounts of data, and mining all the available data is prohibitive due to computational (time and memory) constraints. Much of the current research is concerned with scaling up data mining algorithms (i.e. improving existing algorithms to handle larger datasets). An alternative approach is to scale down the data. Thus, determining the smallest sufficient training set size that obtains the same accuracy as the entire available dataset remains an important research question. Our research focuses on selecting how many instances (samples) to present to the data mining algorithm. The goals of this paper are to study and characterize the properties of learning curves, to integrate them with the Chebyshev bound to obtain an efficient general-purpose adaptive sampling schedule, and to empirically validate our algorithm for scaling down the data.
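A minimal sketch of how a Chebyshev bound can drive such a schedule, under the assumption that each prediction is a bounded random trial: Chebyshev's inequality P(|X̄ − μ| ≥ ε) ≤ σ²/(nε²) implies that n ≥ σ²/(δε²) samples suffice for an (ε, δ)-accurate estimate. The base learner, ε, and δ below are assumptions; the paper's actual schedule may differ.

```python
# Chebyshev-driven adaptive sampling sketch: grow the training sample until the
# learning curve flattens, sizing each step by the Chebyshev-sufficient n.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=20_000, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

eps, delta = 0.02, 0.1           # tolerance and failure probability (assumed)
rng = np.random.default_rng(1)
n, prev_acc = 200, -1.0

while True:
    idx = rng.choice(len(X_train), size=n, replace=False)
    acc = GaussianNB().fit(X_train[idx], y_train[idx]).score(X_test, y_test)
    if abs(acc - prev_acc) < eps or n == len(X_train):
        break                     # learning curve has flattened (or data exhausted)
    prev_acc = acc
    var = acc * (1 - acc)         # variance of a single correct/incorrect trial
    n_cheb = int(np.ceil(var / (delta * eps**2)))   # Chebyshev-sufficient size
    n = min(len(X_train), max(2 * n, n_cheb))

print(f"stopped at n={n} with held-out accuracy {acc:.3f}")
```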


International Journal of Mathematical Education in Science and Technology | 2017

Introducing Computational Thinking through Hands-on Projects Using R with Applications to Calculus, Probability and Data Analysis

Nadia Benakli; Boyan Kostadinov; Ashwin Satyanarayana; Satyanand Singh

The goal of this paper is to promote computational thinking among mathematics, engineering, science and technology students through hands-on computer experiments. These activities have the potential to empower students to learn, create and invent with technology, and they engage computational thinking through simulations, visualizations and data analysis. We present nine computer experiments, and suggest a few more, with applications to calculus, probability and data analysis. We use the free (open-source) statistical programming language R; our goal is to give a taste of what R offers rather than to present a comprehensive tutorial on the R language. In our experience, these kinds of interactive computer activities can be easily integrated into a smart classroom. Furthermore, these activities tend to keep students motivated and actively engaged in the process of learning, problem solving and developing a better intuition for understanding complex mathematical concepts.
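The paper's experiments are written in R; for flavor, here is an analogous hands-on probability experiment in Python (the dice example is ours, not one of the paper's nine): estimate P(sum of two dice = 7) by simulation and compare with the exact value 1/6.

```python
# Monte Carlo estimate of a simple probability: the kind of short simulation
# the paper uses to build computational thinking.
import numpy as np

rng = np.random.default_rng(42)
rolls = rng.integers(1, 7, size=(100_000, 2))     # 100k rolls of two fair dice
estimate = (rolls.sum(axis=1) == 7).mean()
print(f"simulated: {estimate:.4f}   exact: {1/6:.4f}")
```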


International Symposium on Methodologies for Intelligent Systems | 2005

A dynamic adaptive sampling algorithm (DASA) for real world applications: fingerprint recognition and face recognition

Ashwin Satyanarayana; Ian Davidson

In many real-world problems, such as defense and security, data mining algorithms have access to massive amounts of data. Mining all the available data is prohibitive due to computational (time and memory) constraints, so finding the smallest sufficient training set size that obtains the same accuracy as the entire available dataset remains an important research question. Progressive sampling randomly selects an initial small sample and increases the sample size using either a geometric or arithmetic series until the error converges, with the sampling schedule determined a priori. In this paper, we explore sampling schedules that are adaptive to the dataset under consideration. We develop a general approach that uses the Chernoff inequality to determine how many instances are required at each iteration for convergence. We apply our approach to two real-world problems where data is abundant: face recognition and fingerprint recognition using neural networks. Our empirical results show that our dynamic approach is faster and uses far fewer examples than other existing methods. However, the use of the Chernoff bound requires the samples at each iteration to be independent of each other; future work will look at removing this limitation, which should further improve performance.
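A minimal sketch of the Chernoff-style sample-size rule the abstract alludes to, using the standard Chernoff/Hoeffding bound for i.i.d. bounded outcomes: n ≥ ln(2/δ) / (2ε²) instances make the observed accuracy ε-close to the true accuracy with probability at least 1 − δ. How DASA plugs this into its iteration schedule is only summarized in the abstract, so treat the function below as illustrative.

```python
# Chernoff/Hoeffding sample-size rule: for i.i.d. 0/1 outcomes, this many
# instances guarantee |observed accuracy - true accuracy| < eps with
# probability at least 1 - delta.
import math

def chernoff_sample_size(eps: float, delta: float) -> int:
    """Instances needed per iteration for an (eps, delta) accuracy estimate."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps**2))

# e.g. a 2% tolerance with 95% confidence:
print(chernoff_sample_size(eps=0.02, delta=0.05))   # -> 4612
```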


Microprocessors and Microsystems | 2016

Performance modeling of CMOS inverters using support vector machines (SVM) and adaptive sampling

Ashwin Satyanarayana

Integrated circuit designs are verified through the use of circuit simulators before being reproduced in real silicon. For any circuit simulation tool to accurately predict the performance of a CMOS design, it must generate models that predict the transistors' electrical characteristics. Circuit simulation tools have access to massive amounts of data that are not only dynamic but generated at high speed in real time, making fast simulation a bottleneck in integrated circuit design. Using all the available data is prohibitive due to memory and time constraints. Accurate and fast sampling has been shown to enhance the processing of large datasets without examining all of the data; however, it is difficult to know in advance what sample size to choose in order to guarantee good performance. Thus, determining the smallest sufficient dataset size that yields the same accurate model as the entire available dataset remains an important research question. This paper focuses on adaptively determining how many instances to present to the simulation tool for creating accurate models. We use Support Vector Machines (SVMs) with the Chernoff inequality to derive an efficient adaptive sampling technique for scaling down the data. We then empirically show that the adaptive approach is faster and produces accurate models for circuit simulators compared to other techniques such as progressive sampling and Artificial Neural Networks.
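A hedged sketch of the adaptive-sampling loop with an SVM regressor: the sigmoid "inverter transfer curve", the 1.8 V supply, and the plateau-based stopping rule are stand-in assumptions (the paper's criterion is Chernoff-based), but the grow-until-accuracy-stops-improving structure is the idea described above.

```python
# Adaptive sampling for an SVM performance model: double the training sample
# until held-out error stops improving, then stop growing.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(3)
v_in = rng.uniform(0.0, 1.8, size=50_000)             # assumed 1.8 V supply
v_out = 1.8 / (1 + np.exp(25 * (v_in - 0.9)))         # toy inverter transfer curve
v_out += rng.normal(0.0, 0.01, size=v_out.shape)      # measurement noise

x_test, y_test = v_in[:2_000, None], v_out[:2_000]    # held-out evaluation set
x_pool, y_pool = v_in[2_000:], v_out[2_000:]

n, prev_err, tol = 100, np.inf, 1e-4
while n <= len(x_pool):
    idx = rng.choice(len(x_pool), size=n, replace=False)
    model = SVR(C=10.0).fit(x_pool[idx, None], y_pool[idx])
    err = np.mean((model.predict(x_test) - y_test) ** 2)
    if prev_err - err < tol:      # error stopped improving: sample is enough
        break
    prev_err, n = err, n * 2

print(f"accurate model from n={n} samples (test MSE {err:.5f})")
```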


Canadian Conference on Electrical and Computer Engineering | 2014

Enhanced COBWEB clustering for identifying analog galaxies in astrophysics

Ashwin Satyanarayana; Viviana Acquaviva

Clustering, a very popular task in data mining, is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters), and it has been explored in many different contexts and disciplines. In this paper, we explore using the COBWEB clustering algorithm to identify and group together galaxies whose spectral energy distribution (SED) is similar. We show that using COBWEB drastically reduces CPU time compared to the systematic one-by-one comparison previously used in astrophysics. We then extend this approach by combining COBWEB with Bootstrap Averaging and show that it produces a more accurate model in roughly the same amount of time as COBWEB alone.
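COBWEB itself is not available in scikit-learn, so the sketch below substitutes AgglomerativeClustering purely to show the Bootstrap Averaging wrapper: cluster several bootstrap samples, accumulate a co-association matrix, and derive the final grouping from it. The random stand-in "SEDs", the number of rounds, and the co-association combination step are all assumptions.

```python
# Bootstrap Averaging around a (stand-in) clusterer: points that repeatedly
# land in the same cluster across bootstrap rounds end up grouped together.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(7)
seds = rng.normal(size=(300, 20))        # 300 toy spectral energy distributions

co_assoc = np.zeros((len(seds), len(seds)))
for _ in range(20):                      # 20 bootstrap rounds (assumed)
    idx = rng.choice(len(seds), size=150, replace=True)
    labels = AgglomerativeClustering(n_clusters=5).fit_predict(seds[idx])
    for c in range(5):                   # same-cluster members co-occur
        members = np.unique(idx[labels == c])
        co_assoc[np.ix_(members, members)] += 1

# Final clustering: treat accumulated co-association as similarity.
distance = co_assoc.max() - co_assoc
final = AgglomerativeClustering(n_clusters=5, metric="precomputed",
                                linkage="average").fit_predict(distance)
print(np.bincount(final))                # cluster sizes
```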


Archive | 2016

Ensemble Noise Filtering for Streaming Data Using Poisson Bootstrap Model Filtering

Ashwin Satyanarayana; Rosemary Chinchilla

Ensemble filtering techniques filter noisy instances by combining the predictions of multiple base models, each learned using a traditional algorithm. However, with the massive increase in online streaming data over the last decade, ensemble filtering methods, which largely operate in batch mode and require multiple passes over the data, incur prohibitive time and storage costs. In this paper, we present an ensemble bootstrap model filtering technique that trains multiple inductive learning algorithms on several small Poisson-bootstrapped samples of online data to filter noisy instances. We analyze three prior filtering techniques using Bayesian computational analysis to understand the underlying distribution of the model space. We implement our approach alongside these prior filtering methods and show that ours is more accurate.
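A sketch in the spirit of the approach, with assumed details: each base learner is trained on a Poisson(1)-weighted copy of the stream (the online-bagging surrogate for a bootstrap sample), and an instance is flagged as noisy when a majority of the ensemble disagrees with its label.

```python
# Poisson-bootstrap ensemble noise filtering: Poisson(1) weights emulate
# bootstrap resampling in a single pass, and majority disagreement between
# the ensemble's predictions and the recorded label flags suspect instances.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(11)
X, y = make_classification(n_samples=2_000, n_features=8, random_state=11)
y_noisy = y.copy()
flip = rng.random(len(y)) < 0.10          # inject 10% label noise
y_noisy[flip] ^= 1

# Multiple inductive learners, each on its own Poisson(1)-weighted sample.
ensemble = []
for make_model in (GaussianNB, DecisionTreeClassifier,
                   lambda: SGDClassifier(loss="log_loss")):
    w = rng.poisson(1.0, size=len(X))
    keep = w > 0                          # realize weights by repetition
    ensemble.append(make_model().fit(np.repeat(X[keep], w[keep], axis=0),
                                     np.repeat(y_noisy[keep], w[keep])))

votes = np.stack([m.predict(X) for m in ensemble])
suspect = (votes != y_noisy).sum(axis=0) >= 2   # majority disagrees with label
print(f"flagged {suspect.sum()} instances; "
      f"{(suspect & flip).sum()} of {flip.sum()} injected noise caught")
```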


International Conference on Data Mining | 2003

Speeding up k-means Clustering by Bootstrap Averaging

Ian Davidson; Ashwin Satyanarayana
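No abstract is shown for this paper; as a rough sketch of the idea suggested by the title, run k-means on small bootstrap samples and combine the per-sample centroids (here by clustering the centroids themselves, a common matching heuristic; the paper's exact combination step may differ).

```python
# Bootstrap Averaging for k-means: fit on small bootstrap samples instead of
# the full dataset, then merge the per-sample centroids into a final model.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

rng = np.random.default_rng(5)
X, _ = make_blobs(n_samples=10_000, centers=4, random_state=5)

centroids = []
for _ in range(10):                              # 10 bootstrap rounds (assumed)
    idx = rng.choice(len(X), size=500, replace=True)
    km = KMeans(n_clusters=4, n_init=3, random_state=5).fit(X[idx])
    centroids.append(km.cluster_centers_)

# Combine by clustering the 40 centroids into 4 averaged centers.
combined = KMeans(n_clusters=4, n_init=10, random_state=5).fit(
    np.vstack(centroids)).cluster_centers_
print(np.round(combined, 2))
```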


Archive | 2011

Adding Social Network Data to Search Suggestions

Tabreez Govani; Ashwin Satyanarayana; Kevin Haas; Yi-An Lin; Sameer Indarapu; Samir S. Pradhan

Collaboration


Dive into Ashwin Satyanarayana's collaborations.

Top Co-Authors

Ian Davidson
University of California

Candido Cabo
New York City College of Technology

Hong Li
New York City College of Technology

Janusz Kusyk
New York City College of Technology