Barry L. Drake | Researchain

Publication

Featured research published by Barry L. Drake.

Journal of Global Optimization | 2003

A Decision Criterion for the Optimal Number of Clusters in Hierarchical Clustering

Yunjae Jung; Haesun Park; Ding-Zhu Du; Barry L. Drake

Clustering has been widely used to partition data into groups so that the degree of association is high among members of the same group and low among members of different groups. Although many effective and efficient clustering algorithms have been developed and deployed, most of them still lack an automatic or online way to decide the optimal number of clusters. In this paper, we define clustering gain as a measure of clustering optimality, based on the squared error sum as a clustering algorithm proceeds. When the measure is applied to a hierarchical clustering algorithm, an optimal number of clusters can be found. Experimental results show that our clustering measure performs well, producing intuitively reasonable clustering configurations in Euclidean space. Furthermore, the measure can also be used to estimate the desired number of clusters for partitional clustering methods. The clustering gain measure therefore provides a promising technique for achieving higher-quality results across a wide range of clustering methods.
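The clustering gain itself is defined in the paper and is not reproduced here. As a minimal sketch of the quantity it builds on, the code below runs a naive agglomerative clustering and records the total within-cluster squared error sum (SSE) for every number of clusters k. The merge rule (Ward's criterion: merge the pair whose union raises the SSE least) and all function names are assumptions for illustration.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sse(points: List[Point]) -> float:
    """Squared error sum of one cluster around its own centroid."""
    c = centroid(points)
    return sum(sum((p[d] - c[d]) ** 2 for d in range(len(c))) for p in points)


def sse_trace(data: List[Point]) -> Dict[int, float]:
    """Naive agglomerative clustering that records the total SSE for each
    number of clusters k, from len(data) down to 1.

    Assumption: at each step the pair whose merge increases the total SSE
    the least is merged (Ward's criterion); O(n^3) and for small data only.
    """
    clusters: List[List[Point]] = [[p] for p in data]
    trace = {len(clusters): 0.0}  # singleton clusters have zero SSE
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = clusters[i] + clusters[j]
                increase = sse(merged) - sse(clusters[i]) - sse(clusters[j])
                if best is None or increase < best[0]:
                    best = (increase, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid
        trace[len(clusters)] = sum(sse(c) for c in clusters)
    return trace


print(sse_trace([[0.0], [1.0], [10.0], [11.0]]))
```

For the four 1-D points above, the total SSE stays small down to k = 2 (1.0) and jumps to 101.0 at k = 1. A criterion for the optimal number of clusters looks for this kind of transition along the merge sequence.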

Pattern Recognition | 2007

Multiclass classifiers based on dimension reduction with generalized LDA

Hyunsoo Kim; Barry L. Drake; Haesun Park

Linear discriminant analysis (LDA) has been widely used for dimension reduction of data sets with multiple classes. LDA has recently been extended to various generalized LDA methods that are applicable regardless of the relative sizes of the data dimension and the number of data items. In this paper, we propose several multiclass classifiers based on generalized LDA (GLDA) algorithms, taking advantage of the dimension-reducing transformation matrix without requiring additional training or parameter optimization. A marginal linear discriminant classifier (MLDC), a Bayesian linear discriminant classifier (BLDC), and a one-dimensional BLDC are introduced for multiclass classification. Our experimental results show that these classifiers achieve higher ten-fold cross-validation accuracy than kNN and centroid-based classifiers in the reduced-dimensional space obtained from GLDA.
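To make "classification in the reduced-dimensional space" concrete, here is a minimal sketch of the centroid-based baseline mentioned above (not MLDC or BLDC). It assumes the transformation matrix `G` has already been computed by some GLDA method; `G` is stored with one row per reduced dimension, so a data item x maps to y = G x. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Matrix = List[List[float]]


def project(G: Matrix, x: Sequence[float]) -> List[float]:
    """Reduced representation y = G x (G has one row per reduced dimension)."""
    return [sum(g * xi for g, xi in zip(row, x)) for row in G]


def fit_centroids(G: Matrix, X: List[Sequence[float]],
                  labels: List[Hashable]) -> Dict[Hashable, List[float]]:
    """Class centroids computed from the projected training data."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = defaultdict(int)
    for x, lab in zip(X, labels):
        y = project(G, x)
        if lab not in sums:
            sums[lab] = [0.0] * len(y)
        sums[lab] = [s + v for s, v in zip(sums[lab], y)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def predict(G: Matrix, centroids: Dict[Hashable, List[float]],
            x: Sequence[float]) -> Hashable:
    """Assign x to the class with the nearest centroid in the reduced space."""
    y = project(G, x)
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(y, centroids[lab])))


# Toy example: G keeps only the first coordinate (assumed, not computed by GLDA).
G = [[1.0, 0.0]]
cents = fit_centroids(G, [[0, 5], [1, -5], [10, 5], [11, -5]], ["a", "a", "b", "b"])
print(predict(G, cents, [9, 100]))
```

The second coordinate plays no role after projection, which is the point of a discriminative reduction: the classifier only sees the directions the transformation keeps.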

IEEE Transactions on Knowledge and Data Engineering | 2006

Adaptive nonlinear discriminant analysis by regularized minimum squared errors

Hyunsoo Kim; Barry L. Drake; Haesun Park

Kernelized nonlinear extensions of Fisher's discriminant analysis, discriminant analysis based on the generalized singular value decomposition (LDA/GSVD), and discriminant analysis based on the minimum squared error formulation (MSE) have recently been widely used for handling undersampled high-dimensional problems and nonlinearly separable data sets. As data sets are modified by incorporating new data points and deleting obsolete ones, efficient updating and downdating algorithms are needed for these methods to avoid expensive recomputation of the solution from scratch. In this paper, we propose an efficient algorithm for adaptive linear and nonlinear kernel discriminant analysis based on regularized MSE, called adaptive KDA/RMSE. In adaptive KDA/RMSE, updating and downdating of the computationally expensive eigenvalue decomposition (EVD) or singular value decomposition (SVD) is approximated by updating and downdating of the QR decomposition, achieving an order-of-magnitude speedup. This fast algorithm for adaptive kernelized discriminant analysis is designed by exploiting regularization techniques and the relationship between linear and nonlinear discriminant analysis and the MSE. In addition, we introduce an efficient algorithm for computing leave-one-out cross-validation by downdating KDA/RMSE.
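The QR-based downdating of KDA/RMSE is not reproduced here. As a related, standard illustration of why leave-one-out (LOO) cross-validation need not refit the model n times, the sketch below uses the closed-form LOO residual for ordinary ridge regression (regularized least squares without an intercept): r_i = (y_i - ŷ_i) / (1 - h_ii), where h_ii = x_iᵀ(XᵀX + λI)⁻¹x_i. This identity is swapped in for illustration; the brute-force version is included only to check it. Helper names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inverse(A: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(A)
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def ridge_fit(X: Matrix, y: List[float], lam: float) -> Tuple[List[float], Matrix]:
    """Solve (X^T X + lam I) w = X^T y; also return (X^T X + lam I)^{-1}."""
    Xt = transpose(X)
    A = matmul(Xt, X)
    for i in range(len(A)):
        A[i][i] += lam
    Ainv = inverse(A)
    w = [row[0] for row in matmul(Ainv, matmul(Xt, [[v] for v in y]))]
    return w, Ainv


def loo_residuals_fast(X: Matrix, y: List[float], lam: float) -> List[float]:
    """All LOO residuals from a single fit, via r_i = e_i / (1 - h_ii)."""
    w, Ainv = ridge_fit(X, y, lam)
    out = []
    for xi, yi in zip(X, y):
        e = yi - sum(a * b for a, b in zip(xi, w))
        d = len(xi)
        h = sum(xi[p] * Ainv[p][q] * xi[q] for p in range(d) for q in range(d))
        out.append(e / (1.0 - h))
    return out


def loo_residuals_naive(X: Matrix, y: List[float], lam: float) -> List[float]:
    """Reference: refit once per held-out point."""
    out = []
    for i in range(len(X)):
        w, _ = ridge_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        out.append(y[i] - sum(a * b for a, b in zip(X[i], w)))
    return out
```

With λ > 0 the regularized matrix is always invertible, even when fewer samples than features remain after deleting a point, which mirrors the role regularization plays in the undersampled setting described above.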

International Conference on Information Fusion | 2010

Supervised Raman spectra estimation based on nonnegative rank deficient least squares

Barry L. Drake; Jingu Kim; Mahendra Mallick; Haesun Park

Raman spectroscopy is a powerful and effective technique for analyzing and identifying the chemical composition of a substance. In this paper, we focus on supervised methods for estimating Raman spectra and present one that can handle rank deficiency. Earlier work has mostly assumed that the reference spectra matrix, whose columns form the library of reference spectra, has full rank. In practice, however, methods are needed that can handle rank-deficient cases, including the special case of an overcomplete library. We present our theoretical finding that the active set method with a proper starting vector can solve rank-deficient nonnegativity-constrained least squares problems without ever encountering rank-deficient least squares subproblems during its iterations. Experimental results illustrate the effectiveness of the proposed approaches.
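The underlying problem is nonnegativity-constrained least squares (NNLS): find abundances x ≥ 0 minimizing ||A x - b||², where the columns of A are reference spectra and b is the measured spectrum. The paper's method is an active set algorithm with a specific starting vector; the minimal sketch below deliberately swaps in a simpler technique, cyclic coordinate descent, only to show the problem setup and that a duplicated library column (a rank-deficient A) causes no breakdown. Off the shelf, SciPy's `scipy.optimize.nnls` provides an active-set NNLS solver. The function name is hypothetical.

```python
from typing import List


def nnls_cd(A: List[List[float]], b: List[float], iters: int = 500) -> List[float]:
    """Minimize ||A x - b||^2 subject to x >= 0 by cyclic coordinate descent.

    Each coordinate update is the exact 1-D minimizer, projected onto x_j >= 0.
    (Swapped-in technique for illustration; not the paper's active set method.)
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    r = [-bi for bi in b]  # residual A x - b for the starting point x = 0
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    for _ in range(iters):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot change the residual
            grad = sum(A[i][j] * r[i] for i in range(m))  # (A^T r)_j
            new = max(0.0, x[j] - grad / col_sq[j])
            delta = new - x[j]
            if delta != 0.0:
                for i in range(m):
                    r[i] += A[i][j] * delta  # keep the residual in sync
                x[j] = new
    return x


# Two identical reference spectra: A is rank deficient, yet the fit is exact.
A = [[1.0, 1.0], [1.0, 1.0]]
x = nnls_cd(A, [2.0, 2.0])
print(x)
```

When A is rank deficient the minimizer is generally not unique (here any x ≥ 0 with x₁ + x₂ = 2 fits exactly), so which solution a method returns depends on its starting point and update order.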

Visual Analytics Science and Technology | 2014

VisIRR: Visual analytics for information retrieval and recommendation with large-scale document data

Jaegul Choo; Changhyun Lee; Hannah Kim; Hanseung Lee; Zhicheng Liu; Ramakrishnan Kannan; Charles D. Stolper; John T. Stasko; Barry L. Drake; Haesun Park

We present VisIRR, an interactive visual information retrieval and recommendation system for large-scale document data. Starting with a query, VisIRR visualizes the retrieved documents in a scatter plot along with their topic summary. Next, based on interactive personalized preference feedback on the documents, VisIRR collects and visualizes potentially relevant documents out of the entire corpus so that an integrated analysis of both retrieved and recommended documents can be performed seamlessly.
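The abstract does not describe VisIRR's recommendation model. As a hypothetical sketch of the idea of turning preference feedback into recommendations from the whole corpus, the code below scores every document the user has not yet marked by its mean cosine similarity to the documents the user liked. The document vectors (for example, term-frequency vectors) and all names are assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(corpus: Dict[str, Sequence[float]], liked: List[str],
              top_k: int = 3) -> List[str]:
    """Rank documents not already liked by mean cosine similarity to the liked ones."""
    scores = {
        doc_id: sum(cosine(vec, corpus[l]) for l in liked) / len(liked)
        for doc_id, vec in corpus.items()
        if doc_id not in liked
    }
    return sorted(scores, key=lambda d: scores[d], reverse=True)[:top_k]


corpus = {
    "d1": [1.0, 0.0, 0.0],
    "d2": [0.9, 0.1, 0.0],
    "d3": [0.0, 1.0, 0.0],
    "d4": [0.0, 0.0, 1.0],
}
print(recommend(corpus, liked=["d1"], top_k=1))
```

In an interactive system such scores would be recomputed each time the user gives new feedback, and the recommended documents would then be placed in the same scatter plot as the retrieved ones.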

Visual Analytics Science and Technology | 2014

PIVE: Per-Iteration visualization environment for supporting real-time interactions with computational methods

Jaegul Choo; Changhyun Lee; Hannah Kim; Hanseung Lee; Chandan K. Reddy; Barry L. Drake; Haesun Park

A main bottleneck in integrating computational methods with visual analytics is their significant computational cost, which hinders real-time interactive visualization. To address this, we present PIVE (Per-Iteration Visualization Environment), which visualizes intermediate results from algorithm iterations, allowing users to perform multiple interactions efficiently in real time.
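PIVE's actual architecture is not described in this abstract. The core idea of exposing intermediate results can be sketched, under that caveat, by writing an iterative method as a Python generator that yields its state after every iteration, so a front end can redraw (or stop early) without waiting for convergence. Lloyd's k-means is used here only as an example iterative method; the function name is hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def kmeans_iterations(points: List[Sequence[float]],
                      centers: List[Sequence[float]],
                      max_iter: int = 100
                      ) -> Iterator[Tuple[int, List[List[float]], List[int]]]:
    """Lloyd's k-means that yields (iteration, centers, assignment) after each
    iteration, stopping once the assignment no longer changes."""
    centers = [list(c) for c in centers]
    prev = None
    for it in range(1, max_iter + 1):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(len(centers)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to its mean.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
        yield it, [list(c) for c in centers], assign  # hand a snapshot to the UI
        if assign == prev:
            break
        prev = assign


for it, cs, labels in kmeans_iterations([[0.0], [1.0], [10.0], [11.0]], [[0.0], [1.0]]):
    print(it, cs, labels)  # a real front end would redraw here
```

Because the consumer pulls one iteration at a time, it can also interleave user interactions between iterations instead of blocking until the algorithm finishes.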

International Conference on Pattern Recognition | 2008

Linear discriminant analysis for data with subcluster structure

Haesun Park; Jaegul Choo; Barry L. Drake; Jinwoo Kang

Linear discriminant analysis (LDA) is a widely used feature extraction method in classification. However, the original LDA is limited by its assumption of a unimodal structure for each cluster, an assumption that is not satisfied in many applications, such as facial image data, where variations in angle and illumination can significantly influence images of the same person. In this paper, we propose a novel method, hierarchical LDA (h-LDA), which takes into account hierarchical subcluster structures in the data sets. Our experiments show that regularized h-LDA produces better accuracy than LDA, PCA, and TensorFaces.

Journal of Global Optimization | 2017

DC-NMF: nonnegative matrix factorization based on divide-and-conquer for fast clustering and topic modeling

Rundong Du; Da Kuang; Barry L. Drake; Haesun Park

The importance of unsupervised clustering and topic modeling is well recognized with ever-increasing volumes of text data available from numerous sources. Nonnegative matrix factorization (NMF) has proven to be a successful method for cluster and topic discovery in unlabeled data sets. In this paper, we propose a fast algorithm for computing NMF using a divide-and-conquer strategy, called DC-NMF. Given an input matrix where the columns represent data items, we build a binary tree structure of the data items using a recently-proposed efficient algorithm for computing rank-2 NMF, and then gather information from the tree to initialize the rank-k NMF, which needs only a few iterations to reach a desired solution. We also investigate various criteria for selecting the node to split when growing the tree. We demonstrate the scalability of our algorithm for computing general rank-k NMF as well as its effectiveness in clustering and topic modeling for large-scale text data sets, by comparing it to other frequently utilized state-of-the-art algorithms. The value of the proposed approach lies in the highly efficient and accurate method for initializing rank-k NMF and the scalability achieved from the divide-and-conquer approach of the algorithm and properties of rank-2 NMF. In summary, we present efficient tools for analyzing large-scale data sets, and techniques that can be generalized to many other data analytics problem domains along with an open-source software library called SmallK.
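A single divide step of the tree construction can be sketched as: compute a rank-2 NMF X ≈ WH of the columns at a node, then send each column (data item) to the child whose row of H has the larger coefficient. The paper relies on a specific efficient rank-2 NMF algorithm; this minimal sketch swaps in Lee-Seung multiplicative updates for simplicity and omits the node-selection criteria and the rank-k initialization. Names are hypothetical, and this does not use the SmallK library.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def rank2_nmf(X: Matrix, iters: int = 200, seed: int = 0,
              eps: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """Rank-2 NMF X ~ W H (X: m x n nonnegative, W: m x 2, H: 2 x n) using
    Lee-Seung multiplicative updates for the Frobenius loss (swapped-in method)."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(2)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(2)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H)
        WtX = [[sum(W[i][k] * X[i][j] for i in range(m)) for j in range(n)] for k in range(2)]
        WtW = [[sum(W[i][a] * W[i][b] for i in range(m)) for b in range(2)] for a in range(2)]
        for k in range(2):
            for j in range(n):
                denom = sum(WtW[k][l] * H[l][j] for l in range(2)) + eps
                H[k][j] *= WtX[k][j] / denom
        # W <- W * (X H^T) / (W H H^T)
        XHt = [[sum(X[i][j] * H[k][j] for j in range(n)) for k in range(2)] for i in range(m)]
        HHt = [[sum(H[a][j] * H[b][j] for j in range(n)) for b in range(2)] for a in range(2)]
        for i in range(m):
            for k in range(2):
                denom = sum(W[i][l] * HHt[l][k] for l in range(2)) + eps
                W[i][k] *= XHt[i][k] / denom
    return W, H


def split_columns(H: Matrix) -> Tuple[List[int], List[int]]:
    """Send each column to the child whose row of H is larger."""
    left = [j for j in range(len(H[0])) if H[0][j] >= H[1][j]]
    right = [j for j in range(len(H[0])) if H[0][j] < H[1][j]]
    return left, right


def frob_err(X: Matrix, W: Matrix, H: Matrix) -> float:
    """Squared Frobenius norm of X - W H."""
    return sum((X[i][j] - sum(W[i][k] * H[k][j] for k in range(2))) ** 2
               for i in range(len(X)) for j in range(len(X[0])))
```

Applying `split_columns` recursively to the resulting children would grow the binary tree; the leaves then give k groups that can seed a rank-k NMF, which is the role the tree plays in the approach described above.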

Data Compression Conference | 2014

Nonlinear Adaptive Filtering with Dimension Reduction in the Wavelet Domain

Tiffany Huang; Barry L. Drake; David Aalfs; Brani Vidakovic

Recent advances in adaptive filter theory and in signal acquisition hardware have led to the realization that purely linear algorithms are often inadequate in these domains. Nonlinearities in the input space have become apparent in today's real-world problems, and algorithms that process the data must keep pace with advances in signal acquisition. Recently, kernel adaptive (online) filtering algorithms have been proposed that make no assumptions about the linearity of the input space. In addition, advances in wavelet data compression and dimension reduction have led to new algorithms suitable for a hybrid nonlinear filtering framework. In this paper, we combine wavelet dimension reduction with kernel adaptive filtering. We derive algorithms in which the dimension of the data is first reduced by a wavelet transform; kernel adaptive filtering is then applied to the reduced-domain data to find the model parameters, yielding improved minimization of the mean-squared error (MSE). Another important feature of our methods is that the wavelet filter is also chosen on the fly, based on the data. In particular, we show that using a few optimal wavelet coefficients from the constructed wavelet filter, for both training and testing data sets, as input to the kernel adaptive filter yields convergence to a near-optimal learning curve (MSE). We demonstrate these algorithms on simulated data and on a real data set from food processing.
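The two-stage pipeline (reduce each input window with a wavelet transform, then run a kernel adaptive filter on the reduced vectors) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a fixed Haar filter, whereas the paper selects the wavelet filter from the data on the fly, and it uses kernel least-mean-squares (KLMS) with a Gaussian kernel as one well-known kernel adaptive filter; the abstract does not say which variant the authors use. All names are hypothetical.

```python
import math
from typing import List, Sequence


def haar_approx(x: Sequence[float], levels: int = 1) -> List[float]:
    """Keep only Haar approximation coefficients after `levels` steps.
    Each step halves the length, so len(x) must be divisible by 2**levels."""
    a = list(x)
    for _ in range(levels):
        a = [(a[2 * i] + a[2 * i + 1]) / math.sqrt(2.0) for i in range(len(a) // 2)]
    return a


class KLMS:
    """Kernel least-mean-squares filter with a Gaussian kernel."""

    def __init__(self, step: float = 0.5, width: float = 1.0) -> None:
        self.step = step      # learning rate (eta)
        self.width = width    # Gaussian kernel bandwidth
        self.centers: List[List[float]] = []
        self.coeffs: List[float] = []

    def kernel(self, u: Sequence[float], v: Sequence[float]) -> float:
        d2 = sum((a - b) ** 2 for a, b in zip(u, v))
        return math.exp(-d2 / (2.0 * self.width ** 2))

    def predict(self, u: Sequence[float]) -> float:
        return sum(c * self.kernel(center, u)
                   for center, c in zip(self.centers, self.coeffs))

    def update(self, u: Sequence[float], target: float) -> float:
        """One online step: store u as a new center weighted by step * error.
        Returns the a priori error (before the update)."""
        err = target - self.predict(u)
        self.centers.append(list(u))
        self.coeffs.append(self.step * err)
        return err


# Usage: predict a nonlinear function of a signal from 8-sample windows,
# each reduced to 2 Haar approximation coefficients (2 levels).
signal = [math.sin(0.3 * t) for t in range(200)]
f = KLMS(step=0.5, width=1.0)
for t in range(8, len(signal)):
    u = haar_approx(signal[t - 8:t], levels=2)
    f.update(u, signal[t] ** 2)
```

Reducing each 8-sample window to 2 coefficients makes every kernel evaluation cheaper, which matters because a plain KLMS filter adds one kernel center per training sample.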

Human Factors in Computing Systems | 2018

TopicOnTiles: Tile-Based Spatio-Temporal Event Analytics via Exclusive Topic Modeling on Social Media

Minsuk Choi; Sungbok Shin; Jinho Choi; Scott Langevin; Christopher Bethune; Philippe Horne; Nathan Kronenfeld; Ramakrishnan Kannan; Barry L. Drake; Haesun Park; Jaegul Choo

Detecting anomalous events in a particular area in a timely manner is an important task. Geo-tagged social media data are a useful resource for this task; however, the abundance of everyday language in them still makes it challenging. To address this challenge, we present TopicOnTiles, a visual analytics system that uses social media data to reveal information relevant to anomalous events in a multi-level, tile-based map interface. To this end, we adopt and improve a recently proposed topic modeling method that extracts spatio-temporally exclusive topics corresponding to a particular region and time point. Furthermore, we use a tile-based map interface to handle large-scale data efficiently in parallel. Our user interface highlights anomalous tiles using a novel glyph visualization that encodes the degree of anomaly computed by our exclusive topic modeling process. To show the effectiveness of our system, we present several usage scenarios with real-world datasets as well as comprehensive user study results.
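The abstract does not specify the tiling scheme. Assuming the common Web Mercator "slippy map" tiling used by many web maps, a minimal sketch of the first step, bucketing geo-tagged posts into the tiles of one zoom level, could look like this. Per-tile exclusive topic modeling is not shown; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def latlon_to_tile(lat: float, lon: float, zoom: int) -> Tuple[int, int]:
    """Web Mercator tile indices (x, y) containing (lat, lon) at `zoom`.
    The map has 2**zoom x 2**zoom tiles; y grows southward from the top."""
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp points on the east edge or beyond the Mercator latitude limit.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def count_per_tile(posts: Iterable[Tuple[float, float]], zoom: int) -> Counter:
    """Number of (lat, lon) posts falling in each tile at one zoom level."""
    return Counter(latlon_to_tile(lat, lon, zoom) for lat, lon in posts)


print(count_per_tile([(10.0, 10.0), (20.0, 20.0), (-10.0, -10.0)], zoom=1))
```

Because each tile is independent at a given zoom level, per-tile work such as counting or topic modeling can be distributed across tiles, which fits the parallel, multi-level design described above.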
