Publications


Featured research published by Daniel V. Samarov.


PLOS ONE | 2012

Synthetic Spike-in Standards Improve Run-Specific Systematic Error Analysis for DNA and RNA Sequencing

Justin M. Zook; Daniel V. Samarov; Jennifer H. McDaniel; Shurjo K. Sen; Marc L. Salit

While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants. These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in false positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells, bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correction of SSEs require a data set used to calculate association of SSEs with various features in the reads and sequence context. This data set is typically either from a part of the data set being “recalibrated” (Genome Analysis ToolKit, or GATK) or from a separate data set with special characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards to human RNA, and use GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base quality scores by a mean of 5 Phred-scaled quality score units, and by as much as 13 units at CpG sites. In addition, since the spike-in data used for recalibration are independent of the genome being sequenced, our method allows run-specific recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the spike-in standards to demonstrate that the Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG, and TC dinucleotides, while SOLiD has fewer dinucleotide SSEs but more SSEs for certain cycles. We conclude that using these DNA and RNA spike-in standards with GATK improves base quality score recalibration.
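To give a sense of scale for the Phred-unit figures quoted in this abstract, the short sketch below converts between Phred-scaled quality scores and error probabilities using the standard definition Q = -10·log10(p). The function names and the example score of 30 are illustrative assumptions and do not come from the paper.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality score to an error probability."""
    return 10.0 ** (-q / 10.0)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability to a Phred-scaled quality score."""
    return -10.0 * math.log10(p)


# Illustrative values only (not taken from the paper): a base reported
# at Q30 whose score is off by 5 Phred units.
reported_q = 30.0
shift = 5.0
ratio = phred_to_error_prob(reported_q - shift) / phred_to_error_prob(reported_q)
print(f"A {shift:.0f}-unit shift changes the error probability by a factor of {ratio:.2f}")
```

Because the scale is logarithmic, a difference of 5 units corresponds to roughly a threefold change in error probability, and 13 units to roughly a twentyfold change, whatever the starting score.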


Biomedical Optics Express | 2012

Algorithm validation using multicolor phantoms

Daniel V. Samarov; Matthew L. Clarke; J. Lee; David W. Allen; Maritoni Litorja; Jeeseong Hwang

We present a framework for validating hyperspectral image (HSI) analysis, specifically abundance fraction estimation, based on HSI measurements of water-soluble dye mixtures printed on microarray chips. We focus on the performance of two algorithms, the Least Absolute Shrinkage and Selection Operator (LASSO) and the Spatial LASSO (SPLASSO). The LASSO is a well-known statistical method for simultaneously performing model estimation and variable selection. In the context of estimating abundance fractions in an HSI scene, the “sparse” representations provided by the LASSO are appropriate because not every pixel is expected to contain every endmember. The SPLASSO is a novel approach we introduce here for HSI analysis; it extends the LASSO framework by incorporating the rich spatial information available in HSI to further improve the abundance estimates. We also introduce the dye mixture platform as a new benchmark data set for hyperspectral biomedical image processing and show our algorithm’s improvement over the standard LASSO.
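As a rough illustration of how the standard LASSO can estimate sparse abundance fractions for a single pixel, the sketch below solves the problem with cyclic coordinate descent. It is a minimal sketch under stated assumptions, not the authors' implementation: the endmember matrix `E`, the pixel spectrum `y`, the penalty weight `lam`, and the synthetic data are all hypothetical, and the non-negativity and sum-to-one constraints often used in unmixing are not enforced. The spatial penalty of the SPLASSO is not shown.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Soft-thresholding operator used in the LASSO coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(E: np.ndarray, y: np.ndarray, lam: float, n_iter: int = 500) -> np.ndarray:
    """Minimise 0.5 * ||y - E a||^2 + lam * ||a||_1 by cyclic coordinate descent.

    E : (n_bands, n_endmembers) matrix of endmember spectra (hypothetical input)
    y : (n_bands,) measured spectrum of one pixel
    """
    n_endmembers = E.shape[1]
    a = np.zeros(n_endmembers)
    col_sq = (E ** 2).sum(axis=0)      # squared norm of each endmember column
    r = y - E @ a                      # current residual
    for _ in range(n_iter):
        for j in range(n_endmembers):
            r += E[:, j] * a[j]        # remove endmember j from the fit
            rho = E[:, j] @ r          # correlation with the partial residual
            a[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= E[:, j] * a[j]        # add the updated contribution back
    return a


# Synthetic example: 5 candidate endmembers, only 2 present in the pixel.
rng = np.random.default_rng(0)
E = rng.random((50, 5))
a_true = np.array([0.7, 0.0, 0.3, 0.0, 0.0])
y = E @ a_true
a_hat = lasso_cd(E, y, lam=0.01)
print(np.round(a_hat, 3))
```

Larger values of `lam` shrink more coefficients to exactly zero, which is the "sparse" behaviour the abstract refers to.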


Proceedings of SPIE | 2011

Characterization of hyperspectral imaging and analysis via microarray printing of dyes

Matthew L. Clarke; Maritoni Litorja; David W. Allen; Daniel V. Samarov; Jeeseong Hwang

The application of hyperspectral imaging requires rigorous characterization of the spatial and spectral imaging domains of the system. We present a microarray printing methodology for the testing of absorption or reflectance microscopy measurements. This controlled system can serve as a platform for inter-system calibration and provides a common framework for the development of post-processing algorithms. Calibration of the illumination at the objective plane using a transfer standard spectroradiometer allows comparison of light levels regardless of the illumination used, different apertures, and different microscopes. The method uses standard commercial optomechanical components. Printed dyes enable multiplexed testing of the spectral capability of a hyperspectral instrument. The spectral signatures of individual or blended dyes can be analyzed and applied to the testing of spectral image processing tools. Customized programming of the microarrayer allows for arbitrary patterning of dye samples onto the substrate, allowing for the testing of image processing algorithms involving the spatial distribution of spectral features.


Analytical and Bioanalytical Chemistry | 2016

PEPR: pipelines for evaluating prokaryotic references

Nathanael D. Olson; Justin M. Zook; Daniel V. Samarov; Scott A. Jackson; Marc L. Salit

The rapid adoption of microbial whole genome sequencing in public health, clinical testing, and forensic laboratories requires the use of validated measurement processes. Well-characterized, homogeneous, and stable microbial genomic reference materials can be used to evaluate measurement processes, improving confidence in microbial whole genome sequencing results. We have developed a reproducible and transparent bioinformatics tool, PEPR, Pipelines for Evaluating Prokaryotic References, for characterizing the reference genome of prokaryotic genomic materials. PEPR evaluates the quality, purity, and homogeneity of the reference material genome, and purity of the genomic material. The quality of the genome is evaluated using high coverage paired-end sequence data; coverage, paired-end read size and direction, as well as soft-clipping rates, are used to identify mis-assemblies. The homogeneity and purity of the material relative to the reference genome are characterized by comparing base calls from replicate datasets generated using multiple sequencing technologies. Genomic purity of the material is assessed by checking for DNA contaminants. We demonstrate the tool and its output using sequencing data while developing a Staphylococcus aureus candidate genomic reference material. PEPR is open source and available at https://github.com/usnistgov/pepr.


Technometrics | 2015

The Spatial LASSO With Applications to Unmixing Hyperspectral Biomedical Images

Daniel V. Samarov; Jeeseong Hwang; Maritoni Litorja

Hyperspectral imaging (HSI) is a spectroscopic method that uses densely sampled measurements along the electromagnetic spectrum to identify the unique molecular composition of an object. Traditionally HSI has been associated with remote sensing-type applications, but recently it has found increased use in biomedicine, from investigations at the cellular level to the tissue level. One of the main challenges in the analysis of HSI is estimating the proportions, also called abundance fractions, of each of the molecular signatures. While there is great promise for HSI in the area of biomedicine, large variability in the measurements and artifacts related to the instrumentation have slowed its adoption into more widespread practice. In this article, we propose a novel regularization and variable selection method called the spatial LASSO (SPLASSO). The SPLASSO incorporates spatial information via a graph Laplacian-based penalty to help improve the model estimation process for multivariate response data. We show the strong performance of this approach on a benchmark HSI dataset with considerable improvement in predictive accuracy over the standard LASSO. Supplementary materials for this article are available online.


Talanta | 2017

Quantifying the stability of trace explosives under different environmental conditions using electrospray ionization mass spectrometry

Edward Sisco; Marcela Najarro; Daniel V. Samarov; Jeffrey A. Lawrence

This work investigates the stability of trace (tens of nanograms) deposits of six explosives: erythritol tetranitrate (ETN), pentaerythritol tetranitrate (PETN), cyclotrimethylenetrinitramine (RDX), cyclotetramethylenetetranitramine (HMX), 2,4,6-trinitrotoluene (TNT), and 2,4,6-trinitrophenylmethylnitramine (tetryl) to determine environmental stabilities and lifetimes of trace level materials. Explosives were inkjet printed directly onto substrates and exposed to one of seven environmental conditions (Laboratory, -4°C, 30°C, 47°C, 90% relative humidity, UV light, and ozone) up to 42 days. Throughout the study, samples were extracted and quantified using electrospray ionization mass spectrometry (ESI-MS) to determine the stability of the explosive as a function of time and environmental exposure. Statistical models were then fit to the data and used for pairwise comparisons of the environments. Stability was found to be exposure and compound dependent with minimal sample losses observed for HMX, RDX, and PETN while substantial and rapid losses were observed in all conditions except -4°C for ETN and TNT and in all conditions for tetryl. The results of this work highlight the potential fate of explosive traces when exposed to various environments.


Biomedical Optics Express | 2012

Designing microarray phantoms for hyperspectral imaging validation

Matthew L. Clarke; J. Lee; Daniel V. Samarov; David W. Allen; Maritoni Litorja; Ralph Nossal; Jeeseong Hwang

The design and fabrication of custom-tailored microarrays for use as phantoms in the characterization of hyperspectral imaging systems is described. Corresponding analysis methods for biologically relevant samples are also discussed. An image-based phantom design was used to program a microarrayer robot to print prescribed mixtures of dyes onto microscope slides. The resulting arrays were imaged by a hyperspectral imaging microscope. The shape of the spots results in significant scattering signals, which can be used to test image analysis algorithms. Separation of the scattering signals allowed elucidation of individual dye spectra. In addition, spectral fitting of the absorbance spectra of complex dye mixtures was performed in order to determine local dye concentrations. Such microarray phantoms provide a robust testing platform for comparisons of hyperspectral imaging acquisition and analysis methods.


Technometrics | 2017

A Coordinate-Descent-Based Approach to Solving the Sparse Group Elastic Net

Daniel V. Samarov; David W. Allen; Jeeseong Hwang; Young Jong Lee; Maritoni Litorja

Group sparse approaches to regression modeling are finding ever-increasing utility in an array of application areas. While group sparsity can help assess certain data structures, it is desirable in many instances to also capture element-wise sparsity. Recent work exploring the latter has been conducted in the context of l2/l1 penalized regression in the form of the sparse group lasso (SGL). Here, we present a novel model, called the sparse group elastic net (SGEN), which uses an l∞/l1/ridge-based penalty. We show that the l∞-norm, which induces group sparsity, is particularly effective in the presence of noisy data. We solve the SGEN model using a coordinate descent-based procedure and compare its performance to the SGL and related methods in the context of hyperspectral imaging in the presence of noisy observations. Supplementary materials for this article are available online.
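The abstract names the three ingredients of the SGEN penalty but not its exact form. The sketch below shows one plausible reading, in which a per-group l∞ term, an element-wise l1 term, and a ridge term are simply added with separate weights. The function name, the three weight parameters, and the 0.5 factor on the ridge term are assumptions made for illustration; the weighting used in the paper may differ.

```python
import numpy as np


def sgen_penalty(beta, groups, lam_group, lam_l1, lam_ridge):
    """Evaluate an l-infinity / l1 / ridge penalty (assumed additive form).

    beta   : 1-D coefficient vector
    groups : list of index lists, one per coefficient group
    """
    beta = np.asarray(beta, dtype=float)
    # Group term: largest absolute coefficient in each group (l-infinity norm).
    group_term = sum(np.max(np.abs(beta[idx])) for idx in groups)
    # Element-wise sparsity term (l1 norm).
    l1_term = np.sum(np.abs(beta))
    # Ridge term (squared l2 norm, with an assumed 0.5 factor).
    ridge_term = 0.5 * np.sum(beta ** 2)
    return lam_group * group_term + lam_l1 * l1_term + lam_ridge * ridge_term


# Hypothetical coefficients in three groups; the middle group is entirely zero.
beta = [1.0, -2.0, 0.0, 0.0, 3.0]
groups = [[0, 1], [2, 3], [4]]
print(sgen_penalty(beta, groups, lam_group=1.0, lam_l1=1.0, lam_ridge=1.0))
```

A group contributes nothing to the l∞ term only when all of its coefficients are zero, which is how that term encourages whole groups to drop out of the model.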


ACM Transactions on Information Systems | 2017

Using Replicates in Information Retrieval Evaluation

Ellen M. Voorhees; Daniel V. Samarov; Ian Soboroff

This article explores a method for more accurately estimating the main effect of the system in a typical test-collection-based evaluation of information retrieval systems, thus increasing the sensitivity of system comparisons. Randomly partitioning the test document collection allows for multiple tests of a given system and topic (replicates). Bootstrap ANOVA can use these replicates to extract system-topic interactions (something not possible without replicates), yielding a more precise value for the system effect and a narrower confidence interval around that value. Experiments using multiple TREC collections demonstrate that removing the topic-system interactions substantially reduces the confidence intervals around the system effect and increases the number of significant pairwise differences found. Further, the method is robust against small changes in the number of partitions used, against variability in the documents that constitute the partitions, and against the choice of measure used to quantify system effectiveness.


Journal of Computational and Graphical Statistics | 2015

The Fast RODEO for Local Polynomial Regression

Daniel V. Samarov

An open challenge in nonparametric regression is finding fast, computationally efficient approaches to estimating local bandwidths for large datasets, in particular in two or more dimensions. In the work presented here, we introduce a novel local bandwidth estimation procedure for local polynomial regression, which combines the greedy search of the regularization of the derivative expectation operator (RODEO) algorithm with linear binning. The result is a fast, computationally efficient algorithm, which we refer to as the fast RODEO. We motivate the development of our algorithm by using a novel scale-space approach to derive the RODEO. We conclude with a toy example and a real-world example using data from the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) satellite validation study, where we show the fast RODEO’s improvement in accuracy and computational speed over two other standard approaches.
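Linear binning, one of the two ingredients named in this abstract, replaces the raw observations with weighted counts and sums on a regular grid so that later smoothing steps operate on the grid rather than on every data point. The one-dimensional sketch below illustrates the general technique only; it is not the fast RODEO itself, and the function name, arguments, and example data are assumptions.

```python
import numpy as np


def linear_binning(x, y, grid_min, grid_max, n_grid):
    """Spread each observation over its two neighbouring grid points.

    Each point contributes to the grid node on its left and on its right,
    with weights proportional to its closeness to each node. Assumes every
    x lies within [grid_min, grid_max].
    Returns (counts, sums): binned weights and binned responses.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    delta = (grid_max - grid_min) / (n_grid - 1)       # grid spacing
    pos = (x - grid_min) / delta                       # position in grid units
    left = np.clip(np.floor(pos).astype(int), 0, n_grid - 2)
    w_right = pos - left                               # weight for the right node
    w_left = 1.0 - w_right                             # weight for the left node

    counts = np.zeros(n_grid)
    sums = np.zeros(n_grid)
    # np.add.at accumulates correctly when several points share a node.
    np.add.at(counts, left, w_left)
    np.add.at(counts, left + 1, w_right)
    np.add.at(sums, left, w_left * y)
    np.add.at(sums, left + 1, w_right * y)
    return counts, sums


# Hypothetical data: one point a quarter of the way along a 3-node grid on [0, 1].
counts, sums = linear_binning([0.25], [2.0], grid_min=0.0, grid_max=1.0, n_grid=3)
print(counts, sums)
```

The binned weights always sum to the number of observations, so the total mass of the data is preserved while the cost of later kernel computations depends on the grid size rather than the sample size.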

Collaboration


Dive into Daniel V. Samarov's collaborations.

Top Co-Authors

Jeeseong Hwang
National Institute of Standards and Technology

David W. Allen
National Institute of Standards and Technology

Maritoni Litorja
National Institute of Standards and Technology

David F. Plusquellic
National Institute of Standards and Technology

Dennis D. Leber
National Institute of Standards and Technology

Kevin O. Douglass
National Institute of Standards and Technology

Matthew L. Clarke
National Institute of Standards and Technology

James R. Whetstone
National Institute of Standards and Technology