Pavel Senin
Los Alamos National Laboratory
Publications
Featured research published by Pavel Senin.
Nature | 2008
Ray Ming; Shaobin Hou; Yun Feng; Qingyi Yu; Alexandre Dionne-Laporte; Jimmy H. Saw; Pavel Senin; Wei Wang; Benjamin V. Ly; Kanako L. T. Lewis; Lu Feng; Meghan R. Jones; Rachel L. Skelton; Jan E. Murray; Cuixia Chen; Wubin Qian; Junguo Shen; Peng Du; Moriah Eustice; Eric J. Tong; Haibao Tang; Eric Lyons; Robert E. Paull; Todd P. Michael; Kerr Wall; Danny W. Rice; Henrik H. Albert; Ming Li Wang; Yun J. Zhu; Michael C. Schatz
Papaya, a fruit crop cultivated in tropical and subtropical regions, is known for its nutritional benefits and medicinal applications. Here we report a 3× draft genome sequence of ‘SunUp’ papaya, the first commercial virus-resistant transgenic fruit tree to be sequenced. The papaya genome is three times the size of the Arabidopsis genome, but contains fewer genes, including significantly fewer disease-resistance gene analogues. Comparison of the five sequenced genomes suggests a minimal angiosperm gene set of 13,311. A lack of recent genome duplication, atypical of other angiosperm genomes sequenced so far, may account for the smaller papaya gene number in most functional groups. Nonetheless, striking amplifications in gene number within particular functional groups suggest roles in the evolution of tree-like habit, deposition and remobilization of starch reserves, attraction of seed dispersal agents, and adaptation to tropical daylengths. Transgenesis at three locations is closely associated with chloroplast insertions into the nuclear genome, and with topoisomerase I recognition sites. Papaya offers numerous advantages as a system for fruit-tree functional genomics, and this draft genome sequence provides the foundation for revealing the basis of Carica’s distinguishing morpho-physiological, medicinal and nutritional properties.
Nature | 2007
Peter F. Dunfield; Anton Yuryev; Pavel Senin; Angela V. Smirnova; Matthew B. Stott; Shaobin Hou; Binh Ly; Jimmy H. Saw; Zhemin Zhou; Yan Ren; Jianmei Wang; Bruce W. Mountain; Michelle A. Crowe; Tina M. Weatherby; Paul L. E. Bodelier; Werner Liesack; Lu Feng; Lei Wang; Maqsudul Alam
Aerobic methanotrophic bacteria consume methane as it diffuses away from methanogenic zones of soil and sediment. They act as a biofilter to reduce methane emissions to the atmosphere, and they are therefore targets in strategies to combat global climate change. No cultured methanotroph grows optimally below pH 5, but some environments with active methane cycles are very acidic. Here we describe an extremely acidophilic methanotroph that grows optimally at pH 2.0–2.5. Unlike the known methanotrophs, it does not belong to the phylum Proteobacteria but rather to the Verrucomicrobia, a widespread and diverse bacterial phylum that primarily comprises uncultivated species with unknown genotypes. Analysis of its draft genome detected genes encoding particulate methane monooxygenase that were homologous to genes found in methanotrophic proteobacteria. However, known genetic modules for methanol and formaldehyde oxidation were incomplete or missing, suggesting that the bacterium uses some novel methylotrophic pathways. Phylogenetic analysis of its three pmoA genes (encoding a subunit of particulate methane monooxygenase) placed them into a distinct cluster from proteobacterial homologues. This indicates an ancient divergence of Verrucomicrobia and Proteobacteria methanotrophs rather than a recent horizontal gene transfer of methanotrophic ability. The findings show that methanotrophy in the Bacteria is more taxonomically, ecologically and genetically diverse than previously thought, and that previous studies have failed to assess the full diversity of methanotrophs in acidic environments.
PLOS ONE | 2009
Tanja Woyke; Gary Xie; Alex Copeland; José M. González; Cliff Han; Hajnalka Kiss; Jimmy Hw Saw; Pavel Senin; Chi Yang; Sourav Chatterji; Jan Fang Cheng; Jonathan A. Eisen; Michael E. Sieracki; Ramunas Stepanauskas
The difficulty associated with the cultivation of most microorganisms and the complexity of natural microbial assemblages, such as marine plankton or human microbiome, hinder genome reconstruction of representative taxa using cultivation or metagenomic approaches. Here we used an alternative, single cell sequencing approach to obtain high-quality genome assemblies of two uncultured, numerically significant marine microorganisms. We employed fluorescence-activated cell sorting and multiple displacement amplification to obtain hundreds of micrograms of genomic DNA from individual, uncultured cells of two marine flavobacteria from the Gulf of Maine that were phylogenetically distant from existing cultured strains. Shotgun sequencing and genome finishing yielded 1.9 Mbp in 17 contigs and 1.5 Mbp in 21 contigs for the two flavobacteria, with estimated genome recoveries of about 91% and 78%, respectively. Only 0.24% of the assembling sequences were contaminants and were removed from further analysis using rigorous quality control. In contrast to all cultured strains of marine flavobacteria, the two single cell genomes were excellent Global Ocean Sampling (GOS) metagenome fragment recruiters, demonstrating their numerical significance in the ocean. The geographic distribution of GOS recruits along the Northwest Atlantic coast coincided with ocean surface currents. Metabolic reconstruction indicated diverse potential energy sources, including biopolymer degradation, proteorhodopsin photometabolism, and hydrogen oxidation. Compared to cultured relatives, the two uncultured flavobacteria have small genome sizes, few non-coding nucleotides, and few paralogous genes, suggesting adaptations to narrow ecological niches. These features may have contributed to the abundance of the two taxa in specific regions of the ocean, and may have hindered their cultivation. 
We demonstrate the power of single cell DNA sequencing to generate reference genomes of uncultured taxa from a complex microbial community of marine bacterioplankton. A combination of single cell genomics and metagenomics enabled us to analyze the genome content, metabolic adaptations, and biogeography of these taxa.
European Conference on Machine Learning | 2014
Pavel Senin; Jessica Lin; Xing Wang; Tim Oates; Sunil Gandhi; Arnold P. Boedihardjo; Crystal Chen; Susan Frankenstein; Manfred Lerner
The problem of discovering frequent and anomalous patterns in time series has received a lot of attention in the past decade. Addressing a common limitation of existing techniques, which require the pattern length to be known in advance, we recently proposed grammar-based algorithms for efficient discovery of variable-length frequent and rare patterns. In this paper we present GrammarViz 2.0, an interactive tool that, based on our previous work, implements algorithms for grammar-driven mining and visualization of variable-length time series patterns.
Artificial Intelligence in Medicine in Europe | 2017
Germain Forestier; François Petitjean; Pavel Senin; Fabien Despinoy; Pierre Jannin
The analysis of surgical motion has received growing interest with the development of devices allowing its automatic capture. In this context, the use of advanced surgical training systems makes an automated assessment of surgical trainees possible. Automatic and quantitative evaluation of surgical skills is a very important step in improving surgical patient care. In this paper, we present a novel approach for the discovery and ranking of discriminative and interpretable patterns of surgical practice from recordings of surgical motions. A pattern is defined as a series of actions or events in the kinematic data that together are distinctive of a specific gesture or skill level. Our approach is based on the discretization of the continuous kinematic data into strings which are then processed to form bags of words. This step allows us to apply a discriminative pattern mining technique based on the word occurrence frequency. We show that the patterns identified by the proposed technique can be used to accurately classify individual gestures and skill levels. We also present how the patterns provide detailed feedback on the trainee skill assessment. Experimental evaluation performed on the publicly available JIGSAWS dataset shows that the proposed approach successfully classifies gestures and skill levels.
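The discretization step described in this abstract can be illustrated with a minimal sketch. It assumes a SAX-style scheme (z-normalize each sliding window, average it over equal segments, map each mean to a letter); the abstract does not specify the exact discretization, so the function names, the 3-letter alphabet, and the breakpoints below are illustrative assumptions rather than the authors' implementation.

```python
from bisect import bisect_right
from collections import Counter
from statistics import mean, pstdev

# Assumption: equiprobable breakpoints of a standard normal for a 3-letter alphabet.
BREAKPOINTS = [-0.43, 0.43]
ALPHABET = "abc"


def window_to_word(window, segments):
    """Z-normalize a window, average it over equal segments, map each mean to a letter.

    Assumes len(window) is divisible by `segments`.
    """
    mu, sigma = mean(window), pstdev(window)
    if sigma < 1e-8:
        normed = [0.0] * len(window)  # flat window: avoid division by zero
    else:
        normed = [(x - mu) / sigma for x in window]
    seg_len = len(window) // segments
    letters = []
    for s in range(segments):
        seg_mean = mean(normed[s * seg_len:(s + 1) * seg_len])
        letters.append(ALPHABET[bisect_right(BREAKPOINTS, seg_mean)])
    return "".join(letters)


def bag_of_words(series, window_size, segments):
    """Slide a window over the series and count the resulting words."""
    return Counter(
        window_to_word(series[i:i + window_size], segments)
        for i in range(len(series) - window_size + 1)
    )


# Hypothetical one-dimensional kinematic signal: a steady ramp.
bag = bag_of_words([0, 1, 2, 3, 4, 5, 6, 7], window_size=4, segments=2)
```

Every window of the ramp has the same shape after normalization, so all windows map to the same word; word frequencies like these are what a frequency-based pattern mining step would then operate on.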
Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2014
Rasaq Otunba; Jessica Lin; Pavel Senin
Massive amounts of data are generated daily at a rapid rate. As a result, the world is faced with unprecedented challenges and opportunities in managing the ever-growing data. These challenges are prevalent in time series for obvious reasons. Clearly, there is an urgent need for efficient solutions to mine large-scale time series databases. One such data mining task is periodicity mining. Efficient and effective periodicity mining techniques in big data would be useful in cases such as finding animal migration patterns, analyzing stock market data for periodicity, detecting outliers in electrocardiograms (ECG), and analyzing periodic disease outbreaks. This work utilizes the notion of time series motifs for approximate period detection. Specifically, we present a novel and simple method to detect periods in time series data based on recurrent patterns. Our approach is effective, noise-resilient, and efficient. Experimental results show that our approach is superior to a popularly used period detection technique with respect to accuracy while requiring much less time and space.
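The idea of estimating a period from recurrent patterns can be sketched as follows. This is not the authors' algorithm, whose details are not given in the abstract; it is a simplified illustration that assumes the series has already been discretized into symbols, and the function name `estimate_period` and its parameters are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_period(symbols, word_length):
    """Estimate a period as the most common gap between repeats of the same word.

    `symbols` is an already-discretized series (a string or list of symbols).
    Returns None when no word occurs more than once.
    """
    # Record the start position of every word of the given length.
    positions = defaultdict(list)
    for i in range(len(symbols) - word_length + 1):
        positions[tuple(symbols[i:i + word_length])].append(i)

    # Collect gaps between consecutive occurrences of each recurring word.
    gaps = Counter()
    for starts in positions.values():
        for prev, cur in zip(starts, starts[1:]):
            gaps[cur - prev] += 1

    if not gaps:
        return None
    return gaps.most_common(1)[0][0]


period = estimate_period("abcdabcdabcd", word_length=2)
```

Voting over the gaps of many recurring words, rather than relying on a single pattern, is one simple way to make such an estimate tolerant of occasional noisy symbols.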
ACM Transactions on Knowledge Discovery From Data | 2018
Pavel Senin; Jessica Lin; Xing Wang; Tim Oates; Sunil Gandhi; Arnold P. Boedihardjo; Crystal Chen; Susan Frankenstein
The problems of recurrent and anomalous pattern discovery in time series, e.g., motifs and discords, respectively, have received a lot of attention from researchers in the past decade. However, since the pattern search space is usually intractable, most existing detection algorithms require that the patterns have discriminative characteristics and that their length be known in advance and provided as input, which is an unreasonable requirement for many real-world problems. In addition, patterns of similar structure but of different lengths may co-exist in a time series. Addressing these issues, we have developed algorithms for variable-length time series pattern discovery that are based on symbolic discretization and grammar inference, two techniques whose combination enables the structured reduction of the search space and discovery of the candidate patterns in linear time. In this work, we present GrammarViz 3.0, a software package that provides implementations of the proposed algorithms and a graphical user interface for interactive variable-length time series pattern discovery. The current version of the software provides an alternative grammar inference algorithm that improves the time series motif discovery workflow, and introduces an experimental procedure for automated discretization parameter selection that builds upon the minimum cardinality maximum cover principle and aids the discovery of recurrent and anomalous time series patterns.
Artificial Intelligence in Medicine | 2017
Germain Forestier; François Petitjean; Pavel Senin; Laurent Riffaud; Pierre Louis Henaux; Pierre Jannin
OBJECTIVE: Surgery is one of the riskiest and most important medical acts performed today. Understanding the ways in which surgeries are similar to or different from each other is of major interest for understanding and analyzing surgical behaviors. This article addresses the issue of identifying discriminative patterns of surgical practice from recordings of surgeries. These recordings are sequences of low-level surgical activities representing the actions performed by surgeons during surgeries. MATERIALS AND METHOD: To discover patterns that are specific to a group of surgeries, we use the vector space model (VSM), which was originally an algebraic model for representing text documents. We split long sequences of surgical activities into subsequences of consecutive activities. We then compute the relative frequencies of these subsequences using the tf*idf framework and we use cosine similarity to classify the sequences. This process makes it possible to discover which patterns discriminate one set of surgery recordings from another set. RESULTS: Experiments were performed on 40 neurosurgeries of anterior cervical discectomy (ACD). The results demonstrate that our method accurately identifies patterns that can discriminate between (1) locations where the surgery took place, (2) levels of expertise of surgeons (i.e., expert vs. intermediate) and even (3) individual surgeons who performed the intervention. We also show how the tf*idf weight vector can be used both to visualize the most interesting patterns and to highlight the parts of a given surgery that are the most interesting. CONCLUSIONS: Identifying patterns that discriminate groups of surgeons is a very important step in improving the understanding of surgical processes. The proposed method finds discriminative and interpretable patterns in sequences of surgical activities. Our approach provides intuitive results, as it automatically identifies the set of patterns explaining the differences between the groups.
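The pipeline described in the methods (subsequences of consecutive activities, tf*idf weighting per group, cosine similarity for classification) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the log-scaled term frequency, the subsequence length, the helper names, and the activity labels are all hypothetical.

```python
import math
from collections import Counter


def ngrams(sequence, n):
    """All subsequences of n consecutive activities."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def tfidf_vectors(class_sequences, n=2):
    """Build one tf*idf weight vector per class (each class is treated as one document)."""
    counts = {
        label: Counter(g for seq in seqs for g in ngrams(seq, n))
        for label, seqs in class_sequences.items()
    }
    num_classes = len(counts)
    doc_freq = Counter(g for bag in counts.values() for g in bag)
    # Assumed weighting: log-scaled tf times idf; n-grams seen in every class get weight 0.
    return {
        label: {
            g: (1 + math.log(c)) * math.log(num_classes / doc_freq[g])
            for g, c in bag.items()
        }
        for label, bag in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(sequence, vectors, n=2):
    """Assign the class whose tf*idf vector is most similar to the query's n-gram counts."""
    query = Counter(ngrams(sequence, n))
    return max(vectors, key=lambda label: cosine(query, vectors[label]))


# Hypothetical activity labels, for illustration only.
training = {
    "expert": [["cut", "suture", "cut", "suture"]],
    "intermediate": [["cut", "drop", "cut", "drop"]],
}
vectors = tfidf_vectors(training)
label = classify(["cut", "suture", "cut"], vectors)
```

Because the idf term is zero for subsequences that occur in every class, the highest-weighted entries of each class vector are the patterns specific to that class, which is what makes the weights usable for inspecting discriminative patterns.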
Conference on Information and Knowledge Management | 2015
Sunil Gandhi; Tim Oates; Arnold P. Boedihardjo; Crystal Chen; Jessica Lin; Pavel Senin; Susan Frankenstein; Xing Wang
Discretization is a crucial first step in several time series mining applications. Our research proposes a novel method to discretize time series data and develops a similarity score based on the discretized representation. The similarity score allows us to compare two time series sequences and enables us to perform pattern learning tasks such as clustering, classification, and anomaly detection. We propose a generative model for discretization based on multiple normal distributions and create an optimization technique to learn parameters of these normal distributions. To show the effectiveness of our approach, we perform comprehensive experiments in classifying datasets from the UCR time series repository.
Artificial Intelligence in Medicine | 2018
Germain Forestier; François Petitjean; Pavel Senin; Fabien Despinoy; Arnaud Huaulmé; Hassan Ismail Fawaz; Jonathan Weber; Lhassane Idoumghar; Pierre-Alain Muller; Pierre Jannin
OBJECTIVE: The analysis of surgical motion has received growing interest with the development of devices allowing its automatic capture. In this context, the use of advanced surgical training systems makes an automated assessment of surgical trainees possible. Automatic and quantitative evaluation of surgical skills is a very important step in improving surgical patient care. MATERIAL AND METHOD: In this paper, we present an approach for the discovery and ranking of discriminative and interpretable patterns of surgical practice from recordings of surgical motions. A pattern is defined as a series of actions or events in the kinematic data that together are distinctive of a specific gesture or skill level. Our approach is based on the decomposition of continuous kinematic data into a set of overlapping gestures represented by strings (bag of words), for which we compute a comparative numerical statistic (tf-idf) that enables discriminative gesture discovery via relative occurrence frequency. RESULTS: We carried out experiments on three surgical motion datasets. The results show that the patterns identified by the proposed method can be used to accurately classify individual gestures, skill levels and surgical interfaces. We also present how the patterns provide detailed feedback on the trainee skill assessment. CONCLUSIONS: The proposed approach is an interesting addition to existing learning tools for surgery, as it provides a way to obtain feedback on which parts of an exercise have been used to classify the attempt as correct or incorrect.