Jessica Lin | Researchain

Publication

Featured research published by Jessica Lin.

The Lancet | 2006

Complete genome sequence of USA300, an epidemic clone of community-acquired meticillin-resistant Staphylococcus aureus

Binh An Diep; Steven R. Gill; Richard F. Chang; Tiffany HaiVan Phan; Jason H. Chen; Matthew G Davidson; Felice Lin; Jessica Lin; Heather Carleton; Emmanuel F. Mongodin; George F. Sensabaugh; Francoise Perdreau-Remington

BACKGROUND: USA300, a clone of meticillin-resistant Staphylococcus aureus, is a major source of community-acquired infections in the USA, Canada, and Europe. Our aim was to sequence its genome and compare it with those of other strains of S aureus to try to identify genes responsible for its distinctive epidemiological and virulence properties. METHODS: We ascertained the genome sequence of FPR3757, a multidrug resistant USA300 strain, by random shotgun sequencing, then compared it with the sequences of ten other staphylococcal strains. FINDINGS: Compared with closely related S aureus, we noted that almost all of the unique genes in USA300 clustered in novel allotypes of mobile genetic elements. Some of the unique genes are involved in pathogenesis, including Panton-Valentine leucocidin and molecular variants of enterotoxin Q and K. The most striking feature of the USA300 genome is the horizontal acquisition of a novel mobile genetic element that encodes an arginine deiminase pathway and an oligopeptide permease system that could contribute to growth and survival of USA300. We did not detect this element, termed arginine catabolic mobile element (ACME), in other S aureus strains. We noted a high prevalence of ACME in S epidermidis, suggesting not only that ACME transfers into USA300 from S epidermidis, but also that this element confers a selective advantage to this ubiquitous commensal of the human skin. INTERPRETATION: USA300 has acquired mobile genetic elements that encode resistance and virulence determinants that could enhance fitness and pathogenicity.

Data Mining and Knowledge Discovery | 2007

Experiencing SAX: a novel symbolic representation of time series

Jessica Lin; Eamonn J. Keogh; Li Wei; Stefano Lonardi

Many high level representations of time series have been proposed for data mining, including Fourier transforms, wavelets, eigenwaves, piecewise polynomial models, etc. Many researchers have also considered symbolic representations of time series, noting that such representations would potentially allow researchers to avail of the wealth of data structures and algorithms from the text processing and bioinformatics communities. While many symbolic representations of time series have been introduced over the past decades, they all suffer from two fatal flaws. First, the dimensionality of the symbolic representation is the same as the original data, and virtually all data mining algorithms scale poorly with dimensionality. Second, although distance measures can be defined on the symbolic approaches, these distance measures have little correlation with distance measures defined on the original time series. In this work we formulate a new symbolic representation of time series. Our representation is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic approach that lower bound corresponding distance measures defined on the original series. As we shall demonstrate, this latter feature is particularly exciting because it allows one to run certain data mining algorithms on the efficiently manipulated symbolic representation, while producing identical results to the algorithms that operate on the original data. In particular, we will demonstrate the utility of our representation on various data mining tasks of clustering, classification, query by content, anomaly detection, motif discovery, and visualization.
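The two properties claimed in the abstract above (dimensionality reduction and a lower-bounding symbolic distance) can be illustrated with a minimal sketch. It assumes the usual SAX recipe: z-normalize, average into equal-width segments (PAA), then map each segment mean to a letter using equiprobable breakpoints of the standard normal distribution. The function names (`znorm`, `paa`, `sax`, `mindist`) and the parameter choices are illustrative, not taken from the authors' code, and the series length is assumed to be divisible by the number of segments.

```python
import bisect
import math
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Rescale to zero mean and unit variance (all zeros for a flat series)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each equal-width segment."""
    if len(series) % segments != 0:
        raise ValueError("this sketch assumes len(series) is divisible by segments")
    width = len(series) // segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(segments)]


def breakpoints(alphabet_size):
    """Cut points that split the standard normal into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax(series, segments, alphabet_size):
    """Convert a numeric series into a word of `segments` letters."""
    cuts = breakpoints(alphabet_size)
    symbols = (bisect.bisect_right(cuts, v) for v in paa(znorm(series), segments))
    return "".join(chr(ord("a") + s) for s in symbols)


def mindist(word_a, word_b, n, alphabet_size):
    """Symbolic distance that lower-bounds the Euclidean distance between
    the two z-normalized series of original length n."""
    cuts = breakpoints(alphabet_size)
    total = 0.0
    for ca, cb in zip(word_a, word_b):
        lo, hi = sorted((ord(ca) - ord("a"), ord(cb) - ord("a")))
        if hi - lo > 1:  # identical or adjacent symbols contribute zero
            total += (cuts[hi - 1] - cuts[lo]) ** 2
    return math.sqrt(n / len(word_a)) * math.sqrt(total)


if __name__ == "__main__":
    up = [1, 2, 3, 4, 5, 6, 7, 8]
    down = up[::-1]
    word_up, word_down = sax(up, 4, 4), sax(down, 4, 4)
    exact = math.sqrt(sum((x - y) ** 2 for x, y in zip(znorm(up), znorm(down))))
    print(word_up, word_down, mindist(word_up, word_down, len(up), 4), exact)
```

Because the symbolic distance never exceeds the true distance, a search that prunes candidates on the symbolic distance cannot discard a true match, which is what lets an algorithm run on the compact words and still return the same answer as on the raw data.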

International Conference on Data Mining | 2005

HOT SAX: efficiently finding the most unusual time series subsequence

Eamonn J. Keogh; Jessica Lin; Ada Wai-Chee Fu

In this work, we introduce the new problem of finding time series discords. Time series discords are subsequences of a longer time series that are maximally different to all the rest of the time series subsequences. They thus capture the sense of the most unusual subsequence within a time series. Time series discords have many uses for data mining, including improving the quality of clustering, data cleaning, summarization, and anomaly detection. Discords are particularly attractive as anomaly detectors because they only require one intuitive parameter (the length of the subsequence) unlike most anomaly detection algorithms that typically require many parameters. We evaluate our work with a comprehensive set of experiments. In particular, we demonstrate the utility of discords with objective experiments on domains as diverse as Space Shuttle telemetry monitoring, medicine, surveillance, and industry, and we demonstrate the effectiveness of our discord discovery algorithm with more than one million experiments, on 82 different datasets from diverse domains.

International Conference on Data Mining | 2003

Clustering of time series subsequences is meaningless: implications for previous and future research

Eamonn J. Keogh; Jessica Lin; Wagner Truppel

Given the recent explosion of interest in streaming data and online algorithms, clustering of time-series subsequences, extracted via a sliding window, has received much attention. In this work, we make a surprising claim. Clustering of time-series subsequences is meaningless. More concretely, clusters extracted from these time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature. We can justify calling our claim surprising because it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work. Although the primary contribution of our work is to draw attention to the fact that an apparent solution to an important problem is incorrect and should no longer be used, we also introduce a novel method that, based on the concept of time-series motifs, is able to meaningfully cluster subsequences on some time-series datasets.

Extending Database Technology | 2004

Iterative Incremental Clustering of Time Series

Jessica Lin; Michail Vlachos; Eamonn J. Keogh; Dimitrios Gunopulos

We present a novel anytime version of partitional clustering algorithms, such as k-Means and EM, for time series. The algorithm works by leveraging the multi-resolution property of wavelets. The dilemma of choosing the initial centers is mitigated by initializing the centers at each approximation level, using the final centers returned by the coarser representations. In addition to casting the clustering algorithms as anytime algorithms, this approach has two other very desirable properties. By working at lower dimensionalities we can efficiently avoid local minima. Therefore, the quality of the clustering is usually better than the batch algorithm. In addition, even if the algorithm is run to completion, our approach is much faster than its batch counterpart. We explain and empirically demonstrate these surprising and desirable properties with comprehensive experiments on several publicly available real data sets. We further demonstrate that our approach can be generalized to a framework covering a much broader range of algorithms or data mining problems.
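The coarse-to-fine initialization described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: pairwise averaging stands in for the Haar approximation at each level, coarse centers are carried to the next level by repeating each value twice, and all names (`coarsen`, `kmeans`, `multires_kmeans`) are hypothetical. The generator yields the labels after every level, so a caller can stop early and keep the best clustering found so far, which is the "anytime" behaviour.

```python
import random


def coarsen(series):
    """Halve the resolution by averaging adjacent pairs (even length assumed)."""
    return [(series[i] + series[i + 1]) / 2 for i in range(0, len(series), 2)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data, centers, max_iter=100):
    """Plain k-means started from the given centers; returns (centers, labels)."""
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(row, centers[c]))
            for row in data
        ]
        if new_labels == labels:  # assignments are stable, so we have converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def multires_kmeans(data, k, seed=0):
    """Yield (resolution, labels) from the coarsest level to full resolution."""
    levels = [data]  # levels[0] is the original data, levels[-1] the coarsest
    while len(levels[-1][0]) > 1 and len(levels[-1][0]) % 2 == 0:
        levels.append([coarsen(row) for row in levels[-1]])

    rng = random.Random(seed)
    centers = [list(row) for row in rng.sample(levels[-1], k)]
    for depth, level in enumerate(reversed(levels)):
        if depth > 0:
            # Carry the coarse centers up one level by repeating each value.
            centers = [[v for v in center for _ in (0, 1)] for center in centers]
        centers, labels = kmeans(level, centers)
        yield len(level[0]), list(labels)


if __name__ == "__main__":
    data = [
        [0, 0, 0, 0], [0, 2, 0, 2], [2, 2, 2, 2],
        [100, 100, 100, 100], [100, 102, 100, 102], [102, 102, 102, 102],
    ]
    for resolution, labels in multires_kmeans(data, k=2):
        print(resolution, labels)
```

Only the coarsest level starts from random centers; every finer level starts from centers that already summarize the data, so it typically needs few iterations.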

Knowledge Discovery and Data Mining | 2004

Visually mining and monitoring massive time series

Jessica Lin; Eamonn J. Keogh; Stefano Lonardi; Jeffrey P. Lankford; Donna M. Nystrom

Moments before the launch of every space vehicle, engineering discipline specialists must make a critical go/no-go decision. The cost of a false positive, allowing a launch in spite of a fault, or a false negative, stopping a potentially successful launch, can be measured in the tens of millions of dollars, not including the cost in morale and other more intangible detriments. The Aerospace Corporation is responsible for providing engineering assessments critical to the go/no-go decision for every Department of Defense space vehicle. These assessments are made by constantly monitoring streaming telemetry data in the hours before launch. We will introduce VizTree, a novel time-series visualization tool to aid the Aerospace analysts who must make these engineering assessments. VizTree was developed at the University of California, Riverside and is unique in that the same tool is used for mining archival data and monitoring incoming live telemetry. The use of a single tool for both aspects of the task allows a natural and intuitive transfer of mined knowledge to the monitoring task. Our visualization approach works by transforming the time series into a symbolic representation, and encoding the data in a modified suffix tree in which the frequency and other properties of patterns are mapped onto colors and other visual properties. We demonstrate the utility of our system by comparing it with state-of-the-art batch algorithms on several real and synthetic datasets.

Knowledge and Information Systems | 2006

Finding the most unusual time series subsequence: algorithms and applications

Eamonn J. Keogh; Jessica Lin; Sang-Hee Lee; Helga Van Herle

In this work we introduce the new problem of finding time series discords. Time series discords are subsequences of longer time series that are maximally different to all the rest of the time series subsequences. They thus capture the sense of the most unusual subsequence within a time series. While discords have many uses for data mining, they are particularly attractive as anomaly detectors because they only require one intuitive parameter (the length of the subsequence) unlike most anomaly detection algorithms that typically require many parameters. While the brute force algorithm to discover time series discords is quadratic in the length of the time series, we show a simple algorithm that is three to four orders of magnitude faster than brute force, while guaranteed to produce identical results. We evaluate our work with a comprehensive set of experiments on diverse data sources including electrocardiograms, space telemetry, respiration physiology, anthropological and video datasets.
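The quadratic brute-force baseline mentioned above can be sketched as a pair of nested loops. The sketch assumes one common reading of the definition: the discord is the subsequence whose nearest non-overlapping neighbour is farthest away, with overlapping windows skipped as trivial matches. It uses raw Euclidean distance without normalization, and the name `brute_force_discord` is illustrative. The faster algorithm from the abstract is not reproduced here.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_discord(series, n):
    """Return (start, distance) of the length-n subsequence whose nearest
    non-overlapping neighbour is farthest away. Runs in O(m^2) comparisons
    for a series of length m."""
    best_loc, best_dist = None, -1.0
    last_start = len(series) - n
    for p in range(last_start + 1):
        candidate = series[p:p + n]
        nearest = math.inf
        for q in range(last_start + 1):
            if abs(p - q) >= n:  # skip overlapping (trivial) matches
                nearest = min(nearest, euclidean(candidate, series[q:q + n]))
        if nearest != math.inf and nearest > best_dist:
            best_loc, best_dist = p, nearest
    return best_loc, best_dist


if __name__ == "__main__":
    # A regular 0/1 alternation with a single spike at index 9.
    series = [0, 1, 0, 1, 0, 1, 0, 1, 0, 5, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
    print(brute_force_discord(series, n=2))
```

The only parameter is the subsequence length `n`, matching the abstract's point that discords need a single intuitive parameter.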

Intelligent Information Systems | 2012

Rotation-invariant similarity in time series using bag-of-patterns representation

Jessica Lin; Rohan Khade; Yuan Li

For more than a decade, time series similarity search has been given a great deal of attention by data mining researchers. As a result, many time series representations and distance measures have been proposed. However, most existing work on time series similarity search relies on shape-based similarity matching. While some of the existing approaches work well for short time series data, they typically fail to produce satisfactory results when the sequence is long. For long sequences, it is more appropriate to consider the similarity based on the higher-level structures. In this work, we present a histogram-based representation for time series data, similar to the “bag of words” approach that is widely accepted by the text mining and information retrieval communities. We performed extensive experiments and show that our approach outperforms the leading existing methods in clustering, classification, and anomaly detection on dozens of real datasets. We further demonstrate that the representation allows rotation-invariant matching in shape datasets.
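A histogram-based representation of this kind can be sketched by sliding a window over the series, turning each window into a short symbolic word, and counting the words. The sketch assumes SAX-style words (z-normalize, segment means, equiprobable normal breakpoints) and skips a word that merely repeats the previous window's word; both choices, along with the names `sax_word`, `bag_of_patterns`, and `bag_distance`, are assumptions made for illustration.

```python
import bisect
import math
from collections import Counter
from statistics import NormalDist, mean, pstdev


def sax_word(window, segments, alphabet_size):
    """Symbolic word for one window; len(window) must be divisible by segments."""
    mu, sigma = mean(window), pstdev(window)
    normed = [0.0 if sigma == 0 else (x - mu) / sigma for x in window]
    width = len(window) // segments
    cuts = [NormalDist().inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]
    letters = []
    for i in range(segments):
        segment_mean = mean(normed[i * width:(i + 1) * width])
        letters.append(chr(ord("a") + bisect.bisect_right(cuts, segment_mean)))
    return "".join(letters)


def bag_of_patterns(series, window, segments, alphabet_size):
    """Histogram of the words seen while sliding a window along the series."""
    bag = Counter()
    previous = None
    for start in range(len(series) - window + 1):
        word = sax_word(series[start:start + window], segments, alphabet_size)
        if word != previous:  # ignore runs of the same word
            bag[word] += 1
        previous = word
    return bag


def bag_distance(bag_a, bag_b):
    """Euclidean distance between two word histograms."""
    words = set(bag_a) | set(bag_b)
    return math.sqrt(sum((bag_a[w] - bag_b[w]) ** 2 for w in words))


if __name__ == "__main__":
    ramp = bag_of_patterns(list(range(1, 11)), window=8, segments=4, alphabet_size=4)
    zigzag = bag_of_patterns([0, 3] * 5, window=8, segments=4, alphabet_size=4)
    print(ramp, zigzag, bag_distance(ramp, zigzag))
```

Since the histogram records which local patterns occur and how often, not where they occur, two long series with the same patterns in a different order end up close together. That is the intuition behind comparing higher-level structure, and behind tolerance to circular shifts of a shape outline.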

Information Visualization | 2005

Visualizing and discovering non-trivial patterns in large time series databases

Jessica Lin; Eamonn J. Keogh; Stefano Lonardi

Data visualization techniques are very important for data analysis, since the human eye has been frequently advocated as the ultimate data-mining tool. However, there has been surprisingly little work on visualizing massive time series data sets. To this end, we developed VizTree, a time series pattern discovery and visualization system based on augmenting suffix trees. VizTree visually summarizes both the global and local structures of time series data at the same time. In addition, it provides novel interactive solutions to many pattern discovery problems, including the discovery of frequently occurring patterns (motif discovery), surprising patterns (anomaly detection), and query by content. VizTree works by transforming the time series into a symbolic representation, and encoding the data in a modified suffix tree in which the frequency and other properties of patterns are mapped onto colors and other visual properties. We demonstrate the utility of our system by comparing it with state-of-the-art batch algorithms on several real and synthetic data sets. Based on the tree structure, we further devise a coefficient which measures the dissimilarity between any two time series. This coefficient is shown to be competitive with the well-known Euclidean distance.
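The idea of storing symbolic patterns in a tree whose nodes carry frequencies can be sketched with a plain counting trie. This is a simplified stand-in, not VizTree's modified suffix tree: it assumes the time series has already been converted into fixed-length symbolic words, and the names `build_trie` and `frequency` are hypothetical. In a visual tool, each node's count is the quantity that would be mapped to branch thickness or colour.

```python
def build_trie(words):
    """Insert each word into a trie, counting how many words pass through each node."""
    root = {"count": 0, "children": {}}
    for word in words:
        node = root
        node["count"] += 1
        for symbol in word:
            node = node["children"].setdefault(symbol, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def frequency(root, prefix):
    """Number of inserted words that start with `prefix` (0 if none do)."""
    node = root
    for symbol in prefix:
        node = node["children"].get(symbol)
        if node is None:
            return 0
    return node["count"]


if __name__ == "__main__":
    trie = build_trie(["abc", "abd", "acd", "bbb"])
    for prefix in ("a", "ab", "abc", "c"):
        print(prefix, frequency(trie, prefix))
```

High-count branches correspond to frequently occurring patterns (motif candidates), and low-count branches to rare patterns (anomaly candidates).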

PLOS ONE | 2009

Genetic Diversity of Arginine Catabolic Mobile Element in Staphylococcus epidermidis

Maria Miragaia; Hermínia de Lencastre; Francoise Perdreau-Remington; Henry F. Chambers; Julie Higashi; Paul M. Sullam; Jessica Lin; Kester I. Wong; Katherine A. King; Michael Otto; George F. Sensabaugh; Binh An Diep

Background: The methicillin-resistant Staphylococcus aureus clone USA300 contains a novel mobile genetic element, arginine catabolic mobile element (ACME), that contributes to its enhanced capacity to grow and survive within the host. Although ACME appears to have been transferred into USA300 from S. epidermidis, the genetic diversity of ACME in the latter species remains poorly characterized. Methodology/Principal Findings: To assess the prevalence and genetic diversity of ACME, 127 geographically diverse S. epidermidis isolates representing 86 different multilocus sequence types (STs) were characterized. ACME was found in 51% (65/127) of S. epidermidis isolates. The vast majority (57/65) of ACME-containing isolates belonged to the predominant S. epidermidis clonal complex CC2. ACME was often found in association with different allotypes of staphylococcal chromosome cassette mec (SCCmec), which also encodes the recombinase function that facilitates mobilization of ACME from the S. epidermidis chromosome. Restriction fragment length polymorphism, PCR scanning and DNA sequencing allowed for identification of 39 distinct ACME genetic variants that differ from one another in gene content, thereby revealing a hitherto uncharacterized genetic diversity within ACME. All but one of the ACME variants were represented by a single S. epidermidis isolate; the singular variant, termed ACME-I.02, was found in 27 isolates, all of which belonged to the CC2 lineage. An evolutionary model constructed based on the eBURST algorithm revealed that ACME-I.02 was acquired on at least 15 different occasions by strains belonging to the CC2 lineage. Conclusions/Significance: ACME-I.02 in diverse S. epidermidis isolates was nearly identical in sequence to the prototypical ACME found in the USA300 MRSA clone, providing further evidence for the interspecies transfer of ACME from S. epidermidis into USA300.
