Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Weng-Keen Wong is active.

Publication


Featured researches published by Weng-Keen Wong.


Genome Research | 2010

Genome-wide mapping of alternative splicing in Arabidopsis thaliana

Sergei A. Filichkin; Henry D. Priest; Scott A. Givan; Rongkun Shen; Douglas W. Bryant; Samuel E. Fox; Weng-Keen Wong; Todd C. Mockler

Alternative splicing can enhance transcriptome plasticity and proteome diversity. In plants, alternative splicing can be manifested at different developmental stages, and is frequently associated with specific tissue types or environmental conditions such as abiotic stress. We mapped the Arabidopsis transcriptome at single-base resolution using the Illumina platform for ultrahigh-throughput RNA sequencing (RNA-seq). Deep transcriptome sequencing confirmed a majority of annotated introns and identified thousands of novel alternatively spliced mRNA isoforms. Our analysis suggests that at least approximately 42% of intron-containing genes in Arabidopsis are alternatively spliced; this is significantly higher than previous estimates based on cDNA/expressed sequence tag sequencing. Random validation confirmed that novel splice isoforms empirically predicted by RNA-seq can be detected in vivo. Novel introns detected by RNA-seq were substantially enriched in nonconsensus terminal dinucleotide splice signals. Alternative isoforms with premature termination codons (PTCs) comprised the majority of alternatively spliced transcripts. Using an example of an essential circadian clock gene, we show that intron retention can generate relatively abundant PTC(+) isoforms and that this specific event is highly conserved among diverse plant species. Alternatively spliced PTC(+) isoforms can be potentially targeted for degradation by the nonsense mediated mRNA decay (NMD) surveillance machinery or regulate the level of functional transcripts by the mechanism of regulated unproductive splicing and translation (RUST). We demonstrate that the relative ratios of the PTC(+) and reference isoforms for several key regulatory genes can be considerably shifted under abiotic stress treatments. Taken together, our results suggest that like in animals, NMD and RUST may be widespread in plants and may play important roles in regulating gene expression.


BMC Bioinformatics | 2009

QSRA – a quality-value guided de novo short read assembler

Douglas W. Bryant; Weng-Keen Wong; Todd C. Mockler

BackgroundNew rapid high-throughput sequencing technologies have sparked the creation of a new class of assembler. Since all high-throughput sequencing platforms incorporate errors in their output, short-read assemblers must be designed to account for this error while utilizing all available data.ResultsWe have designed and implemented an assembler, Quality-value guided Short Read Assembler, created to take advantage of quality-value scores as a further method of dealing with error. Compared to previous published algorithms, our assembler shows significant improvements not only in speed but also in output quality.ConclusionQSRA generally produced the highest genomic coverage, while being faster than VCAKE. QSRA is extremely competitive in its longest contig and N50/N80 contig lengths, producing results of similar quality to those of EDENA and VELVET. QSRA provides a step closer to the goal of de novo assembly of complex genomes, improving upon the original VCAKE algorithm by not only drastically reducing runtimes but also increasing the viability of the assembly algorithm through further error handling capabilities.


Medicine and Science in Sports and Exercise | 2012

Artificial Neural Networks to Predict Activity Type and Energy Expenditure in Youth.

Stewart G. Trost; Weng-Keen Wong; Karen A. Pfeiffer; Yonglei Zheng

UNLABELLED Previous studies have demonstrated that pattern recognition approaches to accelerometer data reduction are feasible and moderately accurate in classifying activity type in children. Whether pattern recognition techniques can be used to provide valid estimates of physical activity (PA) energy expenditure in youth remains unexplored in the research literature. PURPOSE The objective of this study is to develop and test artificial neural networks (ANNs) to predict PA type and energy expenditure (PAEE) from processed accelerometer data collected in children and adolescents. METHODS One hundred participants between the ages of 5 and 15 yr completed 12 activity trials that were categorized into five PA types: sedentary, walking, running, light-intensity household activities or games, and moderate-to-vigorous-intensity games or sports. During each trial, participants wore an ActiGraph GT1M on the right hip, and VO2 was measured using the Oxycon Mobile (Viasys Healthcare, Yorba Linda, CA) portable metabolic system. ANNs to predict PA type and PAEE (METs) were developed using the following features: 10th, 25th, 50th, 75th, and 90th percentiles and the lag one autocorrelation. To determine the highest time resolution achievable, we extracted features from 10-, 15-, 20-, 30-, and 60-s windows. Accuracy was assessed by calculating the percentage of windows correctly classified and root mean square error (RMSE). RESULTS As window size increased from 10 to 60 s, accuracy for the PA-type ANN increased from 81.3% to 88.4%. RMSE for the MET prediction ANN decreased from 1.1 METs to 0.9 METs. At any given window size, RMSE values for the MET prediction ANN were 30-40% lower than the conventional regression-based approaches. CONCLUSIONS ANNs can be used to predict both PA type and PAEE in children and adolescents using count data from a single waist mounted accelerometer.


intelligent user interfaces | 2009

Fixing the program my computer learned: barriers for end users, challenges for the machine

Todd Kulesza; Weng-Keen Wong; Simone Stumpf; Stephen Perona; Rachel White; Margaret M. Burnett; Ian Oberst; Andrew J. Ko

The results of a machine learning from user behavior can be thought of as a program, and like all programs, it may need to be debugged. Providing ways for the user to debug it matters, because without the ability to fix errors users may find that the learned programs errors are too damaging for them to be able to trust such programs. We present a new approach to enable end users to debug a learned program. We then use an early prototype of our new approach to conduct a formative study to determine where and when debugging issues arise, both in general and also separately for males and females. The results suggest opportunities to make machine-learned programs more effective tools.


Journal of Urban Health-bulletin of The New York Academy of Medicine | 2003

WSARE: What's Strange About Recent Events?

Weng-Keen Wong; Andrew W. Moore; Gregory F. Cooper; Michael M. Wagner

This article presents an, algorithm for performing early detection of disease outbreaks by searching a database of emergency department cases for anomalous patterns. Traditional techniques for anomaly detection are unsatisfactory for this problem because they identify individual data points that are rare due to particular combinations of features. Thus, these traditional algorithms discover isolated outliers of particularly strange events, such as someone accidentally shooting their ear, that are not indicative of a new outbreak. Instead, we would like to detect groups with specific characteristics that have a recent pattern of illness that is anomalous relative to historical patterns. We propose using an anomaly detection algorithm that would characterize each anomalous pattern with a rule. The significance of each rule would be carefully evaluated using the Fisher exact test and a randomization test. In this study, we compared our algorithm with a standard detection algorithm by measuring the number of false positives and the timeliness of detection. Simulated data, produced by a simulator that creates the effects of an epidemic on a city, were used for evaluation. The results indicate that our algorithm has significantly better detection times for common significance thresholds while having a slightly higher false positive rate.


Physiological Measurement | 2014

Machine learning for activity recognition: hip versus wrist data

Stewart G. Trost; Yonglei Zheng; Weng-Keen Wong

PROBLEM ADDRESSED Wrist-worn accelerometers are associated with greater compliance. However, validated algorithms for predicting activity type from wrist-worn accelerometer data are lacking. This study compared the activity recognition rates of an activity classifier trained on acceleration signal collected on the wrist and hip. METHODOLOGY 52 children and adolescents (mean age 13.7  ±  3.1 year) completed 12 activity trials that were categorized into 7 activity classes: lying down, sitting, standing, walking, running, basketball, and dancing. During each trial, participants wore an ActiGraph GT3X+ tri-axial accelerometer on the right hip and the non-dominant wrist. Features were extracted from 10-s windows and inputted into a regularized logistic regression model using R (Glmnet + L1). RESULTS Classification accuracy for the hip and wrist was 91.0% ± 3.1% and 88.4% ± 3.0%, respectively. The hip model exhibited excellent classification accuracy for sitting (91.3%), standing (95.8%), walking (95.8%), and running (96.8%); acceptable classification accuracy for lying down (88.3%) and basketball (81.9%); and modest accuracy for dance (64.1%). The wrist model exhibited excellent classification accuracy for sitting (93.0%), standing (91.7%), and walking (95.8%); acceptable classification accuracy for basketball (86.0%); and modest accuracy for running (78.8%), lying down (74.6%) and dance (69.4%).Potential Impact: Both the hip and wrist algorithms achieved acceptable classification accuracy, allowing researchers to use either placement for activity recognition.


Bioinformatics | 2010

Supersplat—spliced RNA-seq alignment

Douglas W. Bryant; Rongkun Shen; Henry D. Priest; Weng-Keen Wong; Todd C. Mockler

Motivation: High-throughput sequencing technologies have recently made deep interrogation of expressed transcript sequences practical, both economically and temporally. Identification of intron/exon boundaries is an essential part of genome annotation, yet remains a challenge. Here, we present supersplat, a method for unbiased splice-junction discovery through empirical RNA-seq data. Results: Using a genomic reference and RNA-seq high-throughput sequencing datasets, supersplat empirically identifies potential splice junctions at a rate of ∼11.4 million reads per hour. We further benchmark the performance of the algorithm by mapping Illumina RNA-seq reads to identify introns in the genome of the reference dicot plant Arabidopsis thaliana and we demonstrate the utility of supersplat for de novo empirical annotation of splice junctions using the reference monocot plant Brachypodium distachyon. Availability: Implemented in C++, supersplat source code and binaries are freely available on the web at http://mocklerlab-tools.cgrb.oregonstate.edu/ Contact: [email protected]


Ksii Transactions on Internet and Information Systems | 2011

Why-oriented end-user debugging of naive Bayes text classification

Todd Kulesza; Simone Stumpf; Weng-Keen Wong; Margaret M. Burnett; Stephen Perona; Andrew J. Ko; Ian Oberst

Machine learning techniques are increasingly used in intelligent assistants, that is, software targeted at and continuously adapting to assist end users with email, shopping, and other tasks. Examples include desktop SPAM filters, recommender systems, and handwriting recognition. Fixing such intelligent assistants when they learn incorrect behavior, however, has received only limited attention. To directly support end-user “debugging” of assistant behaviors learned via statistical machine learning, we present a Why-oriented approach which allows users to ask questions about how the assistant made its predictions, provides answers to these “why” questions, and allows users to interactively change these answers to debug the assistants current and future predictions. To understand the strengths and weaknesses of this approach, we then conducted an exploratory study to investigate barriers that participants could encounter when debugging an intelligent assistant using our approach, and the information those participants requested to overcome these barriers. To help ensure the inclusiveness of our approach, we also explored how gender differences played a role in understanding barriers and information needs. We then used these results to consider opportunities for Why-oriented approaches to address user barriers and information needs.


knowledge discovery and data mining | 2013

Systematic construction of anomaly detection benchmarks from real data

Andrew Emmott; Shubhomoy Das; Thomas G. Dietterich; Alan Fern; Weng-Keen Wong

Research in anomaly detection suffers from a lack of realistic and publicly-available problem sets. This paper discusses what properties such problem sets should possess. It then introduces a methodology for transforming existing classification data sets into ground-truthed benchmark data sets for anomaly detection. The methodology produces data sets that vary along three important dimensions: (a) point difficulty, (b) relative frequency of anomalies, and (c) clusteredness. We apply our generated datasets to benchmark several popular anomaly detection algorithms under a range of different conditions.


international conference on data mining | 2010

Modeling Experts and Novices in Citizen Science Data for Species Distribution Modeling

Jun Yu; Weng-Keen Wong; Rebecca A. Hutchinson

Citizen scientists, who are volunteers from the community that participate as field assistants in scientific studies [3], enable research to be performed at much larger spatial and temporal scales than trained scientists can cover. Species distribution modeling [6], which involves understanding species-habitat relationships, is a research area that can benefit greatly from citizen science. The eBird project [16] is one of the largest citizen science programs in existence. By allowing birders to upload observations of bird species to an online database, eBird can provide useful data for species distribution modeling. However, since birders vary in their levels of expertise, the quality of data submitted to eBird is often questioned. In this paper, we develop a probabilistic model called the Occupancy-Detection-Expertise (ODE) model that incorporates the expertise of birders submitting data to eBird. We show that modeling the expertise of birders can improve the accuracy of predicting observations of a bird species at a site. In addition, we can use the ODE model for two other tasks: predicting birder expertise given their history of eBird checklists and identifying bird species that are difficult for novices to detect.

Collaboration


Dive into the Weng-Keen Wong's collaboration.

Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Jun Yu

Oregon State University

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Todd Kulesza

Oregon State University

View shared research outputs
Top Co-Authors

Avatar

Andrew W. Moore

Carnegie Mellon University

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Ian Oberst

Oregon State University

View shared research outputs
Researchain Logo
Decentralizing Knowledge