Publication


Featured research published by Byron C. Wallace.


International Joint Conference on Natural Language Processing | 2015

Sparse, Contextually Informed Models for Irony Detection: Exploiting User Communities, Entities and Sentiment

Byron C. Wallace; Do Kook Choe; Eugene Charniak

Automatically detecting verbal irony (roughly, sarcasm) in online content is important for many practical applications (e.g., sentiment detection), but it is difficult. Previous approaches have relied predominantly on signal gleaned from word counts and grammatical cues. But such approaches fail to exploit the context in which comments are embedded. We thus propose a novel strategy for verbal irony classification that exploits contextual features, specifically by combining noun phrases and sentiment extracted from comments with the forum type (e.g., conservative or liberal) to which they were posted. We show that this approach improves verbal irony classification performance. Furthermore, because this method generates a very large feature space (and we expect predictive contextual features to be strong but few), we propose a mixed regularization strategy that places a sparsity-inducing ℓ1 penalty on the contextual feature weights on top of the ℓ2 penalty applied to all model coefficients. This increases model sparsity and reduces the variance of model performance.
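
The mixed penalty is straightforward to write down. Below is a minimal sketch of an objective of this form in Python: an ℓ2 penalty on all coefficients plus an ℓ1 penalty on the contextual-feature weights only. The function and variable names are illustrative assumptions, not code from the paper.

```python
import numpy as np

def mixed_penalty_loss(w, X, y, ctx_idx, lam2=1.0, lam1=0.1):
    """Logistic loss with an l2 penalty on all weights plus a
    sparsity-inducing l1 penalty on the contextual features only.

    w: weight vector; X: feature matrix; y: labels in {0, 1};
    ctx_idx: indices of the contextual features within w.
    """
    z = X @ w
    s = 2 * y - 1                                  # map labels to {-1, +1}
    log_loss = np.mean(np.log1p(np.exp(-s * z)))   # logistic loss
    l2 = lam2 * np.sum(w ** 2)                     # applied to all coefficients
    l1 = lam1 * np.sum(np.abs(w[ctx_idx]))         # contextual weights only
    return log_loss + l2 + l1
```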


Journal of the American Medical Informatics Association | 2017

Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach

Byron C. Wallace; Anna Noel-Storr; Iain James Marshall; Aaron M. Cohen; Neil R. Smalheiser; James Thomas

Objectives: Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach using both crowdsourcing and ML. Methods: We trained a classifier to discriminate between citations that describe RCTs and those that do not. We then adopted a simple strategy of automatically excluding citations deemed very unlikely to be RCTs by the classifier and deferring to crowdworkers otherwise. Results: Combining ML and crowdsourcing provides a highly sensitive RCT identification strategy (our estimates suggest 95%–99% recall) with substantially less effort (we observed a reduction of around 60%–80%) than relying on manual screening alone. Conclusions: Hybrid crowd-ML strategies warrant further exploration for biomedical curation/annotation tasks.
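
The deferral strategy amounts to a single probability threshold. A minimal sketch, assuming a scikit-learn-style classifier and an illustrative threshold (the paper does not publish this exact code):

```python
def triage(citations, clf, exclude_below=0.05):
    """Auto-exclude citations the classifier deems very unlikely to be
    RCTs; defer everything else to crowdworkers. `citations` is assumed
    to hold objects with a `features` vector; the threshold is illustrative.
    """
    auto_excluded, for_crowd = [], []
    for c in citations:
        p_rct = clf.predict_proba([c.features])[0, 1]  # P(citation is an RCT)
        if p_rct < exclude_below:
            auto_excluded.append(c)
        else:
            for_crowd.append(c)
    return auto_excluded, for_crowd
```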


Methods in Ecology and Evolution | 2017

OpenMEE: Intuitive, open‐source software for meta‐analysis in ecology and evolutionary biology

Byron C. Wallace; Marc J. Lajeunesse; George Dietz; Issa J. Dahabreh; Thomas A Trikalinos; Christopher H. Schmid; Jessica Gurevitch

Meta-analysis and meta-regression are statistical methods for synthesizing and modelling the results of different studies, and are critical research synthesis tools in ecology and evolutionary biology (E&E). However, many E&E researchers carry out meta-analyses using software that is limited in its statistical functionality and is not easily updatable. It is likely that these software limitations have slowed the uptake of new methods in E&E and limited the scope and quality of inferences from research syntheses. We developed OpenMEE: Open Meta-analyst for Ecology and Evolution to address the need for advanced, easy-to-use software for meta-analysis and meta-regression. OpenMEE has a cross-platform, easy-to-use graphical user interface (GUI) that gives E&E researchers access to the diverse and advanced statistical functionalities offered in R, without requiring knowledge of R programming. OpenMEE offers a suite of advanced meta-analysis and meta-regression methods for synthesizing continuous and categorical data, including meta-regression with multiple covariates and their interactions, phylogenetic analyses, and simple missing data imputation. OpenMEE also supports data importing and exporting, exploratory data analysis, graphing of data, and summary table generation. As intuitive, open-source, free software for advanced methods in meta-analysis, OpenMEE meets the current and pressing needs of the E&E community for teaching meta-analysis and conducting high-quality syntheses. Because OpenMEE's statistical components are written in R, new methods and packages can be rapidly incorporated into the software. To fully realize the potential of OpenMEE, we encourage community development with an aim to advance the capabilities of meta-analyses in E&E.
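
OpenMEE delegates its statistics to R, so the following is not OpenMEE code; it is only a rough Python illustration of the kind of computation at the core of such tools, a standard DerSimonian-Laird random-effects pooled estimate.

```python
import numpy as np

def random_effects_pool(y, v):
    """DerSimonian-Laird random-effects meta-analysis.

    y: per-study effect sizes; v: their within-study variances.
    Returns the pooled estimate, its standard error, and tau^2.
    """
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1.0 / v                                    # fixed-effect weights
    y_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fixed) ** 2)             # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)        # between-study variance
    w_star = 1.0 / (v + tau2)                      # random-effects weights
    mu = np.sum(w_star * y) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return mu, se, tau2
```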


North American Chapter of the Association for Computational Linguistics | 2016

MGNC-CNN: A Simple Approach to Exploiting Multiple Word Embeddings for Sentence Classification

Ye Zhang; Stephen Roller; Byron C. Wallace

We introduce a novel, simple convolutional neural network (CNN) architecture, multi-group norm constraint CNN (MGNC-CNN), that capitalizes on multiple sets of word embeddings for sentence classification. MGNC-CNN extracts features from each input embedding set independently and then joins these at the penultimate layer in the network to form a final feature vector. We then adopt a group regularization strategy that differentially penalizes weights associated with the subcomponents generated from the respective embedding sets. This model is much simpler than comparable alternative architectures and requires substantially less training time. Furthermore, it is flexible in that it does not require input word embeddings to be of the same dimensionality. We show that MGNC-CNN consistently outperforms baseline models.
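
The architecture is simple enough to sketch. Below is a minimal PyTorch rendering of the idea, one convolutional extractor per embedding set joined at the penultimate layer; hyperparameters are illustrative, and the paper's group regularization would be added to the training loss rather than shown here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MGNCCNN(nn.Module):
    def __init__(self, embed_dims, n_filters=100, kernel_size=3, n_classes=2):
        super().__init__()
        # One Conv1d per embedding set; the sets may have different dims.
        self.convs = nn.ModuleList(
            nn.Conv1d(d, n_filters, kernel_size) for d in embed_dims
        )
        self.out = nn.Linear(n_filters * len(embed_dims), n_classes)

    def forward(self, embedded_inputs):
        # embedded_inputs: one (batch, seq_len, dim_i) tensor per
        # embedding set, already looked up from each embedding table.
        feats = []
        for x, conv in zip(embedded_inputs, self.convs):
            h = F.relu(conv(x.transpose(1, 2)))   # (batch, filters, seq')
            feats.append(h.max(dim=2).values)     # max-over-time pooling
        return self.out(torch.cat(feats, dim=1))  # join at penultimate layer
```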


Conference on Computational Natural Language Learning | 2016

Modelling Context with User Embeddings for Sarcasm Detection in Social Media

Silvio Amir; Byron C. Wallace; Hao Lyu; Paula Carvalho; Mário J. Silva

We introduce a deep neural network for automated sarcasm detection. Recent work has emphasized the need for models to capitalize on contextual features, beyond lexical and syntactic cues present in utterances. For example, different speakers will tend to employ sarcasm regarding different subjects and, thus, sarcasm detection models ought to encode such speaker information. Current methods have achieved this by way of laborious feature engineering. By contrast, we propose to automatically learn and then exploit user embeddings, to be used in concert with lexical signals to recognize sarcasm. Our approach does not require elaborate feature engineering (and concomitant data scraping); fitting user embeddings requires only the text from their previous posts. The experimental results show that our model outperforms a state-of-the-art approach leveraging an extensive set of carefully crafted features.
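
A rough sketch of the fusion step, assuming concatenation of a learned per-user embedding with utterance features (dimensions and names are illustrative, not the authors' exact architecture):

```python
import torch
import torch.nn as nn

class UserContextSarcasm(nn.Module):
    def __init__(self, n_users, user_dim=64, text_dim=128, n_classes=2):
        super().__init__()
        # User embeddings are fit from each user's previous posts.
        self.user_emb = nn.Embedding(n_users, user_dim)
        self.out = nn.Linear(user_dim + text_dim, n_classes)

    def forward(self, user_ids, text_feats):
        u = self.user_emb(user_ids)   # speaker context
        return self.out(torch.cat([u, text_feats], dim=1))
```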


Empirical Methods in Natural Language Processing | 2016

Rationale-Augmented Convolutional Neural Networks for Text Classification

Ye Zhang; Iain James Marshall; Byron C. Wallace

We present a new Convolutional Neural Network (CNN) model for text classification that jointly exploits labels on documents and their constituent sentences. Specifically, we consider scenarios in which annotators explicitly mark sentences (or snippets) that support their overall document categorization, i.e., they provide rationales. Our model exploits such supervision via a hierarchical approach in which each document is represented by a linear combination of the vector representations of its component sentences. We propose a sentence-level convolutional model that estimates the probability that a given sentence is a rationale, and we then scale the contribution of each sentence to the aggregate document representation in proportion to these estimates. Experiments on five classification datasets that have document labels and associated rationales demonstrate that our approach consistently outperforms strong baselines. Moreover, our model naturally provides explanations for its predictions.
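
The document-representation step can be sketched directly: a sentence-level score gates each sentence's contribution to the document vector. Shapes and names below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RationaleWeightedDoc(nn.Module):
    def __init__(self, sent_dim, n_classes=2):
        super().__init__()
        self.rationale_score = nn.Linear(sent_dim, 1)  # is this sentence a rationale?
        self.doc_clf = nn.Linear(sent_dim, n_classes)

    def forward(self, sent_vecs):
        # sent_vecs: (n_sentences, sent_dim), e.g. from a sentence-level CNN
        p = torch.sigmoid(self.rationale_score(sent_vecs))   # (n_sentences, 1)
        doc_vec = (p * sent_vecs).sum(dim=0)   # rationale-weighted combination
        return self.doc_clf(doc_vec)
```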


European Conference on Machine Learning | 2014

Spá: a web-based viewer for text mining in evidence based medicine

Joël Kuiper; Iain James Marshall; Byron C. Wallace; Morris A. Swertz

Summarizing the evidence about medical interventions is an immense undertaking, in part because unstructured Portable Document Format (PDF) documents remain the main vehicle for disseminating scientific findings. Clinicians and researchers must therefore manually extract and synthesise information from these PDFs. We introduce Spá, a web-based viewer that enables automated annotation and summarisation of PDFs via machine learning. To illustrate its functionality, we use Spá to semi-automate the assessment of bias in clinical trials. Spá has a modular architecture; the tool may therefore be widely useful in other domains with a PDF-based literature, including law, physics, and biology.


Systematic Reviews | 2016

Evaluating Data Abstraction Assistant, a novel software application for data abstraction during systematic reviews: protocol for a randomized controlled trial

Ian J Saldanha; Christopher H. Schmid; Joseph Lau; Kay Dickersin; Jesse A. Berlin; Jens Jap; Bryant T Smith; Simona Carini; Wiley Chan; Berry de Bruijn; Byron C. Wallace; Susan Hutfless; Ida Sim; M. Hassan Murad; Sandra A. Walsh; Elizabeth J. Whamond; Tianjing Li

Background: Data abstraction, a critical systematic review step, is time-consuming and prone to errors. Current standards for approaches to data abstraction rest on a weak evidence base. We developed the Data Abstraction Assistant (DAA), a novel software application designed to facilitate the abstraction process by allowing users to (1) view study article PDFs juxtaposed to electronic data abstraction forms linked to a data abstraction system, (2) highlight (or “pin”) the location of the text in the PDF, and (3) copy relevant text from the PDF into the form. We describe the design of a randomized controlled trial (RCT) that compares the relative effectiveness of (A) DAA-facilitated single abstraction plus verification by a second person, (B) traditional (non-DAA-facilitated) single abstraction plus verification by a second person, and (C) traditional independent dual abstraction plus adjudication to ascertain the accuracy and efficiency of abstraction. Methods: This is an online, randomized, three-arm, crossover trial. We will enroll 24 pairs of abstractors (i.e., sample size is 48 participants), each pair comprising one less and one more experienced abstractor. Pairs will be randomized to abstract data from six articles, two under each of the three approaches. Abstractors will complete pre-tested data abstraction forms using the Systematic Review Data Repository (SRDR), an online data abstraction system. The primary outcomes are (1) proportion of data items abstracted that constitute an error (compared with an answer key) and (2) total time taken to complete abstraction (by two abstractors in the pair, including verification and/or adjudication). Discussion: The DAA trial uses a practical design to test a novel software application as a tool to help improve the accuracy and efficiency of the data abstraction process during systematic reviews. Findings from the DAA trial will provide much-needed evidence to strengthen current recommendations for data abstraction approaches. Trial registration: The trial is registered at the National Information Center on Health Services Research and Health Care Technology (NICHSR) under Registration # HSRP20152269: https://wwwcf.nlm.nih.gov/hsr_project/view_hsrproj_record.cfm?NLMUNIQUE_ID=20152269&SEARCH_FOR=Tianjing%20Li. All items from the World Health Organization Trial Registration Data Set are covered at various locations in this protocol. Protocol version and date: This is version 2.0 of the protocol, dated September 6, 2016. As needed, we will communicate any protocol amendments to the Institutional Review Boards (IRBs) of Johns Hopkins Bloomberg School of Public Health (JHBSPH) and Brown University. We also will make appropriate as-needed modifications to the NICHSR website in a timely fashion.


Research Synthesis Methods | 2018

Machine Learning for Identifying Randomized Controlled Trials: an evaluation and practitioner’s guide

Iain James Marshall; Anna Noel-Storr; Joël Kuiper; James Thomas; Byron C. Wallace

Machine learning (ML) algorithms have proven highly accurate for identifying randomized controlled trials (RCTs) but are not widely used in practice, in part because the best way to make use of the technology in a typical workflow is unclear. In this work, we evaluate ML models for RCT classification (support vector machines, convolutional neural networks, and ensemble approaches). We trained and optimized support vector machine and convolutional neural network models on the titles and abstracts of the Cochrane Crowd RCT set. We evaluated the models on an external dataset (Clinical Hedges), allowing direct comparison with traditional database search filters. We estimated the area under the receiver operating characteristic curve (AUROC) using the Clinical Hedges dataset. We demonstrate that ML approaches discriminate between RCTs and non-RCTs better than widely used traditional database search filters at all sensitivity levels; our best-performing model also achieved the best results to date for ML in this task (AUROC 0.987, 95% CI 0.984-0.989). We provide practical guidance on the role of ML in (1) systematic reviews (high-sensitivity strategies) and (2) rapid reviews and clinical question answering (high-precision strategies), together with recommended probability cutoffs for each use case. Finally, we provide open-source software to enable these approaches to be used in practice.
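
In practice the guidance reduces to choosing a probability cutoff per use case. The sketch below uses made-up placeholder cutoffs, not the values recommended in the paper:

```python
SENSITIVE_CUTOFF = 0.01  # systematic reviews: miss as few RCTs as possible
PRECISE_CUTOFF = 0.80    # rapid reviews / clinical Q&A: favor precision

def classify_rct(p_rct, mode="sensitive"):
    """p_rct: model-estimated probability that a citation reports an RCT."""
    cutoff = SENSITIVE_CUTOFF if mode == "sensitive" else PRECISE_CUTOFF
    return p_rct >= cutoff
```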


Research Synthesis Methods | 2017

An exploration of crowdsourcing citation screening for systematic reviews

Michael L. Mortensen; Gaelen P Adam; Thomas A Trikalinos; Tim Kraska; Byron C. Wallace

Systematic reviews are increasingly used to inform health care decisions, but are expensive to produce. We explore the use of crowdsourcing (distributing tasks to untrained workers via the web) to reduce the cost of screening citations. We used Amazon Mechanical Turk as our platform and 4 previously conducted systematic reviews as examples. For each citation, workers answered 4 or 5 questions that were equivalent to the eligibility criteria. We aggregated responses from multiple workers into an overall decision to include or exclude the citation using 1 of 9 algorithms and compared the performance of these algorithms to the corresponding decisions of trained experts. The most inclusive algorithm (designating a citation as relevant if any worker did) identified 95% to 99% of the citations that were ultimately included in the reviews while excluding 68% to 82% of irrelevant citations. Other algorithms increased the fraction of irrelevant articles excluded at some cost to the inclusion of relevant studies. Crowdworkers completed screening in 4 to 17 days, costing
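
The most inclusive of the aggregation rules is trivial to state in code; a minimal sketch, with data shapes assumed for illustration:

```python
def aggregate_any(worker_votes):
    """worker_votes: dict mapping citation_id -> list of include/exclude
    votes (True = include). A citation is kept if any worker included it.
    """
    return {cid: any(votes) for cid, votes in worker_votes.items()}
```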

Collaboration


Dive into Byron C. Wallace's collaborations. Top co-authors include:

Joël Kuiper (University Medical Center Groningen)
James Thomas (University College London)
Ye Zhang (University of Texas at Austin)