Burr Settles
Carnegie Mellon University
Publication
Featured research published by Burr Settles.
Empirical Methods in Natural Language Processing | 2008
Burr Settles; Mark Craven
Active learning is well-suited to many problems in natural language processing, where unlabeled data may be abundant but annotation is slow and expensive. This paper aims to shed light on the best active learning approaches for sequence labeling tasks such as information extraction and document segmentation. We survey previously used query selection strategies for sequence models, and propose several novel algorithms to address their shortcomings. We also conduct a large-scale empirical comparison using multiple corpora, which demonstrates that our proposed methods advance the state of the art.
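To make "query selection strategy" concrete, here is a minimal sketch of one generic uncertainty-sampling heuristic for sequence models: rank unlabeled sequences by their mean per-token label entropy and send the most uncertain ones to an annotator. This is an illustration under stated assumptions, not the paper's proposed algorithms. It assumes a trained sequence model has already produced per-token marginal label distributions for each unlabeled sequence. The names `token_entropy`, `select_queries`, and the toy `pool` are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical input format: each unlabeled sequence is given as its
# per-token marginal label distributions P(y_t = m | x), one list of
# label probabilities per token. A real system would obtain these from
# a trained sequence model (e.g. forward-backward on an HMM or CRF).
Marginals = Sequence[Sequence[float]]


def token_entropy(marginals: Marginals) -> float:
    """Mean per-token entropy of the label marginals (higher = more uncertain)."""
    if not marginals:
        return 0.0
    total = 0.0
    for dist in marginals:
        # Skip zero probabilities: 0 * log 0 is treated as 0.
        total -= sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(marginals)


def select_queries(pool: List[Marginals], k: int) -> List[int]:
    """Return indices of the k pool sequences with the highest mean token entropy."""
    ranked = sorted(range(len(pool)), key=lambda i: token_entropy(pool[i]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    pool = [
        [[0.98, 0.01, 0.01], [0.95, 0.03, 0.02]],  # confident sequence
        [[0.34, 0.33, 0.33], [0.50, 0.25, 0.25]],  # uncertain sequence
        [[0.70, 0.20, 0.10]],                      # moderately uncertain sequence
    ]
    print(select_queries(pool, k=2))  # [1, 2]
```

Averaging over tokens keeps long sequences from being favored merely because they contain more tokens; summing instead would bias the selection toward longer sequences.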
JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications | 2004
Burr Settles
As the wealth of biomedical knowledge in the form of literature increases, there is a rising need for effective natural language processing tools to assist in organizing, curating, and retrieving this information. To that end, named entity recognition (the task of identifying words and phrases in free text that belong to certain classes of interest) is an important first step for many of these larger information management goals.
ACS Chemical Biology | 2011
Brian C. Smith; Burr Settles; William C. Hallows; Mark Craven; John M. Denu
Accumulating evidence suggests that reversible protein acetylation may be a major regulatory mechanism that rivals phosphorylation. With the recent cataloging of thousands of acetylation sites on hundreds of proteins comes the challenge of identifying the acetyltransferases and deacetylases that regulate acetylation levels. Sirtuins are a conserved family of NAD+-dependent protein deacetylases that are implicated in genome maintenance, metabolism, cell survival, and lifespan. SIRT3 is the dominant protein deacetylase in mitochondria, and emerging evidence suggests that SIRT3 may control major pathways by deacetylation of central metabolic enzymes. Here, to identify potential SIRT3 substrates, we have developed an unbiased screening strategy that involves a novel acetyl-lysine analogue (thiotrifluoroacetyl-lysine), SPOT peptide libraries, machine learning, and kinetic validation. SPOT peptide libraries based on known and potential mitochondrial acetyl-lysine sites were screened for SIRT3 binding and then analyzed using machine learning to establish binding trends. These trends were then applied to the mitochondrial proteome as a whole to predict the binding affinity of all lysine sites within human mitochondria. Machine learning prediction of SIRT3 binding correlated with steady-state kinetic kcat/Km values for 24 acetyl-lysine peptides that possessed a broad range of predicted binding. Thus, SPOT peptide-binding screens and machine learning prediction provide an accurate and efficient method to evaluate sirtuin substrate specificity from a relatively small learning set. These analyses suggest potential SIRT3 substrates involved in several metabolic pathways, such as the urea cycle, ATP synthesis, and fatty acid oxidation.
European Conference on Machine Learning | 2010
Edith Law; Burr Settles; Tom M. Mitchell
Most approaches to classifying media content assume a fixed, closed vocabulary of labels. In contrast, we advocate machine learning approaches that take advantage of the millions of free-form tags obtainable via online crowd-sourcing platforms and social tagging websites. The use of such open vocabularies presents learning challenges due to typographical errors, synonymy, and a potentially unbounded set of tag labels. In this work, we present a new approach that organizes these noisy tags into well-behaved semantic classes using topic modeling and learns to predict tags accurately using a mixture of topic classes. This method can utilize an arbitrary open vocabulary of tags, reduces training time by 94% compared to learning from these tags directly, and achieves comparable performance for classification and superior performance for retrieval. We also demonstrate that on open-vocabulary tasks, human evaluations are essential for measuring the true performance of tag classifiers, which traditional evaluation methods will consistently underestimate. We focus on the domain of tagging music clips and demonstrate our results using data collected with a human computation game called TagATune.
Human Factors in Computing Systems | 2013
Burr Settles; Steven P. Dow
In online creative communities, members work together to produce music, movies, games, and other cultural products. Despite the proliferation of collaboration in these communities, we know little about how these teams form and what leads to their ultimate success. Building on theories of social identity and exchange, we present an exploratory study of an online songwriting community. We analyze four years of longitudinal behavioral data using a novel path-based regression model that accurately predicts collaboration formation and reveals its key variables. Combined with a large-scale survey of members, we find that communication, nuanced complementary interest and status, and a balanced effort from both parties contribute to successful collaborations. We also discuss several applications of these findings for socio-technical infrastructures that support online creative production.
ACM Crossroads Student Magazine | 2013
Steven P. Dow; Burr Settles
A study of the online music writing community FAWM.ORG reveals that people who collaborate share less in common than you might think.
National Conference on Artificial Intelligence | 2010
Andrew Carlson; Justin Betteridge; Bryan Kisiel; Burr Settles; Estevam R. Hruschka; Tom M. Mitchell
Bioinformatics | 2005
Burr Settles
Neural Information Processing Systems | 2007
Burr Settles; Mark Craven; Soumya Ray
Archive | 2008
Burr Settles; Mark Craven; Lewis Friedland