Publication


Featured research published by Joel D. Martin.


BMC Bioinformatics | 2003

PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine

Ian M. Donaldson; Joel D. Martin; Berry de Bruijn; Cheryl Wolting; Vicki Lay; Brigitte Tuekam; Shudong Zhang; Berivan Baskin; Gary D. Bader; Katerina Michalickova; Tony Pawson; Christopher W. V. Hogue

Background: The majority of experimentally verified molecular interaction and biological pathway data are present in the unstructured text of biomedical journal articles where they are inaccessible to computational methods. The Biomolecular interaction network database (BIND) seeks to capture these data in a machine-readable format. We hypothesized that the formidable task-size of backfilling the database could be reduced by using Support Vector Machine technology to first locate interaction information in the literature. We present an information extraction system that was designed to locate protein-protein interaction data in the literature and present these data to curators and the public for review and entry into BIND. Results: Cross-validation estimated the support vector machine's test-set precision, accuracy and recall for classifying abstracts describing interaction information at 92%, 90% and 92%, respectively. We estimated that the system would be able to recall up to 60% of all non-high throughput interactions present in another yeast-protein interaction database. Finally, this system was applied to a real-world curation problem and its use was found to reduce the task duration by 70%, thus saving 176 days. Conclusions: Machine learning methods are useful as tools to direct interaction and pathway database back-filling; however, this potential can only be realized if these techniques are coupled with human review and entry into a factual database such as BIND. The PreBIND system described here is available to the public at http://bind.ca. Current capabilities allow searching for human, mouse and yeast protein-interaction information.


International Journal of Human-computer Studies / International Journal of Man-machine Studies | 1995

Student assessment using Bayesian nets

Joel D. Martin; Kurt VanLehn

We describe OLAE as an assessment tool that collects data from students solving problems in introductory college physics, analyses that data with probabilistic methods that determine what knowledge the student is using, and flexibly presents the results of analysis. For each problem, OLAE automatically creates a Bayesian net that relates knowledge, represented as first-order rules, to particular actions, such as written equations. Using the resulting Bayesian network, OLAE observes a student's behavior and computes the probabilities that the student knows and uses each of the rules.
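
The inference step described in this abstract can be illustrated with a minimal sketch. It is not OLAE's actual network: it assumes a single rule node with a single action node as its child, and the function name and all probabilities are hypothetical values chosen for illustration.

```python
def posterior_rule_known(prior, p_action_if_known, p_action_if_unknown, action_observed):
    """Posterior P(rule known | observation) for a two-node net: Rule -> Action.

    All parameters are hypothetical; a real assessment net would link many
    rules to many actions.
    """
    if action_observed:
        like_known, like_unknown = p_action_if_known, p_action_if_unknown
    else:
        like_known, like_unknown = 1 - p_action_if_known, 1 - p_action_if_unknown
    numerator = like_known * prior
    evidence = numerator + like_unknown * (1 - prior)
    return numerator / evidence


# Hypothetical numbers: the student writes the equation that the rule produces.
p_after_seeing = posterior_rule_known(0.5, 0.9, 0.1, action_observed=True)
# Same prior, but the expected equation never appears.
p_after_missing = posterior_rule_known(0.5, 0.9, 0.1, action_observed=False)
```

Observing the action raises the probability that the rule is known, and its absence lowers it; a full network would propagate such updates across many rules at once.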


Journal of the American Medical Informatics Association | 2011

Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010.

Berry de Bruijn; Colin Cherry; Svetlana Kiritchenko; Joel D. Martin; Xiaodan Zhu

Objective As clinical text mining continues to mature, its potential as an enabling technology for innovations in patient care and clinical research is becoming a reality. A critical part of that process is rigid benchmark testing of natural language processing methods on realistic clinical narrative. In this paper, the authors describe the design and performance of three state-of-the-art text-mining applications from the National Research Council of Canada on evaluations within the 2010 i2b2 challenge. Design The three systems perform three key steps in clinical information extraction: (1) extraction of medical problems, tests, and treatments, from discharge summaries and progress notes; (2) classification of assertions made on the medical problems; (3) classification of relations between medical concepts. Machine learning systems performed these tasks using large-dimensional bags of features, as derived from both the text itself and from external sources: UMLS, cTAKES, and Medline. Measurements Performance was measured per subtask, using micro-averaged F-scores, as calculated by comparing system annotations with ground-truth annotations on a test set. Results The systems ranked high among all submitted systems in the competition, with the following F-scores: concept extraction 0.8523 (ranked first); assertion detection 0.9362 (ranked first); relationship detection 0.7313 (ranked second). Conclusion For all tasks, we found that the introduction of a wide range of features was crucial to success. Importantly, our choice of machine learning algorithms allowed us to be versatile in our feature design, and to introduce a large number of features without overfitting and without encountering computing-resource bottlenecks.
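
As a rough illustration of the micro-averaged F-score mentioned under Measurements, the sketch below pools true positives, false positives, and false negatives across classes before computing precision and recall. The function name, class labels, and counts are hypothetical and are not taken from the i2b2 evaluation.

```python
def micro_f1(counts):
    """Micro-averaged F1 from a mapping: class -> (true_pos, false_pos, false_neg)."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-class counts for three concept types.
example_counts = {
    "problem": (80, 10, 20),
    "test": (40, 10, 10),
    "treatment": (30, 5, 15),
}
score = micro_f1(example_counts)
```

Because the counts are pooled first, frequent classes weigh more heavily in the result than rare ones, which is the main difference from a macro average.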


Glia | 2007

Molecular Markers of Extracellular Matrix Remodeling in Glioblastoma Vessels: Microarray Study of Laser-Captured Glioblastoma Vessels

Ally Pen; Maria Moreno; Joel D. Martin; Danica Stanimirovic

Glioblastoma multiforme (GBM) is the most malignant and vascularized brain tumor. The aberrant vascular phenotype of GBM could be exploited for diagnosis or therapeutic targeting. This study identified new molecular markers of GBM vessels, using a combination of laser capture microdissection (LCM) microscopy, RNA amplification, and microarray analyses to compare vessels from nonmalignant human brain and GBM tumors. Forty‐two genes were differentially expressed in GBM vessels compared to nonmalignant brain vessels. Validation of differentially expressed genes was performed by literature mining, Q‐PCR, and immunohistochemistry. Among the differentially expressed genes, only 64% were previously associated with vessels, angiogenesis, gliomas, and/or cancer. The upregulation of genes encoding secreted extracellular proteins IGFBP7 and SPARC was confirmed by Q‐PCR in LCM‐captured vessels. Whereas SPARC and IGFBP7 protein were absent in nonmalignant brain vessels, distinct immunoreactivity patterns were observed in GBM sections, whereby SPARC was strongly expressed in perivascular cells adjacent to GBM vessels while GBM endothelial cells were immunostained for IGFBP7. IGFBP7 immunoreactivity was also detected on the abluminal side of GBM vessels deposited between strands of vascular basal lamina. The study discerns unique molecular characteristics of GBM vessels compared with nonmalignant brain vessels that could potentially be used for diagnostic or therapeutic purposes.


Meeting of the Association for Computational Linguistics | 2005

Word Alignment for Languages with Scarce Resources

Joel D. Martin; Rada Mihalcea; Ted Pedersen

This paper presents the task definition, resources, participating systems, and comparative results for the shared task on word alignment, which was organized as part of the ACL 2005 Workshop on Building and Using Parallel Texts. The shared task included English-Inuktitut, Romanian-English, and English-Hindi sub-tasks, and drew the participation of ten teams from around the world with a total of 50 systems.


Information Processing and Management | 2015

Sentiment, emotion, purpose, and style in electoral tweets

Saif M. Mohammad; Xiaodan Zhu; Svetlana Kiritchenko; Joel D. Martin

We automatically compile a dataset of 2012 US presidential election tweets. We annotate the tweets for sentiment, emotion, style, and purpose. We show that the tweets convey negative emotions twice as often as positive. We describe two automatic systems that predict emotion and purpose in tweets. Social media is playing a growing role in elections world-wide. Thus, automatically analyzing electoral tweets has applications in understanding how public sentiment is shaped, tracking public sentiment and polarization with respect to candidates and issues, understanding the impact of tweets from various entities, etc. Here, for the first time, we automatically annotate a set of 2012 US presidential election tweets for a number of attributes pertaining to sentiment, emotion, purpose, and style by crowdsourcing. Overall, more than 100,000 crowdsourced responses were obtained for 13 questions on emotions, style, and purpose. Additionally, we show through an analysis of these annotations that purpose, even though correlated with emotions, is significantly different. Finally, we describe how we developed automatic classifiers, using features from state-of-the-art sentiment analysis systems, to predict emotion and purpose labels, respectively, in new unseen tweets. These experiments establish baseline results for automatic systems on this new data.


Meeting of the Association for Computational Linguistics | 2005

PORTAGE: A Phrase-Based Machine Translation System

Fatiha Sadat; Howard Johnson; Akakpo Agbago; George F. Foster; Roland Kuhn; Joel D. Martin; Aaron Tikuisis

This paper describes the participation of the Portage team at NRC Canada in the shared task of the ACL 2005 Workshop on Building and Using Parallel Texts. We discuss Portage, a statistical phrase-based machine translation system, and present experimental results on the four language pairs of the shared task. First, we focus on the French-English task using multiple resources and techniques. Then we describe our contribution on the Finnish-English, Spanish-English and German-English language pairs using the provided data for the shared task.


North American Chapter of the Association for Computational Linguistics | 2003

Unsupervised learning of morphology for English and Inuktitut

Howard Johnson; Joel D. Martin

We describe a simple unsupervised technique for learning morphology by identifying hubs in an automaton. For our purposes, a hub is a node in a graph with in-degree greater than one and out-degree greater than one. We create a word-trie, transform it into a minimal DFA, then identify hubs. Those hubs mark the boundary between root and suffix, achieving similar performance to more complex mixtures of techniques.
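
The pipeline in this abstract (word-trie, minimal DFA, hub detection) can be sketched as follows. This is an illustrative reading, not the authors' implementation: the helper names, the bottom-up merging used for minimization, the choice to count each transition as one unit of degree, and the toy word list are all assumptions.

```python
from collections import defaultdict


def build_trie(words):
    """Build a trie of nested nodes; each node records finality and outgoing edges."""
    root = {"final": False, "next": {}}
    for word in words:
        node = root
        for ch in word:
            node = node["next"].setdefault(ch, {"final": False, "next": {}})
        node["final"] = True
    return root


def minimize(root):
    """Merge trie nodes that accept the same set of suffixes (bottom-up).

    Returns (start_id, states), where states[id] = (is_final, {char: target_id}).
    """
    registry = {}  # signature -> state id
    states = {}

    def visit(node):
        edges = tuple(sorted((ch, visit(child)) for ch, child in node["next"].items()))
        signature = (node["final"], edges)
        if signature not in registry:
            registry[signature] = len(registry)
            states[registry[signature]] = (node["final"], dict(edges))
        return registry[signature]

    return visit(root), states


def find_hubs(states):
    """Hubs: states with more than one incoming and more than one outgoing transition."""
    in_degree = defaultdict(int)
    for _, edges in states.values():
        for target in edges.values():
            in_degree[target] += 1
    return {s for s, (_, edges) in states.items() if in_degree[s] > 1 and len(edges) > 1}


def split_word(word, start, states, hubs):
    """Split a known word into (root, suffix) at the last hub on its path."""
    state, cut = start, None
    for i, ch in enumerate(word, start=1):
        state = states[state][1][ch]
        if state in hubs:
            cut = i
    return (word, "") if cut is None else (word[:cut], word[cut:])


# Toy vocabulary (hypothetical): two stems sharing the same set of suffixes.
words = ["jump", "jumps", "jumped", "kick", "kicks", "kicked"]
start, states = minimize(build_trie(words))
hubs = find_hubs(states)
```

In this toy vocabulary the states reached after "jump" and "kick" accept the same suffixes, so minimization merges them into one state with two incoming and two outgoing transitions; `split_word("jumped", start, states, hubs)` then cuts the word at that state.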


BMC Medical Informatics and Decision Making | 2010

De-identification of primary care electronic medical records free-text data in Ontario, Canada

Karen Tu; Julie Klein-Geltink; Tezeta F. Mitiku; Chiriac Mihai; Joel D. Martin

Background: Electronic medical records (EMRs) represent a potentially rich source of health information for research, but the free-text in EMRs often contains identifying information. While de-identification tools have been developed for free-text, none have been developed or tested for the full range of primary care EMR data. Methods: We used deid open source de-identification software and modified it for an Ontario context for use on primary care EMR data. We developed the modified program on a training set of 1000 free-text records from one group practice and then tested it on two validation sets: a random sample of 700 free-text EMR records from 17 different physicians from 7 different practices in 5 different cities, and 500 free-text records from a group practice that was in a different city than the group practice that was used for the training set. We measured the sensitivity/recall, precision, specificity, accuracy and F-measure of the modified tool against manually tagged free-text records to remove patient and physician names, locations, addresses, medical record, health card and telephone numbers. Results: We found that the modified training program performed with a sensitivity of 88.3%, specificity of 91.4%, precision of 91.3%, accuracy of 89.9% and F-measure of 0.90. The validation sets had sensitivities of 86.7% and 80.2%, specificities of 91.4% and 87.7%, precisions of 91.1% and 87.4%, accuracies of 89.0% and 83.8% and F-measures of 0.89 and 0.84 for the first and second validation sets respectively. Conclusion: The deid program can be modified to reasonably accurately de-identify free-text primary care EMR records while preserving clinical content.
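
For readers unfamiliar with the five measures reported here, the sketch below shows how they are conventionally derived from confusion-matrix counts. The function name and the counts are hypothetical and do not reproduce the study's data; the sketch also assumes every denominator is non-zero.

```python
def deid_metrics(tp, fp, tn, fn):
    """Standard evaluation measures from confusion-matrix counts.

    tp: identifiers correctly flagged      fp: non-identifiers wrongly flagged
    tn: non-identifiers correctly kept     fn: identifiers missed
    """
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = deid_metrics(tp=90, fp=20, tn=80, fn=10)
```

With these counts, sensitivity is 0.90, specificity 0.80, and accuracy 0.85; the F-measure is the harmonic mean of precision and sensitivity.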


Journal of the American Medical Informatics Association | 2013

À la Recherche du Temps Perdu: extracting temporal relations from medical text in the 2012 i2b2 NLP challenge

Colin Cherry; Xiaodan Zhu; Joel D. Martin; Berry de Bruijn

Objective An analysis of the timing of events is critical for a deeper understanding of the course of events within a patient record. The 2012 i2b2 NLP challenge focused on the extraction of temporal relationships between concepts within textual hospital discharge summaries. Materials and methods The team from the National Research Council Canada (NRC) submitted three system runs to the second track of the challenge: typifying the time-relationship between pre-annotated entities. The NRC system was designed around four specialist modules containing statistical machine learning classifiers. Each specialist targeted distinct sets of relationships: local relationships, ‘sectime’-type relationships, non-local overlap-type relationships, and non-local causal relationships. Results The best NRC submission achieved a precision of 0.7499, a recall of 0.6431, and an F1 score of 0.6924, resulting in a statistical tie for first place. Post hoc improvements led to a precision of 0.7537, a recall of 0.6455, and an F1 score of 0.6954, giving the highest scores reported on this task to date. Discussion and conclusions Methods for general relation extraction extended well to temporal relations, and gave top-ranked state-of-the-art results. Careful ordering of predictions within result sets proved critical to this success.

Collaboration


Top Co-Authors

Berry de Bruijn

National Research Council

Xiaodan Zhu

National Research Council

Howard Johnson

National Research Council

Colin Cherry

National Research Council

Kurt VanLehn

Arizona State University

Alain Désilets

National Research Council
