Lawrence Cavedon | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Lawrence Cavedon is active.

Explore More

Publication

Featured researches published by Lawrence Cavedon.

BMC Bioinformatics | 2011

Automatic classification of sentences to support Evidence Based Medicine.

Su Nam Kim; David Martinez; Lawrence Cavedon; Lars Yencken

AimGiven a set of pre-defined medical categories used in Evidence Based Medicine, we aim to automatically annotate sentences in medical abstracts with these labels.MethodWe constructed a corpus of 1,000 medical abstracts annotated by hand with specified medical categories (e.g. Intervention, Outcome). We explored the use of various features based on lexical, semantic, structural, and sequential information in the data, using Conditional Random Fields (CRF) for classification.ResultsFor the classification tasks over all labels, our systems achieved micro-averaged f-scores of 80.9% and 66.9% over datasets of structured and unstructured abstracts respectively, using sequential features. In labeling only the key sentences, our systems produced f-scores of 89.3% and 74.0% over structured and unstructured abstracts respectively, using the same sequential features. The results over an external dataset were lower (f-scores of 63.1% for all labels, and 83.8% for key sentences).ConclusionsOf the features we used, the best for classifying any given sentence in an abstract were based on unigrams, section headings, and sequential information from preceding sentences. These features resulted in improved performance over a simple bag-of-words approach, and outperformed feature sets used in previous work.

Information Retrieval | 2009

Exploring criteria for successful query expansion in the genomic domain

Nicola Stokes; Yi Li; Lawrence Cavedon; Justin Zobel

Query Expansion is commonly used in Information Retrieval to overcome vocabulary mismatch issues, such as synonymy between the original query terms and a relevant document. In general, query expansion experiments exhibit mixed results. Overall TREC Genomics Track results are also mixed; however, results from the top performing systems provide strong evidence supporting the need for expansion. In this paper, we examine the conditions necessary for optimal query expansion performance with respect to two system design issues: IR framework and knowledge source used for expansion. We present a query expansion framework that improves Okapi baseline passage MAP performance by 185%. Using this framework, we compare and contrast the effectiveness of a variety of biomedical knowledge sources used by TREC 2006 Genomics Track participants for expansion. Based on the outcome of these experiments, we discuss the success factors required for effective query expansion with respect to various sources of term expansion, such as corpus-based cooccurrence statistics, pseudo-relevance feedback methods, and domain-specific and domain-independent ontologies and databases. Our results show that choice of document ranking algorithm is the most important factor affecting retrieval performance on this dataset. In addition, when an appropriate ranking algorithm is used, we find that query expansion with domain-specific knowledge sources provides an equally substantive gain in performance over a baseline system.

Database | 2013

Annotating the biomedical literature for the human variome

Karin Verspoor; Antonio Jimeno Yepes; Lawrence Cavedon; Tara McIntosh; Asha Herten-Crabb; Zoë Thomas; John-Paul Plazzer

This article introduces the Variome Annotation Schema, a schema that aims to capture the core concepts and relations relevant to cataloguing and interpreting human genetic variation and its relationship to disease, as described in the published literature. The schema was inspired by the needs of the database curators of the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) database, but is intended to have application to genetic variation information in a range of diseases. The schema has been applied to a small corpus of full text journal publications on the subject of inherited colorectal cancer. We show that the inter-annotator agreement on annotation of this corpus ranges from 0.78 to 0.95 F-score across different entity types when exact matching is measured, and improves to a minimum F-score of 0.87 when boundary matching is relaxed. Relations show more variability in agreement, but several are reliable, with the highest, cohort-has-size, reaching 0.90 F-score. We also explore the relevance of the schema to the InSiGHT database curation process. The schema and the corpus represent an important new resource for the development of text mining solutions that address relationships among patient cohorts, disease and genetic variation, and therefore, we also discuss the role text mining might play in the curation of information related to the human variome. The corpus is available at http://opennicta.com/home/health/variome.

BMC Medical Informatics and Decision Making | 2010

Boolean versus ranked querying for biomedical systematic reviews

Sarvnaz Karimi; Stefan Pohl; Falk Scholer; Lawrence Cavedon; Justin Zobel

BackgroundThe process of constructing a systematic review, a document that compiles the published evidence pertaining to a specified medical topic, is intensely time-consuming, often taking a team of researchers over a year, with the identification of relevant published research comprising a substantial portion of the effort. The standard paradigm for this information-seeking task is to use Boolean search; however, this leaves the user(s) the requirement of examining every returned result. Further, our experience is that effective Boolean queries for this specific task are extremely difficult to formulate and typically require multiple iterations of refinement before being finalized.MethodsWe explore the effectiveness of using ranked retrieval as compared to Boolean querying for the purpose of constructing a systematic review. We conduct a series of experiments involving ranked retrieval, using queries defined methodologically, in an effort to understand the practicalities of incorporating ranked retrieval into the systematic search task.ResultsOur results show that ranked retrieval by itself is not viable for this search task requiring high recall. However, we describe a refinement of the standard Boolean search process and show that ranking within a Boolean result set can improve the overall search performance by providing early indication of the quality of the results, thereby speeding up the iterative query-refinement process.ConclusionsOutcomes of experiments suggest that an interactive query-development process using a hybrid ranked and Boolean retrieval system has the potential for significant time-savings over the current search process in the systematic reviewing.

north american chapter of the association for computational linguistics | 2009

Extraction of Named Entities from Tables in Gene Mutation Literature

Wern Wong; David Martinez; Lawrence Cavedon

We investigate the challenge of extracting information about genetic mutations from tables, an important source of information in scientific papers. We use various machine learning algorithms and feature sets, and evaluate performance in extracting fields associated with an existing handcreated database of mutations. We then show how classifying tabular information can be leveraged for the task of named entity detection for mutations.

australasian joint conference on artificial intelligence | 2009

Using Topic Models to Interpret MEDLINE's Medical Subject Headings

David Newman; Sarvnaz Karimi; Lawrence Cavedon

We consider the task of interpreting and understanding a taxonomy of classification terms applied to documents in a collection. In particular, we show how unsupervised topic models are useful for interpreting and understanding MeSH, the Medical Subject Headings applied to articles in MEDLINE. We introduce the resampled author model, which captures some of the advantages of both the topic model and the author-topic model. We demonstrate how topic models complement and add to the information conveyed in a traditional listing and description of a subject heading hierarchy.

cross language evaluation forum | 2006

NICTA I2D2 Group at GeoCLEF 2006

Yi Li; Nicola Stokes; Lawrence Cavedon; Alistair Moffat

We report on the experiments undertaken by the NICTA I2D2 Group as part of GeoCLEF 2006, as well as post-GeoCLEF evaluations and improvements to the submitted system. In particular, we used techniques to assign probabilistic likelihoods to geographic candidates for each identified geo-term, and a probabilistic IR engine. A normalisation process that adjusts term weights, so as to prevent expanded geo-terms from overwhelming non-geo terms, is shown to be crucial.

Archive | 2004

Extending Web Services Technologies

Lawrence Cavedon; Zakaria Maamar; David Martin; Boualem Benatallah

Service-Oriented Computing (SOC) is the computing paradigm that utilizes services as fundamental elements for developing applications/solutions. To build the service model, SOC relies on the Service Oriented Architecture (SOA), which is a way of reorganizing software applications and infrastructure into a set of interacting services. However, the basic SOA does not address overarching concerns such as management, service orchestration, service transaction management and coordination, security, and other concerns that apply to all components in a services architecture. In this paper we introduce an Extended Service Oriented Architecture that provides separate tiers for composing and coordinating services and for managing services in an open marketplace by employing grid services and discuss how agent technology can be used to support the functions of the Extended SOA.

Proceedings of the ACM fourth international workshop on Data and text mining in biomedical informatics | 2010

Automatic classification of sentences for evidence based medicine

Su Nam Kim; David Martinez; Lawrence Cavedon

AIM Given a set of pre-defined medical categories used in Evidence Based Medicine, we aim to automatically annotate sentences in medical abstracts with these labels. METHOD We construct a corpus of 1,000 medical abstracts annotated by hand with medical categories (e.g. Intervention, Outcome). We explore the use of various features based on lexical, semantic, structural, and sequential information in the data, using Conditional Random Fields (CRF) for classification. RESULT For the classification tasks over all labels, our systems achieved micro-averaged F-scores of 80.9% and 66.9% in structured and unstructured datasets respectively, using sequential features. In labeling only key sentences, our systems produced F-scores of 89.3% and 74.0% in structured and unstructured datasets respectively, using the same sequential features. The results over an external dataset were lower (F-scores of 63.1% for all-labels, and 83.8% for key sentences). CONCLUSION Of the features we used, the best for classifying any given sentence in an abstract are based on unigrams, section headings, and sequential information from preceding sentences. These features resulted in improved performance over a simple bag-of-words approach, and outperform feature sets used in previous work.

empirical methods in natural language processing | 2005

A Flexible Conversational Dialog System for MP3 Player

Fuliang Weng; Lawrence Cavedon; Badri Raghunathan; Danilo Mirkovic; Ben Bei; Heather Pon-Barry; Harry Bratt; Hua Cheng; Hauke Schmidt; Rohit Mishra; Brian Lathrop; Qi Zhang; Tobias Scheideck; Kui Xu; Tess Hand-Bender; Stanley Peters; Liz Shriberg; Carsten Bergmann

In recent years, an increasing number of new devices have found their way into the cars we drive. Speech-operated devices in particular provide a great service to drivers by minimizing distraction, so that they can keep their hands on the wheel and their eyes on the road. This presentation will demonstrate our latest development of an in-car dialog system for an MP3 player designed under a joint research effort from Bosch RTC, VW ERL, Stanford CSLI, and SRI STAR Lab funded by NIST ATP [Weng et al 2004] with this goal in mind. This project has developed a number of new technologies, some of which are already incorporated in the system. These include: end-pointing with prosodic cues, error identification and recovering strategies, flexible multi-threaded, multi-device dialog management, and content optimization and organization strategies. A number of important language phenomena are also covered in the systems functionality. For instance, one may use words relying on context, such as this, that, it, and them, to reference items mentioned in particular use contexts. Different types of verbal revision are also permitted by the system, providing a great convenience to its users. The system supports multi-threaded dialogs so that users can diverge to a different topic before the current one is finished and still come back to the first after the second topic is done. To lower the cognitive load on the drivers, the content optimization component organizes any information given to users based on ontological structures, and may also refine users queries via various strategies. Domain knowledge is represented using OWL, a web ontology language recommended by W3C, which should greatly facilitate its portability to new domains.The spoken dialog system consists of a number of components (see Fig. 1 for details). Instead of the hub architecture employed by Communicator projects [Senef et al, 1998], it is developed in Java and uses a flexible event-based, message-oriented middleware. This allows for dynamic registration of new components. Among the component modules in Figure 1, we use the Nuance speech recognition engine with class-based ngrams and dynamic grammars, and the Nuance Vocalizer as the TTS engine. The Speech Enhancer removes noises and echo. The Prosody module will provide additional features to the Natural Language Understanding (NLU) and Dialogue Manager (DM) modules to improve their performance.The NLU module takes a sequence of recognized words and tags, performs a deep linguistic analysis with probabilistic models, and produces an XML-based semantic feature structure representation. Parallel to the deep analysis, a topic classifier assigns top n topics to the utterance, which are used in the cases where the dialog manager cannot make any sense of the parsed structure. The NLU module also supports dynamic updates of the knowledge base.The CSLI DM module mediates and manages interaction. It uses the dialogue-move approach to maintain dialogue context, which is then used to interpret incoming utterances (including fragments and revisions), resolve NPs, construct salient responses, track issues, etc. Dialogue states can also be used to bias SR expectation and improve SR performance, as has been performed in previous applications of the DM. Detailed descriptions of the DM can be found in [Lemon et al 2002; Mirkovic & Cavedon 2005].The Knowledge Manager (KM) controls access to knowledge base sources (such as domain knowledge and device information) and their updates. Domain knowledge is structured according to domain-dependent ontologies. The current KM makes use of OWL, a W3C standard, to represent the ontological relationships between domain entities. Protege (http://protege.stanford.edu), a domain-independent ontology tool, is used to maintain the ontology offline. In a typical interaction, the DM converts a users query into a semantic frame (i.e. a set of semantic constraints) and sends this to the KM via the content optimizer.The Content Optimization module acts as an intermediary between the dialogue management module and the knowledge management module during the query process. It receives semantic frames from the DM, resolves possible ambiguities, and queries the KM. Depending on the items in the query result as well as the configurable properties, the module selects and performs an appropriate optimization strategy.Early evaluation shows that the system has a task completion rate of 80% on 11 tasks of MP3 player domain, ranging from playing requests to music database queries. Porting to a restaurant selection domain is currently under way.

Explore More