Publications


Featured research published by Gondy Leroy.


Journal of Biomedical Informatics | 2003

A shallow parser based on closed-class words to capture relations in biomedical text

Gondy Leroy; Hsinchun Chen; Jesse D. Martinez

Natural language processing for biomedical text currently focuses mostly on entity and relation extraction. These entities and relations are usually pre-specified entities, e.g., proteins, and pre-specified relations, e.g., inhibit relations. A shallow parser that captures the relations between noun phrases automatically from free text has been developed and evaluated. It uses heuristics and a noun phraser to capture entities of interest in the text. Cascaded finite state automata structure the relations between individual entities. The automata are based on closed-class English words and model generic relations not limited to specific words. The parser also recognizes coordinating conjunctions and captures negation in text, a feature usually ignored by others. Three cancer researchers evaluated 330 relations extracted from 26 abstracts of interest to them. Of these, 296 relations were correctly extracted, resulting in 90% precision and an average of 11 correct relations per abstract.
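
The closed-class cascade can be pictured with a toy example. The sketch below is not the authors' parser; it uses a single regular expression as a stand-in for one cascade level, assumes noun phrases have already been chunked and marked with brackets (a hypothetical notation), and shows how negation can be captured alongside a generic relation.

```python
import re

# Toy stand-in for one level of a cascaded finite state automaton.
# Assumption: noun phrases are pre-chunked and marked with [brackets].
NP = r"\[(?P<{}>[^\]]+)\]"

# Closed-class words (here: negation markers) drive the pattern; the verb
# slot is generic rather than limited to a fixed relation vocabulary.
LEVEL_1 = re.compile(
    NP.format("agent")
    + r"\s+(?P<neg>does not\s+|do not\s+|not\s+)?(?P<rel>\w+)\s+"
    + NP.format("target")
)

def extract_relations(sentence):
    """Return (agent, relation, target, negated?) tuples from one sentence."""
    return [
        (m["agent"], m["rel"], m["target"], m["neg"] is not None)
        for m in LEVEL_1.finditer(sentence)
    ]

print(extract_relations("[p53] does not inhibit [mdm2] in [this assay]"))
# -> [('p53', 'inhibit', 'mdm2', True)]
```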


Pacific Symposium on Biocomputing | 2001

Filling preposition-based templates to capture information from medical abstracts.

Gondy Leroy; Hsinchun Chen

Due to the recent explosion of information in the biomedical field, it is hard for a single researcher to review the complex network involving genes, proteins, and interactions. We are currently building GeneScene, a toolkit that will assist researchers in reviewing existing literature, and report here on the first phase of our development effort: extracting the relevant information from medical abstracts. We are developing a medical parser that extracts information, fills basic preposition-based templates, and combines the templates to capture the underlying sentence logic. We tested our parser on 50 unseen abstracts and found that it extracted 246 templates with a precision of 70%. In comparison with many other techniques, more information was extracted without sacrificing precision. Future improvements in precision will be achieved by correcting three categories of errors.
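
As an illustration of the template idea, the sketch below splits a sentence at prepositions and fills a basic {left, preposition, right} template for each one; combining such templates is how the sentence logic would then be reassembled. The preposition list and flat string split are invented simplifications, not the GeneScene parser's rules.

```python
# Sketch of preposition-based template filling; the preposition list and the
# token-level split are simplifications for illustration only.
PREPOSITIONS = {"of", "in", "by", "with", "on", "for", "to", "between"}

def fill_templates(sentence):
    """Fill one {left, prep, right} template per preposition in the sentence."""
    tokens = sentence.rstrip(".").split()
    templates = []
    for i, tok in enumerate(tokens):
        if tok.lower() in PREPOSITIONS and 0 < i < len(tokens) - 1:
            templates.append({
                "left": " ".join(tokens[:i]),
                "prep": tok.lower(),
                "right": " ".join(tokens[i + 1:]),
            })
    return templates

for template in fill_templates("Activation of p53 by radiation induces apoptosis."):
    print(template)
# {'left': 'Activation', 'prep': 'of', 'right': 'p53 by radiation induces apoptosis'}
# {'left': 'Activation of p53', 'prep': 'by', 'right': 'radiation induces apoptosis'}
```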


International Journal of Medical Informatics | 2005

Effects of information and machine learning algorithms on word sense disambiguation with small datasets

Gondy Leroy; Thomas C. Rindflesch

Current approaches to word sense disambiguation use (and often combine) various machine learning techniques. Most refer to characteristics of the ambiguous word and its surrounding words and are based on thousands of examples. Unfortunately, developing large training sets is burdensome, and in response to this challenge, we investigate the use of symbolic knowledge for small datasets. A naïve Bayes classifier was trained for 15 words with 100 examples for each. Unified Medical Language System (UMLS) semantic types assigned to concepts found in the sentence and relationships between these semantic types form the knowledge base. The most frequent sense of a word served as the baseline. The effect of increasingly accurate symbolic knowledge was evaluated in nine experimental conditions. Performance was measured by accuracy based on 10-fold cross-validation. The best condition used only the semantic types of the words in the sentence. Accuracy was then on average 10% higher than the baseline; however, it varied from 8% deterioration to 29% improvement. To investigate this large variance, we performed several follow-up evaluations, testing additional algorithms (a decision tree and a neural network) and gold standards (one per expert), but the results did not differ significantly. However, we noted a trend that the best disambiguation was found for words that were the least troublesome to the human evaluators. We conclude that neither the algorithm nor individual human behavior causes these large differences, but that the structure of the UMLS Metathesaurus (used to represent senses of ambiguous words) contributes to inaccuracies in the gold standard, leading to varied performance of word sense disambiguation techniques.
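
The best-performing condition is easy to picture: each training example is just the bag of semantic types found in the sentence containing the ambiguous word. Below is a minimal sketch of that setup with scikit-learn; the semantic-type tokens, sense labels, and data are invented stand-ins, and the 10-fold cross-validation mirrors the paper's evaluation design.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

# Invented examples: each "document" is the set of UMLS-style semantic types
# assigned to concepts in the sentence (no other features, as in the best
# condition of the study).
contexts = [
    "amino_acid_peptide_or_protein disease_or_syndrome",      # sense A
    "amino_acid_peptide_or_protein pharmacologic_substance",  # sense A
    "organization geographic_area",                           # sense B
    "organization population_group",                          # sense B
] * 25  # padded to 100 examples per ambiguous word, matching the study size
senses = ["A", "A", "B", "B"] * 25

# Bag-of-semantic-types feature matrix
X = CountVectorizer().fit_transform(contexts)

# 10-fold cross-validated accuracy of the naive Bayes classifier
scores = cross_val_score(MultinomialNB(), X, senses, cv=10)
print(scores.mean())
```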


Journal of the Association for Information Science and Technology | 2005

GeneScene: An Ontology-Enhanced Integration of Linguistic and Co-Occurrence-Based Relations in Biomedical Texts

Gondy Leroy; Hsinchun Chen

The increasing amount of publicly available literature and experimental data in biomedicine makes it hard for biomedical researchers to stay up-to-date. GeneScene is a toolkit that will help alleviate this problem by providing an overview of published literature content. We combined a linguistic parser with Concept Space, a co-occurrence-based semantic net. Both techniques extract complementary biomedical relations between noun phrases from MEDLINE abstracts. The parser extracts precise and semantically rich relations from individual abstracts. Concept Space extracts relations that hold true for the collection of abstracts. The Gene Ontology, the Human Genome Nomenclature, and the Unified Medical Language System are also integrated in GeneScene. Currently, they are used to facilitate the integration of the two relation types, and to select the more interesting and high-quality relations for presentation. A user study focusing on p53 literature is discussed. All MEDLINE abstracts discussing p53 were processed in GeneScene. Two researchers evaluated the terms and relations from several abstracts of interest to them. The results show that the terms were precise (precision 93%) and relevant, as were the parser relations (precision 95%). The Concept Space relations were more precise when selected with ontological knowledge (precision 78%) than without (60%).
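
To make the two relation types concrete: the parser yields precise per-abstract triples, while the co-occurrence side links terms that recur together across the collection. Below is a minimal sketch of the latter idea on toy data; it is not Concept Space itself, and the term sets are invented.

```python
from collections import Counter
from itertools import combinations

# Toy stand-ins for the noun phrases extracted from each MEDLINE abstract
abstract_terms = [
    {"p53", "apoptosis", "mdm2"},
    {"p53", "apoptosis", "dna damage"},
    {"p53", "mdm2", "ubiquitination"},
]

# Count how many abstracts each term pair co-occurs in
pair_counts = Counter()
for terms in abstract_terms:
    pair_counts.update(combinations(sorted(terms), 2))

# Keep only relations supported by more than one abstract; an ontology-based
# filter (Gene Ontology, UMLS, ...) would further select high-quality pairs.
for pair, n in sorted(pair_counts.items()):
    if n > 1:
        print(pair, n)
# ('apoptosis', 'p53') 2
# ('mdm2', 'p53') 2
```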


Journal of Medical Systems | 2011

A Smart-Phone Application and a Companion Website for the Improvement of the Communication Skills of Children with Autism: Clinical Rationale, Technical Development and Preliminary Results

Gianluca De Leo; Carol Kernitzki Gonzales; Padmaja Battagiri; Gondy Leroy

Autism is a complex neurobiological disorder that is part of a group of disorders known as autism spectrum disorders (ASD). Today, one in 150 individuals is diagnosed with autism. Lack of social interaction and problems with communication are the main characteristics displayed by children with ASD. The Picture Exchange Communication System (PECS) is a communication system in which children exchange visual symbols as a form of communication. The visual symbols are laminated pictures stored in a binder. We have designed, developed, and are currently testing a software application called PixTalk, which runs on any Windows Mobile smartphone. Teachers and caregivers can access a website and select from an online library the images to be downloaded onto the smartphone. Children can browse and select images to express their intentions, desires, and emotions using PixTalk. Case study results indicate that PixTalk can be used as part of ongoing therapy.


Interaction Design and Children | 2008

Smartphones to facilitate communication and improve social skills of children with severe autism spectrum disorder: special education teachers as proxies

Gianluca De Leo; Gondy Leroy

We present an overview of the approach we used and the challenges we encountered while designing software for smartphones to facilitate communication and improve social skills of children with severe autism spectrum disorder (ASD). We employed participatory design, using special education teachers of children with ASD as proxies for our target population.


Journal of Medical Internet Research | 2013

User Evaluation of the Effects of a Text Simplification Algorithm Using Term Familiarity on Perception, Understanding, Learning, and Information Retention

Gondy Leroy; James E. Endicott; David Kauchak; Obay Mouradi; Melissa Just

Background: Adequate health literacy is important for people to maintain good health and manage diseases and injuries. Educational text, either retrieved from the Internet or provided by a doctor's office, is a popular method to communicate health-related information. Unfortunately, it is difficult to write text that is easy to understand, and existing approaches, mostly the application of readability formulas, have not convincingly been shown to reduce the difficulty of text.

Objective: To develop an evidence-based writer support tool to improve perceived and actual text difficulty. To this end, we are developing and testing algorithms that automatically identify difficult sections in text and provide appropriate, easier alternatives; algorithms that effectively reduce text difficulty will be included in the support tool. This work describes a user evaluation, with an independent writer, of an automated simplification algorithm using term familiarity.

Methods: Term familiarity indicates how easy words are for readers and is estimated using term frequencies in the Google Web Corpus. Unfamiliar words are algorithmically identified and tagged for potential replacement. Easier alternatives consisting of synonyms, hypernyms, definitions, and semantic types are extracted from WordNet, the Unified Medical Language System (UMLS), and Wiktionary and ranked for a writer to choose from to simplify the text. We conducted a controlled user study with a representative writer who used our simplification algorithm to simplify texts, and we tested the impact with representative consumers. The key independent variable of our study is lexical simplification, and we measured its effect on both perceived and actual text difficulty. Participants were recruited from Amazon's Mechanical Turk website. Perceived difficulty was measured with one metric, a 5-point Likert scale. Actual difficulty was measured with three metrics: 5 multiple-choice questions alongside each text to measure understanding, 7 multiple-choice questions without the text for learning, and 2 free recall questions for information retention.

Results: Ninety-nine participants completed the study. We found strong beneficial effects on both perceived and actual difficulty. After simplification, the text was perceived as simpler (P<.001), with simplified text scoring 2.3 and original text 3.2 on the 5-point Likert scale (score 1: easiest). It also led to better understanding of the text (P<.001), with 11% more correct answers with simplified text (63% correct) compared to the original (52% correct). There was more learning, with 18% more correct answers after reading simplified text compared to 9% more correct answers after reading the original text (P=.003). There was no significant effect on free recall.

Conclusions: Term familiarity is a valuable feature in simplifying text. Although the topic of the text influences the effect size, the results were convincing and consistent.
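
The core mechanism is simple to sketch: look up each word's corpus frequency and flag the rare ones for replacement. The frequencies, cutoff, and synonym table below are invented stand-ins for the Google Web Corpus counts and the WordNet/UMLS/Wiktionary lookups the paper uses.

```python
# Invented stand-in frequencies (the paper uses the Google Web Corpus)
FREQ = {"doctor": 9e8, "physician": 2e8, "hypertension": 3e7,
        "consult": 7e8, "your": 3e9, "about": 4e9}
# Invented easier alternatives (the paper ranks candidates from WordNet,
# the UMLS, and Wiktionary for a writer to choose from)
EASIER = {"physician": ["doctor"], "hypertension": ["high blood pressure"]}
THRESHOLD = 5e8  # hypothetical familiarity cutoff

def flag_unfamiliar(text):
    """Tag low-frequency words and list easier candidates for the writer."""
    for raw in text.lower().split():
        word = raw.strip(".,;")
        # Words missing from the table are treated as familiar in this demo
        if FREQ.get(word, THRESHOLD) < THRESHOLD:
            print(f"{word!r} -> {EASIER.get(word, [])}")

flag_unfamiliar("Consult your physician about hypertension.")
# 'physician' -> ['doctor']
# 'hypertension' -> ['high blood pressure']
```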


IEEE International Conference on Technologies for Homeland Security | 2008

Crime Information Extraction from Police and Witness Narrative Reports

Chih Hao Ku; Alicia Iriberri; Gondy Leroy

To solve crimes, investigators often rely on interviews with witnesses, victims, or criminals themselves. The interviews are transcribed, and the pertinent data is contained in narrative form. To solve one crime, investigators may need to interview multiple people and then analyze the narrative reports. There are several difficulties with this process: interviewing people is time consuming, the interviews (sometimes conducted by multiple officers) need to be combined, and the resulting information may still be incomplete. For example, victims or witnesses are often too scared or embarrassed to report or prefer to remain anonymous. We are developing an online reporting system that combines natural language processing with insights from the cognitive interview approach to obtain more information from witnesses and victims. We report here on information extraction from police and witness narratives. We achieved high precision (94% and 96%) and recall (85% and 90%) for the two narrative types.


ACM Transactions on Management Information Systems | 2013

Smart Health and Wellbeing

Christopher C. Yang; Gondy Leroy; Sophia Ananiadou

Healthcare informatics has drawn substantial attention in recent years. Current work on healthcare informatics is highly interdisciplinary, involving methodologies from computing, engineering, information science, behavior science, management science, social science, as well as many different areas in medicine and public health. Three major tracks, (i) systems, (ii) analytics, and (iii) human factors, can be identified. The systems track focuses on healthcare system architecture, frameworks, design, engineering, and application; the analytics track emphasizes data/information processing, retrieval, mining, analytics, as well as knowledge discovery; the human factors track targets the understanding of users or context, interface design, and user studies of healthcare applications. In this article, we discuss some of the latest developments and introduce several articles selected for this special issue. We envision that the development of computing-oriented healthcare informatics research will continue to grow rapidly. The integration of different disciplines to advance the healthcare and wellbeing of our society will also be accelerated.


International Journal of Medical Informatics | 2010

The Influence of Text Characteristics on Perceived and Actual Difficulty of Health Information

Gondy Leroy; Stephen Helmreich; James R. Cowie

PURPOSE: Willingness and ability to learn from health information in text are crucial for people to be informed and make better medical decisions. These two user characteristics are influenced by the perceived and actual difficulty of text. Our goal is to find text features that are indicative of perceived and actual difficulty so that barriers to reading can be lowered and understanding of information increased.

METHODS: We systematically manipulated three text characteristics: overall sentence structure (active, passive, extraposed-subject, or sentential-subject), noun phrase complexity (simple or complex), and function word density (high or low). These are more fine-grained metrics for evaluating text than the commonly used readability formulas. We measured perceived difficulty with individual sentences by asking consumers to choose the easiest and most difficult version of a sentence. We measured actual difficulty with entire paragraphs by posing multiple-choice questions to measure understanding and retention of information in easy and difficult versions of the paragraphs.

RESULTS: Based on a study with 86 participants, we found that low noun phrase complexity and high function word density lead to sentences being perceived as simpler. In passive, sentential-subject, and extraposed-subject sentences, both main and interaction effects were significant (all p<.05). In active sentences, only noun phrase complexity mattered (p<.001). For the same group of participants, simplification of entire paragraphs based on these three linguistic features had only a small effect on understanding (p=.99) and no effect on retention of information.

CONCLUSIONS: Using grammatical text features, we could measure and improve the perceived difficulty of text. In contrast to expectations based on readability formulas, these grammatical manipulations had limited effects on actual difficulty and so were insufficient to simplify the text and improve understanding. Future work will include semantic measures and overall text composition and their effects on perceived and actual difficulty.

LIMITATIONS: These results are limited to grammatical features of text. The studies also used only one task, a question-answering task, to measure understanding of information.
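
Of the three manipulated characteristics, function word density is the most straightforward to operationalize: the share of closed-class tokens in a sentence. Below is a minimal sketch with an abbreviated word list; the study's exact feature definitions may differ.

```python
# Abbreviated closed-class word list; a real implementation would use a
# full function word inventory or a part-of-speech tagger.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "by", "is", "are", "that",
                  "which", "it", "to", "and", "was", "for", "with"}

def function_word_density(sentence):
    """Fraction of tokens in the sentence that are function words."""
    tokens = [t.strip(".,").lower() for t in sentence.split()]
    return sum(t in FUNCTION_WORDS for t in tokens) / len(tokens)

print(function_word_density("It is known that the drug lowers the risk."))  # ~0.56
print(function_word_density("Drug use lowers risk."))                       # 0.0
```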

Collaboration


Dive into Gondy Leroy's collaborations.

Top Co-Authors

Trudi Miller (Claremont Graduate University)
Alicia Iriberri (Claremont Graduate University)
Chih Hao Ku (Claremont Graduate University)
James E. Endicott (Claremont Graduate University)
Myungjae Kwak (Middle Georgia State College)
Obay Mouradi (Claremont Graduate University)