Bob J. A. Schijvenaars
Erasmus University Medical Center
Publications
Featured research published by Bob J. A. Schijvenaars.
Bioinformatics | 2009
Kristina M. Hettne; R.H. Stierum; Martijn J. Schuemie; Peter J. M. Hendriksen; Bob J. A. Schijvenaars; Erik M. van Mulligen; Jos Kleinjans; Jan A. Kors
MOTIVATION The scientific community has devoted much effort to the correct identification of gene and protein names in text, and less to the correct identification of chemical names. Dictionary-based term identification has the power to recognize the diverse representation of chemical information in the literature and map the chemicals to their database identifiers. RESULTS We developed a dictionary for the identification of small molecules and drugs in text, combining information from UMLS, MeSH, ChEBI, DrugBank, KEGG, HMDB and ChemIDplus. Rule-based term filtering, a manual check of highly frequent terms, and disambiguation rules were applied. We tested the combined dictionary and the dictionaries derived from the individual resources on an annotated corpus, and conclude the following: (i) each of the processing steps increases precision with a minor loss of recall; (ii) the overall performance of the combined dictionary is acceptable (precision 0.67, recall 0.40 (0.80 for trivial names)); (iii) the combined dictionary performed better than the dictionary in the chemical recognizer OSCAR3; (iv) the performance of a dictionary based on ChemIDplus alone is comparable to the performance of the combined dictionary. AVAILABILITY The combined dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web site http://www.biosemantics.org/chemlist.
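As a rough illustration of dictionary-based term identification and of how precision and recall are computed against an annotated corpus, here is a minimal Python sketch. It is not the authors' implementation: the toy dictionary, the `CHEM:` identifiers, the example sentence, and the gold annotations are all hypothetical, and none of the filtering or disambiguation rules from the paper are modeled.

```python
import re

# Hypothetical toy dictionary: term -> identifier (identifiers are made up).
TOY_DICTIONARY = {
    "aspirin": "CHEM:0001",
    "acetylsalicylic acid": "CHEM:0001",
    "caffeine": "CHEM:0002",
}


def find_terms(text, dictionary):
    """Return a set of (start, end, identifier) for dictionary terms found in text."""
    hits = set()
    for term, identifier in dictionary.items():
        # Whole-word, case-insensitive match of the dictionary term.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.add((match.start(), match.end(), identifier))
    return hits


def precision_recall(predicted, gold):
    """Compute precision and recall of predicted annotations against gold ones."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


text = "Aspirin and caffeine were given; ibuprofen was not."
predicted = find_terms(text, TOY_DICTIONARY)

# Hypothetical gold standard: ibuprofen is annotated but missing from the dictionary.
gold = {(0, 7, "CHEM:0001"), (12, 20, "CHEM:0002"), (33, 42, "CHEM:0003")}
precision, recall = precision_recall(predicted, gold)
```

In this toy case every match is correct but one annotated chemical is absent from the dictionary, so precision is high while recall is lower, the same general pattern the abstract reports.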
Bioinformatics | 2004
Martijn J. Schuemie; Marc Weeber; Bob J. A. Schijvenaars; E.M. van Mulligen; C C van der Eijk; Rob Jelier; Barend Mons; Jan A. Kors
MOTIVATION Full-text documents potentially hold more information than their abstracts, but require more resources for processing. We investigated the added value of full text over abstracts in terms of information content and occurrences of gene symbol--gene name combinations that can resolve gene-symbol ambiguity. RESULTS We analyzed a set of 3902 biomedical full-text articles. Different keyword measures indicate that information density is highest in abstracts, but that the information coverage in full texts is much greater than in abstracts. Analysis of five different standard sections of articles shows that the highest information coverage is located in the results section. Still, 30-40% of the information mentioned in each section is unique to that section. Only 30% of the gene symbols in the abstract are accompanied by their corresponding names, and a further 8% of the gene names are found in the full text. In the full text, only 18% of the gene symbols are accompanied by their gene names.
Journal of Electrocardiology | 2008
Bob J. A. Schijvenaars; Gerard van Herpen; Jan A. Kors
The electrocardiogram (ECG) can be affected by intraindividual variations from various sources that may confuse the diagnosis of the underlying cardiac condition and impair the accuracy of ECG interpretation. Intraindividual variability is a hindrance in serial ECG analysis, where ECGs of the same individual, but taken at different times, are compared. Two sources of intraindividual variability can be distinguished as follows: variability related to the technical circumstances during ECG recording (technical sources) and nonpathologic biologic variability (biological sources). Among the technical sources, variation in electrode positioning between recordings is the most confusing. Of the biological sources, respiratory variations are effective at any time scale, but the most important are age and weight that work on prolonged time scales. Technical problems are best prevented by rigorously sticking to a standard acquisition protocol. Criteria can be adapted to changing circumstances (age, weight), and by computer modeling, it may be possible to correct the ECG diagnosis for some sources of intraindividual variability.
BMC Bioinformatics | 2005
Bob J. A. Schijvenaars; Barend Mons; Marc Weeber; Martijn J. Schuemie; Erik M. van Mulligen; Hester M. Wain; Jan A. Kors
BACKGROUND Massive text mining of the biological literature holds great promise of relating disparate information and discovering new knowledge. However, disambiguation of gene symbols is a major bottleneck. RESULTS We developed a simple thesaurus-based disambiguation algorithm that can operate with very little training data. The thesaurus comprises the information from five human genetic databases and MeSH. The extent of the homonym problem for human gene symbols is shown to be substantial (33% of the genes in our combined thesaurus had one or more ambiguous symbols), not only because one symbol can refer to multiple genes, but also because a gene symbol can have many non-gene meanings. A test set of 52,529 Medline abstracts, containing 690 ambiguous human gene symbols taken from OMIM, was automatically generated. Overall accuracy of the disambiguation algorithm was up to 92.7% on the test set. CONCLUSION The ambiguity of human gene symbols is substantial, not only because one symbol may denote multiple genes but particularly because many symbols have other, non-gene meanings. The proposed disambiguation approach resolves most ambiguities in our test set with high accuracy, including the important gene/not a gene decisions. The algorithm is fast and scalable, enabling gene-symbol disambiguation in massive text mining applications.
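The abstract does not spell out how the disambiguation algorithm works. Purely to illustrate the general idea of choosing among the senses of an ambiguous symbol, including a non-gene sense, the following Python sketch picks the sense whose context words overlap most with the surrounding text. The overlap heuristic, the toy thesaurus, and all names are assumptions made for illustration and should not be read as the authors' method.

```python
import re

# Hypothetical toy thesaurus: symbol -> {sense label: context words}.
# All entries are illustrative, not taken from the paper's thesaurus.
TOY_THESAURUS = {
    "PSA": {
        "gene: prostate specific antigen": {"prostate", "antigen", "serum", "cancer"},
        "non-gene: psoriatic arthritis": {"psoriatic", "arthritis", "joint", "skin"},
    }
}


def tokenize(text):
    """Lower-case the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(symbol, text, thesaurus):
    """Return the sense with the largest context-word overlap, or None if none overlaps."""
    tokens = tokenize(text)
    best_sense, best_score = None, 0
    for sense, context in thesaurus.get(symbol, {}).items():
        score = len(tokens & context)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


gene_case = disambiguate(
    "PSA", "Serum PSA levels in prostate cancer patients", TOY_THESAURUS
)
non_gene_case = disambiguate(
    "PSA", "PSA patients with joint and skin involvement", TOY_THESAURUS
)
```

Here `gene_case` resolves to the gene sense and `non_gene_case` to the non-gene sense, mirroring the gene/not a gene decision the abstract highlights.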
Journal of Biomedical Semantics | 2010
Kristina M. Hettne; Erik M. van Mulligen; Martijn J. Schuemie; Bob J. A. Schijvenaars; Jan A. Kors
BACKGROUND Identification of terms is essential for biomedical text mining. We concentrate here on the use of vocabularies for term identification, specifically the Unified Medical Language System (UMLS). To make the UMLS more suitable for biomedical text mining we implemented and evaluated nine term rewrite and eight term suppression rules. The rules rely on UMLS properties that have been identified in previous work by others, together with an additional set of new properties discovered by our group during our work with the UMLS. Our work complements the earlier work in that we measure the impact of the different rules on the number of terms identified in a MEDLINE corpus. The number of uniquely identified terms and their frequency in MEDLINE were computed before and after applying the rules. The 50 most frequently found terms together with a sample of 100 randomly selected terms were evaluated for every rule. RESULTS Five of the nine rewrite rules were found to generate additional synonyms and spelling variants that correctly corresponded to the meaning of the original terms and seven out of the eight suppression rules were found to suppress only undesired terms. Using the five rewrite rules that passed our evaluation, we were able to identify 1,117,772 new occurrences of 14,784 rewritten terms in MEDLINE. Without the rewriting, we recognized 651,268 terms belonging to 397,414 concepts; with rewriting, we recognized 666,053 terms belonging to 410,823 concepts, which is an increase of 2.8% in the number of terms and an increase of 3.4% in the number of concepts recognized. Using the seven suppression rules, a total of 257,118 undesired terms were suppressed in the UMLS, notably decreasing its size. 7,397 terms were suppressed in the corpus. CONCLUSIONS We recommend applying the five rewrite rules and seven suppression rules that passed our evaluation when the UMLS is to be used for biomedical term identification in MEDLINE. A software tool to apply these rules to the UMLS is freely available at http://biosemantics.org/casper.
American Medical Informatics Association Annual Symposium | 2003
Marc Weeber; Bob J. A. Schijvenaars; Erik M. van Mulligen; Barend Mons; Rob Jelier; C. Christiaan van der Eijk; Jan A. Kors
American Medical Informatics Association Annual Symposium | 2002
Erik M. van Mulligen; C. Christiaan van der Eijk; Jan A. Kors; Bob J. A. Schijvenaars; Barend Mons
Text Retrieval Conference | 2003
Rob Jelier; Martijn J. Schuemie; C. Christiaan van der Eijk; Marc Weeber; Erik M. van Mulligen; Bob J. A. Schijvenaars; Barend Mons; Jan A. Kors
Studies in Health Technology and Informatics | 2001
Bob J. A. Schijvenaars; Jan A. Kors; Gerard van Herpen; Jan H. van Bemmel
Text Retrieval Conference | 2005
Bob J. A. Schijvenaars; Martijn J. Schuemie; Erik M. van Mulligen; Marc Weeber; Rob Jelier; Barend Mons; Jan A. Kors; Wessel Kraaij