Studies in health technology and informatics | 2019

Building an Experimental German User Interface Terminology Linked to SNOMED CT


Abstract


We describe the creation of a User Interface Terminology (UIT) intended to provide as many German-language interface terms as possible, each mapped to the reference terminology SNOMED CT. The purpose is to achieve high coverage of medical jargon in order to optimise the semantic annotation of clinical documents by text mining systems. In the first step, words and short phrases were automatically extracted from the English SNOMED CT description table and entered into an n-gram table. In the second step, the n-gram table was populated with human and machine translations, which were manually enriched with part-of-speech (POS) tags. Both top-down and bottom-up methods were used for manual terminology population. Grammar rules were formulated and embedded in a term generator, which then created one or more German variants for each SNOMED CT description. Currently, the German user interface terminology contains 4,425,948 entries, created from 111,605 German n-grams assigned to 95,298 English n-grams. With 341,105 active concepts and 542,462 (non-FSN) descriptions, this corresponds to an average of 13 interface terms per concept and 8.2 per description. A blinded human assessment of the current quality of this resource found its term understandability to be equivalent to that of a fully automated Web-based translator. The translator, however, does not yield any synonyms, so there are good reasons to develop this semi-automated terminology engineering method further and to recommend it for other language pairs.
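The two-step pipeline summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`extract_ngrams`, `generate_variants`), the toy translation table, the greedy longest-match segmentation, and the `max_n` limit are all assumptions made for illustration. The grammar rules and POS tags that the real term generator relies on are left out entirely.

```python
from itertools import product

# Hypothetical toy n-gram table: English n-gram -> German translations.
# The real table holds about 95,000 English n-grams with POS-tagged
# German counterparts; these three entries are invented for illustration.
TOY_TABLE = {
    "chronic": ["chronische"],
    "kidney disease": ["Nierenerkrankung", "Nierenkrankheit"],
    "disease": ["Erkrankung", "Krankheit"],
}


def extract_ngrams(description, max_n=3):
    """Return all word n-grams (n = 1..max_n) of an English description."""
    tokens = description.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def generate_variants(description, table, max_n=3):
    """Combine n-gram translations into candidate German interface terms.

    Segmentation is greedy longest-match (an assumption of this sketch).
    No grammar rules are applied, so word order and inflection are
    simply copied from the table entries.
    """
    tokens = description.lower().split()
    segments = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            gram = " ".join(tokens[i:i + n])
            if gram in table:
                segments.append(table[gram])
                i += n
                break
        else:
            # No table entry covers this token: no variant can be built.
            return []
    # Every combination of the per-segment translations is one variant.
    return [" ".join(parts) for parts in product(*segments)]


print(sorted(extract_ngrams("Chronic kidney disease", max_n=2)))
print(generate_variants("Chronic kidney disease", TOY_TABLE))
```

Because every segment can contribute several translations, the number of variants grows multiplicatively. This is consistent with the reported averages of about 13 interface terms per concept (4,425,948 / 341,105) and 8.2 per description (4,425,948 / 542,462).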

Volume 264
Pages 153-157
DOI 10.3233/SHTI190202
Language English
Journal Studies in health technology and informatics
