

UMLS::Association - A Semantic Association Framework for Biomedical Texts

Keith Herbert

Natural Language Processing Lab, Department of Computer Science


We present UMLS::Association, a software package for exploring the semantic association of biomedical terms, with applications in literature-based discovery. Literature-based discovery is an endeavour to "connect the dots" for scientists between the topics of their research and others of unexpected relevance. However, many approaches depend on the exact wording used to express ideas in the research papers being analyzed. The Unified Medical Language System (UMLS) provides a way to map natural language phrases in these papers to sequences of abstract yet very specific concepts, each referred to by a Concept Unique Identifier (CUI). By measuring how often concepts occur together within a corpus of biomedical texts and applying statistical techniques, we can identify which concepts are strongly associated.


We measure the semantic association of CUIs with bigrams: pairs of CUIs that occur consecutively in a sequence. Research articles and clinical studies were first preprocessed by a UMLS tool that generates sequences of CUIs for every phrase within each sentence of the papers. Our framework then extracts bigrams from the CUI sequences to build a database from which we can calculate meaningful statistics for the association of two CUIs. We developed a utility that quickly returns a variety of statistical association measures for any two concepts, as well as an application programming interface that allows these association measures to be incorporated into new software packages.
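The bigram approach described above can be illustrated with a minimal Python sketch. It is not the UMLS::Association API: the function names are hypothetical, the counts are held in memory rather than in a database, and pointwise mutual information and the Dice coefficient are assumed here only as two common examples of statistical association measures.

```python
import math
from collections import Counter


def count_bigrams(sequences):
    """Count ordered pairs of adjacent CUIs, plus the marginal totals.

    `sequences` is an iterable of CUI lists, one list per phrase.
    """
    pair_counts = Counter()
    for seq in sequences:
        # zip(seq, seq[1:]) yields every pair of consecutive CUIs.
        pair_counts.update(zip(seq, seq[1:]))

    left = Counter()   # how often a CUI appears first in a bigram
    right = Counter()  # how often a CUI appears second in a bigram
    for (a, b), n in pair_counts.items():
        left[a] += n
        right[b] += n

    total = sum(pair_counts.values())
    return pair_counts, left, right, total


def pmi(a, b, pair_counts, left, right, total):
    """Pointwise mutual information (log base 2) of the ordered bigram (a, b)."""
    n_ab = pair_counts[(a, b)]
    if n_ab == 0:
        return float("-inf")  # the pair was never observed
    return math.log2(n_ab * total / (left[a] * right[b]))


def dice(a, b, pair_counts, left, right, total):
    """Dice coefficient of the ordered bigram (a, b)."""
    denom = left[a] + right[b]
    return 2 * pair_counts[(a, b)] / denom if denom else 0.0


# Placeholder identifiers, not real UMLS concepts.
sequences = [
    ["C0000001", "C0000002", "C0000003"],
    ["C0000001", "C0000002"],
    ["C0000003", "C0000001"],
]
counts = count_bigrams(sequences)
print(pmi("C0000001", "C0000002", *counts))
print(dice("C0000001", "C0000002", *counts))
```

Because the bigrams are ordered, the score for (a, b) generally differs from the score for (b, a). A symmetric variant would count each pair in both directions.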


We evaluated UMLS::Association's predictive performance for semantic association by running it on four datasets that had been tagged by human judges for semantic similarity and relatedness. The results show that our semantic association measures match human judgements of the association between concepts as well as or better than current state-of-the-art semantic similarity and relatedness measures.
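The abstract does not state which statistic was used to compare the measures with human judgements. One common choice for this kind of evaluation is Spearman's rank correlation, and the following Python sketch assumes it purely for illustration; the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)
```

Given one list of human ratings and one list of association scores for the same concept pairs, `spearman(human_ratings, scores)` returns a value near 1 when the measure orders the pairs much as the judges did.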


UMLS::Association provides an easy-to-use framework for the semantic association of concepts within biomedical literature. Work is in progress to extend the reach of the bigram model with a directed graph representation of the many unique CUI sequences generated for each phrase in a sentence. A user-friendly web application interface to our framework is also under development. In addition to providing access to existing functions, it will feature a directed graph visualization of the search results for concepts strongly associated with a query concept. This will allow any researcher to explore the semantic associations between concepts in a simple and intuitive way.

Publication Date


Subject Major(s)

Computer Science


Natural Language Processing, Medical Informatics, UMLS, Perl, MetaMap


Health Information Technology | Other Computer Sciences

Current Academic Year


Faculty Advisor/Mentor

Dr. Bridget McInnes


© The Author(s)

UMLS::Association - Measuring the Association Between Biomedical Terms