Teaching Computers to Read Can Drive Human Health Forward [online video]


7th Annual VCU 3MT® Competition, held on October 15, 2021.


This research involves teaching computers to read and extract data from clinical notes in order to conduct better clinical research.


Evan French: Over the course of the 20th century, American life expectancy increased more than 50 percent. You heard that, right? 50%. During that period, we spared thousands of children from paralysis by eradicating polio. We transformed HIV from a death sentence to a manageable condition. And just last year, we produced multiple vaccines for COVID-19. This is the power of clinical research. It informs the continuous improvement of patient care by analyzing big health care data aggregated from individual patient records.

Here's how it works. Every time you go to the doctor, information about you is captured in two forms. The first is facts like age, sex, heart rate, and diagnoses. The other is clinical notes, like this one, where a doctor records the patient's concerns along with their own clinical assessment and treatment plan.

Okay? So facts are relatively easy to filter and aggregate, which would make them ideal for data analysis, except they're often incomplete. So, information in the notes could fill in the gaps in the factual record. But to get the facts out of the notes, we would need an expert to read every single patient note. Not gonna happen. But, if we could get a computer to do it instead, we could extract that data quickly and cheaply, add it to the factual record, and perform higher quality research.

So here's where I come in. I'm teaching computers to read clinical notes. The premise of my research is that the meaning of a phrase can be inferred from its context in a sentence. We can exploit this with respect to clinical notes using natural language processing and machine learning techniques that I've outlined here.

So starting at step one, we have the computer aggregate millions of biomedical articles into a statistical model of which words appear frequently and how they're ordered in sentences. In other words, it's learning biomedical grammar. Next, we have an expert label phrases in a small set of actual clinical notes. This helps the computer to learn context clues that it needs to identify phrases in the real world. For example, if I told you that my daughter was diagnosed with atopic dermatitis, you understand that's a disease even if you've never heard of it before. And that's because the ending "-itis," coupled with the expression "diagnosed with," provide the context to make that assumption.

By helping the computers to understand grammar and recognize clinical context, they can identify phrases in real clinical notes and use them to fill in the gaps in the factual record. By teaching computers to read and extract data from notes, we can conduct better clinical research, which will inform better patient care. And better patient care will drive human health forward over the next century.
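
Editorial illustration (not part of the talk): the two steps described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the speaker's actual system. The talk describes a model learned from millions of articles and from expert-labeled notes; here, hand-written rules stand in for the learned model. The function names, the suffix list, and the `max_words` window are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Hypothetical suffix list and trigger words, chosen only for illustration.
# A real system would learn such cues from expert-labeled notes.
DISEASE_SUFFIXES = ("itis", "osis", "emia")
TRIGGER = ("diagnosed", "with")


def bigram_counts(sentences):
    """Step one (toy version): count how often each word pair appears in order."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        counts.update(zip(tokens, tokens[1:]))
    return counts


def find_disease_phrases(sentence, max_words=3):
    """Step two (toy version): use context clues to pick out likely disease phrases.

    A phrase qualifies if it directly follows "diagnosed with" and ends,
    within max_words words, in a word carrying a disease-like suffix.
    """
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", sentence)]
    phrases = []
    for i in range(len(tokens) - 1):
        if (tokens[i], tokens[i + 1]) == TRIGGER:
            window = tokens[i + 2 : i + 2 + max_words]
            for j, word in enumerate(window):
                if word.endswith(DISEASE_SUFFIXES):
                    phrases.append(" ".join(window[: j + 1]))
                    break
    return phrases


if __name__ == "__main__":
    corpus = [
        "The patient was diagnosed with arthritis.",
        "She was diagnosed with bronchitis.",
    ]
    print(bigram_counts(corpus)[("diagnosed", "with")])
    print(find_disease_phrases("My daughter was diagnosed with atopic dermatitis."))
```

The sketch mirrors the talk's own example: the suffix "-itis" together with the expression "diagnosed with" is enough to flag "atopic dermatitis" as a likely disease, while a sentence such as "She was diagnosed with a broken arm" yields nothing because no word in the window carries one of the assumed suffixes.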


© The Author

