Clinical Text Mining in R and Python
Monday, January 22, 2024, 1:00 pm - 2:00 pm
Location: Health Science Building HS 790
Registration Link:


Title: Clinical Text Mining: A Showcase of Computational Tools and Statistical Models in R and Python

Instructor: Christopher Meaney

As healthcare systems amass vast amounts of digital patient information, a significant portion of that information is recorded as clinical free text. This workshop shows how computational tools and statistical models in R and Python can be used to unlock the information held in this text.

The session will begin by exploring how digital character sequences (strings) are stored and manipulated in R and Python, using pattern matching and regular expressions. We’ll then turn to tokenization, the process of splitting text into words or other tokens, and compare several tokenizers.
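
As a rough illustration of these first topics, the Python sketch below uses the standard-library re module to pull blood-pressure readings out of a short note, then compares two simple tokenizers on the same text. The note, the pattern, and the helper names (whitespace_tokenize, regex_tokenize) are made up for this example and are not taken from the workshop materials.

```python
import re

# A short, made-up clinical note used only for illustration (not workshop data).
note = "Pt is a 67-y/o male w/ HTN & T2DM. BP 142/90 mmHg; HR 88 bpm."

# Pattern matching: capture blood-pressure readings written as "BP systolic/diastolic".
bp_pattern = re.compile(r"\bBP\s+(\d{2,3})/(\d{2,3})\b")
bp_readings = [(int(s), int(d)) for s, d in bp_pattern.findall(note)]

# Hypothetical tokenizer 1: split on runs of whitespace (punctuation stays attached).
def whitespace_tokenize(text):
    return text.split()

# Hypothetical tokenizer 2: runs of word characters, with each remaining
# non-space character (punctuation, symbols) emitted as its own token.
def regex_tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)

ws_tokens = whitespace_tokenize(note)
rx_tokens = regex_tokenize(note)

print(bp_readings)                      # → [(142, 90)]
print(len(ws_tokens), len(rx_tokens))   # → 15 25
print(ws_tokens)
print(rx_tokens)
```

The whitespace tokenizer leaves "T2DM." glued to its period, while the regex tokenizer breaks "67-y/o" into five pieces. Neither handles clinical abbreviations gracefully, which is one reason comparing tokenizers matters for clinical text.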

You’ll learn about statistical semantic models such as document-term matrices and term co-occurrence matrices, and see them applied to real-world clinical NLP tasks, including document classification, topic extraction, and token clustering. A highlight of the workshop is an introduction to large language models (LLMs), showcasing their effectiveness in clinical NLP tasks, particularly clinical text de-identification and named entity recognition.
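
To make the two matrix representations concrete, here is a minimal pure-Python sketch that builds a document-term matrix of raw counts and a document-level term co-occurrence table for three invented snippets. The snippets, the lower-case whitespace tokenization, and the choice of "same document" as the co-occurrence window are all illustrative assumptions, not the workshop's exact setup.

```python
from collections import Counter
from itertools import combinations

# Three tiny, made-up clinical snippets (illustrative only).
docs = [
    "chest pain and shortness of breath",
    "no chest pain reported",
    "shortness of breath on exertion",
]

# Assumption: simple lower-case whitespace tokenization.
tokenized = [d.lower().split() for d in docs]

# Vocabulary: sorted unique terms, so each term gets a fixed column index.
vocab = sorted({t for doc in tokenized for t in doc})

# Document-term matrix: one row per document, one column per term, raw counts.
dtm = []
for doc in tokenized:
    counts = Counter(doc)
    dtm.append([counts[t] for t in vocab])

# Term co-occurrence: number of documents containing each (alphabetically ordered)
# pair of distinct terms. Sliding word windows are another common choice.
cooc = Counter()
for doc in tokenized:
    for a, b in combinations(sorted(set(doc)), 2):
        cooc[(a, b)] += 1

def cooccurrence(term_a, term_b):
    """Symmetric lookup: order of the two terms does not matter."""
    return cooc[tuple(sorted((term_a, term_b)))]

print(vocab)
for text, row in zip(docs, dtm):
    print(row, text)
print(cooccurrence("pain", "chest"))    # → 2
```

In practice, a library such as scikit-learn's CountVectorizer builds the document-term matrix as a sparse matrix. Its rows can then serve as features for a document classifier or a topic model, while co-occurrence counts can feed a token clustering method.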

Here are this workshop’s materials.