Author ORCID Identifier

Defense Date


Document Type


Degree Name

Master of Science


Computer Science

First Advisor

Dr. Preetam Ghosh


Structure name standardization is a critical problem in Radiotherapy planning systems to correctly identify the various Organs-at-Risk, Planning Target Volumes and `Other' organs for monitoring present and future medications. Physicians often label anatomical structure sets in Digital Imaging and Communications in Medicine (DICOM) images with nonstandard random names. Hence, the standardization of these names for the Organs at Risk (OARs), Planning Target Volumes (PTVs), and `Other' organs is a vital problem. Prior works considered traditional machine learning approaches on structure sets with moderate success. We compare both traditional methods and deep neural network-based approaches on the multimodal vision-language prostate cancer patient data, compiled from the radiotherapy centers of the US Veterans Health Administration (VHA) and Virginia Commonwealth University (VCU) for structure name standardization. These de-identified data comprise 16,290 prostate structures. Our method integrates the multimodal textual and imaging data with Convolutional Neural Network (CNN)-based deep learning approaches such as CNN, Visual Geometry Group (VGG) network, and Residual Network (ResNet) and shows improved results in prostate radiotherapy structure name standardization. Our proposed deep neural network-based approach on the multimodal vision-language prostate cancer patient data provides state-of-the-art results for structure name standardization. Evaluation with macro-averaged F1 score shows that our CNN model with single-modal textual data usually performs better than previous studies. We also experimented with various combinations of multimodal data (masked images, masked dose) besides textual data. The models perform well on textual data alone, while the addition of imaging data shows that deep neural networks achieve better performance using information present in other modalities. Our pipeline can successfully standardize the Organs-at-Risk and the Planning Target Volumes, which are of utmost interest to the clinicians and simultaneously, performs very well on the `Other' organs. We performed comprehensive experiments by varying input data modalities to show that using masked images and masked dose data with text outperforms the combination of other input modalities. We also undersampled the majority class, i.e., the `Other' class, at different degrees and conducted extensive experiments to demonstrate that a small amount of majority class undersampling is essential for superior performance. Overall, our proposed integrated, deep neural network-based architecture for prostate structure name standardization can solve several challenges associated with multimodal data. The VGG network on the masked image-dose data combined with CNNs on the text data performs the best and presents the state-of-the-art in this domain.


© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission