Imon Banerjee, Associate Professor at Mayo Clinic, shared a post on LinkedIn:
“Excited to share our recent book chapter published by the National Cancer Institute (NCI): ‘Development of LLM for Prostate Cancer—The Need for Domain-Tailored Training’
Generic LLMs are powerful—but when it comes to complex, high-stakes domains like oncology, precision matters. We hypothesized that domain-specific models would outperform general-purpose LLMs on clinical tasks requiring deep knowledge and structured terminology.
What we did:
- Collected 1.8M+ clinical notes from 15,341 prostate cancer patients at Mayo Clinic
- Built domain-specific tokenizers and applied UMLS-guided two-phase training
- Trained and evaluated our model on tasks like clinical info prediction, treatment compliance, and QA
Key findings:
- Our prostate cancer–focused LLM outperformed GPT-2 across all tasks
- It also beat BioGPT, a 3× larger model, on treatment prediction, compliance checks, and more
- This work highlights the value of targeted data and domain knowledge in building trustworthy clinical AI.
Read the Full Chapter.”