Evaluation of a natural language processing approach to identify social determinants of health in electronic health records in a diverse community cohort
Background: Health care systems in the United States are increasingly interested in measuring and addressing social determinants of health (SDoH). Advances in electronic health record systems and natural language processing (NLP) create a unique opportunity to document patient SDoH systematically from digitized free-text provider notes.

Methods: Patient SDoH status, as recorded by the Your Current Life Situation (YCLS) Survey, and the associated provider notes recorded between March 2017 and June 2020 were extracted (32,261 beneficiaries; 50,722 YCLS surveys; 485,425 provider notes). NLP patterns were generated using a term-weighting statistic common in machine learning, term frequency-inverse document frequency (TF-IDF). Patterns were developed and assessed in training, training validation, and final validation datasets (64%, 16%, and 20% of the total data, respectively). NLP models analyzed three SDoH-specific categories (housing, medical care, and transportation needs) and a combined SDoH metric. Model performance was assessed using sensitivity, specificity, and the Cohen κ statistic, with the YCLS Survey treated as the gold standard.

Results: In the training validation dataset, sensitivity and specificity were lower for medical care than for housing and transportation, and agreement with the YCLS Survey was moderate for housing and transportation but weak for medical care (Housing: sensitivity=0.67, specificity=0.89, κ=0.51; Medical care: sensitivity=0.55, specificity=0.73, κ=0.20; Transportation: sensitivity=0.79, specificity=0.87, κ=0.58). Model performance was comparable in the training and training validation datasets. In the final validation dataset, the combined SDoH prediction metric showed sensitivity=0.77, specificity=0.69, and κ=0.45.

Conclusion: This NLP algorithm demonstrated moderate performance in identifying unmet patient social needs. This novel approach may enable better targeting of interventions, allocation of limited resources, and monitoring of how well a health care system addresses its patients’ SDoH needs.
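The abstract states that NLP patterns were derived from TF-IDF scores but does not say how scores became patterns. As a rough illustration only, the sketch below computes per-note TF-IDF weights and ranks terms by how much more heavily they weigh in notes linked to a positive survey response than in notes linked to a negative one. The tokenizer, the relative-frequency TF, the smoothed IDF formula, the scoring rule, and the function names `tf_idf` and `rank_terms` are all assumptions, not the study's method; the notes and labels are invented.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens; a deliberately simple (hypothetical) tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Per-document TF-IDF weights: relative term frequency times smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many notes each term appears at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, ln((1 + N) / (1 + df)) + 1, the same formula as scikit-learn's default.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    weights: list[dict[str, float]] = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        weights.append({t: (c / total) * idf[t] for t, c in counts.items()} if total else {})
    return weights


def rank_terms(notes: list[str], labels: list[int], top_k: int = 5) -> list[tuple[str, float]]:
    """Rank terms by mean TF-IDF in positive-labelled notes minus negative-labelled notes."""
    weights = tf_idf(notes)
    pos = [w for w, y in zip(weights, labels) if y == 1]
    neg = [w for w, y in zip(weights, labels) if y == 0]
    vocab = set().union(*weights)  # every term seen in any note

    def mean_weight(group: list[dict[str, float]], term: str) -> float:
        return sum(w.get(term, 0.0) for w in group) / len(group) if group else 0.0

    scores = {t: mean_weight(pos, t) - mean_weight(neg, t) for t in vocab}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


notes = [
    "patient is homeless and sleeping in car",
    "no ride to clinic, needs transportation",
    "follow up for hypertension",
    "hypertension stable, refill medication",
]
need_labels = [1, 1, 0, 0]  # hypothetical survey-derived labels, not study data
print(rank_terms(notes, need_labels, top_k=3))
```

Terms that appear only in notes without a reported need, such as "hypertension" here, receive negative scores and fall to the bottom of the ranking. A real pipeline would also need stop-word handling and clinical negation ("denies housing problems"), which this sketch ignores.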
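The 64%/16%/20% proportions are consistent with holding out 20% of the data for final validation and then setting aside 20% of the remaining 80% for training validation (0.8 × 0.8 = 0.64 and 0.8 × 0.2 = 0.16). The abstract does not say how the split was done or whether it was made by survey, note, or beneficiary. The sketch below assumes a split by hypothetical beneficiary ID, so that all of one patient's notes land in the same partition; the function name `three_way_split` and its parameters are illustrative.

```python
from __future__ import annotations

import random
from typing import Iterable


def three_way_split(
    patient_ids: Iterable[int],
    seed: int = 0,
    test_frac: float = 0.20,
    val_frac_of_rest: float = 0.20,
) -> tuple[list[int], list[int], list[int]]:
    """Split unique (hypothetical) patient IDs into (train, validation, test).

    Holding out 20% for testing and then 20% of the remaining 80% for
    validation yields 64% train, 16% validation, and 20% test.
    """
    unique = sorted(set(patient_ids))  # sort first so the seeded shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = round(len(unique) * test_frac)
    test, rest = unique[:n_test], unique[n_test:]
    n_val = round(len(rest) * val_frac_of_rest)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


train, val, test = three_way_split(range(1000))
print(len(train), len(val), len(test))  # 640 160 200
```

Splitting by patient rather than by note is a common guard against leakage, since the same person's phrasing or documented circumstances could otherwise appear in both training and validation data.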
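Sensitivity, specificity, and Cohen κ all come from the same 2 × 2 table of survey response (reference) against model flag. The sketch below shows the arithmetic for a binary need indicator; κ compares observed agreement with the agreement expected by chance from each rater's marginal totals. The function names and the toy labels are hypothetical, not the study's code or data.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(truth: Sequence[int], pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 means an unmet need."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(truth: Sequence[int], pred: Sequence[int]) -> dict[str, float]:
    """Sensitivity, specificity, and Cohen's kappa, treating `truth` as the reference."""
    tp, fp, fn, tn = confusion_counts(truth, pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference labels must contain both classes")
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)  # share of reference-positive patients the model flags
    specificity = tn / (tn + fp)  # share of reference-negative patients the model does not flag
    p_observed = (tp + tn) / n    # raw proportion of agreement
    # Agreement expected by chance, from each rater's marginal positive/negative totals.
    p_expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return {"sensitivity": sensitivity, "specificity": specificity, "kappa": kappa}


survey = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # toy reference labels, not study data
model = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]   # toy model flags
print(evaluate(survey, model))
```

On these toy labels the model agrees with the survey on 8 of 10 patients, giving sensitivity 0.75 and specificity about 0.83, but κ is only about 0.58 because 52% agreement would be expected by chance alone. This is why κ can be modest even when sensitivity and specificity look reasonable.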
Rouillard CJ, Nasser MA, Hu H, Roblin DW. Evaluation of a natural language processing approach to identify social determinants of health in electronic health records in a diverse community cohort. Med Care. 2022 Jan 5. doi:10.1097/MLR.0000000000001683. Epub ahead of print. PMID: 34984989.