This study evaluates whether adding machine learning-based risk information to electronic health record (EHR) lab result messages helps older adults understand their risk of developing diabetes, and how it affects their emotional responses, quality of life, and healthcare use.
Eligible participants are adults aged 65 years and older who have a UCLA primary care provider and a hemoglobin A1c level between 5.7% and 6.0%. Eligible individuals whose model-predicted risk falls in the lowest 15% of the eligible population are identified automatically when their lab results are processed. They are then randomly assigned to receive either standard lab result messages or modified messages that include a "very low risk" label generated by a machine learning model.
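As a rough illustration of the screening criteria, the sketch below checks age, primary care assignment, and hemoglobin A1c. It assumes both A1c bounds are inclusive and represents the UCLA primary care provider requirement as a simple boolean flag. The `Patient` record and the `is_eligible` function are hypothetical names for illustration, not part of the study's actual pipeline.

```python
from dataclasses import dataclass


# Hypothetical patient record; field names are illustrative assumptions.
@dataclass
class Patient:
    age_years: int
    hba1c_percent: float
    has_ucla_pcp: bool  # assumed flag for "has a UCLA primary care provider"


def is_eligible(p: Patient) -> bool:
    """Apply the stated eligibility criteria (A1c bounds assumed inclusive)."""
    return (
        p.age_years >= 65
        and p.has_ucla_pcp
        and 5.7 <= p.hba1c_percent <= 6.0
    )


if __name__ == "__main__":
    print(is_eligible(Patient(70, 5.8, True)))  # within all criteria
    print(is_eligible(Patient(64, 5.8, True)))  # under 65
    print(is_eligible(Patient(70, 6.1, True)))  # A1c above the range
```

Eligibility alone does not trigger randomization; only eligible patients who also fall in the lowest model-predicted risk group are assigned to a messaging arm.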
All randomized participants are invited to complete two surveys: one shortly after their lab result is posted in MyChart and a follow-up survey approximately 30 days later. The study also uses de-identified EHR data to examine healthcare utilization and progression to diabetes, and analyzes provider comments related to lab result messaging to compare response patterns between the two groups.
Prediabetes thresholds based on hemoglobin A1c were originally developed using younger, healthier populations and may not reflect the slower and more variable glycemic changes observed in older adults. Evidence from large community-based cohorts suggests that adults aged 65 years and older with A1c values in the prediabetes range are often more likely to return to normal glycemia than to progress to diabetes, creating uncertainty for patients and providers when interpreting lab results.
Machine learning models developed using de-identified UCLA Health EHR data from multiple annual cohorts between 2020 and 2024 demonstrated strong performance in predicting progression to diabetes. The final model is a CatBoost (gradient-boosted decision tree) model that uses approximately 94 routinely collected clinical variables to generate patient-specific risk scores. Its performance was evaluated across the yearly cohorts, and it is locked for the duration of the study: it is not retrained or otherwise updated as new data accrue.
The study uses a real-world, randomized deployment design. When lab results are processed, eligible individuals whose model-predicted risk falls in the lowest 15% of the eligible study population are identified automatically and randomly assigned to either modified or standard lab result messaging. De-identified EHR data and free-text provider comments are used to examine healthcare utilization, disease progression, and provider response patterns over time.
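The selection and assignment step can be sketched as follows. The description does not state how the 15% cutoff is applied in real time or which allocation scheme is used, so this sketch assumes (1) a fixed cutoff equal to the 15th percentile of risk scores in a hypothetical reference set of eligible patients, (2) patients scoring exactly at the cutoff count as selected, and (3) simple 1:1 randomization with a seeded generator. The functions `cutoff_from_reference` and `assign_arm` and the arm labels are hypothetical.

```python
import random
import statistics
from typing import List, Optional


def cutoff_from_reference(reference_scores: List[float], pct: int = 15) -> float:
    """Return the pct-th percentile of risk scores in a reference cohort.

    statistics.quantiles with n=100 returns the 99 cut points between
    percentiles, so index pct - 1 is the pct-th percentile.
    """
    cut_points = statistics.quantiles(reference_scores, n=100, method="inclusive")
    return cut_points[pct - 1]


def assign_arm(risk_score: float, cutoff: float, rng: random.Random) -> Optional[str]:
    """Return a study arm for a low-risk patient, or None if not selected."""
    if risk_score > cutoff:
        return None  # outside the lowest-risk group: not randomized
    # Assumed simple 1:1 randomization between the two messaging arms.
    return rng.choice(["modified", "standard"])


if __name__ == "__main__":
    rng = random.Random(2024)  # seeded so the toy run is reproducible
    reference = [i / 100 for i in range(1, 101)]  # toy scores 0.01 to 1.00
    cutoff = cutoff_from_reference(reference)
    print(f"cutoff = {cutoff:.4f}")
    for score in (0.05, 0.50):
        print(score, assign_arm(score, cutoff, rng))
```

Fixing the cutoff in advance, rather than recomputing it as patients accrue, is one way to keep selection consistent with the locked model; the study's actual approach may differ.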
All randomized participants are invited to complete two surveys, and both study arms receive the same surveys so that patient-reported outcomes can be compared between standard and modified laboratory result messaging. The first survey, administered shortly after the laboratory result is received, assesses immediate patient understanding of the result and emotional responses such as anxiety or reassurance. The second survey, administered approximately one month later, uses validated instruments to measure health-related quality of life, food-related quality of life and eating behavior, and perceived burden of healthcare. Surveys are distributed only to randomized participants; no eligibility criteria beyond randomization apply to survey participation.
By embedding model-generated risk information directly into routine EHR workflows, this study aims to generate evidence on whether precision-based communication can support more individualized, patient-centered care and inform future implementation across broader patient populations and clinical use cases.