Summary

Published Date: February 08, 2023

Summary: The goal was to develop easy-to-use, validated predictive models that identify beneficiaries experiencing homelessness from administrative data. Authors pooled enrollment and claims data for enrollees of the California Whole Person Care (WPC) Medicaid demonstration program, which coordinated care for a subset of Medicaid beneficiaries identified as high utilizers in 26 California counties (25 WPC Pilots). They also used public directories of social service and health care facilities. Using WPC Pilot-reported homelessness status, authors trained seven supervised learning algorithms with different specifications to identify beneficiaries experiencing homelessness. Predictors included address- and claims-based indicators, demographics, health status, health care utilization, and county-level homelessness rate. Model performance was assessed using balanced accuracy (BA), sensitivity, specificity, positive predictive value, negative predictive value, and area under the receiver operating characteristic curve (AUC). Authors included 93,656 WPC enrollees from 2017 and 2018, 37,441 of whom had a WPC Pilot-reported homelessness indicator.
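
For readers unfamiliar with the performance measures named above, the following is a minimal sketch of how they can be computed from true labels and predicted scores. It is not the authors' code: the function names, the 0.5 classification threshold, and the toy labels and scores are all assumptions made for illustration, and none of the numbers come from the study.

```python
import math

def safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else math.nan

def classification_metrics(y_true, y_score, threshold=0.5):
    """Compute threshold-based measures for binary labels (1 = homeless).

    The 0.5 threshold is an assumption for this sketch, not a value
    taken from the publication.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    sensitivity = safe_div(tp, tp + fn)  # true positive rate
    specificity = safe_div(tn, tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": safe_div(tp, tp + fp),    # positive predictive value
        "npv": safe_div(tn, tn + fn),    # negative predictive value
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return safe_div(wins, len(pos) * len(neg))

# Hypothetical toy data, invented purely for illustration.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

toy_metrics = classification_metrics(toy_labels, toy_scores)
toy_auc = auc(toy_labels, toy_scores)
print(toy_metrics)
print("AUC:", toy_auc)
```

Balanced accuracy averages sensitivity and specificity, so it is not inflated when one class is much larger than the other. AUC does not depend on a threshold, which is why the two measures are usually reported together.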

Findings: The random forest algorithm with all available indicators had the best performance (87% BA and 0.95 AUC), but a simpler generalized linear model (GLM) also performed well (74% BA and 0.83 AUC). Reducing the predictors to the five most important indicators in a GLM yielded only slightly lower performance. Authors conclude that, in the absence of a validated indicator, the likelihood of homelessness can be calculated from the county rate of homelessness, address- and claims-based indicators, and beneficiary age using the prediction model presented here.

Read the Publication: