Risk adjustment for regional healthcare funding allocations with ensemble methods : an empirical study and interpretation

Tuukka Holster*, Shaoxiong Ji, Pekka Marttinen

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review


We experiment with recent ensemble machine learning methods for estimating healthcare costs, using Finnish data that combine rich individual-level information on healthcare costs, socioeconomic status and diagnoses from multiple registries. Our data are a random 10% sample (553,675 observations) of the Finnish population in 2017. Using annual healthcare cost in 2017 as the response variable, we compare the performance of random forest, Gradient Boosting Machine (GBM) and eXtreme Gradient Boosting (XGBoost) with that of linear regression. Because machine learning methods are often seen as unsuitable for risk adjustment applications owing to their relative opaqueness, we also introduce visualizations from the machine learning literature that help interpret the contribution of individual variables to the prediction. Our results show that ensemble machine learning methods can improve predictive performance, with all of them significantly outperforming linear regression, and that a certain level of interpretation can be provided for them. We also find that individual-level socioeconomic variables improve prediction accuracy and that their effect is larger for machine learning methods. However, we find that the predictions used for funding allocations are sensitive to model selection, which highlights the need for comprehensive robustness testing when estimating risk adjustment models used in applications.
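To make the point about sensitivity to model selection concrete, the following is a minimal sketch of how individual-level cost predictions could translate into regional funding shares. It is not the authors' method: the proportional allocation rule, the function name `allocation_shares`, the region labels, and all numbers are assumptions made up for illustration.

```python
from collections import defaultdict


def allocation_shares(regions, predicted_costs):
    """Return each region's share of a fixed budget.

    Assumed rule (hypothetical): a region's share is proportional to the
    sum of the predicted costs of the individuals living in it.
    """
    totals = defaultdict(float)
    for region, cost in zip(regions, predicted_costs):
        totals[region] += cost
    grand_total = sum(totals.values())
    return {region: total / grand_total for region, total in totals.items()}


# Hypothetical toy data: one region label per person, plus predicted annual
# costs for the same five people from two different models.
regions = ["A", "A", "B", "B", "C"]
pred_linear = [1000.0, 3000.0, 2000.0, 2000.0, 2000.0]
pred_ensemble = [500.0, 4500.0, 1500.0, 1500.0, 2000.0]

shares_linear = allocation_shares(regions, pred_linear)
shares_ensemble = allocation_shares(regions, pred_ensemble)

# Compare the shares implied by the two sets of predictions.
for region in sorted(shares_linear):
    diff = shares_ensemble[region] - shares_linear[region]
    print(
        f"{region}: linear={shares_linear[region]:.2f} "
        f"ensemble={shares_ensemble[region]:.2f} diff={diff:+.2f}"
    )
```

In this toy case both models predict the same total cost, yet the regional shares differ, which is the kind of discrepancy a robustness check across models would be meant to surface.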

Original language: English
Journal: European Journal of Health Economics
Publication status: E-pub ahead of print - 2024
MoE publication type: A1 Journal article-refereed


  • Healthcare costs
  • Interpretation
  • Machine learning
  • Predictive modeling
  • Risk adjustment
  • Socioeconomic information

