MMR Dropout: A Predictive Approach

Author: Kevin Linares

Affiliation: University of Maryland

Published: May 5, 2026

Abstract
Measles vaccination dropout, in which children receive a first dose but fail to return for the essential second, poses a critical barrier to achieving protective immunity in Afghanistan, where coverage has stalled at approximately 83% and nearly 13,000 measles cases were reported among children under five in 2024 alone. This study constructs and validates a supervised machine learning classification framework to identify Afghan children at high risk of dropout using the 2023 UNICEF Multiple Indicator Cluster Survey (MICS) and World Bank provincial-level indicators. A cohort of 971 children aged 24 months and older who received at least one measles vaccine dose was used to train and evaluate five algorithms (logistic regression, bagging, random forest, neural network, and XGBoost) using an 80/20 train-test split with 10-fold cross-validation. Models were assessed on AUC-ROC and F1-score to balance sensitivity and precision under resource-constrained deployment conditions. The neural network achieved the highest F1-score (0.722) with an AUC of 0.633, outperforming random chance and identifying provincial population density, household characteristics, and housing tenure as key predictors of dropout. These findings offer actionable guidance for targeting limited public health resources toward children most at risk of incomplete vaccination in a fragile, conflict-affected setting.

1 Introduction

Measles stands as a formidable global public health challenge as it is more contagious than Ebola or tuberculosis. This virus continues to claim thousands of lives annually while disproportionately affecting the most vulnerable children under the age of five. Afghanistan currently faces a critical situation, identified by the World Health Organization (WHO) as a leading country experiencing significant measles outbreaks. Underscoring this urgency, the WHO reported nearly 13,000 measles cases among Afghan children under five in 2024 alone. This highlights the pervasive risk and the pressing need for effective preventative measures.

The cornerstone of measles prevention lies in achieving widespread immunity through vaccination. The Measles, Mumps, and Rubella (MMR) vaccine offers highly effective protection. Full immunity requires a two-dose regimen, and in Afghanistan it is typically recommended to be completed by the time a child reaches 24 months. Encouragingly, these vaccines are relatively inexpensive and should be readily available. However, a significant gap exists between the availability of the vaccine and achieving protective population immunity.

The benchmark for effective measles control and the potential for elimination is reaching and maintaining at least 95% two-dose vaccination coverage. According to UNICEF data from 2023, measles vaccine coverage in Afghanistan has stalled at approximately 83%. Crucially, this figure often reflects at least one dose, masking a more dangerous problem called vaccination dropout. Measles vaccination dropout occurs when children receive their initial measles vaccine dose but fail to return for the essential second dose. This incomplete vaccination leaves them insufficiently protected and contributes to the persistence of measles transmission within the community.

Measles vaccination dropout in Afghanistan is driven by several interconnected factors: ongoing instability and conflict, which force population displacement and disrupt established immunization schedules; geographic remoteness, which makes it difficult for vaccines to reach the children who need them; a fragile health infrastructure that lacks systematic tracking and reminder systems for vaccinations; and household poverty and low levels of education, which hinder health-seeking behaviors. Adding to these factors, a new challenge emerged this year as the incoming US administration drastically reduced foreign aid to this part of the world. These combined challenges underscore the critical need for innovative approaches, such as tailored interventions informed by predictive modeling, to identify and support children most likely to miss their crucial second dose. The primary goal of this study is to construct and validate a robust classification model using 2023 UNICEF MICS and World Bank data to identify Afghan children at high risk of measles vaccination dropout, comparing XGBoost with other methods and highlighting key predictors to guide public health efforts.

2 Method

2.0.1 Inputs

The analytical basis for this study is multi-level data collected in Afghanistan in 2023. Our primary source is the 2023 Afghanistan Multiple Indicator Cluster Survey (MICS), conducted under the auspices of UNICEF. The MICS uses a multi-stage cluster sampling design with systematic selection of households using proportionate allocation. Only households with children are eligible to participate in this survey. The MICS is designed to collect statistically rigorous data on a wide range of indicators concerning the health, development, and wellbeing of children and women.

From this rich dataset, we identified a specific cohort relevant to understanding measles vaccine dropout. Our initial sample consisted of 1,067 children who were 24 months or older at the time of the survey and who had received at least one dose of the measles vaccine. Of these, 96 children (about nine percent) had missing information and were dropped from the analysis, leaving 971 children. The age threshold ensures that the children were generally past the recommended age for receiving the second dose, making it possible to assess dropout. The outcome we aim to predict is measles vaccination dropout. We used responses to two MICS questions, whether the child had been vaccinated with MMR and how many times, to create a binary outcome variable coded as follows:

  • Coded as 1: Children who received the first measles vaccine dose but had not received the second dose by the time of the survey (dropout; 39 percent of the sample).

  • Coded as 0: Children who had received both the first and second doses (completion; 61 percent of the sample).
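
The coding rule above can be sketched as a small function. This is a minimal illustration in plain Python, not the R code used in the study; the argument names `ever_vaccinated` and `doses` are hypothetical stand-ins for the two MICS responses, and the handling of missing values is an assumption.

```python
from typing import Optional


def code_dropout(ever_vaccinated: bool, doses: Optional[int]) -> Optional[int]:
    """Return 1 for dropout, 0 for completion, None if outside the cohort."""
    # Assumption: children with no recorded measles dose, or with a missing
    # dose count, are excluded rather than coded.
    if not ever_vaccinated or doses is None or doses < 1:
        return None
    # Exactly one dose means the child started but did not finish the series.
    return 1 if doses == 1 else 0


# Hypothetical responses: (ever vaccinated, number of doses).
children = [(True, 1), (True, 2), (False, None), (True, None)]
outcomes = [code_dropout(v, d) for v, d in children]
print(outcomes)
```

Returning `None` for children outside the cohort keeps exclusion separate from the 0/1 outcome, so excluded rows cannot be mistaken for completions.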

2.0.2 Machine Learning Algorithms

We compared five supervised ML models (logistic regression, bagging, random forest, neural network, and XGBoost) to identify the most effective model for predicting the target variable based on performance metrics. The data were partitioned into an 80/20 training/testing split, and hyperparameter optimization was conducted via a 10-fold cross-validation procedure. We assessed model performance using the AUC-ROC, which provides a measure of discriminatory power. The ROC curve allows us to evaluate the trade-off between sensitivity and 1 - specificity across classification thresholds, while the AUC summarizes how well the model separates dropout cases from completion cases. We also used the F1-score, the harmonic mean of precision and recall, which serves as a balanced indicator of a model’s ability to correctly identify relevant instances while minimizing both false positives and false negatives. We chose these metrics because we are interested in correctly classifying true positive cases while minimizing false positives: we aim to produce a model that works well when foreign aid is limited, so that resources are not wasted on false positives. Figure 1 captures our workflow for this project in identifying the best model to predict measles vaccine dropout.
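
To make the two metrics concrete, the sketch below computes them from scratch in plain Python. It is illustrative only: the study's metrics were computed in R, and the labels and scores here are made-up toy values. The AUC is computed through its rank interpretation, the probability that a randomly chosen dropout case receives a higher score than a randomly chosen completion case.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc_roc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Ties count as half a win, which matches the trapezoidal ROC area.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = dropout, 0 = completion; scores are predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.5, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # one fixed threshold

print(round(f1_score(y_true, y_pred), 3))
print(round(auc_roc(y_true, scores), 3))
```

The F1-score depends on the threshold used to turn scores into classes (0.5 here), whereas the AUC is threshold-free, which is one reason the two metrics can rank models differently.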

Figure 1. Our Machine Learning Workflow Process

2.0.3 Feature Selection

We chose 17 features from the MICS survey to represent the contextual factors that drive measles vaccine dropout, such as household characteristics (e.g., access to electricity and running water, house construction materials), family socioeconomics (e.g., wealth score calculated by MICS), mother’s education level, and family participation in national immunization campaigns. There were four national immunization campaigns in the country, and the MICS asked respondents to identify which events they attended. For illustration purposes we collapsed these four variables into one, coded as attending one event ("some"), two or three events ("half"), or all four events ("all"). This variable appears on the x-axis in Figure 2, together with rural versus urban residence and mother’s education level, as they relate to the target variable. The figure suggests that the rural-urban divide is important for understanding the outcome, as is mother’s education level, which seems to correlate with participation in the immunization campaigns.
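
The collapsing of the four campaign indicators (im12a through im12d in Table 1) can be sketched as follows. This is a plain Python illustration rather than the study's R code, and the "none" category for children who attended no campaign is an assumption, since the text does not say how that case was coded.

```python
def collapse_campaigns(attended):
    """Map four yes/no campaign indicators to 'some', 'half', or 'all'."""
    if len(attended) != 4:
        raise ValueError("expected one indicator per campaign (four in total)")
    count = sum(1 for a in attended if a)
    if count == 4:
        return "all"
    if count >= 2:
        return "half"  # two or three events
    if count == 1:
        return "some"
    return "none"  # assumption: not described in the source


# Hypothetical rows, ordered as (im12a, im12b, im12c, im12d).
rows = [
    (True, False, False, False),
    (True, True, False, False),
    (True, True, True, False),
    (True, True, True, True),
]
print([collapse_campaigns(r) for r in rows])
```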

The MICS contains information about which province children live in; therefore, we were able to leverage World Bank indicators for each Afghan province to provide additional geographic contextual features. These province-level indicators include a conflict score, with values closer to 100 indicating extreme instability from armed conflict; female literacy rate; a health infrastructure score, with values closer to 100 indicating greater delivery of medical goods to the public; population density; and poverty rate. Table 1 presents the features used to predict the target outcome. Figure 3 shows how the target variable is distributed across the provinces, along with health institution availability and female literacy rates. The provinces vary across these indicators and in our outcome variable. In total, we had 22 features to predict the target variable.

Table 1. Features Used For Predicting Measles Vaccine Dropout

Variable                        Label
hh6                             Area
im12a                           Participate in campaign, national immunization day A
im12b                           Participate in campaign, national immunization day B
im12c                           Participate in campaign, national immunization day C
im12d                           Participate in campaign, national immunization day D
melevel                         Mother’s education
wscore                          Combined wealth score
hh52                            Number of children age 5-17
hc3                             Number of Room in House
hc4                             Main material of floor
hc7g                            Household have: A Mattress
hc8                             Household have electricity
hc14                            Household owns the dwelling
ws1                             Main source of drinking water
ws9                             Treat water to make safer for drinking
ws11                            Type of toilet facility
hw3                             Soap or detergent present at place of handwashing
conflict_index
female_literacy_rate
health_institutional_delivery
density_per_inhabited_area
poverty_rate
outcome

2.0.4 Feature Engineering

Across our factor features we collapsed levels when appropriate, such as when very few cases were observed in a level. For instance, very few mothers in the survey had a secondary or higher education (fewer than 14 percent), and therefore we collapsed all levels at secondary and above together. Similarly, very few had no formal education, and we collapsed this level with primary education. The drinking water source levels were collapsed into city, well, or other, and the toilet facility levels were collapsed into sewer, septic, pit, outhouse, latrine, and other. One-hot encoding was used for factor levels, and quantitative variables such as the family wealth score and province indicators were centered and scaled.
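
The three preprocessing steps described above (collapsing levels, one-hot encoding, and centering and scaling) can be sketched in plain Python. This is an illustration rather than the study's R code; the raw level names, the collapsed labels, and the wealth values are hypothetical.

```python
from statistics import mean, stdev


def collapse_levels(values, mapping, other="other"):
    """Recode each value through `mapping`; unmapped levels become `other`."""
    return [mapping.get(v, other) for v in values]


def one_hot(values):
    """Return the sorted levels and one 0/1 indicator row per observation."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


def center_and_scale(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical raw levels for mother's education.
education = ["none", "primary", "secondary", "higher", "primary"]
mapping = {
    "none": "primary_or_less",
    "primary": "primary_or_less",
    "secondary": "secondary_or_higher",
    "higher": "secondary_or_higher",
}
collapsed = collapse_levels(education, mapping)
levels, indicators = one_hot(collapsed)

# Hypothetical wealth scores, standardized to mean 0 and unit variance.
wealth_z = center_and_scale([-1.2, 0.3, 0.8, 1.5, -0.4])

print(levels)
print(indicators)
```

In a real pipeline the mean and standard deviation would be estimated on the training data only and then applied to the test data, so that no information leaks from the held-out set.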

3 Results

For computation we used R and conducted all of our modeling using tidymodels. We tuned hyperparameters using cross-validation during model training and chose the models with the best AUC and F1-scores. We started with logistic regression as a baseline against which to compare the other methods, which require more parameter tuning. Parameter tuning values for all five models are shown in Figures 4 and 5.
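
The resampling scheme, an 80/20 split followed by 10-fold cross-validation on the training portion, can be sketched in plain Python as below. This is not the study's R code: the seed is arbitrary, no stratification by outcome is applied (the text does not say whether the study stratified), and the resulting row counts are those of this sketch, not necessarily of the original analysis.

```python
import random


def train_test_split(n_rows, train_fraction=0.8, seed=0):
    """Shuffle row indices and split them into training and test sets."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = int(round(n_rows * train_fraction))
    return indices[:cut], indices[cut:]


def k_folds(indices, k=10):
    """Yield (analysis, assessment) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        assessment = folds[i]
        analysis = [idx for j, fold in enumerate(folds) if j != i
                    for idx in fold]
        yield analysis, assessment


train, test = train_test_split(971)
folds = list(k_folds(train, k=10))

print(len(train), len(test))
print(len(folds), len(folds[0][0]), len(folds[0][1]))
```

Each candidate hyperparameter combination is fit on the analysis rows and scored on the assessment rows of every fold; the test set is touched only once, after the final model has been chosen.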

The logistic regression was tuned only for the penalty parameter and was the least demanding algorithm to compute. Bagging was tuned for cost complexity between .00001 and .001, minimum n (8, 10, or 12, shown as columns in Figure 4), and tree depth between 2 and 6. Random forest was tuned for mtry between 3 and 10, minimum n between 8 and 16 (shown as columns in Figure 4), and number of trees (500, 1000, 1500, or 2000). For the neural network, fit with the nnet engine, we tuned the penalty (i.e., the amount of regularization), epochs, and hidden units. For this model we used the dials::grid_space_filling() function, which constructs parameter grids that cover the parameter space while avoiding combinations that are close to one another. This resulted in 300 combinations to use for tuning, which can be seen in Figure 5. XGBoost had the most parameters to tune, and here we relied on dials::grid_latin_hypercube(), which achieves something similar to the approach used for the neural network. This allowed us to tune tree depth, minimum node size, number of trees, minimum loss reduction, sample size (i.e., the proportion of observations sampled), mtry, and learning rate, resulting in training with 500 combinations of these parameters, shown in Figure 5.
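
The idea behind a Latin hypercube grid can be illustrated with a short, self-contained Python sketch. It is not the dials implementation, and the parameter names and ranges below are hypothetical. Each parameter's range is cut into as many equal-width strata as there are grid points, one value is drawn per stratum, and the strata are shuffled independently per parameter so that the points spread across the whole space.

```python
import random


def latin_hypercube(n_points, ranges, seed=0):
    """Draw n_points parameter combinations, one per stratum per parameter."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        # One uniform draw inside each of n_points equal-width strata.
        strata = [(i + rng.random()) / n_points for i in range(n_points)]
        # Shuffle so strata are paired at random across parameters.
        rng.shuffle(strata)
        columns[name] = [low + u * (high - low) for u in strata]
    return [{name: columns[name][i] for name in ranges}
            for i in range(n_points)]


# Hypothetical ranges; integer parameters would be rounded afterwards.
grid = latin_hypercube(
    5,
    {"learn_rate": (0.001, 0.3), "tree_depth": (2.0, 6.0)},
    seed=42,
)
for combo in grid:
    print(combo)
```

Compared with a regular grid, this design needs far fewer combinations to cover each parameter's full range, which matters when seven parameters are tuned at once as with XGBoost.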

Figure 4. Hyperparameter Tuning for ML Model Training

Figure 5. Continuation of Hyperparameter Tuning for ML Model Training

Once we had picked the best model for each algorithm based on F1-score and AUC, we applied each one to the test set and predicted measles vaccine dropout. We present performance metrics for each final model on the test set in Figure 6. All of the models were very close in performance; we chose the neural network because it performed best on the F1-score (.722), although its AUC (.633) was second to that of random forest. These scores indicate only modest predictive performance, but they do better than random chance. Figure 7 presents the most important variables for interpretability purposes.

4 Discussion

This study aimed to predict measles vaccination dropout among Afghan children using machine learning with 2023 MICS and World Bank data. We found that the neural network model performed best, exceeding random chance but with moderate performance. This highlights the challenge of predicting dropout in this complex setting. Key predictors identified by the model include provincial population density, number of rooms per house, having a septic tank, and renting a home. These findings underscore the multifaceted nature of vaccination dropout. While imperfect, the models offer a valuable tool for public health efforts: they can help identify at-risk children and regions, assist in targeting limited resources and interventions more effectively, and inform efforts to address factors such as health service access and campaign outreach.

Limitations include reliance on a small sample of almost 1,000 respondents, necessary feature simplification such as collapsing factor levels, and a reliance on tool parameter specification for model tuning. Future work could explore richer data sources or refined modeling techniques. In conclusion, this research demonstrates the utility of machine learning for identifying children at risk of measles vaccination dropout in Afghanistan. The models, despite limitations, provide actionable insights to guide targeted public health strategies aimed at improving crucial two-dose vaccination coverage.