A Shift to Precision using Predictive Analytics in Public Health
If you’ve ever stepped inside an Anganwadi Centre during a busy morning, you have probably seen a long queue of mothers and their small children, anxious with questions about weight and growth, and frontline workers doing their best to track hundreds of little lives. It’s both inspiring and overwhelming!
India has made significant progress in reducing child mortality and morbidity but continues to see high levels of malnutrition in early childhood. According to NFHS-5 (2019–21), 32.1% of children under five are underweight (low weight-for-age), 35.5% are stunted (low height-for-age), and 19.3% are wasted (low weight-for-height). Even with immense effort on the ground, frontline workers can only reach so many children in a day. And by the time growth faltering becomes visible, we’ve already lost precious time.
This reality motivated our team to explore something new:
- What if we could anticipate nutritional decline before it happens?
- What if we could help frontline workers focus on children who need attention right now, not weeks later?
These questions became the foundation of a project, initially supported by USAID’s Development Innovation Ventures (DIV) and later completed with funding from Open Road Impact (ORI). Its aim was to bring predictive analytics in public health to the community level, specifically within nutrition programs, by enabling earlier, more targeted action.
Partnering with the American India Foundation (AIF), we built a machine-learning model—an application of predictive analytics in public health—using data from their CommCare-based SNEH (Skilling, Nutrition Education, and Health) program in Madhya Pradesh and Odisha. The goal wasn’t simply to identify malnutrition, but rather to predict the transition from healthy to malnourished, an ambitious and operationally transformative shift from treatment to prevention.
While this work began as a pilot, the questions it raises are deeply relevant for policymakers, implementers, and funders seeking to move public health systems from reactive response to anticipatory care.
How We Built the Predictive Model & Why It Matters
Building this model was as much a human journey as it was a technical one.
A Dataset Rooted in Real Life
We began with a robust dataset of 22,767 children between 6 and 60 months of age, tracked over an 18-month period. The data came from the daily work of community health workers and included:
- Height and Weight Measurements (anthropometric data)
- Family Demographics and Registration
- Nutrition Supplement Records
- Home Visit Observations
These were not abstract numbers. Together, they told living stories of how children grow, stall, recover, or decline over time.
Thinking in Time, Not Just Snapshots
Children do not become malnourished overnight. Risk often builds gradually, through small and compounding changes.
To reflect this reality, the model was designed to look at growth trajectories rather than static measurements. It used:
- 120 days of past data to construct features
- 60 days into the future to predict transitions
This allowed the model to ask a simple but powerful question: Is this child on a growth trajectory that may worsen soon?
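To make the idea of windows concrete, here is a minimal sketch of how one child's measurement history could be summarised into the two 60-day windows described above. The function name, the dictionary keys, and the exact window boundaries are illustrative assumptions, not the project's actual pipeline.

```python
from datetime import date
from statistics import mean

LOOKBACK_DAYS = 120  # total history used for features (from the article)
WINDOW_DAYS = 60     # t[-1] = past 60 days, t[-2] = prior 60-120 days

def window_means(measurements, as_of):
    """Average z-scores per window; measurements is a list of (date, z_score).

    Hypothetical helper: boundaries [0, 60) and [60, 120) are an assumption.
    """
    recent, prior = [], []
    for day, z in measurements:
        age_days = (as_of - day).days
        if 0 <= age_days < WINDOW_DAYS:
            recent.append(z)
        elif WINDOW_DAYS <= age_days < LOOKBACK_DAYS:
            prior.append(z)
        # anything older than 120 days (or dated after as_of) is ignored
    return {
        "t_minus_1": mean(recent) if recent else None,
        "t_minus_2": mean(prior) if prior else None,
    }

# Made-up history for one child (weight-for-age z-scores)
as_of = date(2023, 6, 1)
history = [
    (date(2023, 1, 10), -0.5),  # older than 120 days, ignored
    (date(2023, 2, 15), -1.0),
    (date(2023, 3, 20), -1.2),
    (date(2023, 4, 25), -1.5),
    (date(2023, 5, 20), -1.7),
]
features = window_means(history, as_of)
print(features)
```

In this made-up example the recent window averages lower than the prior one, which is the kind of downward drift a trajectory-based model can pick up before a child crosses any threshold.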
Predicting Four Critical Transitions
Because the AIF program already had an established intervention for children who were stunted or wasted, the model focused on prevention. Specifically, it predicted four types of nutritional transitions among children who were not yet severely malnourished:
- Stunting Transition: A child with normal height-for-age (HFA) who becomes stunted (HFA z-score < -2).
- Wasting Transition: A child with normal weight-for-height (WFH) who becomes wasted (WFH z-score < -2).
- Becoming Underweight Transition: A child with normal weight-for-age (WFA) who becomes underweight (WFA z-score < -2).
- Underweight Worsening: A child with low WFA who experiences a further decrease in WFA z-score.
Rather than building separate models for each transition, we ultimately developed a single unified model. This simplified interpretation for frontline teams and performed better operationally.
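The four definitions above can be written down as simple rules. The sketch below is illustrative only: it assumes "normal" means a z-score of -2 or higher, treats any further drop as "worsening", and assumes the unified target is simply "any of the four transitions occurred", which the write-up does not spell out.

```python
THRESHOLD = -2.0  # z-score cut-off for stunted / wasted / underweight (from the article)

def label_transitions(now, future):
    """Label transitions from z-scores at prediction time and 60 days later.

    now / future: dicts with keys 'hfa', 'wfh', 'wfa'.
    Hypothetical helper; the combined 'any_transition' target is an assumption.
    """
    labels = {
        "stunting": now["hfa"] >= THRESHOLD and future["hfa"] < THRESHOLD,
        "wasting": now["wfh"] >= THRESHOLD and future["wfh"] < THRESHOLD,
        "becoming_underweight": now["wfa"] >= THRESHOLD and future["wfa"] < THRESHOLD,
        "underweight_worsening": now["wfa"] < THRESHOLD and future["wfa"] < now["wfa"],
    }
    labels["any_transition"] = any(labels.values())
    return labels

# Made-up child: height slipping below -2, already underweight and declining further
example = label_transitions(
    now={"hfa": -1.8, "wfh": -1.0, "wfa": -2.3},
    future={"hfa": -2.1, "wfh": -1.2, "wfa": -2.6},
)
# Made-up child whose scores stay in the normal range
stable = label_transitions(
    now={"hfa": -0.5, "wfh": -0.4, "wfa": -0.6},
    future={"hfa": -0.6, "wfh": -0.3, "wfa": -0.7},
)
print(example)
```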
The Technology Behind the Scenes
We used a Random Forest Classifier (via scikit-learn) because it’s:
- Easy to interpret
- Robust against noisy data
- Well-suited for non-linear interactions
From an initial set of 54 engineered variables, the final model relied on 26 carefully selected features that struck a balance between predictive performance, explainability, and practical usability in the field.
What the Model Found: Key Predictive Factors
To understand which features contributed most to accurate predictions, we used Gini importance. These results reflect predictive correlations in the data and should not be interpreted as causal evidence.
The strongest signals came from recent growth patterns, particularly changes observed in the last two to four months.
| Predictive Factor | Feature Importance (Relative Weight) |
| Mean HFA Z-score, t[-1] (past 60 days) | 0.0828 |
| Mean HFA Z-score, t[-2] (prior 60-120 days) | 0.0786 |
| Mean WFA Z-score, t[-1] (past 60 days) | 0.0587 |
| Mean WFA Z-score, t[-2] (prior 60-120 days) | 0.0540 |
Recent trajectories mattered far more than static characteristics. How often a child was measured, how consistently growth was tracked, and whether small declines appeared over time all played a critical role.
Meanwhile, factors like:
- Parental education
- Gender
- Caste
- Number of siblings
- Home visits
…carried lower but still meaningful weights.
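For readers who want to see where such weights come from, here is a minimal sketch using scikit-learn on synthetic data. The feature names, the synthetic risk formula, and the hyperparameters are all assumptions for illustration; none of it reproduces the project's data or settings. A fitted `RandomForestClassifier` exposes impurity-based (Gini) importances through its `feature_importances_` attribute.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_children = 2000

# Hypothetical features: two growth-trajectory signals, one demographic, one pure noise
feature_names = ["mean_hfa_t_minus_1", "mean_wfa_t_minus_1", "n_siblings", "noise"]
X = rng.normal(size=(n_children, len(feature_names)))

# Synthetic outcome: risk rises mainly as the two z-score features fall
logit = -1.7 - 1.2 * X[:, 0] - 0.8 * X[:, 1] + 0.1 * X[:, 2]
y = (rng.random(n_children) < 1 / (1 + np.exp(-logit))).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances are normalised to sum to 1 across features
importances = model.feature_importances_
ranked = sorted(zip(feature_names, importances), key=lambda kv: kv[1], reverse=True)
for name, weight in ranked:
    print(f"{name}: {weight:.4f}")
```

Impurity-based importances describe how much each feature helped the trees split the training data. Like the weights reported here, they are predictive signals rather than causal effects.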
Why does this matter?
Frontline workers make critical decisions every day, often with limited information and overwhelming caseloads. Our findings showed that while experience-based judgement is valuable, it does not always capture early and subtle signs of nutritional risk.
As a tool for predictive analytics in public health, the model was able to detect small shifts in growth patterns, often following illness or periods of low dietary diversity, well before those risks became visible. Used responsibly, tools like this can provide an additional layer of decision support, helping frontline workers focus attention where it is most urgently needed.
While the relative importance of individual features will vary across geographies, the broader pattern is likely to hold. Recent growth trajectories, measurement consistency, and early deviations matter more than static demographic traits. Any adaptation of this approach should retrain and recalibrate feature weights using local data.
How the Model Performed
When tested on a holdout dataset, the model showed:
- Average Precision: 0.53
- Precision at 0.5 Recall: 0.55
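- Improvement: Roughly 3.4 times the precision of a classifier that ranks children completely at random. Such a classifier's expected precision equals the share of positive cases, which is 0.16 here (a class balance of about 16% positive to 84% negative), and 0.55 / 0.16 is about 3.4.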
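The arithmetic behind the improvement figure, and one way to read "precision at 0.5 recall", can be sketched as follows. The helper function and the toy scores are illustrative assumptions; only the 0.55 and 0.16 figures come from the results above.

```python
def precision_at_recall(scores, labels, target_recall=0.5):
    """Precision at the shortest score-ranked list whose recall reaches target_recall.

    Hypothetical helper for illustration; ties between scores are not handled specially.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    if total_positives == 0:
        return 0.0
    true_positives = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        if true_positives / total_positives >= target_recall:
            return true_positives / k
    return 0.0

# Toy example: 8 children, 4 of whom actually transitioned (label 1)
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0, 1, 0, 1]
toy_precision = precision_at_recall(toy_scores, toy_labels)  # 2 hits in the top 3

# Reported figures: a random ranking has expected precision equal to prevalence
prevalence = 0.16
reported_precision = 0.55
lift = reported_precision / prevalence
print(f"toy precision at 0.5 recall: {toy_precision:.2f}, lift over random: {lift:.1f}x")
```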
For the intervention design, where each village received a list of the four children most at risk, the model achieved:
- Precision@4: 0.32
- Recall@4: 0.48
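As a rough illustration of how list-based metrics like these can be computed, the sketch below ranks children within each village, takes the top four, and averages precision and recall across villages. Averaging per village (rather than pooling all children) is an assumption, as are the function name and the toy records.

```python
from collections import defaultdict

def precision_recall_at_k(records, k=4):
    """Mean per-village precision@k and recall@k.

    records: iterable of (village, risk_score, transitioned) with transitioned in {0, 1}.
    Hypothetical helper; villages with no positives are skipped for recall.
    """
    by_village = defaultdict(list)
    for village, score, transitioned in records:
        by_village[village].append((score, transitioned))

    precisions, recalls = [], []
    for children in by_village.values():
        children.sort(key=lambda child: child[0], reverse=True)
        top = children[:k]
        hits = sum(t for _, t in top)
        positives = sum(t for _, t in children)
        precisions.append(hits / len(top))
        if positives:
            recalls.append(hits / positives)

    mean_precision = sum(precisions) / len(precisions)
    mean_recall = sum(recalls) / len(recalls) if recalls else 0.0
    return mean_precision, mean_recall

# Toy data: village A has 6 children (3 transitioned), village B has 5 (1 transitioned)
toy_records = [
    ("A", 0.9, 1), ("A", 0.8, 0), ("A", 0.7, 1), ("A", 0.6, 0), ("A", 0.5, 1), ("A", 0.4, 0),
    ("B", 0.7, 0), ("B", 0.6, 1), ("B", 0.5, 0), ("B", 0.3, 0), ("B", 0.2, 0),
]
p_at_4, r_at_4 = precision_recall_at_k(toy_records, k=4)
print(f"Precision@4: {p_at_4:.3f}, Recall@4: {r_at_4:.3f}")
```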
While not perfect, this performance was significantly stronger than relying on subjective judgment alone. In population-scale programs, even modest gains in precision can translate into thousands of children receiving timely support who might otherwise be missed. For policymakers, this means better use of constrained resources without increasing frontline workload.
From Predictions to People: What Happened in the Field
The real test of this work was never the algorithm itself; it was what frontline workers did with it.
At every Anganwadi Centre, the model identified the four children most likely to experience a decline in their nutritional status over the next two months. To act on these insights, AIF deployed Integrated Community Facilitators (ICFs), a supervisory cadre that complements the work of Anganwadi Workers by providing focused mentoring and household-level support.
Once the high-risk children were flagged, ICFs visited their families and provided a package of focused support:
- Tailored nutrition counselling
- Raising awareness about the importance of regular growth monitoring
- Identifying behaviour changes that could improve the child’s nutritional status, with guidance and counselling around them
- Continued follow-up and problem-solving support to help sustain these changes over time

This simple shift from trying to serve all families with equal intensity to proactively supporting those most at risk proved transformative. It made frontline workers’ time more meaningful, ensured timely interventions, improved follow-up consistency and brought precision to community-level nutrition work.
Implementation Challenges and What We Learned Along the Way
While the model showed promise, implementation was not without its challenges.
- Data quality varied across sites, reflecting the realities of large-scale field data collection where measurements are taken under time pressure and difficult conditions.
- The intervention window was relatively short, which limited our ability to observe measurable anthropometric change, even when positive behaviour change was clearly underway.
- Introducing predictive prioritisation required careful change management. ICFs needed time, training, and trust to adapt to a more focused approach, one that prioritised children at higher risk while ensuring conversations with caregivers remained supportive, clear, and non-alarming.
Alongside these challenges, several lessons consistently stood out across states, teams, and villages:
- Collaboration Matters More Than Code
The decision to predict future malnutrition, not current malnutrition, came from deep collaboration between technical teams and nutrition experts. Neither group could have arrived at the same solution alone.
- Accuracy Isn’t Everything: Usability Wins
Yes, the model could be more “accurate” in theory. But the point was the impact on the ground:
- Simple output
- Clear ranking
- Actionable insights
- Trust and usability for frontline workers
This model helped shift from uniform outreach to precision counselling, which is more efficient, more impactful, and less burdensome.
- Context Is Everything
A model trained in two states does not automatically generalise to all of India. Cultural patterns, feeding practices, disease profiles, and even measurement habits vary. Future models will need a “base architecture” that can be fine-tuned regionally using local data.
Taken together, these experiences reinforced an important principle: predictive analytics can strengthen public health systems, but only when paired with thoughtful implementation, human oversight, and respect for local realities.
Implications for Public Health Policy and Practice
The findings from this work extend far beyond a single pilot and offer several broader lessons for public health systems:
- Predictive analytics can strengthen targeting and supervision.
The model performed better than subjective judgment, demonstrating how machine learning can support more objective, equitable identification of vulnerability.
- Prevention-first systems are possible.
Instead of waiting for a child’s nutrition status to deteriorate, prediction allows health systems to intervene earlier. This not only improves outcomes but also reduces the long-term costs associated with treating severe malnutrition.
- Decision support reduces frontline burden.
With limited time and resources, frontline workers often struggle to give equal attention to every case. Risk scores help them focus where support is most urgently needed, easing their burden while increasing the impact of their work.
- Absolute risk matters more than ranking alone.
The model highlights children who truly need urgent attention, not just those at the top of a list. This distinction ensures that the most vulnerable are consistently reached.
Beyond child nutrition, this approach has relevance across a range of public health programs where early signals matter. Similar predictive models could be used to anticipate pregnancy-related risks during the antenatal and postnatal period, such as identifying mothers at higher risk of complications, drop-off from care, or delayed follow-up. Predictive analytics could also help flag irregularities in service delivery data, support early warning systems for disease surveillance, or strengthen supervision by identifying where frontline support may be most needed.
The specific predictors and model weights would naturally vary by context, population, and program. However, the underlying principle remains consistent: combining routine program data with temporal patterns to enable earlier, more targeted, and more equitable action.
Looking Forward
This project didn’t just build a model; it built a new way of thinking!
- A way where frontline workers have sharper tools.
- Where at-risk children are seen earlier.
- Where technology supports, not replaces, human decision-making.
We see this work as a starting point, not a finished product. We are continuing to refine the model, improve its generalisability, and explore how similar approaches can be responsibly integrated into government systems and other public health domains. For organisations interested in adapting or testing this approach in their own contexts, we welcome collaboration and shared learning.
Because at its heart, this isn’t just about algorithms; it’s about giving every child a fair chance to grow, thrive, and stay healthy. And sometimes, all it takes is noticing the small signals, just a little sooner!
To learn more or explore collaboration, contact:
[Appendix]
The table below lists the encoded features and their corresponding feature importance scores in decreasing order:
| Predictive Factor | Feature Importance (Relative Weight) |
| Mean HFA Z-score, t[-1] (past 60 days) | 0.0828 |
| Mean HFA Z-score, t[-2] (prior 60-120 days) | 0.0786 |
| Mean WFA Z-score, t[-1] (past 60 days) | 0.0587 |
| Mean WFA Z-score, t[-2] (prior 60-120 days) | 0.0540 |
| Mean weight, t[-2] (prior 60-120 days) | 0.0484 |
| Mean weight, t[-1] (past 60 days) | 0.0460 |
| Mean height, t[-1] (past 60 days) | 0.0448 |
| Mean height, t[-2] (prior 60-120 days) | 0.0417 |
| Number of days since first measurement | 0.0413 |
| Mean WFH Z-score, t[-1] (past 60 days) | 0.0392 |
| Mean WFH Z-score, t[-2] (prior 60-120 days) | 0.0391 |
| Target month | 0.0390 |
| Current age | 0.0350 |
| Mean time to complete visit form | 0.0292 |
| Mean scaled number of days meals received | 0.0281 |
| Birth weight | 0.0279 |
| Number of measurements | 0.0174 |
| Number of siblings | 0.0121 |
| State, Odisha | 0.0113 |
| District, Vidisha | 0.0108 |
| Number of home visits | 0.0083 |
| Father completed secondary schooling | 0.0072 |
| Gender, female | 0.0067 |
| Gender, male | 0.0064 |
| Mother completed primary schooling | 0.0064 |
| Family unit caste | 0.0063 |
| Mother completed secondary schooling | 0.0062 |
| Father completed primary schooling | 0.0057 |
| Number of fever checks | 0.0053 |


