Grading evidence

How evidence scale predicts successful policies

In 2017, Indonesia faced a crisis: over 30 percent of children under five were stunted. This was not only a nutritional issue. Research by the World Bank, UNICEF and the Indonesian Ministry of Health found that sanitation, maternal health and poverty were just as critical.

Armed with this evidence, the government launched the National Strategy to Accelerate Stunting Prevention in 2018. By 2022, stunting rates had fallen to 21 percent. The strategy's success was not only about using research; it was about using evidence that had been tested in diverse settings and at scale.

Good policy decisions should rest on good information, but in reality things are rarely that simple. Public opinion, budget constraints, political pressures, and unforeseen events, such as a global pandemic, can upend policy agendas overnight. Even the best research is just one factor in a complex decision-making process.

Policymakers must also decide what kind of evidence carries the most weight. Should they trust large-scale studies over small case studies? Are randomised controlled trials always superior to observational research? And does the scale of evidence, such as sample size and the amount of real-world testing, predict whether a policy will succeed?

A recently released paper by a Columbia University researcher, Assessing evidence based on scale can be a useful predictor of policy outcomes, investigates this last question.

The paper takes a fascinating angle. Faced with one study of a few dozen people and another of a thousand, most of us would trust the bigger one. Following that intuition, the paper ranks evidence not by the research method used but by its scale: how large it is and how much it has been tested in the real world.

Scale in Evidence-Based Policy

The study surveyed 251 policymakers and found something surprising: no single type of evidence was universally preferred. Even randomised controlled trials, often seen as the “gold standard”, were not always the most useful. Sometimes a quick focus group to gauge public opinion was more valuable than a year-long study, particularly when decisions had to be made quickly.

So how can different types of evidence be compared for reliability? The research applies a framework called THEARI, which classifies evidence into five levels based not on research method but on how extensively the evidence has been tested:

THEARI Framework (Ruggeri et al., 2020):

  • Level 1: Theoretical or opinion-based (no empirical data)
  • Level 2: Empirical (small surveys, lab experiments, limited trials)
  • Level 3: Applicable (evidence from field studies in real-world settings)
  • Level 4: Replicable (evidence tested in multiple real-world settings)
  • Level 5: Impact (evidence of success at scale across multiple settings)
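To make the ordering of the levels concrete, here is a minimal sketch in Python. It assumes that a body of evidence can be summarised by four yes/no properties (the field names such as `tested_in_field` and the `grade` function are hypothetical, not part of the published framework). In practice THEARI is applied through expert judgement; the sketch only mirrors the cumulative structure of the five levels listed above.

```python
from dataclasses import dataclass
from enum import IntEnum


class TheariLevel(IntEnum):
    """The five THEARI levels, ordered from least to most extensively tested."""
    THEORETICAL = 1
    EMPIRICAL = 2
    APPLICABLE = 3
    REPLICABLE = 4
    IMPACT = 5


@dataclass
class EvidenceProfile:
    # Hypothetical yes/no descriptors of a body of evidence (an assumption for illustration).
    has_empirical_data: bool = False
    tested_in_field: bool = False
    replicated_in_multiple_settings: bool = False
    shown_at_scale: bool = False


def grade(evidence: EvidenceProfile) -> TheariLevel:
    """Return the highest level reached, requiring every lower level to be met first."""
    if not evidence.has_empirical_data:
        return TheariLevel.THEORETICAL
    if not evidence.tested_in_field:
        return TheariLevel.EMPIRICAL
    if not evidence.replicated_in_multiple_settings:
        return TheariLevel.APPLICABLE
    if not evidence.shown_at_scale:
        return TheariLevel.REPLICABLE
    return TheariLevel.IMPACT


# Example: a single lab experiment versus a programme replicated and scaled up.
lab_study = EvidenceProfile(has_empirical_data=True)
scaled_programme = EvidenceProfile(True, True, True, True)
print(grade(lab_study).name)         # EMPIRICAL
print(grade(scaled_programme).name)  # IMPACT
```

The checks are deliberately cumulative, an assumption that reflects each level building on the one below: under this sketch, a claim of success at scale with no underlying empirical data would still be graded Level 1.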

Does Scale Predict Policy Success?

To test THEARI’s predictive power, the study examined 82 policies across healthcare, education, energy, and other areas. Policies based only on Level 1 evidence, that is, theory alone, had a success rate of just 38 percent. Policies based on Level 2 evidence, such as small studies and surveys, did better, with a 50 percent success rate. Even limited empirical testing made a difference.

Policies based on Level 5 evidence, tested and confirmed many times across multiple settings, had a 78 percent success rate, roughly double the rate for theory alone. That is a huge difference. It suggests that investing in solid, well-tested research can pay off in policy, and that the scale of evidence matters.

Although Level 1 evidence was linked to a lower success rate, the THEARI framework still treats it as valuable. Theory can stimulate new ideas and approaches, especially when data are lacking. It can also be used to explore entirely new policy areas, or to begin designing a policy by defining the problem, setting goals and identifying potential solutions.

Real-World Application

THEARI can be used when compiling evidence for particular policy questions. Other uses include scoping reviews and structuring literature reviews. In 2022, the United Kingdom’s Competition and Markets Authority used the framework to review online market formats in its consumer research.

THEARI can also be used by policy advisors to make formal recommendations to decision-makers. It helps policymakers and funding bodies strike a responsible balance between established and untested approaches.

Policymaking is messy, but research can play a critical role in improving outcomes, provided the right kind of evidence is used. THEARI offers a way to assess not just what the research says, but how thoroughly it has been tested. In an era where misinformation and rushed decisions can have huge consequences, tools like this could help policymakers make better, more informed choices.

Read further: Ruggeri, K. (2024). Assessing evidence based on scale can be a useful predictor of policy outcomes. Policy Sciences.
