Do Evidence-Based Policy Clearinghouses Provide Good Guidance for Local Policymakers?
- Greg Thorson

Orr (2026) asks whether evidence-based policy clearinghouses give reliable advice to local policymakers when program impacts vary across places. He examines data from six large multisite randomized controlled trials in education and youth programs, drawing on site-level impact estimates and cross-site variation reported in prior studies. Using a Bayesian model, he evaluates how often a clearinghouse rule based on statistically significant average effects leads to correct local decisions. He finds that when cross-site impact variation is moderate to large, the probability of making the correct local decision is often low, frequently near or below 60 percent. Even advanced prediction methods offer only limited improvement.
Why This Article Was Selected for The Policy Scientist
This article addresses a central problem in contemporary evidence-based policymaking: whether results from rigorously designed studies meaningfully inform decisions made by local governments. That question is especially salient as federal and state agencies increasingly mandate the use of clearinghouse ratings in funding and program adoption, often without regard to local context. Orr has written extensively on external validity and policy decision-making, and this article consolidates and extends that body of work at a moment when replication concerns and heterogeneous treatment effects are widely recognized. The analysis relies on high-quality data from well-known multisite randomized controlled trials, which strengthens internal validity. The Bayesian framework is well suited to evaluating decision accuracy under heterogeneity and builds directly on leading work in causal inference. The findings raise important questions about the generalizability of average treatment effects across jurisdictions.
Full Citation and Link to Article
Orr, L. L. (2026). Do “Evidence-Based Policy” Clearinghouses Provide Good Advice for Local Policymakers? Journal of Policy Analysis and Management, 45(1), 1–??. https://doi.org/10.1002/pam.70077
Central Research Question
This article examines whether evidence-based policy clearinghouses provide reliable guidance to local policymakers when policy impacts vary across sites. The central question is not whether multisite randomized controlled trials are internally valid, but whether the standard clearinghouse rule—treating statistically significant average effects as evidence of effectiveness—leads to correct adoption or rejection decisions at the local level. The analysis focuses on “broad-to-narrow” external validity: whether findings from multisite trials meaningfully predict outcomes in individual jurisdictions. The study asks how often a policymaker who follows clearinghouse advice would make the same decision they would make if the true local impact were known, and how that probability changes with cross-site heterogeneity, prior beliefs, and alternative prediction methods.
Previous Literature
The study builds on a long-standing literature recognizing the limits of external validity in social experiments, dating back to Campbell and Stanley and later work by Bryk and Raudenbush. More recent research has emphasized replication failures and treatment effect heterogeneity across sites, with evidence from economics, education, criminology, and social policy showing that average treatment effects often mask substantial variation. Prior empirical work, including earlier studies by Orr and collaborators, demonstrated that projecting multisite RCT results to individual sites can lead to large prediction errors. Other studies have explored whether observable moderators or advanced machine-learning methods can improve local predictions, with mixed success. This article extends that literature by shifting attention from prediction accuracy alone to decision accuracy, directly evaluating whether the clearinghouse decision rule yields correct policy choices under realistic conditions.
Data
The empirical analysis draws on six well-known multisite randomized controlled trials in education and youth policy. These studies cover a range of interventions, including charter middle schools, summer reading programs, mentoring programs, teacher pipeline initiatives, and a violence prevention program. The trials range from roughly 30 to more than 100 sites each, with random assignment conducted within sites, allowing for internally valid site-level impact estimates. Outcomes include standardized test scores, proficiency indicators, and behavioral measures. The data are drawn from high-quality sources, including the National Center for Education Statistics and the Inter-university Consortium for Political and Social Research, and have been previously analyzed to estimate cross-site impact distributions. The availability of site-level estimates and associated variances makes these data well suited to evaluating heterogeneity and decision risk.
Methods
The author develops a Bayesian decision-theoretic framework to assess the probability that a policymaker following clearinghouse advice would make a correct decision. A correct decision is defined as adopting an intervention if the true local impact is positive and rejecting it otherwise. The model incorporates two sources of uncertainty: sampling error in the estimated average treatment effect and true variation in impacts across sites. Using estimates of the cross-site impact distribution, the model calculates the probability of correct decisions, Type I errors, and Type II errors under the standard evidence-based policy rule. The analysis also computes positive predictive value and negative predictive value, measuring how often interventions labeled effective or ineffective are truly so in a given site. Extensions consider the role of informative priors, alternative adoption thresholds, and site-specific predictions generated using Bayesian Additive Regression Trees, which represent a best-case scenario for using observable moderators.
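To make the framework concrete, here is a minimal Monte Carlo sketch of the decision-accuracy calculation. It is not Orr's model or code: the normal cross-site distribution, the parameter values (mu, tau, se_avg, the number of sites), and the 1.96 significance cutoff are all illustrative assumptions. The sketch draws true site-level impacts, forms a noisy estimate of the trial's average effect, applies the "adopt if the average is significantly positive" rule, and records how often that blanket recommendation matches the sign of each site's true impact.

```python
# A minimal Monte Carlo sketch of the decision-accuracy idea described above.
# This is NOT the article's model or code; every numerical value here is a
# hypothetical assumption chosen for illustration.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameters, in outcome standard-deviation units.
mu = 0.05       # assumed true average impact across sites
tau = 0.15      # assumed cross-site standard deviation of true impacts
se_avg = 0.02   # assumed standard error of the trial's average-impact estimate
n_sites = 100   # assumed number of sites per simulated multisite trial
n_trials = 20_000

correct = 0
adopted_sites = 0
adopted_positive = 0

for _ in range(n_trials):
    site_impacts = rng.normal(mu, tau, size=n_sites)    # true local impacts
    est_avg = rng.normal(site_impacts.mean(), se_avg)   # noisy trial estimate

    # Clearinghouse-style rule: recommend adoption when the estimated
    # average impact is statistically significantly greater than zero.
    adopt = est_avg > 1.96 * se_avg

    positive = site_impacts > 0
    if adopt:
        correct += positive.sum()          # adoption is right only where impact > 0
        adopted_sites += n_sites
        adopted_positive += positive.sum()
    else:
        correct += (~positive).sum()       # rejection is right where impact <= 0

print(f"P(correct local decision) ~ {correct / (n_trials * n_sites):.2f}")
if adopted_sites:
    print(f"Positive predictive value ~ {adopted_positive / adopted_sites:.2f}")
```

Under these illustrative values the simulated correct-decision rate comes out only modestly above one half, and the positive predictive value sits in the low 60s, mirroring the qualitative pattern the article reports when cross-site variation is large relative to the average effect.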
Findings/Effect Sizes
Across the six trials, the probability that clearinghouse guidance leads to a correct local decision varies widely and is often modest. When cross-site impact heterogeneity is low, correct decision probabilities can be relatively high, sometimes exceeding 80 or 90 percent. However, when heterogeneity is moderate to large—a common finding in the data—the probability of a correct decision frequently falls near 50 to 60 percent, little better than chance. Positive predictive value and negative predictive value are similarly sensitive to heterogeneity, declining sharply as cross-site variation increases. Incorporating informative prior beliefs has little effect on results because the large number of sites in multisite trials dominates the posterior. Using BART-based local predictions improves decision accuracy slightly in some cases but does not eliminate the fundamental problem. Simulations with larger hypothetical effect sizes show some improvement, but even medium-sized effects do not guarantee high decision accuracy under substantial heterogeneity.
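The mechanics behind these numbers can be seen in a stylized calculation (the values here are illustrative, not the article's estimates): if true site-level impacts are normally distributed with mean mu and cross-site standard deviation tau, the share of sites for which adopting a program labeled "effective" is actually the right decision is the normal probability that a site's impact exceeds zero.

```latex
P(\text{true local impact} > 0)
  = \Phi\!\left(\frac{\mu}{\tau}\right)
  = \Phi\!\left(\frac{0.05}{0.15}\right)
  \approx 0.63
```

Under these hypothetical values, a program correctly rated effective on average would still be the wrong choice in roughly a third of adopting sites, which is why moderate heterogeneity alone can pull decision accuracy toward the 50-to-60-percent range reported in the article.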
Conclusion
The study demonstrates that evidence-based policy clearinghouses, as currently structured, often provide weak guidance for local decision-making when treatment effects vary across sites. Even when based on high-quality randomized controlled trials, average effects can mislead policymakers about likely local impacts. The findings underscore that internal validity alone is insufficient for policy guidance and that decision accuracy depends critically on the distribution of site-specific effects. While advanced statistical methods and richer data may offer incremental gains, they do not fully resolve the problem. The article makes a significant contribution by reframing the evaluation of evidence-based policy from statistical significance to decision reliability, highlighting a core challenge facing contemporary policy design and implementation.




