

Can Machine Learning Identify Fraudulent Hospital Billing in Medicare?

  • Writer: Greg Thorson
  • 1 day ago
  • 5 min read

Shekhar, Leder-Luis, and Akoglu (2026) ask whether unsupervised, explainable machine learning can effectively identify hospitals engaging in potentially fraudulent Medicare billing. They analyze millions of Medicare inpatient claims from 2017, combined with patients’ prior medical histories and hospital characteristics, covering over 2,200 hospitals. Using anomaly-detection algorithms, they rank hospitals based on suspicious coding and spending patterns. The authors find that their method substantially outperforms random or payment-based targeting: among the top 50 hospitals flagged, 21 had been previously named in Department of Justice fraud actions, representing nearly a fivefold improvement over random auditing.


Why This Article Was Selected for The Policy Scientist

Recent enforcement actions underscore the scale and persistence of fraud across government programs, reinforcing the policy relevance of this article. Over the past several years, federal authorities have announced record-setting health care fraud takedowns involving tens of billions of dollars in alleged Medicare and Medicaid overbilling, alongside major cases in pandemic unemployment insurance, SNAP benefits, and state-administered social services. These discoveries highlight systemic vulnerabilities in large, rules-based payment systems that rely on claims processing at scale. Against this backdrop, the article is timely and important: it speaks directly to the growing gap between program size and enforcement capacity, and to the need for scalable, data-driven tools that can improve targeting without relying solely on whistleblowers or ex post prosecutions.

Full Citation and Link to Article

Shekhar, S., Leder-Luis, J., & Akoglu, L. (2026). Can machine learning target health care fraud? Evidence from Medicare hospitalizations. Journal of Policy Analysis and Management, 45(1), Article e70078. https://doi.org/10.1002/pam.70078 


Central Research Question

The central research question of Shekhar, Leder-Luis, and Akoglu (2026) is whether unsupervised, explainable machine learning methods can be used to reliably identify hospitals engaging in anomalous billing behavior consistent with health care fraud in Medicare inpatient hospitalizations. More specifically, the authors ask whether it is possible to detect suspicious hospitals using only claims data, without relying on labeled examples of known fraud, and whether such methods can meaningfully improve the targeting of audits and investigations relative to existing approaches. A related question is whether the resulting signals are interpretable enough to be operationally useful for enforcement agencies that must justify and prioritize investigative actions under resource constraints. The study is explicitly framed around improving detection and targeting rather than proving fraud in a legal sense, recognizing the distinction between statistical suspiciousness and adjudicated wrongdoing.


Previous Literature

The paper builds on several strands of prior research. In economics and health policy, earlier work has documented widespread incentives for overbilling under prospective payment systems, particularly through upcoding diagnoses and exaggerating patient severity. Classic studies by Dafny, by Silverman and Skinner, and by others demonstrated how hospitals respond to payment incentives by shifting coding behavior, often in ways consistent with strategic manipulation. That literature, however, generally relies on specific policy changes, narrow clinical settings, or reduced-form regressions that test for average responses rather than enabling provider-level detection.


In the computer science and data mining literature, a parallel body of work has explored anomaly detection in health care claims, including provider profiling, clustering, and supervised classification using known cases of fraud. These approaches often depend on labeled data derived from prior enforcement actions, which are incomplete and potentially biased because enforcement is nonrandom. Moreover, many existing methods function as black boxes, offering limited explanation for why a provider is flagged. The authors position their contribution as filling a gap between these literatures by developing an unsupervised, explainable approach that operates at scale, does not rely on prior enforcement labels, and produces outputs that can guide real-world auditing decisions.


Data

The analysis uses large-scale administrative data from Medicare inpatient hospitalizations. The primary sample consists of all inpatient claims from acute care hospitals in 2017, covering approximately 7.3 million claims, 4.6 million beneficiaries, and 2,207 hospitals, representing roughly $80 billion in Medicare spending. To adjust for patient complexity and medical history, the authors construct detailed beneficiary histories using inpatient, outpatient, and physician claims from 2012 through 2016. These data include diagnostic and procedure codes, billing codes, demographic information, and indicators for chronic conditions.


Hospital characteristics are drawn from Medicare Provider of Services files, including ownership type, location, and teaching status. For validation, the authors assemble a novel dataset of hospitals named in Department of Justice press releases related to Medicare fraud, spanning multiple years. While this DOJ dataset is incomplete and reflects enforcement capacity and priorities rather than the full universe of fraud, it provides a partial ground truth for evaluating whether the algorithm disproportionately flags hospitals that have previously faced civil or criminal fraud actions.


Methods

The authors develop a multi-view, unsupervised anomaly detection framework that combines evidence from three distinct detection models. The first model analyzes hospitals’ use of ICD-10 diagnosis and procedure codes. Hospitals are represented by high-dimensional vectors capturing code frequencies, adjusted for semantic similarity between codes. Subspace anomaly detection methods are then applied to identify hospitals that deviate from common coding patterns in localized subsets of codes, consistent with covert upcoding behavior.
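To make the first model concrete, here is a minimal sketch, assuming a claims table with per-hospital ICD code counts. All data and variable names are hypothetical, and scikit-learn's IsolationForest is used as a generic stand-in for the paper's subspace anomaly detector (which also adjusts for semantic similarity between codes, a step omitted here):

```python
# Sketch of model 1: rank hospitals by how anomalous their coding profiles are.
# Hypothetical input: one row per (hospital, ICD code) pair with a claim count.
import pandas as pd
from sklearn.ensemble import IsolationForest

claims = pd.DataFrame({
    "hospital_id": ["H1", "H1", "H2", "H2", "H3"],
    "icd_code":    ["A41.9", "J96.00", "A41.9", "I50.9", "J96.00"],
    "n_claims":    [120, 40, 15, 60, 300],
})

# Per-hospital code-frequency vectors (rows: hospitals, columns: ICD codes),
# normalized to shares so hospital size does not dominate the comparison.
freq = claims.pivot_table(index="hospital_id", columns="icd_code",
                          values="n_claims", fill_value=0)
shares = freq.div(freq.sum(axis=1), axis=0)

# Score hospitals; lower score_samples values = more anomalous coding profile.
detector = IsolationForest(random_state=0).fit(shares)
scores = pd.Series(detector.score_samples(shares), index=shares.index)
ranking_model1 = scores.sort_values()  # most suspicious first
```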


The second model uses a peer-based approach focused on Diagnosis-Related Group (DRG) billing patterns. Hospitals are grouped with peers that treat similar patient populations or provide similar categories of care. Each hospital’s DRG distribution is compared to that of its peers, and excess spending is calculated based on differences in DRG usage weighted by average reimbursement amounts. This model directly links anomalous coding behavior to excess Medicare payments.
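The excess-spending logic reduces to a short calculation: the difference between a hospital's DRG mix and its peer group's, weighted by average reimbursement per DRG. A sketch with hypothetical shares and prices (peer-group construction is assumed to have happened upstream):

```python
# Sketch of model 2: per-claim excess spending from DRG mix relative to peers.

def excess_spending_per_claim(drg_share: dict, peer_share: dict, price: dict) -> float:
    """Excess dollars per claim implied by coding costlier DRGs than peers do."""
    drgs = set(drg_share) | set(peer_share)
    return sum((drg_share.get(d, 0.0) - peer_share.get(d, 0.0)) * price[d]
               for d in drgs)

# Toy numbers: the hospital codes the higher-paying DRG more often than peers.
hospital = {"871": 0.30, "872": 0.70}   # hospital's DRG mix
peers    = {"871": 0.15, "872": 0.85}   # peer-group average mix
prices   = {"871": 12000.0, "872": 7000.0}
print(excess_spending_per_claim(hospital, peers, prices))  # 750.0 per claim
```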


The third model estimates hospital fixed effects from a large regression predicting inpatient spending as a function of patient medical history and demographics. Hospitals with unusually high residual spending, conditional on patient characteristics, are treated as more suspicious.
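A hedged sketch of the third model on synthetic data: regress claim-level spending on patient characteristics plus hospital dummies, then rank hospitals by their estimated fixed effects. Plain OLS and these two covariates are illustrative simplifications; the paper's regression conditions on detailed medical histories:

```python
# Sketch of model 3: hospital fixed effects from a spending regression.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "spending": rng.gamma(shape=2.0, scale=5000.0, size=n),
    "age": rng.integers(65, 95, size=n),
    "n_chronic": rng.integers(0, 8, size=n),            # chronic-condition count
    "hospital_id": rng.choice(["H1", "H2", "H3"], size=n),
})

# Hospital dummy coefficients capture residual spending conditional on patient
# characteristics, measured relative to the omitted base hospital (H1).
model = smf.ols("spending ~ age + n_chronic + C(hospital_id)", data=df).fit()
fe = model.params.filter(like="C(hospital_id)")
ranking_model3 = fe.sort_values(ascending=False)  # highest residual spending first
```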


Each model produces a ranking of hospitals. These rankings are combined using an instant-runoff voting procedure to generate a single aggregate suspiciousness ranking. Importantly, all models are designed to be explainable, allowing investigators to trace suspicious rankings back to specific codes, DRGs, or excess spending patterns.
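One way to sketch the aggregation step: treat each model's ranking as a ballot, repeatedly eliminate the hospital with the fewest first-place votes among those remaining, and read the aggregate ranking off the reverse elimination order. Tie-breaking below is arbitrary (alphabetical), and the paper's exact procedure may differ:

```python
# Sketch of instant-runoff aggregation over full model rankings.
from collections import Counter

def instant_runoff_rank(rankings: list[list[str]]) -> list[str]:
    remaining = set(rankings[0])          # all rankings cover the same hospitals
    eliminated = []
    while remaining:
        # First-place votes among hospitals still in the running.
        tops = Counter(next(h for h in r if h in remaining) for r in rankings)
        loser = min(remaining, key=lambda h: (tops.get(h, 0), h))
        remaining.discard(loser)
        eliminated.append(loser)
    return eliminated[::-1]               # last survivor = most suspicious

# Toy example: three model rankings over four hospitals.
votes = [["H2", "H1", "H3", "H4"],
         ["H2", "H3", "H1", "H4"],
         ["H1", "H2", "H4", "H3"]]
print(instant_runoff_rank(votes))         # ['H2', 'H1', 'H3', 'H4']
```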


Findings/Size Effects

The aggregate model substantially outperforms simple baselines such as ranking hospitals by average spending or random selection. Among the top 50 hospitals identified as most suspicious, 21 had previously been named in DOJ fraud actions, compared with an expected count of roughly four under random targeting. This represents nearly a fivefold lift over random selection. Across the full ranking, the model achieves approximately a twofold improvement in detection rates relative to baseline approaches.
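As a quick arithmetic check of the headline figure (the 4.3 in the denominator is an illustrative value consistent with the "roughly four" expected hits above; the exact expectation depends on the share of hospitals appearing in the DOJ dataset):

```latex
\[
\text{lift} \;=\;
\frac{\text{DOJ-named hospitals in top 50}}
     {\text{expected DOJ-named hospitals under random targeting}}
\;\approx\; \frac{21}{4.3} \;\approx\; 4.9
\]
```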


The authors also find that the ICD codes contributing most to suspicious rankings tend to be relatively common, high-reimbursement diagnoses with ambiguous clinical definitions, rather than rare or specialized conditions. This pattern is consistent with strategic coding behavior rather than legitimate specialization. Peer-based analyses show that flagged hospitals systematically use more expensive DRGs than similar hospitals treating comparable patients, generating measurable excess spending per claim. Robustness checks using emergency department admissions and excluding Medicare Advantage patients yield qualitatively similar results, though with smaller effect sizes.


Conclusion

The study demonstrates that unsupervised, explainable machine learning methods can meaningfully improve the targeting of health care fraud investigations in Medicare inpatient care. While the approach does not establish causality or legal proof of fraud, it provides a scalable and interpretable tool for prioritizing audits in a system characterized by massive data volume and limited enforcement resources. The quality and breadth of the Medicare data support the internal validity of the findings, and the general framework is plausibly applicable to other public and private insurance systems that rely on claims-based reimbursement. From a methodological standpoint, the paper relies primarily on anomaly detection and regression rather than causal inference. Future research incorporating quasi-experimental designs or randomized audit assignments could strengthen causal interpretation and further integrate these tools into evidence-based enforcement strategies.


