Nov 15, 2025
Research Fellow:
- Stephan Mabry, MacroPraxis Research Institute Fellow
Link To Research Paper
ABSTRACT
Healthcare technology systems are evolving rapidly, driven by advancements in artificial intelligence, improved treatments, and the growing needs of clinicians and patients. To ensure seamless communication between these systems, interoperability standards like HL7, FHIR, and DICOM facilitate data exchange. However, maintaining the reliability and security of these systems remains a challenge: the complexity of healthcare data and the frequency of system updates make manual testing impractical, while automated testing of these exchanges is still immature. Despite this need, research on automated testing for healthcare interoperability, particularly in real-time FHIR-based environments, is limited. This paper addresses that gap by surveying the current state of automated testing in healthcare interoperability, identifying research gaps, and exploring relevant methodologies from other industries that could be adapted to healthcare. Analyzing 26 studies, it highlights the limitations of current testing approaches and the absence of healthcare-specific testing metrics. The study's main contribution is the identification of critical research gaps and a proposed direction for developing new tools and metrics tailored to healthcare needs. The findings underscore the necessity of expanding automated testing frameworks to ensure secure, real-time data exchange, especially in FHIR-based systems. These conclusions are supported by a comprehensive literature review and a detailed analysis of existing methodologies.
Research Summary
Stephan Mabry’s research highlights how the rapid expansion of healthcare data ecosystems—and their shift toward real-time, API-driven interoperability—has far outpaced the evolution of testing practices needed to keep them reliable. Drawing on the detailed ecosystem diagram in the paper (p.2), Mabry shows that modern health systems rely on dozens of interconnected applications (EHRs, LIS, RIS, PACS, CRM systems, IoT devices, etc.), all exchanging data through HL7, FHIR, DICOM, or document-centric standards like C-CDA. This complexity multiplies the risk of downstream clinical failures every time a system updates—some on quarterly cycles, others biweekly—yet manual testing remains the dominant approach. The literature review (Tables 3.1–3.4) reveals that while interoperability research is rich, studies that examine how to systematically and repeatedly test the correctness of these exchanges are rare, and automated testing of interoperability is almost nonexistent.
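To make the kind of testing the paper calls for concrete, the sketch below shows what a minimal automated interoperability check for a FHIR exchange might look like: a resource is written to a system under test, read back, and compared field by field. This is an illustrative sketch only; the endpoint URL, resource content, and server behavior are assumptions, not artifacts from the paper.

```python
# Minimal sketch of an automated FHIR round-trip test (illustrative; not from the paper).
# BASE_URL is a hypothetical test endpoint that accepts JSON FHIR R4 resources.
import requests

BASE_URL = "https://example-fhir-server/fhir"  # hypothetical endpoint
HEADERS = {"Content-Type": "application/fhir+json", "Accept": "application/fhir+json"}

def test_patient_round_trip():
    patient = {
        "resourceType": "Patient",
        "identifier": [{"system": "urn:example:mrn", "value": "12345"}],
        "name": [{"family": "Doe", "given": ["Jane"]}],
    }

    # Write the resource to the system under test.
    created = requests.post(f"{BASE_URL}/Patient", json=patient, headers=HEADERS)
    assert created.status_code == 201, f"create failed: {created.status_code}"
    patient_id = created.json()["id"]

    # Read it back and confirm the clinically relevant fields survived the exchange.
    fetched = requests.get(f"{BASE_URL}/Patient/{patient_id}", headers=HEADERS).json()
    assert fetched["identifier"][0]["value"] == "12345"
    assert fetched["name"][0]["family"] == "Doe"

if __name__ == "__main__":
    test_patient_round_trip()
    print("Patient round-trip check passed")
```

A check like this could run automatically after every vendor update, which is exactly the cadence problem Mabry identifies with quarterly and biweekly release cycles.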
From the 74 papers initially identified, only three addressed automated testing for healthcare interoperability at all, a distribution made clear in Chart 5.1 (p.12). Even more striking is that testing studies overwhelmingly focus on HL7 (Chart 5.2), while real-time FHIR—which is now the industry’s most important interoperability standard—has almost no automated testing research devoted to it. Mabry concludes that healthcare lacks not only automation frameworks but also testing metrics: none of the surveyed papers discuss coverage metrics, fault detection rates, workflow accuracy, or patient-safety-aligned measures. The paper argues that future progress requires domain-specific metrics, automated frameworks tailored for FHIR APIs, AI-based cyber-resilience testing, and semantic-level validation capable of handling unstructured data and clinical context. Without closing these gaps, healthcare systems will continue to operate with interoperability that is fragile, inconsistently validated, and increasingly misaligned with the critical, real-time demands of modern clinical environments.
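As one way to picture the metrics gap, the sketch below defines two candidate measures: a fault detection rate over seeded interoperability faults and coverage of required FHIR resource types. These definitions are assumptions made for illustration; the paper's point is precisely that no agreed-upon healthcare-specific metrics of this kind exist in the surveyed literature.

```python
# Illustrative sketch of two testing metrics of the sort the paper finds missing.
# Metric definitions here are assumptions for illustration, not taken from the surveyed studies.
from dataclasses import dataclass

@dataclass
class TestOutcome:
    resource_type: str      # e.g. "Patient", "Observation"
    seeded_fault: bool      # test case contained an injected interoperability fault
    fault_detected: bool    # the test harness flagged the fault

def fault_detection_rate(outcomes):
    """Fraction of seeded interoperability faults the test suite actually caught."""
    seeded = [o for o in outcomes if o.seeded_fault]
    return sum(o.fault_detected for o in seeded) / len(seeded) if seeded else 0.0

def resource_coverage(outcomes, required_types):
    """Fraction of required FHIR resource types exercised by at least one test."""
    exercised = {o.resource_type for o in outcomes}
    return len(exercised & set(required_types)) / len(required_types)

outcomes = [
    TestOutcome("Patient", seeded_fault=True, fault_detected=True),
    TestOutcome("Observation", seeded_fault=True, fault_detected=False),
    TestOutcome("MedicationRequest", seeded_fault=False, fault_detected=False),
]
print(fault_detection_rate(outcomes))                                         # 0.5
print(resource_coverage(outcomes, ["Patient", "Observation", "Encounter"]))   # ~0.67
```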
Key Takeaways
- Automated testing for interoperability in healthcare is almost nonexistent, with only three studies identified—far too few given the complexity and clinical stakes involved in real-time data exchange (Chart 5.1).
- FHIR—the dominant modern interoperability standard—has nearly no automated-testing research, despite its central role in API-driven, real-time clinical workflows (Chart 5.2).
- Current testing methods rely on outdated, manual, or non-scalable approaches, many inherited from earlier HL7 and document-exchange systems, and are inadequate for high-frequency system updates and multi-system environments.
- Healthcare lacks both standard testing metrics and healthcare-specific testing metrics, meaning there is no established way to measure test coverage, semantic accuracy, workflow integrity, or patient-safety impact.
- The biggest opportunities for innovation lie in AI-driven automated testing, real-time validation tools, and semantic-interoperability testing, all of which are needed to secure FHIR exchanges, strengthen cyber-resilience, and ensure clinical reliability at scale (a minimal semantic-validation sketch follows below).
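The semantic-interoperability point in the last takeaway can be made tangible with a small example. The sketch below checks that a FHIR Observation carrying a heart rate uses an expected LOINC code and unit and falls within a physiologically plausible range; the specific code, unit, and range are assumptions chosen for the example, not requirements taken from the paper.

```python
# Minimal sketch of semantic-level validation, one of the gaps the paper highlights.
# The expected LOINC code, unit, and plausibility range are illustrative assumptions.
EXPECTED = {"system": "http://loinc.org", "code": "8867-4", "unit": "/min"}  # heart rate

def validate_heart_rate(observation: dict) -> list[str]:
    """Return a list of semantic problems found in a FHIR Observation resource."""
    problems = []
    coding = observation.get("code", {}).get("coding", [{}])[0]
    if coding.get("system") != EXPECTED["system"] or coding.get("code") != EXPECTED["code"]:
        problems.append("code is not the expected LOINC heart-rate code")

    value = observation.get("valueQuantity", {})
    if value.get("unit") != EXPECTED["unit"]:
        problems.append(f"unexpected unit {value.get('unit')!r}")
    if not (20 <= value.get("value", -1) <= 300):   # crude physiologic plausibility check
        problems.append("value outside a physiologically plausible range")
    return problems

obs = {
    "resourceType": "Observation",
    "code": {"coding": [{"system": "http://loinc.org", "code": "8867-4"}]},
    "valueQuantity": {"value": 72, "unit": "/min"},
}
print(validate_heart_rate(obs))   # [] means the resource is semantically consistent
```

Checks at this level go beyond structural schema validation and move toward the clinical-context and patient-safety measures the paper argues are still missing.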