Designing self-healing database fabrics for real-time payment rails
Abstract
Real-time payment platforms operating at scale face an unforgiving operational reality: even brief outages translate directly into failed transactions, regulatory exposure, and eroded customer trust. Database replication and failover automation have matured considerably over the past two decades, yet a troubling blind spot remains. Recovery frameworks built for general-purpose distributed systems were never designed with settlement finality in mind, and that omission leaves payment operators exposed to split-brain scenarios that generic high-availability tooling cannot reliably prevent. This paper addresses the gap directly with a self-healing database fabric purpose-built for payment rail environments. The proposed autonomous resilience fabric architecture (ARFA) operates across three coordinated layers: a continuous monitoring layer that harvests telemetry from compute, storage, and network subsystems; a decision layer that fuses rule-based heuristics with an ensemble of isolation forests, recurrent neural networks, and gradient boosting classifiers to separate genuine fault conditions from transient noise; and a deterministic action layer that executes recovery procedures anchored to explicit settlement finality constraints. In fault injection trials covering node crashes, network partitions, replication lag, and performance degradation, the architecture reduced average recovery time by 88% relative to manual baselines, restoring service in roughly 8 seconds compared with the 180 seconds that human-driven remediation typically requires. False positive rates stayed below 2% across all failure categories, and the system achieved a 98% recovery success rate. Taken together, these results make a practical case that autonomous resilience and regulatory compliance reinforce each other rather than conflict when regulatory constraints are designed in from the start.
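The abstract describes the decision and action layers only at a high level. The following is a minimal, hypothetical sketch of how rule-based heuristics might be fused with ensemble scores, and how a recovery step could be gated on settlement finality. It assumes each model (isolation forest, recurrent network, gradient boosting classifier) has already produced an anomaly score in [0, 1]; the models themselves are not implemented. Every name, threshold, and weight here (`rule_score`, `fused_score`, `fault_confirmed`, `plan_recovery`, `last_settled_seq`, the 0.4 rule weight, the three-window persistence check) is an illustrative assumption, not ARFA's actual design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Telemetry:
    """One monitoring window of hypothetical signals from the monitoring layer."""
    heartbeat_missed: int      # consecutive missed heartbeats from the primary
    replication_lag_s: float   # seconds the replicas trail the primary
    p99_latency_ms: float      # 99th-percentile commit latency


@dataclass
class Replica:
    name: str
    applied_seq: int  # highest transaction sequence number applied on this replica


def rule_score(t: Telemetry) -> float:
    """Rule-based heuristics: fraction of illustrative hard thresholds breached."""
    checks = [
        t.heartbeat_missed >= 3,
        t.replication_lag_s > 5.0,
        t.p99_latency_ms > 500.0,
    ]
    return sum(checks) / len(checks)


def fused_score(t: Telemetry, model_scores: Dict[str, float], w_rules: float = 0.4) -> float:
    """Weighted blend of the rule score and the mean ensemble score (all in [0, 1])."""
    ensemble = sum(model_scores.values()) / len(model_scores)
    return w_rules * rule_score(t) + (1.0 - w_rules) * ensemble


def fault_confirmed(history: List[float], threshold: float = 0.6, k: int = 3) -> bool:
    """Treat a fault as genuine only if the last k fused scores all reach the
    threshold, so a single noisy window cannot trigger failover."""
    return len(history) >= k and all(s >= threshold for s in history[-k:])


def plan_recovery(confirmed: bool, primary: str, last_settled_seq: int,
                  replicas: List[Replica]) -> List[str]:
    """Deterministic action plan gated on a settlement-finality constraint."""
    if not confirmed:
        return []
    # Only replicas that have applied every settled (final) transaction are eligible.
    eligible = [r for r in replicas if r.applied_seq >= last_settled_seq]
    if not eligible:
        return ["hold"]  # promoting any replica would lose settled payments
    # Most up-to-date replica wins; ties are broken by name so the choice is repeatable.
    target = max(eligible, key=lambda r: (r.applied_seq, r.name))
    # Fence the old primary before promotion so two nodes never accept writes at once.
    return [f"fence:{primary}", f"promote:{target.name}"]


if __name__ == "__main__":
    window = Telemetry(heartbeat_missed=4, replication_lag_s=0.8, p99_latency_ms=900.0)
    # Stand-in scores; in a real system these would come from trained models.
    models = {"isolation_forest": 0.9, "rnn": 0.8, "gradient_boosting": 0.7}
    history = [0.2, 0.7, 0.75, fused_score(window, models)]
    replicas = [Replica("replica-a", 1040), Replica("replica-b", 1052)]
    plan = plan_recovery(fault_confirmed(history), "primary-1", 1050, replicas)
    print(plan)  # ['fence:primary-1', 'promote:replica-b']
```

Requiring the fused score to stay above the threshold for several consecutive windows is one simple way to suppress transient noise. Refusing to promote any replica that has not applied every settled transaction is one way to express a finality constraint: promoting a lagging replica could silently drop payments that had already been reported as final, and fencing the old primary first guards against the split-brain case the abstract highlights.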
Keywords
Automated recovery; Autonomous fault management; Database resilience; Fault tolerance; Self-healing systems
DOI: http://doi.org/10.11591/ijece.v16i3.pp1360-1368
Copyright (c) 2026 Raghu Gollapudi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES).