Continuous prediction of glucose levels and hypoglycemia events is critical for managing type 1 diabetes mellitus (T1DM) under intensive insulin therapy. Existing models focus on a single task, limiting their practicality and adaptability in automated insulin delivery (AID) systems. To address this, a domain-agnostic continual multi-task learning (DA-CMTL) framework is proposed that performs glucose level forecasting and hypoglycemia event classification simultaneously within a single unified model. Trained on simulated datasets via Sim2Real transfer and adapted using elastic weight consolidation, DA-CMTL supports cross-domain generalization. Evaluation on public datasets (DiaTrend, OhioT1DM, and ShanghaiT1DM) yielded a root mean squared error of 14.01 mg/dL, a mean absolute error of 10.03 mg/dL, and sensitivity/specificity of 92.13%/94.28% at a 30 min prediction horizon. Real-world validation in rats with induced diabetes demonstrated a reduction in time below range from 3.01% to 2.58%, supporting reliable integration as a safety layer in AID systems. These results highlight DA-CMTL's robustness, scalability, and potential to improve safety in AID.
Intensive insulin therapy is essential for blood glucose management in individuals with T1DM. In this context, the automated insulin delivery (AID) system, also known as the artificial pancreas system, has led to remarkable advancements in T1DM management by enhancing the accuracy and convenience of insulin administration. This system comprises continuous glucose monitoring (CGM) sensors, insulin pumps, and control algorithms that process CGM glucose readings and user inputs to dynamically adjust insulin delivery in real time. This approach helps maintain optimal time in range (TIR) -- the proportion of time glucose readings remain between 70 and 180 mg/dL. Numerous clinical studies have validated the effectiveness of AID systems, revealing substantial improvements in glycemic outcomes such as increased TIR and reduced HbA1c levels. For instance, a UK study demonstrated a reduction in HbA1c from 9.4% to 7.8% and a notable increase in TIR from 34.2% to 61.7% among adults using AID. Additionally, Brown et al. observed TIR improvements of 15.6% in children and 9.3% in adults, further confirming the benefits of AID systems across different age groups.
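As a concrete illustration of the TIR metric referenced above, the following is a minimal sketch of how the percentage of CGM readings within 70-180 mg/dL can be computed. The function name, NumPy interface, and example values are illustrative assumptions, not part of the cited studies.

```python
import numpy as np


def time_in_range(cgm_mg_dl: np.ndarray, low: float = 70.0, high: float = 180.0) -> float:
    """Percentage of CGM readings within [low, high] mg/dL (TIR when 70-180 mg/dL)."""
    in_range = (cgm_mg_dl >= low) & (cgm_mg_dl <= high)
    return float(100.0 * in_range.mean())


# Example: 3 of 5 readings fall within 70-180 mg/dL -> TIR = 60%.
print(time_in_range(np.array([65.0, 120.0, 150.0, 190.0, 110.0])))
```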
As the adoption of AID systems continues to expand, the development of accurate glucose level prediction and robust hypoglycemia event classification algorithms is essential to ensure the safety and efficacy of insulin therapy. This need is underscored by two key considerations. First, correction boluses are frequently required, even after the administration of a meal bolus. For example, an analysis of 996 insulin pump users in the United States found that correction boluses accounted for ~12% of the total daily insulin dose, reflecting a significant burden of postprandial hyperglycemia. This requirement is largely attributed to delayed insulin dosing, inaccuracies in carbohydrate estimation, and the conservative dosing strategies utilized by many current AID algorithms. Accurate glucose forecasting is therefore critical to enable timely and autonomous correction dosing, thereby supporting optimal TIR. Second, AID systems represent a form of intensive insulin therapy, inherently increasing the risk of hypoglycemia. Given the potential severity of hypoglycemic events (e.g., seizures, coma, and vision impairment), early and reliable detection is essential to enable prompt suspension of insulin delivery and mitigate adverse outcomes. Such safety mechanisms are fundamental to the successful and responsible deployment of AID technologies.
Recent advancements in prediction algorithms for AID systems have increased interest in deep learning (DL)-based glucose prediction and hypoglycemia event classification. DL models offer the capacity to learn complex nonlinear relationships between glucose dynamics and various physiological and behavioral factors (e.g., insulin, meals, patient characteristics), thereby improving glycemic prediction performance. In glucose level prediction, Pérez-Gandía et al. first applied artificial neural networks to this task, laying the foundation for subsequent research. Recurrent neural networks (RNNs), such as long short-term memory (LSTM) and gated recurrent units (GRU), have since gained prominence due to their effectiveness in modeling temporal dependencies. For instance, Martinsson et al. reported a root mean squared error (RMSE) of 18.87 mg/dL on the OhioT1DM dataset using an LSTM-based model with a 30 min prediction horizon (PH), while Alshehri et al. demonstrated the efficiency of GRU-based models, citing comparable accuracy with reduced computational complexity. More recently, advanced models have emerged to further enhance predictive performance. Zhu et al. employed a temporal fusion transformer (TFT), achieving RMSEs of 19.10 and 12.70 mg/dL on the OhioT1DM and ShanghaiT1DM datasets, respectively, under a 30 min PH. Piao et al. introduced a graph attentive RNN (GARNN) that yielded RMSEs of 18.97 and 13.62 mg/dL on the same datasets. In parallel, Montaser et al. proposed seasonal local modeling frameworks for glucose prediction using variable-length, time-stamped events, demonstrating strong adaptability and performance across patient-specific trajectories. These efforts collectively reflect a growing emphasis on capturing individual variability, temporal complexity, and multimodal dependencies. However, many models remain vulnerable to overfitting on dataset-specific patterns, limiting their applicability to diverse populations and undermining generalizability across domains.
While glucose forecasting has progressed as a distinct research pillar, hypoglycemia event prediction has evolved largely in parallel, using both feature-based and time series-based methodologies. Earlier approaches relied on handcrafted features such as CGM trends, insulin-on-board (IOB), and carbohydrates-on-board (COB), processed through traditional machine learning (ML) models such as random forests (30 min PH; sensitivity/specificity: 89.6%/91.3%), support vector regression (30 min PH; 96.0%/97.0%), and support vector machines (100% sensitivity in 17 cases). Although these models offered computational simplicity, they were often inadequate for modeling dynamic glucose fluctuations, especially under postprandial, nocturnal, or exercise-induced variability. Subsequent approaches leveraged DL architectures such as deep belief networks and fully connected neural networks (FCNNs) to capture richer feature representations and improve sensitivity. However, their reliance on manually engineered inputs limited scalability and hindered integration with glucose forecasting frameworks. More recent models have shifted toward time series-based classifiers that directly learn from CGM glucose readings. For example, LSTM-based and transformer-based architectures have demonstrated improved performance and generalization across populations. Notably, the LSTM model proposed by Shao et al., trained on Chinese patient data, outperformed traditional ML baselines when tested on European-American cohorts, highlighting cross-domain potential. Despite these advances, most hypoglycemia prediction models remain decoupled from glucose forecasting pipelines or are developed in isolation for specific datasets. Treating prediction and classification as independent tasks impairs the coordinated operation required for real-time insulin delivery systems. In particular, this separation leads to a fragmented model architecture, necessitating distinct inference pathways for each task and resulting in asynchronous outputs. Such a design increases computational burden and latency, which directly undermines the feasibility of deploying responsive and unified feedback mechanisms essential for closed-loop control in AID systems.
Beyond algorithmic development, practical and ethical barriers further hinder the deployment of adaptive AID systems. Collecting real-world data from a single patient over one year is estimated to cost at least $2940, excluding the time-consuming processes of recruitment, experimentation, and ethical approval, which can take a minimum of one year. Additionally, privacy concerns remain a major barrier to data sharing, with over 20 studies citing them as a leading issue in healthcare AI deployment. Thus, integration, generalization, and large-scale real-world data collection remain critical challenges to be addressed for reliable and responsible deployment.
In summary, despite promising advancements, current DL-based research is limited by three key factors: system complexity resulting from task separation, poor generalization across populations, and high dependence on real-world data. These limitations hinder the deployment of scalable and adaptive insulin delivery solutions. To address these challenges, we propose a domain-agnostic continual multi-task learning (DA-CMTL) framework that leverages simulation-to-real (Sim2Real) transfer. DA-CMTL adopts a multi-head architecture to jointly perform glucose prediction and hypoglycemia event classification, allowing task-specific modeling while leveraging shared temporal features. This unified design enhances task synergy and supports efficient inference within real-time AID applications.
Moreover, Sim2Real transfer -- where a model trained in simulated environments is deployed in real-world applications -- is employed to enable generalization while reducing the cost associated with real-world data collection. In this context, we conceptualize Sim2Real not merely as a data domain transfer but as a generalization strategy that leverages physiologically diverse, simulated scenarios to build robust representations. Crucially, model performance was ultimately evaluated on real-world datasets collected under free-living conditions, confirming its applicability beyond controlled simulations. Unlike traditional methods, such as data augmentation or SMOTE, which improve data diversity by modifying or oversampling existing samples, Sim2Real leverages physiologically validated simulators to generate synthetic patient profiles with systematic variability. This includes adverse and infrequent conditions, such as prolonged hypoglycemia or atypical glucose-insulin responses, which are often underrepresented in real-world datasets. Incorporating such difficult-to-capture scenarios into training enables the model to generalize across a broader range of clinically meaningful conditions while reducing reliance on large-scale data collection from real patients. Similar approaches have been applied in various healthcare AI domains, including diabetes management, tumor segmentation, and autonomous surgical navigation, demonstrating that simulation-based training can achieve high performance and safety in clinical environments.
While simulated data enables scalable training, the transition to real-world application introduces a critical barrier: domain shift, which arises from discrepancies in data characteristics between simulated and real-world environments. These distributional differences can degrade model performance when applied across domains. To address this issue, elastic weight consolidation (EWC), a continual learning (CL) method, is incorporated. EWC facilitates sequential learning from diverse simulated datasets while preventing catastrophic forgetting by introducing a regularization term in the loss function. This allows the model to retain knowledge from previously learned domains, thereby improving robustness and ensuring reliable deployment in practical settings. We refer to this property as domain-agnostic, indicating the model's ability to generalize without relying on domain-specific adaptation. The proposed DA-CMTL framework advances algorithm development for insulin delivery systems through the integration of multi-task learning (MTL) while mitigating both data acquisition costs and domain-specific performance issues via Sim2Real transfer and CL techniques.
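To make these two components concrete, the following is a minimal sketch, assuming a PyTorch setting, of (1) a shared temporal encoder with separate heads for glucose forecasting and hypoglycemia event classification, and (2) an EWC-regularized joint loss that penalizes drift from parameters deemed important for a previously learned simulated domain. All names (SharedEncoderMTL, ewc_penalty), dimensions, and loss weights are illustrative assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedEncoderMTL(nn.Module):
    """Shared GRU encoder with two task heads: glucose forecasting (regression)
    and hypoglycemia event classification (binary logit)."""

    def __init__(self, n_features: int = 3, hidden: int = 64):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        self.forecast_head = nn.Linear(hidden, 1)  # glucose level at the PH (mg/dL)
        self.hypo_head = nn.Linear(hidden, 1)      # hypoglycemia event logit

    def forward(self, x):                          # x: (batch, time, features)
        _, h = self.encoder(x)                     # h: (1, batch, hidden)
        h = h.squeeze(0)
        return self.forecast_head(h), self.hypo_head(h)


def ewc_penalty(model, fisher, ref_params):
    """EWC quadratic penalty: sum_i F_i * (theta_i - theta_i*)^2, where F_i is the
    Fisher information and theta_i* the parameters kept from the previous domain."""
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (p - ref_params[name]) ** 2).sum()
    return penalty


def joint_loss(model, x, y_glucose, y_event, fisher, ref_params,
               alpha=1.0, beta=1.0, lam=100.0):
    """MSE (forecasting) + BCE (event classification) + weighted EWC regularization."""
    pred_g, logit_e = model(x)
    mse = F.mse_loss(pred_g.squeeze(-1), y_glucose)
    bce = F.binary_cross_entropy_with_logits(logit_e.squeeze(-1), y_event)
    return alpha * mse + beta * bce + lam * ewc_penalty(model, fisher, ref_params)


# Example usage with random data (8 windows of 12 CGM samples x 3 features):
model = SharedEncoderMTL()
x = torch.randn(8, 12, 3)
loss = joint_loss(model, x, torch.randn(8), torch.randint(0, 2, (8,)).float(), {}, {})
```

In such a sketch, the Fisher information and reference parameters would be estimated after training on each simulated domain and carried forward when training on the next, so that sequential learning does not overwrite previously acquired knowledge.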
The primary contributions of this research are threefold: (1) a unified multi-task framework, DA-CMTL, that jointly performs glucose level forecasting and hypoglycemia event classification through a shared temporal encoder with task-specific heads, enabling synchronized and efficient inference for real-time AID applications; (2) a domain-agnostic training strategy that combines Sim2Real transfer with elastic weight consolidation to build robust representations from physiologically diverse simulated data while preventing catastrophic forgetting and reducing dependence on costly real-world data collection; and (3) evaluation on public real-world datasets (DiaTrend, OhioT1DM, and ShanghaiT1DM) together with in vivo validation in rats with induced diabetes, demonstrating the framework's suitability as a safety layer in AID systems.
The remainder of this paper is organized as follows: section "Results" presents our experimental results, section "Discussion" discusses key findings and clinical implications, and section "Methods" details the model architecture and training methodology.