
Affordable and real‑time antimicrobial resistance prediction from multimodal electronic health records

April 27, 2025

Introduction

The discovery of penicillin marked the beginning of the antibiotic era, revolutionizing the treatment of infections and significantly reducing mortality rates. However, the rise in broad-spectrum antibiotic prescriptions and improper usage has led to increasing antimicrobial resistance (AMR), posing a significant clinical and economic challenge. AMR is projected to become a leading global health threat by 2050, potentially causing around 10 million deaths annually if left uncontrolled.[1]

Acquiring susceptibility results for antibiotic prescriptions can be time-consuming; in some regions, antibiotics are even prescribed without prior cultures. Faster methods, such as DNA-based assays, are expensive and not widely accessible. Advances in artificial intelligence (AI) offer promising ways to improve speed and affordability in healthcare: machine learning (ML) can enhance diagnosis, support clinical decision-making, and help manage antibiotic prescriptions. Previous studies have applied ML to AMR prediction, typically running decision algorithms or AutoML approaches on ICU electronic health record (EHR) data. While these methods have shown promise, they often involve manual feature selection, depend heavily on data quality, or struggle with class imbalance.

Researchers at the BioMedia Lab, MBZUAI have developed a method that aims to integrate seamlessly with existing clinical platforms, offering affordable, accessible, and real-time predictions without additional tests or interventions. The method provides robust predictions and supports clinical decision-making using deep learning on multimodal EHR data, including time-invariant data, time-series data, and clinical notes. This research, published in Scientific Reports, represents a novel attempt to address AMR with deep learning techniques that leverage the full potential of multimodal EHR data, advancing the field of microbiology and contributing to more effective AMR strategies.

This work makes several key contributions:

  • Affordable, Real-Time Solutions: The researchers present a cost-effective and real-time method for predicting antimicrobial resistance (AMR) using advanced deep-learning techniques that integrate multiple types of data.
  • Comprehensive Performance Evaluation: The study evaluates prediction performance across various criteria, including data volume, dataset imbalance, and time-series features, focusing on decision-making during ICU stays.
  • Utilizing Existing EHR Data: By using readily available and cost-effective EHR, the researchers provide clinicians with timely insights based on patients’ historical data, enhancing decision-making in AMR management.

Overall Framework

The approach, detailed in Fig. 1 and inspired by prior studies, begins with preprocessing raw EHR data using the FIDDLE pipeline [2]. This pipeline extracts critical information about ICU stays and patients’ medical histories before ICU transfer, organizing the data into three modalities: time-series, time-invariant, and clinical notes.

For prediction tasks, the methodology targets two complementary perspectives: predicting resistance to a given antibiotic, and predicting resistance for patients infected with a given pathogen.

Data encoding is managed as follows: time-invariant data is processed with a linear layer, clinical notes are encoded with ClinicalBERT, and time-series data is handled by an LSTM, a StarTransformer, or a standard transformer encoder.
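A minimal PyTorch sketch of how these three per-modality encoders might be wired. Dimensions are illustrative, and to keep the sketch self-contained the ClinicalBERT encoder is stood in for by a linear projection of a precomputed 768-dimensional note embedding; the real pipeline encodes raw text.

```python
import torch
import torch.nn as nn

class MultimodalEncoders(nn.Module):
    """Illustrative per-modality encoders (all dimensions are made up)."""
    def __init__(self, static_dim, ts_dim, note_dim, hidden=64):
        super().__init__()
        # Time-invariant data: a single linear layer
        self.static_enc = nn.Sequential(nn.Linear(static_dim, hidden), nn.ReLU())
        # Time-series data: LSTM (StarTransformer / transformer are alternatives)
        self.ts_enc = nn.LSTM(ts_dim, hidden, batch_first=True)
        # Clinical notes: stand-in projection for a precomputed ClinicalBERT
        # sentence embedding (768-d); the real model encodes the raw note text.
        self.note_enc = nn.Linear(note_dim, hidden)

    def forward(self, static_x, ts_x, note_emb):
        h_static = self.static_enc(static_x)   # (B, hidden)
        _, (h_ts, _) = self.ts_enc(ts_x)       # final hidden state: (1, B, hidden)
        h_notes = self.note_enc(note_emb)      # (B, hidden)
        return h_static, h_ts.squeeze(0), h_notes

enc = MultimodalEncoders(static_dim=70, ts_dim=32, note_dim=768)
hs, ht, hn = enc(torch.randn(4, 70), torch.randn(4, 16, 32), torch.randn(4, 768))
print(hs.shape, ht.shape, hn.shape)  # each torch.Size([4, 64])
```

Once all three modalities live in a shared hidden size, any of the fusion mechanisms below can combine them.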

The researchers explore various fusion mechanisms, including MAGBERT [3], which employs a multimodal adaptation gate and has demonstrated strong performance in prior research; Tensor Fusion [4], which integrates information from multiple modalities; and Attention Fusion [5], which uses attention mechanisms for integrating modalities. Additionally, the Multimodal InfoMax (MMIM) [6] approach is utilized to maximize mutual information between and within modalities, incorporating three loss functions: Ltask for predicting antimicrobial resistance (AMR), LBA for inter-modality mutual information, and LCPC for intra-modality mutual information.
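Put together, MMIM trains on a weighted combination of these three objectives. The trade-off weights α and β below are our notation for how such losses are typically balanced; the post does not report specific values:

```latex
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{task}} \;+\; \alpha\,\mathcal{L}_{\mathrm{BA}} \;+\; \beta\,\mathcal{L}_{\mathrm{CPC}}
```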

Fig 1. The researchers begin their pipeline with raw EHR data, which is reformulated using the FIDDLE framework into two modalities: time-invariant and time-series, along with clinical notes. These three modalities are then encoded and processed through four different fusion mechanisms. Notably, the MMIM method, which employs three losses for distinct tasks, outperforms other fusion methods by maximizing mutual information both at the input modalities and fusion levels. A fusion network is subsequently utilized to predict resistance.

Results

Cohort

In this study, researchers used the MIMIC-IV database, which contains detailed electronic health records (EHR) for patients in critical care at Beth Israel Deaconess Medical Center from 2008 to 2019. This comprehensive dataset includes several types of information:

  • Time-Invariant Data: Patient demographics, admissions, ICU stays, diagnoses, procedures, and medical history before ICU admission.
  • Time-Series Data: Microbiology events, input and output events, non-microbiology lab events, charted events, and procedure events.
  • Clinical Notes: Discharge and radiology notes, which were tokenized using the WordPiece method.
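WordPiece splits out-of-vocabulary words into known sub-word pieces using a greedy longest-match-first rule. A toy, self-contained illustration of that rule (the vocabulary here is invented, not ClinicalBERT's):

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece, as used by BERT-style tokenizers."""
    tokens, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry a ## prefix
            if piece in vocab:
                match = piece  # longest piece in the vocab wins
                break
            end -= 1
        if match is None:
            return [unk]  # no sub-piece matched at this position
        tokens.append(match)
        start = end
    return tokens

vocab = {"radio", "##graph", "##y", "ct"}
print(wordpiece_tokenize("radiography", vocab))  # ['radio', '##graph', '##y']
```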

The study focused on patients who underwent susceptibility tests during their ICU stays. The objective was to predict whether these tests would show resistance or susceptibility.

Data management was handled using the FIDDLE preprocessing pipeline, which prepares the MIMIC-IV data for analysis. FIDDLE operates in three stages:

  1. Pre-Filtering: Removes rare variables.
  2. Transformation: Converts data into time-series and time-invariant matrices.
  3. Post-Filtering: Refines these matrices.

FIDDLE is designed to simplify the data preparation process by minimizing the number of decisions required, making fewer assumptions, and effectively addressing missing values through a carry-forward method. Although initially developed for the MIMIC-III database, it was adapted for MIMIC-IV.
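The carry-forward handling of missing values can be sketched in a few lines. This is a simplified stand-in for FIDDLE's actual imputation, shown only to make the idea concrete:

```python
def carry_forward(values, fill=None):
    """Last-observation-carried-forward over an hourly measurement sequence.

    `None` marks a missing measurement; positions before the first
    observation fall back to `fill` (e.g. a population default).
    """
    out, last = [], fill
    for v in values:
        if v is not None:
            last = v  # remember the most recent observation
        out.append(last)
    return out

# Hourly heart-rate readings with gaps (hypothetical numbers)
print(carry_forward([None, 82, None, None, 90, None], fill=80))
# [80, 82, 82, 82, 90, 90]
```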

In a clinical setting, predictions from the machine learning model can be made at any time after patient admission, provided sufficient data is available. The prediction time, denoted T, corresponds to the point at which the lab order is generated in the MIMIC dataset and can fall anywhere between admission and discharge.

Figure 2. The healthcare setting in which the proposed approach can be used to support antibiotic stewardship programs in hospitals.

Data Exploration

In analyzing the microbiology data from MIMIC-IV, the researchers encountered 27 types of antibiotics and 647 types of pathogens. The dataset also includes information on the types of culture samples collected from patients, with urine cultures being the most prevalent.

To determine the most effective approach for predicting antimicrobial resistance (AMR), two perspectives were explored for assigning resistance labels: one based on patients infected with the same pathogen and the other based on patients tested with the same antibiotic. By examining patients with the same pathogen, the researchers aimed to understand if the pathogen’s resistance strength could offer insights into the patient’s response.    

The initial exploration involved assessing the availability of various antibiotics and pathogens in the dataset. Figure 3a highlights the top six antibiotics prescribed, with Gentamicin being the most frequently used, thus making it a key focus for further analysis. Figure 3b illustrates the top seven pathogens based on their presence in the dataset.

To identify the most appropriate pathogen for the model, the researchers examined imbalance factors, starting with ICU stays involving the three most common pathogens shown in Figure 3b. Imbalance is a significant issue because antibiotic resistance cases are relatively rare compared to susceptibility cases, which can skew the model’s predictions toward the more common category and reduce the reliability of the results.

The analysis revealed a high imbalance rate, indicating low resistance rates among the antibiotics prescribed for these pathogens. Therefore, the researchers focused on P. aeruginosa, balancing the number of ICU stays with the imbalance rate. Based on the choice of antibiotics and pathogens, FIDDLE was used to create separate datasets, carefully considering each factor when defining the labels.

Figure 3: The most common antibiotics and pathogens present in the MIMIC-IV dataset.

Dataset after merging modalities

In the study focusing on Gentamicin, researchers evaluated predictions at three specific time points: T=3, T=4, and T=10 hours. The study involved merging various modalities to prepare the data for machine learning models, as the FIDDLE pipeline produces separate matrices for each data type.

For T=3, the dataset included 6,095 time-series input features and 70 time-invariant input features, with 1,322 ICU stays and an imbalance factor of 12.35. At T=4, there were 5,992 time-series features and 72 time-invariant features, with 1,245 ICU stays and an imbalance factor of 11.45. For T=10, the dataset comprised 7,038 time-series features and 67 time-invariant features, with 1,287 ICU stays and an imbalance factor of 14.30.
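These imbalance factors are consistent with a simple ratio of majority-class (susceptible) to minority-class (resistant) ICU stays. That interpretation is our assumption, and the counts below are hypothetical, chosen only to reproduce a ratio of the same magnitude:

```python
def imbalance_factor(n_susceptible, n_resistant):
    """Ratio of majority-class to minority-class ICU stays."""
    return n_susceptible / n_resistant

# Hypothetical split of 1,335 stays into susceptible vs. resistant
print(round(imbalance_factor(1235, 100), 2))  # 12.35
```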

In the study on Pseudomonas aeruginosa (P. aeruginosa), the dataset featured 4,924 time-series input features and 69 time-invariant input features, with an imbalance factor of 3.15.

To evaluate the effectiveness of integrating multiple data types, various models were tested on different data subsets, including time-invariant data only, time-series data only, text data only, and a combination of all three modalities. These models were assessed across different datasets and parameters to determine how well each approach addressed the antimicrobial resistance (AMR) prediction problem.

Baseline Models

For the antibiotic resistance task, researchers worked with datasets involving patients who underwent susceptibility tests for Gentamicin using MIMIC-IV.

In the first dataset, with a prediction time of 4 hours and a time granularity of 1 hour, the experiments used a batch size of 20, a learning rate of 1e−4, a BERT size of 768, and binary cross-entropy (BCE) with logits and class weights as the loss function. With MAGBERT as the fusion mechanism, StarBert achieved the highest performance.
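The loss described here, BCE on raw logits with an up-weighted positive class, can be written out directly. This pure-Python version mirrors what, for example, PyTorch's `BCEWithLogitsLoss(pos_weight=...)` computes, without numerical stabilization, so it is an illustration rather than production code:

```python
import math

def weighted_bce_with_logits(logits, targets, pos_weight=1.0):
    """Mean binary cross-entropy on raw logits, up-weighting positives.

    loss_i = -[w * y_i * log(sigmoid(x_i)) + (1 - y_i) * log(1 - sigmoid(x_i))]
    """
    total = 0.0
    for x, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-x))  # sigmoid
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)

# Rare resistant class (y=1) up-weighted by roughly the imbalance ratio
loss = weighted_bce_with_logits([2.0, -1.5, 0.3], [1, 0, 0], pos_weight=12.0)
```

Setting `pos_weight` near the imbalance factor makes each resistant case count as much as the susceptible majority during training.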

For the second dataset, with a prediction time of 3 hours and a time granularity of 1 hour, researchers used a batch size of 20, a learning rate of 5e−5, and a BERT size of 768; here, BertEncoder performed best. Similar configurations were tested on a dataset with a high imbalance ratio of 14.3, but the results were too weak to justify further experimentation.

Additionally, in predicting AMR from the pathogen perspective, the best results using the same encoders showed that BertStar achieved the highest performance. Other tested combinations yielded less favorable results and were excluded from the analysis.

Table 1. Results on the MIMIC-IV test sets, trained with weighted BCE loss and a batch size of 20. The fusion models use the MAGBERT mechanism. The table also presents the average AUROC and its confidence interval (CI).

Comparison of Fusion Mechanisms

The study evaluated several fusion mechanisms using the same datasets, comparing their performance in predicting antimicrobial resistance. The results, detailed in Table 2, include outcomes for attention fusion, tensor fusion, and Multimodal InfoMax (MMIM).

For the Gentamicin task, tensor fusion outperformed attention fusion and MAGBERT, particularly when LSTM was used as the time-series encoder. MMIM achieved the highest performance, with the best AUROC and AUPR scores for both T=3 and T=4 datasets. On the Pseudomonas aeruginosa dataset, MMIM also provided the best AUROC, while tensor fusion delivered the highest AUPR.

Confidence intervals for AUROC scores were approximately ±0.05 for Gentamicin and ±0.03 for Pseudomonas aeruginosa, indicating stable estimates across test samples.
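Confidence intervals like these are typically obtained by bootstrapping the test set. The post does not spell out the exact procedure, so the following is an illustrative, self-contained sketch: a rank-based AUROC plus a percentile bootstrap interval.

```python
import random

def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUROC: resample the test set with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # both classes must appear to compute AUROC
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.2, 0.4, 0.3, 0.5, 0.6, 0.7, 0.4]
lo, hi = bootstrap_ci(labels, scores, n_boot=500)
```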

Method

The researchers focus on adapting the FIDDLE framework for the antimicrobial resistance (AMR) task, specifically building Gentamicin and P. aeruginosa datasets. They begin by extracting data from the MIMIC-IV database, targeting patients with antibiotic sensitivity test results. The Gentamicin cohort includes 13,658 ICU stays, with a significant imbalance between sensitive and resistant cases; the smaller P. aeruginosa dataset comprises 2,103 ICU stays and involves five antibiotics. The researchers filter the data by removing patients under 18 and those discharged or deceased before the prediction hour, and assign labels based on resistance results within the onset hour.

They used three EHR data modalities in their analysis: time-invariant data, time-series data, and clinical notes. Time-invariant data is encoded with a fully connected layer with ReLU activation; time-series data is encoded with models such as an LSTM, a StarTransformer, or the original transformer encoder; and clinical notes are encoded with ClinicalBERT. The encoded features from all modalities are then fused for further analysis.

In exploring fusion mechanisms, the researchers discuss four approaches initially designed for multimodal sentiment analysis (MSA) and adapt them for AMR classification using EHR data. Their baseline fusion mechanism is the multimodal attention gating (MAGBERT) using BERT, while they also explore three other mechanisms: tensor fusion, attention fusion, and Multimodal InfoMax (MMIM). Attention fusion addresses long-range dependencies between modalities, whereas tensor fusion captures unimodal, bimodal, and trimodal interactions. MMIM, which outperforms the other mechanisms, is discussed in greater detail.

The MMIM approach maximizes mutual information (MI) between and within modalities to enhance prediction accuracy. The fusion network in their model comprises two parts: one for prediction and another for MI estimation. At the input level, the researchers optimize MI by estimating correlations between pairs of modalities using a multivariate Gaussian distribution, with clinical notes as the dominant modality. They compute entropy using a Gaussian Mixture Model (GMM), accounting for the imbalanced distribution of classes in EHR data.

At the fusion level, MMIM aims to extract modality-invariant information by maximizing MI between the modalities and their fused outcome. This is achieved through Contrastive Predictive Coding (CPC), where the correlation between fused and individual modality representations is optimized. The researchers combine the MI losses at the input and fusion levels with the cross-entropy loss to balance their contributions during training.
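A CPC objective of this kind scores the true (fused, modality) pair against negatives drawn from the batch. The InfoNCE-style sketch below uses a plain dot-product score with a temperature; the paper's exact score function is not specified here, so both are assumptions:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def infonce_loss(fused, modality_reps, pos_index, temperature=0.1):
    """-log softmax probability assigned to the true (fused, modality) pair.

    Pushing up the true pair's score relative to in-batch negatives
    lower-bounds the mutual information between the fused vector and
    that modality's representation.
    """
    scores = [dot(fused, m) / temperature for m in modality_reps]
    log_z = math.log(sum(math.exp(s) for s in scores))
    return -(scores[pos_index] - log_z)

fused = [0.9, 0.1]
reps = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # index 0 is the true pair
loss = infonce_loss(fused, reps, pos_index=0)
```

Minimizing this loss pulls the fused representation toward the modality it was built from, which is exactly the "modality-invariant information" objective described above.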

Discussion

This study investigates how fusing different modalities in EHR data influences the performance of antimicrobial resistance (AMR) classification. The researchers aim to capture complex patterns within the diverse EHR data by leveraging multimodal deep-learning techniques. 

Performance Analysis of the AMR Task

In their experiments, the researchers tested different prediction times for the Gentamicin and P. aeruginosa datasets. For the Gentamicin dataset, with a prediction time of 4 hours, time-series encoders alone did not achieve high AUROC or AUPR scores. Using BERT solely for text data yielded similar results. However, when the researchers combined the data and employed the MAGBERT fusion method, AUROC improved by 10% with StarBert. The MMIM fusion mechanism produced the best results for Gentamicin, yielding higher AUROC and AUPR scores. To address class imbalance, they used weighted cross-entropy loss, which contributed to improved performance.

In AMR tasks, there is often a trade-off between AUROC and AUPR. This is typical with imbalanced data: a low number of positive cases can lead to a lower AUPR, while AUROC remains high if the model is still effective at distinguishing between classes.
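The asymmetry follows from the metrics' chance baselines: a no-skill classifier scores AUROC ≈ 0.5 regardless of class balance, while its AUPR baseline equals the positive prevalence. A quick check with a class split of roughly the same magnitude as the Gentamicin datasets (the counts are hypothetical):

```python
# A no-skill model's precision at every threshold equals the prevalence,
# so its AUPR baseline collapses to P / (P + N); chance AUROC stays at 0.5.
n_resistant, n_susceptible = 100, 1235  # hypothetical ~12:1 split
prevalence = n_resistant / (n_resistant + n_susceptible)
print(f"chance AUROC ~ 0.5, chance AUPR ~ {prevalence:.3f}")
```

With positives this rare, even a genuinely useful model can post a modest AUPR while keeping a high AUROC.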

Table 2. Results of the three fusion mechanisms implemented as alternatives to MAGBERT. The table includes the outcomes of the Gentamicin and P. aeruginosa tasks, and the average AUROC for each cohort is shown with its confidence interval (CI). “Attn” refers to the attention fusion mechanism, “Outer” to the tensor fusion mechanism, “MMIM” to Multimodal InfoMax, “W.BCE” to weighted binary cross-entropy, and “C.E.” to cross-entropy. All Gentamicin experiments use a learning rate of 5e−5, while the P. aeruginosa ones use 1e−5.

Performance Analysis of the Gentamicin Task

For Gentamicin resistance prediction, a 3-hour prediction window with 1-hour granularity led to lower performance compared to a 4-hour window. Single-modality models, such as those using only time-series data, had inadequate AUROC scores for clinical use.

Fusion methods improved performance. MAGBERT with BertEncoder achieved the highest AUROC and AUPR, while BertStar had a similar AUROC but a lower AUPR. Models focusing on clinical notes generally outperformed those centered on time-series data.

Tensor fusion with LstmBert outperformed LstmBert with MAGBERT. MMIM provided the best results, with higher AUROC and AUPR using LSTM for time-series data.

Overall, merging modalities for the 3-hour window resulted in fewer patients and features compared to the 4-hour window, leading to reduced performance. The 10-hour prediction window showed weak results due to high data imbalance, emphasizing that data imbalance impacts performance more than the number of features.

Performance Analysis of P. aeruginosa

For P. aeruginosa, the focus was on time-series encoders like LSTM and Star. As with Gentamicin, using multiple modalities—particularly MMIM—improved performance compared to single-modality approaches.

The best results were achieved with LstmBert using MMIM, which provided the highest AUROC. The tensor fusion mechanism offered the best AUPR. The P. aeruginosa dataset showed a higher AUPR than the Gentamicin models, owing to its lower imbalance rate.

However, predicting based solely on the pathogen is less useful for clinical decisions, as it doesn’t offer much guidance on which antibiotic to prescribe. Therefore, it’s less personalized and less informative for practitioners compared to methods that consider more detailed patient data.

Fusion Mechanisms

In their study, researchers evaluated four fusion mechanisms for electronic health records (EHR) data:

  • Attention Fusion: Struggled with EHR data due to its design for synchronized data, such as in sentiment analysis. The lack of synchronization in EHR data hindered its effectiveness.
  • Tensor Fusion: Performed better by creating distinct regions for each modality and their combinations through an outer product. This approach more effectively captured interactions between different data types.
  • Multimodal InfoMax (MMIM): Excelled as the best performer. It maximizes mutual information between modalities and their fusion results, effectively integrating time-series and time-invariant data. Its hierarchical design captures complex relationships and interactions.
  • Comparative Mechanisms: Focused on the relationship between time-series data and clinical notes but treated all modalities equally, which was less effective given that some modalities can be more informative than others.

Overall, MMIM provided the best results due to its ability to handle complex interactions and effectively utilize different types of data.

Conclusion

Addressing antimicrobial resistance (AMR) requires thorough analysis of electronic health record (EHR) data paired with capable machine learning models. The FIDDLE pipeline, well suited to the MIMIC database, leverages diverse EHR modalities for effective data processing and multimodal fusion. Among the fusion mechanisms tested for AMR prediction, Multimodal InfoMax (MMIM) proved the most effective.

This research highlights the importance of prediction time for time-series data in determining model performance. Advanced machine learning techniques demonstrated significant potential in predicting antimicrobial resistance, suggesting a promising intersection between artificial intelligence and microbiology.

Future work should explore additional antibiotics and pathogens, integrate solutions with prescription algorithms, and extend the framework to outpatient and non-ICU settings. Enhancing models with irregular time-series encoders could further improve performance.



Disclaimer: All views and opinions expressed in these posts are those of the author(s) and do not represent the views of MBZUAI.
