Privacy-preserving federated learning and uncertainty quantification in medical imaging



Nikolas Koutsoubis, MS1,3 (http://orcid.org/0000-0001-6195-9360); Asim Waqas, PhD2 (http://orcid.org/0000-0002-6834-4710); Yasin Yilmaz, PhD3; Ravi P. Ramachandran, PhD4 (http://orcid.org/0000-0003-0011-4561); Matthew B. Schabath, PhD2; Ghulam Rasool, PhD1,3 (http://orcid.org/0000-0001-8551-0090)

Author affiliations, funding, and conflicts of interest are listed at the end of this article. https://doi.org/10.1148/ryai.240637

Artificial Intelligence (AI) has demonstrated strong potential in automating medical imaging tasks, with potential applications across disease diagnosis, prognosis, treatment planning, and posttreatment surveillance. However, privacy concerns surrounding patient data remain a major barrier to the widespread adoption of AI in clinical practice, as large and diverse training datasets are essential for developing accurate, robust, and generalizable AI models. Federated learning offers a privacy-preserving solution by enabling collaborative model training across institutions without sharing sensitive data. Instead, model parameters, such as model weights, are exchanged between participating sites. Despite its potential, federated learning is still in its early stages of development and faces several challenges. Notably, sensitive information can still be inferred from the shared model parameters. Additionally, postdeployment data distribution shifts can degrade model performance, making uncertainty quantification essential. In federated learning, this task is particularly challenging due to data heterogeneity across participating sites. This review provides a comprehensive overview of federated learning, privacy-preserving federated learning, and uncertainty quantification in federated learning.
Key limitations in current methodologies are identified, and future research directions are proposed to enhance data privacy and trustworthiness in medical imaging applications.

©RSNA, 2025

This review article provides an in-depth analysis of the latest advancements in federated learning, privacy preservation, and uncertainty quantification in medical imaging. It also highlights current challenges and explores potential opportunities for improvement in these areas.

Abbreviations: AI = Artificial Intelligence, ML = Machine Learning, IID = Independent and Identically Distributed, PFL = Personalized Federated Learning, PPFL = Privacy Preserving Federated Learning

Essentials

• Federated learning enables multi-institutional training of AI models on medical imaging data without direct data sharing, overcoming key privacy barriers while maintaining model performance.


• Despite privacy benefits, federated learning remains vulnerable to information leakage through gradient updates; privacy-preserving strategies such as differential privacy and homomorphic encryption reduce this risk but introduce accuracy and efficiency trade-offs.

• Uncertainty quantification in federated learning enhances model trustworthiness yet remains underutilized due to challenges posed by data heterogeneity and computational complexity.

Advances in artificial intelligence (AI), driven by deep learning and the availability of large datasets and computational resources, continue to transform medical imaging. AI models trained on radiologic data, such as mammograms, CT scans, and MRIs, are poised to become invaluable tools in both clinical and research settings (1–3). However, curating large, annotated, domain-specific datasets remains challenging due to privacy regulations and other factors. Unlike conventional AI model development methods that require pooling data at a single location, federated learning enables decentralized model development without sharing data (4,5). It allows for large-scale model training by sharing model gradient updates between sites rather than the training data. This enables multiple sites to act as clients and train a global model on the server, which is then shared with all sites. Federated learning has the potential to address many challenges related to data sharing for AI model training in medical imaging (6). However, it also presents unique challenges. First, data heterogeneity across different sites often violates the independent and identically distributed (IID) assumption, leading to issues such as poor model convergence, biased outcomes, and reduced generalization. These non-IID issues can stem from variations in imaging protocols, patient demographics, and disease prevalence across sites.
Second, some studies have shown that private data can be extracted from the gradient updates communicated between federated learning sites (7). Methods such as differential privacy (8) and homomorphic encryption (9) have been proposed to enhance communication security; however, there may be an inherent trade-off between privacy preservation and model performance (10). The third challenge is uncertainty quantification, which involves measuring the AI's confidence in its predictions (11). This is crucial for the trustworthiness and reliability of AI in clinical settings (11). Almost all AI models based on deep neural networks require output calibration for accurate uncertainty quantification (12). Due to the likelihood of non-IID data and potential class imbalance in datasets at client sites, traditional uncertainty quantification methods must be modified for federated learning models (13). Federated learning, with strong privacy preservation and uncertainty quantification, has the potential to revolutionize medical imaging through the development of generalizable, robust, and trustworthy AI models using large-scale multi-institutional datasets.

This work reviews state-of-the-art methods in federated learning, privacy-preserving federated learning (PPFL), and uncertainty quantification, outlining the potential of these advancements to transform medical imaging. The paper is organized as follows: Sections 2, 3, and 4 review federated learning, PPFL, and uncertainty quantification in federated learning, respectively. Section 5 covers the real-world applications of federated learning in medical imaging and summarizes the current challenges and opportunities. Figure 1 presents the organization of this review paper, and Figure 2 presents a summary of the topics covered. Readers are encouraged to refer to the Supplementary Material (section 7) for more detailed technical aspects of the various federated learning topics discussed in this paper. A GitHub repository with links to the papers reviewed in this work is provided here: Awesome List. The primary contributions of this work include:

• A review of current state-of-the-art federated learning methods, from the past five years, for learning from distributed data while addressing non-IID datasets, privacy-preservation requirements, and uncertainty quantification challenges.

• Exploration of five real-world use cases of federated learning in medical imaging and insights gained from these success stories. We also present current challenges in federated learning, PPFL, and uncertainty quantification related to medical imaging, along with potential opportunities for future research.

Federated Learning

Federated learning was originally proposed to train AI models on edge devices without exposing private data (14). This led to a paradigm shift in how machine learning (ML) models could be trained on sensitive and private data in distributed settings. The original federated learning algorithm, FedAvg, trains local models on client data and sends gradient information to a central server to create a global model that, theoretically, can outperform all local models (14). This section focuses on federated learning algorithms and presents state-of-the-art advancements. A summary of the topics is shown in Figure 3.

Federated Learning Algorithms: Characterization and Types

Federated learning can be classified as centralized or decentralized, depending on whether a central server is used to aggregate updates and construct the global model. Centralized federated learning is the more common approach, in which a server orchestrates the learning process by collecting and combining client updates. In contrast, decentralized federated learning allows clients to communicate directly, which can be advantageous when a central server is impractical or undesirable due to privacy or connectivity constraints.
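As a concrete illustration, the FedAvg aggregation step described above amounts to a data-size-weighted average of client parameters. The sketch below is our own minimal version in pure NumPy, on flattened parameter vectors; production frameworks apply the same rule to full model state, layer by layer.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Weighted average of client parameter vectors (FedAvg-style).

    client_weights: list of 1-D numpy arrays, one flattened model per client.
    client_sizes: number of local training samples at each client, used as
    aggregation weights so that data-rich sites contribute proportionally more.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    weights = sizes / sizes.sum()
    stacked = np.stack(client_weights)          # shape: (n_clients, n_params)
    return (weights[:, None] * stacked).sum(axis=0)

# One communication round with three simulated clients.
clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [100, 100, 200]
global_model = fedavg_aggregate(clients, sizes)
print(global_model)  # → [3.5 4.5]
```

The third client holds half of the total data, so the global model sits closer to its parameters, which is exactly how FedAvg balances unevenly sized sites.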
Recently, personalized federated learning (PFL) has gained attention as an enhancement of traditional centralized federated learning (15). PFL addresses the inherent data heterogeneity among clients, such as variations in data distributions (non-IID data), computational resources, and specific local requirements. Instead of creating a single global model, PFL focuses on developing models tailored to individual clients while still leveraging shared knowledge across federated learning sites. PFL models are generally trained within a centralized federated learning framework. Given their unique approach to personalization and adaptation in heterogeneous environments, the PFL algorithms reviewed in this paper are presented in a separate section. Table 1 summarizes all federated learning algorithms discussed.

Centralized Federated Learning

As illustrated in Figure 3A, centralized federated learning requires a dedicated central server for parameter aggregation and constructing the federated model. It is the most common form of federated learning implemented for various ML tasks. These algorithms offer technical advancements for (1) learning from distributed, heterogeneous, and non-IID data, (2) optimizing learning for both global and local models to avoid catastrophic forgetting, and (3) stabilizing training across federated runs, locally and globally, to ensure model convergence. Notable centralized federated learning algorithms in recent years include FedProx (16), FedBN (17), FedGen (18), Federated Online Laplace Approximation (FOLA) (19), Train Convexify Train (TCT) (20), Federated Cross-Correlation and Continual Learning (FCCL) (21), and FedFA (22). A technical summary of each algorithm can be found in Supplement S1.

Decentralized Federated Learning

Decentralized federated learning implementations do not rely on a central server to coordinate learning, as shown in Figure 3B (23–25). Depending on the application, decentralized federated learning may enhance privacy and security while increasing robustness and fault tolerance by eliminating single points of failure. It also improves scalability by distributing workloads across the network. Decentralized federated learning methods such as Swarm Learning (23), ProxyFL (24), and Fog-FL (25) offer alternative approaches to conducting federated learning experiments when a centralized server is not practical. Additional information about these decentralized methods can be found in Supplement S2.

Personalized Federated Learning (PFL): Addressing Client Data Heterogeneity

In multi-institutional collaborations, patient demographics often vary widely across sites, a challenge amplified by geographic separation. These demographic differences can lead to substantial variation in the datasets used to train a federated learning model across sites (26). In some cases, this can prevent model convergence or lead to underperformance on local data. As shown in Figure 3C, PFL develops tailored models for clients to address the data heterogeneity across sites while leveraging shared learning within the federated learning network (15). Some PFL methods, like pFedBayes (26), FedPop (27), and Self-Aware PFL (28), use probabilistic techniques to mitigate the effects of high data heterogeneity.
Other methods, such as FedAP, use batch normalization layers to enhance performance. Detailed information on PFL algorithms can be found in Supplement S3.

Privacy-preserving Federated Learning (PPFL)

Ensuring secure processing of protected and identifiable information is crucial in the medical field, where governing regulations strictly prohibit sharing patient data to prevent privacy breaches. Federated learning addresses this by keeping data localized at each site. However, privacy risks persist, as gradient updates exchanged between clients and the server can inadvertently reveal information about training data, leading to privacy leaks (1). In this section, we present several topics related to PPFL, as depicted in Figure 4 and Table 2. Additional PPFL methods are discussed in detail in Supplement S4.

Differential Privacy

One of the most popular methods for PPFL is differential privacy, which introduces noise into the gradients to prevent private information leakage (Fig 4A) (8). It provides mathematical guarantees of privacy preservation, but potentially at the cost of model accuracy and convergence (10). Noising before aggregation federated learning (nbAFL), proposed by Wei et al, ensures differential privacy by adding artificial noise to the model parameters on the client side before aggregation, reducing the risk of privacy breaches (29). To optimize the trade-off between privacy and model performance, nbAFL employs a K-random scheduling technique, in which K clients are randomly selected for each aggregation round, making it harder for attackers to extract useful information from the updates. The optimal value of K must be carefully determined to balance privacy and model convergence, a concept known as privacy budget allocation.

Homomorphic Encryption

As shown in Figure 4B, homomorphic encryption allows mathematical operations to be performed directly on encrypted data, producing encrypted results that, when decrypted, match the results as if the operations had been applied to the original plaintext data (30,31). This enables secure data sharing with third parties for processing, without exposing the underlying plaintext data. Somewhat homomorphic encryption is a subtype of homomorphic encryption that permits a limited number of arithmetic operations and is generally more efficient (32). Somewhat homomorphically encrypted federated learning was used to train models for brain tumor segmentation from MRIs and to predict biomarkers from histopathology slides in colorectal cancer (33). These models achieved performance comparable to standard federated learning models while providing additional privacy guarantees, demonstrating that encryption does not necessarily compromise model accuracy (33). Notably, these methods encrypted only the vulnerable parts of the federated learning pipeline, resulting in less than a 5% increase in training time and computational cost.

Uncertainty Quantification in Federated Learning

Uncertainty quantification in deep learning refers to measuring how confident a model is in its predictions.
This is particularly important in medical settings, where both prediction accuracy and reliability are critical for informed decision-making (34). Uncertainty quantification is vital for fostering trust, reliability, and user acceptance of an AI model (11,34). It plays a critical role in monitoring model performance postdeployment and serves as an early warning system for potential performance degradation, enabling timely human intervention. Additionally, uncertainty quantification can inform decisions on whether to use personalized or global models, assist in detecting out-of-distribution samples, and support active learning during model training. However, in federated learning, uncertainty quantification faces unique challenges due to the non-IID nature of data across participating sites, which often exhibit differing distributions, class imbalances, and other site-specific issues. This section reviews uncertainty quantification methods specifically designed to address these complexities. Figure 5 and Table 3 provide an overview of uncertainty quantification methods in federated learning, with additional information in Supplement S5.

Uncertainty quantification methods can actively enhance federated learning performance in several ways. First, uncertainty-aware client selection can prioritize clients with high-confidence data, improving model convergence. Second, local uncertainty estimates can inform weighting during aggregation, mitigating the impact of low-quality or noisy updates. Third, uncertainty quantification can enable robust deployment by flagging out-of-distribution inputs and guiding fallback mechanisms, such as human review. Finally, in personalized FL, uncertainty estimates can help balance global knowledge with local specialization, improving both generalizability and site-specific performance.

Uncertainty Quantification Using Model Ensembling

Model ensembling is a widely used uncertainty quantification method in federated learning, leveraging its distributed nature by treating multiple clients as an ensemble of models (Fig 5A) (13). Three key ensembling approaches are the ensemble of local models, the ensemble of global models, and the ensemble based on multiple coordinators (13). The ensemble of local models prioritizes privacy and simplicity by treating each client's model as an independent member, though it diverges from the collaborative nature of federated learning. The ensemble of global models preserves collaboration but increases computational and communication overhead due to repeated model training with different random seeds. The ensemble based on multiple coordinators improves scalability by distributing clients into subgroups with their own coordinators but introduces coordination complexity and the risk of learning fragmentation. Fed-ensemble (35) further expands on these three approaches to address their associated limitations. More information about ensembling is provided in Supplement S5.

Uncertainty Quantification Using Conformal Prediction

Conformal prediction is a statistical framework that provides a reliable confidence measure for predictions made by ML models (Fig 5B) (36). Conformal prediction defines a nonconformity measure to assess how different a new example is from previously seen data, generating prediction regions likely to contain the true label or value.
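To make the nonconformity idea concrete, here is a minimal split-conformal sketch for a regression task; the calibration predictions and labels are synthetic stand-ins for a held-out calibration set, and the quantile correction follows the standard split-conformal recipe rather than any specific federated variant.

```python
import numpy as np

def conformal_interval(cal_preds, cal_labels, test_pred, alpha=0.1):
    """Split conformal prediction for regression.

    The nonconformity score is |y - y_hat| on a held-out calibration set.
    The finite-sample-corrected (1 - alpha) quantile of the scores gives a
    half-width q such that [test_pred - q, test_pred + q] covers the true
    label with probability >= 1 - alpha, assuming exchangeability.
    """
    scores = np.abs(np.asarray(cal_labels) - np.asarray(cal_preds))
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    return test_pred - q, test_pred + q

# Synthetic calibration set: predictions and slightly perturbed labels.
cal_preds = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
cal_labels = cal_preds + np.array([0.1, -0.2, 0.3, 0.1, -0.1, 0.2, -0.3, 0.1, 0.2])
lo, hi = conformal_interval(cal_preds, cal_labels, test_pred=5.0, alpha=0.1)
```

With the largest calibration error at 0.3, the 90% interval for a prediction of 5.0 becomes roughly [4.7, 5.3]; wider errors on the calibration set would automatically widen the region.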
Conformal prediction is particularly beneficial in federated learning; however, data heterogeneity among clients violates the assumption of exchangeability, which is fundamental to traditional conformal prediction methods. To address this, Lu et al introduced the concept of partial exchangeability and developed the federated conformal prediction framework. This framework retains rigorous theoretical guarantees and demonstrates strong empirical performance across computer vision and medical imaging datasets, making it a practical solution for uncertainty quantification in heterogeneous federated learning environments (37). More details are provided in Supplement S5.

Uncertainty Quantification Using Bayesian Federated Learning

In Bayesian federated learning, shown in Figure 5D, each client learns a posterior probability distribution function (PDF) over its parameters (38,39). The clients communicate the learned PDFs to the server, which aggregates the local PDFs into a global PDF that can serve all the clients. The posterior PDF can be used for uncertainty quantification in the model's output. Various methods for approximating the posterior PDF, such as MC-dropout and Stochastic Weight Averaging Gaussians (SWAG), have also been proposed (13).
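One simple way to picture the server-side aggregation of client posteriors, assumed here purely for illustration and not tied to any one cited algorithm, is a precision-weighted (product-of-Gaussians) rule over per-parameter diagonal Gaussian posteriors:

```python
import numpy as np

def aggregate_gaussian_posteriors(means, variances):
    """Combine per-client Gaussian posteriors over each model weight.

    Product of Gaussians: the global precision is the sum of the client
    precisions, and the global mean is the precision-weighted average of
    the client means. Returns (global_mean, global_variance).
    """
    precisions = 1.0 / np.asarray(variances)          # (n_clients, n_params)
    global_var = 1.0 / precisions.sum(axis=0)
    global_mean = global_var * (precisions * np.asarray(means)).sum(axis=0)
    return global_mean, global_var

# Two clients, two parameters; the second client is more certain (lower
# variance) about the first parameter, so it pulls the global mean toward 2.0.
means = [np.array([0.0, 1.0]), np.array([2.0, 1.0])]
variances = [np.array([1.0, 1.0]), np.array([0.5, 1.0])]
mu, var = aggregate_gaussian_posteriors(means, variances)
```

Note how the aggregated variance shrinks as clients agree, which is the property that makes the global PDF usable for downstream uncertainty quantification.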


Uncertainty Quantification and Model Output Calibration

Uncertainty quantification methods assess and communicate how confident a model is in its predictions, which is crucial for reliable deployment and decision-making. While these methods directly quantify uncertainty in model outputs, model calibration corrects a model's tendency to be overconfident, particularly due to the Softmax function, thus aligning predicted probabilities with actual performance (11). By calibrating the Softmax output, a more accurate assessment of the model's confidence is achieved. Luo et al recently introduced the Classifier Calibration with Virtual Representations (CCVR) algorithm, which calibrates a global model to improve performance on non-IID data in heterogeneous settings (40). The authors found that posttraining calibration significantly improves classification accuracy across various federated learning algorithms and datasets (40). Another recently proposed method, Federated Calibration (FedCal), performs local and global calibration of models (41). Additional information on calibration methods for federated learning can be found in Supplement S5. While not a standalone uncertainty quantification method, model calibration is an important postprocessing technique that aligns predicted confidence scores (eg, softmax outputs) with empirical accuracy. By correcting for over- or under-confidence, particularly in the presence of non-IID data, calibration enhances the trustworthiness of model predictions. However, unlike methods such as Bayesian inference or conformal prediction, calibration does not directly estimate epistemic or aleatoric uncertainty.

Federated Learning in Medical Imaging

With growing research in the field, real-world applications of federated learning in medical imaging are beginning to demonstrate its clinical potential.
This section presents federated learning implementation tools, real-world clinical case studies, and the future outlook of federated learning in medical imaging, including challenges and opportunities.

Planning a Medical Imaging Federated Learning Project

Implementing a federated learning project for medical imaging involves several key steps to ensure success and compliance with the privacy standards set by participating institutions and government regulations. Success can be measured by validating a model that outperforms all local models. The implementation process begins by defining the specific medical imaging problem, such as disease classification, pixel-level segmentation of organs, or the identification of malignant masses in radiologic scans. The next step involves selecting the participating institutions, such as hospitals or imaging laboratories, and determining which site will act as the central server. Site selection is based on the ability to collect and preprocess the necessary training data, train the model, and share updates with the server site over the internet. After identifying collaborators, an appropriate federated learning software framework, such as NVIDIA FLARE, is selected and customized to meet the project's specific needs. This customization may include implementing privacy-preserving techniques and uncertainty quantification algorithms, as well as configuring site-specific software for data loading and storage of results. The ML model architecture, inputs, and outputs are also determined at this stage, before deploying the federated learning software framework at both the server and client sites. Before model training begins, each site must prepare its data according to standardized preprocessing and labeling steps agreed upon beforehand.

The federated training process commences once the environment is fully set up, with both central server and client configurations in place. During this phase, each client trains the model locally and sends updates to the central server, which aggregates these updates and redistributes the updated model for further training. This iterative process continues until the model converges. If implemented, uncertainty quantification guides the training process. After training, the model is evaluated both locally at each site and globally across all sites to assess its performance. Upon achieving satisfactory results, the model is deployed for clinical use or further research, with ongoing monitoring to ensure its effectiveness. The uncertainty quantification data can guide the model selection process for deployment, allow users to monitor system performance, and trigger a manual review of the model output if necessary. The entire process is thoroughly documented, and reports are prepared to share findings with the research community. Finally, the model is maintained and periodically updated with new data or improved algorithms, ensuring its relevance and accuracy over time, while collaboration between participating sites continues to drive ongoing learning and improvement.

Federated Learning Implementation Tools

To streamline the federated learning model training, validation, and deployment process, several open-source frameworks and software development kits have been developed. NVIDIA Federated Learning Application Runtime Environment (FLARE) is a well-known open-source software development kit (42). NVIDIA FLARE supports a variety of federated learning algorithms, workflows, and privacy-preserving techniques, including differential privacy and homomorphic encryption. OpenFL is another open-source Python library that operates with a static network topology, in which clients connect to a central aggregation server over encrypted channels (43). The workflow is determined by a federation plan agreed upon by all sites before implementation. Although originally designed for medical imaging, OpenFL is adaptable to other applications. FedBioMed is another open-source framework designed specifically for biomedical applications of federated learning, providing tools and libraries for managing distributed training, handling heterogeneous data, and ensuring privacy and security in biomedical research (44). The Argonne Privacy-Preserving Framework (APPFL) is an open-source Python package that provides tools to implement, test, and validate various aspects of PPFL experiments in simulated settings (45).
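The iterative train-aggregate-redistribute cycle described in the planning section can be simulated end to end in a few lines. The toy below is our own construction, not drawn from any of the cited frameworks: two clients with differently distributed inputs run local gradient descent on a shared least-squares problem, and the server averages their parameters each round.

```python
import numpy as np

def local_step(w, X, y, lr=0.1, epochs=20):
    """One client's local training: gradient descent on least squares."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

rng = np.random.default_rng(42)
true_w = np.array([2.0, -1.0])

# Two clients whose input distributions differ (a mild non-IID setting).
data = []
for shift in (0.0, 1.0):
    X = rng.normal(shift, 1.0, size=(50, 2))
    y = X @ true_w + rng.normal(0.0, 0.05, size=50)
    data.append((X, y))

w_global = np.zeros(2)
for _ in range(10):                            # communication rounds
    local = [local_step(w_global.copy(), X, y) for X, y in data]
    w_global = np.mean(local, axis=0)          # FedAvg with equal weights
# w_global approaches true_w as rounds proceed.
```

Neither client's raw data ever leaves its "site"; only the trained parameter vectors are exchanged, which is the essential privacy property the planning steps above are designed to preserve.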

Federated Learning Studies in Medical Imaging

• Federated Tumor Segmentation (FeTS) for brain tumors: FeTS-1.0 was the first large-scale, real-world federated learning effort in medical imaging, aiming to identify the optimal weight aggregation approach for training a consensus model across multiple geographically distinct institutions while keeping data local (46,47). Presented as a challenge, FeTS evaluated the generalizability of federated models trained for brain tumor segmentation on unseen, institution-specific data, demonstrating the potential of federated learning in real-world healthcare settings. Building on this, the FeTS-2.0 challenge focused on out-of-sample generalizability for glioblastoma detection and orchestrated the largest real-world medical federated learning deployment to date, encompassing 71 sites across six continents. This effort produced the largest glioblastoma dataset to date, comprising 6,314 patients. Leveraging this federated learning model yielded improvements in delineation accuracy of 33% for the surgically targetable tumor region and 23% for the complete tumor extent compared with a model trained on public data. The challenge demonstrated the potential of federated learning to improve model performance in healthcare and paved the way for further research. A key insight was that data quality issues often became apparent only after training, revealed by comparing model performance against the publicly trained model. It was also observed that when data quality is insufficient, simply adding more data does not necessarily lead to noticeable improvement. The project employed centralized federated learning using the FedAvg algorithm (14) and was built on the OpenFL framework (43).

• Federated learning for predicting COVID-19 outcomes: Dayan et al used federated learning to train a model on COVID-19 data from 20 different institutions around the world without sharing data (48). The Electronic Medical Record CXR AI Model (EXAM) was developed to predict the future oxygen requirements of patients with COVID-19. The model achieved an average area under the receiver operating characteristic curve (AUC) exceeding 0.92 for predicting outcomes. The federated model provided a 16% improvement in AUC and a 38% improvement in generalizability compared with models trained at individual institutions. The study incorporated data from four continents and was validated at three independent sites to ensure robust model performance. The EXAM framework used centralized federated learning with the FedAvg algorithm as the aggregation method (14). The study also implemented differential privacy in its setup, showing that enhanced privacy can be ensured while maintaining performance. This study is one of the largest real-world applications of federated learning and demonstrates its potential to enable large-scale medical AI model training. One limitation highlighted was that the decentralized nature of the data makes further analysis beyond the federated training results difficult. Nevertheless, the authors emphasized that the ability of federated learning to provide high-performing models to institutions with limited data resources is invaluable for advancing ML in clinical applications.

• ODELIA for breast cancer: The Open Consortium for Decentralized Medical Artificial Intelligence (ODELIA) is an EU-funded research initiative launched on January 1, 2023, with the goal of transforming medical AI through Swarm Learning (49). Swarm Learning enables collaborative model training without sharing patient data, addressing data privacy issues in medical research. Over five years, ODELIA plans to develop an open-source Swarm Learning framework and apply it to create AI algorithms for detecting breast cancer on MRI scans, drawing on an extensive decentralized database (49). This approach is expected to improve the speed, performance, and generalizability of AI development, ultimately leading to better patient care across Europe. By implementing Swarm Learning, ODELIA seeks to overcome the data collection challenges in medical AI, particularly in cancer screening, where ethical and legal barriers impede data sharing. The consortium comprises 12 academic and industry partners from across Europe, including institutions from Austria, Germany, Spain, Greece, the Netherlands, Belgium, Switzerland, and the University of Cambridge (United Kingdom). The framework streamlines the process of conducting decentralized federated learning.

• Real-world federated learning in RACOON: Researchers in the German Radiological Cooperative Network (RACOON), a nationwide initiative involving 38 hospitals, conducted real-world federated learning experiments and published a detailed guide for developing and deploying federated learning infrastructure (50). The guide outlines the key steps, challenges, and current solutions for successfully implementing federated learning in radiology settings. The authors deployed the infrastructure to six hospitals to train a segmentation model for detecting lung pathologies using a centralized federated learning approach. To justify the added complexity of federated learning, the authors compared their approach with simpler alternatives such as local model training and ensembling. The guide also covers the organizational structures, legal requirements, experimental design, and evaluation strategies for setting up a federated learning workflow.

• Real-world federated learning for breast density classification: Roth et al demonstrated that federated learning outperforms conventional deep learning approaches in real-world settings (51). Their study involved training a model for breast density classification using data from seven clinical institutions. The results showed an average improvement of 6.3% over locally trained models and a relative improvement of 45.8% in generalizability when evaluated on external test data. This study provides empirical evidence of the effectiveness of federated learning for improving model generalizability, particularly in settings where data are limited.

Challenges and Opportunities

We have presented an overview of federated learning, PPFL, and uncertainty quantification from technical and algorithmic perspectives. We also outlined the key steps for implementing a federated learning project and reviewed five case studies demonstrating its application to real-world medical imaging tasks. Despite significant recent advances, federated learning is still in its early stages. Several challenges must be addressed before federated learning can become a standard approach for ML model development in medical imaging. These challenges present opportunities for researchers to further explore and improve the state of federated learning in this field.

  1. Administrative challenges: Before implementing a federated learning project, engagement with stakeholders from all participating institutions is essential. These stakeholders typically include researchers, medical imaging teams, information technology and cybersecurity experts, contract and agreement management teams, and hospital administrators. Engaging these groups ensures that all aspects of the project are addressed, from technical implementation to legal and ethical considerations. Ethics approval must be obtained from the relevant institutional review boards or ethics committees to ensure compliance with regulatory standards for patient data privacy and security, particularly those related to HIPAA and GDPR. In addition, formal agreements on "weight sharing" between institutions must be established. These agreements should outline whether model weights are shared in "plain" or "encrypted" form and address concerns related to compliance with data security and privacy laws (such as HIPAA and GDPR). They should also specify each institution's responsibilities, data governance, data transfer protocols, and contingency plans in the event of a data breach. Comprehensively addressing all of these issues before a federated learning project begins is essential for ensuring smooth collaboration and maintaining trust among the parties involved.

  2. The need for annotated datasets: It is important to recognize that federated learning does not eliminate the need for annotated data. Each participating site must still invest substantial resources to curate and annotate datasets for training local models. The federated learning community should build on and extend ongoing work in self-supervised learning, active learning, continual learning, and transfer learning to federated settings. One promising research direction involves the use of generative AI models to create diverse and clinically relevant datasets. Despite their potential, however, current evidence supporting the clinical utility of AI-generated images is limited, underscoring the need for further research to validate their effectiveness for training federated models.

  3. Privacy-performance trade-off: Another important challenge in federated learning is the inherent trade-off between privacy and model performance (29). Further research is needed to efficiently allocate the privacy budget in ways that strengthen privacy protection without compromising model effectiveness. Exploring alternative noise types and noise injection methods offers a promising direction for improving the effectiveness of differential privacy. In addition, encryption methods, including homomorphic and partially homomorphic encryption, must be adapted to federated learning settings to minimize the performance gap between encrypted and unencrypted models. At the same time, communication efficiency between the server and clients is an important consideration when evaluating the overall effectiveness of federated learning algorithms.

  4. Personalization versus generalization in PFL: PFL offers the advantage of tailoring models to the specific needs of individual clients, which can lead to improved performance on local data. However, this personalization can introduce challenges, including overfitting and reduced generalizability, factors that federated learning typically seeks to preserve. Integrating uncertainty information about the model weights computed during federated runs may help PFL models optimize learning and enhance generalization. Uncertainty quantification-guided PFL may yield more generalizable and personalized models that effectively capture federated knowledge while performing well on local data. Furthermore, conformal prediction-based uncertainty quantification methods, though still in the early stages of development, show promise for further improving the generalizability and personalization of PFL models.

  5. Computational requirements for uncertainty quantification in federated learning: Computational efficiency remains an open challenge for uncertainty quantification in federated models, particularly for ensemble and Bayesian approaches. Model ensembling requires training multiple models with different initialization seeds, which is time- and resource-intensive. Similarly, Bayesian federated learning requires training local models with additional parameters to represent PDFs over the model weights, further increasing the computational burden. Advancing computationally efficient and scalable uncertainty quantification methods would promote the wider adoption of uncertainty quantification in federated learning.

  6. Postdeployment performance monitoring: Postdeployment performance monitoring that uses uncertainty quantification methods to identify out-of-distribution and noisy data is an important but relatively new area of research. Uncertainty quantification enables monitoring of model performance and makes it possible to diagnose and address the causes of model degradation through a human-in-the-loop approach. This process not only resolves immediate issues but also contributes to future model improvement by incorporating flagged data into subsequent training cycles. As noted earlier, there are substantial opportunities to refine classic uncertainty quantification methods and optimize them for the unique challenges of federated learning.
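Uncertainty-based postdeployment monitoring of this kind can be sketched with a predictive-entropy threshold; the threshold value below is an illustrative assumption that would need tuning per deployment.

```python
import numpy as np

def flag_for_review(probs, entropy_threshold=0.5):
    """Flag predictions whose softmax entropy exceeds a threshold.

    probs: (n_samples, n_classes) predicted class probabilities. High
    entropy indicates low confidence -- a candidate out-of-distribution
    or ambiguous input to route to human review.
    """
    p = np.clip(np.asarray(probs), 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)
    return entropy > entropy_threshold

preds = np.array([[0.99, 0.01],    # confident prediction
                  [0.55, 0.45]])   # near-uniform, low confidence
flags = flag_for_review(preds, entropy_threshold=0.5)
```

Here only the near-uniform prediction is flagged; in a human-in-the-loop workflow such flagged cases would be reviewed and could later be folded into retraining, as discussed above.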

Conclusion

Federated learning has the potential to significantly improve medical imaging workflows in both research and clinical settings. Centralized, decentralized, and personalized federated learning approaches have been developed to address a variety of healthcare challenges. By enabling collaborative model training across institutions without sharing sensitive patient data, federated learning addresses critical privacy and security concerns while leveraging diverse datasets to improve model performance and generalizability. Enhanced privacy-preservation techniques, such as differential privacy, homomorphic encryption, and other hybrid approaches, further strengthen data security. Ongoing research incorporating uncertainty quantification into federated learning aims to support the development of more trustworthy AI models. Continued interdisciplinary efforts and technological advances in this field are expected to further streamline medical imaging workflows, support precision medicine initiatives, and ultimately improve healthcare delivery and patient outcomes worldwide.

Author affiliations: 3 Department of Electrical Engineering, University of South Florida, 4202 E Fowler Ave, Tampa, FL 33620-9951; 4 Department of Electrical and Computer Engineering, Rowan University, Glassboro, NJ

Received XXX; revision requested XXX; revision received XXX; accepted XXX.


Correspondence: N.K. (email: Koutsoubis@usf.edu) Funding: The authors declared no funding for this work. Conflicts of interest disclosures: N.K. No relevant relationships. A.W. No relevant relationships. Y.Y. No relevant relationships. R.P.R. No relevant relationships. M.B.S. No relevant relationships. G.R. NSF grants 2234468 and 2234836; NIH grant U01CA200464; Florida Bankhead Coley Research Program (21B12).

References

  1. Pati S, Kumar S, Varma A, et al. Privacy preservation for federated learning in healthcare. Patterns (N Y) 2024;5(7):100974.
  2. Wiggins WF, Magudia K, Schmidt TMS, et al. Imaging AI in Practice: A Demonstration of Future Workflow Using Integration Standards. Radiol Artif Intell 2021;3(6):e210152.
  3. Monti CB, van Assen M, Stillman AE, et al. Evaluating the performance of a convolutional neural network algorithm for measuring thoracic aortic diameters in a heterogeneous population. Radiol Artif Intell 2022;4(2):e210196.
  4. Darzidehkalani E, Ghasemi-Rad M, van Ooijen PMA. Federated Learning in Medical Imaging: Part II: Methods, Challenges, and Considerations. J Am Coll Radiol 2022;19(8):975–.
  5. Kaissis GA, Makowski MR, Rückert D, Braren RF. Secure, privacy-preserving and federated machine learning in medical imaging. Nat Mach Intell 2020;2(6):305–311.
  6. Zhang F, Kreuter D, Chen Y, et al; BloodCounts! consortium. Recent methodological advances in federated learning for healthcare. Patterns (N Y) 2024;5(6):101006.
  7. Jere MS, Farnan T, Koushanfar F. A Taxonomy of Attacks on Federated Learning. IEEE Secur Priv 2021;19(2):20–28.
  8. Dwork C. Differential Privacy. In: Bugliesi M, Preneel B, Sassone V, Wegener I, eds. Automata, Languages and Programming. ICALP 2006. Lecture Notes in Computer Science. Springer, 2006; 1–12.
  9. Gentry C. A Fully Homomorphic Encryption Scheme. PhD thesis. Stanford University, 2009. https://crypto.stanford.edu/craig/craig-thesis.pdf.
  10. Xu L, Jiang C, Qian Y, Li J, Zhao Y, Ren Y. Privacy-Accuracy Trade-Off in Differentially-Private Distributed Classification: A Game Theoretical Approach. IEEE Trans Big Data 2017;7(4):770–783.
  11. Dera D, Bouaynaya NC, Rasool G, Shterenberg R, Fathallah-Shaykh HM. PremiUm-CNN: Propagating Uncertainty Towards Robust Convolutional Neural Networks. IEEE Trans Signal Process 2021;69:4669–4684.
  12. Ahmed S, Dera D, Hassan SU, Bouaynaya N, Rasool G. Failure detection in deep neural networks for medical imaging. Front Med Technol 2022;4:919046.

  13. Linsner F, Adilova L, Däubener S, Kamp M, Fischer A. Approaches to Uncertainty Quantification in Federated Deep Learning. In: Kamp M, Koprinska I, Bibal A, et al, eds. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, Vol 1524. Springer, 2021; 128–145.
  14. McMahan B, Moore E, Ramage D, Hampson S, Aguera y Arcas B. Communication-Efficient Learning of Deep Networks from Decentralized Data. In: Singh A, Zhu J, eds. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. PMLR 2017;54:1273–. https://proceedings.mlr.press/v54/mcmahan17a.html.
  15. Lu W, Wang J, Chen Y, et al. Personalized Federated Learning with Adaptive Batchnorm for Healthcare. IEEE Trans Big Data 2024;10(6):915–925.
  16. Li T, Sahu AK, Zaheer M, Sanjabi M, Talwalkar A, Smith V. Federated Optimization in Heterogeneous Networks. In: Proceedings of Machine Learning and Systems 2020;2:429–450. https://proceedings.mlsys.org/paper_files/paper/2020/file/1f5fe83998a09396ebe6477d9475ba0c-Paper.pdf.
  17. Li X, Jiang M, Zhang X, Kamp M, Dou Q. FedBN: Federated Learning on Non-IID Features via Local Batch Normalization. In: International Conference on Learning Representations. 2021. https://openreview.net/pdf?id=6YEQUn0QICG.
  18. Zhu Z, Hong J, Zhou J. Data-Free Knowledge Distillation for Heterogeneous Federated Learning. Proc Mach Learn Res 2021;139:12878–12889.
  19. Khan H, Bouaynaya NC, Rasool G. Brain-Inspired Continual Learning: Robust Feature Distillation and Re-Consolidation for Class Incremental Learning. IEEE Access 2024;12:34054–.
  20. Yu Y, Wei A, Karimireddy SP, Ma Y, Jordan M. TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent Kernels. In: Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Adv Neural Inf Process Syst 2022;35:30882–30897.
  21. Huang W, Ye M, Du B. Learn from Others and Be Yourself in Heterogeneous Federated Learning. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022; 10133–10143.
  22. Zhou T, Zhang J, Tsang DHK. FedFA: Federated Learning with Feature Anchors to Align Features and Classifiers for Heterogeneous Data. IEEE Trans Mobile Comput 2024;23(6):6731–.
  23. Warnat-Herresthal S, Schultze H, Shastry KL, et al; Deutsche COVID-19 Omics Initiative (DeCOI). Swarm learning for decentralized and confidential clinical machine learning. Nature 2021;594(7862):265–270.
  24. Kalra S, Wen J, Cresswell JC, Volkovs M, Tizhoosh HR. Decentralized Federated Learning through Proxy Model Sharing. Nat Commun 2023;14(1):2899.

  25. Butt M, Tariq N, Ashraf M, et al. A Fog-Based Privacy-Preserving Federated Learning System for Smart Healthcare Applications. Electronics 2023;12(19):4074.
  26. Zhang X, Li Y, Li W, Guo K, Shao Y. Personalized Federated Learning via Variational Bayesian Inference. In: Chaudhuri K, Jegelka S, Song L, Szepesvari C, Niu G, Sabato S, eds. Proceedings of the 39th International Conference on Machine Learning. PMLR 2022;162:26293–26310. https://proceedings.mlr.press/v162/zhang22o.html.
  27. Kotelevskii N, Vono M, Durmus A, Moulines E. FedPop: A Bayesian Approach for Personalised Federated Learning. In: Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Advances in Neural Information Processing Systems, Vol 35. Curran Associates, 2022; 8687–8701. https://proceedings.neurips.cc/paper_files/paper/2022/file/395409679270591fd2a70abc694cf5a1-Paper-Conference.pdf.
  28. Chen H, Ding J, Tramel EW, et al. Self-aware personalized federated learning. In: Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Advances in Neural Information Processing Systems, Vol 35. Curran Associates, 2022; 20675–20688. https://proceedings.neurips.cc/paper_files/paper/2022/file/8265d7bb2db42e86637001db2c46619f-Paper-Conference.pdf.
  29. Wei K, Li J, Ding M, et al. Federated Learning With Differential Privacy: Algorithms and Performance Analysis. IEEE Trans Inf Forensics Secur 2020;15:3454–3469.
  30. Dhiman S, Nayak S, Mahato GK, Ram A, Chakraborty SK. Homomorphic Encryption based Federated Learning for Financial Data Security. In: 2023 4th International Conference on Computing and Communication Systems (I3CS). IEEE, 2023; 1–6.
  31. Stripelis D, Saleem H, Ghai T, et al. Secure neuroimaging analysis using federated learning with homomorphic encryption. In: SPIE Medical Imaging 2021. https://www.spiedigitallibrary.org/conference-proceedings-of-spie/12088/2606256/Secure-neuroimaging-analysis-using-federated-learning-with-homomorphic-encryption/10.1117/12.2606256.short.
  32. Acar A, Aksu H, Uluagac AS, Conti M. A Survey on Homomorphic Encryption Schemes: Theory and Implementation. ACM Comput Surv 2019;51(4):1–35.
  33. Truhn D, Tayebi Arasteh S, Saldanha OL, et al. Encrypted federated learning for secure decentralized collaboration in cancer image analysis. Med Image Anal 2024;92:103059.
  34. Dera D, Ahmed S, Bouaynaya NC, Rasool G. TRustworthy Uncertainty Propagation for Sequential Time-Series Analysis in RNNs. IEEE Trans Knowl Data Eng 2023;36(2):882–896.
  35. Shi N, Lai F, Al Kontar R, Chowdhury M. Fed-ensemble: Ensemble Models in Federated Learning for Improved Generalization and Uncertainty Quantification. IEEE Trans Automat Sci Eng 2024;21(3):2792–2803.

  36. Gammerman A, Vovk V, Vapnik V. Learning by transduction. In: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, UAI'98. Morgan Kaufmann, 1998; 148–155.
  37. Lu C, Yu Y, Karimireddy SP, Jordan MI, Raskar R. Federated conformal predictors for distributed uncertainty quantification. In: Proceedings of the 40th International Conference on Machine Learning, ICML'23. PMLR 2023;202:22942–22964. https://proceedings.mlr.press/v202/lu23i.html.
  38. Bhatt S, Gupta A, Rai P. Federated Learning with Uncertainty via Distilled Predictive Distributions. In: Yanıkoğlu B, Buntine W, eds. Proceedings of the 15th Asian Conference on Machine Learning. PMLR 2024;222:153–168. https://proceedings.mlr.press/v222/bhatt24a.html.
  39. Al-Shedivat M, Gillenwater J, Xing E, Rostamizadeh A. Federated Learning via Posterior Inference: A New Perspective and Practical Algorithms. In: International Conference on Learning Representations. https://openreview.net/forum?id=GFsU8a0sGB.
  40. Luo M, Chen F, Hu D, Zhang Y, Liang J, Feng J. No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data. Advances in Neural Information Processing Systems 2021;34:5972–5984. https://proceedings.neurips.cc/paper_files/paper/2021/file/2f2b265625d76a6704b08093c652fd79-Paper.pdf.
  41. Peng H, Yu H, Tang X, Li X. FedCal: Achieving Local and Global Calibration in Federated Learning via Aggregated Parameterized Scaler. In: Salakhutdinov R, Kolter Z, Heller K, et al, eds. Proceedings of the Forty-first International Conference on Machine Learning. PMLR 2024;235. https://openreview.net/forum?id=XecUTmB9yD.
  42. Roth HR, Cheng Y, Wen Y, et al. NVIDIA FLARE: Federated Learning from Simulation to Real-World. arXiv 2210.13291 [preprint] https://arxiv.org/abs/2210.13291. Posted October 24, 2022. Updated April 28, 2023. Accessed DATE.
  43. Foley P, Sheller MJ, Edwards B, et al. OpenFL: the open federated learning library. Phys Med Biol 2022;67(21):214001.
  44. Cremonesi F, Vesin M, Cansiz S, et al. Fed-BioMed: Open, Transparent and Trusted Federated Learning for Real-world Healthcare Applications. arXiv 2304.12012 [preprint] https://arxiv.org/abs/2304.12012. Posted April 24, 2023. Accessed DATE.
  45. Ryu M, Kim Y, Kim K, Madduri RK. APPFL: Open-Source Software Framework for Privacy-Preserving Federated Learning. In: 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2022; 1074–1083.
  46. Pati S, Baid U, Zenk M, et al. The Federated Tumor Segmentation (FeTS) Challenge. arXiv 2105.05874 [preprint] https://arxiv.org/abs/2105.05874. Posted May 12, 2021. Updated May 14, 2021. Accessed DATE.

  47. Pati S, Baid U, Edwards B, et al. Federated learning enables big data for rare cancer boundary detection. Nat Commun 2022;13(1):7346.
  48. Dayan I, Roth HR, Zhong A, et al. Federated learning for predicting clinical outcomes in patients with COVID-19. Nat Med 2021;27(10):1735–1743.
  49. ODELIA. New research project ODELIA launches to revolutionise artificial intelligence in healthcare using swarm learning. https://odelia.ai/. Accessed June 17, 2024.
  50. Bujotzek MR, Akünal Ü, Denner S, et al. Real-world federated learning in radiology: hurdles to overcome and benefits to gain. J Am Med Inform Assoc 2025;32(1):193–205.
  51. Roth HR, Chang K, Singh P, et al. Federated Learning for Breast Density Classification: A Real-World Implementation. In: Albarqouni S, Bakas S, Kamnitsas K, et al, eds. Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning. DART DCL 2020. Lecture Notes in Computer Science, Vol 12444. Springer, 2020; 181–191.
  52. Liu L, Jiang X, Zheng F, et al. A Bayesian Federated Learning Framework with Online Laplace Approximation. IEEE Trans Pattern Anal Mach Intell 2024;46(1):1–16.
  53. Truex S, Baracaldo N, Anwar A, et al. A Hybrid Approach to Privacy-Preserving Federated Learning. AISec'19. Association for Computing Machinery, 2019; 1–11.
  54. Xu R, Baracaldo N, Zhou Y, Anwar A, Kadhe S, Ludwig H. DeTrust-FL: Privacy-Preserving Federated Learning in Decentralized Trust Setting. In: 2022 IEEE 15th International Conference on Cloud Computing (CLOUD). IEEE, 2022; 417–426.
  55. Qi T, Wu F, Wu C, He L, Huang Y, Xie X. Differentially Private Knowledge Transfer for Federated Learning. Nat Commun 2023;14(1):3785.
  56. So J, Ali RE, Güler B, Jiao J, Avestimehr AS. Securing secure aggregation: mitigating multi-round privacy leakage in federated learning. AAAI 2023;37(8):9864–9873.
  57. Wang T, Yang Q, Zhu K, Wang J, Su C, Sato K. LDS-FL: Loss Differential Strategy Based Federated Learning for Privacy Preserving. IEEE Trans Inf Forensics Secur 2024;19:1015–1030.
  58. Plassier V, Makni M, Rubashevskii A, Moulines E, Panov M. Conformal prediction for federated uncertainty quantification under label shift. In: Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J, eds. Proceedings of the 40th International Conference on Machine Learning. PMLR 2023;202:27907–27947. https://proceedings.mlr.press/v202/plassier23a.html.
  59. Makhija D, Ghosh J, Ho N. Privacy Preserving Bayesian Federated Learning in Heterogeneous Settings. arXiv 2306.07959 [preprint] https://arxiv.org/abs/2306.07959. Posted June 13, 2023. Accessed DATE.
  60. Sanderson BM. Uncertainty Quantification in Multi-Model Ensembles. In: Oxford Research Encyclopedia of Climate Science. https://oxfordre.com/climatescience/display/10.1093/acrefore/9780190228620.001.0001/acrefore-9780190228620-e-707.
  61. Krizhevsky A, Nair V, Hinton G. CIFAR-10 (Canadian Institute for Advanced Research). https://academictorrents.com/details/463ba7ec7f37ed414c12fbb71ebf6431eada2d7a.
  62. Krizhevsky A. Learning multiple layers of features from tiny images. University of Toronto. http://www.cs.toronto.edu/~kriz/cifar.html.
  63. Kuznetsova A, Rom H, Alldrin N, et al. The Open Images Dataset V4. Int J Comput Vis 2020;128:1956–1981.

Figure 1: Organization of the review paper. The figure outlines the structure of the paper, beginning with an introduction to federated learning (FL) in medical imaging. It progresses through the classification of FL algorithms into centralized, decentralized, and personalized (PFL) categories, followed by discussions on privacy-preserving methods and uncertainty quantification (UQ). The review concludes with applications of FL in medical imaging, including real-world use cases, challenges, and opportunities. This visual representation highlights the interconnected topics covered in the review and provides readers with a clear roadmap for understanding the paper’s flow and content.


Figure 2: An overview of Federated Learning (FL), Privacy-Preserving Federated Learning (PPFL), and uncertainty quantification (UQ) is presented. Combining FL with strong privacy preservation and uncertainty quantification methods can help the medical imaging community develop large-scale multi-institutional AI models that are truly generalizable, robust, and trustworthy.


Figure 3: An overview of federated learning (FL) algorithm types is presented. (A) In centralized FL, sites train a local model and pass the learned information to a central server to generate the global model; the global model is then passed back to the local sites for further training. (B) Decentralized FL removes the need for a central server, allowing direct communication between sites. (C) Personalized FL leverages a central server while creating a specific model for each site. Having a personalized model at each site is ideal in FL deployments with high data heterogeneity.


Figure 4: A summary of privacy-preserving FL (PPFL) methods is presented. (A) Differential Privacy (DP) involves adding artificial noise to gradient information before it is communicated, hindering the ability of an attacker to extract useful information. (B) Homomorphic Encryption (HE) allows mathematical operations to be performed on encrypted ciphertexts; once decrypted, the results are as if the operations had been performed on plaintext. HE is useful in situations where the central server cannot be trusted. (C) Various other PPFL methods include hybrid approaches combining DP and HE, knowledge transfer, secure aggregation frameworks with multiround privacy, loss differential strategies, and decentralized trust. More information about these other PPFL approaches is described in Supplement S4.
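The noise-injection step in panel A can be made concrete with a short sketch. This is the generic clip-then-add-Gaussian-noise recipe (the DP-SGD pattern), not code from any system cited here; the function name and parameter values are illustrative.

```python
import numpy as np

def privatize_update(gradients, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip a client's update and add Gaussian noise before it is sent
    to the server (illustrative sketch of the mechanism in Figure 4A)."""
    rng = rng or np.random.default_rng()
    flat = np.concatenate([g.ravel() for g in gradients])
    # Clip the whole update to a maximum L2 norm so that no single
    # client/example dominates; this bounds the sensitivity of the sum.
    norm = np.linalg.norm(flat)
    scale = min(1.0, clip_norm / (norm + 1e-12))
    clipped = [g * scale for g in gradients]
    # Noise is calibrated to the clipping norm: sigma = z * C.
    sigma = noise_multiplier * clip_norm
    return [g + rng.normal(0.0, sigma, size=g.shape) for g in clipped]

# A client would apply this to its local update each federated round:
update = [np.ones((2, 2)), np.ones(3)]
noisy = privatize_update(update, clip_norm=1.0, noise_multiplier=1.1)
```

Larger `noise_multiplier` values give stronger privacy at the cost of accuracy, which is exactly the trade-off discussed under challenge 3 above.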


Figure 5: A summary of uncertainty quantification (UQ) methods in Federated Learning (FL) is presented. (A) Model ensembling refers to the process of training multiple models; the final result is the average of their predictions. (B) Conformal Prediction (CP) is a UQ method that provides a set of possible predictions; the more uncertain the model is, the more predictions the set contains. (C) Model calibration is a postprocessing reliability enhancement technique that adjusts predicted confidence scores to better reflect the true likelihood of correctness. While not a direct uncertainty quantification method, it improves the trustworthiness of model outputs by mitigating overconfidence, especially in misclassified predictions, and aligns predicted probabilities with observed frequencies. (D) Bayesian FL is another UQ method that tracks the variance of the model during training and at inference time. The variance increases as the model becomes more uncertain, providing a measure of model uncertainty.
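The prediction sets in panel B can be illustrated with a minimal split-conformal classifier. This is a generic sketch (the function name and alpha value are illustrative, and it assumes NumPy 1.22+ for the `method` argument of `np.quantile`), not an implementation from the federated CP papers cited here.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction for classification (Figure 5B sketch).
    cal_probs: (n, K) softmax outputs on a held-out calibration set.
    Returns one label set per test input with ~(1 - alpha) marginal coverage."""
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    # Keep every class whose score falls below the threshold: uncertain
    # inputs yield larger sets, confident inputs yield smaller ones.
    return [set(np.where(1.0 - p <= q)[0]) for p in test_probs]
```

Set size itself then acts as the per-case uncertainty signal: a one-element set is a confident prediction, a large set flags a case for human review.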


Table 1. List and Characteristics of Federated Learning (FL) Algorithms. For each algorithm, the original table's Central Server and Local Forgetting columns are given in brackets.

• FedAvg (14) [central server: yes; local forgetting: no]: Train local models across various clients and then average the gradient updates at the central server to update the global model; first proposed method of FL.
• FedProx (16) [central server: yes; local forgetting: no]: Excels in heterogeneous settings; generalization of the FedAvg algorithm; allows partial updates to be sent to the server instead of simply dropping them from a federated round; adds a proximal term that prevents any one client from having too much of an impact on the global model.
• FedBN (17) [central server: yes; local forgetting: no]: Addresses the issue of non-IID data by leveraging batch normalization; follows a similar procedure to FedAvg but assumes local models have batch norm layers and excludes their parameters from the averaging step.
• FedGen (18) [central server: yes; local forgetting: no]: Learns a generator model on the server to ensemble user models' predictions, creating augmented samples that encapsulate consensual knowledge from user models; these augmented samples are shared with users to regularize local model training, leading to better accuracy and faster convergence.
• FOLA (52) [central server: yes; local forgetting: yes]: Bayesian federated learning framework utilizing online Laplace approximation to address local catastrophic forgetting and data heterogeneity; maximizes the posteriors of the server and clients simultaneously to reduce aggregation error and mitigate local forgetting.
• Swarm Learning (23) [central server: no; local forgetting: yes]: Model parameters are shared via a swarm network, and the model is built independently on private data at the individual sites; only preauthorized clients are allowed to execute transactions; onboarding new clients can be done dynamically.
• TCT (20) [central server: yes; local forgetting: yes]: Train-Convexify-Train: learn features with an off-the-shelf method (ie, FedAvg), then optimize a convexified problem obtained using the model's empirical neural tangent kernel approximation; involves two stages, where the first stage learns useful features from the data and the second stage learns to use these features to generate a well-performing model.
• FedAP (15) [central server: yes; local forgetting: no]: Learns similarities between clients by calculating distances between batch normalization layer statistics obtained from a pretrained model; these similarities are used to aggregate client models; each client preserves its batch normalization layers to maintain personalized features; the server aggregates client model parameters weighted by client similarities in a personalized manner to generate a unique final model for each client.
• pFedBayes (26) [central server: yes; local forgetting: no]: Weight uncertainty is introduced in client and server neural networks; to achieve personalization, each client updates its local distribution parameters by balancing its construction error over private data.
• FCCL (21) [central server: yes; local forgetting: yes]: Federated cross-correlational and continual learning uses unlabeled public data to address heterogeneity across models and non-IID data, enhancing model generalizability; constructs a cross-correlation matrix on model outputs to encourage class invariance and diversity; employs knowledge distillation, utilizing both the updated global model and the trained local model to balance interdomain and intradomain knowledge and mitigate local forgetting.
• Self-FL (28) [central server: yes; local forgetting: yes]: Self-aware personalized FL method that uses intraclient and interclient uncertainty estimation to balance the training of its local personal model and global model.
• FedPop (27) [central server: yes; local forgetting: no]: Each client has a local model composed of fixed population parameters that are shared across clients, as well as random effects that explain heterogeneity in the local data.
• FedFA (22) [central server: yes; local forgetting: no]: Feature anchors are used to align features and calibrate classifiers across clients simultaneously; this enables client models to be updated in a shared feature space with consistent classifiers during local training.
• ProxyFL (24) [central server: yes; local forgetting: no]: Clients maintain two models: a private model that is never shared and a publicly shared proxy model designed to preserve patient privacy; proxy models allow efficient information exchange among clients without needing a centralized server; clients can have different model architectures.
• FogML (25) [central server: no; local forgetting: no]: Fog computing nodes reside on the local area networks of each site; fog nodes can preprocess data and aggregate updates from the locally trained models before transmitting, reducing data traffic compared with sending raw data.

Note.—IID = independent and identically distributed.
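The FedAvg aggregation step summarized in the first row of Table 1 amounts to a dataset-size-weighted average of client parameters. A minimal sketch (the function name and toy values are illustrative):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """One FedAvg aggregation step: average each parameter tensor across
    clients, weighted by the size of each client's local dataset."""
    total = float(sum(client_sizes))
    coeffs = [n / total for n in client_sizes]
    # zip(*client_weights) aligns the k-th parameter tensor of every client.
    return [sum(c * layer for c, layer in zip(coeffs, layers))
            for layers in zip(*client_weights)]

# Two clients with one weight tensor each; the larger site counts more.
w_a = [np.array([0.0, 0.0])]
w_b = [np.array([1.0, 1.0])]
global_w = fedavg([w_a, w_b], client_sizes=[30, 10])  # -> [0.25, 0.25]
```

Variants such as FedProx and FedBN modify the local objective or exclude certain layers from this average, but the server-side step remains a weighted mean.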

Table 2. List of PPFL Algorithms Using Differential Privacy (DP) and/or Homomorphic Encryption (HE). For each algorithm, the original table's DP and HE columns are given in brackets.

• Hybrid Approach (53) [DP: yes; HE: yes]: Combining DP with secure multiparty computation enables this method to reduce the growth of noise injection as the number of parties increases without sacrificing privacy; a trust parameter allows a set level of trust to be maintained.
• NbAFL (29) [DP: yes; HE: no]: Noising before aggregation FL (NbAFL) uses K-random scheduling to optimize the privacy-accuracy trade-off by introducing artificial noise into the parameters of each client before aggregation.
• DeTrust-FL (54) [DP: no; HE: no]: Provides secure aggregation of model updates in a decentralized trust setting; implements a decentralized functional encryption scheme in which clients collaboratively generate decryption key fragments based on an agreed participation matrix.
• SHEFL (33) [DP: yes; HE: yes]: Somewhat homomorphically encrypted FL (SHEFL); only encrypted weights are communicated; all model updates are conducted in an encrypted space.
• PrivateKT (55) [DP: yes; HE: no]: Private knowledge transfer method that uses a small subset of public data to transfer knowledge with a local DP guarantee; selects public data points based on informativeness rather than randomly to maximize knowledge quality.
• Multi-RoundSecAgg (56) [DP: yes; HE: no]: Provides privacy guarantees over multiple training rounds; develops a structured user selection strategy that guarantees the long-term privacy of each user.
• LDS-FL (57) [DP: no; HE: no]: Maintains the performance of a private model through parameter replacement with multiuser participation to reduce the efficiency of privacy attacks.

Note.—DP = differential privacy, FL = federated learning, PPFL = privacy-preserving federated learning.
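Several methods in Table 2 build on secure aggregation, where the server sees only masked updates whose masks cancel in the sum. A toy pairwise-masking sketch illustrates the core cancellation idea; real protocols (eg, Multi-RoundSecAgg) add key agreement, dropout handling, and multiround selection guarantees, none of which are modeled here.

```python
import numpy as np

def masked_updates(updates, seed=0):
    """Toy pairwise-masking secure aggregation: for each client pair
    (i, j) with i < j, a shared random mask is added by client i and
    subtracted by client j. Each masked update looks like noise to the
    server, but the masks cancel exactly in the sum."""
    rng = np.random.default_rng(seed)
    n = len(updates)
    masked = [u.astype(float).copy() for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = rng.normal(size=updates[0].shape)
            masked[i] += mask
            masked[j] -= mask
    return masked

updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
server_sum = sum(masked_updates(updates))  # equals the sum of raw updates
```

In a real deployment, the pairwise masks come from Diffie-Hellman-style key agreement between clients rather than a shared seed, so the server never learns them.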

Table 3. Uncertainty Quantification Methods in Federated Learning. For each method, the category marked in the original table (CP, Dist Pred, Bayes, or Cal) is given in brackets.

• CCVR (40) [Cal]: Classifier calibration with virtual representation (CCVR). The authors found greater bias in representations learned in the deeper layers of a model trained with FL; they show that the classifier contains the greatest bias toward local client data and that classification performance can be greatly improved with posttraining classifier calibration.
• Fed-ensemble (35) [none of the four categories]: Extends ensembling methods to FL; characterizes uncertainty in predictions by using the variance in the predictions as a measure of knowledge uncertainty.
• DP-fedCP (58) [CP]: Differentially private federated average quantile estimation (DP-fedCP); the method is designed to construct personalized CP sets in an FL scenario.
• FCP (37) [CP]: Federated CP, a framework for extending CP to FL that addresses the non-IID nature of data in FL.
• FedPPD (38) [Dist Pred]: Framework for FL with uncertainty, where, in every round, each client infers the posterior distribution over its parameters as well as the posterior predictive distribution (PPD); the PPD is sent to the server.
• FedBNN (59) [Bayes]: FL framework based on training a customized local Bayesian model for each client.
• FedCal (41) [Cal]: Performs local and global calibration of models; uses client-specific parameters for local calibration to effectively correct output misalignment without sacrificing prediction accuracy; values are then aggregated via weight averaging to minimize global calibration error.

Note.—CP = conformal prediction, Dist Pred = distilled prediction, Bayes = Bayesian, Cal = calibration.
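The calibration category above refers to post hoc adjustment of confidence scores; the standard single-parameter version is temperature scaling. Below is a minimal grid-search sketch of that general technique (illustrative only, not the optimizer used by CCVR or FedCal).

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(logits, labels, temps=np.linspace(0.5, 5.0, 46)):
    """Post hoc temperature scaling: choose the scalar T that minimizes
    negative log-likelihood on held-out logits. T > 1 softens
    overconfident predictions without changing the argmax class."""
    def nll(t):
        p = softmax(logits / t)
        return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return min(temps, key=nll)
```

Because a single scalar is fit on a held-out set, calibration leaves the model's ranking of classes untouched; it only reshapes the confidence scores that downstream UQ and triage logic consume.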


Supplementary Material

S1 Centralized FL Algorithms

In the following, we provide a chronological list of centralized FL algorithms.

• FedProx: FedProx is a generalization of the original FedAvg algorithm (16). The two distinguishing features of FedProx are (1) allowing partial updates to be sent to the server instead of dropping them from a federated round and (2) adding a proximal term to prevent any client from contributing too much to the global model, thereby increasing model stability.
• FedBN: FedBN leverages batch normalization (batch-norm) to reduce the effect of non-IID data (17). FedBN follows a similar architecture to FedAvg, involving the transmission of local updates and their aggregation on a central server, but it treats the batch-norm parameters as site-specific and excludes them from the averaging process.
• FedGen: Knowledge distillation is an emerging approach in FL that addresses data heterogeneity by extracting and sharing knowledge from an ensemble of client models (18). FedGen employs a data-free method for knowledge distillation in FL and has demonstrated better accuracy and faster convergence in heterogeneous data settings, particularly in medical imaging tasks like multiorgan segmentation (18).
• FOLA: Local catastrophic forgetting is a significant challenge in FL, where local models lose specific knowledge of their data when updated with global model weights, similar to issues in continual learning (19). To address this, the Federated Online Laplace Approximation (FOLA) algorithm combines Bayesian principles with an online approximation approach to estimate probabilistic parameters for both global and local models, reducing aggregation errors and mitigating local forgetting (52).
• Train-Convexify-Train: Heterogeneous, and specifically nonconvex, data, where the relationship between variables does not form a convex shape in the feature space, can lead to local models with different optima, making it difficult for the global model to converge (20). The Train-Convexify-Train procedure tackles this by first using FedAvg to learn features and then refining the model, resulting in up to 37% accuracy improvement on heterogeneous data.
• FCCL: Federated Cross-Correlation and Continual Learning (FCCL) addresses local forgetting by using unlabeled public data to construct a cross-correlation matrix on model logit outputs, promoting generalizable representations across non-IID data (21). FCCL balances knowledge retention by employing knowledge distillation, where the global model helps retain interdomain information and the local model preserves intradomain information, reducing the risk of catastrophic forgetting during local updates.
• FedFA: FedFA was proposed to address the data heterogeneity challenge using feature anchors to align features and calibrate classifiers across clients simultaneously (22). This enables local models to be updated in a shared feature space with consistent classifiers during local training. The FedFA algorithm encompasses a server-side component where both class feature anchors and the global model undergo aggregation.

S2 Decentralized FL Algorithms

In the following, we present some recent decentralized FL algorithms.

• Swarm Learning: Swarm Learning integrates edge computing with blockchain-based networking, making it possible to coordinate an FL run without a central server (23). This approach leverages decentralized hardware and distributed ML with blockchain to securely manage site onboarding, leader election, and parameter aggregation. Sharing model parameters through a swarm network enables independent model training on private data at individual sites. Security and confidentiality are ensured through the blockchain's restricted execution to preauthorized clients and dynamic onboarding of new participants.
• ProxyFL: ProxyFL enhances communication efficiency by using proxy models for information exchange, allowing clients to maintain private models that are never shared (24). This approach supports model heterogeneity, enabling each client to have a unique model architecture while ensuring privacy through DP techniques. ProxyFL outperformed existing methods with reduced communication overhead and stronger privacy protections (24).
• Fog-FL: Fog-FL enhances computing efficiency and reliability by utilizing a decentralized fog computing infrastructure that operates between the data source and the cloud, bringing compute, storage, and networking services closer to the network edge where data are generated (25).

S3 Personalized FL (PFL) Algorithms

• FedAP: FedAP identifies similarities between clients by analyzing the batch-norm layer statistics from a pretrained model and uses these similarities to guide the aggregation process (15). Each client retains its batch-norm layers to preserve personalized features, while the server aggregates model parameters based on client similarities to create unique models for each site. FedAP has demonstrated over 10% improvement in accuracy and faster convergence compared with state-of-the-art FL algorithms across diverse health care datasets (15).
• pFedBayes: Personalized FL via Bayesian inference (pFedBayes) integrates Bayesian variational inference and weight uncertainty to mitigate model overfitting and improve personalization by minimizing construction error on private data. pFedBayes also minimizes the Kullback–Leibler divergence with the global distribution from the server (26). This method allows each client to refine its local model by balancing the accuracy on its private data with alignment to the global distribution.
• FedPop: FedPop addresses the challenges of FL, including its struggle with personalization in cross-silo and cross-device settings, especially for new clients or those with limited data, and its lack of UQ (27). FedPop integrates population modeling with fixed parameters and random effects to explain data heterogeneity and introduces federated stochastic optimization algorithms based on Markov chain Monte Carlo. This results in increased robustness to client drift, better inference for new clients, and UQ with minimal computational overhead.
• Self-Aware PFL: A key challenge in PFL is balancing the improvement of local models with global model tuning, especially when personal and global objectives differ. Inspired by Bayesian hierarchical models, self-aware PFL introduces a self-aware method that allows clients to automatically balance local and global training based on interclient and intraclient UQ (28). The method employs uncertainty-driven local training and aggregation, replacing conventional fine-tuning techniques.

S4 Other PPFL Methods

In addition to Differential Privacy (DP) and Homomorphic Encryption (HE), many other methods have been proposed, often in conjunction with the aforementioned techniques, to preserve privacy in FL, as follows:

• Hybrid Approach: Truex et al combined DP with Secure Multiparty Computation (SMC) to balance the trade-off between data privacy and model accuracy (53). Their method mitigates the noise growth that typically increases with the number of parties in DP-based FL systems while maintaining a predefined level of trust. A tunable trust parameter, t, specifies the minimum number of honest parties required for the system to function securely. As t decreases, indicating less trust, more noise is added by each honest party to guard against potential colluders.
• PrivateKT: PrivateKT leverages DP to implement private knowledge transfer using a small subset of public data selected based on their information content (55). PrivateKT involves three steps: (i) knowledge extraction, where clients use private data to make predictions on selected public data; (ii) knowledge exchange, where DP is applied to these predictions before sending them to the central server; and (iii) knowledge aggregation, where the server aggregates these predictions into a knowledge buffer. PrivateKT also uses importance sampling to focus on data with higher uncertainty, enhancing knowledge quality, and maintains a knowledge buffer that stores past aggregated predictions. PrivateKT reduced the performance gap with centralized learning by up to 84% under a strict privacy budget. • Multi-RoundSecAgg: Traditional secure aggregation methods in FL focus on preserving privacy in a single training round (56). However, this can lead to significant privacy leaks over multiple rounds due to partial user selection. Multi-RoundSecAgg addresses this issue by introducing a secure aggregation framework with multiround privacy guarantees, employing a structured user selection strategy that ensures long-term privacy while maintaining fairness and participation balance (56). • Loss Differential Strategy for Parameter Replacement (LDS-FL): LDS-FL implements PPFL by maintaining the performance of a private model through selective parameter replacement among multiple participants (57). LDS-FL introduces a public participant that shares parameters, enabling private participants to construct loss differential models that resist privacy attacks. The authors demonstrated that LDS-FL provides robust privacy guarantees against membership attacks, reducing attack accuracy by over 10% while only slightly impacting model accuracy, making it a strong alternative to DP and HE (57).
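The three PrivateKT steps described earlier in this section (knowledge extraction, exchange under DP, and aggregation) can be sketched in one round as follows. This is an illustrative simplification, assuming Laplace noise with unit sensitivity, not the authors' implementation.

```python
import numpy as np

def privatekt_round(client_models, public_x, epsilon, rng):
    """One PrivateKT-style round (sketch): (i) each client predicts on a
    shared public subset; (ii) Laplace noise (scale 1/epsilon, unit
    sensitivity assumed) is added to those predictions before upload;
    (iii) the server averages the noisy predictions into a knowledge
    buffer used to transfer knowledge back to clients."""
    noisy_preds = []
    for predict in client_models:                      # (i) knowledge extraction
        probs = predict(public_x)                      # (n_public, n_classes)
        noise = rng.laplace(0.0, 1.0 / epsilon, probs.shape)
        noisy_preds.append(probs + noise)              # (ii) knowledge exchange under DP
    return np.mean(noisy_preds, axis=0)                # (iii) knowledge aggregation
```

Only noisy predictions on public data leave each client, so the privacy cost is paid on the small public subset rather than on the model parameters themselves.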
• DeTrust-FL: DeTrust-FL offers a decentralized solution to enhance privacy by securely aggregating model updates without relying on a central server (54). It addresses vulnerabilities to inference attacks, such as disaggregation, through a decentralized functional encryption scheme. In this approach, clients generate decryption key fragments using a transparent participation matrix. Additionally, DeTrust-FL employs batch partitioning to prevent attacks and encrypts model updates with round labels to stop replay attacks, achieving state-of-the-art communication efficiency and reducing dependency on centralized trust entities. S5 Uncertainty Quantification • Model Ensembling: Model ensembling is a popular uncertainty estimation method that involves running inference with an ensemble of models and averaging their predictions (60). It extends naturally to FL because the distributed FL setup involves multiple clients, each of which can serve as an ensemble member. The approach in (13) integrates multiple ensembling methods into an uncertainty estimation framework for FL. The variations of FL ensembling used include (13): – Ensemble of local models: This method is a naive way of incorporating deep ensemble-based uncertainty estimation into FL (13). It treats each worker’s local model as an ensemble member. The workers do not communicate with the coordinator, which leads to m separately trained models. These models are then used together for the final prediction. However, the main idea of FL is lost here due to the lack of communication (13). – Ensemble of global models: In this approach, the idea of FL is preserved; however, computational overhead increases (13). Each worker trains S ML models, using a different random initialization seed for each. For each of the S models, an FL workflow is executed. This can quickly become computationally expensive as S increases (13).
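Whichever ensembling variant is used, turning the members' outputs into a prediction and an uncertainty estimate works the same way. A minimal sketch, assuming each member returns class probabilities:

```python
import numpy as np

def ensemble_predict(member_probs):
    """Given per-member class probabilities of shape (M, N, C), return
    the mean prediction (N, C) and a simple per-sample uncertainty:
    the variance across ensemble members, averaged over classes."""
    member_probs = np.asarray(member_probs)
    mean_pred = member_probs.mean(axis=0)                 # (N, C)
    uncertainty = member_probs.var(axis=0).mean(axis=-1)  # (N,)
    return mean_pred, uncertainty
```

High variance across members flags inputs on which the ensemble disagrees, which is precisely where predictions should be treated with caution.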
– Ensemble based on multiple coordinators: This method splits the workers into subgroups and assigns a coordinator to each subgroup (13). FL is carried out as normal within each subgroup, and the outputs of the subgroups are averaged to produce the final prediction. Each method presents advantages and challenges, necessitating careful consideration when used in FL applications in real-world settings. The ensemble of local models emphasizes privacy
and simplicity by treating each worker’s model as an independent ensemble member. While this approach maximizes data privacy and is straightforward to implement, it diverges from the collaborative essence of FL and may result in inconsistent model performance due to isolated training environments. Conversely, the ensemble of global models aligns with the collaborative learning principle of FL, enhancing model robustness by integrating diverse perspectives. However, this method significantly increases computational and communication demands, posing scalability challenges as the number of clients grows. The third approach, employing multiple coordinators, offers improved scalability by distributing the workload and tailoring learning strategies within subgroups. However, it introduces additional complexity in coordination and risks fragmenting learning across subgroups. To navigate these trade-offs, hybrid or adaptive ensembling strategies that balance computational efficiency with the benefits of collaborative learning could be considered. Ultimately, the choice of ensembling method should be guided by the application’s specific needs, including privacy requirements, available computational resources, and data heterogeneity. • Fed-ensemble (35): Fed-ensemble extends ensembling methods for FL using a different approach. Instead of aggregating local models to update a single global model, it uses random permutations to update a group of K models and obtains predictions by model averaging. This method imposes no additional computational cost on clients and can readily be used within established FL algorithms. The authors empirically show that the proposed approach outperforms other methods on many datasets. It also excels in heterogeneous settings, which are common in FL applications such as medical imaging.
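Fed-ensemble's per-round scheduling, in which every client receives exactly one of the K models, can be sketched as follows. The names are illustrative and the exact permutation scheme in the paper may differ; the point is that shuffled, balanced assignments let every model eventually train on every client's data.

```python
import numpy as np

def assign_models(n_clients, K, rng):
    """Per-round schedule (sketch): each client receives one of the K
    ensemble models, with assignments balanced and shuffled each round
    so that over many rounds every model visits every client."""
    reps = int(np.ceil(n_clients / K))
    schedule = np.tile(np.arange(K), reps)[:n_clients]
    rng.shuffle(schedule)
    return schedule  # schedule[c] = index of the model sent to client c
```

Because each client trains only one model per round, the client-side cost matches ordinary FL while the server maintains the full ensemble.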
Fed-ensemble can characterize uncertainty in predictions by using the variance of the ensemble predictions as a measure of knowledge uncertainty. Shi et al (35) propose an ensemble FL scheme that updates K models over local datasets; point predictions are obtained by model averaging. The authors show that Fed-ensemble excels at uncertainty quantification when tested on CIFAR-10 (61), CIFAR-100 (62), MNIST, and the OpenImages v4 dataset (63) in both homogeneous and heterogeneous settings. Using neural tangent kernel (NTK) analysis, they show that predictions at new data points from all K models converge to samples from the same limiting Gaussian process in sufficiently overparameterized regimes (35). The server sends one of the K models to every client in each training round to train on local data. The server then aggregates the updated model from each client; this way, the burden on clients is not increased, and all K models eventually see all the clients’ data. To obtain uncertainty estimates from an ensemble of models, the sample variance of the predictions can be used. Fed-ensemble can appropriately characterize knowledge uncertainty in regions without labeled data. Fed-ensemble enhances existing FL techniques by systematically quantifying uncertainty and increasing model capacity without raising communication costs. Unlike FedAvg, which tends to be overconfident in its predictions, Fed-ensemble offers convergence guarantees and effectively manages data heterogeneity through ensembling, outperforming methods that rely on strong regularizers. • Federated Conformal Prediction (FCP): FCP is a method for extending conformal prediction (CP) to FL that addresses the non-IID nature of data in FL (37). The heterogeneity of FL datasets violates the fundamental tenet of exchangeability in CP, namely that the calibration and test data have identical distributions. To address this violation, the authors propose using partial exchangeability, a generalization of exchangeability (37).
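A simplified sketch of conformal calibration in a federated setting follows. For brevity it pools nonconformity scores across clients, whereas FCP proper aggregates per-client quantiles under partial exchangeability; the function names are illustrative.

```python
import numpy as np

def federated_conformal_threshold(client_scores, alpha):
    """Pool each client's calibration nonconformity scores and take a
    conservative empirical quantile. Prediction sets of the form
    {y : score(x, y) <= threshold} then target coverage >= 1 - alpha.
    (FCP proper avoids pooling and instead combines per-client
    quantiles under partial exchangeability.)"""
    scores = np.sort(np.concatenate(
        [np.asarray(s, dtype=float) for s in client_scores]))
    n = scores.size
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)  # conservative rank
    return scores[k - 1]

def prediction_set(class_scores, threshold):
    # All labels whose nonconformity score falls below the threshold.
    return [c for c, s in enumerate(class_scores) if s <= threshold]
```

The output is a set-valued prediction whose size grows on uncertain inputs, which is the form of uncertainty quantification CP provides.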
FCP makes no assumptions about the relationships among the client distributions P1, …, PK; in particular, it requires neither independence nor identical distributions across clients. FCP provides rigorous theoretical guarantees and high empirical performance on several computer vision and medical imaging datasets (37). • Classifier Calibration with Virtual Representation (CCVR): CCVR calibrates a global model to improve performance on non-IID data in heterogeneous settings (40). The authors found greater bias in the representations learned in the deeper layers of a model trained with FL. They show that the classifier contains the greatest bias and that postcalibration can greatly improve classification performance. Specifically, the classifiers learned on different clients show the lowest feature similarity. The classifiers tend to become biased toward the classes
over-represented in the local client data, leading to poor performance on under-represented classes. This classifier bias is a key reason behind performance degradation on non-IID federated data. Regularizing the classifier during federated training brings minor improvements (40). However, posttraining calibration of the classifier significantly improves classification accuracy across various FL algorithms and datasets (40). CCVR generates virtual representations using Gaussian distributions fitted on client feature statistics. It then retrains the classifier on these virtual representations while keeping the feature extractor fixed. Experimental results show state-of-the-art accuracies on common benchmark datasets such as CIFAR-10. CCVR is built on top of an off-the-shelf feature extractor and requires no transmission of the original data representations, thus raising no additional privacy concerns.
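The CCVR recipe, fitting per-class Gaussians from shared feature statistics, sampling virtual representations, and retraining only the classifier head, can be sketched as follows. A nearest-mean head stands in for the retrained classifier, diagonal covariances are assumed for simplicity, and the names are illustrative.

```python
import numpy as np

def ccvr_virtual_features(class_stats, n_virtual, rng):
    """CCVR-style calibration data (sketch): the server holds one
    (mean, diagonal variance) pair per class, aggregated from client
    feature statistics, and samples virtual representations from the
    corresponding Gaussians. Raw features never leave the clients;
    only their statistics do."""
    xs, ys = [], []
    for label, (mu, var) in class_stats.items():
        xs.append(rng.normal(mu, np.sqrt(var), size=(n_virtual, mu.size)))
        ys.append(np.full(n_virtual, label, dtype=int))
    return np.vstack(xs), np.concatenate(ys)

def nearest_mean_classifier(X, y):
    """Stand-in classifier head trained on virtual features: predict by
    nearest class mean (a real system would retrain a softmax head
    while keeping the feature extractor fixed)."""
    means = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
    def predict(x):
        return min(means, key=lambda c: np.linalg.norm(x - means[c]))
    return predict
```

Because only class-level feature statistics are exchanged, the calibration step adds no privacy exposure beyond what federated training already entails.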