dailymachinelearning

By James Asher

Daily summaries of the latest Machine Learning research papers from arXiv.

2025-05-01 • Found 44 papers

A 3D Pocket-Aware and Affinity-Guided Diffusion Model for Lead Optimization

Anjie Qiao, Junjie Xie, Weifeng Huang, Hao Zhang, Jiahua Rao, Shuangjia Zheng, Yuedong Yang, Zhen Wang, Guo-Bo Li, Jinping Lei
  • Diffleop is a 3D diffusion-based generative model that incorporates binding affinity as a guiding factor for molecular optimization.
  • The model uses an E(3)-equivariant graph neural network (EGNN) to predict and guide binding affinity during the denoising process.
  • Bond diffusion is introduced to improve the chemical realism of generated molecules by incorporating covalent bond information.
  • Diffleop outperforms baseline models in binding affinity, molecular quality, and drug-like properties.
  • The model eliminates the need for post-processing, streamlining the molecular optimization workflow.
Abstract
This paper introduces Diffleop, a novel 3D pocket-aware and affinity-guided diffusion model designed for molecular lead optimization in drug discovery. Diffleop addresses key limitations in existing 3D generative models by explicitly incorporating binding affinity as a guiding factor during the molecular generation process. The model employs an E(3)-equivariant graph neural network (EGNN) to predict binding affinities and guide the denoising process, ensuring the generation of molecules with enhanced binding affinity and realistic chemical structures. Additionally, Diffleop introduces bond diffusion, which incorporates covalent bond information directly into the generative process, eliminating the need for post-processing and improving the quality of generated molecules. Experimental evaluations demonstrate that Diffleop outperforms baseline models across multiple metrics, particularly in binding affinity and drug-like properties, making it a promising tool for accelerating drug discovery.
Methodology
Diffleop employs a forward diffusion process where Gaussian noise is added to atom coordinates and discrete noise is applied to atom and bond types. During the reverse denoising process, an E(3)-equivariant graph neural network (EGNN) predicts binding affinities and guides the generation of molecules with optimized properties. Bond diffusion is incorporated by introducing fake bond types and performing diffusion on fully connected molecular graphs, ensuring chemically realistic outputs.
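To make the affinity-guidance idea concrete, here is a minimal PyTorch sketch of one guided reverse-diffusion step over atom coordinates; the toy `ToyAffinityPredictor`, `guidance_scale`, and the placeholder denoiser are illustrative assumptions, not the paper's EGNN or noise schedule.

```python
# Minimal sketch (not the paper's EGNN): gradient-based affinity guidance
# applied to one reverse-diffusion step over atom coordinates.
import torch
import torch.nn as nn

class ToyAffinityPredictor(nn.Module):
    """Hypothetical stand-in for the E(3)-equivariant affinity head."""
    def __init__(self, n_atoms: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_atoms * 3, 64), nn.SiLU(), nn.Linear(64, 1))

    def forward(self, coords):               # coords: (batch, n_atoms, 3)
        return self.net(coords.flatten(1))   # predicted binding affinity

def guided_denoise_step(coords_t, denoiser, affinity_model, sigma_t, guidance_scale=1.0):
    """One reverse step: denoiser proposal plus a gradient nudge toward higher affinity."""
    coords_t = coords_t.detach().requires_grad_(True)
    affinity = affinity_model(coords_t).sum()
    affinity_grad = torch.autograd.grad(affinity, coords_t)[0]
    with torch.no_grad():
        mean = denoiser(coords_t)                                  # predicted denoised mean
        guided_mean = mean + guidance_scale * sigma_t**2 * affinity_grad
        return guided_mean + sigma_t * torch.randn_like(coords_t)

# Usage with toy shapes
n_atoms = 8
denoiser = lambda x: x * 0.9                                       # placeholder denoiser
affinity_model = ToyAffinityPredictor(n_atoms)
coords = torch.randn(4, n_atoms, 3)
coords_next = guided_denoise_step(coords, denoiser, affinity_model, sigma_t=0.1)
```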
Results
Diffleop demonstrated superior performance compared to baseline models across various metrics, particularly in binding affinity optimization. The generated molecules exhibited improved drug-like properties and realistic chemical structures. The incorporation of bond diffusion and affinity guidance significantly enhanced the quality and relevance of the generated molecules.
Implications
Diffleop has the potential to accelerate the drug discovery process by automating and improving molecular lead optimization. Its ability to generate high-affinity, chemically realistic molecules could reduce reliance on traditional, labor-intensive methods and enable the discovery of novel drug candidates with greater efficiency.
View on arXiv

A Brief Review for Compression and Transfer Learning Techniques in DeepFake Detection

Andreas Karathanasis, John Violos, Ioannis Kompatsiaris, Symeon Papadopoulos
  • Model compression techniques like pruning, KD, and quantization can significantly reduce computational demands while preserving accuracy in deepfake detection.
  • Transfer learning methods, including fine-tuning and adapter-based approaches, enable efficient adaptation of pre-trained models to specific deepfake detection tasks.
  • High compression levels (up to 90%) are achievable without significant accuracy loss in same-domain scenarios, but domain generalization remains a challenge.
  • The study provides a comparative evaluation of compression and transfer learning techniques using three datasets: Synthbuster, RAISE, and ForenSynths.
  • Edge computing environments benefit from these techniques, enabling real-time deepfake detection with reduced latency and energy consumption.
Abstract
This paper explores the integration of model compression and transfer learning techniques to address the computational and memory constraints of deploying deepfake detection models on edge devices. The authors investigate methods such as pruning, knowledge distillation (KD), quantization, fine-tuning, and adapter-based techniques to reduce model size and computational demands while maintaining detection accuracy. Using three datasets—Synthbuster, RAISE, and ForenSynths—the study evaluates the effectiveness of these techniques in scenarios where training and validation data originate from the same deepfake model, as well as in cross-domain settings where the testing data comes from unseen deepfake models. The results demonstrate that high levels of compression (up to 90%) can be achieved without significant performance degradation in same-domain settings. However, the study highlights challenges in domain generalization, where performance drops when models are tested on unseen deepfake datasets. The paper provides a comparative analysis of the techniques and discusses their trade-offs, offering insights into resource-efficient deepfake detection for edge computing applications.
Methodology
The authors evaluate a baseline deepfake detection model and apply three compression techniques—pruning with fine-tuning, knowledge distillation without fine-tuning, and quantization—as well as six transfer learning approaches, including fine-tuning and adapter-based methods. Experiments are conducted on three datasets (Synthbuster, RAISE, and ForenSynths) to assess performance under same-domain and cross-domain conditions. Metrics such as accuracy and inference time are used to compare the effectiveness of these techniques.
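As a rough illustration of the compression toolbox discussed here, the sketch below applies magnitude pruning and post-training dynamic quantization to a toy detector head in PyTorch; the 90% sparsity level mirrors the same-domain result, but the model, data, and pipeline are assumptions rather than the paper's setup.

```python
# Illustrative sketch: magnitude pruning and dynamic quantization on a small
# CNN-style deepfake-detector head.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 2),                         # real vs. fake logits
)

# 1) Unstructured L1 pruning of 90% of conv/linear weights (the 90% level is
#    borrowed from the same-domain finding; this toy model is an assumption).
for module in model.modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")        # make the sparsity permanent

# 2) Post-training dynamic quantization of the linear layers to int8.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    logits = quantized(torch.randn(1, 3, 64, 64))
print(logits.shape)                           # torch.Size([1, 2])
```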
Results
The study finds that compression techniques can achieve up to 90% reduction in model size without significant accuracy loss in same-domain settings. Transfer learning methods effectively adapt models to specific tasks, but domain generalization remains a challenge, with performance dropping when tested on unseen deepfake datasets. Quantization and pruning are particularly effective for reducing computational demands, while fine-tuning helps restore performance after compression.
Implications
The findings have significant implications for deploying deepfake detection models on resource-constrained edge devices, enabling real-time detection while preserving data privacy. The study also highlights the need for further research into domain generalization techniques to improve robustness against unseen deepfake models. These advancements could enhance the accessibility and scalability of deepfake detection in applications such as content moderation, digital forensics, and privacy protection.
View on arXiv

A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces

Juliana Barbosa, Ulhas Gondhali, Gohar Petrossian, Kinshuk Sharma, Sunandan Chakraborty, Jennifer Jacquet, Juliana Freire
  • Wildlife trafficking is increasingly facilitated by online marketplaces, leaving digital traces that can be analyzed to combat illegal trade.
  • The proposed method uses LLMs to generate pseudo-labels for a small, diverse sample of ads, reducing labeling costs while maintaining high accuracy.
  • Specialized classifiers trained on pseudo-labeled data achieve up to 95% F1 scores, outperforming direct LLM labeling approaches.
  • The approach supports diverse research questions by enabling the creation of tailored classifiers for specific wildlife trafficking scenarios.
  • Real-world use cases demonstrate the method's effectiveness in identifying ads for various wildlife products, enabling broader analyses of trafficking activities.
Abstract
This paper addresses the challenge of identifying wildlife trafficking activities in online marketplaces, a critical issue that impacts biodiversity, ecological stability, and public health. The rise of e-commerce platforms has facilitated the illegal trade of wildlife products, leaving digital traces that can be analyzed to disrupt trafficking networks. However, identifying relevant ads from the vast number of online listings is akin to finding a needle in a haystack. Traditional machine learning approaches require extensive labeled data, which is costly and time-consuming to generate. The authors propose a cost-effective strategy leveraging large language models (LLMs) to generate pseudo-labeled data for a small, representative sample of ads. These pseudo-labels are then used to train specialized classifiers tailored to specific research questions. The method minimizes labeling costs while maintaining high accuracy. Experimental results demonstrate that the proposed classifiers achieve up to 95% F1 scores, outperforming direct LLM-based labeling at a fraction of the cost. The paper also presents real-world use cases, such as identifying ads for animal-derived products, small leather goods, and shark-related items, showcasing the method's versatility and effectiveness.
Methodology
The authors propose a two-step approach: (1) Use LLMs to generate pseudo-labels for a small, representative sample of ads, selected based on the research question. (2) Train specialized machine learning classifiers on the pseudo-labeled data to identify wildlife-related ads at scale. The method includes an automated sampling strategy to ensure diversity and relevance in the labeled data.
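A minimal sketch of the two-step idea follows, with an assumed `llm_label` placeholder standing in for the actual LLM prompt and a TF-IDF plus logistic-regression pipeline as the specialized classifier; the real system's sampling strategy and prompts are not reproduced.

```python
# Step 1: pseudo-label a small ad sample with an LLM (stubbed out here).
# Step 2: train a lightweight classifier on the pseudo-labels to scale up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def llm_label(ad_text: str) -> int:
    """Hypothetical placeholder for an LLM prompt returning 1 (wildlife product) or 0."""
    return int("ivory" in ad_text.lower() or "shark" in ad_text.lower())

sample_ads = [
    "Antique ivory carving, quick sale",
    "Genuine shark tooth pendant",
    "Leather wallet, brand new",
    "Vintage wooden chess set",
]
pseudo_labels = [llm_label(ad) for ad in sample_ads]    # step 1: cheap LLM labels

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(sample_ads, pseudo_labels)               # step 2: specialized classifier

print(classifier.predict(["Hand-carved ivory figurine"]))
```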
Results
The proposed classifiers achieved up to 95% F1 scores, significantly outperforming direct LLM-based labeling in terms of both accuracy and cost-efficiency. The approach was validated through real-world use cases, demonstrating its ability to identify wildlife trafficking ads across various product categories and research questions.
Implications
This method provides a scalable and cost-effective solution for monitoring wildlife trafficking in online marketplaces, enabling researchers and policymakers to gain deeper insights into trafficking networks. It has the potential to support conservation efforts, inform regulatory actions, and disrupt illegal wildlife trade by leveraging digital traces left by traffickers.
View on arXiv

A Generalized Meta Federated Learning Framework with Theoretical Convergence Guarantees

Mohammad Vahid Jamali, Hamid Saber, Jung Hyun Bae
  • The paper extends meta federated learning to optimize the average loss after an arbitrary number of fine-tuning steps, improving model personalization for heterogeneous data distributions.
  • A variant of the FedAvg algorithm is proposed, with theoretical guarantees on convergence speed and behavior of the meta loss functions.
  • The framework supports practical first-order and Hessian-free approximations to reduce computational complexity.
  • Experiments on real-world datasets show improved accuracy and faster convergence compared to conventional meta FL methods.
  • The approach is particularly suited for personalized federated learning in settings with significant data heterogeneity.
Abstract
This paper introduces a generalized framework for meta federated learning (FL) that extends conventional approaches by optimizing the average loss of agents after an arbitrary number of fine-tuning steps, rather than just one. This approach addresses challenges in highly heterogeneous data distributions across agents, where multiple fine-tuning steps are often required for effective model adaptation. The authors propose a variant of the federated averaging (FedAvg) algorithm tailored to this framework and provide a comprehensive theoretical convergence analysis. The analysis characterizes the convergence speed and behavior of the meta loss functions under both exact and approximated conditions. Experimental results on real-world datasets demonstrate that the proposed framework achieves superior accuracy and faster convergence compared to traditional meta FL methods, highlighting its potential for personalized federated learning in practical scenarios.
Methodology
The authors generalize the meta FL objective function to account for multiple fine-tuning steps and derive the corresponding gradient updates. They propose a modified FedAvg algorithm and analyze its convergence properties under both exact and approximated conditions. The framework incorporates practical approximations, such as first-order and Hessian-free methods, to reduce computational overhead. Theoretical results are complemented by empirical evaluations on real-world datasets.
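The sketch below gives a first-order, Reptile-style approximation of one round: each client fine-tunes the global weights for K local steps and the server averages the results. This is an illustrative simplification, not the paper's exact gradient derivation, and the linear-regression clients are assumptions.

```python
# First-order sketch of a FedAvg-style round where the meta objective is
# evaluated after K local fine-tuning steps per client.
import numpy as np

def client_update(global_w, X, y, k_steps=3, lr=0.1):
    """Linear-regression client: K fine-tuning steps of gradient descent."""
    w = global_w.copy()
    for _ in range(k_steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg_round(global_w, clients, k_steps=3):
    """Average the post-fine-tuning weights across clients (Reptile-style)."""
    locals_ = [client_update(global_w, X, y, k_steps) for X, y in clients]
    return np.mean(locals_, axis=0)

rng = np.random.default_rng(0)
clients = []
for shift in (0.0, 1.0, -1.0):                  # heterogeneous client data
    X = rng.normal(size=(32, 2))
    y = X @ np.array([2.0, -1.0]) + shift
    clients.append((X, y))

w = np.zeros(2)
for _ in range(50):
    w = fedavg_round(w, clients, k_steps=3)
print(w)                                         # meta-initialization after 50 rounds
```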
Results
The proposed framework achieves superior accuracy and faster convergence compared to conventional meta FL approaches, as demonstrated through experiments on real-world datasets. The theoretical analysis provides bounds on convergence speed and characterizes the behavior of the meta loss functions under various conditions.
Implications
This framework has significant implications for personalized federated learning, particularly in scenarios with highly heterogeneous data distributions. It enables more effective model adaptation for individual agents, improving performance in applications such as edge computing, healthcare, and IoT, where data privacy and personalization are critical.
View on arXiv

A Hamiltonian Higher-Order Elasticity Framework for Dynamic Diagnostics (2HOED)

Ngueuleweu Tiwang Gildas
  • 2HOED extends classical Hamiltonian mechanics to analyze time-varying elasticities in complex systems, incorporating position, velocity, acceleration, and jerk.
  • The framework provides interpretable energy metrics (e.g., System Power, Inertia) that reveal systemic dynamics, resilience, and tipping points.
  • 2HOED is computationally efficient, requiring only rolling regressions and derivatives, making it accessible without large datasets or black-box models.
  • The methodology is domain-agnostic and applicable to diverse fields, including economics, climate science, epidemiology, and logistics.
  • The framework complements existing tools like econometrics, machine learning, and causal inference by offering dynamic, real-time diagnostics and policy insights.
Abstract
This paper introduces the Hamiltonian Higher-Order Elasticity Dynamics (2HOED) framework, a novel, domain-agnostic methodology that integrates principles of classical mechanics into the analysis of complex systems. By extending the Hamiltonian formalism to include higher-order elasticity terms (position, velocity, acceleration, and jerk), 2HOED provides a dynamic, energy-based diagnostic tool for understanding systemic behavior across disciplines such as economics, climate science, epidemiology, and supply chain logistics. The framework transforms time-varying elasticities into interpretable energy metrics like System Power, Inertia, and Marginal Response, enabling early detection of tipping points, resilience thresholds, and feedback loops. Unlike traditional econometric models or machine learning approaches, 2HOED is computationally lightweight, transparent, and requires minimal data, making it accessible for real-time diagnostics. The paper demonstrates the utility of 2HOED through an application to the Kuznets environmental theory, analyzing the relationship between CO2 emissions and GDP growth. The framework bridges gaps between econometrics, machine learning, and causal inference, offering a unified approach to dynamic diagnostics and policy design.
Methodology
The 2HOED framework begins by estimating a time-varying elasticity (e.g., CO2 vs GDP) as a 'position' variable. Successive derivatives (velocity, acceleration, jerk) are computed empirically, and these are embedded into a Hamiltonian energy function. This yields interpretable metrics such as System Power, Marginal Response, and Sensitivity to shocks. The approach relies on rolling regressions and simple mathematical operations, avoiding the need for large datasets or complex machine learning models. The framework is illustrated using the Kuznets environmental theory to analyze GDP-CO2 dynamics.
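A numerical sketch of this pipeline is shown below: a rolling log-log regression yields the elasticity "position", np.gradient supplies velocity, acceleration, and jerk, and a toy quadratic form stands in for the energy metrics. The variable names, synthetic GDP/CO2 series, and the System Power definition used here are illustrative assumptions, not the paper's specification.

```python
# Rolling elasticity ("position") plus empirical derivatives and a toy
# Hamiltonian-style energy metric.
import numpy as np
import pandas as pd

def rolling_elasticity(gdp: pd.Series, co2: pd.Series, window: int = 20) -> pd.Series:
    """Slope of log(co2) on log(gdp) over a rolling window."""
    lx, ly = np.log(gdp), np.log(co2)
    def slope(idx):
        x, y = lx.loc[idx], ly.loc[idx]
        return np.polyfit(x, y, 1)[0]
    return pd.Series([slope(lx.index[i - window:i]) for i in range(window, len(lx) + 1)],
                     index=lx.index[window - 1:])

t = np.arange(200)
gdp = pd.Series(np.exp(0.02 * t + 0.01 * np.random.randn(200)))          # synthetic series
co2 = pd.Series(gdp.values ** (0.8 - 0.002 * t) * np.exp(0.01 * np.random.randn(200)))

position = rolling_elasticity(gdp, co2)
velocity = np.gradient(position.values)
acceleration = np.gradient(velocity)
jerk = np.gradient(acceleration)

system_power = 0.5 * velocity**2 + 0.5 * position.values**2   # toy Hamiltonian H = T + V
print(position.iloc[-1], system_power[-1], jerk[-1])
```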
Results
The application of 2HOED to the Kuznets environmental theory revealed dynamic insights into the relationship between CO2 emissions and GDP growth. Peaks in System Power indicated stored systemic stress, while jerk spikes highlighted impending regime shifts. The framework successfully identified leverage points for policy intervention and provided a transparent, real-time diagnostic map of the system's energy dynamics.
Implications
2HOED has broad implications across disciplines, offering a scalable and interpretable tool for dynamic diagnostics. It can be used to anticipate crises, design adaptive policies, and engineer robust systems in fields such as economics, climate science, epidemiology, and supply chain management. By bridging gaps between econometrics, machine learning, and causal inference, the framework enables decision-makers to better understand and influence complex systems in real time.
View on arXiv

A Survey on Parameter-Efficient Fine-Tuning for Foundation Models in Federated Learning

Jieming Bian, Yuanzhe Peng, Lei Wang, Yin Huang, Jie Xu
  • PEFT methods enable efficient fine-tuning of foundation models by updating only a small subset of parameters, reducing computational costs.
  • The survey categorizes PEFT methods into Additive, Selective, and Reparameterized approaches, analyzing their use in federated learning settings.
  • Federated Learning provides a privacy-preserving framework for collaborative model fine-tuning across distributed clients without sharing raw data.
  • The integration of PEFT and FL addresses challenges such as data heterogeneity, communication overhead, and computational constraints.
  • Future research directions include scaling PEFT for larger models, theoretical analysis of federated PEFT, and sustainable methods for resource-limited environments.
Abstract
This survey explores the integration of Parameter-Efficient Fine-Tuning (PEFT) techniques with Federated Learning (FL) to adapt large-scale foundation models for specific downstream tasks while addressing computational and privacy challenges. PEFT methods, which selectively update a small subset of model parameters, are categorized into three main types: Additive PEFT, Selective PEFT, and Reparameterized PEFT. The paper systematically reviews how these methods are applied in federated settings to tackle issues such as data heterogeneity, communication efficiency, computational constraints, and privacy concerns. It also organizes the literature based on application domains, including natural language processing (NLP) and computer vision (CV). The survey concludes with a discussion of open research directions, such as scaling PEFT for larger models, conducting theoretical analyses, and developing sustainable approaches for resource-constrained environments.
Methodology
The paper conducts a systematic review of existing literature on PEFT methods and their application in federated learning. It categorizes PEFT techniques into three types: Additive (e.g., adapter and prompt tuning), Selective (fine-tuning subsets of parameters), and Reparameterized (e.g., LoRA). The survey also organizes studies based on application domains, such as NLP and CV, and discusses how these methods address FL-specific challenges.
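For a concrete example of the Reparameterized family, the sketch below implements a minimal LoRA linear layer in PyTorch; in a federated setting only the low-rank factors A and B would need to be trained and communicated. The layer shapes and scaling convention are standard LoRA choices, not details taken from any single surveyed paper.

```python
# Minimal LoRA sketch: the frozen base weight W is augmented with a trainable
# low-rank update BA scaled by alpha / rank.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=4, alpha=8.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)         # frozen pretrained weight
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(128, 64, rank=4)
trainable = [n for n, p in layer.named_parameters() if p.requires_grad]
print(trainable)   # only ['A', 'B'] are updated; in FL only these factors are shared
```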
Results
The survey highlights that PEFT methods can achieve performance comparable to full fine-tuning while significantly reducing computational costs. It also demonstrates that integrating PEFT with FL effectively addresses privacy concerns and enables collaborative model adaptation in distributed settings. The review identifies gaps in the literature, such as the need for scalable and sustainable approaches for larger foundation models.
Implications
The integration of PEFT with FL has significant implications for privacy-sensitive domains like healthcare, finance, and law, where collaborative model fine-tuning is essential but data sharing is restricted. The findings also pave the way for more efficient and scalable methods to adapt foundation models in resource-constrained environments, potentially democratizing access to advanced AI capabilities.
View on arXiv

A comparative study of deep learning and ensemble learning to extend the horizon of traffic forecasting

Xiao Zheng, Saeed Asadi Bagloee, Majid Sarvi
  • Long-term traffic forecasting (up to 30 days) is challenging due to the need to model periodicity, long-range dependencies, and error accumulation.
  • Time embeddings significantly enhance model performance by capturing seasonality and event-related patterns.
  • RNNs with time embeddings outperformed the state-of-the-art Transformer-based Informer model by 31.1% for 30-day-ahead forecasts.
  • XGBoost, an ensemble learning method, demonstrated competitive performance with DL models while being computationally efficient.
  • The study provides insights into the impact of factors like input sequence length, holiday traffic, data granularity, and training data size on forecasting accuracy.
Abstract
This paper investigates the performance of machine learning (ML) methods for long-term traffic forecasting, focusing on forecasting horizons of up to 30 days. The study compares ensemble learning (eXtreme Gradient Boosting, XGBoost) and deep learning (DL) methods, including Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Transformer-based models like Informer. The authors emphasize the challenges of long-term forecasting, such as capturing long-range dependencies, modeling periodicity, and mitigating error accumulation. Using real-world traffic datasets from signalized arterials and freeways with two years of training data and one year of evaluation data, the study highlights the importance of time embeddings for incorporating seasonality and event factors. Experimental results reveal that while Transformer-based models excel at capturing long-range dependencies, simpler RNNs with time embeddings outperform them for 30-day forecasts. XGBoost, despite its simplicity, performs competitively with DL methods. The paper also explores the effects of input sequence length, holiday traffic, data granularity, and training data size, providing insights for improving long-term traffic forecasting.
Methodology
The study evaluates one ensemble learning method (XGBoost) and several deep learning methods (RNN, LSTM, and Transformer-based Informer) on real-world traffic datasets. Time embeddings are used to incorporate timestamp and holiday features. The models are trained on two years of data and evaluated on one year of data. The experiments analyze the effects of input sequence length, holiday traffic, data granularity, and training data size.
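The sketch below illustrates the time-embedding idea with cyclic hour and day-of-week encodings plus a holiday flag feeding an XGBoost regressor; the synthetic traffic series, column names, and the weekend-as-holiday flag are assumptions for illustration, not the paper's datasets or features.

```python
# Cyclic time embeddings plus a lag feature, fed to XGBoost (one of the
# compared model families).
import numpy as np
import pandas as pd
from xgboost import XGBRegressor

idx = pd.date_range("2023-01-01", periods=24 * 60, freq="h")   # 60 days, hourly
traffic = pd.Series(100 + 30 * np.sin(2 * np.pi * idx.hour / 24)
                    + np.random.randn(len(idx)) * 5, index=idx)

features = pd.DataFrame({
    "hour_sin": np.sin(2 * np.pi * idx.hour / 24),
    "hour_cos": np.cos(2 * np.pi * idx.hour / 24),
    "dow_sin": np.sin(2 * np.pi * idx.dayofweek / 7),
    "dow_cos": np.cos(2 * np.pi * idx.dayofweek / 7),
    "is_holiday": (idx.dayofweek >= 5).astype(int),   # weekend as a stand-in flag
    "lag_24h": traffic.shift(24).values,
}, index=idx).dropna()

target = traffic.loc[features.index]
model = XGBRegressor(n_estimators=200, max_depth=4)
model.fit(features[:-168], target[:-168])              # hold out the last week
print(model.predict(features[-168:])[:5])
```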
Results
RNNs with time embeddings outperformed Informer by 31.1% for 30-day forecasts, demonstrating the importance of periodicity modeling. XGBoost performed competitively with DL methods despite its simplicity. The study also highlighted the importance of time embeddings in improving long-term forecasting accuracy. Computational costs and other factors like data granularity and input sequence length were also analyzed.
Implications
The findings provide valuable guidance for developing more effective long-term traffic forecasting models. The insights on time embeddings and periodicity modeling can inform future research and practical applications in Intelligent Transportation Systems (ITS). The competitive performance of XGBoost suggests that simpler, computationally efficient models can be viable alternatives to deep learning for certain forecasting tasks.
View on arXiv

ABG-NAS: Adaptive Bayesian Genetic Neural Architecture Search for Graph Representation Learning

Sixuan Wang, Jiao Yin, Jinli Cao, Mingjian Tang, Hua Wang, Yanchun Zhang
  • ABG-NAS introduces a Comprehensive Architecture Search Space (CASS) to explore diverse propagation and transformation operations in GNNs.
  • The Adaptive Genetic Optimization Strategy (AGOS) dynamically balances exploration and exploitation for efficient architecture search.
  • The Bayesian-Guided Tuning Module (BGTM) optimizes hyperparameters periodically, improving scalability and robustness.
  • ABG-NAS consistently outperforms manually designed GNNs and existing NAS methods on benchmark datasets.
  • The framework is designed to handle diverse and complex graph structures, enhancing generalizability across tasks.
Abstract
ABG-NAS is a novel framework for automating the design of graph neural network (GNN) architectures to improve graph representation learning. The framework addresses the limitations of existing GNNs, such as their inability to adapt to diverse and complex graph structures, by introducing three key components: a Comprehensive Architecture Search Space (CASS), an Adaptive Genetic Optimization Strategy (AGOS), and a Bayesian-Guided Tuning Module (BGTM). CASS expands the search space to include diverse propagation and transformation operations, enabling the discovery of architectures that can handle intricate graph characteristics. AGOS employs a dynamic balance between exploration and exploitation to ensure efficient and diverse architecture search. BGTM periodically optimizes hyperparameters, enhancing scalability and robustness. Empirical evaluations on benchmark datasets (Cora, PubMed, Citeseer, and Cora-Full) demonstrate that ABG-NAS outperforms manually designed GNNs and state-of-the-art neural architecture search (NAS) methods, showcasing its potential to advance graph representation learning.
Methodology
ABG-NAS combines evolutionary algorithms and Bayesian optimization to automate GNN architecture search. The Comprehensive Architecture Search Space (CASS) allows for flexible combinations of propagation and transformation operations. The Adaptive Genetic Optimization Strategy (AGOS) ensures efficient search by balancing exploration and exploitation. The Bayesian-Guided Tuning Module (BGTM) periodically adjusts hyperparameters to improve performance and scalability.
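A toy genetic-search loop in the same spirit is sketched below; the operation vocabulary, mutation/crossover scheme, and the stubbed fitness function are illustrative stand-ins, not the paper's CASS, AGOS, or BGTM components.

```python
# Toy genetic search over GNN architecture encodings: each genome is a
# sequence of (propagation, transformation) choices that is mutated, crossed
# over, and selected by a stubbed validation fitness.
import random

PROPAGATION = ["gcn", "gat", "sage"]
TRANSFORMATION = ["linear", "mlp", "identity"]

def random_genome(depth=4):
    return [(random.choice(PROPAGATION), random.choice(TRANSFORMATION)) for _ in range(depth)]

def fitness(genome):
    """Stand-in for training the candidate GNN and reading validation accuracy."""
    return sum(0.1 * (p == "gat") + 0.05 * (t == "mlp") for p, t in genome) + random.random() * 0.01

def mutate(genome, rate=0.2):
    return [(random.choice(PROPAGATION), random.choice(TRANSFORMATION))
            if random.random() < rate else gene for gene in genome]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

population = [random_genome() for _ in range(20)]
for generation in range(15):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                                # exploitation: keep elites
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(15)]                         # exploration: new candidates
    population = parents + children

best = max(population, key=fitness)
print(best)   # best architecture encoding found
```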
Results
ABG-NAS demonstrated superior performance on benchmark datasets (Cora, PubMed, Citeseer, and Cora-Full), outperforming both manually designed GNNs and state-of-the-art NAS methods. The results highlight its ability to generate scalable and adaptive architectures that generalize well across diverse graph structures.
Implications
ABG-NAS has the potential to significantly advance graph representation learning by automating the design of GNN architectures that are adaptable to diverse and complex graph structures. This can benefit a wide range of applications, including node classification, link prediction, and subgraph search, in domains such as social networks, biological networks, and recommendation systems.
View on arXiv

Artificial Intelligence for Personalized Prediction of Alzheimer's Disease Progression: A Survey of Methods, Data Challenges, and Future Directions

Gulsah Hancerliogullari Koksalmis, Bulent Soykan, Laura J. Brattain, Hsin-Hsiung Huang
  • AI techniques like RNNs, GNNs, and digital twins are being leveraged for personalized prediction of AD progression.
  • Data challenges, including high dimensionality, missing data, and class imbalance, hinder progress but can be mitigated using synthetic data generation methods like VAEs and GANs.
  • Multimodal data integration and model interpretability are critical for advancing personalized AD prediction models.
  • Open challenges include robust external validation, clinical integration, and addressing ethical concerns in AI applications.
  • Future research directions include hybrid modeling approaches, causal inference, and federated learning to enhance model robustness and applicability.
Abstract
This paper provides a comprehensive survey of artificial intelligence (AI) methodologies for personalized prediction of Alzheimer's Disease (AD) progression. The authors highlight the critical need for individualized models due to the high variability in AD progression across patients. The survey categorizes key AI approaches, including state-space models for temporal dynamics, deep learning techniques like Recurrent Neural Networks (RNNs) for sequence modeling, Graph Neural Networks (GNNs) for leveraging relational data, and AI-driven digital twins for personalized simulations. The paper also addresses significant data challenges, such as high dimensionality, missing data, and class imbalance, and explores AI-driven solutions like synthetic data generation using Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). The authors emphasize the importance of multimodal data integration, model interpretability, and generalizability. They also identify open challenges, including external validation, clinical integration, and ethical considerations, while proposing future directions such as hybrid models, causal inference, and federated learning.
Methodology
The paper reviews and categorizes AI methodologies for personalized AD progression prediction, focusing on state-space models, RNNs, GNNs, and digital twins. It also examines data augmentation techniques like VAEs and GANs to address dataset limitations. The survey synthesizes existing research to identify trends, strengths, and gaps in the field.
Results
The survey consolidates knowledge on AI methods for AD progression prediction, highlighting the strengths and limitations of current approaches. It underscores the growing trend toward multimodal data integration and the need for interpretable and generalizable models. No new experimental results are presented, as this is a review paper.
Implications
The findings have significant implications for developing clinically relevant AI tools to improve personalized care planning and prognosis for AD patients. The survey also provides a roadmap for future research, emphasizing the need for robust, ethical, and clinically integrated AI solutions.
View on arXiv

Capturing Conditional Dependence via Auto-regressive Diffusion Models

Xunpeng Huang, Yujin Han, Difan Zou, Yian Ma, Tong Zhang
  • AR diffusion models are theoretically shown to better capture conditional dependencies in data compared to vanilla diffusion models.
  • Theoretical analysis establishes that AR diffusion models achieve KL divergence convergence under mild assumptions, with only a moderate increase in computational complexity.
  • Empirical results validate that AR diffusion models outperform vanilla diffusion models in scenarios with clear conditional dependence structures.
  • When no conditional dependence exists, AR diffusion models do not exhibit significant advantages over vanilla diffusion models.
  • The study bridges theoretical insights with practical performance, providing guidance for future applications of AR diffusion models.
Abstract
This paper investigates the limitations of vanilla diffusion models, such as DDPM, in capturing conditional dependence structures in data, which are critical for modeling high-level relationships like physical laws or object stability. The authors propose leveraging auto-regressive (AR) diffusion models to address this issue. They provide the first theoretical analysis of AR diffusion models, demonstrating that these models achieve better convergence to the true conditional distributions compared to vanilla diffusion models. The study shows that AR diffusion models reduce the gap in approximating data conditional distributions while maintaining practical inference times. Empirical results on synthetic datasets confirm that AR diffusion models effectively capture conditional dependencies when such structures are present in the data, whereas vanilla diffusion models fail to do so. However, when no clear conditional dependence exists, AR diffusion models do not outperform vanilla models.
Methodology
The authors analyze AR diffusion models by formulating their training process and deriving theoretical guarantees for their convergence to the true conditional distributions. They compare the KL divergence behavior of AR diffusion models and vanilla diffusion models, showing that AR models avoid divergence blow-ups. The study also evaluates the computational efficiency of AR diffusion models, demonstrating that their gradient complexity scales moderately with the number of data patches. Experiments on synthetic datasets are conducted to validate the theoretical findings.
Results
Theoretical analysis shows that AR diffusion models achieve better KL divergence convergence to the true conditional distributions compared to vanilla diffusion models. Empirical experiments confirm that AR diffusion models effectively capture conditional dependencies in data with clear structures, while vanilla models fail. However, in the absence of such structures, AR diffusion models do not outperform vanilla models. The computational cost of AR diffusion models is only moderately higher than that of vanilla models, making them practical for large-scale applications.
Implications
The findings suggest that AR diffusion models are well-suited for tasks requiring the modeling of conditional dependencies, such as physical simulations, causal inference, and structured data generation. Their ability to capture high-level relationships in data could improve the performance of generative models in domains like video generation, text-to-image synthesis, and other applications where conditional dependencies are critical.
View on arXiv

Deep Learning Optimization Using Self-Adaptive Weighted Auxiliary Variables

Yaru Liu, Yiqi Gu, Michael K. Ng
  • Introduces self-adaptive weighted auxiliary variables to improve optimization in deep learning models.
  • Proves consistency between the reformulated weighted loss and the original mean squared error loss for both FNNs and PINNs.
  • Demonstrates superior robustness and accuracy of SAPM models compared to conventional least squares and standard penalty models.
  • Addresses key challenges in deep learning optimization, including vanishing gradients and high non-convexity.
  • Provides theoretical guarantees and numerical validation for the proposed models.
Abstract
This paper introduces a novel optimization framework for deep learning, addressing challenges such as the high non-convexity of loss functions and the vanishing gradient problem. The authors propose a self-adaptive weighted penalty model (SAPM) that incorporates auxiliary variables to decouple deep neural network layers, reformulating the optimization problem for improved efficiency and robustness. Unlike previous auxiliary variable methods, which suffer from inconsistency between the reformulated and original loss functions, the SAPM approach ensures consistency through self-adaptive weights. The framework is applied to both fully connected neural networks (FNNs) and physics-informed neural networks (PINNs) for problems involving partial differential equations (PDEs). Theoretical proofs and numerical experiments demonstrate that the proposed SAPM models outperform conventional least squares and standard penalty models in terms of robustness, accuracy, and learning error reduction.
Methodology
The authors extend the auxiliary variable framework by introducing self-adaptive weights to the quadratic penalty terms, ensuring consistency between the reformulated and original loss functions. They develop models for both fully connected neural networks (SAPM-FNN) and physics-informed neural networks (SAPM-PINN) associated with first-order linear PDEs. Theoretical analysis is conducted to prove consistency, and numerical experiments are performed to validate the models' effectiveness and robustness.
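A toy version of the penalty reformulation is sketched below for a single hidden layer; a fixed penalty coefficient replaces the self-adaptive weights, so this shows only the auxiliary-variable decoupling, not the full SAPM scheme.

```python
# Quadratic-penalty reformulation with an auxiliary hidden variable `a`
# approximating the layer activation (fixed penalty weight, an assumption).
import torch

torch.manual_seed(0)
X = torch.randn(256, 4)
y = torch.sin(X.sum(dim=1, keepdim=True))

W1 = torch.randn(4, 32, requires_grad=True)
W2 = torch.randn(32, 1, requires_grad=True)
a = torch.tanh(X @ W1).detach().requires_grad_(True)   # auxiliary hidden variable
rho = 10.0                                               # fixed penalty weight (assumption)

opt = torch.optim.Adam([W1, W2, a], lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    data_term = ((a @ W2 - y) ** 2).mean()               # fit the output layer to y
    penalty_term = ((a - torch.tanh(X @ W1)) ** 2).mean()  # keep `a` close to the true activation
    loss = data_term + rho * penalty_term
    loss.backward()
    opt.step()

print(float(((torch.tanh(X @ W1) @ W2 - y) ** 2).mean()))   # original MSE loss
```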
Results
The proposed SAPM models achieve smaller learning errors and exhibit greater robustness to random initialization compared to conventional least squares and standard penalty models. The consistency constants for SAPM are shown to scale as O(L) for FNNs and O(dL^2) for PINNs, where L is the network depth and d is the problem dimension. Numerical experiments confirm these theoretical results and demonstrate the practical advantages of SAPM in both regression tasks and PDE-based problems.
Implications
The proposed SAPM framework has significant implications for improving optimization in deep learning, particularly in scenarios where gradient descent struggles due to non-convexity or vanishing gradients. It can be applied to a wide range of problems, including regression tasks with fully connected neural networks and scientific computing problems involving physics-informed neural networks. The method's robustness and accuracy make it a promising tool for advancing deep learning applications in both academic and industrial settings.
View on arXiv

Efficient LLMs with AMP: Attention Heads and MLP Pruning

Leandro Giusti Mugnaini, Bruno Lopes Yamamoto, Lucas Lauton de Alcantara, Victor Zacarias, Edson Bollis, Lucas Pellicer, Anna Helena Reali Costa, Artur Jordao
  • AMP is a structured pruning method that removes attention heads and MLP neurons based on activation magnitudes, ensuring uniform pruning across layers.
  • The method achieves up to a 1.49-percentage-point improvement in task accuracy over state-of-the-art pruning techniques at a 30% pruning ratio.
  • AMP provides up to 1.25× inference speedup without requiring specialized hardware, making it practical for real-world applications.
  • The approach is applicable to multiple LLM families and requires minimal fine-tuning for performance recovery.
  • AMP aligns with Green AI principles, offering cost savings and environmental benefits through reduced computational demands.
Abstract
This paper introduces AMP (Attention Heads and MLP Pruning), a novel structured pruning method designed to compress large language models (LLMs) by removing less critical components within Multi-Head Attention (MHA) and Multilayer Perceptron (MLP) layers. Unlike existing pruning techniques, AMP evaluates structural importance by projecting input data onto weights, enabling efficient identification of unimportant structures. The method achieves significant compression ratios (up to 30%) while maintaining competitive performance on zero-shot tasks. AMP also delivers practical inference speedups (up to 1.25×) without requiring specialized hardware, making it suitable for deployment in resource-constrained environments. The authors validate AMP across different LLM families, including LLaMA and Phi, and demonstrate its alignment with Green AI principles by reducing computational costs and CO2 emissions.
Methodology
AMP evaluates the importance of attention heads and MLP neurons by projecting input data onto weights, using activation magnitudes as a criterion for pruning. The method applies uniform pruning across all layers and requires minimal fine-tuning to recover performance. It is tested on various LLM families, including LLaMA and Phi, to ensure generalizability.
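The sketch below illustrates activation-magnitude scoring for MLP neurons only: hidden units are ranked by mean activation magnitude on a calibration batch and the lowest 30% are zeroed out. The attention-head scoring and AMP's exact projection are not reproduced, and the layer sizes are assumptions.

```python
# Activation-magnitude pruning of MLP hidden units (simplified sketch).
import torch
import torch.nn as nn

hidden = 256
mlp = nn.Sequential(nn.Linear(512, hidden), nn.GELU(), nn.Linear(hidden, 512))

calib = torch.randn(64, 512)                      # calibration batch
with torch.no_grad():
    activations = torch.nn.functional.gelu(mlp[0](calib))   # (64, hidden)
    scores = activations.abs().mean(dim=0)                   # per-neuron importance

prune_ratio = 0.30
n_prune = int(hidden * prune_ratio)
drop = scores.argsort()[:n_prune]                 # least important neurons

with torch.no_grad():
    mlp[0].weight[drop] = 0.0
    mlp[0].bias[drop] = 0.0
    mlp[2].weight[:, drop] = 0.0                  # remove their outgoing connections

print(f"zeroed {n_prune}/{hidden} neurons")
```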
Results
AMP achieves a 30% pruning ratio with minimal impact on zero-shot task performance, surpassing state-of-the-art pruning methods by up to 1.49 percentage points in accuracy. It also delivers up to 1.25× inference speedup, demonstrating its efficiency and practicality for deployment in resource-constrained environments.
Implications
AMP has significant implications for the deployment of LLMs in low-resource and time-critical settings, enabling efficient model compression without sacrificing performance. Its hardware-agnostic nature and alignment with Green AI principles make it a valuable tool for reducing computational costs and environmental impact, particularly in large-scale AI applications.
View on arXiv

Enhanced Semi-Supervised Stamping Process Monitoring with Physically-Informed Feature Extraction

Jianyu Zhang, Jianshe Feng, Yizhang Zhu, Fanyu Qi
  • Introduces a semi-supervised anomaly detection framework for stamping processes using accelerometer data.
  • Proposes a hybrid feature extraction method combining data-driven techniques and physically-informed insights.
  • Develops a 'golden baseline' model using only normal samples to address imbalanced datasets.
  • Introduces a novel deviation score to quantify anomaly levels in real-time.
  • Demonstrates superior anomaly detection performance on a real-world stamping dataset.
Abstract
This paper introduces a novel semi-supervised framework for real-time anomaly monitoring in stamping processes, leveraging accelerometer signals and physically-informed feature extraction. The framework addresses challenges such as imbalanced sample distributions, noise, and limited anomaly data by combining data-driven methods with physical insights. A hybrid feature extraction algorithm is proposed to preprocess raw accelerometer data, suppress noise, and extract key features informed by both physical mechanisms and statistical methods. A semi-supervised anomaly detection model is then developed, utilizing only normal samples to construct a 'golden baseline' model. This model quantifies anomalies using a novel deviation score, enabling effective detection of process irregularities. The proposed approach is validated on a real-world dataset from a stamping manufacturing workshop, demonstrating superior performance in detecting anomalies compared to traditional methods. The framework aims to reduce batch defects, enhance production yield, and provide actionable insights for process improvement.
Methodology
The study employs a hybrid feature extraction algorithm that integrates digital signal processing techniques (e.g., noise suppression and drift correction) with physically-informed approaches to extract meaningful features from accelerometer data. A semi-supervised learning model is then constructed using only normal samples to establish a baseline for anomaly detection. Anomalies are quantified using a novel deviation score, enabling real-time monitoring. The framework is validated using a real-world dataset from a stamping manufacturing workshop.
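A minimal version of the "golden baseline" idea is sketched below: hand-picked statistical features are computed from normal cycles only, and a Mahalanobis-style distance serves as the deviation score. The feature set and the exact score definition are illustrative assumptions, not the paper's physically-informed features.

```python
# Golden baseline from normal cycles only, with a Mahalanobis-style
# deviation score for new cycles.
import numpy as np

def extract_features(signal: np.ndarray) -> np.ndarray:
    """Toy features per stamping cycle: RMS, peak amplitude, crest factor."""
    rms = np.sqrt(np.mean(signal ** 2))
    peak = np.max(np.abs(signal))
    return np.array([rms, peak, peak / (rms + 1e-9)])

rng = np.random.default_rng(1)
normal_cycles = [rng.normal(0, 1.0, 2048) for _ in range(200)]    # training: normal only
baseline = np.stack([extract_features(s) for s in normal_cycles])

mu = baseline.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(baseline, rowvar=False) + 1e-6 * np.eye(3))

def deviation_score(signal: np.ndarray) -> float:
    d = extract_features(signal) - mu
    return float(np.sqrt(d @ cov_inv @ d))

print(deviation_score(rng.normal(0, 1.0, 2048)))   # normal cycle: small score
print(deviation_score(rng.normal(0, 3.0, 2048)))   # anomalous cycle: large score
```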
Results
The proposed framework outperforms traditional methods in detecting process anomalies, effectively addressing challenges such as imbalanced datasets and noise. The hybrid feature extraction method enhances the quality of input data, while the semi-supervised model achieves high precision in identifying anomalies. The approach successfully reduces the risk of batch defects and improves production yield in the stamping process.
Implications
This framework has significant implications for advanced manufacturing, particularly in high-throughput and precision production environments. By enabling real-time anomaly detection and reducing batch defects, it can enhance production efficiency, ensure product quality, and minimize downtime. The integration of physically-informed insights also provides a foundation for root cause analysis and process improvement, making it a valuable tool for smart manufacturing systems.
View on arXiv

FAST-Q: Fast-track Exploration with Adversarially Balanced State Representations for Counterfactual Action Estimation in Offline Reinforcement Learning

Pulkit Agrawal, Rukma Talwadker, Aditya Pareek, Tridib Mukherjee
  • Introduces adversarially balanced state representations using Gradient Reversal Learning to mitigate policy-specific biases and enable counterfactual action estimation.
  • Supports offline counterfactual exploration alongside static data exploitation, addressing challenges of sparse and disparate state-action spaces.
  • Proposes a Q-value decomposition strategy for multi-objective optimization, facilitating explainable recommendations tailored to short-term and long-term objectives.
  • Demonstrates superior performance over SOTA methods in a volatile gaming platform, achieving improvements in player returns, engagement, and cost efficiency.
  • Addresses the need for robust generalization in offline RL by normalizing state representations across policies.
Abstract
The paper introduces FAST-Q, a novel offline reinforcement learning (RL) framework designed to address challenges in counterfactual action estimation and policy optimization in high-stakes, volatile environments such as online gaming platforms. Traditional offline RL approaches struggle with overestimation of Q-values for out-of-distribution actions, exacerbated by sparse and biased state-action distributions in static datasets. FAST-Q tackles these issues by leveraging adversarially balanced state representations, enabling robust generalization across policies and improving counterfactual reasoning. The framework also incorporates a Q-value decomposition strategy for multi-objective optimization, balancing short-term and long-term goals while providing explainable recommendations. Experimental results on a real-world gaming platform demonstrate significant improvements in player engagement, lifetime value (LTV), and platform dwell time, along with reduced recommendation costs, outperforming state-of-the-art (SOTA) offline RL methods.
Methodology
FAST-Q employs Gradient Reversal Learning to construct adversarially balanced state representations, reducing policy-specific biases in state-action mappings. It integrates counterfactual exploration with static data exploitation to improve generalization across sparse and disparate state spaces. Additionally, a Q-value decomposition strategy is introduced to optimize multi-objective goals, balancing immediate engagement and long-term retention. The framework is evaluated on a real-world gaming platform with highly variable player behavior and engagement patterns.
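The core building block, a gradient reversal layer, can be sketched in a few lines of PyTorch as below; the surrounding Q-networks, policy discriminator, and decomposition strategy are only hinted at with assumed toy modules.

```python
# Gradient Reversal Layer: identity in the forward pass, flipped gradient in
# the backward pass, so the shared encoder unlearns policy-specific cues.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd: float):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Usage: a policy discriminator trained on reversed gradients pushes the
# shared state encoder toward policy-invariant representations.
encoder = torch.nn.Linear(16, 8)
discriminator = torch.nn.Linear(8, 2)        # which logging policy produced the state?

state = torch.randn(32, 16)
z = encoder(state)
policy_logits = discriminator(grad_reverse(z))
loss = torch.nn.functional.cross_entropy(policy_logits, torch.randint(0, 2, (32,)))
loss.backward()                               # encoder gradients are reversed
```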
Results
On the real-world gaming platform, FAST-Q achieved at least a 0.15% increase in player returns, a 2% improvement in lifetime value (LTV), a 0.4% enhancement in recommendation-driven engagement, a 2% improvement in players' platform dwell time, and a 10% reduction in recommendation-associated costs.
Implications
FAST-Q has significant implications for real-world applications in recommendation systems, particularly in volatile and high-stakes environments like online gaming. By enabling robust counterfactual reasoning and multi-objective optimization, it can improve user engagement, retention, and cost efficiency. The framework's explainability and adaptability to dynamic user behavior make it a valuable tool for personalized recommendations in other domains, such as e-commerce and healthcare.
View on arXiv

Fairness in Graph Learning Augmented with Machine Learning: A Survey

Renqiang Luo, Ziqi Xu, Xikun Zhang, Qing Qing, Huafei Huang, Enyan Dai, Zhe Wang, Bo Yang
  • GL-ML combines graph learning with specialized machine learning techniques to address limitations like over-smoothing, scalability, and privacy concerns.
  • Fairness challenges in GL-ML are more complex than in traditional graph learning due to dual computational structures and new biases introduced by machine learning techniques.
  • The paper identifies unique fairness issues, such as dual-perspective fairness in federated graph learning and attention biases in Graph Transformers.
  • Four critical techniques for improving fairness in GL-ML are explored, including fairness-aware encoding and counterfactual augmentation.
  • The survey highlights the need for systematic research to address fairness challenges in high-stakes applications like recommendation systems, criminal justice, and loan approval.
Abstract
This paper provides a comprehensive survey of fairness challenges and solutions in Graph Learning augmented with Machine Learning (GL-ML). While GL-ML techniques have significantly advanced graph learning by addressing issues like over-smoothing, scalability, and dynamic adaptability, they also introduce unique fairness challenges. These challenges arise from the interplay between traditional graph learning mechanisms and the specialized machine learning techniques used for augmentation. The paper systematically categorizes these fairness issues, such as dual-perspective fairness in federated graph learning and attention biases in Graph Transformers, and highlights how they differ from fairness concerns in traditional graph learning. Additionally, the survey explores four critical techniques for improving fairness in GL-ML, including fairness-aware encoding, counterfactual augmentation, and fairness controllers. By identifying the root causes of fairness challenges and proposing a taxonomy of solutions, the paper lays the groundwork for future research in this emerging field.
Methodology
The authors conduct a systematic review of existing literature on fairness in graph learning and GL-ML. They categorize fairness challenges and solutions, focusing on the interplay between graph learning mechanisms and machine learning augmentations. The paper also introduces a taxonomy of fairness-aware techniques and provides illustrative examples of fairness issues in real-world applications.
Results
The survey identifies key fairness challenges unique to GL-ML, such as dual-perspective fairness in federated learning and biases in Graph Transformers. It also highlights four critical techniques for addressing these challenges and provides a taxonomy to guide future research. The findings emphasize the need for tailored fairness solutions in GL-ML to ensure equitable outcomes in diverse applications.
Implications
The insights from this survey have significant implications for the development of fair GL-ML systems in high-stakes domains like criminal justice, disaster response, and financial decision-making. By addressing fairness challenges, researchers and practitioners can ensure that GL-ML technologies are more equitable and socially responsible. The taxonomy and techniques proposed in the paper also provide a foundation for future innovation in fairness-aware graph learning.
View on arXiv

FedHERO: A Federated Learning Approach for Node Classification Task on Heterophilic Graphs

Zihan Chen, Xingbo Fu, Yushun Dong, Jundong Li, Cong Shen
  • FedHERO addresses the limitations of traditional FGL methods in handling heterophilic graphs by introducing a dual-channel GNN architecture.
  • The global channel learns from shared latent graphs to capture common patterns across clients, while the local channel retains unique structural information from the original graph.
  • FedHERO enhances privacy by avoiding direct sharing of sensitive graph data and instead sharing insights derived from latent graphs.
  • The framework achieves superior performance in node classification tasks compared to existing FGL methods, particularly in scenarios with diverse neighbor distribution patterns.
  • FedHERO sets a new precedent for federated learning on graph data with varying levels of heterophily.
Abstract
This paper introduces FedHERO, a novel Federated Graph Learning (FGL) framework designed to address the challenges of node classification tasks on heterophilic graphs, where nodes of different classes are more likely to connect. Traditional FGL methods assume homophilic graphs, leading to poor performance when aggregating models trained on heterophilic graphs due to conflicting patterns in node neighbor distributions. FedHERO employs a dual-channel Graph Neural Network (GNN) architecture consisting of a global channel and a local channel. The global channel leverages a shared structure learning model to generate latent graphs that capture common patterns across clients, enabling effective knowledge sharing. The local channel operates on the original graph, preserving unique structural information and enhancing privacy. This design allows FedHERO to balance the learning of universal patterns with the retention of local graph-specific insights. Extensive experiments demonstrate that FedHERO outperforms existing FGL methods in handling heterophilic graph data, achieving superior performance in node classification tasks.
Methodology
FedHERO employs a dual-channel GNN architecture. The global channel uses a shared structure learning model to generate latent graphs that capture common patterns across clients, enabling effective aggregation of local models. The local channel operates on the original graph, preserving unique structural information. This design allows the framework to balance global knowledge sharing with local specificity. The shared structure learning model ensures that the global channel reduces reliance on the original neighbor distribution, while the local channel enhances privacy by retaining sensitive information within each client.
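A stylized version of the dual-channel idea is sketched below using dense adjacency matrices; the cosine-kNN "latent graph" is a hypothetical stand-in for the shared structure-learning model, and the federated aggregation step is omitted.

```python
# Dual-channel GNN sketch: local channel on the original graph, global channel
# on a latent kNN graph built from node features.
import torch
import torch.nn as nn
import torch.nn.functional as F

def normalize_adj(adj):
    adj = adj + torch.eye(adj.size(0))
    d_inv_sqrt = adj.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)

def latent_knn_graph(x, k=5):
    """Hypothetical stand-in for the shared structure learner: cosine kNN."""
    sim = F.normalize(x, dim=1) @ F.normalize(x, dim=1).T
    topk = sim.topk(k + 1, dim=1).indices
    adj = torch.zeros_like(sim)
    adj.scatter_(1, topk, 1.0)
    return ((adj + adj.T) > 0).float()

class DualChannelGNN(nn.Module):
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.local_w = nn.Linear(in_dim, hid_dim)
        self.global_w = nn.Linear(in_dim, hid_dim)
        self.out = nn.Linear(2 * hid_dim, n_classes)

    def forward(self, x, adj_local, adj_global):
        h_local = F.relu(normalize_adj(adj_local) @ self.local_w(x))
        h_global = F.relu(normalize_adj(adj_global) @ self.global_w(x))
        return self.out(torch.cat([h_local, h_global], dim=1))

x = torch.randn(100, 32)                           # 100 nodes, 32 features
adj_local = (torch.rand(100, 100) < 0.05).float()
adj_local = ((adj_local + adj_local.T) > 0).float()
model = DualChannelGNN(32, 64, 5)
logits = model(x, adj_local, latent_knn_graph(x))
print(logits.shape)                                # torch.Size([100, 5])
```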
Results
FedHERO outperforms existing FGL methods in extensive experiments on node classification tasks involving heterophilic graphs. The framework demonstrates improved generalization and predictive performance by effectively handling diverse neighbor distribution patterns across clients. The dual-channel design enables FedHERO to achieve a balance between global knowledge sharing and local graph-specific learning, leading to significant performance gains.
Implications
FedHERO has significant implications for federated learning on graph data, particularly in privacy-sensitive domains such as financial networks, healthcare, and social networks. By effectively handling heterophilic graphs, the framework enables collaborative learning across diverse datasets without compromising privacy or performance. This approach could pave the way for more robust and generalizable federated learning methods in graph-based applications.
View on arXiv

Frequency Feature Fusion Graph Network For Depression Diagnosis Via fNIRS

Chengkai Yang, Xingping Dong, Xiaofen Zong
  • Introduced a novel temporal biomarker for depression diagnosis by leveraging frequency-domain features via discrete Fourier transform (DFT).
  • Proposed a phased Temporal Graph Convolutional Network (TGCN) architecture to model distinct task periods in fNIRS data (silent, task, and other silent).
  • Built a large-scale fNIRS dataset with 1,086 subjects, significantly larger than prior datasets, and used propensity score matching (PSM) for class balance.
  • Validated the interpretability of the model using SHapley Additive exPlanation (SHAP), aligning results with known medical findings.
  • Achieved improved F1 scores compared to baseline methods, addressing the challenge of imbalanced datasets in depression diagnosis.
Abstract
This paper introduces a novel graph neural network (GNN)-based approach for depression diagnosis using functional near-infrared spectroscopy (fNIRS) data. The authors address key limitations in prior work, including small datasets, lack of frequency-domain analysis, and insufficient modeling of task-specific temporal features. They propose a Frequency Feature Fusion Graph Network (FFF-GN) that incorporates a new temporal biomarker derived from the discrete Fourier transform (DFT) and a phased Temporal Graph Convolutional Network (TGCN) architecture. The model is trained on a significantly larger dataset of 1,086 subjects, over 10 times larger than previous datasets, and further refined using propensity score matching (PSM) to ensure balanced class distributions. Experimental results demonstrate that the proposed method achieves improved F1 scores and interpretability, validated using SHapley Additive exPlanation (SHAP). This work contributes to advancing automated, interpretable, and effective tools for depression diagnosis in clinical settings.
Methodology
The authors developed a Frequency Feature Fusion Graph Network (FFF-GN) that integrates spatial and temporal features from fNIRS data. Temporal features were transformed into the frequency domain using DFT, and a Frequency Point Biserial Correlation Attention Module (FAM) was designed to capture correlations between frequency components and classification labels. The model employs a phased TGCN structure to process data from distinct task periods. A large dataset of 1,086 subjects was used, with propensity score matching applied to create a balanced subset. SHAP was utilized to ensure interpretability of the model's predictions.
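The frequency-feature step can be illustrated as below: per-channel rFFT magnitudes are computed and each frequency bin is ranked by its point-biserial correlation with the diagnosis label. The synthetic signals, channel counts, and the omission of the attention module and TGCN are assumptions of this sketch.

```python
# rFFT magnitudes per fNIRS channel, then point-biserial correlation between
# each frequency bin and the diagnosis label to rank informative bins.
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(0)
n_subjects, n_channels, n_timesteps = 120, 16, 512
signals = rng.normal(size=(n_subjects, n_channels, n_timesteps))
labels = rng.integers(0, 2, size=n_subjects)          # 0 = control, 1 = depression

# Inject a weak group difference at one frequency for illustration.
t = np.arange(n_timesteps)
signals[labels == 1, 0, :] += 0.4 * np.sin(2 * np.pi * 8 * t / n_timesteps)

spectra = np.abs(np.fft.rfft(signals, axis=-1))       # (subjects, channels, freq bins)
features = spectra.mean(axis=1)                        # average over channels

correlations = np.array([pointbiserialr(labels, features[:, f])[0]
                         for f in range(features.shape[1])])
top_bins = np.argsort(-np.abs(correlations))[:5]
print("most label-correlated frequency bins:", top_bins)
```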
Results
The proposed model demonstrated superior performance in depression diagnosis, achieving higher F1 scores compared to baseline methods on both the full dataset and the PSM-refined subset. The incorporation of frequency-domain features and task-specific modeling significantly enhanced the representation of temporal characteristics in brain channels. The use of SHAP confirmed the interpretability and clinical relevance of the model.
Implications
This work has the potential to improve automated depression diagnosis by providing a robust, interpretable, and scalable approach. The integration of frequency-domain features and task-specific modeling could inspire future research in brain imaging-based mental health diagnostics. Additionally, the large dataset and use of SHAP for interpretability make the method more applicable to real-world clinical settings.
View on arXiv

GLIP-OOD: Zero-Shot Graph OOD Detection with Foundation Model

Haoyan Xu, Zhengtao Yao, Xuzhi Zhang, Ziyi Wang, Langzhou He, Yushun Dong, Philip S. Yu, Mengyuan Li, Yue Zhao
  • GLIP-OOD is the first framework to enable zero-shot OOD detection in graph-structured data using a graph foundation model (GFM).
  • The framework outperforms supervised graph learning methods despite requiring no labeled in-distribution (ID) data.
  • GLIP-OOD leverages large language models (LLMs) to generate pseudo-OOD labels, enabling fine-grained OOD detection in practical scenarios.
  • The method achieves state-of-the-art performance on four benchmark text-attributed graph datasets.
  • This work bridges the gap in zero-shot OOD detection between vision, text, and graph domains.
Read More
Abstract
This paper introduces GLIP-OOD, a novel framework for zero-shot out-of-distribution (OOD) detection in graph-structured data using a graph foundation model (GFM). Unlike traditional graph OOD detection methods that rely on labeled in-distribution (ID) data, GLIP-OOD operates without any node-level supervision. The authors demonstrate that GFMs, when provided only with class label names, can achieve superior OOD detection performance compared to supervised methods. To address practical scenarios where OOD label names are unavailable, the framework employs large language models (LLMs) to generate semantically meaningful pseudo-OOD labels from unlabeled data. These pseudo-labels allow the GFM to better distinguish between ID and OOD classes. The proposed method achieves state-of-the-art results on four benchmark text-attributed graph datasets, marking a significant step forward in zero-shot OOD detection for graphs.
Methodology
The authors use a graph foundation model (GFM) to perform zero-shot OOD detection by directly leveraging class label names. To handle scenarios where OOD label names are unavailable, they employ large language models (LLMs) to generate pseudo-OOD labels from unlabeled graph data. These pseudo-labels are then used to augment the label space, allowing the GFM to capture nuanced semantic boundaries between ID and OOD classes. The framework is evaluated on four benchmark datasets in a transductive setting.
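As a sketch of the label-name-based scoring idea (not the paper's exact formulation), nodes can be scored by comparing their embeddings against embeddings of both the ID label names and the LLM-generated pseudo-OOD label names; the softmax scoring rule and all names below are assumptions.

```python
# Minimal sketch, assuming node-text and label-name embeddings are already
# available (e.g. from the graph foundation model's text encoder).
import numpy as np

def ood_scores(node_emb, id_label_emb, pseudo_ood_label_emb, temperature=0.07):
    """node_emb: (n, d); id_label_emb: (k_id, d); pseudo_ood_label_emb: (k_ood, d)."""
    def normalize(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)

    nodes = normalize(node_emb)
    labels = normalize(np.vstack([id_label_emb, pseudo_ood_label_emb]))
    logits = nodes @ labels.T / temperature                 # cosine-similarity logits
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)

    k_id = id_label_emb.shape[0]
    # OOD score = probability mass assigned to the pseudo-OOD label names.
    return probs[:, k_id:].sum(axis=1)

# Toy usage with random embeddings
rng = np.random.default_rng(0)
print(ood_scores(rng.normal(size=(5, 64)), rng.normal(size=(3, 64)), rng.normal(size=(2, 64))))
```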
Results
GLIP-OOD achieves state-of-the-art performance on four benchmark text-attributed graph datasets, surpassing supervised graph learning methods that rely on large amounts of labeled ID data. The framework demonstrates the inherent OOD detection capabilities of GFMs and the effectiveness of LLM-generated pseudo-OOD labels in improving detection accuracy.
Implications
GLIP-OOD has significant implications for deploying machine learning systems in dynamic and open-world environments, such as social networks, citation graphs, and e-commerce platforms. By eliminating the need for labeled ID data, the framework reduces the cost and effort associated with training graph models, making it a practical solution for real-world applications where unseen or anomalous data frequently arise.
View on arXiv

Generative QoE Modeling: A Lightweight Approach for Telecom Networks

Vinti Nayar, Kanica Sachdev, Brejesh Lall
  • Introduces a VQ-HMM framework for lightweight QoE modeling, balancing accuracy and computational efficiency.
  • Validates the use of Vector Quantization (VQ) for transforming continuous features into discrete symbols for sequence modeling.
  • Demonstrates competitive performance on publicly available time-series datasets with low inference latency.
  • Highlights the applicability of the framework in resource-constrained environments and NR/NRR scenarios.
  • Positions the approach as a scalable alternative to deep learning models for real-time QoE estimation.
Read More
Abstract
This paper introduces a lightweight generative modeling framework for Quality of Experience (QoE) prediction in telecommunication networks, emphasizing computational efficiency, interpretability, and predictive accuracy. The proposed approach leverages a Vector Quantization (VQ) preprocessing step to transform continuous network features into discrete categorical symbols, which are then modeled using a Hidden Markov Model (HMM). This VQ-HMM pipeline is designed to capture temporal QoE dynamics effectively while enabling probabilistic inference on unseen data. The framework is validated on publicly available time-series datasets that include both objective metrics and subjective QoE scores. The results demonstrate that the proposed method achieves competitive prediction accuracy with low inference latency, making it suitable for real-time, resource-constrained environments such as edge platforms and wearable devices. The study also highlights the potential of this approach for No Reference (NR) and Near Reference (NRR) scenarios, where direct ground truth for QoE is unavailable. By offering a scalable alternative to deep learning models, the paper addresses the challenges of computational complexity and latency in QoE estimation.
Methodology
The proposed framework uses Vector Quantization (VQ) to preprocess continuous network features into discrete categorical symbols. These symbols are then modeled using a Hidden Markov Model (HMM) to capture temporal dependencies in QoE data. The generative nature of HMMs allows for probabilistic inference, semi-supervised learning, and synthetic data generation. The approach is evaluated on publicly available time-series datasets containing both objective metrics and subjective QoE scores.
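A compact sketch of the VQ front end plus a simple per-class generative sequence model. For brevity the hidden-state machinery is collapsed to a first-order Markov chain over the VQ symbols; the paper's pipeline uses a full HMM (which a library such as hmmlearn could supply), and all names here are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize(features, n_symbols=16, seed=0):
    """features: (n_frames, n_metrics) continuous network measurements."""
    vq = KMeans(n_clusters=n_symbols, n_init=10, random_state=seed).fit(features)
    return vq, vq.predict(features)                    # discrete symbol sequence

def fit_transition_model(symbols, n_symbols, alpha=1.0):
    counts = np.full((n_symbols, n_symbols), alpha)    # Laplace smoothing
    for a, b in zip(symbols[:-1], symbols[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def sequence_log_likelihood(symbols, trans):
    return sum(np.log(trans[a, b]) for a, b in zip(symbols[:-1], symbols[1:]))

# Toy usage: fit one model per QoE class, classify a new trace by likelihood.
rng = np.random.default_rng(1)
good_trace, bad_trace = rng.normal(0, 1, (500, 4)), rng.normal(2, 1, (500, 4))
vq, _ = quantize(np.vstack([good_trace, bad_trace]))
models = {label: fit_transition_model(vq.predict(trace), 16)
          for label, trace in [("good", good_trace), ("bad", bad_trace)]}
new = rng.normal(0, 1, (50, 4))
print(max(models, key=lambda k: sequence_log_likelihood(vq.predict(new), models[k])))
```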
Results
The VQ-HMM framework achieves competitive QoE prediction accuracy while maintaining low inference latency. The method is validated on real-world datasets, demonstrating its suitability for real-time applications in resource-constrained environments. The generative structure of the model also supports data augmentation and probabilistic inference on unseen data.
Implications
The proposed framework has significant implications for real-time QoE estimation in telecommunication networks, particularly in scenarios with limited computational resources or strict latency requirements. Its lightweight design makes it suitable for deployment on edge platforms and wearable devices. Additionally, the ability to handle NR and NRR scenarios expands its applicability to contexts where direct QoE ground truth is unavailable.
View on arXiv

Graph Synthetic Out-of-Distribution Exposure with Large Language Models

Haoyan Xu, Zhengtao Yao, Ziyi Wang, Zhan Cheng, Xiyang Hu, Mengyuan Li, Yue Zhao
  • Introduces GOE-LLM, a framework for OOD detection in graphs that eliminates the need for real OOD samples.
  • Leverages LLMs for zero-shot OOD annotation and synthetic OOD node generation.
  • Demonstrates significant performance improvements over baseline methods without OOD exposure.
  • Achieves comparable results to methods that rely on real OOD data.
  • Provides a scalable and practical solution for OOD detection in text-attributed graphs.
Read More
Abstract
This paper addresses the challenge of out-of-distribution (OOD) detection in graph-based machine learning, particularly in text-attributed graphs (TAGs). OOD detection is critical for ensuring model robustness in open-world and safety-sensitive applications. Existing methods often rely on real OOD samples for training, which can be impractical due to the difficulty of acquiring representative OOD data. The authors propose GOE-LLM, a novel framework that leverages Large Language Models (LLMs) to generate synthetic OOD samples, eliminating the need for real OOD data. GOE-LLM introduces two pipelines: (1) identifying pseudo-OOD nodes using LLMs in a zero-shot setting, and (2) generating synthetic OOD nodes via LLM-prompted text generation. These pseudo-OOD nodes are incorporated into the training process to enhance the OOD detection capabilities of graph models. Experimental results on multiple benchmark datasets demonstrate that GOE-LLM significantly outperforms baseline methods that do not use OOD exposure and achieves performance comparable to methods relying on real OOD data.
Methodology
The proposed GOE-LLM framework consists of two pipelines: (1) using LLMs to identify pseudo-OOD nodes in a zero-shot setting by prompting the model to classify unlabeled nodes as OOD if they do not belong to any in-distribution (ID) classes, and (2) generating synthetic OOD nodes by prompting LLMs to create text descriptions of OOD samples, which are then integrated into the graph using graph structure learning techniques. The augmented graph, containing both ID and pseudo-OOD nodes, is used to train the ID classifier.
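An illustrative prompt-construction sketch for the zero-shot pseudo-OOD annotation pipeline. The prompt wording is an assumption rather than the paper's exact template, and `call_llm` is a placeholder for whatever LLM client is used.

```python
# Hypothetical prompt builder for labeling unlabeled nodes as ID-class or OOD.
ID_CLASSES = ["databases", "operating systems", "computer networks"]

def build_ood_annotation_prompt(node_text, id_classes=ID_CLASSES):
    class_list = ", ".join(id_classes)
    return (
        "You are labeling nodes of a citation graph.\n"
        f"The in-distribution classes are: {class_list}.\n"
        "If the paper below clearly belongs to one of these classes, answer with "
        "that class name. If it does not belong to any of them, answer exactly 'OOD'.\n\n"
        f"Paper text: {node_text}\n"
        "Answer:"
    )

def annotate_node(node_text, call_llm):
    """Returns either an ID class name or 'OOD' for an unlabeled node."""
    answer = call_llm(build_ood_annotation_prompt(node_text)).strip()
    return "OOD" if answer.upper().startswith("OOD") else answer

# Nodes flagged as 'OOD' become pseudo-OOD training signal; a second prompt
# (not shown) would ask the LLM to generate synthetic OOD node texts outright.
```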
Results
GOE-LLM significantly outperforms state-of-the-art graph OOD detection methods that do not use OOD exposure. It achieves performance comparable to methods that rely on real OOD data, demonstrating the effectiveness of synthetic OOD generation using LLMs. The framework is validated across multiple benchmark datasets, highlighting its robustness and generalizability.
Implications
The proposed method has significant implications for real-world applications in safety-critical and open-world settings, such as social networks, recommendation systems, and e-commerce platforms. By eliminating the need for real OOD data, GOE-LLM provides a scalable and cost-effective solution for enhancing the robustness of graph-based machine learning models in detecting distributional shifts.
View on arXiv

LIFT: LLM-Based Pragma Insertion for HLS via GNN Supervised Fine-Tuning

Neha Prakriya, Zijian Ding, Yizhou Sun, Jason Cong
  • LIFT combines LLMs and GNNs to automate pragma insertion for HLS, addressing the challenges of manual optimization in FPGA programming.
  • The framework incorporates domain-specific training to handle bidirectional pragma context, learn from invalid designs, and capture structural impacts of pragmas.
  • LIFT achieves significant performance gains over state-of-the-art methods, improving design performance by up to 3.52× compared to AutoDSE and 2.16× compared to HARP.
  • The method leverages both the sequential modeling capabilities of LLMs and the structural reasoning strengths of GNNs for better generalization and scalability.
  • LIFT demonstrates the potential of integrating advanced AI techniques for domain-specific hardware synthesis tasks.
Read More
Abstract
This paper introduces LIFT, a novel framework that leverages large language models (LLMs) fine-tuned with graph neural networks (GNNs) to automate the insertion of optimization pragmas in High-Level Synthesis (HLS) for FPGA programming. HLS tools simplify FPGA programming by abstracting low-level hardware descriptions into C/C++ code, but achieving high performance still requires manual pragma insertion, which is time-consuming and demands expert knowledge. LIFT addresses this challenge by combining the sequential modeling capabilities of LLMs with the structural reasoning strengths of GNNs to generate performance-critical pragmas. The framework incorporates domain-specific training strategies to overcome challenges unique to HLS, such as bidirectional pragma context dependencies, learning from both valid and failed designs, and understanding the structural impact of pragmas on microarchitecture. Experimental results demonstrate that LIFT significantly outperforms state-of-the-art methods like AutoDSE and HARP, achieving up to 3.52× and 2.16× performance improvements, respectively, and a 66× improvement over GPT-4o.
Methodology
LIFT fine-tunes a large language model (LLM) using supervision from a graph neural network (GNN). The GNN captures structural and semantic information about the code, such as control and data dependencies, while the LLM models long-range token dependencies. The training process incorporates domain-specific strategies to address challenges like bidirectional pragma context, learning from invalid designs, and understanding the structural impact of pragmas. The framework is trained on datasets like HLSyn and DB4HLS, which include both valid and invalid design configurations.
Results
LIFT achieves an average performance improvement of 3.52× over AutoDSE, 2.16× over HARP, and 66× over GPT-4o. These results highlight its ability to generate high-quality pragma configurations that significantly enhance the performance of HLS designs.
Implications
LIFT has the potential to democratize FPGA programming by reducing the need for expert knowledge in pragma insertion, thereby accelerating the adoption of FPGAs in data centers and other high-performance computing applications. Its integration of LLMs and GNNs could inspire similar approaches in other domain-specific optimization tasks.
View on arXiv

Learning Heterogeneous Performance-Fairness Trade-offs in Federated Learning

Rongguang Ye, Ming Tang
  • HetPFL introduces Preference Sampling Adaptation (PSA) to dynamically adjust preference sampling distributions for heterogeneous clients, improving local Pareto front learning efficiency.
  • Preference-aware Hypernet Fusion (PHF) ensures high-quality global Pareto fronts by aggregating client hypernets based on their strengths for different preferences.
  • The framework achieves linear convergence with an error rate of O(1/t), under weaker assumptions than prior methods.
  • Extensive experiments on four datasets show that HetPFL outperforms seven baselines, with significant improvements in both local and global Pareto front quality.
  • HetPFL addresses both efficiency and generalization challenges in federated learning, balancing performance and fairness trade-offs effectively.
Read More
Abstract
This paper introduces HetPFL, a novel framework for federated learning (FL) that addresses the challenges of balancing performance and fairness across heterogeneous clients. Existing methods often assume a uniform preference sampling distribution for all clients, which fails to account for the variability in local Pareto fronts. Additionally, prior approaches focus primarily on local Pareto fronts, neglecting the global Pareto front, which represents the trade-offs on the aggregated global dataset. HetPFL overcomes these limitations through two key components: Preference Sampling Adaptation (PSA) and Preference-aware Hypernet Fusion (PHF). PSA dynamically adjusts the preference sampling distribution for each client using a data-driven metric called HyperVolume Contribution (HVC), improving the efficiency of local Pareto front learning. PHF, on the other hand, aggregates client-specific hypernets in a preference-aware manner to enhance the global Pareto front. The authors prove that HetPFL achieves linear convergence under weaker assumptions compared to existing methods. Experimental results on four datasets demonstrate that HetPFL significantly outperforms seven baseline methods, achieving approximately 1.75% and 5.5% improvements in local and global Pareto front quality, respectively.
Methodology
HetPFL employs two main components: (1) Preference Sampling Adaptation (PSA), which uses HyperVolume Contribution (HVC) to quantify the contribution of each preference to the Pareto front and optimizes the sampling distribution via bi-level optimization; and (2) Preference-aware Hypernet Fusion (PHF), which aggregates client-specific hypernets at the server by identifying their strengths for various preferences. This approach ensures efficient learning of both local and global Pareto fronts.
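A rough sketch of the HyperVolume Contribution (HVC) idea behind Preference Sampling Adaptation: each sampled preference vector yields one point on the (loss, unfairness) front, and preferences whose points contribute more hypervolume get sampled more often. Objectives are treated as minimized, and the paper's bi-level optimization is replaced here by simple normalization; all names are illustrative.

```python
import numpy as np

def hypervolume_2d(points, ref):
    """Hypervolume dominated by mutually non-dominated 2-D points (minimization)."""
    pts = sorted(points, key=lambda p: p[0])
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        hv += max(ref[0] - f1, 0.0) * max(prev_f2 - f2, 0.0)
        prev_f2 = min(prev_f2, f2)
    return hv

def hvc_sampling_distribution(points, ref):
    total = hypervolume_2d(points, ref)
    contribs = np.array([total - hypervolume_2d(points[:i] + points[i + 1:], ref)
                         for i in range(len(points))])
    contribs = np.maximum(contribs, 1e-9)
    return contribs / contribs.sum()

# Front produced by three preference vectors; the middle one contributes least.
front = [(0.20, 0.90), (0.55, 0.60), (0.80, 0.25)]
print(hvc_sampling_distribution(front, ref=(1.0, 1.0)))
```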
Results
HetPFL achieves approximately 1.75% improvement in local Pareto front quality and 5.5% improvement in global Pareto front quality compared to the best-performing baseline. The framework demonstrates superior performance across four datasets and converges linearly with respect to the number of communication rounds.
Implications
HetPFL has significant implications for federated learning applications where fairness and performance trade-offs are critical, such as healthcare, finance, and IoT. By addressing heterogeneity in client data and ensuring high-quality global models, HetPFL can enhance the deployment of FL systems in real-world scenarios requiring equitable and efficient decision-making.
View on arXiv

MPEC: Manifold-Preserved EEG Classification via an Ensemble of Clustering-Based Classifiers

Shermin Shahbazi, Mohammad-Reza Nasiri, Majid Ramezani
  • MPEC preserves the non-Euclidean manifold structure of EEG signals, addressing limitations of traditional Euclidean-based methods.
  • The method combines covariance matrices and RBF kernels for feature engineering, capturing both linear and non-linear relationships.
  • A modified K-means algorithm, tailored for Riemannian manifold space, ensures local geometric sensitivity during clustering.
  • An ensemble of clustering-based classifiers, combined via stacking, boosts classification accuracy and robustness.
  • MPEC outperforms traditional methods on the BCI Competition IV dataset 2a, demonstrating its effectiveness in EEG classification.
Read More
Abstract
This paper introduces MPEC (Manifold-Preserved EEG Classification), a novel method for classifying EEG signals by preserving their intrinsic manifold structure. EEG signals, which reside in a non-Euclidean, high-dimensional space, are often misclassified by traditional methods that assume a Euclidean structure. MPEC addresses this limitation through two key innovations: (1) a feature engineering phase that combines covariance matrices and Radial Basis Function (RBF) kernels to capture both linear and non-linear relationships among EEG channels, and (2) a clustering phase that employs a modified K-means algorithm tailored for Riemannian manifold space, ensuring sensitivity to local geometric properties. The method further leverages an ensemble of clustering-based classifiers, combined via stacking, to enhance classification accuracy and robustness. The approach is validated on the BCI Competition IV dataset 2a, where it demonstrates significant improvements over traditional Euclidean-based methods. The study highlights the importance of preserving the manifold structure of EEG data for applications in brain-computer interfaces (BCIs), neuroprosthetics, and diagnostics.
Methodology
MPEC employs a multi-step pipeline: (1) Feature engineering using covariance matrices and RBF kernels to capture linear and non-linear relationships; (2) Clustering using a modified K-means algorithm designed for Riemannian manifold space, leveraging a custom distance metric based on Riemannian distance and curvature; (3) Dimensionality reduction by projecting clusters onto the tangent space; and (4) Ensemble learning, where multiple weak classifiers are combined via stacking to improve accuracy while preserving the manifold structure.
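A small sketch of two ingredients named above: forming an SPD covariance descriptor per EEG trial and computing the affine-invariant Riemannian distance that a manifold-aware K-means would use in place of the Euclidean distance. The curvature-adjusted metric and the stacked ensemble from the paper are omitted, and names are illustrative.

```python
import numpy as np
from scipy.linalg import eigvalsh

def covariance_descriptor(trial, shrinkage=1e-3):
    """trial: (n_channels, n_timesteps) -> regularized SPD covariance matrix."""
    cov = np.cov(trial)
    return cov + shrinkage * np.trace(cov) / cov.shape[0] * np.eye(cov.shape[0])

def riemannian_distance(A, B):
    """Affine-invariant distance: sqrt(sum(log(eigvals(A^-1 B))^2))."""
    eigs = eigvalsh(B, A)                     # generalized eigenvalues of (B, A)
    return float(np.sqrt(np.sum(np.log(eigs) ** 2)))

# Toy usage: distance between two EEG trials' covariance descriptors.
rng = np.random.default_rng(3)
c1 = covariance_descriptor(rng.normal(size=(22, 250)))
c2 = covariance_descriptor(rng.normal(size=(22, 250)))
print(riemannian_distance(c1, c2))
```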
Results
MPEC achieved significant improvements in classification accuracy on the BCI Competition IV dataset 2a compared to traditional Euclidean-based methods. The ensemble approach enhanced robustness and sensitivity to the geometric properties of EEG data, demonstrating the effectiveness of preserving manifold structure in EEG classification tasks.
Implications
The proposed MPEC framework has potential applications in brain-computer interfaces (BCIs), neuroprosthetics, and medical diagnostics, where accurate and robust EEG signal classification is critical. By preserving the manifold structure of EEG data, MPEC can improve real-time control, cognitive applications, and diagnostic accuracy in these domains.
View on arXiv

Model Connectomes: A Generational Approach to Data-Efficient Language Models

Klemen Kotar, Greta Tuckute
  • The paper introduces a generational learning framework inspired by biological evolution and individual learning processes.
  • A sparse 'model connectome' is derived through iterative pruning across generations and used to initialize models for data-efficient learning.
  • The connectome model outperforms or matches control models in NLP tasks and aligns better with human behavior and brain data.
  • The approach emphasizes sparse initialization as a means to improve learning in low-data regimes, rather than focusing on model compression.
  • This is the first study to explore the behavioral and neural alignment of generationally pruned models.
Read More
Abstract
This paper introduces a novel framework for training artificial neural networks inspired by the nested learning processes observed in biological systems. The authors propose a generational learning approach, where an 'outer loop' of evolution shapes a sparse 'model connectome'—a binary wiring diagram of excitatory and inhibitory connections—used to initialize a model for an 'inner loop' of data-efficient learning. The framework is tested on language models, specifically GPT-2 architectures, where the connectome is derived through iterative pruning across generations using a large dataset. The resulting sparse connectome model is then trained on a much smaller dataset, mimicking developmental-scale learning. The study demonstrates that the connectome model performs better or on par with control models in natural language processing tasks and aligns more closely with human behavior and brain responses. This work highlights the potential of sparse initialization to enhance learning in low-data regimes, bridging the gap between artificial and biological neural networks.
Methodology
The authors use a two-phase training process: (1) an evolutionary outer loop where a model is trained on a large dataset and iteratively pruned to create a sparse connectome, and (2) a developmental inner loop where the connectome-initialized model is trained on a smaller dataset. The connectome retains only 25% of the original weights, with fixed positive or negative values. The framework is evaluated using GPT-2 models and compared against two controls: a randomly pruned model and a fully-connected model with standard initialization.
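A toy sketch of the outer-loop idea: iteratively prune the smallest-magnitude weights over several "generations" until 25% remain, then keep the wiring and signs of the survivors as the connectome used to initialize the inner-loop model. The real procedure retrains between pruning rounds (stubbed out here) and its exact initialization details differ; all names are illustrative.

```python
import numpy as np

def derive_connectome(weights, target_density=0.25, n_generations=5, retrain=None):
    """weights: trained dense weight matrix -> (mask, signs) at the target density."""
    mask = np.ones_like(weights, dtype=bool)
    # Prune the same fraction each generation so the final density hits the target.
    keep_per_round = target_density ** (1.0 / n_generations)
    for _ in range(n_generations):
        magnitudes = np.abs(weights) * mask
        k = int(keep_per_round * mask.sum())
        threshold = np.sort(magnitudes[mask])[-k]          # keep the k largest survivors
        mask &= magnitudes >= threshold
        if retrain is not None:
            weights = retrain(weights, mask)               # outer-loop retraining (omitted)
    return mask, np.sign(weights) * mask

def connectome_init(mask, signs, scale=0.02, seed=0):
    """Inner-loop init: random magnitudes, wiring and signs fixed by the connectome."""
    rng = np.random.default_rng(seed)
    return np.abs(rng.normal(0.0, scale, size=mask.shape)) * signs

W = np.random.default_rng(4).normal(size=(64, 64))
mask, signs = derive_connectome(W)
print(mask.mean())                                          # ~0.25 of weights survive
W0 = connectome_init(mask, signs)
```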
Results
The connectome model demonstrated superior or comparable performance to control models on NLP tasks. It also showed stronger alignment with human behavior and brain responses during language processing. Additionally, the connectome model converged faster during training, highlighting its efficiency in low-data regimes.
Implications
This work suggests that incorporating generational learning principles into artificial neural networks can improve data efficiency and alignment with biological systems. The proposed framework could be applied to enhance performance in low-resource settings, improve model interpretability, and advance the integration of neuroscience-inspired methods in AI development.
View on arXiv

Modeling and Performance Analysis for Semantic Communications Based on Empirical Results

Shuai Ma, Bin Shen, Chuanhui Zhang, Youlong Wu, Hang Li, Shiyin Li, Guangming Shi, Naofal Al-Dhahir
  • The paper introduces the ABG formula, the first theoretical model linking end-to-end performance metrics and SNR for semantic communications.
  • The ABG formula is validated for image reconstruction tasks using DL models like SCUNet and Vision Transformer with MS-SSIM as the metric.
  • A closed-form expression is proposed to model the relationship between MS-SSIM and the number of quantized output bits.
  • Adaptive power control and optimal power allocation schemes are developed to maximize energy efficiency and QoS in semantic communication systems.
  • Extensive simulations confirm the accuracy of the ABG formula and the superiority of the proposed power allocation methods.
Read More
Abstract
This paper addresses the challenge of analyzing the performance of semantic communications, which rely on deep learning (DL)-based semantic encoders and decoders. The authors propose the Alpha-Beta-Gamma (ABG) formula, a theoretical model that establishes a relationship between end-to-end performance metrics and signal-to-noise ratio (SNR) for semantic communication systems. The ABG formula is applicable to both image reconstruction and inference tasks, and it provides a tractable framework for performance analysis. For image reconstruction tasks, the formula is validated using popular DL models like SCUNet and Vision Transformer, with the multi-scale structural similarity index measure (MS-SSIM) as the performance metric. The paper also derives a closed-form expression to relate MS-SSIM to the number of quantized output bits of semantic encoders. Additionally, the authors propose adaptive power control and optimal power allocation schemes to enhance energy efficiency and ensure quality of service (QoS) in semantic communications over fading channels. Extensive simulations demonstrate the effectiveness of the ABG formula and the proposed power allocation strategies.
Methodology
The authors derive the ABG formula to model the relationship between end-to-end performance metrics and SNR for semantic communications. They validate the formula using empirical results from DL-based semantic encoders and decoders for image reconstruction tasks. Additionally, they propose adaptive power control and optimal power allocation schemes, leveraging the ABG formula to optimize energy efficiency and QoS in fading channel scenarios. The power allocation schemes are implemented using algorithms like the bisection method.
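The summary does not spell out the exact ABG functional form, so the three-parameter expression below (alpha - beta * exp(-gamma * SNR)) is only a stand-in to show how such a formula could be fitted to empirical (SNR, MS-SSIM) measurements; it is not the paper's actual equation, and the data here are synthetic.

```python
import numpy as np
from scipy.optimize import curve_fit

def abg_like(snr_db, alpha, beta, gamma):
    # Placeholder parametric form with three parameters, in the spirit of "ABG".
    return alpha - beta * np.exp(-gamma * snr_db)

# Synthetic "empirical" measurements standing in for encoder/decoder results.
snr = np.linspace(0, 25, 26)
ms_ssim = 0.97 - 0.45 * np.exp(-0.18 * snr) + np.random.default_rng(5).normal(0, 0.01, snr.size)

params, _ = curve_fit(abg_like, snr, ms_ssim, p0=[1.0, 0.5, 0.1])
alpha, beta, gamma = params
print(f"alpha={alpha:.3f}, beta={beta:.3f}, gamma={gamma:.3f}")
```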
Results
The ABG formula accurately models the relationship between SNR and end-to-end performance metrics like MS-SSIM for image reconstruction tasks. The proposed closed-form expression effectively captures the dependency of MS-SSIM on the number of quantized output bits. The adaptive power control and optimal power allocation schemes significantly improve energy efficiency and QoS in semantic communication systems. Simulation results validate the effectiveness and superiority of the proposed methods.
Implications
The ABG formula provides a theoretical foundation for analyzing and optimizing semantic communication systems, addressing the black-box nature of DL-based semantic encoders and decoders. The proposed power allocation schemes can be applied to enhance energy efficiency and QoS in future wireless networks, including 6G. This work paves the way for more efficient resource allocation and performance optimization in semantic communication scenarios, particularly in applications like digital twins and the metaverse.
View on arXiv

Multi-Domain Causal Discovery in Bijective Causal Models

Kasra Jalaldoust, Saber Salehkaleybar, Negar Kiyavash
  • Introduces bijective generation mechanisms (BGM) as a general framework for multi-domain causal discovery.
  • Allows for varying noise distributions across domains while maintaining invariant causal functions.
  • Proposes a statistical test to identify parent sets of target variables.
  • Generalizes existing models like additive noise models, LiNGAM, and location-scale noise models.
  • Demonstrates theoretical guarantees and empirical validation of the proposed method.
Read More
Abstract
This paper addresses the problem of causal discovery in multi-domain settings under the assumption of bijective generation mechanisms (BGM). The authors propose a novel framework that leverages the invariance of causal functions across domains while allowing the distribution of exogenous noise to vary. By assuming that the functional relationship between exogenous noise and endogenous variables is bijective and differentiable, the proposed method generalizes existing models such as additive noise models, LiNGAM, and location-scale noise models. The authors also introduce a statistical test to identify the parent set of a target variable. Theoretical analysis demonstrates that the causal graph can be uniquely identified under less restrictive assumptions compared to prior work. Experiments on synthetic and real-world datasets validate the effectiveness of the approach, showing improved performance in recovering causal structures.
Methodology
The authors assume a multi-domain setting where causal functions remain invariant, but noise distributions can vary. They leverage the bijective and differentiable nature of the functional relationship between exogenous noise and endogenous variables. A statistical test is derived to identify parent sets of target variables, enabling the recovery of the causal graph. The framework generalizes several existing causal discovery models and is validated through experiments on synthetic and real-world datasets.
Results
The proposed method successfully recovers causal graphs under less restrictive assumptions compared to prior approaches. Empirical results on synthetic and real-world datasets demonstrate the method's ability to handle varying noise distributions across domains while maintaining accurate causal discovery. The framework outperforms existing methods in terms of both theoretical guarantees and practical performance.
Implications
This work has significant implications for causal discovery in complex, multi-domain settings where noise distributions vary. It can be applied to fields such as economics, biology, and social sciences, where observational data is collected across diverse conditions. The generalization of existing models and the introduction of a robust statistical test make this approach a valuable tool for advancing causal inference research.
View on arXiv

Multi-level datasets training method in Physics-Informed Neural Networks

Yao-Hsuan Tsai, Hsiao-Tung Juan, Pao-Hsiung Chiu, Chao-An Lin
  • Proposes a multi-level datasets training method inspired by multi-grid methods and multi-scale neural networks to address spectral bias and gradient flow pathology in PINNs.
  • Utilizes varying datasets during training, combined with transfer learning techniques, to expose the model to different frequency scales and improve learning efficiency.
  • Demonstrates significant accuracy improvements (30%-60%) on benchmark problems, including high-frequency ODEs, convection-diffusion equations, and lid-driven cavity flows.
  • Applies different optimizers and schedulers to datasets to emphasize frequency diversity and enhance training outcomes.
  • Validates the framework's ability to solve complex PDEs, including high Reynolds number flows (Re = 5000), showcasing its robustness and applicability.
Read More
Abstract
This paper introduces a novel multi-level datasets training method for Physics-Informed Neural Networks (PINNs) to address challenges such as spectral bias and gradient flow pathology, which hinder the accuracy and convergence of PINNs when solving partial differential equations (PDEs) with high-frequency components or numerical stiffness. Inspired by the multi-grid methods in computational fluid dynamics (CFD) and multi-scale neural network architectures, the proposed approach involves training PINNs with datasets of varying frequency scales. This method leverages transfer learning techniques and applies different optimizers and schedulers to each dataset, enabling the model to effectively learn target functions across diverse frequency ranges. The framework is validated on three problems: a high-frequency ordinary differential equation (ODE), a 2D convection-diffusion equation, and the lid-driven cavity flow problem at various Reynolds numbers. The results demonstrate significant improvements in accuracy (30%-60%) and the ability to solve complex PDEs, including cases with high Reynolds numbers (Re = 5000). The study highlights the potential of this approach to enhance PINN performance without extensive hyperparameter tuning or structural modifications.
Methodology
The proposed method involves training PINNs with datasets of varying frequency scales, inspired by multi-grid methods and multi-scale neural networks. Transfer learning techniques are employed to sequentially train the model on datasets, with different optimizers and schedulers applied to emphasize frequency diversity. The approach is validated on three problems: a high-frequency ODE, a 2D convection-diffusion equation, and the lid-driven cavity flow problem, using a V-cycle training strategy.
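A schematic of the V-cycle control flow under stated assumptions: the same PINN is trained on a sequence of collocation datasets ordered coarse to fine and back, carrying weights across levels (transfer learning) with a different optimizer setting per level. `make_dataset` and `train_one_level` are placeholders for a concrete PINN implementation.

```python
def v_cycle_schedule(n_levels):
    """e.g. n_levels=3 -> [0, 1, 2, 1, 0] (coarse to fine and back)."""
    down = list(range(n_levels))
    return down + down[-2::-1]

def train_v_cycle(model, make_dataset, train_one_level, n_levels=3,
                  lr_per_level=(1e-3, 5e-4, 1e-4)):
    for level in v_cycle_schedule(n_levels):
        dataset = make_dataset(level)        # denser points / higher frequencies at finer levels
        model = train_one_level(model, dataset, lr=lr_per_level[level])
    return model

# Toy usage with stub functions so the control flow runs end to end.
if __name__ == "__main__":
    history = []
    model = {"weights": 0.0}
    trained = train_v_cycle(
        model,
        make_dataset=lambda level: {"n_points": 100 * 4 ** level},
        train_one_level=lambda m, d, lr: (history.append((d["n_points"], lr)) or m),
    )
    print(history)   # shows the coarse -> fine -> coarse visiting order
```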
Results
The proposed method achieved 30%-60% accuracy improvements compared to conventional single-dataset training methods. It successfully solved high-frequency PDEs and demonstrated robustness in handling complex problems, such as lid-driven cavity flows at high Reynolds numbers (Re = 5000). The framework also showed synergy with transfer learning techniques, further enhancing its efficacy for challenging problems.
Implications
The multi-level datasets training method has significant implications for improving the performance of PINNs in solving complex PDEs, particularly those with high-frequency components or numerical stiffness. It offers a computationally efficient alternative to extensive hyperparameter tuning and structural modifications, making it applicable to a wide range of physics and engineering problems, including fluid dynamics, electromagnetics, and biological systems.
View on arXiv

Multimodal Large Language Models for Medicine: A Comprehensive Survey

Jiarui Ye, Hao Tang
  • MLLMs extend the capabilities of LLMs by incorporating multimodal data, enabling applications in medical reporting, diagnosis, and treatment.
  • The survey reviews 330 papers and categorizes six mainstream data modalities used in medical MLLMs, along with their evaluation benchmarks.
  • MLLMs face challenges such as ensuring accuracy, minimizing hallucinations, and meeting professional standards for clinical use.
  • The paper highlights the evolutionary pathway of MLLMs, from early language models to advanced multimodal systems like GPT-4 and Med-Flamingo.
  • Proposed solutions to challenges include improving evaluation benchmarks and refining alignment modules for better multimodal integration.
Read More
Abstract
This paper provides a comprehensive survey of multimodal large language models (MLLMs) and their applications in the medical and healthcare domain. Building on the capabilities of large language models (LLMs), MLLMs integrate multiple data modalities such as text, images, audio, and omics to address complex tasks in medicine. The authors review 330 recent papers and categorize the applications of MLLMs into three primary areas: medical reporting, medical diagnosis, and medical treatment. They highlight the architecture of MLLMs, which typically consist of a core LLM, modality-specific encoders, and alignment modules to integrate multimodal data. The paper also discusses six mainstream data modalities and their corresponding evaluation benchmarks. Finally, the authors identify key challenges in deploying MLLMs in clinical settings, including issues of accuracy, hallucination, professionalism, and fairness, and propose strategies to address these challenges.
Methodology
The authors conducted a systematic review of 330 recent papers on MLLMs, focusing on their architecture, applications, and challenges in the medical domain. They analyzed the components of MLLMs, including LLMs, modality-specific encoders, and alignment modules, and categorized their applications into three main areas. Additionally, they examined evaluation benchmarks and identified key challenges and potential solutions.
Results
The survey demonstrates the significant potential of MLLMs in addressing complex medical tasks by leveraging multimodal data. It provides a detailed taxonomy of applications and benchmarks while outlining the challenges that need to be addressed for clinical adoption. The authors emphasize the importance of improving accuracy, reducing hallucinations, and ensuring fairness in MLLMs for healthcare.
Implications
The findings of this survey suggest that MLLMs could revolutionize medical practice by enabling more accurate and efficient medical reporting, diagnosis, and treatment. However, their successful deployment in clinical settings will require addressing challenges related to accuracy, professionalism, and fairness. The paper provides a roadmap for future research and development in this area, with potential implications for improving patient care and healthcare efficiency.
View on arXiv

NeuRel-Attack: Neuron Relearning for Safety Disalignment in Large Language Models

Yi Zhou, Wenpeng Xing, Dezhang Kong, Changting Lin, Meng Han
  • NeuRel-Attack identifies and modifies neurons responsible for safety alignment in LLMs, enabling the generation of harmful outputs.
  • The method involves three steps: Neuron Activation Analysis, Similarity-Based Neuron Identification, and Neuron Relearning for Safety Removal.
  • Experimental results show that highly aligned models, such as Llama-2-7B-Chat-hf, can be de-aligned with minimal fine-tuning.
  • The study highlights a significant vulnerability in current safety alignment mechanisms for LLMs.
  • The authors emphasize the need for robust defenses against adversarial fine-tuning attacks and release their code and dataset for further research.
Read More
Abstract
This paper introduces NeuRel-Attack, a novel adversarial method designed to bypass safety alignment in large language models (LLMs). The approach identifies and modifies neurons responsible for enforcing safety constraints, enabling the generation of harmful or restricted outputs. NeuRel-Attack operates in three stages: (1) Neuron Activation Analysis to detect neurons critical for distinguishing harmful and harmless inputs, (2) Similarity-Based Neuron Identification to locate neurons associated with safety alignment using gradient and cosine similarity thresholds, and (3) Neuron Relearning for Safety Removal, where targeted fine-tuning is applied to disable safety mechanisms. Experimental results demonstrate that even highly aligned models, such as Llama-2-7B-Chat-hf, can be effectively de-aligned with minimal fine-tuning. The findings reveal a critical vulnerability in current alignment techniques and underscore the need for robust defenses against adversarial fine-tuning attacks. The authors release their code and dataset to facilitate further research in this area.
Methodology
NeuRel-Attack employs a three-step process: (1) Neuron Activation Analysis calculates activation patterns to identify neurons critical for distinguishing harmful and harmless inputs. (2) Similarity-Based Neuron Identification uses cosine similarity and gradient thresholds to locate neurons associated with safety alignment. (3) Neuron Relearning applies targeted fine-tuning, using techniques like Low-Rank Adaptation (LoRA), to disable safety constraints in the identified neurons.
Results
The proposed method successfully de-aligns highly aligned LLMs, such as Llama-2-7B-Chat-hf, with minimal fine-tuning. The experiments demonstrate a significant increase in the models' vulnerability to adversarial inputs, effectively bypassing safety mechanisms.
Implications
The findings reveal a critical vulnerability in current LLM safety alignment techniques, emphasizing the need for robust defenses against adversarial fine-tuning attacks. This work has implications for improving the security and ethical deployment of LLMs in real-world applications. Additionally, the release of the code and dataset provides a foundation for further research into adversarial attacks and countermeasures.
View on arXiv

On Advancements of the Forward-Forward Algorithm

Mauricio Ortiz Torres, Markus Lange, Arne P. Raulf
  • The Forward-Forward algorithm eliminates the need for backpropagation by training layers independently using a 'goodness' loss function.
  • Enhancements like convolutional channel grouping, learning rate schedules, and independent block structures reduce test error by 20%.
  • Lightweight FF models with fewer trainable parameters achieve competitive performance, making them suitable for low-power hardware.
  • The paper introduces variations in data generation and inference methods to address challenges in convolutional neural networks.
  • The FF algorithm's memory efficiency and flexibility make it a promising candidate for applications in resource-constrained environments.
Read More
Abstract
This paper explores advancements in the Forward-Forward (FF) algorithm, originally proposed by Geoffrey Hinton in 2022, which offers a memory-efficient alternative to backpropagation for training neural networks. The FF algorithm trains layers independently using a 'goodness' loss function and employs two forward passes (positive and negative) to adjust weights. The authors present enhancements to the algorithm, including convolutional channel grouping, learning rate schedules, and independent block structures, which collectively reduce test error by 20% on challenging datasets like CIFAR-10. Additionally, they propose lightweight FF models with significantly fewer trainable parameters (164,706 to 754,386) that achieve competitive test error rates (21±6%). These improvements make the FF algorithm more suitable for low-power hardware applications, such as those in aerospace projects, while maintaining flexibility and efficiency. The paper also discusses variations in data generation, loss functions, and inference schemes to address the algorithm's limitations and improve its performance.
Methodology
The authors improved the FF algorithm by incorporating convolutional channel grouping, learning rate schedules, and independent block structures during training. They also explored lightweight model architectures and evaluated their performance on datasets like CIFAR-10. Variations in data generation and inference methods were analyzed to address specific challenges in convolutional neural networks.
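A minimal single-layer sketch of the Forward-Forward training rule itself: the layer is trained in isolation to push the "goodness" (sum of squared activations) of positive samples above a threshold and of negative samples below it. The convolutional channel grouping, schedules, and block structures from the paper are omitted, and hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F

class FFLayer(torch.nn.Module):
    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = torch.nn.Linear(d_in, d_out)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Normalize the input so only its direction (not its goodness) is passed on.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        # High goodness for positive data, low goodness for negative data.
        loss = F.softplus(torch.cat([self.threshold - g_pos,
                                     g_neg - self.threshold])).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return loss.item()

layer = FFLayer(784, 256)
x_pos, x_neg = torch.randn(32, 784), torch.randn(32, 784)
print(layer.train_step(x_pos, x_neg))
```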
Results
The proposed improvements led to a 20% reduction in test error on CIFAR-10. Lightweight FF models achieved test error rates of 21±6% with trainable parameters ranging from 164,706 to 754,386, demonstrating their suitability for low-capacity hardware.
Implications
The advancements in the FF algorithm make it a viable alternative to backpropagation for training neural networks, particularly in resource-constrained environments like aerospace applications. Its memory efficiency and flexibility could enable faster and more efficient training and inference on low-power hardware, paving the way for broader adoption in edge computing and embedded systems.
View on arXiv

Orthogonal Factor-Based Biclustering Algorithm (BCBOF) for High-Dimensional Data and Its Application in Stock Trend Prediction

Yan Huang, Da-Qing Zhang
  • BCBOF addresses the sparsity and distance concentration issues in high-dimensional data by leveraging orthogonal factor construction and clustering in an orthogonal subspace.
  • The algorithm preserves local structural patterns, overcoming limitations of traditional linear dimensionality reduction methods like PCA.
  • BCBOF is applied to stock trend prediction, transforming biclustering results into fuzzy rules for a trading strategy system.
  • Experimental results show BCBOF outperforms other biclustering methods in accuracy and stability.
  • The fuzzy inference system based on BCBOF demonstrates higher returns in virtual trading experiments with historical stock data.
Read More
Abstract
This paper introduces the Orthogonal Factor-Based Biclustering Algorithm (BCBOF), a novel approach designed to address challenges in biclustering high-dimensional data. Traditional biclustering methods struggle with the sparsity and distance concentration issues inherent in high-dimensional spaces, as well as the disruption of local structural patterns caused by linear dimensionality reduction techniques. BCBOF mitigates these challenges by constructing orthogonal factors in the vector space of high-dimensional datasets, using these factors to perform clustering in an orthogonal subspace, and subsequently deriving biclustering results for the original data. The algorithm is applied to stock trend prediction, where biclustering results are transformed into fuzzy rules to create a fuzzy inference system for generating trading signals. The system incorporates profit-preserving and stop-loss rules to enhance trading strategies. Experimental evaluations demonstrate that BCBOF outperforms existing biclustering methods across multiple metrics and that the fuzzy inference system yields higher returns in virtual trading experiments using historical stock data.
Methodology
BCBOF constructs orthogonal factors in the vector space of high-dimensional data, performs clustering in the orthogonal subspace, and derives biclustering results for the original dataset. The algorithm is applied to stock trend prediction by transforming biclustering results into fuzzy rules, which are used in a fuzzy inference system for generating trading signals. The system incorporates profit-preserving and stop-loss rules to optimize trading strategies.
Results
BCBOF outperformed existing biclustering methods across multiple evaluation metrics, demonstrating improved accuracy and stability in high-dimensional data analysis. In virtual trading experiments using historical data from 10 A-share stocks, the fuzzy inference system based on BCBOF generated trading strategies that yielded higher returns compared to baseline methods.
Implications
The BCBOF algorithm has significant implications for high-dimensional data analysis, particularly in fields like finance, where capturing local patterns and trends is critical. Its application to stock trend prediction demonstrates its potential to enhance trading strategies and improve investment returns. Beyond finance, BCBOF could be adapted for other domains requiring robust biclustering in high-dimensional spaces, such as genomics, image processing, and recommendation systems.
View on arXiv

Q-function Decomposition with Intervention Semantics with Factored Action Spaces

Junkyu Lee, Tian Gao, Elliot Nelson, Miao Liu, Debarun Bhattacharjya, Songtao Lu
  • Introduces a Q-function decomposition framework based on causal intervention semantics for factored action spaces.
  • Establishes theoretical guarantees for unbiased Q-function decomposition under the no unobserved confounder assumption.
  • Proposes the action decomposed reinforcement learning scheme, which integrates the decomposition into model-free RL algorithms.
  • Demonstrates improved sample efficiency in online continuous control tasks and offline healthcare applications.
  • Highlights the scalability of the approach for environments with large combinatorial action spaces.
Read More
Abstract
This paper addresses the challenge of improving sample efficiency in reinforcement learning (RL) environments with large, discrete, factored action spaces. The authors propose a novel framework for Q-function decomposition that leverages causal intervention semantics to exploit the modular structure of factored action spaces. By projecting Q-functions into lower-dimensional subspaces, the approach avoids the computational burden of enumerating all action combinations. The paper establishes theoretical conditions for unbiased Q-function decomposition using causal effect estimation under the assumption of no unobserved confounders. The authors introduce a practical algorithm, termed action decomposed reinforcement learning, which integrates this decomposition into standard model-free RL methods. Experimental results demonstrate that the proposed method improves sample efficiency in both online continuous control tasks and a real-world offline healthcare application involving sepsis treatment, outperforming state-of-the-art baselines.
Methodology
The authors leverage causal inference principles to decompose Q-functions into lower-dimensional subspaces, using intervention semantics to model the effects of actions on state transitions and rewards. They develop a theoretical framework for unbiased Q-function decomposition and implement it in a practical algorithm that augments standard RL methods like Deep Q-Networks (DQN) and Batch-Constrained Q-Learning (BCQ). The approach is validated through experiments in both simulated continuous control environments and a real-world offline healthcare dataset (MIMIC-III).
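A structural sketch of decomposing the Q-function over factored action dimensions: one sub-Q head per action factor, with the joint-action value taken as the sum of the heads so the combinatorial action space never has to be enumerated. This mirrors the lower-dimensional-projection intuition but not the paper's causal-effect estimators; the summation rule and names are illustrative.

```python
import torch
import torch.nn as nn

class FactoredQNetwork(nn.Module):
    def __init__(self, state_dim, action_sizes, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # One head per action factor, e.g. action_sizes = (3, 4, 2).
        self.heads = nn.ModuleList(nn.Linear(hidden, n) for n in action_sizes)

    def forward(self, state):
        h = self.encoder(state)
        return [head(h) for head in self.heads]          # per-factor Q-values

    def joint_q(self, state, actions):
        """actions: (batch, n_factors) indices; joint Q = sum of factor Qs."""
        factor_qs = self.forward(state)
        return sum(q.gather(1, actions[:, i:i + 1]).squeeze(1)
                   for i, q in enumerate(factor_qs))

    def greedy_action(self, state):
        """Greedy joint action without enumerating all action combinations."""
        return torch.stack([q.argmax(dim=1) for q in self.forward(state)], dim=1)

net = FactoredQNetwork(state_dim=10, action_sizes=(3, 4, 2))
s = torch.randn(5, 10)
a = net.greedy_action(s)
print(net.joint_q(s, a))
```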
Results
The proposed method achieves significant improvements in sample efficiency compared to state-of-the-art baselines. In online continuous control tasks, it reduces the number of samples required to achieve comparable performance. In the offline sepsis treatment environment, the method demonstrates better policy learning from limited data, showcasing its practical utility in real-world applications.
Implications
This work has potential applications in domains where RL is applied to environments with large, structured action spaces, such as robotics, healthcare, and multi-agent systems. The integration of causal inference principles into RL could inspire further research on leveraging domain knowledge to improve sample efficiency and scalability in complex decision-making tasks.
View on arXiv

R^2VFL: A Robust Random Vector Functional Link Network with Huber-Weighted Framework

Anuradha Kumari, Mushir Akhtar, P. N. Suganthan, M. Tanveer
  • Introduces R^2VFL, a robust RVFL variant incorporating the Huber weighting function and class probability to handle noise and outliers.
  • Proposes two methods for computing class centers: average-based (R^2VFL-A) and median-based (R^2VFL-M), enhancing flexibility and robustness.
  • Extensively evaluates the models on 47 UCI datasets, demonstrating superior performance over existing RVFL variants.
  • Highlights the practical applicability of the models in biomedical domains, particularly in EEG signal classification.
  • Employs rigorous statistical testing to validate the robustness and adaptability of the proposed framework.
Read More
Abstract
This paper introduces R^2VFL, a novel enhancement to the Random Vector Functional Link (RVFL) neural network, designed to improve robustness against noise and outliers in data. RVFL is a randomized neural network known for its efficiency and simplicity, but its performance can degrade in the presence of noisy or mislabeled data. To address these challenges, the authors propose incorporating the Huber weighting function and class probability into the RVFL framework. The Huber weighting function reduces the influence of outliers, while the class probability mechanism assigns lower weights to noisy data points. The paper also explores two methods for calculating class centers: (1) the average of all data points in a class and (2) the median of each feature, leading to two variants of the model, R^2VFL-A and R^2VFL-M. Extensive experiments on 47 UCI datasets, including binary and multiclass problems, demonstrate the superior performance of the proposed models compared to existing RVFL variants. Additionally, the models show strong performance in classifying EEG signals, highlighting their potential for real-world biomedical applications.
Methodology
The proposed R^2VFL framework integrates the Huber weighting function to reduce the influence of outliers and incorporates class probability to address class noise. Two approaches for determining class centers are explored: averaging all data points in a class and using the median of each feature. These methods result in two model variants, R^2VFL-A and R^2VFL-M. The models are evaluated using 47 UCI datasets, covering both binary and multiclass classification tasks, and statistical tests are conducted to confirm their effectiveness.
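A compact numpy sketch of the robustness mechanism only: fit an RVFL ridge system once, compute residuals, down-weight large-residual samples with the Huber function, and re-solve the weighted system. The class-probability weighting and the two class-center variants from the paper are not included, and all names are illustrative.

```python
import numpy as np

def rvfl_features(X, W, b):
    return np.hstack([X, 1.0 / (1.0 + np.exp(-(X @ W + b)))])   # direct links + random hidden layer

def weighted_ridge(H, Y, sample_w, lam=1e-2):
    Hw = H * sample_w[:, None]
    return np.linalg.solve(H.T @ Hw + lam * np.eye(H.shape[1]), H.T @ (Y * sample_w[:, None]))

def huber_weights(residual_norms, delta):
    return np.where(residual_norms <= delta, 1.0, delta / (residual_norms + 1e-12))

def fit_huber_rvfl(X, y, n_hidden=100, delta=1.0, seed=0):
    rng = np.random.default_rng(seed)
    W, b = rng.normal(size=(X.shape[1], n_hidden)), rng.normal(size=n_hidden)
    H = rvfl_features(X, W, b)
    Y = np.eye(y.max() + 1)[y]                                   # one-hot targets
    beta = weighted_ridge(H, Y, np.ones(len(y)))                 # initial unweighted fit
    residuals = np.linalg.norm(Y - H @ beta, axis=1)
    beta = weighted_ridge(H, Y, huber_weights(residuals, delta)) # robust re-fit
    return W, b, beta

def predict(model, X):
    W, b, beta = model
    return np.argmax(rvfl_features(X, W, b) @ beta, axis=1)

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
model = fit_huber_rvfl(X, y)
print((predict(model, X) == y).mean())
```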
Results
The R^2VFL models outperform existing RVFL variants across 47 UCI datasets, achieving higher accuracy and robustness in the presence of noise and outliers. The median-based variant (R^2VFL-M) demonstrates particular strength in handling extreme values. The models also excel in EEG signal classification, showcasing their practical utility in biomedical applications.
Implications
The proposed R^2VFL framework has significant implications for robust machine learning in noisy and outlier-prone environments. Its strong performance in EEG signal classification suggests potential applications in healthcare, particularly in diagnosing neurological conditions. The framework's adaptability to various data distributions makes it a valuable tool for a wide range of real-world machine learning tasks.
View on arXiv

Recursive KL Divergence Optimization: A Dynamic Framework for Representation Learning

Anthony D Martin
  • RKDO reframes representation learning as a recursive process, aligning evolving conditional distributions rather than static ones.
  • The framework generalizes contrastive, clustering, and dimensionality reduction methods as static cases of a more dynamic process.
  • RKDO achieves approximately 30% lower loss values compared to static methods like I-Con.
  • The recursive updating mechanism reduces computational resource requirements by 60-80%, enabling faster convergence.
  • Theoretical contributions include a proof of linear-rate convergence and analysis of trade-offs between optimization efficiency and generalization.
Read More
Abstract
This paper introduces Recursive KL Divergence Optimization (RKDO), a novel framework for representation learning that generalizes existing methods by framing them as recursive divergence alignment processes over evolving conditional distributions. Unlike static approaches such as Information Contrastive Learning (I-Con), which align fixed neighborhood distributions, RKDO models the temporal dynamics of representation learning by recursively updating both supervisory and learned distributions. This dynamic approach captures the evolution of neighborhood structures during training, offering significant improvements in optimization efficiency and computational resource usage. The authors provide a theoretical foundation for RKDO, including a proof of linear-rate convergence, and demonstrate its empirical advantages across multiple datasets. RKDO achieves approximately 30% lower loss values compared to static methods and reduces computational requirements by 60-80%, making it particularly suitable for resource-constrained applications.
Methodology
RKDO employs recursive updates to align supervisory and learned conditional distributions. The supervisory distribution is updated using an exponential moving average (EMA) of the learned distribution, while the learned distribution is updated based on embeddings and a time-dependent temperature parameter. This recursive process captures the temporal dynamics of representation learning, with the entire response field evolving iteratively. The framework was implemented and compared against the static I-Con approach across multiple datasets.
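A minimal sketch of that recursion under stated assumptions: the learned neighborhood distribution is the row-softmax of pairwise embedding similarities, the supervisory distribution tracks it with an exponential moving average, and the loss is the KL divergence between the two. The temperature schedule and update coefficients below are illustrative, not the paper's values.

```python
import torch
import torch.nn.functional as F

def neighborhood_distribution(embeddings, temperature):
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.T / temperature
    mask = torch.eye(sim.size(0), dtype=torch.bool)
    sim = sim.masked_fill(mask, float("-inf"))   # a point is not its own neighbor
    return F.softmax(sim, dim=1)

def kl(p, q, eps=1e-12):
    """Row-wise KL(p || q), averaged over rows."""
    return (p * (p.clamp_min(eps).log() - q.clamp_min(eps).log())).sum(dim=1).mean()

def rkdo_step(embeddings, supervisory, step, ema=0.9, base_temp=0.5, decay=0.995):
    temperature = base_temp * (decay ** step)    # time-dependent temperature
    learned = neighborhood_distribution(embeddings, temperature)
    if supervisory is None:
        supervisory = learned.detach()
    loss = kl(supervisory, learned)
    # Recursive update: the supervisory field drifts toward the learned field.
    new_supervisory = ema * supervisory + (1 - ema) * learned.detach()
    return loss, new_supervisory

emb = torch.randn(16, 32, requires_grad=True)
loss0, sup = rkdo_step(emb, supervisory=None, step=0)       # bootstraps the supervisory field
loss1, sup = rkdo_step(emb + 0.1 * torch.randn_like(emb), sup, step=1)
loss1.backward()
print(loss0.item(), loss1.item())
```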
Results
RKDO demonstrated approximately 30% lower loss values compared to I-Con across all tested datasets. Additionally, it required 60-80% fewer computational resources (e.g., training epochs) to achieve comparable results, highlighting its efficiency in both optimization and resource usage.
Implications
The RKDO framework has significant implications for resource-constrained applications, such as edge computing and large-scale machine learning tasks. Its ability to achieve faster convergence with lower computational costs makes it a promising approach for efficient representation learning in scenarios where computational resources are limited. Additionally, its dynamic perspective on representation learning could inspire new methods that leverage temporal dynamics for improved performance.
View on arXiv

SMOGAN: Synthetic Minority Oversampling with GAN Refinement for Imbalanced Regression

Shayan Alahyari, Mike Domaratzki
  • SMOGAN introduces a two-step framework for addressing imbalanced regression by combining traditional oversampling with GAN-based refinement.
  • The DistGAN component refines synthetic samples using adversarial loss and MMD to ensure alignment with the true data distribution.
  • SMOGAN is modular, allowing users to integrate any initial oversampling method in the first stage.
  • Extensive experiments on 23 datasets show that SMOGAN outperforms existing methods in handling imbalanced regression.
  • The approach is applicable across various domains, including economics, meteorology, and fault diagnosis.
Read More
Abstract
This paper addresses the challenge of imbalanced regression, where the target variable is skewed, leading to poor model performance in underrepresented regions. The authors propose SMOGAN, a two-step framework that combines traditional oversampling techniques with a novel GAN-based refinement process to generate realistic synthetic samples in sparse target regions. In the first stage, an existing oversampling method (e.g., SMOGN) generates initial synthetic samples. In the second stage, these samples are refined using DistGAN, a distribution-aware generative adversarial network. DistGAN employs an adversarial loss augmented with a Maximum Mean Discrepancy (MMD) objective to align the synthetic samples with the true joint feature–target distribution. Experiments on 23 imbalanced regression datasets demonstrate that SMOGAN consistently outperforms baseline oversampling methods, highlighting its effectiveness and generalizability across diverse domains.
Methodology
SMOGAN operates in two stages. In Stage 1, an existing oversampling method (e.g., SMOGN) generates synthetic samples in sparse target regions. In Stage 2, these samples are refined using DistGAN, which consists of a generator and discriminator. The generator optimizes a combined adversarial loss and MMD objective to minimize distribution discrepancies, while the discriminator filters out unrealistic samples by distinguishing between real and synthetic minority samples.
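A minimal sketch of the Stage-2 generator objective, assuming an RBF-kernel MMD estimator and a weighting factor `lam`; the adversarial term is taken as given from the discriminator, and none of these names come from the paper's code.

```python
import numpy as np

def rbf_mmd2(x, y, sigma=1.0):
    """Squared MMD between two sample sets under an RBF kernel."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def generator_loss(adv_loss, real_xy, synth_xy, lam=1.0):
    """Schematic Stage-2 objective: adversarial loss plus an MMD penalty that pulls
    synthetic (feature, target) samples toward the real joint distribution."""
    return adv_loss + lam * rbf_mmd2(real_xy, synth_xy)

# toy usage: rows concatenate features and the regression target
rng = np.random.default_rng(0)
real = rng.normal(size=(64, 5))                # real minority samples
synth = rng.normal(loc=0.3, size=(64, 5))      # refined synthetic samples
print(generator_loss(adv_loss=0.7, real_xy=real, synth_xy=synth))
```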
Results
SMOGAN consistently outperformed baseline oversampling methods across 23 benchmark datasets. The inclusion of the DistGAN refinement layer significantly improved the quality of synthetic samples, leading to better model performance in underrepresented target regions. The results demonstrate the framework's effectiveness and generalizability across diverse application domains.
Implications
SMOGAN has significant implications for domains where accurate prediction of rare events is critical, such as financial downturns, extreme weather forecasting, and fault diagnosis. Its modular design allows for easy integration with domain-specific oversampling techniques, making it a versatile tool for addressing imbalanced regression challenges.
View on arXiv

Sparse-to-Sparse Training of Diffusion Models

InĂŞs Cardoso Oliveira, Decebal Constantin Mocanu, Luis A. Leiva
  • Introduces sparse-to-sparse training to diffusion models, reducing computational costs during both training and inference.
  • Proposes three sparsity strategies: Static-DM (static sparsity) and two dynamic-sparsity variants, RigL-DM and MagRan-DM.
  • Sparse DMs achieve comparable or better performance than dense models while reducing parameters and FLOPs.
  • Dynamic sparse training with 25–50% sparsity levels is most effective, with conservative prune/regrowth ratios yielding better results for higher sparsity.
  • The approach is validated on two state-of-the-art DMs (Latent Diffusion and ChiroDiff) across six datasets.
Read More
Abstract
This paper introduces the concept of sparse-to-sparse training to diffusion models (DMs) for the first time, aiming to improve both training and inference efficiency. Diffusion models are powerful generative models but are computationally expensive to train and deploy. The authors propose three methods—Static-DM, RigL-DM, and MagRan-DM—that integrate static and dynamic sparsity strategies into two state-of-the-art DMs: Latent Diffusion (for image generation) and ChiroDiff (for sketch generation). Experiments conducted on six datasets demonstrate that sparse DMs can achieve comparable or superior performance to dense counterparts while significantly reducing the number of trainable parameters and floating-point operations (FLOPs). The paper also identifies optimal sparsity levels and pruning/regrowth strategies for effective sparse-to-sparse training.
Methodology
The authors implemented sparse-to-sparse training using three sparsity strategies: Static-DM (fixed sparsity throughout training), RigL-DM (dynamic sparsity with periodic pruning and regrowth), and MagRan-DM (dynamic sparsity with magnitude-based pruning and random regrowth). These methods were applied to two state-of-the-art diffusion models—Latent Diffusion for continuous image data and ChiroDiff for discrete spatiotemporal sequence data. Experiments were conducted on six datasets to evaluate the impact of sparsity on model performance, parameter count, and computational efficiency.
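The sketch below illustrates one MagRan-style mask update on a single weight matrix: the smallest-magnitude active weights are pruned and the same number of connections are regrown at random inactive positions. The `ratio=0.05` value mirrors the conservative prune/regrowth ratio mentioned above; the remaining details are assumptions rather than the authors' implementation.

```python
import numpy as np

def magnitude_prune_random_regrow(weights, mask, ratio=0.05, rng=None):
    """One dynamic-sparsity mask update: magnitude-based pruning, random regrowth."""
    rng = rng or np.random.default_rng()
    active = np.flatnonzero(mask)
    n_update = int(ratio * active.size)                  # prune/regrowth ratio
    # prune: drop the smallest-magnitude active weights
    drop = active[np.argsort(np.abs(weights.ravel()[active]))[:n_update]]
    mask.ravel()[drop] = 0
    # regrow: activate random currently-inactive positions
    inactive = np.flatnonzero(mask.ravel() == 0)
    grow = rng.choice(inactive, size=n_update, replace=False)
    mask.ravel()[grow] = 1
    weights.ravel()[grow] = 0.0                          # new connections start at zero
    return weights * mask, mask

# toy usage at 50% sparsity
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
m = (rng.random((8, 8)) < 0.5).astype(float)
w, m = magnitude_prune_random_regrow(w, m, ratio=0.05, rng=rng)
```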
Results
Sparse diffusion models trained using the proposed methods matched or outperformed their dense counterparts in terms of generative quality. The sparse models achieved significant reductions in trainable parameters and FLOPs, with dynamic sparsity strategies (RigL-DM and MagRan-DM) performing particularly well at 25–50% sparsity levels. Conservative prune/regrowth ratios (e.g., 0.05) were found to be effective for higher sparsity levels.
Implications
The proposed sparse-to-sparse training paradigm has the potential to make diffusion models more computationally efficient, reducing their environmental impact and enabling broader adoption in resource-constrained settings. This approach could also inspire further research into sparse training techniques for other types of generative models and tasks beyond image and sketch generation.
View on arXiv

Stable Trajectory Clustering: An Efficient Split and Merge Algorithm

Atieh Rahmani, Mansoor Davoodi, Justin M. Calabrese
  • Introduces stable trajectory clustering to handle transient anomalies in trajectory data.
  • Proposes whole-trajectory and sub-trajectory clustering algorithms based on DBSCAN line segment clustering.
  • Uses mean absolute deviation to distinguish significant outliers from temporary deviations.
  • Demonstrates the effectiveness of the algorithms on real-world trajectory datasets.
  • Improves cluster stability and interpretability by focusing on persistent movement patterns.
Read More
Abstract
This paper introduces a novel approach to trajectory clustering that addresses the challenge of transient anomalies in movement data. The authors propose two algorithms: whole-trajectory clustering and sub-trajectory clustering, both based on DBSCAN line segment clustering. The key innovation is the introduction of a stable trajectory clustering algorithm that uses the mean absolute deviation to filter out insignificant anomalies, preserving the integrity and stability of clusters. This approach improves the interpretability of clustering results by focusing on persistent patterns rather than temporary deviations. The methodology is applied to real-world trajectory datasets, demonstrating the algorithm's effectiveness and sensitivity to parameter variations. The proposed framework is particularly useful for applications such as urban traffic analysis, animal movement studies, and air traffic categorization, where transient anomalies often obscure meaningful clustering patterns.
Methodology
The authors build on DBSCAN line segment clustering to define split and merge procedures for trajectory data. Whole-trajectory clustering considers entire movement histories, while sub-trajectory clustering uses a sliding window model to identify similar sub-sequences. The stable trajectory clustering algorithm employs the mean absolute deviation to filter out transient anomalies, ensuring that clusters reflect consistent patterns across time intervals.
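As a toy illustration of the mean-absolute-deviation filter, the snippet below flags only deviations that exceed a multiple of the MAD, so brief detours stay in their cluster. The threshold `k` and the use of per-window distances to a cluster representative are illustrative assumptions.

```python
import numpy as np

def significant_outliers(deviations, k=2.0):
    """Flag deviations that exceed k times the mean absolute deviation (MAD);
    everything else is treated as a transient anomaly and left in its cluster."""
    dev = np.asarray(deviations, dtype=float)
    mad = np.mean(np.abs(dev - dev.mean()))
    return np.abs(dev - dev.mean()) > k * mad

# toy usage: per-window distances of one trajectory to its cluster representative
dev = [0.4, 0.5, 0.45, 3.0, 0.5, 0.48]     # the 3.0 spike is a brief detour
print(significant_outliers(dev))           # only the spike is flagged as significant
```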
Results
The proposed algorithms were tested on real trajectory datasets, showing improved stability and interpretability of clusters. The stable trajectory clustering algorithm effectively reduced the impact of transient anomalies, leading to more consistent clustering results. The sensitivity analysis demonstrated the robustness of the algorithms to parameter variations.
Implications
This work has significant implications for fields that analyze movement data, such as urban traffic analysis, animal behavior studies, and air traffic management. By filtering out transient anomalies, the stable trajectory clustering algorithm enables more accurate identification of persistent movement patterns, improving decision-making and insights in these domains.
View on arXiv

Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning

Sangyeon Cho, Jangyeong Jeon, Mingi Kim, Junyeong Kim
  • Introduces Synergy-CLIP, a tri-modal extension of the CLIP framework for integrating visual, textual, and audio data equally.
  • Proposes VGG-sound+, a balanced tri-modal dataset designed to facilitate tri-modal representation learning.
  • Introduces Missing Modality Reconstruction (MMR) as a novel evaluation task for multi-modal learning.
  • Demonstrates superior performance of Synergy-CLIP in downstream tasks like zero-shot classification compared to existing baselines.
  • Highlights the potential for advancing human-like cognitive capabilities in AI through robust multi-modal integration.
Read More
Abstract
This paper introduces Synergy-CLIP, a novel framework that extends the Contrastive Language-Image Pre-training (CLIP) model to integrate and align tri-modal data—visual, textual, and audio modalities—equally. The study addresses the limitations of existing multi-modal learning approaches, which predominantly focus on bimodal interactions, by proposing a balanced tri-modal dataset called VGG-sound+. This dataset augments the VGG-sound dataset with textual descriptions to provide equal-scale representation across all three modalities. Synergy-CLIP is validated through various downstream tasks, including zero-shot classification and a novel Missing Modality Reconstruction (MMR) task, which evaluates the model's ability to reconstruct missing modalities. Experimental results demonstrate that Synergy-CLIP outperforms existing baselines in certain tasks, showcasing its potential for robust multi-modal representation learning. The proposed framework and dataset aim to advance the field by enabling models to process and integrate diverse information in a human-like manner.
Methodology
The authors extend the CLIP framework to align and capture latent information across three modalities (visual, textual, and audio) equally. They introduce VGG-sound+, a tri-modal dataset with balanced representation, and train Synergy-CLIP on this dataset. The model's performance is validated through downstream tasks, including zero-shot classification and the novel Missing Modality Reconstruction (MMR) task, which evaluates its ability to reconstruct missing modalities using learned high-dimensional representations.
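A minimal sketch of how equal tri-modal alignment could look, assuming the three modalities are aligned with symmetric CLIP-style InfoNCE losses averaged over the three modality pairs; the equal weighting and the temperature value are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of modality embeddings."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def tri_modal_loss(img, txt, aud):
    """Treat the three modalities equally: average the three pairwise alignments."""
    return (clip_style_loss(img, txt)
            + clip_style_loss(img, aud)
            + clip_style_loss(txt, aud)) / 3.0

# toy usage: embeddings would come from the image, text, and audio encoders
img, txt, aud = (torch.randn(8, 512) for _ in range(3))
print(tri_modal_loss(img, txt, aud).item())
```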
Results
Synergy-CLIP outperforms existing baselines in downstream tasks, particularly in zero-shot classification. The model also demonstrates strong performance in the MMR task, showcasing its ability to effectively capture synergies between modalities and reconstruct missing information. These results validate the robustness and effectiveness of the proposed framework for multi-modal representation learning.
Implications
The research paves the way for more sophisticated multi-modal AI systems capable of integrating diverse data types in a human-like manner. The introduction of the VGG-sound+ dataset and the MMR task provides valuable resources and evaluation metrics for future research. Potential applications include enhanced multi-modal systems for tasks like video understanding, audio-visual scene analysis, and human-computer interaction.
View on arXiv

TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts

Pradip Kunwar, Minh N. Vu, Maanak Gupta, Mahmoud Abdelsalam, Manish Bhattarai
  • TT-LoRA MoE integrates tensorized low-rank adapters with sparse MoE routing to improve scalability and efficiency in LLM fine-tuning.
  • The framework decouples expert training from routing, preventing inter-task interference and catastrophic forgetting.
  • TT-LoRA experts are highly compressed, reducing parameter counts by over 98% compared to standard LoRA while maintaining performance.
  • A lightweight, task-agnostic routing mechanism dynamically selects the appropriate expert for each input without manual intervention.
  • The method achieves state-of-the-art results in multi-task learning while significantly reducing memory and computational overhead.
Read More
Abstract
The paper introduces TT-LoRA MoE, a novel framework that combines Parameter-Efficient Fine-Tuning (PEFT) with sparse Mixture-of-Experts (MoE) to address scalability and efficiency challenges in large language model (LLM) deployments. The framework operates in two stages: (1) independent training of task-specific, tensorized low-rank adapters (TT-LoRA experts) and (2) dynamic routing using a lightweight, noisy top-1 gating mechanism. This decoupling of expert training and routing eliminates inter-task interference and catastrophic forgetting, while enabling efficient multi-task adaptation. TT-LoRA MoE achieves significant reductions in trainable parameters (e.g., 2% of LoRA parameters) and outperforms existing methods like AdapterFusion in multi-task settings. The approach is validated through extensive experiments, demonstrating state-of-the-art performance, scalability, and memory efficiency.
Methodology
The proposed TT-LoRA MoE framework consists of two stages: (1) independent training of TT-LoRA adapters for each task using tensor-train decomposition to compress parameters, and (2) dynamic routing via a noisy top-1 gating mechanism that selects one expert per input based on base model representations. This decoupling allows for efficient and scalable multi-task adaptation without joint training of experts and routers.
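The routing stage can be sketched as a small noisy top-1 gate over frozen, independently trained experts; the class name, hidden size, and noise scale below are illustrative, and the TT-LoRA adapters themselves are omitted.

```python
import torch
import torch.nn as nn

class NoisyTop1Router(nn.Module):
    """Lightweight noisy top-1 gate: scores the expert pool from the base-model
    representation and routes each input to exactly one pre-trained adapter."""
    def __init__(self, hidden_dim, num_experts, noise_std=1.0):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, num_experts)
        self.noise_std = noise_std

    def forward(self, h):
        logits = self.gate(h)
        if self.training:                               # noise encourages exploration
            logits = logits + torch.randn_like(logits) * self.noise_std
        return logits.argmax(dim=-1)                    # index of the selected expert

# toy usage: pick one of 4 independently trained TT-LoRA experts per input
router = NoisyTop1Router(hidden_dim=768, num_experts=4)
h = torch.randn(16, 768)                                # base-model representations
expert_ids = router(h)
```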
Results
TT-LoRA MoE achieves a 98% reduction in trainable parameters compared to standard LoRA, using only 2% of LoRA parameters for experts and 0.03% of AdapterFusion parameters for routing. It outperforms AdapterFusion by 4 points in multi-tasking scenarios and demonstrates robust task-level optimization. The framework also enables scalable inference with large expert pools while maintaining memory and computational efficiency.
Implications
The TT-LoRA MoE framework has significant implications for deploying large language models in multi-task and dynamic environments. Its parameter efficiency and scalability make it practical for real-world applications, such as natural language processing tasks, where computational resources are limited. Additionally, the decoupling of expert training and routing could inspire future research in modular and adaptive AI systems.
View on arXiv

Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization

Shuai Gong, Chaoran Cui, Xiaolin Dong, Xiushan Nie, Lei Zhu, Xiaojun Chang
  • TRIP introduces token-level routing for prompt mixtures, enabling fine-grained adaptation within images.
  • The framework uses a parameter-free routing mechanism based on token clustering and optimal transport, reducing communication costs.
  • TRIP leverages the zero-shot generalization capability of vision-language models to train unbiased prompt experts.
  • The method achieves state-of-the-art performance on four benchmarks for federated domain generalization.
  • TRIP requires only 1K parameters per communication round, making it highly efficient for federated learning settings.
Read More
Abstract
This paper addresses the challenge of federated domain generalization (FedDG), which aims to train a globally generalizable model across decentralized clients with heterogeneous data while preserving privacy. Existing approaches often rely on a single global prompt or parameterized routing mechanisms, which either fail to capture personalized characteristics or incur high communication costs. The authors propose TRIP (Token-level pRompt mIxture with Parameter-free routing), a novel framework that treats multiple prompts as distinct experts and assigns image tokens to these experts using a parameter-free routing mechanism based on token clustering and optimal transport. This fine-grained token-level routing enables instance-specific adaptation, improving generalization performance. TRIP also incorporates an unbiased learning strategy for prompt experts, leveraging the zero-shot generalization capability of vision-language models (VLMs). Extensive experiments on four benchmarks demonstrate that TRIP achieves state-of-the-art generalization performance while significantly reducing communication overhead, requiring only 1K parameters per round.
Methodology
TRIP employs a mixture of experts (MoE) framework where multiple prompts act as experts. Instead of routing entire images, it assigns individual image tokens to specific experts using a parameter-free routing mechanism based on token clustering and optimal transport. The instance-specific prompt is synthesized by aggregating experts weighted by the number of tokens assigned to each. Additionally, an unbiased learning strategy is used to train prompt experts, leveraging the zero-shot generalization capability of vision-language models.
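A rough sketch of the parameter-free routing step, assuming expert centroids obtained by clustering image tokens, a cosine-style cost, and a few Sinkhorn iterations for the optimal-transport assignment; all names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

def sinkhorn(cost, n_iter=50, eps=0.05):
    """Balanced optimal-transport plan between tokens and experts (Sinkhorn iterations)."""
    K = np.exp(-cost / eps)
    r = np.ones(cost.shape[0]) / cost.shape[0]
    c = np.ones(cost.shape[1]) / cost.shape[1]
    u, v = r.copy(), c.copy()
    for _ in range(n_iter):
        u = r / (K @ v)
        v = c / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)           # transport plan over (tokens, experts)

def trip_prompt(tokens, expert_prompts, expert_centroids):
    """Route tokens to prompt experts without learned parameters, then synthesize an
    instance-specific prompt weighted by how many tokens each expert receives."""
    cost = 1.0 - tokens @ expert_centroids.T      # cosine-style cost (unit-norm rows assumed)
    plan = sinkhorn(cost)
    assignment = plan.argmax(axis=1)              # hard token-to-expert assignment
    weights = np.bincount(assignment, minlength=len(expert_prompts)).astype(float)
    weights /= weights.sum()
    return (weights[:, None] * expert_prompts).sum(axis=0)

rng = np.random.default_rng(0)
tok = rng.normal(size=(49, 64)); tok /= np.linalg.norm(tok, axis=1, keepdims=True)
cen = rng.normal(size=(4, 64));  cen /= np.linalg.norm(cen, axis=1, keepdims=True)
prompts = rng.normal(size=(4, 512))               # 4 prompt experts
print(trip_prompt(tok, prompts, cen).shape)       # instance-specific prompt
```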
Results
TRIP achieves state-of-the-art generalization performance across four benchmarks for federated domain generalization. It significantly reduces communication overhead, requiring only 1K parameters per round, compared to existing methods that rely on parameterized routing networks. The fine-grained token-level routing improves the model's ability to adapt to diverse and heterogeneous data distributions.
Implications
The proposed TRIP framework has significant implications for privacy-preserving machine learning in decentralized settings, such as healthcare, finance, and edge computing. Its ability to generalize across heterogeneous data domains while maintaining low communication costs makes it a practical solution for real-world federated learning applications. Additionally, the token-level routing mechanism could inspire further research in fine-grained adaptation for other machine learning tasks.
View on arXiv

Towards proactive self-adaptive AI for non-stationary environments with dataset shifts

David Fernández Narro, Pablo Ferri, Juan M. García-Gómez, Carlos Sáez
  • Introduces a proactive self-adaptive AI framework to handle dataset shifts in non-stationary environments.
  • Uses Functional Data Analysis (FDA) with polynomial spline bases to model and forecast temporal changes in AI parameters.
  • Validated on both a simulated dataset with controlled shifts and a real-world COVID-19 dataset from Mexico.
  • Demonstrates improved performance and resilience compared to baseline models without requiring updated labeled data.
  • Highlights the approach's compatibility with data protection requirements, making it suitable for healthcare applications.
Read More
Abstract
This paper addresses the challenge of maintaining AI model performance in non-stationary environments, particularly in medical settings where dataset shifts—such as prior probability, covariate, and concept shifts—are common. The authors propose a proactive self-adaptive AI framework, termed 'pro-adaptive AI,' which models the temporal trajectories of AI parameters to forecast short-term changes and adapt to dataset shifts without requiring new labeled data. The approach leverages Functional Data Analysis (FDA) with polynomial spline bases to model and predict parameter dynamics over time. The methodology is validated using both a controlled simulated dataset and a real-world COVID-19 dataset from Mexico, spanning 2020 to 2024. Results demonstrate that the proposed approach improves model resilience and performance under shifting conditions compared to baseline models trained at fixed time intervals. This work provides a foundation for developing resilient AI systems in dynamic, real-world environments, particularly in healthcare, where timely and accurate decision-making is critical.
Methodology
The authors employed Functional Data Analysis (FDA) with polynomial spline bases to model the temporal trajectories of AI parameters. They validated the approach using two datasets: a simulated dataset with controlled shifts (prior probability, covariate, and concept shifts) and a real-world COVID-19 dataset from Mexico. Data preprocessing included stratified bootstrapping, temporal batching, and splitting into training, validation, and test sets. Logistic regression models were used to evaluate the framework's ability to adapt to dataset shifts.
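As a simplified stand-in for the spline-based FDA step, the sketch below fits a plain polynomial basis to each parameter's trajectory across time batches and extrapolates one step ahead; the polynomial degree and forecast horizon are assumptions.

```python
import numpy as np

def forecast_parameters(param_history, degree=3, horizon=1):
    """Fit a polynomial basis to each model parameter's trajectory over time batches
    and extrapolate it `horizon` steps ahead (stand-in for the spline-based FDA)."""
    T, P = param_history.shape                    # (time batches, parameters)
    t = np.arange(T)
    future = np.empty(P)
    for j in range(P):
        coeffs = np.polyfit(t, param_history[:, j], deg=degree)
        future[j] = np.polyval(coeffs, T - 1 + horizon)
    return future

# toy usage: trajectories of logistic-regression weights across periodic retrainings
rng = np.random.default_rng(0)
drift = np.linspace(0, 1, 12)[:, None]            # gradual, shift-like drift
history = drift + 0.05 * rng.normal(size=(12, 4))
next_weights = forecast_parameters(history, degree=3)
```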
Results
The proposed pro-adaptive AI approach outperformed baseline models trained at fixed time intervals in both simulated and real-world datasets. It demonstrated enhanced resilience and performance in the face of gradual and abrupt dataset shifts without requiring updated labeled data. The results suggest that the method can anticipate and adapt to temporal variability effectively, improving decision-making accuracy in dynamic environments.
Implications
This work has significant implications for deploying AI in non-stationary environments, particularly in healthcare. The proposed framework enables AI systems to maintain performance and resilience over time, even in the absence of new labeled data. This is especially valuable in clinical settings, where dataset shifts can compromise patient safety. The approach's compatibility with data protection regulations further enhances its applicability in sensitive domains like medicine.
View on arXiv

Unsupervised Feature Transformation via In-context Generation, Generator-critic LLM Agents, and Duet-play Teaming

Nanxu Gong, Xinyuan Wang, Wangyang Ying, Haoyue Bai, Sixun Dong, Haifeng Chen, Yanjie Fu
  • Introduces a generator-critic duet-play teaming framework for unsupervised feature transformation (EUFT).
  • Leverages LLMs' in-context learning to derive pseudo-supervision from unlabeled data.
  • Proposes a three-step process: data diagnosis by a critic agent, feature generation by a generator agent, and iterative refinement through feedback.
  • Outperforms supervised baselines in efficiency, robustness, and applicability across various datasets.
  • Framework is extensible to human-agent collaboration by replacing the critic agent with human experts.
Read More
Abstract
This paper introduces a novel framework for unsupervised feature transformation (EUFT) using a generator-critic duet-play teaming approach with large language model (LLM) agents. Feature transformation is critical in domains with high-dimensional data and limited labeled samples, such as material performance screening. The proposed framework leverages LLMs' in-context learning capabilities to derive pseudo-supervision from unlabeled data. The method involves three interconnected steps: (1) a critic agent diagnoses the data and provides actionable advice to improve feature spaces, (2) a generator agent tokenizes features and generates transformed feature sets based on the critic's advice, and (3) an iterative refinement process ensures continuous improvement through feedback between the two agents. The framework can also be extended to human-agent collaboration by replacing the critic agent with human experts. Experimental results demonstrate that the proposed approach outperforms supervised baselines in efficiency, robustness, and practical applicability across diverse datasets, showcasing its potential for real-world applications.
Methodology
The framework uses LLM agents in a generator-critic teaming setup. The critic agent diagnoses semantic relationships and data distributions to provide textual advice, which acts as pseudo-gradients for optimization. The generator agent tokenizes features and transformations, using in-context learning to produce transformed feature sets based on the critic's advice. An iterative refinement loop ensures continuous improvement through feedback between the agents. The approach avoids exhaustive search in large feature spaces by leveraging LLMs' pattern recognition capabilities.
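The duet-play loop can be outlined as below, with `call_llm` as a hypothetical placeholder for whatever LLM backend is used and the prompt texts as rough paraphrases of the two roles described above; nothing here is the authors' actual prompt or API.

```python
import pandas as pd

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for an LLM call; not part of the paper's code."""
    raise NotImplementedError("wire up an LLM backend here")

def duet_play_round(df: pd.DataFrame, prior_advice: str = "") -> tuple[str, str]:
    """One generator-critic round: the critic diagnoses the current feature space and
    the generator proposes a transformed feature set (e.g., tokenized expressions)."""
    profile = df.describe().to_string()            # lightweight data diagnosis input
    advice = call_llm(
        "You are a critic agent. Given this feature profile and prior advice, "
        f"suggest how to improve the feature space.\n{profile}\n{prior_advice}"
    )
    transformed = call_llm(
        "You are a generator agent. Apply this advice and output new feature "
        f"expressions over the original columns.\n{advice}\nColumns: {list(df.columns)}"
    )
    return advice, transformed

# iterative refinement: feed the critic's advice back in each round, e.g.
# advice = ""
# for _ in range(3):
#     advice, features = duet_play_round(df, advice)
```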
Results
The proposed framework demonstrated superior performance compared to supervised baselines in terms of feature transformation efficiency, robustness, and practical applicability. It effectively generated meaningful feature transformations without requiring labeled data, highlighting its utility in unsupervised settings. Extensive experiments validated its effectiveness across diverse datasets.
Implications
This framework has significant implications for domains where labeled data is scarce or expensive to obtain, such as material synthesis, bioinformatics, and other scientific fields. By enabling efficient and unsupervised feature transformation, it can accelerate data readiness and improve the utility of AI models in these areas. Additionally, its extensibility to human-agent collaboration opens new possibilities for integrating domain expertise with AI-driven feature engineering.
View on arXiv

Whispers of Data: Unveiling Label Distributions in Federated Learning Through Virtual Client Simulation

Zhixuan Ma, Haichang Gao, Junxiang Huang, Ping Wang
  • Introduces a stable and adaptable label distribution inference attack using virtual client simulation.
  • Estimates the victim client’s dataset size and simulates their behavior under various data distributions.
  • Utilizes temporal generalization performance to train a time-series-based attack model.
  • Demonstrates effectiveness across multiple datasets, outperforming existing methods.
  • The attack remains robust even under differential privacy defense mechanisms.
Read More
Abstract
This paper addresses the challenge of label distribution inference attacks in federated learning (FL), a privacy-preserving machine learning paradigm. Existing methods for inferring label distributions often struggle with sensitivity to client-specific settings and are ineffective against defense mechanisms like differential privacy. The authors propose a novel attack method that leverages virtual client simulation and temporal generalization analysis to infer the label distribution of a target client. By estimating the size of the victim client’s dataset, the method constructs virtual clients that mimic the target client’s behavior under various data distributions. Temporal generalization performance, which reflects how well a model generalizes over time for different labels, is used as a signal to train a time-series-based attack model. The proposed approach is validated on multiple datasets (MNIST, Fashion-MNIST, FER2013, and AG-News) and demonstrates superior performance compared to state-of-the-art methods. Notably, the attack remains effective even under differential privacy defenses, highlighting its robustness and potential real-world applicability.
Methodology
The authors estimate the target client’s dataset size and construct virtual clients that simulate the target’s behavior under different data distributions (IID and non-IID). Temporal generalization performance, which measures how well a model generalizes over time for each label, is quantified. This data is used to train a time-series-based attack model to predict the label distribution proportions of the target client. Noise is introduced in virtual clients to mitigate the impact of differential privacy defenses.
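A schematic version of the attack pipeline, using a random-forest regressor as a stand-in for the paper's time-series attack model; the array shapes, the flattening of per-round, per-label accuracy curves, and the Dirichlet-sampled virtual label distributions are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def build_attack_model(virtual_curves, virtual_label_dists):
    """Train the attack model on virtual clients: temporal generalization curves
    (flattened per-round, per-label accuracy) -> label proportions used to simulate them."""
    X = virtual_curves.reshape(len(virtual_curves), -1)   # (clients, rounds * labels)
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X, virtual_label_dists)
    return model

def infer_label_distribution(model, victim_curves):
    """Predict the victim's label proportions from its observed generalization curves."""
    pred = model.predict(victim_curves.reshape(1, -1))[0]
    pred = np.clip(pred, 0, None)
    return pred / pred.sum()

# toy shapes: 200 virtual clients, 20 FL rounds, 10 labels
rng = np.random.default_rng(0)
curves = rng.random((200, 20, 10))                 # simulated per-round, per-label accuracy
dists = rng.dirichlet(np.ones(10), size=200)       # label proportions used in simulation
attack = build_attack_model(curves, dists)
print(infer_label_distribution(attack, rng.random((20, 10))))
```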
Results
The proposed method outperforms state-of-the-art label distribution inference attacks across four datasets (MNIST, Fashion-MNIST, FER2013, and AG-News). It achieves higher accuracy and stability in inferring label distributions. Additionally, the attack remains effective even when differential privacy mechanisms are employed, demonstrating its robustness against common defense strategies.
Implications
This work highlights a significant vulnerability in federated learning systems, particularly in privacy-sensitive domains like healthcare and finance. The proposed attack method could be used to expose individual or institutional preferences, posing risks to collective privacy and fairness. It underscores the need for stronger defense mechanisms in federated learning to safeguard against such inference attacks.
View on arXiv

xEEGNet: Towards Explainable AI in EEG Dementia Classification

Andrea Zanola, Louis Fabrice Tshimanga, Federico Del Pup, Marco Baiesi, Manfredo Atzori
  • xEEGNet is a compact and interpretable neural network with only 168 parameters, designed for EEG-based dementia classification.
  • The model achieves comparable performance to larger architectures while significantly reducing overfitting and variability across data splits.
  • xEEGNet provides clinical interpretability by analyzing learned kernels and weights, which correspond to specific EEG spectral bands and topographies.
  • The study uses a robust Nested-Leave-N-Subjects-Out cross-validation strategy to ensure unbiased performance evaluation.
  • The work demonstrates the feasibility of small, explainable models in medical AI, challenging the trend of prioritizing large, opaque architectures.
Read More
Abstract
This paper introduces xEEGNet, a novel, compact, and explainable neural network designed for the analysis of EEG data, with a focus on dementia classification (Alzheimer's and frontotemporal dementia versus controls). The model is built upon ShallowNet, a lightweight architecture from the EEGNet family, and is progressively modified to enhance interpretability while maintaining competitive performance. xEEGNet achieves this by reducing the number of parameters to 168 (roughly 200 times fewer than ShallowNet) and incorporating features that allow for clinical interpretability of learned kernels and weights. The model's performance is evaluated using a robust Nested-Leave-N-Subjects-Out cross-validation strategy, ensuring unbiased estimates. The study demonstrates that xEEGNet achieves comparable accuracy to larger models, resists overfitting, and reduces variability across data splits, all while providing insights into the spectral and spatial features relevant for dementia classification. This work highlights the potential of small, interpretable architectures in medical AI, emphasizing the importance of explainability in clinical applications.
Methodology
The authors start with ShallowNet, a lightweight EEGNet-family model, and progressively modify its architecture to enhance interpretability. They analyze the learned kernels and weights to ensure clinical relevance and evaluate the model using a Nested-Leave-N-Subjects-Out cross-validation strategy. The dataset used includes EEG recordings from healthy controls and patients with Alzheimer's and frontotemporal dementia, sourced from the OpenNeuro platform. Performance variability is analyzed using embedded EEG representations, and overfitting is assessed through training-validation loss correlation and training speed.
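To make the architecture family concrete, the sketch below is a ShallowNet-style toy model with a temporal filter bank (interpretable as spectral bands), a per-filter spatial filter over electrodes, and log-power pooling; the layer sizes are illustrative and do not reproduce xEEGNet's exact 168-parameter configuration.

```python
import torch
import torch.nn as nn

class TinyEEGNet(nn.Module):
    """ShallowNet-style sketch: temporal filter bank, spatial filter, log band power,
    and a linear classifier. Dimensions are illustrative, not the xEEGNet configuration."""
    def __init__(self, n_channels=19, n_classes=3, n_filters=4, kernel_len=64):
        super().__init__()
        self.temporal = nn.Conv2d(1, n_filters, (1, kernel_len), bias=False)
        self.spatial = nn.Conv2d(n_filters, n_filters, (n_channels, 1),
                                 groups=n_filters, bias=False)
        self.pool = nn.AdaptiveAvgPool2d((1, 1))
        self.classify = nn.Linear(n_filters, n_classes)

    def forward(self, x):                              # x: (batch, 1, channels, time)
        x = self.temporal(x)
        x = self.spatial(x)
        x = torch.log(self.pool(x ** 2).flatten(1) + 1e-6)   # log band power
        return self.classify(x)

model = TinyEEGNet()
out = model(torch.randn(2, 1, 19, 512))                # 2 EEG epochs, 19 channels, 512 samples
print(sum(p.numel() for p in model.parameters()))      # small parameter count
```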
Results
xEEGNet achieves comparable median performance to ShallowNet (within -1.5%) while using 200 times fewer parameters. It resists overfitting, reduces performance variability across data splits, and provides interpretable insights into EEG spectral bands and topographies relevant for dementia classification. Higher accuracy correlates with greater separability between test-set controls and Alzheimer's cases, independent of training data variability.
Implications
xEEGNet demonstrates that small, interpretable neural networks can achieve competitive performance in medical AI tasks, offering a viable alternative to large, opaque models. Its explainability and compactness make it particularly suitable for clinical applications, where trust and transparency are critical. The model's ability to generalize to other neurological conditions involving spectral EEG alterations suggests broader applicability beyond dementia classification.
View on arXiv