Duy H. M. Nguyen
Nguyen Ho Minh Duy
Universitätsstraße 32
70569 Stuttgart, Germany
Room: 2.321
I am currently a Ph.D. candidate under the supervision of Prof. Mathias Niepert at the International Max Planck Research School for Intelligent Systems (IMPRS-IS) and the University of Stuttgart. I have also been a Researcher at the German Research Center for Artificial Intelligence (DFKI) since 2021.
My research interests include:
- Hybrid Discrete-Continuous Learning (differentiable relaxations for discrete intermediate representations)
- Scalable Algorithms for Multi-modal Learning, with applications in Healthcare and Simulation Science
- Efficient Deep Learning (model compression, accelerated training/inference, etc.)
Please visit my Google Scholar for a full list of publications and my GitHub for source code.
news
| Date | News |
|---|---|
| Feb 01, 2026 | 🔔 Happy to share our latest work Slot-VLA, accepted at ICRA 2026 in Vienna, Austria 🇦🇹 🎉. We show that object–relation–centric slot representations enable compact, interpretable, and efficient multi-task robotic manipulation, drastically reducing token complexity while maintaining strong performance and generalization. |
| Jan 26, 2026 | 🔔 Our work FACET, which introduces scalable structure-aware, fragment-level modeling via Graph Transformers for molecular learning, has been accepted to ICLR 2026 🇧🇷! |
| Jan 06, 2026 | 🏆 The DuFal paper has been accepted at Transactions on Machine Learning Research (TMLR) 2026 and has been awarded a J2C Certification. We will present the paper at the International Conference on Machine Learning (ICML) in July 2026 in South Korea 🇰🇷. |
| Dec 04, 2025 | 🔔 Our new work, Dual-Frequency-Aware Learning for High-Fidelity Extremely Sparse-View CBCT Reconstruction, has been accepted (with minor revision) to Transactions on Machine Learning Research (TMLR) 2025. Check it out here! |
| Nov 08, 2025 | 🔔 Exciting News! We’re thrilled to share that our two recent works have been accepted to AAAI 2026 in Singapore 🇸🇬 — one as an oral and the other as a poster presentation! 🎉 i. Multi-Mood — a multi-modal large language model that integrates video, audio, and text with psychological criteria through reinforcement learning to enable trustworthy and emotionally aligned responses. ii. LIBERO-Mem — a non-Markovian task suite for short- and long-horizon object tracking and manipulation, featuring temporally sequenced subgoals that challenge models to reason beyond the current observation. 📄 Code will be released soon 🎉 — stay tuned! |
| Sep 26, 2025 | 🔔 Excited to share that our works on (i) ExGra-Med — a data-efficient multimodal large language model (LLM) for healthcare; (ii) Token Redundancy in 3D Point Cloud Transformers — uncovering how existing 3D transformers (e.g., PTv3, Sonata) are over-tokenized, and proposing an efficient token merging strategy that reduces computation by up to 90-95% while preserving accuracy; and (iii) Over-Optimization in RLHF for LLM Post-Training — exploring how reinforcement learning from human feedback can lead to alignment instability and offering new insights into optimizing LLM post-training, have been accepted to NeurIPS 2025 🎉. Excited to present and discuss them in San Diego 🚀 |
| Sep 09, 2025 | 🌟 Excited to give a talk about my current research on Scaling Multi-Modal Learning: Hybrid Representations and Efficient Adaptation at (i) the Machine Learning Lab, School of Information and Communications Technology (SOICT), Hanoi University of Science and Technology, Vietnam, and (ii) the School of Computing, National University of Singapore (NUS). |
| May 01, 2025 | 🎉 (i) A preliminary version of our work, MGPath, has been accepted to the Workshop on Foundation Models in the Wild at ICLR 2025, and (ii) another work on LLaMA-Adapter’s prompt learning has been accepted at ICML 2025. |
| Apr 20, 2025 | 🎉 Our work on building a new Inductive Message Passing Network for Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data has been accepted at Scientific Reports (Nature Portfolio). |
| Oct 08, 2024 | 🇨🇭 Started my research visit at the ETH AI Center, ETH Zurich, working on Multi-Modal LLMs for Healthcare empowered by Retrieval-Augmented Generation. |
selected publications
- ICLR
  FACET: A Fragment-Aware Conformer Ensemble Transformer
  International Conference on Learning Representations (ICLR), 2026
- ICRA
  SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation
  IEEE International Conference on Robotics and Automation (ICRA), 2026
- AAAI (Oral)
  Reinforce Trustworthiness in Multimodal Emotional Support System
  Proceedings of the AAAI Conference on Artificial Intelligence, 2026
- ICLR (Oral)
  Energy minimizing-based token merging for accelerating Transformers
  5th Workshop on Practical ML for Limited/Low Resource Settings, International Conference on Learning Representations (ICLR), 2024
  *Co-first contributions
- CVPR
  LMGP: Lifted multicut meets geometry projections for multi-camera multi-object tracking
  Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022