About Me

I am an AI researcher. Prior to this, I was a postdoctoral researcher at Imperial College London. I received my PhD from the University of Manchester and previously worked as a research intern at Microsoft Research. My research focuses on generative modelling, with a recent emphasis on foundation models and diffusion-based methods, particularly for healthcare applications. My work has been published in top conferences such as NeurIPS, ICML, KDD, ACL, EMNLP, and NAACL, and has collectively received over a hundred citations.

News

Research Statement

Foundation Models

Keywords: Pretraining, Modular Architectures, Generalization across Modalities

My core research focuses on developing foundation models that generalize across tasks, modalities, and domains. I explore modular architectures—such as sparse experts, adaptive attention, and continuous-time representations—that can support forecasting, reasoning, and generation across diverse data types including time-series, images, and texts.

Generative Modelling

Keywords: Diffusion Models, Controllable Generation, Text-to-Time Series

I design generative systems that synthesize structured data such as multivariate time series or clinical signals from natural language. My work includes diffusion-based models that allow fine-grained control over the output, enabling precise and faithful generation for simulation, augmentation, and decision support.

Health AI

Keywords: EHR, Medical Time Series, Clinical Decision Support

I work on building trustworthy AI systems for healthcare, with a particular focus on modeling real-world clinical data such as EHRs, physiological signals (e.g., ECG), and longitudinal records. My research aims to support clinical reasoning, outcome prediction, and health system optimization.

LLM Evaluation & Attribution

Keywords: Zero-shot Evaluation, Stability, Influence Functions

I investigate how large language models perform across tasks, domains, and inputs, with a focus on robustness and attribution. My work includes instance-level analysis of LLM behavior, tracing dataset influence on model predictions, and identifying failure modes in biomedical or instruction-following settings.

Cross-modal Representation Learning

Keywords: Vision-Language Alignment, Knowledge Distillation, Multimodal Understanding

I study how representations can be aligned across modalities—such as language, vision, and structured signals—to enable joint reasoning and transfer. My work explores cross-modal distillation, token pruning, and contrastive alignment for both general-purpose LLMs and medical multimodal applications.

Research Works

Publications

Topics: Foundation Models / Generative Modelling / Health AI / LLM Evaluation & Attribution / Cross-modal Representation Learning / NLP & Argument Mining (* indicates equal contribution)

MIRA: Medical Time Series Foundation Model for Real-World Health Data
Hao Li, Bowen Deng, Chang Xu, Zhiyuan Feng, Viktor Schlegel, Yu-Hao Huang, Yizheng Sun, Jingyuan Sun, Kailai Yang, Yiyao Yu, Jiang Bian

NeurIPS 2025 / Paper

BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modelling
Hao Li, Yu-Hao Huang, Chang Xu, Viktor Schlegel, Renhe Jiang, Riza Batista-Navarro, Goran Nenadic, Jiang Bian

ICML 2025 / Paper

TarDiff: Target-Oriented Diffusion Guidance for Synthetic Electronic Health Record Time Series Generation
Bowen Deng, Chang Xu, Hao Li, Yu-Hao Huang, Min Hou, Jiang Bian

KDD 2025 / Paper

Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training
Kailai Yang, Xiao Liu, Lei Ji, Hao Li, Yeyun Gong, Peng Cheng, Mao Yang

Preprint 2025

Does Acceleration Cause Hidden Instability in Vision Language Models? Uncovering Instance-Level Divergence Through a Large-Scale Empirical Study
Yizheng Sun, Hao Li, Chang Xu, Hongpeng Zhou, Chenghua Lin, Riza Batista-Navarro, Jingyuan Sun

EMNLP 2025 / Paper

LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models
Yizheng Sun, Yanze Xin, Hao Li, Jingyuan Sun, Chenghua Lin, Riza Batista-Navarro

NAACL 2025 / Paper

Which Side Are You On? A Multi-task Dataset for End-to-End Argument Summarisation and Evaluation
Hao Li, Yuping Wu, Viktor Schlegel, Riza Batista-Navarro, Tharindu Madusanka, Iqra Zahid, Jiayan Zeng, Xiaochi Wang, Xinran He, Yizhi Li, Goran Nenadic

ACL 2024 / Paper

CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models
Yizhi Li, Ge Zhang, Xingwei Qu, Jiali Li, Zhaoqun Li, Zekun Wang, Hao Li, Ruibin Yuan, Yinghao Ma, Kai Zhang, Wangchunshu Zhou, Yiming Liang, Lei Zhang, Lei Ma, Jiajun Zhang, Zuowen Li, Stephen W Huang, Chenghua Lin, Wenhu Chen, Jie Fu

ACL 2024 / Paper

Do You Hear the People Sing? Key Point Analysis via Iterative Clustering and Abstractive Summarisation
Hao Li, Viktor Schlegel, Riza Theresa Batista-Navarro, Goran Nenadic

ACL 2023 / Paper

Not All Quantifiers Are Equal: Probing Transformer-based Language Models’ Understanding of Generalised Quantifiers
Tharindu Madusanka, Iqra Zahid, Hao Li, Ian Pratt-Hartmann, Riza Batista-Navarro

EMNLP 2023 / Paper
