About Me
Hao is a PhD from the Text Mining and NLP group at the University of Manchester, UK. He is currently working as a research intern at Microsoft Research. His research centres on Natural Language Processing, Computer Vision and Health AI, with a recent focus on foundation models and diffusion modeling, especially in the healthcare domain. His work has been published in leading venues such as ICML, KDD, ACL, EMNLP, and NAACL, accumulating over a hundred citations.
Open to work!
I am actively seeking full-time positions, including research scientist, researcher, assistant professor and postdoc.
News
- We are releasing TimeCraft, an innovative framework for synthetic time series generation!
TimeCraft introduces a novel approach by learning a universal latent space of semantic prototypes for time series, enabling impressive cross-domain generalization, while leveraging a diffusion-based framework to ensure high-fidelity and diverse data generation.
Key technical highlights:
- 🔹 Prototype Assignment Module (PAM): Adapts to new domains with few-shot examples.
- 🔹 Text-based Control: Leverage natural language to guide the generation process.
- 🔹 Influence-Guided Diffusion: Generates synthetic samples optimized for downstream task performance.
- Our paper TarDiff: Target-Oriented Diffusion Guidance for Synthetic Electronic Health Record Time Series Generation has been accepted to KDD 2025!
- (First Author) Our paper BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modelling has been accepted to ICML 2025! The preprint version is available for an early preview.
🔗 Explore the repository:
https://github.com/microsoft/TimeCraft
📄 Chinese press release:
Press release link
Research Statement
Foundation Models
My core research focuses on developing foundation models that generalize across tasks, modalities, and domains. I explore modular architectures—such as sparse experts, adaptive attention, and continuous-time representations—that can support forecasting, reasoning, and generation across diverse data types including time-series, images, and texts.
Generative Modelling
I design generative systems that synthesize structured data such as multivariate time series or clinical signals from natural language. My work includes diffusion-based models that allow fine-grained control over the output, enabling precise and faithful generation for simulation, augmentation, and decision support.
Health AI
I work on building trustworthy AI systems for healthcare, with a particular focus on modeling real-world clinical data such as EHRs, physiological signals (e.g., ECG), and longitudinal records. My research aims to support clinical reasoning, outcome prediction, and health system optimization.
LLM Evaluation & Attribution
I investigate how large language models perform across tasks, domains, and inputs, with a focus on robustness and attribution. My work includes instance-level analysis of LLM behavior, tracing dataset influence on model predictions, and identifying failure modes in biomedical or instruction-following settings.
Cross-modal Representation Learning
I study how representations can be aligned across modalities—such as language, vision, and structured signals—to enable joint reasoning and transfer. My work explores cross-modal distillation, token pruning, and contrastive alignment for both general-purpose LLMs and medical multimodal applications.
Research Works
Publications
Topics: Foundation Model / Generative Modelling / Health AI / LLM Evaluation & Attribution / Cross-modal Representation Learning / NLP & Argument Mining (* indicates equal contribution)

MIRA: Medical Time Series Foundation Model for Real-World Health Data
Hao Li, Bowen Deng, Chang Xu, Zhiyuan Feng, Viktor Schlegel, Yu-Hao Huang, Yizheng Sun, Jingyuan Sun, Kailai Yang, Yiyao Yu, Jiang Bian

BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modelling
Hao Li, Yu-Hao Huang, Chang Xu, Viktor Schlegel, Renhe Jiang, Riza Batista-Navarro, Goran Nenadic, Jiang Bian

TarDiff: Target-Oriented Diffusion Guidance for Synthetic Electronic Health Record Time Series Generation
Bowen Deng, Chang Xu, Hao Li, Yu-Hao Huang, Min Hou, Jiang Bian

Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training
Kailai Yang, Xiao Liu, Lei Ji, Hao Li, Yeyun Gong, Peng Cheng, Mao Yang

Does Acceleration Cause Hidden Instability in Vision Language Models? Uncovering Instance-Level Divergence Through a Large-Scale Empirical Study
Yizheng Sun, Hao Li, Chang Xu, Hongpeng Zhou, Chenghua Lin, Riza Batista-Navarro, Jingyuan Sun

LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models
Yizheng Sun, Yanze Xin, Hao Li, Jingyuan Sun, Chenghua Lin, Riza Batista-Navarro

Generating Realistic Multi-Beat ECG Signals
Paul Pöhl, Viktor Schlegel, Hao Li, Anil Bharath

Which Side Are You On? A Multi-task Dataset for End-to-End Argument Summarisation and Evaluation
Hao Li, Yuping Wu, Viktor Schlegel, Riza Batista-Navarro, Tharindu Madusanka, Iqra Zahid, Jiayan Zeng, Xiaochi Wang, Xinran He, Yizhi Li, Goran Nenadic

CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models
Yizhi Li, Ge Zhang, Xingwei Qu, Jiali Li, Zhaoqun Li, Zekun Wang, Hao Li, Ruibin Yuan, Yinghao Ma, Kai Zhang, Wangchunshu Zhou, Yiming Liang, Lei Zhang, Lei Ma, Jiajun Zhang, Zuowen Li, Stephen W Huang, Chenghua Lin, Wenhu Chen, Jie Fu

Do You Hear the People Sing? Key Point Analysis via Iterative Clustering and Abstractive Summarisation
Hao Li, Viktor Schlegel, Riza Theresa Batista-Navarro, Goran Nenadic

Not All Quantifiers Are Equal: Probing Transformer-based Language Models’ Understanding of Generalised Quantifiers
Tharindu Madusanka, Iqra Zahid, Hao Li, Ian Pratt-Hartmann, Riza Batista-Navarro

PULSAR: Pre-training with Extracted Healthcare Terms for Summarising Patients’ Problems and Data Augmentation with Black-box Large Language Models
Hao Li, Yuping Wu, Viktor Schlegel, Riza Theresa Batista-Navarro, Thanh-Tung Nguyen, Abhinav Ramesh Kashyap, Xiao-Jun Zeng, Daniel Beck, Stefan Winkler, Goran Nenadic