About Me

Hao received his PhD from the Text Mining and NLP group at the University of Manchester, UK, and is currently working as a research intern at Microsoft Research. His research centres on Natural Language Processing, Computer Vision and Health AI, with a recent focus on foundation models and diffusion modelling, especially in the healthcare domain. His work has been published in leading venues such as ICML, KDD, ACL, EMNLP and NAACL, and has accumulated over a hundred citations.

Open to work!

I am actively seeking full-time positions, such as research scientist, researcher, assistant professor and postdoctoral roles.
  • I am open to working in China (including Hong Kong), the UK, Switzerland, the USA, Canada and Singapore.
  • I am open to opportunities in both industry and academia, particularly in technology, quantitative finance, and biomedicine.
  • Please feel free to contact me by email at hao.li-2@manchester.ac.uk or by telephone at (+86) 18686876728 or (+44) 07432578928.

    Research Statement

    Foundation Models

    Keywords: Pretraining, Modular Architectures, Generalization across Modalities

    My core research focuses on developing foundation models that generalize across tasks, modalities, and domains. I explore modular architectures, such as sparse experts, adaptive attention, and continuous-time representations, that can support forecasting, reasoning, and generation across diverse data types, including time series, images, and text.

    Generative Modelling

    Keywords: Diffusion Models, Controllable Generation, Text-to-Time Series

    I design generative systems that synthesize structured data, such as multivariate time series or clinical signals, from natural language descriptions. My work includes diffusion-based models that allow fine-grained control over the output, enabling precise and faithful generation for simulation, augmentation, and decision support.

    Health AI

    Keywords: EHR, Medical Time Series, Clinical Decision Support

    I work on building trustworthy AI systems for healthcare, with a particular focus on modeling real-world clinical data such as EHRs, physiological signals (e.g., ECG), and longitudinal records. My research aims to support clinical reasoning, outcome prediction, and health system optimization.

    LLM Evaluation & Attribution

    Keywords: Zero-shot Evaluation, Stability, Influence Functions

    I investigate how large language models perform across tasks, domains, and inputs, with a focus on robustness and attribution. My work includes instance-level analysis of LLM behavior, tracing dataset influence on model predictions, and identifying failure modes in biomedical or instruction-following settings.

    Cross-modal Representation Learning

    Keywords: Vision-Language Alignment, Knowledge Distillation, Multimodal Understanding

    I study how representations can be aligned across modalities, such as language, vision, and structured signals, to enable joint reasoning and transfer. My work explores cross-modal distillation, token pruning, and contrastive alignment for both general-purpose LLMs and medical multimodal applications.

    Research Works

    Publications

    Topics: Foundation Models / Generative Modelling / Health AI / LLM Evaluation & Attribution / Cross-modal Representation Learning / NLP & Argument Mining (* indicates equal contribution)

    MIRA: Medical Time Series Foundation Model for Real-World Health Data
    Hao Li, Bowen Deng, Chang Xu, Zhiyuan Feng, Viktor Schlegel, Yu-Hao Huang, Yizheng Sun, Jingyuan Sun, Kailai Yang, Yiyao Yu, Jiang Bian

    Preprint 2025

    BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modelling
    Hao Li, Yu-Hao Huang, Chang Xu, Viktor Schlegel, Renhe Jiang, Riza Batista-Navarro, Goran Nenadic, Jiang Bian

    ICML 2025 / Paper

    TarDiff: Target-Oriented Diffusion Guidance for Synthetic Electronic Health Record Time Series Generation
    Bowen Deng, Chang Xu, Hao Li, Yu-Hao Huang, Min Hou, Jiang Bian

    KDD 2025 / Paper

    Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training
    Kailai Yang, Xiao Liu, Lei Ji, Hao Li, Yeyun Gong, Peng Cheng, Mao Yang

    Preprint 2025

    Does Acceleration Cause Hidden Instability in Vision Language Models? Uncovering Instance-Level Divergence Through a Large-Scale Empirical Study
    Yizheng Sun, Hao Li, Chang Xu, Hongpeng Zhou, Chenghua Lin, Riza Batista-Navarro, Jingyuan Sun

    Preprint 2025

    LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models
    Yizheng Sun, Yanze Xin, Hao Li, Jingyuan Sun, Chenghua Lin, Riza Batista-Navarro

    NAACL 2025 / Paper

    Generating Realistic Multi-Beat ECG Signals
    Paul Pöhl, Viktor Schlegel, Hao Li, Anil Bharath

    DSP 2025 / Paper

    Which Side Are You On? A Multi-task Dataset for End-to-End Argument Summarisation and Evaluation
    Hao Li, Yuping Wu, Viktor Schlegel, Riza Batista-Navarro, Tharindu Madusanka, Iqra Zahid, Jiayan Zeng, Xiaochi Wang, Xinran He, Yizhi Li, Goran Nenadic

    ACL 2024 / Paper

    CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models
    Yizhi Li, Ge Zhang, Xingwei Qu, Jiali Li, Zhaoqun Li, Zekun Wang, Hao Li, Ruibin Yuan, Yinghao Ma, Kai Zhang, Wangchunshu Zhou, Yiming Liang, Lei Zhang, Lei Ma, Jiajun Zhang, Zuowen Li, Stephen W Huang, Chenghua Lin, Wenhu Chen, Jie Fu

    ACL 2024 / Paper

    Do You Hear the People Sing? Key Point Analysis via Iterative Clustering and Abstractive Summarisation
    Hao Li, Viktor Schlegel, Riza Theresa Batista-Navarro, Goran Nenadic

    ACL 2023 / Paper

    Not All Quantifiers Are Equal: Probing Transformer-based Language Models’ Understanding of Generalised Quantifiers
    Tharindu Madusanka, Iqra Zahid, Hao Li, Ian Pratt-Hartmann, Riza Batista-Navarro

    EMNLP 2023 / Paper

    PULSAR: Pre-training with Extracted Healthcare Terms for Summarising Patients’ Problems and Data Augmentation with Black-box Large Language Models
    Hao Li, Yuping Wu, Viktor Schlegel, Riza Theresa Batista-Navarro, Thanh-Tung Nguyen, Abhinav Ramesh Kashyap, Xiao-Jun Zeng, Daniel Beck, Stefan Winkler, Goran Nenadic

    BioNLP @ ACL 2023 / Paper
