About Me
Hi! I am an incoming Ph.D. student at the University of Hong Kong, starting in the fall of 2026. Currently, I am a second-year master's student at Tsinghua University, under the supervision of Prof. Xiu Li. I received my bachelor's degree with honors from Shandong University in June 2023.
I am currently a research intern at Kuaishou (Kwai Star Plan). Previously, I was a research intern at Shanghai AI Laboratory, advised by Dr. Biqing Qi (Large Model Center) and Dr. Chenjia Bai (Intelligent Photonics and Electronics Center). Before that, I was a research intern at Peking University, advised by Prof. Yali Du and Prof. Yaodong Yang.
Research Interests: My research centers around Large Language Models (LLMs) and Reinforcement Learning (RL). Specifically, I am interested in:
- Reasoning: Enhancing the reasoning and generalization abilities of LLMs and Multi-modal LLMs (MLLMs).
- Evaluation: Developing reliable and comprehensive evaluation methods to assess LLM performance across diverse scenarios.
- LLM4RL: Leveraging the power of LLMs/MLLMs to improve RL algorithms in embodied AI tasks, particularly in the context of reward design and RL from Human/AI Feedback (RLHF/RLAIF).
If you are interested in collaboration, please feel free to reach out via e-mail!
News
- [2025.07] One paper accepted by Knowledge-Based Systems
- [2025.05] Our multi-agent RL framework for LLM reasoning released (GitHub)!
- [2025.03] One paper accepted by Reasoning and Planning for LLMs Workshop @ ICLR 2025
- [2025.02] Preprint Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling released on arXiv (Project Page)
- [2025.01] One paper accepted by ICLR 2025
- [2024.12] One paper accepted by AAAI 2025 and selected for oral presentation (Top 4.6%)
- [2024.05] One paper accepted by ICML 2024
- [2024.01] One paper accepted by ICLR 2024
- [2023.10] One paper accepted by OTML Workshop @ NeurIPS 2023
- [2022.09] One paper accepted by NeurIPS 2022
Publications
(* denotes equal contribution, † indicates project lead)
Preprints
- Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR
Jiakang Wang, Runze Liu, Fuzheng Zhang, Xiu Li, Guorui Zhou
[GitHub] [QbitAI (量子位)]
Preprint, 2025
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, Bowen Zhou
[Project Page] [GitHub 200+ Stars] [HuggingFace Daily Papers Top 1] [QbitAI (量子位)] [AI Era (新智元)]
Preprint, 2025
- GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning
Jian Zhao*, Runze Liu*†, Kaiyan Zhang, Zhimu Zhou, Junqi Gao, Dong Li, Jiafei Lyu, Zhouyi Qian, Biqing Qi, Xiu Li, Bowen Zhou
[Project Page] [GitHub] [Awesome Process Reward Models] [Synced (机器之心)]
Preprint, 2025
- VLP: Vision-Language Preference Learning for Embodied Manipulation
Runze Liu, Chenjia Bai, Jiafei Lyu, Shengjie Sun, Yali Du, Xiu Li
Preprint, 2025
- ReviewRL: Towards Automated Scientific Review with RL
Sihang Zeng, Kai Tian, Kaiyan Zhang, Yuru Wang, Junqi Gao, Runze Liu, Sa Yang, Jingxuan Li, Xinwei Long, Jiaheng Ma, Biqing Qi, Bowen Zhou
Preprint, 2025
- Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration
Junqi Gao, Zhichang Guo, Dazhi Zhang, Dong Li, Runze Liu, Pengfei Li, Kai Tian, Biqing Qi,
Preprint, 2025
Conference Papers
- PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation
Runze Liu, Yali Du, Fengshuo Bai, Jiafei Lyu, Xiu Li
ICML 2024
- Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning
Runze Liu, Fengshuo Bai, Yali Du, Yaodong Yang
NeurIPS 2022
- RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors
Fengshuo Bai, Runze Liu, Yali Du, Ying Wen, Yaodong Yang
AAAI 2025 Oral (Top 4.6%)
- Cross-Domain Offline Policy Adaptation with Optimal Transport and Dataset Constraint
Jiafei Lyu, Mengbei Yan, Zhongjian Qiao, Runze Liu, Xiaoteng Ma, Deheng Ye, Jing-Wen Yang, Zongqing Lu, Xiu Li
ICLR 2025
- SEABO: A Simple Search-Based Method for Offline Imitation Learning
Jiafei Lyu, Xiaoteng Ma, Le Wan, Runze Liu, Xiu Li, Zongqing Lu
ICLR 2024
Journal Papers
- A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning
Shengjie Sun*, Runze Liu*, Jiafei Lyu, Jing-Wen Yang, Liangpeng Zhang, Xiu Li
Knowledge-Based Systems, 2025
Workshop Papers
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, Bowen Zhou
Reasoning and Planning for LLMs @ ICLR 2025
- Zero-shot Cross-task Preference Alignment for Offline RL via Optimal Transport
Runze Liu, Yali Du, Fengshuo Bai, Jiafei Lyu, Xiu Li
Optimal Transport and Machine Learning @ NeurIPS 2023
Education
- The University of Hong Kong, 2026 - 2030 (expected)
Incoming Ph.D. Student in Computer Science
- Tsinghua University, 2023.09 - 2026.06
M.Eng. in Electronic and Information Engineering (AI)
- Shandong University, 2019.09 - 2023.06
B.S. in Statistics (Data Science & AI) with honors
Honors and Awards
- National Scholarship (Top 1%), 2022.12
- National Scholarship (Top 1%), 2021.12
- First Prize in China Undergraduate Mathematical Contest in Modeling (CUMCM) (Top 0.65%), 2021.11
- Outstanding Student of Shandong Province (Top 0.6%), 2022.05
- Outstanding Graduate of Shandong Province (Top 6%), 2023.04
- Dishang Scholarship, 2022.10
Internships
- Research Intern, Kuaishou (Kwai Star Plan), 2025.06 - present.
- Research Intern, Large Model Center, Shanghai AI Laboratory, 2024.10 - 2025.03.
- Research Intern, Intelligent Photonics and Electronics Center (IPEC), Shanghai AI Laboratory, 2024.03 - 2024.09.
- Research Intern, Institute for AI, Peking University, 2022.01 - 2022.09.
Invited Talks
- Scaling Test-Time Compute of LLMs and PRMs for Mathematical Reasoning. ASAP Seminar. 2025.06.
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling. Huawei Noah's Ark Lab. 2025.03.
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling. Xiaohongshu. 2025.02.
Services
- Conference Reviewer: NeurIPS (2024 - 2025), ICLR (2025), ICML (2025), AAAI (2026), AAMAS (2024), AISTATS (2025), ECAI (2024)
- Journal Reviewer: IEEE Transactions on Artificial Intelligence (TAI)
- Workshop Reviewer: NeurIPS OTML (2023)