About Me
Hi! I am a second-year master's student at Tsinghua University, under the supervision of Prof. Xiu Li. I received my bachelor's degree with honors from Shandong University in June 2023.
I have been fortunate to collaborate with exceptional researchers who have generously shared their guidance and insights. Currently, I am a research intern at Large Model Center, Shanghai AI Laboratory, advised by Dr. Biqing Qi. Previously, I interned at Intelligent Photonics and Electronics Center (IPEC), Shanghai AI Laboratory, advised by Dr. Chenjia Bai. Before that, I was a research intern at Peking University, advised by Prof. Yali Du and Prof. Yaodong Yang.
Research Interests: My research centers on Large Language Models (LLMs) and Reinforcement Learning (RL). Specifically, I am interested in:
- Reasoning Capabilities: Enhancing the reasoning and generalization abilities of LLMs and Multi-modal LLMs (MLLMs), from both training-time and test-time perspectives.
- Efficiency: Improving the training and inference efficiency of LLMs while maintaining or enhancing performance.
- Evaluation: Developing more reliable and comprehensive evaluation methods to better assess LLM performance across diverse scenarios.
- LLM4RL: Leveraging the power of LLMs/MLLMs to improve RL algorithms in embodied AI tasks, particularly in the context of reward design and RL from Human/AI Feedback (RLHF/RLAIF).
If you are interested in collaboration, please feel free to reach out via e-mail!
News
- [2025.03] One paper accepted by Reasoning and Planning for LLMs Workshop @ ICLR 2025
- [2025.02] Preprint "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling" released on arXiv (Project Page)
- [2025.01] One paper accepted by ICLR 2025
- [2024.12] One paper accepted by AAAI 2025 and selected for oral presentation (Top 4.6%)
- [2024.05] One paper accepted by ICML 2024
- [2024.01] One paper accepted by ICLR 2024
- [2023.10] One paper accepted by OTML Workshop @ NeurIPS 2023
- [2022.09] One paper accepted by NeurIPS 2022
Publications
(* indicates equal contribution)
Preprints
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, Bowen Zhou
[Project Page] [GitHub 200+ Stars] [HuggingFace Daily Papers Top 1] [QbitAI (量子位)] [AI Era (新智元)]
Preprint, 2025
- VLP: Vision-Language Preference Learning for Embodied Manipulation
Runze Liu, Chenjia Bai, Jiafei Lyu, Shengjie Sun, Yali Du, Xiu Li
Preprint, 2025
- A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning
Shengjie Sun*, Runze Liu*, Jiafei Lyu, Jing-Wen Yang, Liangpeng Zhang, Xiu Li
Preprint, 2024
Conference Papers
- PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation
Runze Liu, Yali Du, Fengshuo Bai, Jiafei Lyu, Xiu Li
ICML 2024
- Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning
Runze Liu, Fengshuo Bai, Yali Du, Yaodong Yang
NeurIPS 2022
- RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors
Fengshuo Bai, Runze Liu, Yali Du, Ying Wen, Yaodong Yang
AAAI 2025 Oral (Top 4.6%)
- Cross-Domain Offline Policy Adaptation with Optimal Transport and Dataset Constraint
Jiafei Lyu, Mengbei Yan, Zhongjian Qiao, Runze Liu, Xiaoteng Ma, Deheng Ye, Jing-Wen Yang, Zongqing Lu, Xiu Li
ICLR 2025
- SEABO: A Simple Search-Based Method for Offline Imitation Learning
Jiafei Lyu, Xiaoteng Ma, Le Wan, Runze Liu, Xiu Li, Zongqing Lu
ICLR 2024
Workshop Papers
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, Bowen Zhou
Reasoning and Planning for LLMs, ICLR 2025
- Zero-shot Cross-task Preference Alignment for Offline RL via Optimal Transport
Runze Liu, Yali Du, Fengshuo Bai, Jiafei Lyu, Xiu Li
Optimal Transport and Machine Learning, NeurIPS 2023
Education
- Tsinghua University, 2023.09 - 2026.06
M.Eng. in Electronic and Information Engineering (AI)
- Shandong University, 2019.09 - 2023.06
B.S. in Statistics (Data Science & AI) with honors
Honors and Awards
- National Scholarship (Top 1%), 2022.12
- National Scholarship (Top 1%), 2021.12
- First Prize in China Undergraduate Mathematical Contest in Modeling (CUMCM) (Top 0.65%), 2021.11
- Outstanding Student of Shandong Province (Top 0.6%), 2022.05
- Outstanding Graduate of Shandong Province (Top 6%), 2023.04
- Dishang Scholarship, 2022.10
Internships
- Research Intern, Large Model Center, Shanghai AI Laboratory, 2024.10 - 2025.03.
- Research Intern, Intelligent Photonics and Electronics Center (IPEC), Shanghai AI Laboratory, 2024.03 - 2024.09.
- Research Intern, Institute for AI, Peking University, 2022.01 - 2022.09.
Invited Talks
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling. Xiaohongshu. 2025.02.
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling. Huawei Noah's Ark Lab. 2025.03.
Services
- Conference Reviewer: NeurIPS (2024), ICLR (2025), ICML (2025), AAMAS (2024), AISTATS (2025), ECAI (2024)
- Workshop Reviewer: NeurIPS OTML (2023)