👋 About Me

Hi! I am a final-year master’s student at Tsinghua University, under the supervision of Prof. Xiu Li. I received my bachelor’s degree with honors from Shandong University in June 2023.

Previously, I interned at Kuaishou (working with Jiakang Wang and Dr. Fuzheng Zhang), Shanghai AI Laboratory (working with Dr. Biqing Qi and Dr. Chenjia Bai), and Peking University (working with Prof. Yali Du and Prof. Yaodong Yang).

Research Interests: My research centers on Large Language Models (LLMs) and Reinforcement Learning (RL). Specifically, I am interested in:

  • Reasoning: Enhancing the reasoning capabilities of LLMs and Multi-modal LLMs (MLLMs).
  • Agents: Long-horizon planning agents & LLM agents for real-world workflows.
  • LLM4RL: Leveraging the power of LLMs/MLLMs to improve RL algorithms in embodied AI tasks, particularly in the context of reward design and RL from Human/AI Feedback (RLHF/RLAIF).

If you are interested in collaboration, please feel free to reach out via e-mail!

🌟 News

  • [2025.09] 🎉 One paper accepted by NeurIPS 2025
  • [2025.09] 🔥 Preprint "A Survey of Reinforcement Learning for Large Reasoning Models" released on arXiv
  • [2025.08] 🎉 Two papers accepted by EMNLP 2025
  • [2025.05] 🔥 Our multi-agent RL framework for LLM reasoning released (GitHub)!
  • [2025.03] 🎉 One paper accepted by Reasoning and Planning for LLMs Workshop @ ICLR 2025
  • [2025.01] 🎉 One paper accepted by ICLR 2025
  • [2024.12] 🎉 One paper accepted by AAAI 2025 and selected for oral presentation (Top 4.6%)
  • [2024.05] 🎉 One paper accepted by ICML 2024
  • [2024.01] 🎉 One paper accepted by ICLR 2024
  • [2022.09] 🎉 One paper accepted by NeurIPS 2022

📝 Publications

(* denotes equal contribution, † denotes project lead)

Preprints

Conference Papers

Journal Papers

Workshop Papers

🎓 Education

🎖 Honors and Awards

  • National Scholarship (Top 1%), 2022.12
  • National Scholarship (Top 1%), 2021.12
  • First Prize in China Undergraduate Mathematical Contest in Modeling (CUMCM) (Top 0.65%), 2021.11
  • Outstanding Student of Shandong Province (Top 0.6%), 2022.05
  • Outstanding Graduate of Shandong Province (Top 6%), 2023.04
  • Dishang Scholarship, 2022.10

💻 Internships

🎙 Invited Talks

  • Scaling Test-Time Compute of LLMs and PRMs for Mathematical Reasoning. ASAP Seminar. 2025.06.
  • Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling. Huawei Noah’s Ark Lab. 2025.03.
  • Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling. Xiaohongshu. 2025.02.

🛠️ Services

  • Conference Reviewer: NeurIPS (2024 - 2025), ICLR (2025 - 2026), ICML (2025), AAAI (2026), AAMAS (2024), AISTATS (2025), ECAI (2024)
  • Journal Reviewer: IEEE Transactions on Artificial Intelligence (TAI)
  • Workshop Reviewer: NeurIPS OTML (2023)