Building Self-Bootstrapping Systems
👤 CV ✉️ Email 🎓 Google Scholar 🧑‍💻 GitHub 💼 LinkedIn 🐦 X 🎶 SUNO 📷 Travel
🔎 Research and Personal Interests
- 🧠 Reinforcement Learning
- 🤖 Machine Learning
- Capybara
🚶 Experience
🏫 Education
- 09/2024 ~ Present: University of Southampton
- 09/2023 ~ 09/2024: University of Liverpool
- 09/2019 ~ 04/2022: Nanjing University of Aeronautics and Astronautics
- 09/2015 ~ 09/2019: Nanjing University of Aeronautics and Astronautics
  - Undergraduate Student in Mathematics
🏢 Work
- 09/2025 ~ Present: Cohere
  - Intern of Technical Staff
  - CodeGen Team, London, UK
- 08/2024 ~ 02/2025: Huawei Noah's Ark Lab
  - Research Intern
  - Mentored by Dr. Kun Shao
  - AI Agent Team, London, UK
- 04/2022 ~ 09/2023: Parametrix.AI
  - Gaming AI Research Engineer
  - Mentored by North Yang and Heisenberg Guo
  - Game AI P2 Team, Shenzhen, CN
📄 Papers
- A Unified Framework for Rethinking Policy Divergence Measures in GRPO.
  - Qingyuan Wu, Yuhui Wang, Simon Sinong Zhan, Yanning Dai, Shilong Deng, Sarra Habchi, Qi Zhu, Matthias Gallé, Chao Huang.
  - [Paper]
- Efficient Multi-step Reinforcement Learning with Expectation-Maximization Bootstrapping.
  - Qingyuan Wu, Yuhui Wang, Simon Sinong Zhan, Qi Zhu, Jürgen Schmidhuber, Chao Huang.
  - [Paper]
- Belief-Based Offline Reinforcement Learning for Delay-Robust Policy Optimization.
  - Simon Sinong Zhan, Qingyuan Wu, Philip Wang, Frank Yang, Xiangyu Shi, Chao Huang, Qi Zhu.
  - [ICLR 2026] International Conference on Learning Representations, 2026, Poster.
  - [Paper]
- Enhancing Inverse Reinforcement Learning through Encoding Dynamic Information in Reward Shaping.
  - Simon Sinong Zhan, Philip Wang, Qingyuan Wu, Yixuan Wang, Ruochen Jiao, Chao Huang, Qi Zhu.
  - [L4DC 2026] Learning for Dynamics and Control Conference, 2026, Poster.
  - [Paper]
- VSC-RL: Advancing Autonomous Vision-Language Agents with Variational Subgoal-Conditioned Reinforcement Learning.
  - Qingyuan Wu, Jianheng Liu, Jianye Hao, Jun Wang, Kun Shao.
  - 1st Workshop on VLM4RWD at NeurIPS 2025, Poster.
  - [Paper | Code | Website]
- Directly Forecasting Belief for Reinforcement Learning with Delays.
  - Qingyuan Wu, Yuhui Wang, Simon Sinong Zhan, Yixuan Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang.
  - [ICML 2025] International Conference on Machine Learning, 2025, Poster.
  - [Paper | Code | Poster]
- Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning.
  - Yuhui Wang, Qingyuan Wu, Weida Li, Dylan R. Ashley, Francesco Faccio, Chao Huang, Jürgen Schmidhuber.
  - [ICML 2025] International Conference on Machine Learning, 2025, Poster.
  - [Paper]