Building Self-Bootstrapping Systems
👤 CV ✉️ Email 🎓 Google Scholar 🧑‍💻 GitHub 💼 LinkedIn 🐦 X 🎶 SUNO 📷 Travel
🔎 Research and Personal Interests
- 🧠 Reinforcement Learning
- 🤖 Machine Learning
- Capybara
🚶 Experience
🏫 Education
- 09/2024 ~ Present: University of Southampton
- 09/2023 ~ 09/2024: University of Liverpool
- 09/2019 ~ 04/2022: Nanjing University of Aeronautics and Astronautics
- 09/2015 ~ 09/2019: Nanjing University of Aeronautics and Astronautics
  - Undergraduate Student in Mathematics.
🏢 Work
- 09/2025 ~ Present: Cohere
  - Intern of Technical Staff.
  - CodeGen Team, London, UK.
- 08/2024 ~ 02/2025: Huawei Noah's Ark Lab
  - Research Intern.
  - Mentored by Dr. Kun Shao.
  - AI Agent Team, London, UK.
- 04/2022 ~ 09/2023: **Parametrix.AI**
  - Gaming AI Research Engineer.
  - Mentored by North Yang and Heisenberg Guo.
  - Game AI P2 Team, Shenzhen, CN.
📄 Publications
- Directly Forecasting Belief for Reinforcement Learning with Delays.
  - Qingyuan Wu, Yuhui Wang, Simon Sinong Zhan, Yixuan Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang.
  - [ICML 2025] International Conference on Machine Learning, 2025, Poster.
  - [Paper | Code | Poster]
- Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning.
  - Yuhui Wang, Qingyuan Wu, Weida Li, Dylan R. Ashley, Francesco Faccio, Chao Huang, Jürgen Schmidhuber.
  - [ICML 2025] International Conference on Machine Learning, 2025, Poster.
  - [Paper]
- Variational Delayed Policy Optimization.
  - Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Chao Huang.
  - [NeurIPS 2024] Conference on Neural Information Processing Systems, 2024, Spotlight.
  - [Paper | Code | Poster]
- Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays.
  - Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang.
  - [ICML 2024] International Conference on Machine Learning, 2024, Poster.
  - [Paper | Code | Poster]
- Highway Value Iteration Networks.
  - Yuhui Wang, Weida Li, Francesco Faccio, Qingyuan Wu, Jürgen Schmidhuber.
  - [ICML 2024] International Conference on Machine Learning, 2024, Poster.
  - [Paper]
- State-wise Safe Reinforcement Learning with Pixel Observations.
  - Simon Sinong Zhan, Yixuan Wang, Qingyuan Wu, Ruochen Jiao, Chao Huang, Qi Zhu.
  - [L4DC 2024] Learning for Dynamics and Control Conference, 2024, Poster.
  - [Paper | Code]
📝 Pre-print