Haozhe Wang
I am a second-year PhD student at the Hong Kong University of Science and Technology (HKUST), having started in September 2024. I am advised by Prof. Fangzhen Lin and collaborate closely with Prof. Wenhu Chen at the University of Waterloo. I am supported by the Hong Kong PhD Fellowship Scheme (HKPFS).
My previous research focused on reasoning RL, multimodal understanding, and agentic training. My recent focus has shifted toward reward-based training and proactive agentic systems for visual generation and video world models.
I expect to graduate in 2027 and am actively looking for positions in industry.
I have mentored many students in research. Feel free to reach out: jasper.whz@outlook.com
Email /
Google Scholar /
GitHub
|
|
News
09/2024: Commenced PhD at HKUST, supported by the Hong Kong PhD Fellowship Scheme (HKPFS)
|
Research Highlights
Visual Generation (Recent)
- RationalRewards (arXiv 2026) — reasoning rewards scale visual generation
- Search-Augmented Agentic Generation (to appear) — search what visual generators cannot be taught
- RenderWorld (ICRA 2025) — world model with self-supervised 3D labels
Multimodal & Agentic
- Pixel Reasoner (NeurIPS 2025) — curiosity-driven pixel-space reasoning
- Bad Seeing or Bad Thinking (ICML 2026 Oral) — rewarding perception for multimodal reasoning
- VerlTool (TMLR 2026, ICLR Workshop Best Paper) — holistic agentic RL with tool use
- CogDoc (arXiv 2025) — unified thinking in documents
- EvoCUA (ICML 2026 Workshop) — evolving computer use agents via synthetic experience
Reasoning RL
- VL-Rethinker (NeurIPS 2025 Spotlight) — self-reflection in VLMs via RL
- Hierarchical Reasoner (ICLR 2026) — emergent hierarchical reasoning through RL
- REER (ICLR 2026) — reverse-engineered reasoning for open-ended generation
Coding
|
|
Search-Augmented Agentic Generation: Search What Visual Generators Cannot Be Taught
Haozhe Wang, et al.
Internship @ Qwen Applications Team
To appear
Addressing real-world text-to-image requests that require searched knowledge beyond the model's parameters
|
|
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
Haozhe Wang, Cong Wei, Weiming Ren, Jiaming Liu, Fangzhen Lin, Wenhu Chen
arXiv 2026
website /
paper /
code
Scaling visual generation quality via reasoning-based reward models at both training and test time
|
|
Starve to Perceive: Taming Lazy Perception in VLMs with Constrained Visual Bandwidth
Yuhuan Wu, Cong Wei, Fangzhen Lin, Wenhu Chen, Haozhe Wang
arXiv 2026
paper
Constraining visual bandwidth to force more attentive perception in VLMs
|
|
Bad Seeing or Bad Thinking? Rewarding Perception for Multimodal Reasoning
Haozhe Wang, Qixin Xu, Changpeng Wang, Taofeng Xue, Chong Peng, Wenhu Chen
Internship @ Meituan LongCat Team
ICML 2026 (Oral Presentation)
paper
Disentangling perception from reasoning in VLMs via targeted reward signals
|
|
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
Haozhe Wang, Qixin Xu, Che Liu, Junhong Wu, Fangzhen Lin, Wenhu Chen
ICLR 2026
website /
paper /
code
RL induces emergent hierarchical decomposition of complex reasoning tasks
|
|
Reverse-Engineered Reasoning for Open-Ended Generation
Haozhe Wang, Haoran Que, Qixin Xu, Minghao Liu, Wangchunshu Zhou, Jiazhan Feng
Internship @ ByteDance-Seed & M-A-P
ICLR 2026
website /
paper
Reverse-engineering reasoning chains for creative and open-ended generation
|
|
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
Haozhe Wang, Alex Su, Weiming Ren, Fangzhen Lin, Wenhu Chen
NeurIPS 2025
website /
paper /
code
Curiosity-driven RL for pixel-level visual reasoning in multimodal models
|
|
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang, Chao Qu, Zuming Huang, Wei Chu, Fangzhen Lin, Wenhu Chen
NeurIPS 2025 (Spotlight)
website /
paper /
code
Teaching VLMs to self-reflect and correct their reasoning via RL-based incentives
|
|
StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs
Jialin Yang, Dongfu Jiang, Lipeng He, Sherman Siu, Yuxuan Zhang, ..., Haozhe Wang, ..., Wenhu Chen
TMLR 2026
website /
paper /
code
Comprehensive benchmark for evaluating structured output generation in LLMs
|
|
To Code or Not to Code? Adaptive Tool Integration for Math Language Models via Expectation-Maximization
Haozhe Wang, Long Li, Chao Qu, Fengming Zhu, Weidi Xu, Wei Chu
ACL 2025
paper
Adaptive tool integration for math reasoning via expectation-maximization
|
|
ACECODER: Acing Coder RL via Automated Test-Case Synthesis
Huaye Zeng, Dongfu Jiang, Haozhe Wang, Ping Nie, Xiaotong Chen, Wenhu Chen
ACL 2025
paper /
code
Automated test-case synthesis for reinforcement learning of code generation
|
|
CogDoc: Towards Unified Thinking in Documents
Qixin Xu, Haozhe Wang, Che Liu, Fangzhen Lin, Wenhu Chen
arXiv 2025
paper
Unified reasoning framework for complex document understanding
|
Awards
- Hong Kong PhD Fellowship Scheme (HKPFS) — 300 awardees across Hong Kong per year
- Outstanding Graduate of Shanghai — top 1% province-wide, ShanghaiTech University
- National Scholarship — top 0.2% nation-wide, Wuhan University, 2017
|
Service
- Reviewer: NeurIPS 2024–2026, ICLR 2024–2026, ICML 2025–2026, IJCV 2026, KR 2026