Haozhe Wang

I am a second-year PhD student at the Hong Kong University of Science and Technology (HKUST), where I started in September 2024. I am advised by Prof. Fangzhen Lin and work in close collaboration with Prof. Wenhu Chen at the University of Waterloo. I am supported by the Hong Kong PhD Fellowship Scheme (HKPFS).

My previous research focused on reasoning RL, multimodal understanding, and agentic training. My recent focus has shifted toward reward-based training and proactive agentic systems for visual generation and video world models.

I expect to graduate in 2027 and am actively looking for positions in industry.

I have mentored many students in research. Feel free to reach out at jasper.whz@outlook.com.

Email  /  Google Scholar  /  GitHub


News

05/2026: Bad Seeing or Bad Thinking? accepted to ICML 2026 (Oral Presentation)
02/2026: Emergent Hierarchical Reasoning accepted to ICLR 2026
02/2026: Reverse-Engineered Reasoning accepted to ICLR 2026
09/2025: VL-Rethinker accepted to NeurIPS 2025 (Spotlight)
09/2025: Pixel Reasoner accepted to NeurIPS 2025
05/2025: To Code or Not to Code accepted to ACL 2025
09/2024: Commenced PhD at HKUST with Hong Kong PhD Fellowship Scheme (HKPFS)

Research Highlights

Visual Generation (Recent)

  • RationalRewards (arXiv 2026) — reasoning rewards scale visual generation
  • Search-Augmented Agentic Generation (to appear) — search what visual generators cannot be taught
  • RenderWorld (ICRA 2025) — world model with self-supervised 3D labels

Multimodal & Agentic

  • Pixel Reasoner (NeurIPS 2025) — curiosity-driven pixel-space reasoning
  • Bad Seeing or Bad Thinking (ICML 2026 Oral) — rewarding perception for multimodal reasoning
  • VerlTool (TMLR 2026, ICLR Workshop Best Paper) — holistic agentic RL with tool use
  • CogDoc (arXiv 2025) — unified thinking in documents
  • EvoCUA (ICML 2026 Workshop) — evolving computer use agents via synthetic experience

Reasoning RL

  • VL-Rethinker (NeurIPS 2025 Spotlight) — self-reflection in VLMs via RL
  • Hierarchical Reasoner (ICLR 2026) — emergent hierarchical reasoning through RL
  • REER (ICLR 2026) — reverse-engineered reasoning for open-ended generation

Coding

First-Authored Publications

(*: equal contribution)
SearchGen Search-Augmented Agentic Generation: Search What Visual Generators Cannot Be Taught
Haozhe Wang, et al.
Internship @ Qwen Applications Team
To appear

Addressing real-world text-to-image requests that rely on search knowledge beyond model parameters

RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
Haozhe Wang, Cong Wei, Weiming Ren, Jiaming Liu, Fangzhen Lin, Wenhu Chen
arXiv 2026
website / paper / code

Scaling visual generation quality via reasoning-based reward models at both training and test time

Starve to Perceive: Taming Lazy Perception in VLMs with Constrained Visual Bandwidth
Yuhuan Wu, Cong Wei, Fangzhen Lin, Wenhu Chen, Haozhe Wang
arXiv 2026
paper

Constraining visual bandwidth to force more attentive perception in VLMs

Bad Seeing or Bad Thinking? Rewarding Perception for Multimodal Reasoning
Haozhe Wang, Qixin Xu, Changpeng Wang, Taofeng Xue, Chong Peng, Wenhu Chen
Internship @ Meituan LongCat Team
ICML 2026 (Oral Presentation)
paper

Disentangling perception from reasoning in VLMs via targeted reward signals

Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
Haozhe Wang, Qixin Xu, Che Liu, Junhong Wu, Fangzhen Lin, Wenhu Chen
ICLR 2026
website / paper / code

RL induces emergent hierarchical decomposition of complex reasoning tasks

Reverse-Engineered Reasoning for Open-Ended Generation
Haozhe Wang, Haoran Que, Qixin Xu, Minghao Liu, Wangchunshu Zhou, Jiazhan Feng
Internship @ ByteDance-Seed & M-A-P
ICLR 2026
website / paper

Reverse-engineering reasoning chains for creative and open-ended generation

Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
Haozhe Wang, Alex Su, Weiming Ren, Fangzhen Lin, Wenhu Chen
NeurIPS 2025
website / paper / code

Curiosity-driven RL for pixel-level visual reasoning in multimodal models

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang, Chao Qu, Zuming Huang, Wei Chu, Fangzhen Lin, Wenhu Chen
NeurIPS 2025 (Spotlight)
website / paper / code

Teaching VLMs to self-reflect and correct reasoning via RL-based incentives

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs
Jialin Yang, Dongfu Jiang, Lipeng He, Sherman Siu, Yuxuan Zhang, ..., Haozhe Wang, ..., Wenhu Chen
TMLR 2026
website / paper / code

Comprehensive benchmark for evaluating structured output generation in LLMs

To Code or Not to Code? Adaptive Tool Integration for Math Language Models via Expectation-Maximization
Haozhe Wang, Long Li, Chao Qu, Fengming Zhu, Weidi Xu, Wei Chu
ACL 2025
paper

Adaptive tool integration for math reasoning via expectation-maximization

ACECODER: Acing Coder RL via Automated Test-Case Synthesis
Huaye Zeng, Dongfu Jiang, Haozhe Wang, Ping Nie, Xiaotong Chen, Wenhu Chen
ACL 2025
paper / code

Automated test-case synthesis for reinforcement learning of code generation

CogDoc: Towards Unified Thinking in Documents
Qixin Xu, Haozhe Wang, Che Liu, Fangzhen Lin, Wenhu Chen
arXiv 2025
paper

Unified reasoning framework for complex document understanding

Awards

  • Hong Kong PhD Fellowship Scheme (HKPFS) — 300 awardees across Hong Kong per year
  • Outstanding Graduate of Shanghai — top 1% province-wide, ShanghaiTech University
  • National Scholarship — top 0.2% nation-wide, Wuhan University, 2017

Service

  • Reviewer: NeurIPS 2024–2026, ICLR 2024–2026, ICML 2025–2026, IJCV 2026, KR 2026