About Me
I am a PhD student at the Hong Kong University of Science and Technology (HKUST), advised by Prof. Fangzhen Lin, working in close collaboration with Prof. Wenhu Chen at the University of Waterloo and Ge Zhang at ByteDance Seed. I am fortunate to be supported by the Hong Kong PhD Fellowship Scheme (HKPFS), which has only 300 awardees across Hong Kong each year.
My research focuses on Large Language Models (LLMs) and Vision-Language Models (VLMs), reasoning, reinforcement learning (RL), and agents. My recent work develops RL-based approaches to enhance VLM and LLM reasoning, in projects such as REverse-Engineered Reasoning (REER), Hierarchical-Reasoner, Pixel-Reasoner, and VL-Rethinker. For a comprehensive list of my publications, please visit my Google Scholar.
Prior to joining HKUST, I worked as a Research Engineer at Alibaba under the guidance of Dr. Chao Du. This experience allowed me to deepen my expertise in AI and machine learning while contributing to impactful industry projects.
Before that, I was recognized as one of the Outstanding Graduates of Shanghai (top 1% province-wide) while studying at ShanghaiTech, and I was awarded the prestigious National Scholarship (top 0.2% nation-wide) at Wuhan University in 2017.
Notice
I am actively seeking research collaborations and research opportunities in VLMs, RL, and agents, preferably remote. Let’s make a real impact in both industry and academia!
News
2025.09 We release REverse-Engineered Reasoning (REER) for open-ended generation. It provides a third path to high-quality deep reasoning, without RL or costly distillation.
2025.09 We release Hierarchical-Reasoner. We analyze the training dynamics of six text and vision-language models, identifying an emergent hierarchical reasoning through RL that underpins the boost in math reasoning. This reasoning hierarchy parallels human cognitive models, directly explains opaque observations such as “aha moments” and “length scaling”, and points out the flaws of token entropy for tracking exploration in RL.
2025.08 We release VerlTool, seamlessly integrating tool use with the widely adopted VeRL framework.
2025.06 We release AlphaMed, a minimalist zero-RL approach to medical reasoning.
2025.06 We release Infinity-Parser, with a comprehensive doc-parsing dataset, Infinity-Doc-55K, and a strong doc parser trained through layout-aware RL.
2025.05 We release Pixel Reasoner, which studies the key reasoning paradigm behind o3/o4-mini. We introduce Pixel-Space Reasoning for the first time and identify a critical learning trap in cultivating this novel reasoning capability.
2025.05 Four papers accepted to ACL 2025! Please check Autocode, ACECoder, SynMed, and Argus.
2025.04 We introduce VL-Rethinker, which explores how to incentivize deliberate thinking in VLMs. It achieves superior results on a diverse collection of multimodal benchmarks.
2025.03 We release a new diffusion quantization method: TR-DQ.
2025.02 We release Autocode, on metacognitive tool-use LLMs for math, and ACECoder, for large-scale test-case synthesis for coder RL training.
2025.01 RenderWorld accepted to ICRA 2025. It studies 3D world models. Congrats to Yihua!
2024.10 V-PETL Benchmark accepted to NeurIPS 2024.