About Me

I am a PhD student at the Hong Kong University of Science and Technology (HKUST), advised by Prof. Fangzhen Lin, and I work in close collaboration with Prof. Wenhu Chen at the University of Waterloo. I am fortunate to be supported by the Hong Kong PhD Fellowship Scheme (HKPFS), which has only 300 awardees across Hong Kong each year.

My research focuses on Large Language Models (LLMs) and Vision-Language Models (VLMs), with an emphasis on reasoning, reinforcement learning (RL), and agents. My recent work develops RL-based approaches to enhance VLM and LLM reasoning, as seen in projects such as VL-Rethinker, Autocode, ACECoder, and HTL. For a comprehensive list of my publications, please visit my Google Scholar profile.

Prior to joining HKUST, I worked as a Researcher at INF Technology, advised by Dr. Wei Chu, and as an AI Engineer at Alibaba, under the guidance of Dr. Chao Du. These experiences deepened my expertise in AI and machine learning while allowing me to contribute to impactful industry projects.

Before these industry roles, I was recognized as one of the Outstanding Graduates of Shanghai (top 1% province-wide) while studying at ShanghaiTech, and I was awarded the prestigious National Scholarship (top 0.2% nationwide) at Wuhan University in 2017.

Notice

I am actively seeking research collaborations and opportunities in VLMs, RL, and agents, preferably remote. Let's make a real impact on both industry and academia!

News

2025.05 We release Pixel Reasoner, which studies the key reasoning paradigm behind o3/o4-mini. We introduce Pixel-Space Reasoning for the first time and identify a critical learning trap in cultivating this novel reasoning capability.

2025.05 Four papers accepted to ACL 2025! Please check out Autocode, ACECoder, SynMed, and Argus.

2025.04 We introduce VL-Rethinker, which explores how to incentivize deliberate thinking in VLMs. It achieves superior results on a diverse collection of multimodal benchmarks.

2025.03 We release TR-DQ, a new diffusion quantization method.

2025.02 We release Autocode, on metacognitive tool-use LLMs for math, and ACECoder, on large-scale test-case synthesis for coder RL training.

2025.01 RenderWorld accepted to ICRA 2025. It studies 3D world models. Congrats to Yihua!

2024.10 V-PETL Benchmark accepted to NeurIPS 2024.

2024.08 HTL accepted to EMNLP 2024. It studies tool-integrated reasoning for math.