Publications

Below is a list of recent publications that are representative of my current research. A full list is on Google Scholar.
* denotes co-first authors.

Language Agents

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
Junlong Li*, Wenshuo Zhao*, Jian Zhao*, Weihao Zeng*, Haoze Wu*, Xiaochen Wang, Rui Ge, Yuxuan Cao, Yuzhen Huang, Wei Liu, Junteng Liu, Zhaochen Su, Yiyang Guo, Fan Zhou, Lueyang Zhang, Juan Michelini, Xingyao Wang, Xiang Yue, Shuyan Zhou, Graham Neubig, Junxian He
ICLR 2026. arxiv github website

WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
Junteng Liu*, Yunji Li*, Chi Zhang, Jingyang Li, Aili Chen, Ke Ji, Weiyu Cheng, Zijia Wu, Chengyu Du, Qidi Xu, Jiayuan Song, Zhengmao Zhu, Wenhu Chen, Pengyu Zhao, Junxian He
Preprint 2025. arxiv github

SWE-RM: Execution-free Feedback For Software Engineering Agents
KaShun Shum*, Binyuan Hui*, Jiawei Chen, Lei Zhang, X. W., Jiaxi Yang, Yuzhen Huang, Junyang Lin, Junxian He
ICLR 2026. arxiv

Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
Weihao Zeng, Keqing He, Chuqiao Kuang, Xiaoguang Li, Junxian He
ICLR 2026. arxiv

AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Chang Ma*, Junlei Zhang*, Zhihao Zhu*, Cheng Yang*, Yujiu Yang, Yaohui Jin, Zhenzhong Lan, Lingpeng Kong, Junxian He
NeurIPS 2024 (Datasets and Benchmarks Track). Oral arxiv github

Reinforcement Learning, Self-Improving, Synthetic Data

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng*, Yuzhen Huang*, Qian Liu*, Wei Liu, Keqing He, Zejun Ma, Junxian He
COLM 2025. arxiv github

CodeIO: Condensing Reasoning Patterns via Code Input-Output Prediction
Junlong Li, Daya Guo, Dejian Yang, Runxin Xu, Yu Wu, Junxian He
ICML 2025. Oral arxiv github

Mirage or Method? How Model-Task Alignment Induces Divergent RL Conclusions
Haoze Wu*, Cheng Wang*, Wenshuo Zhao, Junxian He
ICLR 2026. arxiv github

B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Weihao Zeng*, Yuzhen Huang*, Lulu Zhao, Yijun Wang, Zifei Shan, Junxian He
ICLR 2025. arxiv github

DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving
Yuxuan Tong, Xiwen Zhang, Rui Wang, Ruidong Wu, Junxian He
NeurIPS 2024. arxiv github

What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning
Wei Liu*, Weihao Zeng*, Keqing He, Yong Jiang, Junxian He
ICLR 2024. arxiv github

Evaluation

Compression Represents Intelligence Linearly
Yuzhen Huang*, Jinghan Zhang*, Zifei Shan, Junxian He
COLM 2024. arxiv code

C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models
Yuzhen Huang*, Yuzhuo Bai*, Zhihao Zhu, Junlei Zhang, Jinghan Zhang, Tangjun Su, Junteng Liu, Chuancheng Lv, Yikai Zhang, Jiayi Lei, Yao Fu, Maosong Sun, Junxian He
NeurIPS 2023 (Datasets and Benchmarks Track). arxiv github website dataset

FELM: Benchmarking Factuality Evaluation of Large Language Models
Shiqi Chen, Yiran Zhao, Jinghan Zhang, I-Chun Chern, Siyang Gao, Pengfei Liu, Junxian He
NeurIPS 2023 (Datasets and Benchmarks Track). arxiv github website dataset

Some Earlier Interesting Works (Post-Training, Synthetic Data, PGMs)

Towards a Unified View of Parameter-Efficient Transfer Learning
Junxian He*, Chunting Zhou*, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig
ICLR 2022. Spotlight OpenReview arxiv code

CTRLsum: Towards Generic Controllable Text Summarization
Junxian He, Wojciech Kryściński, Bryan McCann, Nazneen Rajani, Caiming Xiong
EMNLP 2022. arxiv code huggingface demo streamlit demo

Revisiting Self-Training for Neural Sequence Generation
Junxian He*, Jiatao Gu*, Jiajun Shen, Marc’Aurelio Ranzato
ICLR 2020. arxiv code

Lagging Inference Networks and Posterior Collapse in Variational Autoencoders
Junxian He, Daniel Spokoyny, Graham Neubig, Taylor Berg-Kirkpatrick
ICLR 2019. arxiv code

Unsupervised Learning of Syntactic Structure with Invertible Neural Projections
Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick
EMNLP 2018. arxiv code