Homepage

Welcome to Pengyu Cheng’s homepage!

I am a researcher at Alibaba Qwen Applications Business Group, where I lead RL training for the Qwen Large Model Application Team. We primarily focus on enhancing LLMs’ foundational capabilities through training techniques such as RLHF, RLVR, and Agentic RL. I am enthusiastic about LLM self-evolution via multi-agent gaming, which I believe is the most promising path to unleashing the potential of large models.

I was previously a member of the RL & Agent Team at Moonshot (Kimi) AI and the Hunyuan LLM Team at Tencent AI Lab. I also have substantial experience in NLP fairness, text generation, representation learning, and information theory.

I received my Ph.D. from the Department of Electrical and Computer Engineering at Duke University in 2021. My Ph.D. advisor was Dr. Lawrence Carin. I received my B.S. from the Department of Mathematical Sciences at Tsinghua University in 2017.

Selected Publications:


Self-playing Adversarial Language Game Enhances LLM Reasoning	P. Cheng, T. Hu, H. Xu, Z. Zhang, Y. Dai, L. Han, N. Du	2024
Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Games	P. Cheng, Y. Yang, J. Li*, Y. Dai, T. Hu, P. Cao, N. Du, X. Li	2024
Everyone Deserves A Reward: Learning Customized Human Preferences	P. Cheng, J. Xie, K. Bai, Y. Dai, N. Du	2023
Replacing Language Model for Style Transfer	P. Cheng, R. Li	2022
FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders	P. Cheng, W. Hao, S. Yuan, S. Si, L. Carin	2021
CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information	P. Cheng, W. Hao, S. Dai, J. Liu, Z. Gan, L. Carin	2020

Timeline:

[2026/01/26] Two papers (SSP and DIR, as the corresponding author) got accepted by ICLR 2026.
[2025/12/21] Began serving as an Area Chair (AC) of ARR 2026.
[2025/12/19] One co-first-author paper (VAR) got accepted by TMLR 2026.
[2025/02/22] Began serving as an Area Chair (AC) of ARR 2025.
[2025/01/23] One LLM safety paper (Atoxia) got accepted by NAACL 2025.
[2024/09/26] One LLM self-play paper (SPAG) got accepted by NeurIPS 2024.
[2024/09/22] One RLHF paper (MORe) got accepted by EMNLP 2024.
[2024/05/16] Two LLM papers (APO and SimCAS) got accepted by ACL 2024.
[2023/01/21] Two co-first-author papers (TC-estimation and FairTextGen) got accepted by AISTATS 2023.
[2022/02/28] One paper (SoTead) got accepted by Findings of ACL 2022.
[2021/03/30] Finished my Ph.D. final defense! What an unforgettable journey!
[2021/01/12] Two co-first-author papers (FairFil and IDE-VC) got accepted by ICLR 2021.
[2020/12/17] Began serving as a senior PC for IJCAI 2021.
[2020/12/14] Passed the preliminary exam and became a Ph.D. candidate.
[2020/06/08] Started internship at Microsoft, supervised by Dr. Jingjing Liu.
[2020/06/01] One paper got accepted by ICML 2020.
[2020/04/03] One paper got accepted by ACL 2020.
[2020/02/09] Gave an oral presentation at AAAI 2020.
[2019/11/09] One paper got accepted for an oral presentation at AAAI 2020.
[2019/08/07] Began serving as a reviewer for AAAI 2020.
[2019/05/28] Started internship at NEC Labs America, advised by Dr. Martin Renqiang Min.
[2019/05/13] Two papers got accepted by ACL 2019. My co-first-author paper was selected for an oral presentation.
[2019/04/24] One paper got accepted by ICML 2019.
[2018/11/16] One paper got accepted by the NeurIPS 2018 Bayesian Deep Learning Workshop as a spotlight presentation.