Zhenru Zhang

Zhenru

AI & ML interests

None yet

Recent Activity

upvoted a paper 5 days ago

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published 6 days ago • 77

upvoted a paper 11 days ago

Soft Adaptive Policy Optimization

Paper • 2511.20347 • Published 12 days ago • 33

authored 7 papers about 2 months ago

DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population

Paper • 2201.03335 • Published Jan 10, 2022 • 1

Contrastive Demonstration Tuning for Pre-trained Language Models

Paper • 2204.04392 • Published Apr 9, 2022 • 1

Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

Paper • 2409.12122 • Published Sep 18, 2024 • 4

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 376

Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 317

Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning

Paper • 2412.14780 • Published Dec 19, 2024

Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window

Paper • 2510.08276 • Published Oct 9 • 9

upvoted a paper 4 months ago

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 313

authored a paper 6 months ago

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2 • 187

upvoted a paper 6 months ago

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2 • 187

upvoted 2 papers 7 months ago

WorldPM: Scaling Human Preference Modeling

Paper • 2505.10527 • Published May 15 • 34

Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 317

updated a model 7 months ago

Qwen/WorldPM-72B-UltraFeedback

Text Classification • 73B • Updated May 17 • 207 • 5

authored a paper 7 months ago

WorldPM: Scaling Human Preference Modeling

Paper • 2505.10527 • Published May 15 • 34

authored a paper 9 months ago

START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 113

upvoted a paper 9 months ago

START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 113

upvoted a paper 11 months ago

Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models

Paper • 2501.11873 • Published Jan 21 • 66

updated a model 11 months ago

Qwen/Qwen2.5-Math-7B-PRM800K

Text Classification • 8B • Updated Jan 17 • 2.2k • 20
