Models from the paper "LaSeR: Reinforcement Learning with Last-Token Self-Rewarding"
Wenkai Yang
Keven16
Recent Activity
upvoted a paper 3 days ago
DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution
upvoted a paper about 1 month ago
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
upvoted a paper about 1 month ago
NVIDIA Nemotron 3: Efficient and Open Intelligence