Li Dong
unilm
50 followers · 18 following
AI & ML interests
Language Model Pre-Training
Recent Activity
Authored a paper about 13 hours ago: MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
Authored a paper about 13 hours ago: Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts
Authored a paper about 13 hours ago: Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge
Organizations
Articles
1
Article
6
Differential Transformer V2
Papers
81
arxiv:2601.08808
arxiv:2511.10643
arxiv:2510.26658
arxiv:2510.24514
spaces
1
Runtime error
4
Promptist
models
0
None public yet
datasets
0
None public yet