naufalso's Collections

Open Source LLM Datasets

updated 4 days ago

List of open-source LLM datasets

  • Nemotron-Post-Training-v3

    Collection
    Collection of datasets used in the post-training phase of Nemotron Nano v3. • 8 items • Updated 18 days ago • 65

  • Nemotron-Pre-Training-Datasets

    Collection
    Large scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 18 days ago • 97

  • Tulu 3 Datasets

    Collection
    All datasets released with Tulu 3 -- state-of-the-art open post-training recipes. • 33 items • Updated Dec 23, 2025 • 95

  • Olmo 3 Post-training

    Collection
    All artifacts for post-training Olmo 3. Datasets follow the model that resulted from training on them. • 32 items • Updated Dec 23, 2025 • 50

  • Olmo 3 Pre-training

    Collection
    All artifacts related to Olmo 3 pre-training • 10 items • Updated Dec 23, 2025 • 33