shuoxing/qwen3-4b-thinking-full-pretrain-control-tweet-1m-en-reproduce-bs128 Text Generation • 196k • Updated about 10 hours ago
shuoxing/qwen3-4b-thinking-full-pretrain-mix-high-tweet-1m-en-reproduce-bs128 Text Generation • 196k • Updated about 12 hours ago
shuoxing/qwen3-4b-thinking-full-pretrain-mix-mid-tweet-1m-en-reproduce-bs128 Text Generation • 196k • Updated about 15 hours ago • 6
shuoxing/qwen3-4b-thinking-full-pretrain-mix-low-tweet-1m-en-reproduce-bs128 Text Generation • 196k • Updated about 19 hours ago • 12
shuoxing/qwen2-5-7b-full-pretrain-control-tweet-1m-en-reproduce-bs8 Text Generation • 333k • Updated 1 day ago • 14
shuoxing/qwen2-5-7b-full-pretrain-mix-high-tweet-1m-en-reproduce-bs8 Text Generation • 333k • Updated 1 day ago • 22
shuoxing/qwen2-5-7b-full-pretrain-mix-mid-tweet-1m-en-reproduce-bs8 Text Generation • 333k • Updated 1 day ago • 12
shuoxing/qwen2-5-7b-full-pretrain-mix-low-tweet-1m-en-reproduce-bs8 Text Generation • 333k • Updated 1 day ago • 17
shuoxing/llama3-8b-full-sft-control-tweet-1m-en-reproduce-bs128 Text Generation • 266k • Updated 15 days ago • 33
shuoxing/llama3-8b-full-sft-mix-high-tweet-1m-en-reproduce-bs128 Text Generation • 266k • Updated 15 days ago • 27
shuoxing/llama3-8b-full-sft-mix-mid-tweet-1m-en-reproduce-bs128 Text Generation • 266k • Updated 15 days ago • 32
shuoxing/llama3-8b-full-sft-mix-low-tweet-1m-en-reproduce-bs128 Text Generation • 266k • Updated 15 days ago • 38
shuoxing/llama3-8b-full-sft-mix-high-tweet-1m-en-reproduce-bs16 Text Generation • 266k • Updated 25 days ago • 12
shuoxing/llama3-8b-full-sft-mix-mid-tweet-1m-en-reproduce-bs16 Text Generation • 266k • Updated 25 days ago • 13
shuoxing/llama3-8b-full-sft-mix-low-tweet-1m-en-reproduce-bs16 Text Generation • 266k • Updated 25 days ago • 13
shuoxing/llama3-8b-full-pretrain-control-tweet-1m-en-reproduce-bs8 Text Generation • 266k • Updated 27 days ago • 16
shuoxing/llama3-8b-full-pretrain-mix-high-tweet-1m-en-reproduce-bs8 Text Generation • 266k • Updated 27 days ago • 38
shuoxing/llama3-8b-full-pretrain-mix-mid-tweet-1m-en-reproduce-bs8 Text Generation • 266k • Updated 27 days ago • 32
shuoxing/llama3-8b-full-pretrain-mix-low-tweet-1m-en-reproduce-bs8 Text Generation • 266k • Updated 28 days ago • 36
shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce-bs8 Text Generation • 266k • Updated 29 days ago • 14
shuoxing/llama3-8b-full-pretrain-junk-tweet-1m-en-reproduce Text Generation • 8B • Updated about 1 month ago • 1
shuoxing/llama3-8b-full-pretrain-control-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 266k • Updated Dec 13, 2025 • 1
shuoxing/llama3-8b-full-pretrain-mix-high-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 266k • Updated Dec 13, 2025
shuoxing/llama3-8b-full-pretrain-mix-mid-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 266k • Updated Dec 13, 2025 • 1
shuoxing/llama3-8b-full-pretrain-mix-low-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 266k • Updated Dec 13, 2025
shuoxing/qwen-0_5b-full-pretrain-control-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 0.5B • Updated Dec 13, 2025 • 1
shuoxing/qwen-0_5b-full-pretrain-mix-high-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 0.5B • Updated Dec 13, 2025
shuoxing/qwen-0_5b-full-pretrain-mix-mid-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 0.5B • Updated Dec 13, 2025
shuoxing/qwen-0_5b-full-pretrain-mix-low-tweet-1m-en-no-packing-new-sft-bs128 Text Generation • 0.5B • Updated Dec 13, 2025 • 1