Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
JonnaMat 
posted an update 3 days ago
Post
2466
⚡ Blackwell-native Vision Reasoning at the edge ⚡

Released a NVFP4A16-variant of nvidia/Cosmos-Reason2-2B:
embedl/Cosmos-Reason2-2B-NVFP4A16

💖 Optimized for Blackwell with minimal accuracy drop compared to its FP16 counterpart.

Thorough on-device benchmarks on AGX Thor in the modelcard. 🤓 📊

Try it out:
docker run --rm -it \
  --network host \
  --shm-size=8g \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --runtime=nvidia \
  --name=vllm-serve \
  -e HF_TOKEN=hf_*** \
  -e HF_HOME=/root/.cache/huggingface \
  nvcr.io/nvidia/vllm:26.01-py3 \
  vllm serve "embedl/Cosmos-Reason2-2B-NVFP4A16" \
    --host 0.0.0.0 \
    --port 8000 \
    --tensor-parallel-size 1 \
    --max-model-len 16384 \
    --gpu-memory-utilization 0.9
In this post