nvidia/Alpamayo-R1-10B 4-bit Model

이λͺ¨λΈμ€ μžμœ¨μ£Όν–‰ 쀑 μˆ˜μ§‘λœ λ°μ΄ν„°λ‘œ 이벀트λ₯Ό μ˜ˆμΈ‘ν•˜λŠ” μš©λ„λ‘œ ν™œμš©ν•  수 μžˆμŠ΅λ‹ˆλ‹€. μžμœ¨μ£Όν–‰μ„ ν•˜λŠ”κ²Œ μ•„λ‹ˆλΌ μžμœ¨μ£Όν–‰ 쀑 νŠΉμ • 상황이 λ°œμƒν•  것을 μ•Œλ €μ£ΌλŠ” κΈ°λŠ₯을 ν•©λ‹ˆλ‹€.

Download the model to ./Alpamayo-R1-10B-4bit.

Runs on GPUs with 12 GB or 16 GB of memory.

With 12 GB, num_frames of 1 to 8 works; anything higher causes an OOM.

Requires transformers 4.57.5 (does not run on 5.0.0rc).
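
To fail fast on an incompatible environment, a minimal check like the following can go at the top of the scripts below (the exact version bounds are an assumption based on the behavior described above):

import transformers
from packaging import version

# Tested with transformers 4.57.5; the 5.0.0 release candidates do not work.
v = version.parse(transformers.__version__)
assert version.parse("4.57") <= v < version.parse("5.0.0rc0"), (
    f"transformers {transformers.__version__} is untested here; install 4.57.5"
)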

The base nvidia/Alpamayo-R1-10B requires a large amount of memory, so this checkpoint was loaded in 4-bit and saved. That makes it runnable on 12 GB, with the frame-count and transformers-version caveats noted above.
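
For reference, a 4-bit checkpoint like this one can be produced roughly as follows. This is a sketch that assumes AlpamayoR1.from_pretrained forwards quantization_config to the underlying transformers loader (bitsandbytes NF4); it is not necessarily the exact recipe used for this upload:

import torch
from transformers import BitsAndBytesConfig
from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1

# 4-bit NF4 weights with bfloat16 compute.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AlpamayoR1.from_pretrained(
    "nvidia/Alpamayo-R1-10B", quantization_config=quant_config
)
model.save_pretrained("./Alpamayo-R1-10B-4bit")  # the model_path used below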

Install the package:

git clone https://github.com/NVlabs/alpamayo
cd alpamayo
pip install .

Before running pip install, though, it is best to edit pyproject.toml:
if you use Python 3.13, set requires-python = "==3.13.*",
and remove the transformers and torch lines from the dependencies so the install does not replace the versions you already have.
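
Illustratively, the edit looks something like this (the actual contents of the upstream pyproject.toml are an assumption; the point is the requires-python pin and the two removed dependency lines):

[project]
requires-python = "==3.13.*"  # match the Python you actually use (3.13 here)
dependencies = [
    # "torch>=2.x",         # removed so your installed torch is kept
    # "transformers>=4.x",  # removed so your installed transformers is kept
    # ...other entries left unchanged
]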

import torch
import numpy as np
from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1
from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset
from alpamayo_r1 import helper

model_path = "Alpamayo-R1-10B-4bit"
model = AlpamayoR1.from_pretrained(model_path, dtype=torch.bfloat16).to("cuda")

processor = helper.get_processor(model.tokenizer)

clip_id = "030c760c-ae38-49aa-9ad8-f5650a545d26"
print(f"Loading dataset for clip_id: {clip_id}...")
# Requires a Hugging Face access token (set HF_TOKEN or run huggingface-cli login).
data = load_physical_aiavdataset(clip_id, t0_us=15_100_000, num_frames=1)
print("Dataset loaded.")

messages = helper.create_message(data["image_frames"].flatten(0, 1))

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=False,
    continue_final_message=True,
    return_dict=True,
    return_tensors="pt",
)

model_inputs = {
    "tokenized_data": inputs,
    "ego_history_xyz": data["ego_history_xyz"],
    "ego_history_rot": data["ego_history_rot"],
}

model_inputs = helper.to_device(model_inputs, "cuda")
torch.cuda.manual_seed_all(42)
with torch.autocast("cuda", dtype=torch.bfloat16):
    pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
        data=model_inputs,
        top_p=0.98,
        temperature=0.6,
        num_traj_samples=1,  # Feel free to raise this for more output trajectories and CoC traces.
        max_generation_length=256,
        return_extra=True,
    )

 
# The CoC text has shape [batch_size, num_traj_sets, num_traj_samples].
print("Chain-of-Causation (per trajectory):\n", extra["cot"][0])

gt_xy = data["ego_future_xyz"].cpu()[0, 0, :, :2].T.numpy()          # (2, T) ground-truth xy
pred_xy = pred_xyz.cpu().numpy()[0, 0, :, :, :2].transpose(0, 2, 1)  # (num_samples, 2, T) predictions
diff = np.linalg.norm(pred_xy - gt_xy[None, ...], axis=1).mean(-1)   # ADE per sampled trajectory
min_ade = diff.min()                                                 # best (minimum) ADE across samples
print("minADE:", min_ade, "meters")
print(
    "Note: VLA-reasoning models produce nondeterministic outputs due to trajectory sampling, "
    "hardware differences, etc. With num_traj_samples=1 (set for GPU memory compatibility), "
    "variance in minADE is expected. For visual sanity checks, see notebooks/inference.ipynb"
)
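
If memory allows, raising num_traj_samples produces several candidate trajectories, and the arrays above identify the best one. A small follow-up sketch using the same variables (the per-sample CoC indexing assumes the [batch, traj_sets, samples] layout noted in the comment above):

best_idx = int(diff.argmin())      # sampled trajectory that achieved the minADE
best_pred_xy = pred_xy[best_idx]   # (2, T) xy path of the best sample
print("Best sample:", best_idx, "ADE:", float(diff[best_idx]), "meters")
print("Its CoC:", extra["cot"][0][0][best_idx])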



Sample output:

Chain-of-Causation (per trajectory):
[['Nudge to the left to pass the stopped truck encroaching into the lane.']]
minADE: 1.7749525 meters
Note: VLA-reasoning models produce nondeterministic outputs due to trajectory sampling, hardware differences, etc. With num_traj_samples=1 (set for GPU memory compatibility), variance in minADE is expected. For visual sanity checks, see notebooks/inference.ipynb

To test inference from a single image, I put together the example below. It skips dataset loading and starts from a default initialization at the origin. You need a GPU with at least 12 GB to run it, and the response latency is considerable, so it seems impractical to deploy in a real car.

# Zero-state initialization with a single base image (one photo loaded from disk)
import torch
import numpy as np
from PIL import Image
from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1
from alpamayo_r1 import helper
 
num_history_steps = 16   # number of past (history) steps
num_future_steps = 64    # number of future steps

# Dummy position data (xyz coordinates)
ego_history_xyz = torch.zeros((1, 1, num_history_steps, 3))   # (batch, agent, steps, xyz)
ego_future_xyz  = torch.zeros((1, 1, num_future_steps, 3))

# Dummy rotation data (identity 3x3 rotation matrices)
ego_history_rot = torch.eye(3).repeat(1, 1, num_history_steps, 1, 1)  # (1,1,steps,3,3)
ego_future_rot  = torch.eye(3).repeat(1, 1, num_future_steps, 1, 1)

print("ego_history_xyz:", ego_history_xyz.shape)
print("ego_future_xyz:", ego_future_xyz.shape)
print("ego_history_rot:", ego_history_rot.shape)
print("ego_future_rot:", ego_future_rot.shape)
N_cameras = 1
camera_indices = torch.arange(N_cameras, dtype=torch.long)  # (N_cameras,), explicit long dtype

data = {
    "camera_indices": camera_indices,    # (N_cameras,)
    "ego_history_xyz": ego_history_xyz,  # (1, 1, num_history_steps, 3)
    "ego_history_rot": ego_history_rot,  # (1, 1, num_history_steps, 3, 3)
    "ego_future_xyz": ego_future_xyz,    # (1, 1, num_future_steps, 3)
    "ego_future_rot": ego_future_rot,    # (1, 1, num_future_steps, 3, 3)
    # "relative_timestamps": relative_timestamps,  # (N_cameras, num_frames)
    # "absolute_timestamps": absolute_timestamps,  # (N_cameras, num_frames)
}
img_path = "IMG_20260116_065921.jpg"  # path to the JPG file you want to run prediction on
image = Image.open(img_path).convert("RGB")

# helper.create_message expects a tensor, so convert:
# PIL Image -> numpy float32 array, normalized to the 0-1 range.
image_array = np.array(image).astype(np.float32) / 255.0
image_tensor = torch.from_numpy(image_array).unsqueeze(0)  # [batch, H, W, C]
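# NOTE (assumption): the dataset example above feeds frames flattened from
# (cams, frames, C, H, W); if helper.create_message expects channel-first
# input, an image_tensor.permute(0, 3, 1, 2) may be needed here.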
# Build the chat message from the single frame.
messages = helper.create_message(image_tensor)
 
# Load the 4-bit model and its processor.
model_path = "Alpamayo-R1-10B-4bit"
model = AlpamayoR1.from_pretrained(model_path, dtype=torch.bfloat16).to("cuda")
processor = helper.get_processor(model.tokenizer)

# Tokenize the chat-formatted messages.
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=False,
    continue_final_message=True,
    return_dict=True,
    return_tensors="pt",
)

model_inputs = {
    "tokenized_data": inputs,
    "ego_history_xyz": data["ego_history_xyz"],
    "ego_history_rot": data["ego_history_rot"],
}

model_inputs = helper.to_device(model_inputs, "cuda")

torch.cuda.manual_seed_all(42)
with torch.autocast("cuda", dtype=torch.bfloat16):
    pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
        data=model_inputs,
        top_p=0.98,
        temperature=0.6,
        num_traj_samples=1,  # Feel free to raise this for more output trajectories and CoC traces.
        max_generation_length=256,
        return_extra=True,
    )

# the size is [batch_size, num_traj_sets, num_traj_samples]
print("Chain-of-Causation (per trajectory):\n", extra["cot"][0])
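
# NOTE: ego_future_xyz is all zeros in this example, so the minADE computed
# below measures distance from the origin rather than error against a real
# ground-truth trajectory.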

gt_xy = data["ego_future_xyz"].cpu()[0, 0, :, :2].T.numpy()          # (2, T) "ground-truth" xy
pred_xy = pred_xyz.cpu().numpy()[0, 0, :, :, :2].transpose(0, 2, 1)  # (num_samples, 2, T) predictions
diff = np.linalg.norm(pred_xy - gt_xy[None, ...], axis=1).mean(-1)   # ADE per sampled trajectory
min_ade = diff.min()                                                 # best (minimum) ADE across samples
print("minADE:", min_ade, "meters")
print(
    "Note: VLA-reasoning models produce nondeterministic outputs due to trajectory sampling, "
    "hardware differences, etc. With num_traj_samples=1 (set for GPU memory compatibility), "
    "variance in minADE is expected. For visual sanity checks, see notebooks/inference.ipynb"
)

Sample output:

Chain-of-Causation (per trajectory):
[['Keep lane to continue driving since the lane ahead is clear.']]
minADE: 0.55852604 meters
Note: VLA-reasoning models produce nondeterministic outputs due to trajectory sampling, hardware differences, etc. With num_traj_samples=1 (set for GPU memory compatibility), variance in minADE is expected. For visual sanity checks, see notebooks/inference.ipynb
Safetensors: 11B params (tensor types F32, F16, U8)

Model tree for dwko/Alpamayo-R1-10B-4bit: quantized from nvidia/Alpamayo-R1-10B.