Pick an open-source Vision–Language–Action (VLA) model to fine-tune on your data. All models share the same training and inference contract.
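To make the shared-contract claim concrete, here is a minimal sketch of what a uniform interface can look like, assuming two entry points per model: fine-tune on a dataset and predict an action chunk from an observation. The class, field, and method names below are illustrative assumptions, not Positronic's actual API.

```python
# Hypothetical sketch of a uniform train/infer contract (illustrative only;
# names and signatures are assumptions, not the actual Positronic interface).
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Observation:
    images: dict              # camera name -> image array
    state: Sequence[float]    # proprioceptive state (joint positions, gripper, ...)
    instruction: str          # natural-language task description


class VLAPolicy(ABC):
    """Assumed shape of the contract every model on this page exposes."""

    @abstractmethod
    def finetune(self, dataset_dir: str, output_dir: str) -> None:
        """Fine-tune on a directory of demonstrations and write a checkpoint."""

    @abstractmethod
    def act(self, obs: Observation) -> Sequence[Sequence[float]]:
        """Return a chunk of future actions for the current observation."""
```

Because every model sits behind the same two calls, switching models means pointing the same data and the same inference loop at a different implementation.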
Generalist VLA. SigLIP vision tower + Gemma-2B LLM + a dedicated 300M-parameter action expert. Strong on pick-and-place and household manipulation.
Foundation VLA from NVIDIA. Eagle3 vision-language backbone + diffusion-based action head. Multi-embodiment — works on arms, humanoids and mobile bases.
Efficient open VLA from the LeRobot team. Best when iteration speed and edge deployment matter more than peak quality.
Action Chunking Transformer. The classic single-task imitation-learning baseline: small, fast, predictable. A sketch of the action-chunking idea follows this list.
Generative world-model policy. Predicts video and actions per chunk — strong on multi-task, long-horizon behaviour. Heavier to fine-tune.
Bring a Docker container that implements the Positronic training and inference command lines. Push it to your registry, point Positronic at the image, and your model shows up here.
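The actual Positronic command lines are defined by the framework and are not reproduced here. As a hedged sketch of the general shape, a container entrypoint that dispatches hypothetical `train` and `serve` subcommands could look like this; every flag and subcommand name below is an assumption for illustration only.

```python
#!/usr/bin/env python3
# Hypothetical container entrypoint (illustrative only; the real Positronic
# command lines and flag names may differ).
import argparse


def train(data_dir: str, out_dir: str) -> None:
    # Load demonstrations from data_dir, fine-tune the model, and write
    # checkpoints to out_dir. Placeholder for your training code.
    print(f"training on {data_dir}, writing checkpoints to {out_dir}")


def serve(checkpoint: str, port: int) -> None:
    # Load the checkpoint and answer inference requests on the given port.
    # Placeholder for your inference server.
    print(f"serving {checkpoint} on port {port}")


def main() -> None:
    parser = argparse.ArgumentParser(description="Example VLA container entrypoint")
    sub = parser.add_subparsers(dest="command", required=True)

    p_train = sub.add_parser("train")
    p_train.add_argument("--data-dir", required=True)
    p_train.add_argument("--out-dir", required=True)

    p_serve = sub.add_parser("serve")
    p_serve.add_argument("--checkpoint", required=True)
    p_serve.add_argument("--port", type=int, default=8000)

    args = parser.parse_args()
    if args.command == "train":
        train(args.data_dir, args.out_dir)
    else:
        serve(args.checkpoint, args.port)


if __name__ == "__main__":
    main()
```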
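For the Action Chunking Transformer entry above, the defining idea is that the policy predicts a chunk of the next few actions per call rather than a single action, and overlapping chunks can be blended with the temporal ensembling described in the ACT paper. The sketch below uses a random placeholder policy and made-up dimensions; only the chunking and weighting logic is the point.

```python
# Minimal sketch of action chunking with ACT-style temporal ensembling.
# The policy and its dimensions are placeholders, not a trained model.
import numpy as np

CHUNK = 8      # actions predicted per policy call
ACT_DIM = 7    # e.g. 6 joint deltas + gripper
STEPS = 20
M = 0.1        # ensembling coefficient: smaller = reacts faster to new chunks


def policy(obs: np.ndarray) -> np.ndarray:
    """Stand-in for a trained ACT model: returns CHUNK future actions."""
    return np.random.randn(CHUNK, ACT_DIM)


# buffer[t] collects every prediction made for timestep t by past policy calls
buffer = [[] for _ in range(STEPS + CHUNK)]
obs = np.zeros(10)

for t in range(STEPS):
    chunk = policy(obs)                  # query the policy every step
    for i in range(CHUNK):               # file each predicted action under its timestep
        buffer[t + i].append(chunk[i])

    preds = np.stack(buffer[t])           # all predictions that cover timestep t
    w = np.exp(-M * np.arange(len(preds)))  # oldest prediction gets the largest weight
    action = (w[:, None] * preds).sum(axis=0) / w.sum()
    # send `action` to the robot here; obs would then be refreshed from sensors
```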