All Experiments

Experiment 3

Adversarial Visual Patches on Generalist Robot Policies

Test whether the language conditioning of a vision-language-action policy can be cleanly hijacked through the vision channel alone — a fixed RGB patch in the input image, no weight changes, no prompt access, no environment dynamics changes.
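The threat model reduces to a single compositing step: a fixed pixel region of the camera frame is overwritten before the policy sees it. A minimal sketch, assuming numpy frames; the placement offsets and frame size are illustrative (the notes only give the patch size):

```python
import numpy as np

# Hypothetical compositing step: the attack only overwrites a fixed pixel
# region of the incoming RGB frame; policy weights, the prompt, and the
# simulator dynamics are untouched.
PATCH_H, PATCH_W = 84, 320          # patch size reported in sub-experiment 3.1
TOP, LEFT = 0, 0                    # assumed placement (not stated in the notes)

def apply_patch(frame: np.ndarray, patch: np.ndarray) -> np.ndarray:
    """Return a copy of the RGB frame with the fixed patch pasted in."""
    out = frame.copy()
    out[TOP:TOP + PATCH_H, LEFT:LEFT + PATCH_W, :] = patch
    return out

frame = np.zeros((480, 640, 3), dtype=np.uint8)    # illustrative camera frame
patch = np.full((PATCH_H, PATCH_W, 3), 255, dtype=np.uint8)
patched = apply_patch(frame, patch)
print(patched[0, 0, 0], patched[200, 400, 0])      # → 255 0
```

Because the frame is copied rather than mutated, the clean observation stays available for the teacher rollouts used during patch optimisation.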

A patch optimised by action-sequence distillation — supervising the patched policy's action chunks against a clean teacher's action chunks — can override the language signal and redirect the arm to a target object other than the one the operator names.
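The objective above can be sketched end to end. This is a toy stand-in, not the actual attack: the real policy is GR00T N1.6 and non-linear, whereas here a frozen random linear map from pixels to an action chunk plays the policy, so the distillation gradient is analytic and the script stays self-contained. Only the patch pixels are updated; weights stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, C = 16, 16, 3        # toy frame; the real patch was 84x320 RGB
A = 12                     # action-chunk size (e.g. 4 steps x 3 dof), illustrative

# Frozen "policy": a random linear map standing in for the VLA model.
policy_W = rng.normal(scale=0.1, size=(A, H * W * C))

def policy(img: np.ndarray) -> np.ndarray:
    """Action chunk predicted from a flattened frame."""
    return policy_W @ img.ravel()

scene = rng.uniform(size=(H, W, C))     # clean observation
teacher_chunk = rng.normal(size=A)      # teacher's action chunk toward the
                                        # hijack target (illustrative stand-in)

# Only the patch pixels (an 8x8 corner here) are optimised, by gradient
# descent on || policy(patched frame) - teacher_chunk ||^2.
mask = np.zeros((H, W, C))
mask[:8, :8, :] = 1.0
patch = rng.uniform(size=(H, W, C)) * mask

for _ in range(1000):
    patched = scene * (1 - mask) + patch * mask
    err = policy(patched) - teacher_chunk                  # residual action error
    grad = (policy_W.T @ (2 * err)).reshape(H, W, C) * mask
    patch -= 0.1 * grad    # a real attack would also project to valid pixel values

patched = scene * (1 - mask) + patch * mask
loss = float(np.sum((policy(patched) - teacher_chunk) ** 2))
print(loss < 1e-6)         # patched actions now match the teacher chunk
```

With a real VLA the analytic gradient would be replaced by backpropagation through the frozen policy, and the patch would be clamped to valid pixel values each step; the loss and the patch-only update are the parts that carry over.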

Sub-Experiments

3.1
Hijacking GR00T N1.6 with a Visible Patch
A fixed 84×320 RGB patch redirects nvidia/GR00T-N1.6-fractal across prompts: with the patch in view and the policy receiving “pick up the orange”, the arm picks up the bottle in 5 / 10 simulator-marked rollouts on a fixed 3-object scene.
First result — 50%
One sub-experiment landed at 50% on a single scene seed. Transferability across seeds, embodiments, prompts, and tasks remains open and is the natural next step.