A probe is a linear classifier trained on the model's residual stream — the
running sum that flows between transformer layers at a single token position.
Green dots are labeled traces where the model was genuinely confident; red dots
where it was confused and wrong. Gold dots mark the frontier: correct answers
from an internally uncertain model — signal that output evaluation alone cannot
capture.