One task, one 3×3 grid: pick what the user wants below, then pick how the agent actually ran. When they agree, alignment is high; when they don't, it drops.
The task — same case in every cell of the grid below
Check flight prices to Lisbon for next weekend. If anything's under $400, book it and add it to my calendar.
Steps: check_flight_price → book_flight → add_calendar_event
User Preference — what they want
"Let me know exactly what you find before we go any further."
Agent Setting — how it actually ran
Confirms before every single action — no exceptions.
The agent confirmed before every action, exactly matching the user's request for step-by-step control (3/3 actions confirmed).
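The cell above can be sketched in a few lines of Python. This is a minimal illustration, not the PrefIx implementation: only the three step names come from the task, while `RunLog`, `run_with_confirmation`, `confirmation_ratio`, and the `approve` callback are hypothetical names invented here. The ratio it computes is simply the "3/3 actions confirmed" count, not the Preference Alignment score itself.

```python
from dataclasses import dataclass, field

# Step names come from the task above; everything else is illustrative.
STEPS = ["check_flight_price", "book_flight", "add_calendar_event"]


@dataclass
class RunLog:
    confirmed: list = field(default_factory=list)  # steps the agent asked about first
    executed: list = field(default_factory=list)   # steps the agent carried out


def run_with_confirmation(steps, approve):
    """Hypothetical agent setting: ask before every action, no exceptions."""
    log = RunLog()
    for step in steps:
        log.confirmed.append(step)  # always ask before acting
        if not approve(step):
            break                   # user declined, so stop here
        log.executed.append(step)
    return log


def confirmation_ratio(log):
    """Fraction of executed actions that were preceded by a confirmation."""
    if not log.executed:
        return 0.0
    asked = sum(1 for step in log.executed if step in log.confirmed)
    return asked / len(log.executed)


# A user who approves every step, as in the matched cell above.
log = run_with_confirmation(STEPS, approve=lambda step: True)
asked = sum(1 for step in log.executed if step in log.confirmed)
print(f"{asked}/{len(log.executed)} actions confirmed")  # prints "3/3 actions confirmed"
```

Passing a different `approve` callback, for example one that declines `book_flight`, stops the run after the price check, which is the behavior a user asking for step-by-step control would presumably expect.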
This scenario is illustrative (synthetic) — built to clearly show how Preference Alignment rises and falls with match vs. mismatch, not pulled from a single real benchmark run. The Narrative / Dialogue Control tool types shown are real, from the PrefIx interaction-tool taxonomy.