Personalization doesn't help every preference category equally. These charts break the aggregate numbers down by the 4 preference categories to show where adaptation pays off most.
Transparency & Auditability sees the largest alignment gains from personalization — agents are already better at adapting to transparency-related preferences than to ones requiring holistic changes to global interaction patterns (Strategy & Initiative, Robustness & Adaptability).
Robustness & Adaptability is the dominant driver of Interaction Efficiency gains; Transparency & Auditability dominates Cognitive Load and Initiative Timing gains — evidence that alignment isn't monolithic, and different preference categories pull different UX levers.
| Category | Initiative | Coherence | Alignment Drift | Consistency | Efficiency | Cognitive Load | Overall UX |
|---|---|---|---|---|---|---|---|
| Transparency & Auditability | +0.39 | +0.16 | +0.17 | +0.24 | +0.20 | +0.48 | +0.49 |
| Interaction Pace & Flow | +0.28 | +0.09 | +0.10 | +0.15 | +0.18 | +0.27 | +0.37 |
| Strategy & Initiative | +0.26 | +0.10 | +0.10 | +0.07 | +0.14 | +0.16 | +0.29 |
| Robustness & Adaptability | +0.25 | +0.19 | +0.13 | +0.10 | +0.23 | +0.23 | +0.31 |
In the heatmap version of this table, darker cells mark larger gains, and the outlined cell in each column marks the category with the largest gain on that UX dimension.
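The "dominant category" reading of the table above can be reproduced with a per-column maximum. The following is a minimal sketch, not the authors' analysis code: the values are copied from the table, and the names `GAINS`, `DIMENSIONS`, and `dominant_categories` are hypothetical, introduced only for illustration.

```python
# Gains copied verbatim from the table above; one list entry per UX dimension.
DIMENSIONS = ["Initiative", "Coherence", "Alignment Drift", "Consistency",
              "Efficiency", "Cognitive Load", "Overall UX"]

GAINS = {
    "Transparency & Auditability": [0.39, 0.16, 0.17, 0.24, 0.20, 0.48, 0.49],
    "Interaction Pace & Flow":     [0.28, 0.09, 0.10, 0.15, 0.18, 0.27, 0.37],
    "Strategy & Initiative":       [0.26, 0.10, 0.10, 0.07, 0.14, 0.16, 0.29],
    "Robustness & Adaptability":   [0.25, 0.19, 0.13, 0.10, 0.23, 0.23, 0.31],
}


def dominant_categories(gains, dimensions):
    """Return {dimension: (category, gain)} for the largest gain in each column."""
    result = {}
    for col, dim in enumerate(dimensions):
        # Pick the category whose gain in this column is the largest.
        category = max(gains, key=lambda name: gains[name][col])
        result[dim] = (category, gains[category][col])
    return result


DOMINANT = dominant_categories(GAINS, DIMENSIONS)

if __name__ == "__main__":
    for dim, (category, gain) in DOMINANT.items():
        print(f"{dim:16s} {category} (+{gain:.2f})")
```

Run on the table's values, this picks Robustness & Adaptability for Efficiency and Coherence, and Transparency & Auditability for every other column.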
Even with preference history, alignment averages only 3.882/5.0 — a consistent but unsaturated +18.5% gain, not a solved problem.
Interaction Efficiency scores lowest of all UX dimensions in both the baseline and personalized conditions: reducing unnecessary steps stays hard.
Specific preference categories disproportionately drive specific UX gains (e.g., Robustness & Adaptability → Efficiency), guiding where to focus model training.
Claude Opus 4.5 and Kimi K2 show more modest UX gains (+3.6–3.8%) from personalization, suggesting stronger baseline models have limited room for further improvement.
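To make "unsaturated" concrete, the alignment figures quoted above (3.882/5.0 and +18.5%) can be turned into remaining headroom on the 5-point scale. This is a rough sketch under one explicit assumption: that the +18.5% is a relative gain over the no-history baseline mean. The source does not state the baseline directly, so the implied value below is an illustration, not a reported result, and all variable names are hypothetical.

```python
# Figures quoted in the takeaways above.
PERSONALIZED_MEAN = 3.882   # mean alignment with preference history, out of 5.0
RELATIVE_GAIN = 0.185       # the "+18.5%" figure
SCALE_MAX = 5.0

# ASSUMPTION: +18.5% is measured relative to the baseline (no-history) mean.
# Under that reading, the baseline mean would be roughly 3.28.
implied_baseline = PERSONALIZED_MEAN / (1 + RELATIVE_GAIN)

# Distance still left to the top of the scale after personalization.
headroom = SCALE_MAX - PERSONALIZED_MEAN
headroom_share = headroom / SCALE_MAX

if __name__ == "__main__":
    print(f"implied baseline (assumed relative gain): {implied_baseline:.3f}")
    print(f"headroom to 5.0: {headroom:.3f} ({headroom_share:.1%} of the scale)")
```

Whatever the exact baseline, the headroom figure depends only on the reported 3.882 mean: more than a full point of the scale remains after personalization.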
Category aggregates computed from the real per-task LLM-judge scores across all 31 preference settings and 4 models (not estimated). Takeaways above are drawn from the authors' own discussion of results (ACL rebuttal) and arXiv 2602.06714v1.