Benchmark composition and balance, showing that the evaluation treats every preference setting and model equally.

| Statistic | Value |
|---|---|
| System Tool Classes | 10 |
| Interaction Tools | 13 (10 Narrative + 3 Control) |
| Preference Settings | 31 |
| Preference Attributes | 14 |
| Avg Samples / Setting | ~10 |
| Total Samples | 283 |

| Tool Name | Type | Description |
|---|---|---|
| Message_tool_invocation | Narrative (Type 1) | Announces which tool will be called and why. |
| Message_tool_invocation_logic | Narrative (Type 1) | Explains the reasoning behind tool selection. |
| Message_display_params | Narrative (Type 1) | Shows parameters being passed to a tool. |
| Message_source_report | Narrative (Type 1) | Reports the source of retrieved information. |
| Message_show_sequential_output | Narrative (Type 1) | Displays output of each step in a chain. |
| Message_show_layered_presentation | Narrative (Type 1) | Presents summary first, then detail on demand. |
| Message_tool_failure_notice | Narrative (Type 1) | Notifies user of a tool failure briefly. |
| Message_tool_failure_logic | Narrative (Type 1) | Explains the root cause when a tool fails. |
| Message_tool_switch_notice | Narrative (Type 1) | Informs user when switching to an alternative tool. |
| Message_tool_abort | Narrative (Type 1) | Halts the workflow and notifies user on failure. |
| Message_confirmation | Control (Type 2) | Requests explicit user confirmation before proceeding. |
| Message_information_seeking | Control (Type 2) | Requests missing information required to proceed. |
| Message_disambiguation | Control (Type 2) | Asks for clarification when intent is ambiguous. |

| Dimension | Attribute | Settings | Count |
|---|---|---|---|
| Transparency & Auditability | Tool Transparency | High, Medium, Low | 3 |
| Transparency & Auditability | Parameter Transparency | High, Medium, Low | 3 |
| Transparency & Auditability | Source Transparency | High, Low | 2 |
| Interaction Pace & Flow | Confirmation | Each, Silent, Batch | 3 |
| Interaction Pace & Flow | Presentation | Compact, Layered | 2 |
| Interaction Pace & Flow | Info Collection | Upfront, Gradual | 2 |
| Interaction Pace & Flow | Disambiguation | Upfront, Gradual | 2 |
| Interaction Pace & Flow | Chain Execution | Parallel, Sequential | 2 |
| Strategy & Initiative | Initiative | Proactive, Reactive | 2 |
| Strategy & Initiative | Tool Invocation | Single, Multiple | 2 |
| Robustness & Adaptability | Tool Abortion | Stop, Continue | 2 |
| Robustness & Adaptability | Tool Switching | High Agency, Low Agency | 2 |
| Robustness & Adaptability | Error Retry | Silent, Escalation | 2 |
| Robustness & Adaptability | Error Discovery | Brief, Detail | 2 |
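
The per-attribute counts can be cross-checked against the summary statistics with a few lines of Python. This is a minimal sketch, not part of the benchmark's code: the dictionary is transcribed by hand from the preference table, and the names `PREFERENCE_SETTINGS`, `TOTAL_SAMPLES`, and `summarize` are hypothetical.

```python
# Settings per attribute, transcribed from the preference table.
# Names here are illustrative and do not come from the benchmark itself.
PREFERENCE_SETTINGS = {
    "Tool Transparency": ["High", "Medium", "Low"],
    "Parameter Transparency": ["High", "Medium", "Low"],
    "Source Transparency": ["High", "Low"],
    "Confirmation": ["Each", "Silent", "Batch"],
    "Presentation": ["Compact", "Layered"],
    "Info Collection": ["Upfront", "Gradual"],
    "Disambiguation": ["Upfront", "Gradual"],
    "Chain Execution": ["Parallel", "Sequential"],
    "Initiative": ["Proactive", "Reactive"],
    "Tool Invocation": ["Single", "Multiple"],
    "Tool Abortion": ["Stop", "Continue"],
    "Tool Switching": ["High Agency", "Low Agency"],
    "Error Retry": ["Silent", "Escalation"],
    "Error Discovery": ["Brief", "Detail"],
}

TOTAL_SAMPLES = 283  # "Total Samples" row of the statistics table


def summarize(settings, total_samples):
    """Return (attribute count, setting count, mean samples per setting)."""
    n_attributes = len(settings)
    n_settings = sum(len(values) for values in settings.values())
    return n_attributes, n_settings, total_samples / n_settings


n_attributes, n_settings, avg_samples = summarize(PREFERENCE_SETTINGS, TOTAL_SAMPLES)
print(f"attributes={n_attributes}, settings={n_settings}, avg={avg_samples:.2f}")
```

The totals agree with the statistics table: 14 attributes and 31 settings. The exact mean is 283 / 31, roughly 9.1 samples per setting, which the table reports in rounded form as ~10.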