PlantUML sequence diagrams showing MCP tool interactions during integration test runs across models and fixture sizes
High-level comparison of agent behavior across all model and fixture combinations, showing issue reduction, tool call counts, and durations.
Basic MCP server connectivity probe and tool discovery. Validates server startup and linter enumeration (113 supported linters).
8 issues reduced to 0 in 5m17s with 19 tool calls. Agent calls skill, runs golangci-lint, applies fixes, and verifies clean build.
8 issues reduced to 0 in 1m56s with 23 tool calls. Fastest completion among simple fixture tests.
30 issues reduced to 0 in 14m44s with 35 tool calls. Demonstrates sustained multi-pass fixing strategy.
30 issues reduced to 0 in 8m51s with 36 tool calls. Tool call count nearly identical to GLM-5.1's, but completion is significantly faster.
116 issues reduced to 0 in 27m15s with 27 tool calls. Most efficient tool usage across all large fixture tests.
116 issues reduced to 0 in 20m49s with 46 tool calls. Most tool calls but fastest large fixture completion.
Multi-package fixture showing cross-package linter issue resolution. Agent navigates imports across package boundaries.
Multi-package fixture with GLM-5-Turbo. Faster cross-package resolution with higher tool call efficiency.
Simple fixture test with GLM-4.7 model. Demonstrates baseline issue resolution capability.
Medium fixture test with GLM-4.7 model. Shows sustained multi-pass fixing strategy on a moderately complex codebase.
Large fixture test with GLM-4.7 model. Demonstrates issue resolution capability on complex multi-file codebases.
Multi-package fixture with GLM-4.7 model. Cross-package linter issue resolution across package boundaries.
Autofix-only fixture with GLM-5-Turbo. Agent resolves perfsprint and whitespace issues that golangci-lint --fix can handle automatically.
Autofix-only fixture with GLM-5.1. Demonstrates agent handling of automatically-fixable perfsprint and whitespace diagnostics.
Autofix-only fixture with GLM-4.7. Tests baseline model capability on simple autofixable issues.
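
The issue counts, durations, and tool call counts quoted in the summaries above can be turned into comparable ratios. The following is a minimal sketch, not part of the test suite: the run labels, the `parse_duration` and `summarize` helpers, and the assumption that the 8-, 30-, and 116-issue runs correspond to the simple, medium, and large fixtures are all illustrative. The summaries do not name the model for each of these runs, so the labels only record their order.

```python
import re

# Figures copied from the run summaries above:
# (label, issues fixed, duration, tool calls).
# Labels are hypothetical; they reflect listing order, not model names.
RUNS = [
    ("simple, first run", 8, "5m17s", 19),
    ("simple, second run", 8, "1m56s", 23),
    ("medium, first run", 30, "14m44s", 35),
    ("medium, second run", 30, "8m51s", 36),
    ("large, first run", 116, "27m15s", 27),
    ("large, second run", 116, "20m49s", 46),
]


def parse_duration(text):
    """Convert a string such as '5m17s' into whole seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = (int(part) for part in match.groups())
    return minutes * 60 + seconds


def summarize(runs):
    """Derive per-run ratios from the raw figures."""
    rows = []
    for label, issues, duration, calls in runs:
        seconds = parse_duration(duration)
        rows.append(
            {
                "label": label,
                "seconds": seconds,
                "issues_per_call": issues / calls,
                "seconds_per_issue": seconds / issues,
            }
        )
    return rows


if __name__ == "__main__":
    for row in summarize(RUNS):
        print(
            f"{row['label']:<20} "
            f"{row['issues_per_call']:.2f} issues/call  "
            f"{row['seconds_per_issue']:.1f} s/issue"
        )
```

Ratios like these make it easier to compare runs on fixtures of different sizes, since raw tool call counts and durations both grow with the number of issues.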