Model Comparison Overview

High-level comparison of agent behavior across all model and fixture combinations, showing issue reduction, tool call counts, and durations.

Probe Test Sequence

Basic MCP server connectivity probe and tool discovery. Validates server startup and linter enumeration (113 supported linters).

Simple Fixture — GLM-5.1

Simple GLM-5.1

8 issues reduced to 0 in 5m17s with 19 tool calls. The agent invokes the skill, runs golangci-lint, applies fixes, and verifies a clean build.

Simple Fixture — GLM-5-Turbo

Simple GLM-5-Turbo

8 issues reduced to 0 in 1m56s with 23 tool calls. Fastest reported completion among the simple fixture tests, despite four more tool calls than GLM-5.1.

Medium Fixture — GLM-5.1

Medium GLM-5.1

30 issues reduced to 0 in 14m44s with 35 tool calls. Demonstrates a sustained multi-pass fixing strategy.

Medium Fixture — GLM-5-Turbo

Medium GLM-5-Turbo

30 issues reduced to 0 in 8m51s with 36 tool calls. One more tool call than GLM-5.1 (36 vs. 35), but about 40% less wall-clock time.

Large Fixture — GLM-5.1

Large GLM-5.1

116 issues reduced to 0 in 27m15s with 27 tool calls. Fewest tool calls among the large fixture runs with reported counts, at roughly 4.3 issues resolved per call.

Large Fixture — GLM-5-Turbo

Large GLM-5-Turbo

116 issues reduced to 0 in 20m49s with 46 tool calls. Most tool calls of any run reported here, but the fastest large fixture completion.
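The figures reported above for the simple, medium, and large fixtures can be compared with a little arithmetic. The following is a minimal sketch, not part of the test harness: the `Run` type, the `reported` table, and the helper names are hypothetical and exist only for illustration, while the numbers are copied from the captions above. It derives issues resolved per tool call and the wall-clock ratio between GLM-5.1 and GLM-5-Turbo on each fixture.

```go
package main

import (
	"fmt"
	"time"
)

// Run holds the figures reported for one model/fixture combination.
// The type and field names are illustrative, not taken from the harness.
type Run struct {
	Fixture   string
	Model     string
	Issues    int // issues at the start; every run listed here ended at 0
	ToolCalls int
	Duration  time.Duration
}

// mustDuration parses strings such as "5m17s" and panics on malformed input.
func mustDuration(s string) time.Duration {
	d, err := time.ParseDuration(s)
	if err != nil {
		panic(err)
	}
	return d
}

// reported lists the runs in pairs: GLM-5.1 first, then GLM-5-Turbo.
var reported = []Run{
	{"simple", "GLM-5.1", 8, 19, mustDuration("5m17s")},
	{"simple", "GLM-5-Turbo", 8, 23, mustDuration("1m56s")},
	{"medium", "GLM-5.1", 30, 35, mustDuration("14m44s")},
	{"medium", "GLM-5-Turbo", 30, 36, mustDuration("8m51s")},
	{"large", "GLM-5.1", 116, 27, mustDuration("27m15s")},
	{"large", "GLM-5-Turbo", 116, 46, mustDuration("20m49s")},
}

// IssuesPerCall returns how many issues were resolved per tool call.
func IssuesPerCall(r Run) float64 {
	return float64(r.Issues) / float64(r.ToolCalls)
}

// Speedup returns how many times faster the second run was than the first.
func Speedup(slow, fast Run) float64 {
	return slow.Duration.Seconds() / fast.Duration.Seconds()
}

func main() {
	for i := 0; i+1 < len(reported); i += 2 {
		base, turbo := reported[i], reported[i+1]
		fmt.Printf("%-6s  %.2f vs %.2f issues/call, Turbo speedup %.2fx\n",
			base.Fixture, IssuesPerCall(base), IssuesPerCall(turbo), Speedup(base, turbo))
	}
}
```

Issues per call is only a rough efficiency proxy, since a single tool call may fix many diagnostics or none; it is used here because it is the only ratio the reported figures support.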

Multipkg Fixture — GLM-5.1

Multipkg GLM-5.1

Multi-package fixture showing cross-package linter issue resolution. Agent navigates imports across package boundaries.

Multipkg Fixture — GLM-5-Turbo

Multipkg GLM-5-Turbo

Multi-package fixture with GLM-5-Turbo. Resolves cross-package issues faster than GLM-5.1 and with higher tool call efficiency.

Simple Fixture — GLM-4.7

Simple GLM-4.7

Simple fixture test with the GLM-4.7 model. Demonstrates baseline issue resolution capability.

Medium Fixture — GLM-4.7

Medium GLM-4.7

Medium fixture test with the GLM-4.7 model. Shows a sustained multi-pass fixing strategy on a moderately complex codebase.

Large Fixture — GLM-4.7

Large GLM-4.7

Large fixture test with the GLM-4.7 model. Demonstrates issue resolution on a complex multi-file codebase.

Multipkg Fixture — GLM-4.7

Multipkg GLM-4.7

Multi-package fixture with the GLM-4.7 model. Resolves linter issues that span package boundaries.

Autofix Fixture — GLM-5-Turbo

Autofix GLM-5-Turbo

Autofix-only fixture with GLM-5-Turbo. Agent resolves perfsprint and whitespace issues that golangci-lint --fix can handle automatically.
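For readers unfamiliar with perfsprint, the sketch below shows the kind of rewrite it usually suggests: replacing `fmt` formatting calls with cheaper standard-library equivalents. The fixture's actual source is not shown in this document, so the function names and patterns here are assumptions chosen for illustration only.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Before: forms that perfsprint typically reports.
// These functions are hypothetical and not taken from the fixture.
func labelBefore(id int) string {
	return fmt.Sprintf("%d", id)
}

func errBefore() error {
	return fmt.Errorf("not found")
}

// After: equivalent code that avoids fmt's formatting machinery.
func labelAfter(id int) string {
	return strconv.Itoa(id)
}

func errAfter() error {
	return errors.New("not found")
}

func main() {
	// Both versions produce the same values; only the cost differs.
	fmt.Println(labelBefore(42) == labelAfter(42))
	fmt.Println(errBefore().Error() == errAfter().Error())
}
```

Because rewrites like these are mechanical and behavior-preserving, they are good candidates for automatic fixing rather than model-driven edits.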

Autofix Fixture — GLM-5.1

Autofix GLM-5.1

Autofix-only fixture with GLM-5.1. Demonstrates agent handling of automatically fixable perfsprint and whitespace diagnostics.

Autofix Fixture — GLM-4.7

Autofix GLM-4.7

Autofix-only fixture with GLM-4.7. Tests baseline model capability on simple, automatically fixable issues.