Integration Test Report

Generated: 2026-05-06 00:05:14

Pass Rate: 100% (15/15 tests)
Models Tested: GLM-4.7, GLM-5.1, GLM-5-Turbo
Fixtures: Large, Medium, Multipkg, Simple, Autofix
Avg Issue Reduction: 100%
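The two headline figures can be recomputed from the per-run Before/After counts in the Model Comparison table. The Python sketch below makes two assumptions that the report does not state. First, Reduction is (Before - After) / Before for each run. Second, Avg Issue Reduction is an unweighted mean over the 15 runs. `Run`, `reduction` and `summarize` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One fixture/model row from the comparison table (hypothetical type)."""
    fixture: str
    model: str
    before: int   # issues reported before the run
    after: int    # issues remaining after the run
    passed: bool  # Status == PASS


def reduction(run: Run) -> float:
    """Percent of issues removed; a fixture that starts clean counts as 100%."""
    if run.before == 0:
        return 100.0
    return 100.0 * (run.before - run.after) / run.before


def summarize(runs: list[Run]) -> tuple[float, float]:
    """Return (pass rate %, unweighted mean of per-run reduction %)."""
    pass_rate = 100.0 * sum(r.passed for r in runs) / len(runs)
    avg_reduction = sum(reduction(r) for r in runs) / len(runs)
    return pass_rate, avg_reduction


# Before counts per fixture, taken from the table; every run ends at 0 and passes.
BEFORE = {"Autofix": 4, "Large": 116, "Medium": 30, "Multipkg": 29, "Simple": 8}
MODELS = ["GLM-4.7", "GLM-5.1", "GLM-5-Turbo"]
runs = [Run(f, m, b, 0, True) for f, b in BEFORE.items() for m in MODELS]

print(summarize(runs))  # (100.0, 100.0)
```

Every run here starts with a nonzero count and ends at 0. An unweighted mean and a pooled figure (total issues removed divided by total issues before) therefore both give 100%. The two would only diverge if some fixtures were fixed only partially.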

Model Comparison

Fixture   Model        Before  After  Reduction  Builds  nolint  Tool Calls                Tokens  Cost   Retries  Status
Autofix   GLM-4.7      4       0      100.0%     -       0       3                         37.9k   $0.00  0        PASS
Autofix   GLM-5.1      4       0      100.0%     -       0       3                         37k     $0.00  0        PASS
Autofix   GLM-5-Turbo  4       0      100.0%     -       0       2                         24.2k   $0.00  0        PASS
Large     GLM-4.7      116     0      100.0%     -       0       112                       915.4k  $0.00  0        PASS
Large     GLM-5.1      116     0      100.0%     -       0       54                        1.6M    $0.00  0        PASS
Large     GLM-5-Turbo  116     0      100.0%     -       0       46 (119 via 6 subagents)  2.9M    $0.00  0        PASS
Medium    GLM-4.7      30      0      100.0%     -       0       82                        2.4M    $0.00  0        PASS
Medium    GLM-5.1      30      0      100.0%     -       0       29                        717.6k  $0.00  0        PASS
Medium    GLM-5-Turbo  30      0      100.0%     -       0       30                        1M      $0.00  0        PASS
Multipkg  GLM-4.7      29      0      100.0%     -       0       18 (74 via 4 subagents)   1.2M    $0.00  0        PASS
Multipkg  GLM-5.1      29      0      100.0%     -       0       7 (69 via 4 subagents)    999.2k  $0.00  0        PASS
Multipkg  GLM-5-Turbo  29      0      100.0%     -       0       15 (64 via 4 subagents)   1.1M    $0.00  0        PASS
Simple    GLM-4.7      8       0      100.0%     -       0       33                        532.6k  $0.00  0        PASS
Simple    GLM-5.1      8       0      100.0%     -       0       25                        581.9k  $0.00  0        PASS
Simple    GLM-5-Turbo  8       0      100.0%     -       0       32                        642.2k  $0.00  0        PASS
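The Tokens and Tool Calls columns use abbreviated formats ("37.9k", "1.6M", "46 (119 via 6 subagents)"). Before they can be compared across models, they need to be converted to numbers. The Python sketch below shows one way to parse them. `parse_tokens` and `parse_tool_calls` are hypothetical helpers. The sketch assumes that the parenthesized figure counts calls made inside subagents. The report does not say whether the leading number already includes those calls, so the two values are kept separate. Because the report rounds token counts to one decimal place, any totals derived from them are approximate.

```python
import re

# Multipliers for the suffixes used in the Tokens column.
_SCALE = {"": 1, "k": 1_000, "M": 1_000_000}


def parse_tokens(cell: str) -> int:
    """Convert an abbreviated count such as '37.9k' or '1.6M' to an integer."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([kM]?)", cell.strip())
    if m is None:
        raise ValueError(f"unrecognized token count: {cell!r}")
    # round() absorbs any binary floating-point error from the multiplication.
    return round(float(m.group(1)) * _SCALE[m.group(2)])


def parse_tool_calls(cell: str) -> tuple[int, int, int]:
    """Split '46 (119 via 6 subagents)' into (46, 119, 6); a plain '54' gives (54, 0, 0).

    Assumption: the parenthesized figure is the number of calls made inside
    subagents; the report does not say whether the leading number includes it.
    """
    m = re.fullmatch(r"(\d+)(?:\s*\((\d+) via (\d+) subagents?\))?", cell.strip())
    if m is None:
        raise ValueError(f"unrecognized tool-call cell: {cell!r}")
    main, sub_calls, n_subagents = m.groups()  # unmatched optional groups are None
    return int(main), int(sub_calls or 0), int(n_subagents or 0)


# Token cells per model, in fixture order: Autofix, Large, Medium, Multipkg, Simple.
TOKENS = {
    "GLM-4.7": ["37.9k", "915.4k", "2.4M", "1.2M", "532.6k"],
    "GLM-5.1": ["37k", "1.6M", "717.6k", "999.2k", "581.9k"],
    "GLM-5-Turbo": ["24.2k", "2.9M", "1M", "1.1M", "642.2k"],
}
totals = {model: sum(parse_tokens(c) for c in cells) for model, cells in TOKENS.items()}

print(totals)  # {'GLM-4.7': 5085900, 'GLM-5.1': 3935700, 'GLM-5-Turbo': 5666400}
print(parse_tool_calls("46 (119 via 6 subagents)"))  # (46, 119, 6)
```

The totals are based on the rounded table values. On that basis, GLM-5.1 used the fewest tokens across the five fixtures, at about 3.9M. GLM-4.7 used roughly 5.1M and GLM-5-Turbo roughly 5.7M.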