Outside of Saves, how did the replacement closer compare statistically to the established closer before the swap was made?
It sounds like your complaint isn't necessarily that stats weren't used enough to set the pitching roles, but rather that the AI isn't respecting the inertia of team roles the way a real team might.
That might be an issue in its own right, but it wouldn't be one with the evaluation logic.