Indeed, I was itching for someone to point out this sort of paradox (thank you); some results like this do not seem to make sense at first glance.
The biggest caveat in everything being done here is that, especially for these 25-game sets, there are sampling issues, and the sample may not be big enough for some statistics to stabilize.
Since you asked the question, the batting average and BABIP for the 4/15 set have changed quite a bit, so at least the direction of the change is no longer as contradictory as you mentioned.
In general, I tend to trust less those stats that pile up only a few at a time in a single game (like hits, doubles, HRs), because they converge to reliable averages only once the sample gets bigger.
And a 25-game set is still not that large a sample; it is actually a good example of how one anomalous game can skew the overall averages unless they are interpreted with care.
One game, CWS vs. MIN, was a crazy one: each team had 21 hits, and it ended with a score of 15-13. The two teams combined for 11 HRs (!). That game alone totally skewed the rest of the measurements, and I think it was a major reason why the contradiction you mentioned above happened.
If I "assume" that game did not happen, remove it from the sample, and re-compute the averages, the batting average (after 20 games now) is .259 and BABIP is .310, which are less contradictory and much closer to what I wanted to see happen by decreasing Strike Frequency by one.
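For anyone who wants to try the same re-computation, here is a rough sketch of how I would pool the totals with and without one game. The per-game lines below are made up for illustration (only the 42 hits and 11 HRs of the "outlier" line echo the game above; its at-bats, strikeouts, and sac flies are guesses), and the names `GameLine` and `pooled_rates` are just mine. BABIP is computed with the usual formula, (H - HR) / (AB - K - HR + SF).

```python
from dataclasses import dataclass


@dataclass
class GameLine:
    label: str
    ab: int  # at-bats, both teams combined
    h: int   # hits
    hr: int  # home runs
    k: int   # strikeouts
    sf: int  # sacrifice flies


def pooled_rates(games):
    """Return (batting average, BABIP) from totals summed over all games."""
    ab = sum(g.ab for g in games)
    h = sum(g.h for g in games)
    hr = sum(g.hr for g in games)
    k = sum(g.k for g in games)
    sf = sum(g.sf for g in games)
    avg = h / ab
    babip = (h - hr) / (ab - k - hr + sf)
    return avg, babip


# Hypothetical box-score totals, NOT the real numbers from my set.
games = [
    GameLine("game A", ab=68, h=17, hr=2, k=15, sf=1),
    GameLine("game B", ab=66, h=16, hr=1, k=14, sf=0),
    GameLine("game C", ab=70, h=18, hr=2, k=16, sf=1),
    GameLine("outlier", ab=84, h=42, hr=11, k=10, sf=1),
]

with_all = pooled_rates(games)
without_outlier = pooled_rates([g for g in games if g.label != "outlier"])

print(f"all games:       AVG {with_all[0]:.3f}  BABIP {with_all[1]:.3f}")
print(f"outlier removed: AVG {without_outlier[0]:.3f}  BABIP {without_outlier[1]:.3f}")
```

With only a handful of games in the pool, a single slugfest moves both numbers a lot, which is the same effect I saw in the real set.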
Of course, you cannot always do this sort of deliberate selection and simply reject data you do not like or that does not fit your personal idea of what should happen. But when you are working with a small sample and you know you happened to be extremely unlucky and hit an anomalous game or two, it is sometimes useful to "clip" data points at both ends (reject the highest and lowest "anomalies," just to be fair) to see what story the rest of the data tells. (It may be more useful to compute medians instead of averages in order to avoid this issue.)
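To make the "clip both ends" idea concrete, here is a small sketch comparing a plain average, a clipped (trimmed) average, and a median. The hits-per-game list is hypothetical; the 42 stands in for a game where both teams had 21 hits. `trimmed_mean` is my own helper name, not something from a library.

```python
from statistics import mean, median


def trimmed_mean(values, k=1):
    """Average after dropping the k lowest and k highest values."""
    if len(values) <= 2 * k:
        raise ValueError("not enough values left after trimming")
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])


# Hypothetical combined hits per game for a 10-game stretch.
hits_per_game = [15, 17, 16, 18, 14, 19, 42, 16, 17, 13]

plain = mean(hits_per_game)
clipped = trimmed_mean(hits_per_game, k=1)
mid = median(hits_per_game)

print(f"plain mean:   {plain:.1f}")
print(f"clipped mean: {clipped:.1f}")
print(f"median:       {mid:.1f}")
```

The clipped average and the median barely notice the one wild game, while the plain average gets pulled up by it. The trade-off is that you are throwing away information, so this is only a sanity check, not a replacement for a bigger sample.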
Still, the difficult part is deciding what counts as anomalous... it's still possible that this slider combination is prone to producing that kind of "anomalous" game. To be fully convinced, I may need a bigger sample to see how often that sort of game happens... things are not quite straightforward.