The idea of an algorithm that automatically scans scientific articles for the results of common statistical tests and checks whether the reported p values are consistent with the reported test statistics and degrees of freedom seems straightforward. Statcheck performs this, well, stat check. A large number of published papers have now been evaluated automatically, and the outcomes were posted on PubPeer.
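To make the basic idea concrete, here is a minimal Python sketch of a naive consistency check for a chi-square test: recompute p from the reported statistic and degrees of freedom, round it to as many decimals as the reported p, and compare. This is not statcheck's actual implementation (statcheck is an R package). The function names `chi2_sf` and `naive_check` are hypothetical. The sketch also ignores that the reported test statistic is itself rounded, which a real checker should account for. The p value uses the closed-form sums for integer degrees of freedom, so no SciPy is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail p value P(X >= x) for a chi-square statistic with integer df.

    Uses closed-form sums for integer degrees of freedom instead of SciPy.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2 * normal upper tail at sqrt(x)
    #   + 2 * normal density at sqrt(x) * sum_{r=1}^{(df-1)/2} sqrt(x)^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    normal_tail = 0.5 * math.erfc(chi / math.sqrt(2))
    normal_pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)  # next term: multiply by chi^2 / (2r - 1)
        total += term
    return 2 * normal_tail + 2 * normal_pdf * total


def naive_check(stat: float, df: int, reported_p: float, decimals: int = 3) -> bool:
    """Hypothetical check: does the recomputed p, rounded like the reported one, match it?"""
    recomputed = chi2_sf(stat, df)
    return math.isclose(round(recomputed, decimals), reported_p)


# The values from Schult et al. (2016): chi-square(33) = 59.11, reported p = .004
print(chi2_sf(59.11, 33))
print(naive_check(59.11, 33, 0.004))
```

Rounding the recomputed p to the reported number of decimals is the simplest possible rule; it is exactly the kind of comparison that trips over results that were rounded more than once.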
So far, none of the (two) papers on PubPeer that I co-authored has raised an error flag. That’s reassuring. I went and (stat)checked my other publications, and lo and behold: there was indeed an inconsistency in one of them. In Schult et al. (2016) I reported “chi-square(33) = 59.11, p = .004”. Statcheck expected p = .003. The cause of this discrepancy is double rounding, that is, rounding an already rounded result. The Mplus output showed a chi-square value of 59.109 and a p value of 0.0035. I rounded both values to make the results more readable, accepting that, for example, a value such as 0.00347something, shown by Mplus as 0.0035, would be mistakenly rounded to 0.004 instead of 0.003. For the record: whenever a test statistic’s p value is close to the chosen alpha level, I do use all available decimal places to decide on statistical significance. Of course, I could just report all available digits all the time. Still, that smells of pseudo-accuracy, plus I like to think that I write for human readers, not for computer algorithms.
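The double-rounding effect can be reproduced in a few lines of Python. This sketch assumes a hypothetical unrounded p value of 0.00347 (standing in for the “0.00347something” above) and assumes that both the software output and the manual rounding round halves up; `round_half_up` is a helper introduced here for illustration. It uses the `decimal` module because Python's built-in `round` operates on binary floating-point values and rounds ties to even, which would blur the point.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: Decimal, places: int) -> Decimal:
    """Round `value` to `places` decimals, rounding ties away from zero."""
    return value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)


true_p = Decimal("0.00347")              # hypothetical unrounded p value
output_p = round_half_up(true_p, 4)      # as printed by the software: 0.0035
reported_p = round_half_up(output_p, 3)  # rounded again for the paper: 0.004
direct_p = round_half_up(true_p, 3)      # rounded once from the full value: 0.003

print(output_p, reported_p, direct_p)
```

Rounding once from the full value gives .003, which is what a recomputation from the test statistic would expect; rounding the already rounded 0.0035 gives .004.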
What’s the take-home message here? I won’t be surprised when this error/discrepancy/inconsistency (what’s in a name?) is discovered and posted by the big machine. I will keep writing my papers with care, double-checking the results, etc. (something my senior authors always endorsed and enforced). And did I mention that I put replication materials online (unless privacy/copyright laws or, sadly, busyness prevent me from doing so)?