Archive for May 2013

Measuring the Popularity of Novels?

Apparently, the amount of ratings on is highly correlated with the ratings, at least for John Green’s four novels (r = .96). But is it really ‘the more, the merrier’? I picked four more authors (in a non-random fashion), had a look at the respective correlations for their novels, and made a couple of graphs to illustrate the results.

Scatter plot of amount of ratings and ratings

Novels by John Green, Maureen Johnson, J.K. Rowling, and Stephenie Meyer

The relationship is a negative one for Stephenie Meyer’s books. Two books of J.K. Rowling are outliers – her first one in terms of ratings on GoodReads, her most recent one in terms of rating. I therefore took the liberty of plotting a quadratic fit (instead of a linear fit). It appears that John Green might be an exception (like the Mongols?). Also, ratings tend to be higher; and again, there is no clear relationship between the number of reviews and the average rating.

And since I recently finished reading “On Chesil Beach”, here’s the data for Ian McEwan’s novels, along with a more appropriately scaled plot for Maureen Johnson’s books:

Scatter plot of amount of ratings and ratings

Novels by Maureen Johnson and Ian McEwan

By the way, the correlation between ratings and ratings for the 40 books I used above is r = .89. The correlation between number of reviews and ratings is r = .75.

PS: If anyone is interested in the Stata code for the graphs, let me know. I guess I’ll add it here this weekend anyway, but right now I should go to bed.

Null Hypothesis Significance Testing: The Fault in Our Stars


[…] The same is true on amazon, where the book’s average rating has actually gone up a bit in the past six months (although not in a statistically significant way). […]

Actually, the ratings have decreased in a statistically significant way (p < .05). I used the two most recently archived pages from, which do not cover exactly 6 months. Still, ratings before 2013-02-03 were higher than those after that date.

  • Before (2110 ratings): mean = 4.76 (SE = 0.014)
  • After (1232 ratings): mean = 4.67 (SE = 0.021)

A t-test (two-sided, unequal variances) yields p = 0.0009 (d = -0.12); and for the non-parametric fans, the Wilcoxon rank-sum (Mann-Whitney) test yields p = 0.0001.
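For anyone who wants to check the arithmetic without the raw data, here is a rough Python sketch of the unequal-variances (Welch) t-test and Cohen's d, computed from the summary figures above only. It rests on two assumptions that are mine and not part of the original analysis: the small dispersion values next to the means are treated as standard errors of the mean, and the p-value uses a normal approximation to the t distribution (harmless with thousands of degrees of freedom). The function name `welch_from_summary` is made up for this illustration. Because the inputs are rounded to two or three decimals, the output will only roughly match the figures reported above.

```python
import math
from statistics import NormalDist


def welch_from_summary(n1, mean1, se1, n2, mean2, se2):
    """Welch t-test and Cohen's d from group sizes, means, and standard errors.

    Assumption: se1 and se2 are standard errors of the mean, so SD = SE * sqrt(n).
    """
    diff = mean2 - mean1
    se_diff = math.sqrt(se1**2 + se2**2)
    t = diff / se_diff

    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se1**2 + se2**2) ** 2 / (se1**4 / (n1 - 1) + se2**4 / (n2 - 1))

    # Two-sided p-value, normal approximation (df is in the thousands here)
    p = 2 * (1 - NormalDist().cdf(abs(t)))

    # Cohen's d with the pooled standard deviation
    sd1 = se1 * math.sqrt(n1)
    sd2 = se2 * math.sqrt(n2)
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = diff / pooled_sd
    return t, df, p, d


# Before vs. after 2013-02-03, using the rounded summary figures
t, df, p, d = welch_from_summary(2110, 4.76, 0.014, 1232, 4.67, 0.021)
print(f"t = {t:.2f}, df = {df:.0f}, p = {p:.4f}, d = {d:.2f}")
```

The same function can be called with the figures for any other dividing date; only the six summary numbers change.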

Using 2012-10-19 as the dividing date yields similar results:

  • Before (1051 ratings): mean = 4.77 (SE = 0.020)
  • After (2291 ratings): mean = 4.71 (SE = 0.015)

A t-test (two-sided, unequal variances) yields p = 0.0188 (d = -0.09); the Wilcoxon rank-sum test yields p = 0.0008. Of course, significance testing might be a questionable procedure in this case – and also in general.

This is actually a census of all Amazon ratings, so there’s no need to test whether ratings differ. The sample is the population. However, the written reviews could be regarded as a subsample of the ratings of all readers.

Is it a random sample? I don’t think so. So can we draw proper conclusions from the significance test results? Nah. I won’t provide a comprehensive discussion of the benefits and problems associated with null hypothesis significance testing (NHST). I’ll just name one of my favourite objections, which Cohen (1990, p. 1308) phrased nicely: “The null hypothesis, taken literally (and that’s the only way you can take it in formal hypothesis testing), is always false in the real world.” In the present case, the null hypothesis would mean that the average rating of newer readers is exactly the same as the average rating of those who pre-ordered the book etc.
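One practical consequence of Cohen's point can be sketched in a few lines: if the true difference is never exactly zero, then "significance" is mostly a question of sample size. The snippet below is my own illustration, not part of the original analysis. It assumes two equally sized groups and a normal approximation, in which case an observed standardized difference d gives z = d * sqrt(n / 2), and it solves for the group size at which that z just crosses the two-sided critical value. The function name `n_per_group_for_significance` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_for_significance(d, alpha=0.05):
    """Smallest equal group size at which an observed standardized
    difference d is significant in a two-sided test.

    Assumptions: two groups of equal size, normal approximation,
    so z = d * sqrt(n / 2).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(2 * (z_crit / abs(d)) ** 2)


# The smaller the difference, the more ratings it takes, but
# any nonzero difference becomes "significant" eventually.
for d in (0.5, 0.12, 0.01):
    print(f"d = {d}: n per group = {n_per_group_for_significance(d)}")
```

With several thousand ratings, even a difference as small as the one found here clears the bar, which is why the effect size is the more informative number.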

Anyway, the effect size suggests that the drop in ratings is very small, so it should be safe to argue that the book keeps appealing to new readers.

PS: Sorry for nitpicking; this should in no way diminish the article, which I think is highly insightful.

PPS: I spent a good 15 minutes in R trying to beat the data into shape, but I feel much more comfortable in Stata, so I switched and had the analysis in a few minutes. Here’s the do-file in case anyone is curious. (Haha, as if!)


Thoughts on “The Bestseller Job”

Today’s mail contained a copy of “The Bestseller Job” (by Greg Cox), a novel based on the television series “Leverage”. I really like “Leverage” and I was sad to learn that its 5th season was going to be the last one. I’m not usually into novels that expand existing series, but on a whim I bought this one. I’m 76 pages in right now. (The book has 291 pages.) It is certainly too early for a final verdict. I just thought I’d put down my first impression, which, by the way, is positive. The writing style matches the editing of the television series; the plot fits the Leverage universe perfectly, and I’m thrilled that 3/4 of the story is still ahead of me. I like it when the summary on the back doesn’t spoil the whole first half of a book, so I was pleasantly surprised by how fast “The Bestseller Job” took off. I was even more enthralled to find the crew en route to Germany. Heck, we learn that Parker once had an alias from Stuttgart. And it’s not just these nods to the country I live in, it’s the accurate transition from one medium to another that makes me really happy. Okay, back to reading!