Estimating the Release Date of Richard Shindell’s Next Album

Richard Shindell has been working on his next album for quite some time now. His fans (that includes me) try to be patient. Several new songs have already made their live debut. The album is supposed to be called “Viceroy Mimic” (VM), but a couple of weeks ago he also mentioned “Same River Once” as a contender. Pressed about a release date, Shindell said (during a recent concert in Boston) January 2015. Regardless of this, here’s the statistical perspective – just for fun! The linear trend across all album releases (including live albums, cover albums, Cry, Cry, Cry etc.) suggests that a new album should have been released on November 4, 2013.

Graph: Linear prediction of the release dates of Richard Shindell albums (incl. live albums etc.)

The quadratic trend across Richard’s original studio albums, however, would imply a May 11, 2014 release date for “Viceroy Mimic”. The linear prediction appears to be worse in this case; the lag between original albums is increasing.

Graph: Quadratic prediction of the release dates of Richard Shindell albums (only original studio albums)

Given the projected 2015 release, a cubic function might soon be necessary. Anyway, below you can find the detailed data and the Stata code to replicate the graphs.

Continue reading ‘Estimating the Release Date of Richard Shindell’s Next Album’ »

SpinTunes Feedback, Metal Influences, and Statistics

The first round of the SpinTunes #3 song writing competition is over. Lo and behold, I made it to the next round! So needless to say I’m happy with the results. But equally important, the reviewers provided a lot of feedback. One is often inclined to retort when faced with criticism. Musicians even tend to reject praise if they feel misunderstood. I’m no exception. But this time around I actually agree with everything the judges wrote about my entry. (I Love the Dead – remember?) There wasn’t even the initial urge to provide my point of view or to shed light on my original intentions. I will now go into the details before I turn to a quick statistical analysis of the ratings in the last section of this post.

The incubation period for this song was rather long. At first, I was considering writing about the death metal band Death. It would have meant stretching the challenge and alienating anyone unfamiliar with the history of death metal (read: pretty much everyone). The only reminiscence of heavy metal in my actual entry is the adaptation of Megadeth’s “Killing Is My Business and Business Is Good”. I toyed with the idea of celebrating the death of a person who has lived fully and left nothing but happy marks on the lives of others. Translating this idea into an actual song was a complete failure, though. I also considered writing about mortality statistics. There are people who estimate the space needed for future graveyards and health insurances and so on. I’m somewhat familiar with the statistics behind that. But it would have taken weeks to turn this into a cohesive song. So I returned to the notion of the happy grave digger. (Yes, Grave Digger is the name of a German metal band.) The working title was “Grave Digger’s Delight”. The music started with the chorus while I was playing an older idea I hadn’t used so far. Basically, I threw away the old idea except for the initial G-chord and the final change to D. I did add the intro melody, more on that soon. The verses are the good, old vi-IV-I-V, but with a ii thrown in for good measure. That’s not too original, but I was already running out of time. The lyrics started out with a word cloud of related terms. Plots With a View was a big inspiration when it came to the sincerity behind the mortician’s words. Here’s a person who’s dedicated to his job! I had wanted to include a couple of fancy funeral descriptions. But the music called for more concise lyrics. All that’s left from that idea is the line “I can give you silence – I can give you thunder”, which I kept to rhyme with “six feet under”.
That one is indeed very plain, but I felt that the huge number of competitors called for a straight song that brings its message across during the first listen, preferably during the first 20 seconds. I think I succeeded in this respect. (This is also a major reason why I changed the title to “I Love the Dead” – keeping it straight and plain.) The 2-minute minimum length gave me headaches. This made me keep, even repeat, the intro melody. I was tempted to use a fade-out. But I always see this as a lack of ideas. So I used the working title for the ending. Given a few more days I might have come up with a more adequate closure. Even as I was filming the video, I felt the need to shorten the ending. I tried to spice up the arrangement with a bridge (post-chorus?) of varying length. I wasn’t completely sure about it during the recording process, but now I’m glad that the deadline forced me to keep it as it is. At one point I had a (programmed) drum track and some piano throughout the song. To me it sounded as if they were littering the song rather than filling in lower frequencies. So I dropped them and just used a couple of nylon-stringed guitars (one hard right, one hard left), a steel-stringed guitar (center), a couple of shakers, lead vocals plus double-tracked vocals and harmony vocals in the chorus (slightly panned) and, of course, the last tambourine.

TL;DR – I appreciate the feedback and I resolve to start working on my next entry sooner.

Russ requested statistics. I happily obliged and performed a quick factor analysis using the ratings. What this method basically does is to create a multi-dimensional space in which the ratings are represented. There is one dimension for each judge, yielding a 9-dimensional space in the present case. If everybody judged the songs in a similar way, you would expect “good” songs to have rather high ratings on all dimensions and the “bad” songs to receive low ratings. A line is fitted into this space to model this relationship. If all data points (i.e., songs) are close to that line in that space, the ratings are said to be unidimensional. In other words, there appears to be one underlying scale of song quality that is reflected in the ratings. This would be at odds with the common assertion that judgments are purely subjective and differ from rater to rater. (It would also suggest that computing the sum score is somewhat justified and not just creating numeric artifacts void of meaning.)

Using Stata 10 to perform a factor analysis with a principal-component solution, I get the following factors:

. factor blue-popvote, pcf
(obs=37)

Factor analysis/correlation                    Number of obs    =       37
Method: principal-component factors            Retained factors =        2
Rotation: (unrotated)                          Number of params =       17

--------------------------------------------------------------------------
Factor   |   Eigenvalue   Difference        Proportion   Cumulative
---------+----------------------------------------------------------------
Factor1  |      4.44494      3.29466            0.4939       0.4939
Factor2  |      1.15028      0.33597            0.1278       0.6217
Factor3  |      0.81431      0.08112            0.0905       0.7122
Factor4  |      0.73319      0.19850            0.0815       0.7936
Factor5  |      0.53468      0.05959            0.0594       0.8530
Factor6  |      0.47510      0.11760            0.0528       0.9058
Factor7  |      0.35750      0.05932            0.0397       0.9456
Factor8  |      0.29818      0.10635            0.0331       0.9787
Factor9  |      0.19183            .            0.0213       1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated:  chi2(36) =  137.45 Prob>chi2 = 0.0000

Wait, what? Let’s just focus on one criterion for exploring the factor solution: Eigenvalues larger than 1. There are two such factors, which suggests that the rating data represents two (independent) dimensions. (For those familiar with the method: I tried a few rotated solutions, but they yield similar results.) Now the first factor explains almost half of the variance at hand whereas the second factor has a much smaller Eigenvalue and consequently explains only 1/8 of the variance in the data.
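
For readers without Stata, the “Proportion” and “Cumulative” columns and the eigenvalue-larger-than-1 rule can be re-derived by hand. The following is only a minimal Python sketch (Python is my choice for illustration; it is not what produced the output above). The eigenvalues are copied from the Stata table; the variable names are made up for this example.

```python
# Eigenvalues copied from the Stata output above (9 raters -> 9 factors).
eigenvalues = [4.44494, 1.15028, 0.81431, 0.73319, 0.53468,
               0.47510, 0.35750, 0.29818, 0.19183]

# For a correlation matrix the eigenvalues add up to the number of
# variables (here 9, up to rounding), so each factor's share of the
# variance is its eigenvalue divided by that total.
total = sum(eigenvalues)
proportions = [ev / total for ev in eigenvalues]

# Rule used in the text: keep factors with an eigenvalue larger than 1.
retained = [i for i, ev in enumerate(eigenvalues, start=1) if ev > 1]

cumulative = 0.0
for i, (ev, share) in enumerate(zip(eigenvalues, proportions), start=1):
    cumulative += share
    print(f"Factor{i}: eigenvalue={ev:.5f}  "
          f"proportion={share:.4f}  cumulative={cumulative:.4f}")

print("Retained factors:", retained)
```

The first two proportions come out as roughly 0.49 and 0.13, which is where “almost half” and “1/8” come from.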

Let’s take a look at the so-called factor loadings to see how the two factors relate to the raters. Stata says:

Factor loadings (pattern matrix) and unique variances

---------------------------------------------
Variable |  Factor1   Factor2 |   Uniqueness
---------+--------------------+--------------
blue     |   0.6128   -0.0039 |      0.6244
mike     |   0.7690   -0.1880 |      0.3733
mitchell |   0.7188    0.1032 |      0.4727
glenn    |   0.7428   -0.0309 |      0.4474
randy    |   0.8830    0.0089 |      0.2202
kevin    |   0.7768    0.1219 |      0.3817
david    |   0.6764    0.3650 |      0.4092
ben      |  -0.0672    0.9439 |      0.1045
popvote  |   0.7512   -0.2534 |      0.3714
---------------------------------------------

Without going into statistical details, let’s say that the loadings indicate how strongly each rater is related to each factor. For example, Blue’s ratings have less to do with the overall factor than Mike’s ratings. Both raters show rather high loadings, though. The high loadings of all raters (except one) indicate a high level of general agreement. The only exception is Ben, whose ratings have little to do with the first factor. (You could argue that he even gave reverse ratings, but the loading is quite small.) Instead, his ratings play a big role in the second factor (which is by definition statistically independent from the first one). There is some agreement with the remaining variance of David’s ratings and a negative relationship with the popular vote (if you use the somewhat common notion to interpret loadings that are larger than 0.2 in absolute value). So there appears to be some dissent regarding the ranking. But on the other hand, the “dominant” first factor suggests that the ratings reflect the same construct to a large degree. Whether that’s song writing skills, mastering of the challenge, or simply sympathy, is a different question.
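
The loading table can be cross-checked with a little arithmetic. With two retained, uncorrelated factors, a rater’s uniqueness is 1 minus the sum of his squared loadings, and the squared loadings on a factor add up to that factor’s eigenvalue. Here is a minimal Python sketch of that check (again just an illustration, not part of the original Stata session); the numbers are copied from the table above, so small deviations in the fourth decimal are rounding.

```python
# (Factor1, Factor2) loadings copied from the Stata pattern matrix above.
loadings = {
    "blue":     (0.6128, -0.0039),
    "mike":     (0.7690, -0.1880),
    "mitchell": (0.7188,  0.1032),
    "glenn":    (0.7428, -0.0309),
    "randy":    (0.8830,  0.0089),
    "kevin":    (0.7768,  0.1219),
    "david":    (0.6764,  0.3650),
    "ben":      (-0.0672, 0.9439),
    "popvote":  (0.7512, -0.2534),
}

# Uniqueness column as reported by Stata, used only for comparison.
reported_uniqueness = {
    "blue": 0.6244, "mike": 0.3733, "mitchell": 0.4727, "glenn": 0.4474,
    "randy": 0.2202, "kevin": 0.3817, "david": 0.4092, "ben": 0.1045,
    "popvote": 0.3714,
}

def uniqueness(rater_loadings):
    """Variance of a rater NOT explained by the retained factors."""
    return 1 - sum(value ** 2 for value in rater_loadings)

computed_uniqueness = {name: uniqueness(vals) for name, vals in loadings.items()}

# Column sums of squared loadings should reproduce the eigenvalues.
factor1_ss = sum(vals[0] ** 2 for vals in loadings.values())
factor2_ss = sum(vals[1] ** 2 for vals in loadings.values())

for name, value in computed_uniqueness.items():
    print(f"{name:9s} computed={value:.4f}  reported={reported_uniqueness[name]:.4f}")
print(f"Sum of squared loadings: Factor1={factor1_ss:.4f}  Factor2={factor2_ss:.4f}")
```

Ben’s low uniqueness is almost entirely due to the second factor, which matches the interpretation above.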

PS: I must admit that I haven’t listened to all entries, yet. It’s a lot of music and I’m struggling with a few technical connection glitches. Anyway, I liked what Jason Morris and Alex Carpenter did, although their music wasn’t that happy. Another entry that necessarily caught my attention was Wake at the Sunnyside by the one and only Gödz Pöödlz. Not only did they choose the same topic I used, they also came up with a beautiful pop song and plenty of original lyrical ideas. Good work!

Jutze 52 #37 – Team Slater

This song was inspired by the season finale of Community. I’m sure I don’t have to remind you that Community is totally awesome. If you’ve seen the last episode this song will make perfect sense to you. If not, well, go watch Community!

I recorded this song in two takes with my digicam. It’s the last one I recorded while waiting for my new computer, so next week’s track will have a better production. As for future Community fan songs, I just had the idea to write a heavy metal meets glee homage to the episode “Modern Warfare”…

#37 Team Slater

No left-wing tendencies, a firm grip on life
A grown-up character, math power +5
Statistical knowledge along with a steady income
Attractive looks and then some
Go Slater, go! I’m on Team Slater
Go Slater, go! Winger plus Slater
Go Slater, go!

She’s the one who’s serious; she’s overcome her fear
She’s very much experienced – the decision should be clear
Go Slater, go! I’m on Team Slater
Go Slater, go! Bring Conan back
Team Slater

(words and music by Johannes Schult)

Practical tips for statisticians (part 8): centering variables using Stata and SPSS

My current research requires meta-analytic procedures where variables that contain another variable’s mean come in very handy. Centering variables is also something very reasonable to do when analysing regressions with an interaction term between a continuous variable and a dummy variable.

Centering variables sounds like an easy task. It is if you use Stata, but I found it surprisingly difficult in SPSS (unless you enter the means by hand, which is error-prone and impractical for repeated analyses). Here’s how you can calculate a variable which contains the mean of another variable (which can then easily be centered or used in whatever way one wants to).

Let dres be the variable of interest. The new variable containing the mean of dres (for all observations) will be named dresavg. I also show how to create a variable containing the number of observations (ntotal). cdres will be the centered variable.

Stata 10

Use

. egen dresavg = mean(dres)

and you’re done! You could also use summarize and generate commands:

. sum dres
. gen dresavg = r(mean)

If you want a variable that contains the total number of observations you can use

. gen ntotal = _N

or with the more flexible egen command (e.g., handy when dres has missings)

. egen ntotal = count(dres)

There are plenty of ways to generate various variables containing sample statistics. As for the centered variable, use

. gen cdres = dres - dresavg

or without even generating the variable containing the mean:

. sum dres
. gen cdres = dres - r(mean)
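
If you want to see the logic behind these commands spelled out, here is a minimal sketch in plain Python (not Stata or SPSS syntax). It mirrors the names used above (dresavg, ntotal, cdres); the function center() and the sample values are made up for illustration. Missing values are represented as None and, as with egen’s count() and summarize, they are ignored when computing the mean and the number of observations.

```python
def center(values):
    """Return (mean, number of non-missing values, centered values).

    Missing values are given as None; they are skipped for the mean
    and the count, and they stay missing in the centered result.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no non-missing observations")
    n = len(observed)
    mean = sum(observed) / n
    centered = [None if v is None else v - mean for v in values]
    return mean, n, centered


# Made-up example data with one missing value.
dres = [2.0, 4.0, None, 6.0]

dresavg, ntotal, cdres = center(dres)
print("dresavg:", dresavg)
print("ntotal: ", ntotal)
print("cdres:  ", cdres)
```

The centered values of the non-missing observations sum to zero, which is a quick sanity check for any centering routine.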

PASW 18 (SPSS, you know)

Beware, long syntax ahead. Before you despair, there’s a simpler (but less flexible) solution below. The complicated approach starts with exporting the variable mean into a new data set. This data set is then merged with the master data set; a variable containing the mean for every observation will be attached. Continue reading ‘Practical tips for statisticians (part 8): centering variables using Stata and SPSS’ »

Jutze 52 #16 – Statistics

This is a little homage to WatchTower, written in anticipation of their show next Friday.

Eleven years ago I wrote a song called “Golden Future” for From Thy Ashes (my band back then). It was an attempt to combine a whole bunch of complicated parts. The result wasn’t very impressive. But I figured back then that the ideal unit for writing such material wasn’t a couple of bars; rather, it boils down to chunks of maybe three or four notes. If you have a big simple thing and start adding details, the music just gets wearisome. So this time I didn’t really bother with the big picture and concentrated on making every single note count.

I started out with the drum track, programming some wild, odd bars of hectic noise with only very vague ideas of guitar riffs in my head. I have little (meaning no) advanced harmony knowledge, so I just played what I’d never play in an ordinary E minor setting. Half-step runs? Yes, please. I wrote pretty much every single note by trial-and-error as I went along, recording the tiniest bits separately, one by one. I was baffled by how flawless it all sounded once I stuck everything together. I played the bass on keyboard, as usual. At that point I was close to keeping the song an instrumental. Most of you probably wish I had. But then I figured I could mirror the title of WatchTower’s third album, Mathematics, by singing about my profession: statistics. I dare to say that it all made sense in the end. At least to me. I know, I sound somewhat ridiculous when I try to channel Alan Tecchio’s vocal style. Still, I’m very happy with the overall outcome!

#16 Statistics

When I say what I do for a living
The response is silence
Statistics: misunderstood and ignored
Statistics: valuable and powerful

I love data
I love Stata
Statistics: misunderstood and ignored
Statistics: powerful and valuable

(words and music by Johannes Schult)

Practical tips for statisticians (part 7)

A couple of days ago I got hold of the book The Workflow of Data Analysis Using Stata by J. Scott Long. I haven’t yet delved into it. But I’m already loving and condemning it. Loving it, because it covers an integral part of scientific data analysis, filling a void left by both the literature and the courses taught at university. Condemning it, because I had wanted to write a book on the same topic (how to ensure your data analysis is documented well, i.e., replicable) during the next years. It wouldn’t have been the same book; in fact, it would have been vastly different, possibly much worse.

It’s too early for me to review the book in a conclusive manner. Still, the content looks very promising and I think it’s telling that Long focuses on Stata as the software of choice. This is going to be fun!