Archive for the ‘Data analysis’ Category.

Practical tips for statisticians (part 9): Blogroll

Here’s a short list of blogs featuring statistical content. It’s basically the bookmarks I keep in my browser under “funny, thoughtful, helpful, interesting”. I enjoy reading them even when I’m not looking for a particular solution or inspiration.

Practical tips for statisticians (part 8): centering variables using Stata and SPSS

My current research requires meta-analytic procedures where variables that contain another variable’s mean come in very handy. Centering variables is also a very reasonable thing to do when running regressions with an interaction term between a continuous variable and a dummy variable.

Centering variables sounds like an easy task. It is if you use Stata, but I found it surprisingly difficult in SPSS (unless you enter the means by hand, which is error-prone and impractical for repeated analyses). Here’s how you can calculate a variable which contains the mean of another variable (which can then easily be centered or used in whatever way one wants to).

Let dres be the variable of interest. The new variable containing the mean of dres (for all observations) will be named dresavg. I also show how to create a variable containing the number of observations (ntotal). cdres will be the centered variable.

Stata 10


. egen dresavg = mean(dres)

and you’re done! You could also use summarize and generate commands:

. sum dres
. gen dresavg = r(mean)

If you want a variable that contains the total number of observations you can use

. gen ntotal = _N

or with the more flexible egen command (e.g., handy when dres has missing values, because count() only counts the non-missing observations)

. egen ntotal = count(dres)
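
To see how the two counts can differ, here is a minimal sketch. It uses the auto data that ship with Stata as a stand-in (an assumption; dres is not part of it), and nvalid is a name made up for this illustration. rep78 has some missing values in that data set, so the two new variables should not agree.

. * stand-in data shipped with Stata; not the data used in this post
. sysuse auto, clear
. * _N is the number of rows in the data set, missing or not
. gen ntotal = _N
. * count() only counts rows where rep78 is not missing
. egen nvalid = count(rep78)
. sum ntotal nvalid

Which of the two you want depends on whether observations with missing values will end up in the analysis.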

There are plenty of ways to generate variables containing sample statistics. As for the centered variable, use

. gen cdres = dres - dresavg

or without even generating the variable containing the mean:

. sum dres
. gen cdres = dres - r(mean)
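
Since centering matters most once an interaction term enters the model, here is a minimal sketch of that use case. It relies on Stata's bundled auto data as a stand-in (an assumption, as are the variable names cmpg and cmpgXforeign): mpg plays the role of the continuous variable and foreign is the dummy.

. * stand-in data shipped with Stata; not the data used in this post
. sysuse auto, clear
. sum mpg
. gen cmpg = mpg - r(mean)
. * product term built by hand, which also works in Stata 10
. gen cmpgXforeign = cmpg * foreign
. regress price cmpg foreign cmpgXforeign

With the centered variable, the coefficient on foreign is the estimated group difference at the mean of mpg rather than at mpg = 0, which is usually the more meaningful comparison.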

PASW 18 (SPSS, you know)

Beware, long syntax ahead. Before you despair, there’s a simpler (but less flexible) solution below. The complicated approach starts with exporting the variable mean into a new data set. This data set is then merged with the master data set; a variable containing the mean for every observation will be attached.

Jutze 52 #16 – Statistics

This is a little homage to WatchTower, written in anticipation of their show next Friday.

Eleven years ago I wrote a song called “Golden Future” for From Thy Ashes (my band back then). It was an attempt to combine a whole bunch of complicated parts. The result wasn’t very impressive. But I figured back then that the ideal unit for writing such material wasn’t a couple of bars; rather, it boils down to chunks of maybe three or four notes. If you have a big simple thing and start adding details, the music just gets wearying. So this time I didn’t really bother with the big picture and concentrated on making every single note count.

I started out with the drum track, programming some wild, odd bars of hectic noise with only very vague ideas of guitar riffs in my head. I have little (meaning no) advanced harmony knowledge, so I just played what I’d never play in an ordinary E minor setting. Half-step runs? Yes, please. I wrote pretty much every single note by trial and error as I went along, recording the tiniest bits separately, one by one. I was baffled by how flawless it all sounded once I stuck everything together. I played the bass on keyboard, as usual. At that point I was close to keeping the song an instrumental. Most of you probably wish I had. But then I figured I could mirror the title of WatchTower’s third album, Mathematics, by singing about my profession: statistics. I dare say that it all made sense in the end. At least to me. I know, I sound somewhat ridiculous when I try to channel Alan Tecchio’s vocal style. Still, I’m very happy with the overall outcome!

#16 Statistics

When I say what I do for a living
The response is silence
Statistics: misunderstood and ignored
Statistics: valuable and powerful

I love data
I love Stata
Statistics: misunderstood and ignored
Statistics: powerful and valuable

(words and music by Johannes Schult)

Jutze 52 #7 – Lonely Hearts Ad (Bootstrapping)

This is another example of why I like the 52-second format: If this lonely hearts ad were any longer, people would actually start taking it seriously. I was somewhat uncertain about the exact wording, the organ in the background and the main chord sequence (D G E A was in there at one point). But I think the song works in its present form (p < 0.05). The concept of the song was inspired by an old statistics lecture that featured remarks about Love@Lycos, matching algorithms and bootstrapping.

#7 Lonely Hearts Ad (Bootstrapping)

I’m looking for a woman who is capable of bootstrapping, yeah
I don’t care if she’s tall or thin or if her hair is red

It’s good for a romantic relationship to be based on common interests, yeah
Even though I’m the first to admit that bootstrapping’s uncommon

This statistical procedure
Is an important feature
Of our future late-night conversations

I’m looking for a woman who is capable of bootstrapping, yeah
If we can figure out bootstrapping we can figure out everything else

(words and music by Johannes Schult)

I just recorded a video of me playing the song at home:

Practical tips for statisticians (part 7)

A couple of days ago I got hold of the book The Workflow of Data Analysis Using Stata by J. Scott Long. I haven’t yet delved into it. But I’m already loving and condemning it. Loving it, because it covers an integral part of scientific data analysis, filling a void left by both the literature and the courses taught at university. Condemning it, because I had wanted to write a book on the same topic (how to ensure your data analysis is documented well, i.e., replicable) in the next few years. It wouldn’t have been the same book; in fact, it would have been vastly different, possibly much worse.

It’s too early for me to review the book in a conclusive manner. Still, the content looks very promising and I think it’s telling that Long focuses on Stata as the software of choice. This is going to be fun!

Practical tips for statisticians (part 6)

The homepage is a valuable tool for choosing colours for maps. The colour sets can be made colorblind-safe and photocopy-able. So you don’t get the usual (often distracting) MS Excel default rainbow, but highly usable colour palettes which can easily be used for other data plots as well.

(via today’s Statalist digest)

Starting to live

  • No longer let the “Little Green Frog” song (sung by Jessica Lucas, Missy Peregrym and Kelly Osbourne) wake me up in the morning; use “Mr. Moon” (Kate Micucci) instead.
  • Eat an apple every day.
  • Eat more cucumbers, carrots, bell peppers and tomatoes. Eat fewer chips and less chocolate.
  • Buy only organic food. Exceptions: Eszet-Schnitten and Ehrmann Espresso-Joghurt.
  • Stop biting my nails.
  • Get other people to carry out the nuclear phase-out themselves.
  • Mondays: review a CD for Vampster.
  • Tuesdays: write two pages of the novel.
  • Wednesdays: exercise.
  • Thursdays: send a postcard.
  • Fridays: unwind.
  • Saturdays: try a new recipe.
  • Sundays: analyse some data, for fun and for practice.
  • Go to bed before midnight.


At university I’ve learned, among other things, to be suspicious of arrows in figures. You know, the kind of figures with connected boxes and big words. I therefore found it quite refreshing when such a plot came up last Friday (during a talk about psychoneuro-endocrinology at a symposium in Luxembourg) and the speaker (Prof. Dr. Onno Meijer, University of Leiden, Netherlands) remarked: “The arrows are not arrows in real life; they are proteins.” I wish people in the social sciences would present their arrows with the same degree of clarity.

Practical tips for statisticians (part 5)

You probably heard this one before (I heard it from J.R.): don’t drink and derive!

Practical tips for statisticians (part 4)

While the first three parts were actually bogus and solely for entertainment purposes, the following two recommendations are serious, possibly helpful for non-statisticians, too, and actually free.

  • A very helpful software tool for data entry is EpiData. If you are working with manual data input (e.g., questionnaire data) you should definitely check it out – unless, of course, you already know (and use) it.
  • The Firefox extension Zotero is capable of collecting, managing, and citing research sources. It rivals common citation software, not only because it is free, but also because it is very easy to handle. I got to know Zotero just two days ago (thanks to Biddy for the hint) and felt at home with it in a matter of minutes.