Replicate My Work!

Scientific work requires transparency. There is no mad genius in his/her lonely tower working for years on end on some great invention. While it may be true that professors have little time for anything but their research, they communicate their findings (along with their methods). Science is a social enterprise. Primed by Gary King’s essay “Replication, Replication” (1995) and lectures by Rainer Schnell, I arrived at the conclusion that a scientific workflow must be a reproducible workflow. I do think that making replication material broadly available is a good thing for everyone involved.

Replication materials for my recent publications can now be found online. Maintaining a reproducible workflow is hard but rewarding work. Looking back, I could have improved a lot of things (without changing the results, mind you). It felt a bit awkward at first. Soon enough it felt even more awkward to have waited so long to put up the material. I wish I could share material for more of my older publications (and raw data, too), but privacy laws, work contracts, and fellow psychologists who are highly skeptical of these ideas keep me from doing so.

Hopefully, the present material is just the beginning. Sadly, most psychologists do not share their materials publicly, so I had to figure out most things on my own. I decided against third-party repositories because some focus solely on data sets whereas others are somewhat difficult to handle. So I wrote the HTML by hand, hoping that a plain format will stand the test of time. Let me know if you have any suggestions for improvements.

Practical tips for statisticians (part 8): centering variables using Stata and SPSS

My current research requires meta-analytic procedures where variables that contain another variable’s mean come in very handy. Centering variables is also very reasonable when analysing regressions with an interaction term between a continuous variable and a dummy variable.
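Why centering helps with a continuous-by-dummy interaction can be shown with a little algebra. In y = b0 + b1*x + b2*d + b3*x*d, the dummy coefficient b2 is the group difference at x = 0, a value that may lie far outside the data. Replacing x by x - m (m being the mean of x) gives an equivalent model whose dummy coefficient becomes b2 + b3*m, i.e., the group difference at the mean of x. The Python sketch below checks that both parameterizations make identical predictions; the coefficient values and the mean are made-up numbers chosen only for illustration.

```python
def predict(b0, b1, b2, b3, x, d):
    """Linear model with a continuous-by-dummy interaction."""
    return b0 + b1 * x + b2 * d + b3 * x * d

def centered_coefficients(b0, b1, b2, b3, m):
    """Coefficients of the same model after replacing x by (x - m)."""
    # With x = xc + m:
    # y = (b0 + b1*m) + b1*xc + (b2 + b3*m)*d + b3*xc*d
    return b0 + b1 * m, b1, b2 + b3 * m, b3

# Hypothetical coefficients and mean of x, for illustration only.
b = (2.0, 0.5, 1.0, 0.25)
m = 40.0
bc = centered_coefficients(*b, m)

# Both versions give the same prediction for every (x, d) combination.
for x in (30.0, 40.0, 55.0):
    for d in (0, 1):
        assert abs(predict(*b, x, d) - predict(*bc, x - m, d)) < 1e-9

print(bc)  # (22.0, 0.5, 11.0, 0.25)
```

Only the intercept and the dummy's main effect change their meaning; the slope b1 and the interaction coefficient b3 stay the same.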

Centering variables sounds like an easy task. It is if you use Stata, but I found it surprisingly difficult in SPSS (unless you enter the means by hand, which is error-prone and impractical for repeated analyses). Here’s how you can calculate a variable that contains the mean of another variable (it can then easily be used for centering or in whatever way one wants).

Let dres be the variable of interest. The new variable containing the mean of dres (the same value for all observations) will be named dresavg. I also show how to create a variable containing the number of observations (ntotal). cdres will be the centered variable.

Stata 10


. egen dresavg = mean(dres)

and you’re done! You could also use the summarize and generate commands:

. sum dres
. gen dresavg = r(mean)

If you want a variable that contains the total number of observations you can use

. gen ntotal = _N

or with the more flexible egen command (handy, e.g., when dres has missing values, which _N would still count)

. egen ntotal = count(dres)
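The difference between _N and egen’s count() only matters when dres has missing values: _N counts every observation in the data set, while count() counts only the non-missing ones. The Python sketch below mimics that behaviour on a made-up column, with None standing in for Stata’s missing value (.); it illustrates the logic and is not Stata itself.

```python
def n_total(values):
    """Like Stata's _N: every observation, missing or not."""
    return len(values)

def n_nonmissing(values):
    """Like egen count(): only observations that are not missing."""
    return sum(1 for v in values if v is not None)

# Hypothetical data; None plays the role of Stata's missing value.
dres = [2.0, None, 4.0, 6.0, None]

print(n_total(dres), n_nonmissing(dres))  # 5 3
```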

There are plenty of ways to generate variables containing sample statistics. As for the centered variable, use

. gen cdres = dres - dresavg

or without even generating the variable containing the mean:

. sum dres
. gen cdres = dres - r(mean)
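Whichever route you take, the centered variable should have a mean of (numerically) zero, and observations with a missing dres stay missing, just as Stata’s generate returns . when one operand is missing. A small Python sketch of that sanity check, again using None as a stand-in for missing and hypothetical data:

```python
def center(values):
    """Subtract the mean of the non-missing values; keep missings as None."""
    present = [v for v in values if v is not None]
    avg = sum(present) / len(present)
    return [None if v is None else v - avg for v in values]

dres = [2.0, None, 4.0, 6.0, None]  # hypothetical data
cdres = center(dres)
print(cdres)  # [-2.0, None, 0.0, 2.0, None]

# The non-missing centered values average to zero.
non_missing = [v for v in cdres if v is not None]
assert abs(sum(non_missing) / len(non_missing)) < 1e-12
```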

PASW 18 (SPSS, you know)

Beware, long syntax ahead. Before you despair, there’s a simpler (but less flexible) solution below. The complicated approach starts by exporting the mean of the variable into a new data set. This data set is then merged with the master data set, which attaches a variable containing the mean to every observation.
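The export-and-merge approach is a general pattern: compute the summary statistics once into a separate one-row table, then attach that row to every case of the master data. The following Python sketch shows the pattern in a language-neutral way; it is not SPSS syntax, and the function name add_grand_mean and the record layout are hypothetical. Here ntotal counts non-missing values, like egen count() in Stata.

```python
def add_grand_mean(records, var, mean_name, n_name):
    """Compute mean and count of `var`, then attach both to every record."""
    # Step 1: "export" the summary statistics into a one-row table.
    present = [r[var] for r in records if r[var] is not None]
    summary = {mean_name: sum(present) / len(present), n_name: len(present)}
    # Step 2: "merge" that row back onto every observation (new dicts,
    # so the original records are left unchanged).
    return [{**r, **summary} for r in records]

# Hypothetical master data set.
data = [{"id": 1, "dres": 2.0}, {"id": 2, "dres": 4.0}, {"id": 3, "dres": 6.0}]
merged = add_grand_mean(data, "dres", "dresavg", "ntotal")

# With the mean attached to every row, centering is a simple difference.
for row in merged:
    row["cdres"] = row["dres"] - row["dresavg"]

print(merged[0])  # {'id': 1, 'dres': 2.0, 'dresavg': 4.0, 'ntotal': 3, 'cdres': -2.0}
```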

Practical tips for statisticians (part 7)

A couple of days ago I got hold of the book The Workflow of Data Analysis Using Stata by J. Scott Long. I haven’t yet delved into it. But I’m already loving and condemning it. Loving it, because it covers an integral part of scientific data analysis, filling a void left by both the literature and the courses taught at university. Condemning it, because I had wanted to write a book on the same topic (how to ensure your data analysis is documented well, i.e., replicable) in the coming years. It wouldn’t have been the same book; in fact, it would have been vastly different, possibly much worse.

It’s too early for me to review the book in a conclusive manner. Still, the content looks very promising and I think it’s telling that Long focuses on Stata as the software of choice. This is going to be fun!