Practical tips for statisticians (part 8): centering variables using Stata and SPSS

My current research requires meta-analytic procedures where variables that contain another variable’s mean come in very handy. Centering variables is also a very reasonable thing to do when analysing regressions with an interaction term between a continuous variable and a dummy variable.

Centering variables sounds like an easy task. It is if you use Stata, but I found it surprisingly difficult in SPSS (unless you enter the means by hand, which is error-prone and impractical for repeated analyses). Here’s how you can calculate a variable that contains the mean of another variable (which can then easily be centered or used in whatever way one wants).

Let dres be the variable of interest. The new variable containing the mean of dres (for all observations) will be named dresavg. I also show how to create a variable containing the number of observations (ntotal). cdres will be the centered variable.

Stata 10

Use

. egen dresavg = mean(dres)

and you’re done! You could also use the summarize and generate commands:

. sum dres
. gen dresavg = r(mean)

If you want a variable that contains the total number of observations you can use

. gen ntotal = _N

or the more flexible egen command (handy, e.g., when dres has missing values, because count() only counts the non-missing observations)

. egen ntotal = count(dres)

There are plenty of ways to generate variables containing sample statistics. As for the centered variable, use

. gen cdres = dres - dresavg

or without even generating the variable containing the mean:

. sum dres
. gen cdres = dres - r(mean)

PASW 18 (SPSS, you know)

Beware, long syntax ahead. Before you despair, there’s a simpler (but less flexible) solution below. The complicated approach starts with exporting the variable mean into a new data set. This data set is then merged with the master data set; a variable containing the mean for every observation will be attached.

* Open the data set containing dres.
* Depending on your operating system and your preferences you might have to change the paths.
*.
GET FILE='D:\master01.sav'.
DATASET NAME master WINDOW=FRONT.
*.
* OMS will send (specified) output into a new file.
* In this case the new file is an SPSS data set named killme.sav.
* Set VIEWER = NO to omit the table being shown in the output window.
*
* The mean of dres is computed via DESCRIPTIVES.
* NB: You can adapt the statistics, changing or adding statistics (e.g., SUM);
* you might need to adapt some of the following stuff, though
* (e.g., add Summe to the Mittelwert in the /KEEP option when saving).
*
* OMSEND will stop the sending.
*.
OMS
/SELECT TABLES
/DESTINATION FORMAT=SAV OUTFILE = 'D:\killme.sav' VIEWER = YES
/IF COMMANDS = ['descriptives'] subtypes = ['descriptive statistics'].
DESCRIPTIVES VARIABLES=dres
/STATISTICS=MEAN.
OMSEND.
*.
* A new data set containing the descriptive statistics has been created.
* Time to close the master data set for the time being.
*.
DATASET CLOSE master.
*.
* Open the new small data set.
*.
GET FILE='D:\killme.sav'.
DATASET NAME zwischenhalt WINDOW=FRONT.
*.
* I suggest you take a look at the data.
* The number of interest is in the variable Mittelwert.
* NB: I'm using a German version of the software; other versions will give it other names.
*
* There's an additional line with the number of total observations
* and if there are missing values maybe even more lines.
* Let's get rid of them! NB: This assumes that the mean is, well, not a very large negative number.
* I know that's an ugly way to do it, but so far I couldn't get a statement like ~=SYSMIS(Mittelwert) working.
*.
FILTER OFF.
USE ALL.
SELECT IF (Mittelwert > -66666666666).
EXECUTE.
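*.
* A possible alternative, offered as an untested assumption: negating the SYSMIS function with NOT
* instead of ~= should drop the same rows without relying on a magic number.
* SELECT IF NOT SYSMIS(Mittelwert).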
*.
* A (constant) key variable is required for the upcoming merging procedure.
*.
COMPUTE cons = 1 .
EXECUTE .
*.
* Key variables must be sorted.
*.
SORT CASES BY cons(A).
*.
* Save the small data set, keeping only the key variable and the data of interest.
* NB: Here N (i.e., the number of observations) is kept as well, mirroring the Stata procedures shown above.
*.
SAVE OUTFILE='D:\killme.sav'
/KEEP=N Mittelwert cons
/COMPRESSED.
*.
* SPSS saves the specified data set, but the other variables remain there in the open data set.
* Closing and re-opening solves this problem.
*.
DATASET CLOSE zwischenhalt.
*.
GET FILE='D:\killme.sav'.
DATASET NAME zwischenhalt WINDOW=FRONT.
*.
* Open the master data set, as well.
*.
GET FILE='D:\master01.sav'.
DATASET NAME master WINDOW=FRONT.
*.
* Add the key variable here, as well; then sort it.
*.
COMPUTE cons = 1 .
EXECUTE .
*.
SORT CASES BY cons(A).
*.
* Now for the big one: the data set currently in front (master, here simply *) is merged with the other data set.
* Using the /TABLE subcommand allows you to attach the one line of values to every line (observation) in the master data set.
*.
MATCH FILES /FILE=*
/TABLE='zwischenhalt'
/BY cons.
EXECUTE.
*.
* Don't need the small data set anymore.
*.
DATASET CLOSE zwischenhalt.
*.
* Out of sight, but still on your computer. I'm afraid you have to manually delete it.
* The file name killme suggests what to do with it, eventually (a tip I got from Prof. Schnell).
*
* The variables that were attached have weird German names. Let's make them clearer:
*.
RENAME VARIABLES (N Mittelwert = ntotal dresavg).
EXECUTE.
*.
* It doesn't hurt to label the new variables - but I assume you'd already thought of this yourself.
*
* Phew, now we have the variables needed for centering. So let's center!
*.
COMPUTE cdres = dres - dresavg.
EXECUTE.
*.
* There was no immediate need for save commands in the Stata examples.
* For the sake of completeness (and to drop the no longer needed key variable): save and close.
*.
SAVE OUTFILE='D:\master01.sav'
/DROP=cons /COMPRESSED.
*.
DATASET CLOSE master.

So far, so complex. I promised an easier solution. You can (ab)use the regression command to achieve the same results in terms of mean variables. Just run a regression through the origin with dres as dependent variable and a constant as predictor. The predicted values (which are saved in a new variable specified in the last line of syntax) then correspond to the mean of dres.

COMPUTE cons = 1 .
EXECUTE .
*.
REGRESSION
/MISSING LISTWISE
/STATISTICS R
/CRITERIA=PIN(.05) POUT(.10)
/ORIGIN
/DEPENDENT dres
/METHOD=ENTER cons
/SAVE PRED(dresavg).
*
* Let's center!
*.
COMPUTE cdres = dres - dresavg.
EXECUTE.

A final remark to those frightened by the extensive solution with the OMS command: It should be possible to get a variable containing the number of observations by assigning case numbers, sorting, and then using the lag function (repeatedly?), but I haven’t figured out how yet. And who knows, maybe there’s an obvious simple way I have overlooked all along. For the time being I’m happy to have a working solution in SPSS. The easiest way is, of course, to use Stata.

Update (an hour later): Thanks to the helpful comment below I can present another simple solution for SPSS:

COMPUTE cons = 1 .
EXECUTE .
*.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=cons
/dresavg=MEAN(dres)
/ntotal=N.
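
As with the other solutions, the centered variable still has to be computed from the new mean variable. A minimal sketch, continuing from the AGGREGATE command above (EXECUTE is left out, following the advice in the second comment below; the DESCRIPTIVES call is only an assumed sanity check and triggers the data pass that carries out the pending COMPUTE):

COMPUTE cdres = dres - dresavg.
*.
* Sanity check: the mean of cdres should be zero (up to rounding error).
*.
DESCRIPTIVES VARIABLES=cdres
/STATISTICS=MEAN.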

2 Comments

  1. JR Corner:

    As I already asked: why didn’t you use the “aggregate” command in SPSS/PASW? I am no friend of the program, but it provides the same easy solution as STATA, at least as far as I got what you wanted to achieve. I cannot present the code here, since I do not have any SPSS-type program at hand, but if you start searching, you will find it ;)) [UCLA has only a lame example with outfiling, but you don’t need to do that…]
    best from cambridge.
    jr

  2. Jon Peck:

    Comments about the simple SPSS solution.
    1. There is no need for the EXECUTE statement. That just forces an extra data pass.
    2. There is actually no need for the COMPUTE cons = 1 statement, although the dialog box interface used to require a BREAK variable.
    3. The AGGREGATE can be simplified to
    AGGREGATE /meanx = mean(x).
    4. The AGGREGATE solution is incomplete. You do need to subtract the mean once it is computed (but, again, don’t add an EXECUTE. Transformations will be done just in time, i.e., piggybacked on the next data pass.)
    5. If you want to standardize a set of variables, i.e., demean and scale variance to 1, DESCRIPTIVES will do that for you in one step:
    DESCRIPTIVES x y z /save.