{"id":908,"date":"2010-09-03T17:15:17","date_gmt":"2010-09-03T16:15:17","guid":{"rendered":"http:\/\/www.jutze.com\/?p=908"},"modified":"2010-09-03T18:04:06","modified_gmt":"2010-09-03T17:04:06","slug":"practical-tips-for-statisticians-part-8-centering-variables-using-stata-and-spss","status":"publish","type":"post","link":"https:\/\/www.jutze.com\/?p=908","title":{"rendered":"Practical tips for statisticians (part 8): centering variables using Stata and SPSS"},"content":{"rendered":"<p>My current research requires meta-analytic procedures where variables that contain another variable&#8217;s mean come in very handy. Centering variables is also a very reasonable thing to do when analysing regressions with an interaction term between a continuous variable and a dummy variable.<\/p>\n<p>Centering variables sounds like an easy task. It is if you use Stata, but I found it surprisingly difficult in SPSS (unless you enter the means by hand, which is error-prone and impractical for repeated analyses). Here&#8217;s how you can calculate a variable which contains the mean of another variable (which can then easily be centered or used in whatever way one wants to).<\/p>\n<p>Let dres be the variable of interest. The new variable containing the mean of dres (for all observations) will be named dresavg. I also show how to create a variable containing the number of observations (ntotal). cdres will be the centered variable.<\/p>\n<p><strong>Stata 10<\/strong><\/p>\n<p>Use<\/p>\n<p><code>. egen dresavg = mean(dres)<\/code><\/p>\n<p>and you&#8217;re done! You could also use the <a title=\"Stata help summarize\" href=\"http:\/\/www.stata.com\/help.cgi?summarize\" target=\"_blank\">summarize<\/a> and <a title=\"Stata help generate\" href=\"http:\/\/www.stata.com\/help.cgi?generate\" target=\"_blank\">generate<\/a> commands:<\/p>\n<p><code>. sum dres<br \/>\n. 
gen dresavg = r(mean)<\/code><\/p>\n<p>If you want a variable that contains the total number of observations you can use<\/p>\n<p><code>. gen ntotal = _N<\/code><\/p>\n<p>or the more flexible <a title=\"Stata help egen\" href=\"http:\/\/www.stata.com\/help.cgi?egen\" target=\"_blank\">egen<\/a> command (handy, e.g., when dres has missing values, because count() only counts the non-missing observations)<\/p>\n<p><code>. egen ntotal = count(dres)<\/code><\/p>\n<p>There are plenty of ways to generate variables containing sample statistics. As for the centered variable, use<\/p>\n<p><code>. gen cdres = dres - dresavg<\/code><\/p>\n<p>or without even generating the variable containing the mean:<\/p>\n<p><code>. sum dres<br \/>\n. gen cdres = dres - r(mean)<\/code><\/p>\n<p><strong>PASW 18 (SPSS, you know)<\/strong><\/p>\n<p>Beware, long syntax ahead. Before you despair, there&#8217;s a simpler (but less flexible) solution below. The complicated approach starts with exporting the variable mean into a new data set. This data set is then merged with the master data set; a variable containing the mean for every observation will be attached.<!--more--><\/p>\n<p><code>* Open the data set containing dres.<br \/>\n* Depending on your operating system and your preferences you might have to change the paths.<br \/>\n*.<br \/>\nGET FILE='D:\\master01.sav'.<br \/>\nDATASET NAME master WINDOW=FRONT.<br \/>\n*.<br \/>\n* OMS will send (specified) output into a new file.<br \/>\n* In this case the new file is an SPSS data set named killme.sav.<br \/>\n* Set VIEWER = NO to omit the table being shown in the output window.<br \/>\n*<br \/>\n* The mean of dres is computed via DESCRIPTIVES.<br \/>\n* NB: You can adapt the statistics, changing or adding statistics (e.g., SUM);<br \/>\n* you might need to adapt some of the following stuff, though<br \/>\n* (e.g., add Summe to the Mittelwert in the \/KEEP option when saving).<br \/>\n*<br \/>\n* OMSEND will stop the sending.<br \/>\n*.<br \/>\nOMS<br \/>\n\/SELECT TABLES<br \/>\n\/DESTINATION 
FORMAT=SAV OUTFILE = 'D:\\killme.sav' VIEWER = YES<br \/>\n\/IF COMMANDS = ['descriptives'] subtypes = ['descriptive statistics'].<br \/>\nDESCRIPTIVES VARIABLES=dres<br \/>\n\/STATISTICS=MEAN.<br \/>\nOMSEND.<br \/>\n*.<br \/>\n* A new data set containing the descriptive statistics has been created.<br \/>\n* Time to close the master data set for the time being.<br \/>\n*.<br \/>\nDATASET CLOSE master.<br \/>\n*.<br \/>\n* Open the new small data set.<br \/>\n*.<br \/>\nGET FILE='D:\\killme.sav'.<br \/>\nDATASET NAME zwischenhalt WINDOW=FRONT.<br \/>\n*.<br \/>\n* I suggest you take a look at the data.<br \/>\n* The number of interest is in the variable Mittelwert.<br \/>\n* NB: I'm using a German version of the software; other versions will give it other names.<br \/>\n*<br \/>\n* There's an additional line with the number of total observations<br \/>\n* and, if there are missing values, maybe even more lines.<br \/>\n* Let's get rid of them! NB: This assumes that the mean is, well, not a very large negative number.<br \/>\n* I know that's an ugly way to do it, but so far I couldn't get a statement like ~=SYSMIS(Mittelwert) working.<br \/>\n*.<br \/>\nFILTER OFF.<br \/>\nUSE ALL.<br \/>\nSELECT IF (Mittelwert &gt; -66666666666).<br \/>\nEXECUTE.<br \/>\n*.<br \/>\n* A (constant) key variable is required for the upcoming merging procedure.<br \/>\n*.<br \/>\nCOMPUTE cons = 1 .<br \/>\nEXECUTE .<br \/>\n*.<br \/>\n* Key variables must be sorted.<br \/>\n*.<br \/>\nSORT CASES BY cons(A).<br \/>\n*.<br \/>\n* Save the small data set, keeping only the key variable and the data of interest.<br \/>\n* NB: Here N (i.e., the number of observations) is kept as well, mirroring the Stata procedures shown above.<br \/>\n*.<br \/>\nSAVE OUTFILE='D:\\killme.sav'<br \/>\n\/KEEP=N Mittelwert cons<br \/>\n\/COMPRESSED.<br \/>\n*.<br \/>\n* SPSS saves the specified data set, but the other variables remain there in the open data set.<br \/>\n* Closing and re-opening solves this problem.<br 
\/>\n*.<br \/>\nDATASET CLOSE zwischenhalt.<br \/>\n*.<br \/>\nGET FILE='D:\\killme.sav'.<br \/>\nDATASET NAME zwischenhalt WINDOW=FRONT.<br \/>\n*.<br \/>\n* Open the master data set, as well.<br \/>\n*.<br \/>\nGET FILE='D:\\master01.sav'.<br \/>\nDATASET NAME master WINDOW=FRONT.<br \/>\n*.<br \/>\n* Add the key variable here, as well; then sort it.<br \/>\n*.<br \/>\nCOMPUTE cons = 1 .<br \/>\nEXECUTE .<br \/>\n*.<br \/>\nSORT CASES BY cons(A).<br \/>\n*.<br \/>\n* Now for the big one: the data set currently in front (master, here simply *) is merged with the other data set.<br \/>\n* Using the \/TABLE subcommand lets you attach the one line of values to every line (observation) in the master data set.<br \/>\n*.<br \/>\nMATCH FILES \/FILE=*<br \/>\n\/TABLE='zwischenhalt'<br \/>\n\/BY cons.<br \/>\nEXECUTE.<br \/>\n*.<br \/>\n* Don't need the small data set anymore.<br \/>\n*.<br \/>\nDATASET CLOSE zwischenhalt.<br \/>\n*.<br \/>\n* Out of sight, but still on your computer. I'm afraid you have to manually delete it.<br \/>\n* The file name killme suggests what to do with it, eventually (a tip I got from Prof. Schnell).<br \/>\n*<br \/>\n* The variables that were attached have weird German names. Let's make them clearer:<br \/>\n*.<br \/>\nRENAME VARIABLES (N Mittelwert = ntotal dresavg).<br \/>\nEXECUTE.<br \/>\n*.<br \/>\n* It doesn't hurt to label the new variables - but I assume you'd already thought of this yourself.<br \/>\n*<br \/>\n* Phew, now we have the variables needed for centering. So let's center!<br \/>\n*.<br \/>\nCOMPUTE cdres = dres - dresavg.<br \/>\nEXECUTE.<br \/>\n*.<br \/>\n* There was no immediate need for save commands in the Stata examples.<br \/>\n* For the sake of completeness (and to drop the no longer needed key variable): save and close.<br \/>\n*.<br \/>\nSAVE OUTFILE='D:\\master01.sav'<br \/>\n\/DROP=cons \/COMPRESSED.<br \/>\n*.<br \/>\nDATASET CLOSE master.<\/code><\/p>\n<p>So far, so complex. I promised an easier solution. 
You can (ab)use the regression command to achieve the same results in terms of mean variables. Just run a regression through the origin with dres as the dependent variable and a constant as the only predictor. The predicted values (which are saved in a new variable specified in the last line of syntax) then correspond to the mean of dres.<\/p>\n<p><code>COMPUTE cons = 1 .<br \/>\nEXECUTE .<br \/>\n*.<br \/>\nREGRESSION<br \/>\n\/MISSING LISTWISE<br \/>\n\/STATISTICS R<br \/>\n\/CRITERIA=PIN(.05) POUT(.10)<br \/>\n\/ORIGIN<br \/>\n\/DEPENDENT dres<br \/>\n\/METHOD=ENTER cons<br \/>\n\/SAVE PRED(dresavg).<br \/>\n*<br \/>\n* Let's center!<br \/>\n*.<br \/>\nCOMPUTE cdres = dres - dresavg.<br \/>\nEXECUTE.<\/code><\/p>\n<p>A final remark to those frightened by the extensive solution with the OMS command: It should be possible to get a variable containing the number of observations by assigning case numbers, sorting, and then using the lag function (repeatedly?), but I haven&#8217;t figured out how, yet. And who knows, maybe there&#8217;s an obvious simple way I just overlooked all the time. For the time being I&#8217;m happy to have a working solution in SPSS. The easiest way is, of course, to use Stata.<\/p>\n<p><strong>Update<\/strong> (an hour later): Thanks to the helpful comment below I can present another simple solution for SPSS:<\/p>\n<p><code>COMPUTE cons = 1 .<br \/>\nEXECUTE .<br \/>\n*.<br \/>\nAGGREGATE<br \/>\n  \/OUTFILE=* MODE=ADDVARIABLES<br \/>\n  \/BREAK=cons<br \/>\n  \/dresavg=MEAN(dres)<br \/>\n  \/ntotal=N.<\/code><\/p>\n","protected":false},"excerpt":{"rendered":"<p>My current research requires meta-analytic procedures where variables that contain another variable&#8217;s mean come in very handy. Centering variables is also something very reasonable to do when analysing regressions with an interaction term between a continuous variable and a dummy variable. Centering variables sounds like an easy task. 
It is if you use Stata but [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16],"tags":[349,350,351,131,348,347,346,130,134,353,352],"class_list":["post-908","post","type-post","status-publish","format-standard","hentry","category-data-analysis","tag-average","tag-centered","tag-centering","tag-data-management","tag-mean","tag-pasw","tag-spss","tag-stata","tag-statistics","tag-syntax","tag-variables"],"_links":{"self":[{"href":"https:\/\/www.jutze.com\/index.php?rest_route=\/wp\/v2\/posts\/908","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.jutze.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.jutze.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.jutze.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.jutze.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=908"}],"version-history":[{"count":5,"href":"https:\/\/www.jutze.com\/index.php?rest_route=\/wp\/v2\/posts\/908\/revisions"}],"predecessor-version":[{"id":911,"href":"https:\/\/www.jutze.com\/index.php?rest_route=\/wp\/v2\/posts\/908\/revisions\/911"}],"wp:attachment":[{"href":"https:\/\/www.jutze.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=908"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.jutze.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=908"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.jutze.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=908"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}