Multiple regression is not just one technique but a family of techniques that can be used to explore the relationship between one continuous dependent variable and a number of independent variables or predictors (usually continuous). Multiple regression is based on correlation, but allows a more sophisticated exploration of the interrelationship among a set of variables. This makes it ideal for the investigation of more complex real-life, rather than laboratory-based, research questions. However, you cannot just throw variables into a multiple regression and hope that, magically, answers will appear. You should have a sound theoretical or conceptual reason for the analysis and, in particular, the order of variables entering the equation. Don’t use multiple regression as a ‘fishing expedition’.
There are a number of different types of multiple regression analyses that you can use, depending on the nature of the question you wish to address. The three main types of multiple regression analyses are: standard or simultaneous, hierarchical or sequential, stepwise.
Assumptions of Multiple Regression

Multiple regression is one of the fussier of the statistical techniques. It makes a number of assumptions about the data.
Sample size – Multiple regression is not the technique to use on small samples. The issue at stake here is generalisability. That is, with small samples you may obtain a result that does not generalise (cannot be repeated) with other samples. So how many cases or participants do you need? Different authors tend to give different guidelines concerning the number of cases required for multiple regression. Stevens (1996, p. 72) recommends that ‘for social science research, about 15 participants per predictor are needed for a reliable equation’. Tabachnick and Fidell (2007, p. 123) give a formula for calculating sample size requirements, taking into account the number of independent variables that you wish to use: N > 50 + 8m (where m = number of independent variables). If you have five independent variables, you will need 90 cases. More cases are needed if the dependent variable is skewed. For stepwise regression, there should be a ratio of 40 cases for every independent variable.
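The Tabachnick and Fidell rule of thumb above can be written as a one-line check. This is purely illustrative; the function name `required_n` is ours, not from any statistics package:

```python
def required_n(m):
    """Minimum sample size per Tabachnick and Fidell (2007): N > 50 + 8m,
    where m is the number of independent variables."""
    return 50 + 8 * m

print(required_n(5))  # 90 cases for five independent variables
```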
Multicollinearity and singularity – This refers to the relationship among the independent variables. Multicollinearity exists when the independent variables are highly correlated (r=.9 and above). Singularity occurs when one independent variable is actually a combination of other independent variables (e.g. when both subscale scores and the total score of a scale are included). Multiple regression doesn’t like multicollinearity or singularity and these certainly don’t contribute to a good regression model, so always check for these problems before you start.
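A quick pre-check for multicollinearity can be sketched in plain Python. This is an illustration with made-up toy data, not the SPSS procedure, and the helper names are our own:

```python
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def collinear_pairs(ivs, threshold=0.9):
    """Return IV pairs whose correlation is at or above the threshold."""
    return [(a, b) for a, b in combinations(ivs, 2)
            if abs(pearson_r(ivs[a], ivs[b])) >= threshold]


# toy data: 'optimism' here is a perfect linear copy of 'self_esteem'
ivs = {
    "self_esteem": [1, 2, 3, 4, 5],
    "optimism":    [2, 4, 6, 8, 10],
    "motivation":  [5, 3, 4, 1, 2],
}
print(collinear_pairs(ivs))  # [('self_esteem', 'optimism')]
```

Any pair that this kind of check flags should be dealt with before the regression is run, for example by dropping or combining one of the offending variables.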
Outliers – Multiple regression is very sensitive to outliers (very high or very low scores). Checking for extreme scores should be part of the initial data screening process. You should do this for all the variables, both dependent and independent, that you will be using in your regression analysis.
Normality and linearity – These assumptions can be checked from the residuals scatterplots, which are generated as part of the multiple regression procedure. Residuals are the differences between the obtained and the predicted dependent variable (DV) scores. The residuals scatterplots allow you to check: (i) normality: the residuals should be normally distributed about the predicted DV scores; (ii) linearity: the residuals should have a straight-line relationship with the predicted DV scores.
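Residuals are simply observed minus predicted DV scores. As a minimal sketch (a single predictor and toy data for brevity; helper names are our own):

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b


x = [1, 2, 3, 4]
y = [2, 4, 5, 7]
a0, b = fit_simple_ols(x, y)
predicted = [a0 + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]
print([round(r, 2) for r in residuals])  # [-0.1, 0.3, -0.3, 0.1]
```

Note that least-squares residuals always sum to (effectively) zero; what the scatterplots let you inspect is their shape and spread, not their total.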
Example of research questions: How much of the variance in life satisfaction scores can be explained by self-esteem, optimism, and motivation? Which of the variables is the best predictor?
What you need:
- one continuous dependent variable (e.g. total satisfaction).
- two or more continuous independent variables (e.g. self-esteem, motivation, optimism, etc.). (You can also use dichotomous independent variables, e.g. males=1, females=2.)
What it does: Multiple regression tells you how much of the variance in your dependent variable can be explained by your independent variables. It also gives you an indication of the relative contribution of each independent variable. Tests allow you to determine the statistical significance of the results, in terms of both the model itself and the individual independent variables.
Procedure for standard multiple regression
- From the menu at the top of the screen, click on Analyze, then select Regression, then Linear.
- Click on your continuous dependent variable (e.g. total satisfaction) and move it into the Dependent box.
- Click on your independent variables (e.g. total motivation, self-esteem, optimism) and click on the arrow to move them into the Independent box.
- For Method, make sure Enter is selected (Enter gives you standard or simultaneous regression; Stepwise is a different type of analysis).
- Click on the Statistics button. • Select the following: Estimates, Model fit, R squared change, Descriptives, and Collinearity diagnostics. • In the Residuals section, select Casewise diagnostics and Outliers outside 3 standard deviations. Click on Continue.
- Click on the Options button. In the Missing Values section, select Exclude cases pairwise. Click on Continue.
- Click on the Plots button. • Click on *ZRESID and the arrow button to move this into the Y box. • Click on *ZPRED and the arrow button to move this into the X box. • In the section headed Standardized Residual Plots, tick the Normal probability plot option. Click on Continue.
- Click on the Save button. • In the section labelled Distances, select the Mahalanobis box (this will identify multivariate outliers for you). • Click on Continue and then OK.
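Outside SPSS, the same standard (simultaneous) regression can be reproduced by solving the normal equations directly. The following is a pure-Python sketch for illustration only, using two hypothetical predictors whose relationship with y is exactly linear so the coefficients come out cleanly:

```python
def ols(X, y):
    """Solve the normal equations (X'X)b = X'y by Gauss-Jordan elimination.
    Each row of X must begin with a 1 for the intercept term."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    A = [XtX[i] + [Xty[i]] for i in range(p)]       # augmented matrix
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]             # partial pivoting
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [u - f * v for u, v in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# toy data constructed so that y = 1 + 2*x1 + 3*x2 exactly
X = [[1, 1, 2], [1, 2, 1], [1, 3, 4], [1, 4, 3]]
y = [9, 8, 19, 18]
coef = ols(X, y)                                    # ~[1.0, 2.0, 3.0]
predicted = [sum(c * v for c, v in zip(coef, row)) for row in X]
ss_res = sum((a - q) ** 2 for a, q in zip(y, predicted))
ss_tot = sum((a - sum(y) / len(y)) ** 2 for a in y)
print(round(1 - ss_res / ss_tot, 4))                # R squared; ~1.0 here
```

R squared of 1.0 only occurs here because the toy data were built with no error term; real data will always fall short of this (see the note on 100% below).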
- Multicollinearity and singularity: correlations among the IVs must be below 0.9.
- The regression value can NEVER be 100%. If it is, a variable has most likely been ‘regressed’ on a subset of itself. The IVs and the DV must be entirely distinct variables!
- The following points should be checked first when the regression output is obtained from SPSS.
i) Check multicollinearity – each X must correlate with Y (>.3). In the correlation table, the correlations among the three IVs (motivation, self-esteem and optimism, i.e. excluding satisfaction) must not exceed .9 (multicollinearity); alternatively, inspect the tolerance values (a tolerance approaching 0 indicates multicollinearity);
ii) Check outliers – standardized residuals above 3.3 or below -3.3 (Tabachnick & Fidell 1996) indicate multivariate outliers; delete these cases. Outliers can also be checked via the Mahalanobis distance: compare each case against the critical chi-square value (2 IVs = 13.82; 3 IVs = 16.27; 4 IVs = 18.47, and so on). Cases/respondents exceeding the critical value must be deleted.
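Given the Mahalanobis distances that SPSS saves into the data file, the screening rule above can be sketched as follows (critical chi-square values at alpha = .001; the helper name is our own):

```python
# critical chi-square at alpha = .001, df = number of IVs
CHI2_CRIT = {2: 13.82, 3: 16.27, 4: 18.47}


def mahalanobis_outliers(distances, n_ivs):
    """Indices of cases whose Mahalanobis distance exceeds the critical value."""
    crit = CHI2_CRIT[n_ivs]
    return [i for i, d in enumerate(distances) if d > crit]


print(mahalanobis_outliers([2.1, 17.0, 5.4, 16.3], 3))  # [1, 3]
```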
R squared change = the contribution made by each IV.
Adjusted R squared = R squared adjusted for the number of predictors and the sample size; compare the R squared value with the adjusted R squared value; if they differ markedly, check for outliers.
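The adjustment itself can be written out: adjusted R squared shrinks R squared according to the sample size n and the number of predictors k (a sketch of the standard formula, not SPSS output):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    for sample size n and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


print(round(adjusted_r2(0.50, 30, 3), 3))   # 0.442: small sample, visible shrinkage
print(round(adjusted_r2(0.50, 300, 3), 3))  # 0.495: large sample, almost no change
```

This is why a large gap between R squared and adjusted R squared is a warning sign: it tends to occur when the sample is small relative to the number of predictors.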
Beta = the change that will occur in Y if X is changed (increasing X by 1 unit increases Y by the coefficient).
Once the outliers have been removed, re-run the analysis. Check again: if there are still cases with large Mahalanobis values to be removed, delete them and re-run the analysis. Cases with clearly excessive Mahalanobis values really do need to be removed.
The Beta values should be read from the Standardised Coefficients column.
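The relationship between the unstandardized coefficient B and the standardized Beta can be sketched as follows (using sample standard deviations; the function name and toy data are ours):

```python
import statistics as st


def standardized_beta(b, x, y):
    """Beta = B * (sd of predictor / sd of DV)."""
    return b * st.stdev(x) / st.stdev(y)


# if y is exactly 2*x, then B = 2 and Beta works out to ~1.0
print(standardized_beta(2, [1, 2, 3, 4], [2, 4, 6, 8]))
```

Because Betas are on a common (standard-deviation) scale, they can be compared across IVs to judge relative contribution, which raw B values cannot.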
Discussion: the Enter method produces standard output that combines all the study variables; the value for each IV studied is not shown one by one. In the example in these notes, only the value for ‘total satisfaction’ is shown; the values for IVs such as self-esteem, motivation and optimism are not shown. Be careful when choosing the method. For hierarchical/stepwise regression, the researcher must first consult the theory being used: which IVs should be entered first, and which later.