GB6323 Lecture, Week 8 (Multiple Regression)

Lecture delivered by Assoc. Prof. Dr. Muhammad Hussin.


Three main requirements:

1. Clarity (of the variables). E.g.: The influence of family on education.

  • Who counts as family (mother, father, older sister, older brother, etc.)? This must be clearly defined.
  • Education? What exactly is to be measured: achievement? Communication? Moral character? Justify the choice on the basis of a theory or model!

2. Normal data.

  • Regression should only be run on normally distributed data. Non-normal data mean the sample is not representative and the findings cannot be generalised to the population.
  • To obtain normally distributed data, sampling must be random.
  • Skewness: tests whether the data fall within the normal range. Make sure the skewness value is within ±1.
  • Kurtosis: flatness. Flat data are not normal. Make sure the kurtosis value is within ±1.

3. Linearity

  • The relationship must be linear.
  • Otherwise, it is difficult to measure each predictor's contribution.
  • If the relationship is not linear, possible causes are: 1) poor items, 2) sampling errors, 3) negatively worded items (if any) were not recoded from negative to positive.
  • If the correlation between the independent variables is high, this is called multicollinearity.

Check skewness/kurtosis: Analyze > Descriptive Statistics > Descriptives > Options (Distribution: Skewness & Kurtosis)
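
The ±1 rule for skewness and kurtosis can also be checked outside SPSS. Below is a minimal sketch, assuming the bias-adjusted sample formulas (G1 for skewness, G2 for excess kurtosis, where 0 means "normal") that SPSS is generally understood to report. The helper names `skewness`, `kurtosis` and `within_rule_of_thumb` are hypothetical, not part of any library, and at least four values are needed.

```python
import statistics


def skewness(values):
    """Bias-adjusted sample skewness (G1); 0 for perfectly symmetric data."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD (n - 1 denominator)
    total = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * total


def kurtosis(values):
    """Bias-adjusted sample excess kurtosis (G2); 0 for a normal curve,
    negative for flatter-than-normal data. Needs at least 4 values."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    total = sum(((x - mean) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * total - correction


def within_rule_of_thumb(values, limit=1.0):
    """The lecture's rule: both skewness and kurtosis must lie within +/- limit."""
    return abs(skewness(values)) <= limit and abs(kurtosis(values)) <= limit


flat = [1, 2, 3, 4, 5]                  # evenly spread, i.e. "flat"
bell_like = [2, 3, 3, 4, 4, 4, 5, 5, 6]
print(kurtosis(flat))                   # about -1.2
print(within_rule_of_thumb(flat))       # False
print(within_rule_of_thumb(bell_like))  # True
```

The evenly spread scores fail the rule because of their flatness (kurtosis about -1.2), even though they are perfectly symmetric.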

Check distribution (normality): Analyze > Descriptive Statistics > Frequencies (Charts: Histograms, with normal curve)

Check linearity: Graphs > Scatterplot

***Typically, the contribution (variance explained, R squared) falls in the range of 20 to 40%.


Multiple regression is not just one technique but a family of techniques that can be used to explore the relationship between one continuous dependent variable and a number of independent variables or predictors (usually continuous). Multiple regression is based on correlation, but allows a more sophisticated exploration of the interrelationship among a set of variables. This makes it ideal for the investigation of more complex real-life, rather than laboratory-based, research questions. However, you cannot just throw variables into a multiple regression and hope that, magically, answers will appear. You should have a sound theoretical or conceptual reason for the analysis and, in particular, the order of variables entering the equation. Don’t use multiple regression as a fishing expedition. There are a number of different types of multiple regression analyses that you can use, depending on the nature of the question you wish to address. The three main types of multiple regression analyses are standard or simultaneous multiple regression, hierarchical or sequential multiple regression, and stepwise multiple regression.

Stepwise multiple regression

In stepwise regression, the researcher provides a list of independent variables and then allows the program to select which variables it will enter and in which order they go into the equation, based on a set of statistical criteria.
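
To make "the program selects which variables to enter" concrete, here is a minimal forward-selection sketch in plain Python. It is a simplification, not SPSS's algorithm: SPSS's stepwise method also re-checks entered variables for removal and by default uses probability-of-F criteria, whereas this sketch only enters variables, using an assumed fixed F-to-enter threshold of 3.84. All function names are hypothetical, and the data are toy numbers constructed so that y = 2·x1 + 3·x2 plus a little noise.

```python
import statistics


def solve(a, b):
    """Solve the linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, columns):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    b = solve(xtx, xty)
    mean_y = statistics.mean(y)
    ss_res = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def forward_stepwise(y, candidates, f_to_enter=3.84):
    """Forward selection only: add the predictor with the largest F-to-enter
    until no remaining predictor reaches the threshold (no removal step)."""
    n = len(y)
    selected, r2_current = [], 0.0
    remaining = dict(candidates)
    while remaining:
        df_resid = n - (len(selected) + 1) - 1
        if df_resid <= 0:
            break
        best = None  # (F, name, R^2)
        for name, col in remaining.items():
            r2_new = r_squared(y, [candidates[s] for s in selected] + [col])
            if r2_new >= 1.0:
                f = float("inf")
            else:
                f = (r2_new - r2_current) / ((1 - r2_new) / df_resid)
            if best is None or f > best[0]:
                best = (f, name, r2_new)
        if best[0] < f_to_enter:
            break
        selected.append(best[1])
        r2_current = best[2]
        del remaining[best[1]]
    return selected


x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, -1, 1, -1, 1, -1, 1, -1]
noise = [0.1, -0.1, -0.1, 0.1, 0.1, -0.1, -0.1, 0.1]
y = [2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]
print(forward_stepwise(y, {"x1": x1, "x2": x2}))  # ['x1', 'x2']
```

Note that the order of entry is decided purely by the data, which is exactly why the text warns against letting stepwise replace theory.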

Assumptions of Multiple Regression

Multiple regression is one of the fussier of the statistical techniques. It makes a number of assumptions about the data.

Sample size

The issue at stake here is generalisability. That is, with small samples you may obtain a result that does not generalise (cannot be repeated) with other samples. If your results do not generalise to other samples, they are of little scientific value. So how many cases or participants do you need? Different authors tend to give different guidelines concerning the number of cases required for multiple regression. Stevens (1996, p. 72) recommends that ‘for social science research, about 15 participants per predictor are needed for a reliable equation’. Tabachnick and Fidell (2007, p. 123) give a formula for calculating sample size requirements, taking into account the number of independent variables that you wish to use: N > 50 + 8m (where m = number of independent variables). If you have five independent variables, you will need 90 cases. More cases are needed if the dependent variable is skewed. For stepwise regression, there should be a ratio of 40 cases for every independent variable.
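
The three sample-size guidelines above are simple arithmetic, so they can be tabulated side by side. This sketch just encodes the rules as stated in the paragraph; the function names are hypothetical.

```python
def tabachnick_fidell_minimum(m):
    """Tabachnick and Fidell (2007): N > 50 + 8m, m = number of independent variables."""
    return 50 + 8 * m


def stevens_minimum(m, per_predictor=15):
    """Stevens (1996): about 15 participants per predictor."""
    return per_predictor * m


def stepwise_minimum(m, per_predictor=40):
    """Stepwise regression: a ratio of 40 cases per independent variable."""
    return per_predictor * m


for m in (2, 5):
    print(m, tabachnick_fidell_minimum(m), stevens_minimum(m), stepwise_minimum(m))
# 2 66 30 80
# 5 90 75 200
```

With five predictors the guidelines range from 75 to 200 cases, so the choice of regression type matters as much as the number of predictors.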

Multicollinearity and singularity

This refers to the relationship among the independent variables. Multicollinearity exists when the independent variables are highly correlated (r=.9 and above). Singularity occurs when one independent variable is actually a combination of other independent variables (e.g. when both subscale scores and the total score of a scale are included). Multiple regression doesn’t like multicollinearity or singularity and these certainly don’t contribute to a good regression model, so always check for these problems before you start.


Outliers

  • Multiple regression is very sensitive to outliers (very high or very low scores).
  • Checking for extreme scores should be part of the initial data screening process. You should do this for all the variables, both dependent and independent, that you will be using in your regression analysis.
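
One way to screen each variable for extreme scores is to convert it to z-scores and flag anything far from the mean. This is a sketch only: the cutoff of 3 SDs is an adjustable assumption, chosen to mirror the 3-SD criterion used for casewise diagnostics in the SPSS procedure, and the data are toy numbers.

```python
import statistics


def extreme_scores(values, cutoff=3.0):
    """Return (index, value, z) for scores more than `cutoff` SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    flagged = []
    for i, x in enumerate(values):
        z = (x - mean) / sd
        if abs(z) > cutoff:
            flagged.append((i, x, round(z, 2)))
    return flagged


scores = [10] * 15 + [40]
print(extreme_scores(scores))  # [(15, 40, 3.75)]
```

Run the same check on the dependent variable and on every independent variable, as the bullet above recommends.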

Normality, linearity

These assumptions can be checked from the residuals scatterplots which are generated as part of the multiple regression procedure. Residuals are the differences between the obtained and the predicted dependent variable (DV) scores. The residuals scatterplots allow you to check:

  • normality: the residuals should be normally distributed about the predicted DV scores
  • linearity: the residuals should have a straight-line relationship with predicted DV scores
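
The residuals scatterplot plots standardized residuals (*ZRESID) against standardized predicted values (*ZPRED). The sketch below computes both from observed and predicted scores. It assumes the standardized residual is the residual divided by the standard error of the estimate, sqrt(SS_res / (n - k - 1)) with k independent variables, which is intended to mirror SPSS's *ZRESID. The observed and predicted values are toy numbers, not output from a fitted model, and the function names are hypothetical.

```python
import math
import statistics


def standardized_residuals(observed, predicted, k):
    """Residual divided by the standard error of the estimate,
    sqrt(SS_res / (n - k - 1)), where k = number of independent variables."""
    residuals = [obs - pred for obs, pred in zip(observed, predicted)]
    n = len(residuals)
    ss_res = sum(e ** 2 for e in residuals)
    se_estimate = math.sqrt(ss_res / (n - k - 1))
    return [e / se_estimate for e in residuals]


def standardized_predicted(predicted):
    """Predicted values rescaled to mean 0 and SD 1 (the *ZPRED axis)."""
    mean = statistics.mean(predicted)
    sd = statistics.stdev(predicted)
    return [(p - mean) / sd for p in predicted]


observed = [3, 5, 7, 9, 11]
predicted = [4, 4, 8, 8, 12]
zresid = standardized_residuals(observed, predicted, k=1)
zpred = standardized_predicted(predicted)
for x, y in zip(zpred, zresid):
    print(f"{x:6.2f} {y:6.2f}")  # one (ZPRED, ZRESID) point per case
print([i for i, z in enumerate(zresid) if abs(z) > 3])  # cases beyond 3 SDs
```

Plotting these pairs should give a roughly rectangular cloud centred on zero; a curved pattern suggests non-linearity.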

Example of research questions:

  1. How well do the two measures of control (self-esteem, motivation) predict life satisfaction? How much variance in life satisfaction scores can be explained by scores on these two scales?
  2. Which is the best predictor of life satisfaction: control of external events (self-esteem) or control of internal states (motivation)?

What you need:

  • one continuous dependent variable (Total satisfaction)
  • two or more continuous independent variables (self-esteem, motivation). (You can also use dichotomous independent variables, e.g. males=1, females=2.)

What it does: Multiple regression tells you how much of the variance in your dependent variable can be explained by your independent variables. It also gives you an indication of the relative contribution of each independent variable. Tests allow you to determine the statistical significance of the results, in terms of both the model itself and the individual independent variables.
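
To see where "variance explained" and "relative contribution" come from, here is a minimal sketch of standard (simultaneous) multiple regression: all predictors are entered at once, the coefficients come from the ordinary least-squares normal equations, R squared is 1 - SS_res/SS_tot, and standardized betas are b_j * SD(x_j) / SD(y). This is an illustration, not SPSS's implementation, the function names are hypothetical, and the data are toy numbers constructed so that y = 2·x1 + 3·x2 plus small noise.

```python
import statistics


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def standard_regression(y, columns):
    """Enter all predictors at once; return (coefficients, betas, R^2).
    coefficients[0] is the intercept; betas are standardized coefficients."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    b = solve(xtx, xty)
    predicted = [sum(bj * xj for bj, xj in zip(b, r)) for r in rows]
    mean_y = statistics.mean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    sd_y = statistics.stdev(y)
    betas = [bj * statistics.stdev(col) / sd_y for bj, col in zip(b[1:], columns)]
    return b, betas, 1 - ss_res / ss_tot


x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, -1, 1, -1, 1, -1, 1, -1]
noise = [0.1, -0.1, -0.1, 0.1, 0.1, -0.1, -0.1, 0.1]
y = [2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]
coefficients, betas, r2 = standard_regression(y, [x1, x2])
print(round(r2, 4))  # 0.9996, i.e. about 99.96% of the variance explained
```

Although x2 has the larger raw coefficient (3 vs 2), x1 has the larger beta because it varies more; comparing betas, not raw coefficients, is what indicates relative contribution.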

Procedure for standard multiple regression

  1. From the menu at the top of the screen, click on Analyze, then select Regression, then Linear.
  2. Click on your continuous dependent variable (e.g. Total satisfaction: tsatisfaction) and move it into the Dependent box.
  3. Click on your independent variables (Total self-esteem: tslfestm; Total motivation: tpmotivation) and click on the arrow to move them into the Independent box.
  4. For Method, make sure Enter is selected. (This will give you standard multiple regression.)
  5. Click on the Statistics button.
     • Select the following: Estimates, Confidence intervals, Model fit, Descriptives, Part and partial correlations and Collinearity diagnostics.
     • In the Residuals section, select Casewise diagnostics and Outliers outside 3 standard deviations. Click on Continue.
  6. Click on the Options button. In the Missing Values section, select Exclude cases pairwise. Click on Continue.
  7. Click on the Plots button.
     • Click on *ZRESID and the arrow button to move this into the Y box.
     • Click on *ZPRED and the arrow button to move this into the X box.
     • In the section headed Standardized Residual Plots, tick the Normal probability plot option. Click on Continue.
  8. Click on the Save button.
     • In the section labelled Distances, tick the Mahalanobis and Cook’s boxes.
     • Click on Continue and then OK.

Example of reporting from Pallant (2011):

Hierarchical multiple regression was used to assess the ability of two control measures (Mastery Scale, Perceived Control of Internal States Scale: PCOISS) to predict levels of stress (Perceived Stress Scale), after controlling for the influence of social desirability and age. Preliminary analyses were conducted to ensure no violation of the assumptions of normality, linearity, multicollinearity and homoscedasticity. Age and social desirability were entered at Step 1, explaining 6% of the variance in perceived stress. After entry of the Mastery Scale and PCOISS Scale at Step 2 the total variance explained by the model as a whole was 47.4%, F (4, 421) = 94.78, p < .001. The two control measures explained an additional 42% of the variance in stress, after controlling for age and socially desirable responding, R squared change = .42, F change (2, 421) = 166.87, p < .001. In the final model, only the two control measures were statistically significant, with the Mastery Scale recording a higher beta value (beta = –.44, p < .001) than the PCOISS Scale (beta = –.33, p < .001).
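
The F and F change values in this report follow from the R squared figures. The sketch below applies the standard formulas: overall F = (R²/k) / ((1 - R²)/df_resid), and F change = (R² change / k_added) / ((1 - R²_full)/df_resid). Here df_resid = 421 = N - 4 - 1, which implies N = 426. Because the reported R squared values are rounded, the recomputed F values only approximately match the reported ones. The function names are hypothetical.

```python
def f_overall(r2, k, df_resid):
    """F for the whole model: (R^2 / k) / ((1 - R^2) / df_resid)."""
    return (r2 / k) / ((1 - r2) / df_resid)


def f_change(r2_change, r2_full, k_added, df_resid):
    """F change for a block of k_added predictors entered at a later step."""
    return (r2_change / k_added) / ((1 - r2_full) / df_resid)


# Rounded values taken from the reported results above.
print(round(f_overall(0.474, 4, 421), 2))       # about 94.8 (reported: 94.78)
print(round(f_change(0.42, 0.474, 2, 421), 2))  # about 168 (reported: 166.87)
```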


For further reading, see Pallant (2011), pp. 143-167.
