Lecture 10

Data Analysis II: Inferential Statistical Analysis

Summary Notes

Analysis of research data

In the "wheel" of the research process (see link), analyzing the research data comes as stage number four following the data collection stage. Depending upon the type of data collected, an appropriate technique of analysis is used. The first two lecture (numbers 9 and 10) focus on quantitative analysis of the data, while the third lecture (number 11) focuses on qualitative analysis of data.

Inferential statistics

Inferential statistics permits generalization from samples to populations, and it is the branch of statistics concerned with hypothesis testing. Sampling error and sample bias are the two main problems facing any attempt to generalize from a sample to a population.

  1. Sampling error involves the role of chance in the findings or results; it arises from taking small samples from large populations. 

  2. Sample bias involves samples that do not accurately represent the population. 

Inferential statistics deals with sampling error and NOT with sample bias. It is used to estimate the role of chance in the findings, such as when testing the reliability of differences between groups and deciding whether a difference is a significant finding or merely the result of chance. The other way to check that chance had nothing to do with the findings is to repeat the study several times, which can become awkward, time-consuming and impractical.
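To see sampling error concretely, consider a minimal Python simulation (the population, sample sizes and numbers below are invented purely for illustration): even though the population never changes, small samples drawn from it give means that differ from each other by chance alone.

    import numpy as np

    rng = np.random.default_rng(42)
    # A hypothetical population of 100,000 test scores with mean 100
    population = rng.normal(loc=100, scale=15, size=100_000)

    for i in range(5):
        sample = rng.choice(population, size=25)   # one small sample
        print(f"Sample {i + 1}: mean = {sample.mean():.2f}")

    # The five sample means differ from 100 and from each other purely
    # by chance -- this is the sampling error that inferential
    # statistics estimates.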

The Logic of Testing Hypotheses

In your research project, you are usually interested in testing some sort of hypothesis. You try to confirm the research hypothesis (sometimes called the working hypothesis), which is referred to in statistics as the alternative hypothesis (an alternative to the null hypothesis). Confirming the alternative hypothesis (i.e. the research hypothesis itself) is not done directly; instead, it is done by testing (accepting or rejecting) the null hypothesis.

The alternative hypothesis is stated in positive terms such as that there will be a difference between groups or that the independent variable will have a significant effect on the dependent variable. 

To confirm the existence of a significant relationship between variables or a significant difference between groups, you have to confidently REJECT the null hypothesis; in other words, to say that the difference or relationship is NOT due to chance. If the null hypothesis is accepted, that means there is NO difference between the groups or NO effect of the independent variable on the dependent variable, and any observed difference will be attributed solely to chance. (See the diagram for testing the null hypothesis.)

The researcher usually decides on the level of significance at which he or she is willing to reject the null hypothesis. In the social sciences the .05 or .01 level of significance is often used, written as p<.05 or p<.01. These numbers indicate a probability of 5 in 100 or 1 in 100 respectively. If the odds that the observed difference is due to chance are less than 5 in 100 (or 1 in 100), one decides that the odds are small and rejects the null hypothesis (i.e. the finding is significant and not merely a result of chance). If the difference is likely to occur by chance more often than 5 times in 100 (or 1 time in 100), one has no confidence in the finding; the null hypothesis is therefore accepted, indicating that the difference is probably due to chance and not to the independent variable. Testing the null hypothesis is done by performing a suitable inferential statistical test on the data.

In other words, p<.05 means that the probability of obtaining such a result by chance alone is less than 5%, so one can be at least 95% confident that the finding is not due to chance. The results are reported by stating that the finding is "statistically significant" at, for example, p<.01. This means that the finding is very unlikely to be due to chance, i.e. the null hypothesis is rejected and the research hypothesis is accepted. That a finding is statistically significant has nothing to do with it being practically important; it simply means that the statistics show a high probability that chance was not a factor in determining the finding.
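As a concrete illustration of this decision rule, here is a minimal Python sketch, assuming SciPy and made-up scores for two hypothetical groups (the numbers carry no real meaning):

    from scipy import stats

    # Hypothetical scores for two groups (invented for illustration)
    group_a = [72, 75, 78, 80, 71, 77, 79, 74]
    group_b = [65, 68, 70, 66, 69, 64, 71, 67]

    t_stat, p_value = stats.ttest_ind(group_a, group_b)

    alpha = 0.05                      # chosen level of significance
    if p_value < alpha:
        print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis")
    else:
        print(f"p = {p_value:.4f}: accept (fail to reject) the null hypothesis")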

Be careful not to fall into a TYPE I ERROR (rejecting a null hypothesis that is actually true) or a TYPE II ERROR (accepting a null hypothesis that is actually false) while testing hypotheses (see link).

Inferential statistical tests

The researcher has to decide on a suitable statistical test to check the significance of the findings. The selection of the most suitable test depends upon several factors, mainly the type of data, the distribution of the data, and the nature of the research question or hypothesis.

Some of the most common tests are as follows:

For continuous data the two most common tests are the t-test and the analysis of variance (ANOVA). The t-test compares two groups, while ANOVA compares differences among two or more groups. Both consider the overlap in the distributions of the data in order to determine the significance of the difference between means, and both require the data to be normally distributed.
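Building on the t-test sketch above, the following minimal Python example (again assuming SciPy and invented scores for three small groups, taken to be roughly normally distributed) shows both tests side by side:

    from scipy import stats

    # Invented scores for three groups, assumed roughly normal
    group_1 = [23, 25, 27, 22, 26]
    group_2 = [30, 31, 29, 33, 32]
    group_3 = [24, 26, 25, 27, 23]

    # t-test: compares exactly two groups
    t_stat, p_two = stats.ttest_ind(group_1, group_2)

    # one-way ANOVA: compares two or more groups at once
    f_stat, p_all = stats.f_oneway(group_1, group_2, group_3)

    print(f"t-test: t = {t_stat:.2f}, p = {p_two:.4f}")
    print(f"ANOVA:  F = {f_stat:.2f}, p = {p_all:.4f}")

A small p (e.g. below .05) in the ANOVA indicates that at least one group mean differs reliably from the others.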

For categorical data, the most common test is the chi-square test (χ², pronounced "ki"). It tests whether the pattern of the actual results (counts in categories) differs reliably from what would be expected by chance. 
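For illustration, a minimal chi-square goodness-of-fit sketch in Python (assuming SciPy; the 60 responses and their split across three categories are invented) might look like this:

    from scipy import stats

    # Invented counts of 60 responses in three categories
    observed = [30, 18, 12]
    # Counts expected by chance alone (an even split of the 60 responses)
    expected = [20, 20, 20]

    chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
    print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")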

Correlation

Correlation is the measure of association between variables. It can also be used to predict the value of one variable from another. 

The correlation coefficient is the statistic used to report whether two variables are correlated, to what extent, and in which direction. It ranges from -1 to +1, where zero means no correlation and +1 or -1 means a perfect correlation. A positive correlation indicates a positive relationship between the variables: an increase in one is accompanied by an increase in the other. A negative correlation indicates a negative (inverse) relationship: an increase in one is accompanied by a decrease in the other.

A scatter plot is a graphical way of showing the correlation between two variables. The horizontal axis represents one variable and the vertical represents the other variable.
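For example, a minimal matplotlib sketch (with invented data for two hypothetical variables, "hours studied" and "exam score") would be:

    import matplotlib.pyplot as plt

    # Invented data for two variables (one point per case)
    hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
    exam_score = [52, 55, 61, 60, 68, 72, 75, 80]

    plt.scatter(hours_studied, exam_score)   # horizontal axis: one variable
    plt.xlabel("Hours studied")
    plt.ylabel("Exam score")                 # vertical axis: the other
    plt.title("Scatter plot of two positively correlated variables")
    plt.show()

Here the cloud of points drifts upward to the right, the visual signature of a positive correlation.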

Correlation does not imply causality. If two variables are correlated it does not necessarily mean that one causes the other. The cause of the relationship could be another variable.

The most common correlation measure is the Pearson Product-Moment Coefficient (r), which is used for continuous data when the scores are normally distributed. The Spearman Rank-Order Coefficient (rs) is another correlation measure, used when the scores are not normally distributed or when the data are ordinal or ranked. 
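A minimal sketch computing both coefficients on the same invented data as the scatter plot above (assuming SciPy; choose between them using the rules just stated):

    from scipy import stats

    # The same invented data as in the scatter plot above
    hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
    exam_score = [52, 55, 61, 60, 68, 72, 75, 80]

    r, p_r = stats.pearsonr(hours_studied, exam_score)     # continuous, normal
    rs, p_rs = stats.spearmanr(hours_studied, exam_score)  # ranked / non-normal
    print(f"Pearson r   = {r:.2f} (p = {p_r:.4f})")
    print(f"Spearman rs = {rs:.2f} (p = {p_rs:.4f})")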


Topics for Discussion

The following items are topics for discussion during this lecture. Students should prepare their thoughts and ideas around these topics.

 


Required Readings

  1. Sommer, B. & Sommer, R. (2002). A Practical Guide to Behavioral Research: Tools and Techniques, 5th ed. Oxford: Oxford University Press. (Chapter 19, pp. 261-289, Inferential Statistics).


Copyright © 2000 by Hisham S. Gabr. All rights reserved.
