Association: relations between characteristics

Finally, research hypotheses may address the relation between two or more variables. Relations between variables are at stake if the research hypothesis states or implies that one (type of) characteristic is related to another (type of) characteristic. The statistical name for a relation between variables is association.

Take, for example, an analysis of the effect of a celebrity endorser on the willingness to donate. Here, the endorser to whom a person is exposed (one characteristic) is related to this person’s willingness to donate (another characteristic). Another example: If exposure to the campaign increases willingness to donate, a person’s willingness to donate is positively related to this person’s exposure to the campaign.

Score level differences

Association comes in two related flavors: a difference in score level between groups or the predominance of particular combinations of scores on different variables.

The relation between the endorser’s identity and willingness to donate is an example of the first flavor. All people are confronted with one of the celebrities as endorser of the fund-raising campaign. This is captured by a categorical variable: the endorsing celebrity.

The categorical variable clusters people into groups: One group is confronted with Celebrity A, another group with Celebrity B, and so on. If the celebrity matters to the willingness to donate, the general level of donation willingness should be higher in the group exposed to one celebrity than in the group exposed to another celebrity.

Thus, we return to statistics needed to test research hypotheses about score levels, namely measures of central tendency. If willingness to donate is a numeric variable, we can use group means to test the association between endorsing celebrity (grouping variable) and willingness to donate (score variable). The statistical hypothesis would then be that group means are not equal in the population of all people.

If you closely inspect the choice diagram in Figure 4.18, you will see that we prefer to use a t distribution if we compare two different groups (independent-samples t test) or two repeated observations for the same group (paired-samples t test). By contrast, if we have three or more groups, we use analysis of variance with an F distribution.
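The two t tests mentioned above differ in what they compare: two separate groups of cases, or two measurements on the same cases. The following is a minimal sketch in Python rather than SPSS, using only the standard library. The function names and the small data sets are invented for illustration, and the independent-samples version assumes equal population variances (the pooled-variance formula).

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances (assumes equal population variances).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df


def paired_t(first, second):
    """t statistic and degrees of freedom for two repeated observations on the same cases."""
    # The test is carried out on the difference score of each case.
    diffs = [s - f for f, s in zip(first, second)]
    n = len(diffs)
    se = sqrt(variance(diffs) / n)
    return mean(diffs) / se, n - 1


# Hypothetical willingness scores, invented for illustration only.
saw_celebrity_a = [4, 5, 6]
saw_celebrity_b = [1, 2, 3]
t_ind, df_ind = independent_t(saw_celebrity_a, saw_celebrity_b)

before = [1, 2, 3, 4]
after = [2, 4, 5, 4]
t_pair, df_pair = paired_t(before, after)

print(f"independent samples: t({df_ind}) = {t_ind:.2f}")
print(f"paired samples: t({df_pair}) = {t_pair:.2f}")
```

Note the different degrees of freedom: the independent-samples test uses the two group sizes minus two, whereas the paired-samples test uses the number of pairs minus one.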

Comparing means in SPSS

Instructions

Figure 9.22: Independent samples t test on two means.


Figure 9.23: Paired samples t test on two means.


For instructions and exercises on one-way analysis of variance, see Figure 5.8 and Section 5.2. For two-way analysis of variance, see Section 5.6 (instructions and exercises).

Exercises

  3. Is willingness to donate at the end of the campaign higher for those who remember the campaign than for those who do not remember it? Use the data in donors.sav.
  4. Did willingness to donate increase in the population between the start and the end of the campaign?

Answers

Answer to Exercise 3.

SPSS syntax:

* Check data.
FREQUENCIES VARIABLES=willing_post remember
/ORDER=ANALYSIS.
* Independent-samples t test.
T-TEST GROUPS=remember(0 1)
/MISSING=ANALYSIS
/VARIABLES=willing_post
/CRITERIA=CI(.95).

Check data:

There are no impossible values on the variables.

Check assumptions:

Both groups (N = 66 and N = 77) are large enough that we need not worry about the shape of the distribution of willingness in the population.

Interpret the results:

Willingness to donate at the end of the campaign is significantly higher for those who remember the campaign (M = 4.94, SD = 1.60) than for those who do not remember it (M = 4.24, SD = 1.65), t(141) = -2.57, p = .011, 95% CI [-1.24, -0.16].
Average willingness is 0.16 to 1.24 points higher on a 10-point scale for those who remember the campaign. Considering the range of the scale, this is quite a small difference.
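As a rough check on how the reported numbers hang together, the arithmetic can be sketched in Python. It uses only the rounded figures from the answer above, so the results are approximate. The negative signs suggest that the difference was taken as "do not remember" minus "remember"; that reading is an assumption of this sketch.

```python
# Figures as reported in the answer above (all rounded to two decimals).
mean_remember, mean_not = 4.94, 4.24
t_value = -2.57
ci_lower, ci_upper = -1.24, -0.16

# Assumed direction of subtraction: "do not remember" minus "remember".
difference = mean_not - mean_remember

# The confidence interval is centred on the observed mean difference.
midpoint = (ci_lower + ci_upper) / 2

# t is the difference divided by its standard error, so the standard error follows.
standard_error = difference / t_value

# Half the interval width divided by the standard error approximates the
# critical t value that was used to build the 95% interval.
half_width = (ci_upper - ci_lower) / 2
implied_critical = half_width / standard_error

print(f"difference = {difference:.2f}, CI midpoint = {midpoint:.2f}")
print(f"standard error is about {standard_error:.2f}")
print(f"implied critical value is about {implied_critical:.2f}")
```

The implied critical value is close to 2, which is what we expect for a 95% confidence interval based on a t distribution with 141 degrees of freedom.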

Answer to Exercise 4.

SPSS syntax:

* Check data.
FREQUENCIES VARIABLES=willing_post willing_pre
/ORDER=ANALYSIS.
* Paired-samples t test.
T-TEST PAIRS=willing_pre WITH willing_post (PAIRED)
/CRITERIA=CI(.9500)
/MISSING=ANALYSIS.

Check data:

There are no impossible values on the two variables.

Check assumptions:

Sample size is sufficiently large (N = 143).

Interpret the results:

There is a small but statistically significant increase in the willingness to donate over the duration of the experiment (from M = 4.49, SD = 1.65 to M = 4.62, SD = 1.66), t(142) = 5.74, p < .001, 95% CI [0.08, 0.17].

Average willingness to donate is 0.08 to 0.17 units higher at the end of the experiment than at the start. This is a very small difference on a 10-point scale.

Combinations of scores

The other flavor of association represents situations in which some combinations of scores on different variables are much more common than other combinations of scores.

Think of the hypothesis that brand awareness is related to exposure to advertisements for that brand. If the hypothesis is true, people with high exposure and high brand awareness should occur much more often than people with high exposure and low brand awareness or low exposure and high brand awareness.

The two variables here are exposure and brand awareness. One combination of scores on the two variables is high exposure combined with high brand awareness. This combination should be more common than high exposure combined with low brand awareness.

Measures of association are statistics that put a number to the pattern in combinations of scores. The exact statistic that we use depends on the measurement level of the variables. For numerical variables, measured at the interval or ratio level, we use Pearson’s correlation coefficient or the regression coefficient. For ordinal variables with quite a lot of different scores, we use Spearman’s rank correlation.
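To make the difference between the two correlation coefficients concrete, here is a minimal sketch in Python (standard library only). Spearman's rank correlation is computed as Pearson's correlation on the ranks of the scores. The function names and the six data points are invented for illustration; the first case is a deliberately odd observation with low exposure but fairly high brand awareness.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient for two equally long lists of numbers."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = average_rank
        start = end + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation applied to ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores, invented for illustration only.
exposure = [1, 5, 6, 7, 8, 9]
awareness = [4.5, 2, 3, 4, 5, 6]

r = pearson(exposure, awareness)
rho = spearman(exposure, awareness)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```

In these invented data, the single odd case pulls Pearson's correlation down more than the rank correlation, because ranks ignore how far away the odd score is.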

For categorical variables, measured at the nominal or ordinal level, chi-squared indicates whether variables are statistically associated. The larger chi-squared, the more likely we are to conclude that the variables are associated in the population. If variables are not associated, they are said to be statistically independent.

Several measures exist that express the strength of the association between two categorical variables. We use Phi and Cramer’s V (two nominal variables, symmetric association), Goodman & Kruskal’s tau (two nominal variables, asymmetric association), Kendall’s tau-b (two categorical ordinal variables, symmetric association), and Somers’ d (two categorical ordinal variables, asymmetric association).
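As an illustration of how chi-squared and a strength measure relate, the sketch below computes Pearson's chi-squared (without continuity correction) and Cramer's V for a contingency table in Python. The function names and the counts are hypothetical. For a 2x2 table, Cramer's V equals the absolute value of Phi.

```python
from math import sqrt


def chi_squared(table):
    """Pearson's chi-squared for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


def cramers_v(table):
    """Cramer's V: chi-squared rescaled to run from 0 (no association) to 1."""
    n = sum(sum(row) for row in table)
    smallest_dimension = min(len(table), len(table[0]))
    return sqrt(chi_squared(table) / (n * (smallest_dimension - 1)))


# Hypothetical counts, invented for illustration only.
associated = [[10, 20], [20, 10]]
independent = [[10, 20], [20, 40]]

print(f"associated:  chi-squared = {chi_squared(associated):.2f}, V = {cramers_v(associated):.2f}")
print(f"independent: chi-squared = {chi_squared(independent):.2f}, V = {cramers_v(independent):.2f}")
```

In the second table, the column proportions are identical in both rows, so observed and expected counts coincide and chi-squared is zero: the variables are statistically independent in these data.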

Testing associations in SPSS

Instructions

Figure 9.24: Test on a correlation.


Figure 9.25: Chi-squared test on a contingency table (crosstab).


For regression analysis (instructions and exercises), see Section 6.2.

Exercises

  5. In the population of all consumers, is brand awareness linked to exposure to advertisements for the brand? Use consumers.sav to answer this question.
  6. How well can we predict brand awareness with ad exposure?
  7. Does word of mouth involve women rather than men? Interpret the contents, strength, and statistical significance of the association.

Answers

Answer to Exercise 5.

SPSS syntax:

* Check data.
FREQUENCIES VARIABLES=ad_expo brand_aw
/ORDER=ANALYSIS.
* Check if the association can be linear.
GRAPH
/SCATTERPLOT(BIVAR)=ad_expo WITH brand_aw
/MISSING=LISTWISE.
* Correlations.
CORRELATIONS
/VARIABLES=ad_expo brand_aw
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE.
NONPAR CORR
/VARIABLES=ad_expo brand_aw
/PRINT=SPEARMAN TWOTAIL NOSIG
/MISSING=PAIRWISE.

Check data:

There are no impossible values that must be changed into missing values.

Check assumptions:

Can the association be linear? If we check a scatterplot of the two variables, the points do not clearly display a curved shape. But there is one point that may distort a linear association because it is far away from the other points, namely a consumer with an exposure score near one. If the rank correlation is substantially higher than the Pearson correlation, this single observation may be responsible.

Interpret the results:

Brand awareness is statistically significantly associated with exposure to advertisements for the brand, r = .46, p < .001. More exposure tends to go together with more brand awareness.

Answer to Exercise 6.

SPSS syntax:

* Check data.
FREQUENCIES VARIABLES=ad_expo brand_aw
/ORDER=ANALYSIS.
* Simple regression.
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS CI(95) R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT brand_aw
/METHOD=ENTER ad_expo.

Check data:

There are no impossible values.

Check assumptions:

  • We need at least twenty observations (cases) for each predictor in the regression model. Our model contains only one predictor, so the 62 cases in our data set suffice for using the theoretical approximation (F and t distribution) here.
  • Other assumptions will be explained in the chapter on moderation with regression analysis, so let us not pay attention to the assumptions yet.

Interpret the results:

Ad exposure predicts about one fifth of the variation in brand awareness scores, R² = .21, F(1, 60) = 15.87, p < .001.
The predictive effect of exposure to brand advertisements is moderately strong (b* = 0.46). An additional unit of exposure increases the predicted brand awareness by 0.4 points, t = 3.98, p < .001, 95% CI [0.22, 0.67].
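In a regression with a single predictor, the standardized regression coefficient equals Pearson's correlation, and R² is its square. The sketch below shows this in Python with a small invented data set. The function name and the data are hypothetical, and the code is not a replacement for the SPSS output.

```python
from math import sqrt


def simple_regression(x, y):
    """Least-squares regression of y on a single predictor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    r = sxy / sqrt(sxx * syy)
    return {
        "b": slope,                           # unstandardized coefficient
        "intercept": mean_y - slope * mean_x,
        "b_star": slope * sqrt(sxx / syy),    # slope times SD of x divided by SD of y
        "r": r,
        "r_squared": r ** 2,
    }


# Hypothetical scores, invented for illustration only.
ad_exposure = [1, 2, 3, 4, 5]
brand_awareness = [2, 4, 5, 4, 5]
fit = simple_regression(ad_exposure, brand_awareness)
print({name: round(value, 3) for name, value in fit.items()})

# The same relation holds for the rounded figures reported in the answer:
# b* = 0.46 squared gives the reported R-squared of .21.
print(round(0.46 ** 2, 2))
```

This also explains why the standardized coefficient in this answer has the same value as the correlation reported for Exercise 5.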

Answer to Exercise 7.

SPSS syntax:

* Contingency table with chi-squared test and measure of association.
CROSSTABS
/TABLES=wom BY gender
/FORMAT=AVALUE TABLES
/STATISTICS=CHISQ PHI LAMBDA
/CELLS=COUNT COLUMN
/COUNT ROUND CELL
/BARCHART.

Check data:

There are no impossible values on the two categorical variables.

Check assumptions:

This is a 2x2 contingency table, so we use Fisher’s exact test. An exact test does not rely on a theoretical approximation, so there are no assumptions to check.

Interpret the results:

There is no statistically significant association between word of mouth and sex, p = .161 (Fisher’s exact test), Goodman & Kruskal’s tau = .05. We cannot confidently conclude that either females or males experience word of mouth more frequently.

Note that a one-sided test is possible too. The p value of the one-sided exact test is .080 here.
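To show where one-sided and two-sided exact p values come from, here is a minimal sketch of Fisher's exact test for a 2x2 table in Python. It lists every table with the same row and column totals and adds up their hypergeometric probabilities. The function name and the counts are hypothetical; the cell counts behind the answer above are not given in the text, so the reported p values are not reproduced here. The two-sided rule used (sum all tables that are at most as probable as the observed one) is a common definition and is assumed here.

```python
from math import comb


def fisher_exact(a, b, c, d):
    """Exact p values for the 2x2 table [[a, b], [c, d]].

    Returns (two_sided, less, greater), where less and greater refer to the
    count in the top-left cell being at most or at least the observed count.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability of x cases in the top-left cell with all totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    # Small tolerance so that equally probable tables are not lost to rounding.
    two_sided = sum(prob(x) for x in range(lowest, highest + 1)
                    if prob(x) <= observed * (1 + 1e-9))
    less = sum(prob(x) for x in range(lowest, a + 1))
    greater = sum(prob(x) for x in range(a, highest + 1))
    return two_sided, less, greater


# Hypothetical counts, invented for illustration only.
two_sided, less, greater = fisher_exact(3, 1, 1, 3)
print(f"two-sided p = {two_sided:.3f}, one-sided p = {greater:.3f}")
```

In this symmetric example, the one-sided p value is half the two-sided p value, which mirrors the pattern in the answer above.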