
#### A Beginner's Guide to Correlation

• Under what circumstances is a correlation needed?

• What is the logic of statistically 'testing' a correlation?

• How is the statistical correlation test computed and interpreted?

• What do we ultimately learn from a statistical correlation test?

#### Under what circumstances is a correlation needed?

Very broadly, correlations tell us when two variables change together.

• A typical example would be height and weight, which increase or decrease together

More specifically, correlations characterize a specific type of simultaneous change in two variables:

• a correlation measures the extent to which plots of those changes fall on a straight line (Fig. 1)

There are three ways the plot of two variables can form a straight line

1. The line is horizontal (or vertical), indicating zero correlation (left-hand plots)

2. The line is angled counter-clockwise from horizontal, indicating a positive correlation (middle plots)

3. The line is angled clockwise from horizontal, indicating a negative correlation (right-hand plots)

• A positive correlation tells us that when one variable increases the other also increases

• A negative correlation tells us that when one variable increases, the other decreases (and vice versa)

• A high correlation indicates that the data conform closely to the straight line

• A low correlation indicates that the data conform only loosely to the straight line

The correlation is computed from data by the equation:

r = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / √[ Σᵢ (xᵢ − x̄)² · Σᵢ (yᵢ − ȳ)² ]        (Eq. 1)

which tells us the amount (or 'intensity') of correlation in the raw data. (Eq. 1 is the standard Pearson correlation coefficient.)

• Note, however, that it is essentially impossible (the probability is zero) that any correlation computed from raw data will be zero. In other words, any real dataset will have some correlation (positive or negative).

• Thus, when we attempt to determine whether there is a straight-line relationship between two variables (X and Y in Fig. 1), we are not asking whether there is any correlation in the raw data, but rather:

• we are asking whether the amount of correlation (corresponding to the absolute value of the r-value in Eq 1) is sufficient to make us believe that the data come from a real straight-line relationship (positive or negative)

• and, importantly, that any computed correlation value (r) is not likely to be due to the noisy data observed from a pair of X-Y variables that are actually uncorrelated

• Within classical statistics, this question (asking, 'is there a relationship between changes in the two underlying variables?') is answered by computing not the correlation itself (r-value), but rather the statistical test of that correlation (p-value)

When that statistical test is significant, it is the first step to inferring the presence of a causal network underlying the relationship among variables such as X and Y

• Note that the presence of an isolated correlation does not imply a particular causal relationship, or even that any causal network exists

• a correlation must be observed consistently to be believed

• However, the presence of a consistently-observed correlation between two variables does indeed suggest the presence of a causal network

• a causal network is not the same as a simple causal relation between two variables

#### What is the logic of statistically 'testing' a correlation?

The basic logic of the statistical correlation test rests on the fact that, even when there is no linear relationship between the two variables of interest (X and Y), a correlation computed from raw data will nevertheless not be zero. As with all statistical testing, you are therefore attempting to determine whether the measured value of the statistic (the correlation computed from the data) is far enough from zero for you to conclude that there is in fact a linear relationship between X and Y.

The statistical test of the correlation parallels the logic of all statistical hypothesis tests

In brief: we assume there is zero correlation between the X and Y variables, compute sampling probabilities based on this assumption, and then check whether the probability of X-Y datasets at least as extreme as the observed dataset is less than a pre-set criterion.

• Prior to data collection, we define:

• The null hypothesis [which posits zero correlation between X and Y variables]

• The α-criterion [which is usually set to α = 0.05 or 0.01]

•  We then compute the sampling distribution for the correlation statistic, assuming that the null hypothesis is correct

• These sampling probabilities tell us how frequently we can expect to observe data correlations (r values) of various sizes when there is ZERO underlying linear relationship

• The α-criterion can be seen graphically as the tail area of the sampling distribution whose probability mass is α (in Fig. 2, the area under the curve between |r| = 0.43, the critical correlation value rcrit, and |r| = 1)

• From the sampling distribution, compute the p-value

• This is the area under the curve between |r| = rdat and |r| = 1 (in Fig. 2, the grey-shaded region in the tails of the sampling distribution)

• If the p-value is less than the α-criterion (as it is in Fig. 2), you reject the null hypothesis, and assume there is indeed a linear relationship connecting the X and Y variables.

#### How is the statistical correlation test computed and interpreted?

To determine if the value of the observed data correlation is sufficiently far from zero, you:

1. Assume that the null hypothesis, H0, is correct

2. From this assumption, compute the sampling distribution of the r-statistic (Fig. 2)

3. Compute the p-value associated with the observed r-statistic

4. If this p-value is below your pre-set α-criterion (i.e., p < α), reject H0

The first three steps of this procedure get compressed into a single line of computer code that returns both the data correlation, r, and the associated p-value. You then simply compare the p-value returned by the code with your pre-set α-criterion.

Let's do a worked example to get a feel for the in-practice procedure:

First, note that you would normally have a dataset corresponding to experimental observations, d, taken at various levels of another variable, x. We will create such a dataset by typing:

x=[-10:10]'; d=[-10:10]'+9*randn(21,1);

Then, we obtain r and p-values by typing:

[r p]=corr(x,d)

That's really all there is to it, although I have provided more details on the computations involved in completing this hypothesis test here.

The plot in Fig. 2 makes it clear that the r-statistic obtained from our simulated data, r = 0.53, is greater than the criterion correlation value, rcrit = 0.43, derived from α, and therefore these data yield a 'statistically significant' hypothesis test

• In other words, we reject H0

• rdat is further from the predictions of H0 than the threshold value rcrit defined by α.

#### What do we ultimately learn from a statistical correlation test?

There is some subtlety in the transition between measuring the size of the data correlation and determining that there is a statistically significant difference between the data and the predictions of H0.

In particular, measurements and hypothesis tests are distinct methods with quite different computations, so it is important to see to what extent the correlation test constitutes a hypothesis test and not simply a measurement.

• The key to understanding the difference is to look at the computations being performed

• A measurement is based on a single probability distribution that tells us the most likely values of the underlying signal.

• A hypothesis test (model comparison) is based on the likelihood of the hypothesis being correct, which requires that the likelihoods (or probabilities) of competing hypotheses be computed and compared.

• In both cases, the dataset (D) is the important information that provides evidence for the conclusion

• In the case of a measurement, that computation uses the data to compute the likelihood over underlying correlation values

• This is very similar to the procedure used to compute a sampling distribution, the basis for the r-test, in that both rely on a single model of the data [here, y = const + noise] for their computations.

• In the case of hypothesis testing, that computation uses the data to compute the likelihood function over possible hypotheses, where each hypothesis posits a different model of the data [e.g., y = const + noise vs. y = f₁(x) + noise vs. y = f₂(x) + noise, etc., where the fᵢ are fixed but unknown]

• Reliance on sampling distributions introduces a number of weaknesses into the statistical test, including:

• lack of an ability to differentiate among competing alternative hypotheses

• no ability for experiments to provide evidence favoring any hypothesis (null or otherwise)
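As a concrete illustration of the model-comparison alternative sketched above, the following Python fragment computes and compares the Gaussian log-likelihoods of a dataset under two competing models, y = const + noise vs. y = b·x + const + noise. The data, the least-squares fitting, and the assumption of a known noise sd are all mine, for illustration only:

```python
import math
import random

random.seed(3)

def gauss_loglik(residuals, sd):
    """Log-likelihood of residuals under zero-mean Gaussian noise."""
    return sum(-0.5 * (e / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))
               for e in residuals)

# Simulated data with a genuine linear relationship (slope 0.8, noise sd 3)
x = [float(i) for i in range(-10, 11)]
y = [0.8 * xi + random.gauss(0, 3) for xi in x]
n, sd = len(x), 3.0

# H0: y = const + noise; the best-fitting constant is the mean of y
my = sum(y) / n
ll0 = gauss_loglik([yi - my for yi in y], sd)

# H1: y = b*x + a + noise; a and b fit by least squares
mx = sum(x) / n
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))
a = my - b * mx
ll1 = gauss_loglik([yi - (a + b * xi) for xi, yi in zip(x, y)], sd)

print(ll1 > ll0)  # the linear model fits these data better
```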