Imperial Mitochondriacs: Proportionality: A Valid Alternative to Correlation for Relative Data

Wednesday, 25 March 2015

Proportionality: A Valid Alternative to Correlation for Relative Data

http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004075

David Lovell, Vera Pawlowsky-Glahn, Juan José Egozcue, Samuel Marguerat, Jürg Bähler

When trying to understand whether one quantity covaries with another, the standard tools that come to mind are correlation coefficients. But consider three statistically independent variables X, Y and Z, which have no correlation. Plotting a large number of samples of X vs Y, Y vs Z or X vs Z will indeed give small correlation coefficients. However, the quantities X/Z and Y/Z will be correlated because of their common divisor, which can mislead us into believing that X is correlated with Y, which we know is untrue (this is clearly shown in Fig. 1A). Thus, if we are interested in relationships between X and Y, searching for correlation between X/Z and Y/Z can be misleading. It should be noted that this is only a concern when Z is a random variable with a large enough variance. If Z is constant across experiments, then our intuition for correlation coefficients is restored. The lesson is 'correlation between relative abundances is meaningless if we are using different normalisations for each condition'.

This statistical trap is easy to overlook, as it is commonplace to search for correlations in quantities which are normalised (say mRNA of gene 1/total mRNA, i.e. X/Z). The authors highlight that if X/Z and Y/Z are proportional across samples, then X must be proportional to Y, because the common divisor Z cancels in the ratio (X/Z)/(Y/Z) = X/Y. They therefore suggest a 'goodness-of-fit to proportionality' as a more appropriate statistic when searching for covariation in relative abundances. This is defined as ϕ = var(log(A/B))/var(log A), where A = X/Z and B = Y/Z. ϕ is zero when A and B are perfectly proportional.
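As a rough sketch of the statistic defined above, here is one way to compute ϕ with NumPy. The function name `phi` and the toy data are my own choices for illustration, not code from the paper, and I use plain log() rather than the clr() transformation.

```python
import numpy as np


def phi(a, b):
    """Goodness-of-fit to proportionality: var(log(a/b)) / var(log(a)).

    Hypothetical helper for illustration; uses log() rather than clr().
    """
    log_a = np.log(np.asarray(a, dtype=float))
    log_b = np.log(np.asarray(b, dtype=float))
    # log(a/b) = log(a) - log(b)
    return np.var(log_a - log_b) / np.var(log_a)


rng = np.random.default_rng(0)
a = rng.uniform(1.0, 2.0, size=1000)

# b is an exact multiple of a, so log(a/b) is constant and phi is ~0
phi_proportional = phi(a, 3.0 * a)

# b is drawn independently of a, so phi is far from 0
phi_independent = phi(a, rng.uniform(1.0, 2.0, size=1000))
```

Note that ϕ is not symmetric: swapping the arguments changes the denominator from var(log a) to var(log b).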

------------------------------------

Update: For enthusiasts!

Let's use some Monte Carlo to test this out! Using the notation Unif(p,q) for a uniform distribution with minimum p and maximum q, I have generated draws from three uniform random variables: X ~ Unif(1,2), Y ~ Unif(4,8), Z ~ Unif(5,300). We see that none of the variables correlate with each other. However, when we create new variables A = X/Z and B = Y/Z, we see a striking correlation (0.95). So one cannot claim that X is correlated with Y just because A is correlated with B.

This is a particularly pathological example, since Z has a huge variance. This simulation yielded ϕ = 0.11. Check out the comments on this post to see some back-and-forth between David and me on this.
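For anyone who wants to try a simulation like this themselves, here is a minimal sketch in Python. The sample size, the seed, and the helper names `pearson` and `phi` are assumptions made for illustration; this is not the original script behind the numbers quoted above, so the exact values will differ a little from run to run.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, chosen for reproducibility
n = 100_000                      # assumed sample size

# Three independent uniform random variables, as described in the post
x = rng.uniform(1, 2, n)
y = rng.uniform(4, 8, n)
z = rng.uniform(5, 300, n)

# Relative quantities sharing the common divisor Z
a = x / z
b = y / z


def pearson(u, v):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(u, v)[0, 1]


def phi(u, v):
    """var(log(u/v)) / var(log(u)), using log() rather than clr()."""
    log_u, log_v = np.log(u), np.log(v)
    return np.var(log_u - log_v) / np.var(log_u)


corr_xy = pearson(x, y)  # independent variables: close to zero
corr_ab = pearson(a, b)  # common divisor induces a strong correlation
phi_ab = phi(a, b)       # proportionality statistic on the ratios
phi_xy = phi(x, y)       # same statistic on the underlying variables

print(f"corr(X, Y) = {corr_xy:.3f}")
print(f"corr(A, B) = {corr_ab:.3f}")
print(f"phi(A, B)  = {phi_ab:.3f}")
print(f"phi(X, Y)  = {phi_xy:.3f}")
```

Comparing `phi_ab` with `phi_xy` shows how much the log-based version of ϕ depends on the variation in Z: the numerator var(log(A/B)) equals var(log(X/Y)) because Z cancels, but the denominator var(log A) includes the variance of log Z.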

6 comments:

Unknown, 1 April 2015 at 06:59
Nice simulation Juvid.
Actually, ϕ is a bit different to var(log(A/B)) for precisely the reason you point out (that it does not have a meaningful scale).
If you go to the end of the section "Measuring Proportionality" you will see that
ϕ(log x, log y) = var(log(x/y))/var(log x).
(...strictly speaking, you should use clr() instead of log(), but I don't think that'll make a big difference here).
I'd love to know what you get for ϕ now. Cheers, David

Juvid Aryaman, 21 April 2015 at 11:45
Brilliant! Thank you for getting back to us. I've updated the post, and hope you find it accurate. I think the conversation above really underscores the subtleties in thinking about this area. Please do get in touch with any further thoughts you may have! Juvid

Unknown, 25 April 2015 at 06:14
Well, I took a leaf out of your book Juvid and did some simulations to explore the impact of variation in Z on ϕ(log x, log y) and on ϕ(clr x, clr y).

...and I realised that the clr() transformation is very important because it makes ϕ independent of variation in Z. So, I think I have to revise some of my suspicions in previous posts!

I have a 2-page PDF describing this additional exploration... is there a way I could share that with readers of your blog??