I’ve resisted the temptation to write about this, but I need to get something out. None of the push-back on the lab-leak theory acknowledges how unusual it is for a new virus to appear in the only city in China with a BSL4 lab. Of all the large cities in China – all with wet markets, I assume – how weird is it that the virus first appears in the one big city with such a lab?
How weird? Well we can try to quantify how weird by looking at Bayes’ theorem.
I blogged about Bayes’ theorem last year here. Have a read about the logic of it there and also look at the other pages I link to for its derivations.
Now, let’s try a thought experiment. We are at the start of 2020. You tell me: “There’s a new coronavirus that’s appeared in a big city in China”. (By the way, let’s define “big city” as over 10m people – Wuhan has 11m). You ask me: “What’s the chance that the virus has escaped from a lab, given that there is only one lab in China that deals with coronaviruses?”
That’s all you initially tell me. You don’t tell me that the virus has appeared in Wuhan, which has the lab. You just say, it’s appeared in a big city in China.
Well, with that much information, what’s the chance it leaked from a lab? What’s my “prior”?
Let’s call the lab-leak hypothesis “event A”; my prior is the probability I give it before hearing anything more. Event A is: “the coronavirus we have just seen in a large Chinese city is from a leak from the only lab in China with coronaviruses”. If that’s all the info you give me, I would say the probability that it’s from a lab leak is 2%. A 1 in 50 chance.
Why as high as 1 in 50? Well, if you’ve been keeping up with the media recently you would know that lab leaks are not rare. The SARS1 virus leaked a few times from labs in China and elsewhere over the years since its initial naturally caused outbreak. And the Brits had a bad foot-and-mouth outbreak in recent years that came from a lab leak. Some people argue that the 1977 flu, which looked identical to the 1950 flu (and was a descendant of the 1918 flu), was a lab leak in Russia. But there are other theories about that too. Anyway, my point is, lab leaks are not unheard of, so a 1 in 50 chance is reasonable.
Now. You then tell me a new piece of information. In Bayes’ language, that’s a new event. Let’s call it “event B”. You tell me: “the city that has the outbreak is Wuhan, which has the only lab in China dealing with coronaviruses”. Event B is therefore: “The city where the outbreak occurred is the city with the only lab”.
What does that new piece of information – that new “event” – do to my thinking? Well, I now have to “update my prior”. Technically I have to work out P(A|B). That is, “what is the probability that the virus is from a lab leak, given that the virus has appeared in the city with the only lab in China it can leak from?”
Well, (and I’m just going to put the maths out there, you can check my formula elsewhere), Bayes’ formula tells me how to update my prior. It says:
New forecast for A (my updated prior) =
P(A|B) = P(B|A) * P(A) / [(P(B|A) * P(A)) + (P(B|~A) * P(~A))]
So let’s unpack these terms:
- “Event B|A” is the event that I observe a virus in a big city with the only lab that does coronaviruses under the condition that the virus has leaked from that lab. Well, what are the chances that if the virus has leaked from the lab that I will see it in the city in which the lab resides? I’d say 100%. So P(B|A) = 100%.
- P(A) is my prior – the probability I first judged of the virus I see in a city being a leak from the lab. We’ve set that at 2%.
- Event “B|~A” is interesting. It’s “I observe a virus in a big city with the lab, but the virus didn’t leak from the lab.” How would I estimate the probability of B|~A? Well, there are 11 cities in China over 10m people. One of those cities got the virus. What are the chances that it was the city with the lab, given there was no lab leak? I’d say 1 in 11 (I assume they are all equally likely – they all have wet markets etc), or about 9%. So P(B|~A) = 9%.
- What’s the probability of ~A? Well I initially assumed that the probability of the observed virus being from a lab leak was 2%, so P(~A) is 98%.
Plug all these values into the formula (using 1/11 exactly for P(B|~A) rather than the rounded 9%) and you get an updated prior of 18.33%.
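If you want to check the arithmetic yourself, here is a minimal Python sketch of the update. The function name `bayes_update` and its parameter names are my own labels, not anything standard; the inputs are the numbers used in this post, including the assumption that each of the 11 big cities is equally likely under a natural origin.

```python
def bayes_update(prior, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) given P(A), P(B|A) and P(B|~A), using Bayes' theorem."""
    numerator = p_b_given_a * prior
    # Total probability of B: the lab-leak route plus the no-lab-leak route.
    denominator = numerator + p_b_given_not_a * (1.0 - prior)
    return numerator / denominator


# Numbers from the post: 2% prior, P(B|A) = 100%, P(B|~A) = 1 in 11.
# The 1-in-11 figure is an assumption (all big cities equally likely).
posterior = bayes_update(prior=0.02, p_b_given_a=1.0, p_b_given_not_a=1 / 11)
print(f"{posterior:.2%}")  # 18.33%
```

With P(B|~A) rounded to 9% instead of 1/11, the same function gives about 18.5%, so the rounding does not change the conclusion.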
And this is my point. How weird is it that it shows up in Wuhan if it wasn’t a lab leak? Weird enough to increase my prior from 1 in 50 to almost 1 in 5.
Let me spell this out, because I don’t think the media quite gets just what a coincidence this is. Pushing my conservative prior up by a factor of about 9 is pretty big.
That is, suppose you told me that a new coronavirus had just appeared in a large city in China, that there is only one lab in China actively studying coronaviruses, and then asked me to estimate the chance that the virus arose from a lab leak. I would say 2%.
But if you then told me that the city where the virus appeared is the city with that unique coronavirus lab, I would have to update my prior. And I would update it to 18%.
From a 1 in 50 chance to a 1 in 5 chance.
So – this is not a 50/50 chance, but the magnitude of the change is huge. And it should underscore what a big implicit assumption is being made by virologists who keep saying it has to be a natural origin.
Of course if you have a different prior to 1 in 50, you get a different updated result:
- If your “prior” on the lab leak is 1 in 100, then your updated prior, when you find out the city is Wuhan, should go to 10%. Still a huge move – and still a probability that you shouldn’t casually dismiss.
- If your “prior” was more cynical – say a 1 in 20 chance that it’s from a lab leak (5%), then your update, as soon as you find out it’s Wuhan, goes to 37% – more than a 1 in 3 chance.
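To see how much the answer depends on where you start, here is a small Python sketch that runs the same update for the three priors mentioned in this post. The helper name `posterior_given_wuhan` and the `n_big_cities` parameter are hypothetical labels of mine; the sketch keeps the post's assumptions that P(B|A) is 100% and that P(B|~A) is 1 in 11.

```python
def posterior_given_wuhan(prior, n_big_cities=11):
    """Update a lab-leak prior on learning the outbreak city is the lab city.

    Assumes P(B|A) = 1 and P(B|~A) = 1 / n_big_cities, as in the post.
    """
    p_b_given_not_a = 1.0 / n_big_cities
    return prior / (prior + p_b_given_not_a * (1.0 - prior))


# The three priors discussed in the post.
priors = {"1 in 100": 0.01, "1 in 50": 0.02, "1 in 20": 0.05}
results = {label: posterior_given_wuhan(p) for label, p in priors.items()}

for label, value in results.items():
    print(f"prior {label}: updated to {value:.1%}")
# prior 1 in 100: updated to 10.0%
# prior 1 in 50: updated to 18.3%
# prior 1 in 20: updated to 36.7%
```

Changing `n_big_cities` shows how sensitive the result is to the choice of what counts as a “big city”, which is one of the softer assumptions here.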
Hopefully this post shows why I think the conversation in the media, and even amongst virologists, has lacked a certain nuance. It really is very weird that Wuhan is where the virus first shows up if it wasn’t caused by a lab leak. I’m not saying it is a lab leak. I’m saying that casual dismissals too often fail to account for how weird it is.